Use Case:

COVID-19 Article & Publisher Sentiment Analysis

Summary:

Coronavirus Headline Sentiment Analysis

Data From: Kaggle

Polyture Version: 0.13.10

Contact: team@polyture.com

The World Health Organization (WHO) officially declared COVID-19 a pandemic on March 11, 2020. In the days immediately before and after the declaration, written media coverage of the outbreak surged. In this case study, we apply sentiment analysis to news headlines from various publications to reveal how the tone of coverage shifted during that period.

Polyture provides its own data warehousing, data cleaning and transformation, and visualization tools. Because all of these live in one platform, we were able to complete this analysis in just 15 minutes.

Steps Taken:

  • Uploading the dataset into the data warehouse
  • Running sentiment analysis on the data
  • Cleaning data and creating a new table with only data from major publications
  • Utilizing three visualization nodes to create: 
    • Time Series Title SA
      • A time series graph of title sentiment analysis over time
    • Body Histogram
      • A histogram of the article body text sentiment analysis
    • Mean Publication Bias
      • A bar chart showing mean sentiment for three major publishing outlets

Walk Through:

Upload and Warehousing

The first step is to import the data into the Data Warehouse.

Polyture allows you to upload multiple data types and connect to multiple APIs, and it provides a unique view of all data sources.

Import Data / Select Data Type

We then grab a CSV data source node from the Import panel, and select the correct file.

Notice that Polyture already provides “Quick Insights”, such as an instant sentiment analysis histogram, in a panel on the right-hand side of the import options.

This panel also allows users to change a column's data type. To plot the data as a time series, we must change the date column from a string type to a date/time type.
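To illustrate why the type change matters, here is a minimal Python sketch of the same conversion done outside Polyture. The column name `date`, the `%m/%d/%Y` format, and the sample rows are assumptions for illustration only; the dataset's actual column names and date format are not given here.

```python
from datetime import datetime

def parse_date_column(rows, column="date", fmt="%m/%d/%Y"):
    """Return copies of the rows with `column` converted from str to datetime."""
    converted = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's rows are left untouched
        new_row[column] = datetime.strptime(row[column], fmt)
        converted.append(new_row)
    return converted

# Made-up sample rows; the real column name and date format may differ.
rows = [{"date": "3/9/2020"}, {"date": "3/11/2020"}]

as_text = sorted(row["date"] for row in rows)
as_dates = sorted(row["date"] for row in parse_date_column(rows))

print(as_text)                                      # text order, not chronological
print([d.strftime("%Y-%m-%d") for d in as_dates])   # chronological order
```

Sorted as text, March 11 lands before March 9 because the comparison is character by character. Once the values are real date/time objects, they sort chronologically, which is what a time-series graph needs.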

Perform Sentiment Analysis

Next, we attach a Sentiment Analysis node to the CSV data source.

In the options panel on the bottom right, we select the columns we wish to run sentiment analysis on; in this case, Article Title and Article Body.
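For readers who want a feel for what a sentiment score per column looks like, the following is a rough sketch of a simple lexicon-based scorer. This is not Polyture's algorithm, which is not described here. The word lists, the `polarity` and `add_sentiment` helpers, and the output column names are all hypothetical.

```python
import re

# Tiny hypothetical lexicons; a real analysis would use a much larger, curated one.
POSITIVE = {"hope", "recover", "recovery", "safe", "improve"}
NEGATIVE = {"death", "crisis", "fear", "outbreak", "panic"}

def polarity(text):
    """Score text in [-1, 1]: (positive words - negative words) / total words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    positive = sum(word in POSITIVE for word in words)
    negative = sum(word in NEGATIVE for word in words)
    return (positive - negative) / len(words)

def add_sentiment(rows, columns):
    """Return copies of the rows with one extra '<column> Sentiment' field per column."""
    scored = []
    for row in rows:
        new_row = dict(row)
        for column in columns:
            new_row[f"{column} Sentiment"] = polarity(row[column])
        scored.append(new_row)
    return scored

# Made-up article; only the two column names come from the walkthrough.
articles = [{"Article Title": "Crisis", "Article Body": "Hope and recovery"}]
scored = add_sentiment(articles, ["Article Title", "Article Body"])
print(scored[0]["Article Title Sentiment"], scored[0]["Article Body Sentiment"])
```

A score near zero means positive and negative words roughly cancel out or are absent, which corresponds to neutral text under this simplified scheme.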

Extract Major Publisher Data

For the mean publication bias graph, we want only the three major publishers: the Associated Press, Reuters, and “The Editorial Board”.

We achieve this by using three instances of the Transform tool “Row Filter” to extract the major publisher data into three new tables.

Join Major Publisher Data

We now need to combine the three tables into one by stacking their rows with the “Row Append” node.
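The filter-then-append pattern from the last two steps can be sketched in plain Python as follows. The `row_filter` and `row_append` functions only mimic what the Polyture nodes are described as doing; the `publisher` column name, the publisher labels, and the sample rows are assumptions.

```python
def row_filter(rows, column, value):
    """Keep only the rows whose `column` equals `value`."""
    return [row for row in rows if row[column] == value]

def row_append(*tables):
    """Stack tables that share the same columns into one table, in order."""
    combined = []
    for table in tables:
        combined.extend(table)
    return combined

# Labels are assumed to match how the publishers appear in the data.
MAJOR_PUBLISHERS = ["Associated Press", "Reuters", "The Editorial Board"]

# Made-up sample rows for illustration.
articles = [
    {"publisher": "Reuters", "title": "a"},
    {"publisher": "Other Outlet", "title": "b"},
    {"publisher": "Associated Press", "title": "c"},
    {"publisher": "The Editorial Board", "title": "d"},
]

# One filtered table per publisher, then a single combined table.
per_publisher = [row_filter(articles, "publisher", name) for name in MAJOR_PUBLISHERS]
major = row_append(*per_publisher)
print([row["publisher"] for row in major])
```

Note that this is a row-wise append (a union of tables with identical columns), not a key-based join: no columns are matched between tables.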

Prepare Graph Nodes

Our data is now ready for graphing.

Next, we drag out and connect three “Dashboard Graph” nodes.

Open Graph Editor Panel

Clicking on a graph node reveals the Quick Insights panel as well as the Graph Editor.

Select Graph Type & Parameters

Inside of the Graph Editor, we can:

  • Set the title of the graph
  • Select graph type (in this case Time-Series)
  • Select the columns to use
  • Render a preview of the graph
  • Select any graph-specific transforms

Repeat for Remaining Graphs

We repeat this process twice more to quickly create a histogram of the article body sentiment scores and a bar chart of the mean sentiment per publication.

To see an aggregate view of all the graphs, we click on the yellow button labelled “Dashboard”.

Resize and Space the Graphs in Dashboard View

In the Dashboard view, we can resize the graphs to achieve the best possible visual presentation.

Results:

From the time series title sentiment analysis graph, we can see that sentiment becomes more consistently negative as the date approaches and passes the WHO's March 11 declaration of a pandemic.
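One way to put a number on "consistently negative" is the share of headlines per day whose score is below zero. The sketch below assumes that interpretation; the column names and the sample scores are made up and are not taken from the dataset.

```python
from collections import defaultdict
from datetime import date

def negative_share_by_day(rows, date_col="date", score_col="title_sentiment"):
    """Return (day, fraction of scores below zero) pairs in chronological order."""
    scores_by_day = defaultdict(list)
    for row in rows:
        scores_by_day[row[date_col]].append(row[score_col])
    return [
        (day, sum(score < 0 for score in scores) / len(scores))
        for day, scores in sorted(scores_by_day.items())
    ]

# Made-up scores, chosen only to exercise the function.
rows = [
    {"date": date(2020, 3, 12), "title_sentiment": -0.3},
    {"date": date(2020, 3, 9), "title_sentiment": 0.2},
    {"date": date(2020, 3, 9), "title_sentiment": -0.1},
    {"date": date(2020, 3, 12), "title_sentiment": -0.4},
]

for day, share in negative_share_by_day(rows):
    print(day.isoformat(), share)
```

A share that climbs toward 1.0 over successive days would correspond to the pattern described above.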

From the article body histogram, we see a spike in the -0.01 to 0.12 bin, which shows that many articles have neutral sentiment, though there appear to be very slightly more negative articles than positive ones.
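The bin boundaries quoted above suggest a bin width of 0.12 - (-0.01) = 0.13. Assuming equal-width bins anchored at -0.01 (an assumption; the graph's actual binning is not documented here), a histogram like this could be computed as follows, using made-up scores.

```python
import math
from collections import Counter

def histogram(values, bin_width, origin):
    """Count values into equal-width bins; returns (low, high, count) per non-empty bin."""
    counts = Counter(math.floor((value - origin) / bin_width) for value in values)
    return [
        (round(origin + index * bin_width, 10),
         round(origin + (index + 1) * bin_width, 10),
         count)
        for index, count in sorted(counts.items())
    ]

# Made-up body sentiment scores for illustration.
scores = [0.05, 0.0, 0.10, -0.2, 0.3]

bins = histogram(scores, bin_width=0.13, origin=-0.01)
for low, high, count in bins:
    print(f"{low:+.2f} to {high:+.2f}: {count}")
```

Each bin covers the half-open interval from `low` up to, but not including, `high`. Empty bins are omitted from the result.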

From the mean publication bias chart, we can see that Associated Press article titles had the least negative mean sentiment, Reuters titles were slightly more negative, and “The Editorial Board” titles were by far the most negative.
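The comparison behind a chart like this is a group-by-and-average over the title scores. Here is a minimal sketch; the column names are hypothetical, and the scores are made up purely to exercise the code, so they are not the study's results.

```python
from collections import defaultdict
from statistics import mean

def mean_by_group(rows, group_col, value_col):
    """Return {group: mean of value_col} for each distinct value of group_col."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row[value_col])
    return {group: mean(values) for group, values in groups.items()}

# Made-up scores, NOT the case study's data.
rows = [
    {"publisher": "Associated Press", "title_sentiment": -0.125},
    {"publisher": "Associated Press", "title_sentiment": 0.0},
    {"publisher": "Reuters", "title_sentiment": -0.25},
    {"publisher": "Reuters", "title_sentiment": -0.125},
    {"publisher": "The Editorial Board", "title_sentiment": -0.5},
    {"publisher": "The Editorial Board", "title_sentiment": -0.75},
]

means = mean_by_group(rows, "publisher", "title_sentiment")
# Rank from least negative to most negative mean sentiment.
ranked = sorted(means, key=means.get, reverse=True)
print(ranked)
```

Keep in mind that a mean over a small number of articles can be skewed by a few strongly worded titles, so the article count per publisher is worth checking alongside the average.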