Analyzing the structure and effectiveness of news headlines using NLP

Analyzing the structure and effectiveness of news headlines using NLP

Analyzing the structure and effectiveness of news headlines using NLP

We wanted to gather and analyze news content in order to look for similarities and differences in the way two journalists write headlines for their respective news articles and blog posts. The two reporters we selected operate in, and write about, two very different industries/topics and have two very different writing styles:

Note: For a more technical, in-depth and interactive representation of this project, check out the Jupyter notebook we created. This includes sample code and more in depth descriptions of our approach.

In linguistics, a parse tree is a rooted tree that represents the syntactic structure of a sentence, according to some pre-defined grammar. Here’s an example;

For example with a simple sentence like “The cat sat on the mat”, a parse tree might look like this;

Thankfully parsing our extracted headlines isn’t too difficult. We used the Pattern Library for Python to parse the headlines and generate our parse trees.

In total we gathered about 700 article headlines for both journalists using the AYLIEN News API which we then analyzed using Python. If you’d like to give it a go yourself, you can grab the Pickled data files directly from the GitHub repository (link), or by using the data collection notebook we prepared for this project.

Read Also:
Why nature is our best guide for understanding artificial intelligence

First we loaded all the headlines for Akin Oyedele, then we created parse trees for all 700 of them, and finally we stored them together with some basic information about the headline in the same Python object.

Then using a sequence similarity metric, we compared all of these headlines two by two, to build a similarity matrix.

To visualize headline similarities for Akin we generated a 2D scatter plot with the hope of grouping similarly structured headlines close together in a graph in groups of sorts.

To achieve this, we first reduced the dimensionality of our similarity matrix using tSNE and applied K-Means clustering to find groups of similar headlines.

 



Data Innovation Summit 2017

30
Mar
2017
Data Innovation Summit 2017

30% off with code 7wData

Read Also:
Predictive analytics – knowledge is power

Big Data Innovation Summit London

30
Mar
2017
Big Data Innovation Summit London

$200 off with code DATA200

Read Also:
The New IT: Driving Business Innovation With Tech
Read Also:
How Facebook Leverages Artificial Intelligence -

Enterprise Data World 2017

2
Apr
2017
Enterprise Data World 2017

$200 off with code 7WDATA

Read Also:
Data Pros Focused On ROI Of Initial Big Data Investments

Data Visualisation Summit San Francisco

19
Apr
2017
Data Visualisation Summit San Francisco

$200 off with code DATA200

Read Also:
Zillow Uses Analytics, Machine Learning To Disrupt With Data

Chief Analytics Officer Europe

25
Apr
2017
Chief Analytics Officer Europe

15% off with code 7WDCAO17

Read Also:
Why nature is our best guide for understanding artificial intelligence

Leave a Reply

Your email address will not be published. Required fields are marked *