Short story on scaling an NLP problem without using a ton of hardware.

The cornerstone of the small project that used IEPY to gather the data for these great charts was being able to catch every mentioned company.

The basic idea of relation extraction is to detect the things mentioned in a text (so-called Mentions, or Entity-Occurrences), and later decide whether or not the text expresses the target relation between each pair of those things. In our case, we needed to find where companies were mentioned, and later determine whether a given sentence said that Company-A was funding Company-B.
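
The second step can be sketched as follows. This is a minimal illustration, not IEPY's actual API: the function name `candidate_pairs` and the sample mentions are made up for the example. It assumes ordered pairs, since "A funds B" and "B funds A" are different claims.

```python
from itertools import permutations


def candidate_pairs(mentions):
    """Return every ordered pair of distinct mentions found in one sentence.

    Each pair is a candidate for the target relation; a later step
    (a classifier or a set of rules) decides if the relation holds.
    """
    return list(permutations(mentions, 2))


# Hypothetical mentions detected in a single sentence.
sentence_mentions = ["Company-A", "Company-B", "Company-C"]
pairs = candidate_pairs(sentence_mentions)

for funder, funded in pairs:
    print(f"Does the sentence say that {funder} funded {funded}?")
```

Because every missed mention removes all the pairs it would have taken part in, the quality of the mention detection step bounds what the relation step can find.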

In order to detect those fundings, we needed to be sure of capturing every mention of a company. And although the NER we used caught most of them, there are always some folks who name their company #waywire or 8th Story, names that a NER cannot track very easily.

A good solution is to build a Gazetteer containing all the company names we can get. The idea behind Gazettes is that each time one of the Gazette entries is seen in a text, it is automatically considered a mention of a given object, i.e., an Entity-Occurrence.
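
A Gazette lookup of this kind could look roughly like the sketch below. It is not how IEPY implements it: the function `find_gazette_mentions`, the `max_len` limit, whitespace tokenization, and the sample sentence are all assumptions made for illustration. At each position it prefers the longest entry that matches.

```python
def find_gazette_mentions(tokens, gazette, max_len=5):
    """Scan tokens left to right and return (start, end, name) tuples.

    At each position the longest span of up to max_len tokens that
    appears in the gazette is taken as a mention.
    """
    mentions = []
    i = 0
    while i < len(tokens):
        match_end = None
        # Try the longest candidate span first, then shrink it.
        for j in range(min(len(tokens), i + max_len), i, -1):
            if " ".join(tokens[i:j]) in gazette:
                match_end = j
                break
        if match_end is None:
            i += 1
        else:
            mentions.append((i, match_end, " ".join(tokens[i:match_end])))
            i = match_end  # continue after the matched span
    return mentions


# Toy gazette and a made-up sentence, split on whitespace.
gazette = {"#waywire", "8th Story", "Yiftee"}
tokens = "Yiftee and 8th Story joined #waywire in the round".split()
mentions = find_gazette_mentions(tokens, gazette)
print(mentions)
```

Keeping the entries in a set makes each lookup a constant-time hash check, so the cost per sentence depends on its length and on `max_len`, not on how many names the Gazette holds.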

From an encyclopedic source, we got more than 300K entries. Great!

The next challenge was that… well, in the text to process, a company could be mentioned in a different way than the official name stated in the encyclopedic source. For instance, it would be more natural to find mentions of “Yiftee” than “Yiftee Inc.”

So, after incorporating a basic scheme for alternative names (i.e., substrings of the original long name), the number of entries grew to 600K.
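
The exact scheme is not described here, so the following is only one plausible reading of "substrings of the original long name": every leading run of tokens of the official name becomes an alias. The names `alternative_names` and `build_gazette` are hypothetical.

```python
def alternative_names(official_name, min_tokens=1):
    """Return the official name plus its leading-token substrings.

    "Yiftee Inc." yields {"Yiftee", "Yiftee Inc."}.
    """
    tokens = official_name.split()
    return {" ".join(tokens[:n]) for n in range(min_tokens, len(tokens) + 1)}


def build_gazette(official_names):
    """Map every alias to the set of official names it may refer to."""
    gazette = {}
    for name in official_names:
        for alias in alternative_names(name):
            gazette.setdefault(alias, set()).add(name)
    return gazette


gazette = build_gazette(["Yiftee Inc.", "8th Story"])
for alias, officials in sorted(gazette.items()):
    print(alias, "->", sorted(officials))
```

In this toy run two official names expand into four entries, and one of them ("8th") is a noisy alias that would match ordinary text. Any scheme like this trades recall for precision, so short or very common aliases may need filtering.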

 


