Short story on scaling an NLP problem without using a ton of hardware.

The cornerstone of the small project that used IEPY to gather the data behind these charts was being able to catch every company mentioned in the text.

The basic idea of relation extraction is to detect the things mentioned in a text (so-called Mentions, or Entity-Occurrences) and then decide whether the text expresses the target relation between each pair of them. In our case, we needed to find where companies were mentioned and then determine whether a given sentence said that Company-A was funding Company-B.
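To make the "each pair" step concrete, here is a minimal sketch of how candidate pairs could be enumerated once the mentions in a sentence are known. The function name `candidate_pairs` is hypothetical and this is not IEPY's API. It assumes funding is directional, so (Company-A, Company-B) and (Company-B, Company-A) are treated as separate candidates.

```python
from itertools import permutations


def candidate_pairs(mentions):
    """Return every ordered (funder, funded) candidate pair.

    `mentions` is the list of company mentions found in one sentence.
    Ordered pairs are used on the assumption that the relation is directional.
    """
    return list(permutations(mentions, 2))


pairs = candidate_pairs(["Company-A", "Company-B", "Company-C"])
```

Each pair would then be handed to whatever classifier decides if the sentence actually expresses the funding relation.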

In order to detect those funding events, we needed to be sure we captured every mention of a company. Although the NER we used caught most of them, there are always some folks who name their company #waywire or 8th Story, names that a NER cannot easily pick up.

A good solution is to build a Gazetteer containing all the company names we can get. The idea of working with Gazettes is that each time one of the Gazette entries is seen in a text, it is automatically considered a mention of a given object, i.e., an Entity-Occurrence.
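As an illustration of the lookup idea, here is a minimal standalone sketch of gazetteer matching. It is not IEPY's implementation: the names `tokenize`, `build_gazetteer` and `find_mentions` are hypothetical, the tokenizer is a simple regex, and matching is assumed to be case-insensitive and longest-match-first.

```python
import re


def tokenize(text):
    # Assumed tokenizer: keeps a leading "#" attached, so "#waywire" is one token.
    return re.findall(r"#?\w+|[^\w\s]", text)


def build_gazetteer(names):
    """Map each name's lowercased token tuple to the original name."""
    return {tuple(t.lower() for t in tokenize(name)): name for name in names}


def find_mentions(text, gazetteer):
    """Return (start_token, end_token, name) for each gazetteer entry found in text."""
    tokens = [t.lower() for t in tokenize(text)]
    max_len = max((len(key) for key in gazetteer), default=0)
    mentions = []
    i = 0
    while i < len(tokens):
        # Try the longest possible span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in gazetteer:
                mentions.append((i, i + n, gazetteer[key]))
                i += n
                break
        else:
            i += 1
    return mentions


gazetteer = build_gazetteer(["#waywire", "8th Story", "Yiftee Inc."])
mentions = find_mentions("Investors backed #waywire and 8th Story last year.", gazetteer)
```

Because the lookup is a dictionary access per candidate span, the cost per sentence depends on the sentence length and the longest entry, not on the number of entries in the gazetteer.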

From an encyclopedic source, we got more than 300K entries. Great!

The next challenge was that… well, in the text to process, a company could be mentioned in a different way than the official name stated in the encyclopedic source. For instance, it would be more natural to find mentions of “Yiftee” than of “Yiftee Inc.”

So, after incorporating a basic scheme for alternative names (i.e., substrings of the original long name), the number of entries grew to 600K.
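The exact scheme used for alternative names is not described beyond "substrings of the original long name", so the following is only a sketch of one narrow variant: dropping trailing legal suffixes. The function name `alternative_names` and the `LEGAL_SUFFIXES` list are assumptions made for illustration.

```python
# Assumed list of legal suffixes; a real list would likely be longer.
LEGAL_SUFFIXES = {"inc", "inc.", "llc", "ltd", "ltd.", "corp", "corp.", "co", "co."}


def alternative_names(official_name):
    """Return the official name plus shorter variants without trailing legal suffixes."""
    words = official_name.replace(",", " ").split()
    variants = [official_name]
    # Strip one trailing suffix at a time, keeping at least one word.
    while len(words) > 1 and words[-1].lower() in LEGAL_SUFFIXES:
        words = words[:-1]
        variants.append(" ".join(words))
    return variants


names = alternative_names("Yiftee Inc.")
```

Here `names` contains both "Yiftee Inc." and "Yiftee", which is how a gazetteer can roughly double in size. Shorter variants are also more ambiguous, so a broader substring scheme would probably need some filtering.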

 


