The cornerstone of the small work done for getting the info for these great charts with IEPY, was to be able to catch mentioned companies.
Click here for an interactive version of this chart and other great interactive charts
The basic idea of relation extraction is to be able to detect mentioned things in text (so called Mentions, or Entity-Occurrences), and later decide if in the text is expressed or not the target relation between each couple of those things. In our case, we needed to find where companies were mentioned, and later determine if in a given sentence it was said that Company-A was funding Company-B or not.
In order to detect those funding we needed to be sure of capturing every mention of a company. And although the NER used catched most of them, there are always some folks that name their company #waywire or 8th Story, words that are not very easily trackable with a NER.
A good solution is to build a Gazetteer containing all the company names we can get. The idea of working with Gazettes, is that when using them, each time one of the Gazette entries is seen on a text, it’s automatically considered as a mention of a given object, ie, an Entity-Occurrence.
From an encyclopedic source; we got more than 300K entries.Great!
The next challenge was that… well, in the text to process, a company could be mentioned on a different way than the official one stated on the encyclopedic source. For instance, would be more natural to find mentions of “Yiftee” than “Yiftee Inc.”
So, after incorporating a basic schema for the alternative names (ie, substrings of the original long name), the number of entries grew up to 600K.
Chief Analytics Officer Europe
15% off with code 7WDCAO17
Chief Analytics Officer Spring 2017
15% off with code MP15
Big Data and Analytics for Healthcare Philadelphia
$200 off with code DATA200
10% off with code 7WDATASMX
Data Science Congress 2017
20% off with code 7wdata_DSC2017