LinkedIn Knowledge Graph – KDnuggets Interview

We interview LinkedIn about their recently published LinkedIn Knowledge Graph, which connects their many millions of members, jobs, companies, and more.

LinkedIn recently published The LinkedIn Knowledge Graph (LKG). It is an impressive achievement, connecting 450M members, 190M historical job listings, 9M companies, 200+ countries, 35K skills in 19 languages, 28K schools, 1.5K fields of study, 600+ degrees, 24K titles in 19 languages, and 500+ certificates, among other entities, as of Oct 6, 2016.

I had an opportunity to ask LinkedIn a few questions, and here are the answers from Bee-Chung Chen, Senior Staff Engineer & Applied Researcher at LinkedIn, and Deepak Agarwal, VP of Engineering, Head of Relevance at LinkedIn, two of the leaders of the LKG project.

"Data Scientist" is the canonical form of a title entity in the taxonomy. A member or a job with title string "Data Mining Scientist" is standardized to title "Data Scientist" by our title standardizer (a supervised binary classifier) based on title string features and other member/job metadata (e.g., the skills of the member or the skills required by the job).

However, not all similar title strings can be mapped to the same entity by this supervised method, e.g., "Predictive Analytics Specialist" is not standardized to "Data Scientist", partially because collecting high-quality and high-volume training data for this task is challenging.
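
To make the idea of a supervised binary title standardizer concrete, here is a minimal sketch, not LinkedIn's actual system. It assumes a pairwise setup in which a (title string, candidate title entity) pair is scored as match or no match, using two hypothetical features: token overlap between the title strings and overlap between the member's skills and the skills typical of the entity. The entity skill sets, training examples, and function names are all invented for illustration.

```python
import math

# Hypothetical canonical title entities with typical skills (illustrative only).
CANONICAL_SKILLS = {
    "Data Scientist": {"machine learning", "statistics", "python"},
    "Software Engineer": {"java", "distributed systems", "python"},
}

def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def features(raw_title, skills, canonical):
    """Feature vector for a (title string, candidate entity) pair."""
    title_sim = jaccard(set(raw_title.lower().split()),
                        set(canonical.lower().split()))
    skill_sim = jaccard(set(skills), CANONICAL_SKILLS[canonical])
    return [1.0, title_sim, skill_sim]  # leading 1.0 is the bias term

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, lr=0.5, epochs=2000):
    """Fit logistic-regression weights by full-batch gradient descent."""
    rows = [(features(t, s, c), y) for t, s, c, y in examples]
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for x, y in rows:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for i, xi in enumerate(x):
                grad[i] += err * xi
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad)]
    return w

def match_probability(w, raw_title, skills, canonical):
    """Estimated probability that raw_title standardizes to canonical."""
    x = features(raw_title, skills, canonical)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

# Toy labeled pairs: (title string, member skills, candidate entity, label).
TRAINING = [
    ("Senior Data Scientist", {"machine learning", "python"}, "Data Scientist", 1),
    ("Data Scientist II", {"statistics", "python"}, "Data Scientist", 1),
    ("Staff Software Engineer", {"java", "distributed systems"}, "Software Engineer", 1),
    ("Software Engineer", {"java", "python"}, "Data Scientist", 0),
    ("Data Entry Clerk", {"typing"}, "Data Scientist", 0),
    ("Data Scientist", {"machine learning", "statistics"}, "Software Engineer", 0),
]

WEIGHTS = train(TRAINING)

p = match_probability(
    WEIGHTS,
    "Data Mining Scientist",
    {"machine learning", "statistics", "python"},
    "Data Scientist",
)
print(f"P(match) = {p:.2f}")
```

In this toy setup, "Predictive Analytics Specialist" shares no tokens with "Data Scientist", so its title-similarity feature is 0 and any match would have to come from metadata alone. That is one simple way to see why string-driven standardization can miss semantically similar titles when training data is limited.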

To augment the binary decision in such an entity-level standardization task, we also provide the similarity among these three title strings in the following two ways simultaneously. First, the LinkedIn title taxonomy has a hierarchical structure: title → super title → function, which enables a higher-level similarity. For example, these three title strings can all belong to the same super title and/or the same function.

Downstream data mining applications can select the most suitable title granularity level.
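
The title → super title → function hierarchy can be sketched as a small lookup structure. The following is an illustrative assumption, not LinkedIn's taxonomy: the super title and function names, the inclusion of "Predictive Analytics Specialist" as its own title entity, and the helper functions are all hypothetical. It shows how a downstream application might roll a title up to a chosen granularity and find the finest level at which two titles agree.

```python
from dataclasses import dataclass

# Granularity levels, ordered from finest to coarsest.
LEVELS = ("title", "super_title", "function")

@dataclass(frozen=True)
class TitleNode:
    title: str
    super_title: str
    function: str

# Hypothetical taxonomy entries (names invented for illustration).
TAXONOMY = {
    node.title: node
    for node in (
        TitleNode("Data Scientist", "Data Science and Analytics", "Engineering"),
        TitleNode("Predictive Analytics Specialist", "Data Science and Analytics", "Engineering"),
        TitleNode("Software Engineer", "Software Development", "Engineering"),
        TitleNode("Recruiter", "Talent Acquisition", "Human Resources"),
    )
}

def roll_up(title, level):
    """Return the ancestor of a title entity at the requested level."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    return getattr(TAXONOMY[title], level)

def finest_shared_level(a, b):
    """Finest level at which two titles map to the same node, or None."""
    for level in LEVELS:
        if roll_up(a, level) == roll_up(b, level):
            return level
    return None

print(finest_shared_level("Data Scientist", "Predictive Analytics Specialist"))
print(finest_shared_level("Data Scientist", "Software Engineer"))
print(finest_shared_level("Data Scientist", "Recruiter"))
```

An application that needs strict matching can compare at the title level, while one that tolerates broader grouping, such as audience sizing, could compare at the super title or function level instead.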


