LinkedIn recently published The LinkedIn Knowledge Graph (LKG). It is an impressive achievement, connecting 450M members, 190M historical job listings, 9M companies, 200+ countries, 35K skills in 19 languages, 28K schools, 1.5K fields of study, 600+ degrees, 24K titles in 19 languages, and 500+ certificates, among other entities, as of Oct 6, 2016.
I had an opportunity to ask LinkedIn a few questions, and here are the answers from Bee-Chung Chen, Senior Staff Engineer & Applied Researcher at LinkedIn, and Deepak Agarwal, VP of Engineering, Head of Relevance at LinkedIn, two of the leaders of the LKG project.
"Data Scientist" is the canonical form of a title entity in the taxonomy. A member or a job with title string "Data Mining Scientist" is standardized to title "Data Scientist" by our title standardizer (a supervised binary classifier) based on title string features and other member/job metadata (e.g., the skills of the member or the skills required by the job).
However, not all similar title strings can be mapped to the same entity by this supervised method, e.g., "Predictive Analytics Specialist" is not standardized to "Data Scientist", partially because collecting high-quality and high-volume training data for this task is challenging.
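To make the idea of a binary decision per (title string, title entity) pair concrete, here is a minimal sketch in Python. It is not LinkedIn's standardizer: the two features (token overlap between the strings and overlap between skill sets), the weights, the bias, the threshold, and the candidate entities are all hypothetical values chosen by hand for illustration, whereas a real supervised classifier would learn its parameters from labeled data.

```python
import math

def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical, hand-set parameters. A supervised model would learn these.
W_TITLE, W_SKILL, BIAS = 4.0, 3.0, -3.0

# Hypothetical title entities, each with a set of typical skills.
CANDIDATES = {
    "Data Scientist": {"machine learning", "statistics", "python", "sql"},
    "Software Engineer": {"java", "python", "sql"},
}

def match_probability(raw_title, member_skills, canonical, entity_skills):
    """Logistic score for 'does this title string map to this entity?'."""
    f_title = jaccard(set(raw_title.lower().split()),
                      set(canonical.lower().split()))
    f_skill = jaccard(set(member_skills), set(entity_skills))
    z = BIAS + W_TITLE * f_title + W_SKILL * f_skill
    return 1.0 / (1.0 + math.exp(-z))

def standardize(raw_title, member_skills, candidates=CANDIDATES, threshold=0.5):
    """Return the best-scoring entity above the threshold, or None."""
    best, best_p = None, threshold
    for canonical, entity_skills in candidates.items():
        p = match_probability(raw_title, member_skills, canonical, entity_skills)
        if p >= best_p:
            best, best_p = canonical, p
    return best

skills = {"machine learning", "statistics", "python"}
print(standardize("Data Mining Scientist", skills))
print(standardize("Predictive Analytics Specialist", skills))
```

With these made-up parameters, the first string clears the threshold for "Data Scientist" while the second matches no entity, because it shares no tokens with any canonical title. That only mimics the outcome described above; the toy says nothing about the training-data issue that the authors cite as part of the cause.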
To augment the binary decision in such an entity-level standardization task, we also provide the similarity among these three title strings in the following two ways simultaneously. First, the LinkedIn title taxonomy has a hierarchical structure: title → super title → function, which enables a higher-level similarity. For example, these three title strings can all belong to the same super title and/or the same function.
Downstream data mining applications can select the most suitable title granularity level.
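As a rough illustration of how a downstream application might use the title → super title → function hierarchy, the Python sketch below rolls title strings up to a chosen level and finds the finest level at which two strings agree. The taxonomy fragment is invented for this example: the super title and function names, and the placement of each string, are assumptions and are not taken from LinkedIn's actual taxonomy.

```python
from typing import NamedTuple, Optional

class TitlePath(NamedTuple):
    title: str        # canonical title entity
    super_title: str  # coarser grouping of related titles
    function: str     # coarsest level

# Hypothetical taxonomy fragment; all names below are illustrative only.
TAXONOMY = {
    "Data Scientist": TitlePath(
        "Data Scientist", "Data Science Specialist", "Engineering"),
    "Data Mining Scientist": TitlePath(
        "Data Scientist", "Data Science Specialist", "Engineering"),
    "Predictive Analytics Specialist": TitlePath(
        "Predictive Analytics Specialist", "Data Science Specialist", "Engineering"),
    "Account Executive": TitlePath(
        "Account Executive", "Sales Specialist", "Sales"),
}

# Levels ordered from finest to coarsest.
LEVELS = ("title", "super_title", "function")

def roll_up(title_string: str, level: str) -> str:
    """Map a title string to its label at the requested granularity."""
    return getattr(TAXONOMY[title_string], level)

def finest_shared_level(a: str, b: str) -> Optional[str]:
    """Return the finest level at which two title strings share a label."""
    for level in LEVELS:
        if roll_up(a, level) == roll_up(b, level):
            return level
    return None

print(finest_shared_level("Data Scientist", "Data Mining Scientist"))
print(finest_shared_level("Data Scientist", "Predictive Analytics Specialist"))
print(finest_shared_level("Data Scientist", "Account Executive"))
```

In this toy fragment, the first pair agrees at the title level, the second only at the super title level, and the third at no level. An application that needs broad groupings could call `roll_up` with `"function"`, while one that needs precise matching would stay at `"title"`.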