Just over 100 years ago, the German psychologist William Stern introduced the intelligence quotient test as a way of evaluating human intelligence. Since then, IQ tests have become a standard feature of modern life and are used to determine children’s suitability for schools and adults’ ability to perform jobs.
These tests usually contain three categories of questions: logic questions, such as finding patterns in sequences of images; mathematical questions, such as finding patterns in sequences of numbers; and verbal reasoning questions, which are based around analogies, classifications, synonyms and antonyms.
In recent years, computer scientists have used data mining techniques to analyze huge corpora of text to find the links between the words they contain. In particular, this gives them a handle on the statistics of word patterns, such as how often a particular word appears near other words. From this it is possible to work out how words relate to each other, albeit in a huge parameter space.
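To make the idea of "how often a particular word appears near other words" concrete, here is a minimal sketch that counts co-occurrences within a fixed window. The three-sentence corpus, the window size of 2 and the function name `cooccurrence_counts` are assumptions made for illustration; real systems work on far larger corpora and typically compress these counts into dense vectors.

```python
from collections import Counter, defaultdict


def cooccurrence_counts(sentences, window=2):
    """Count how often each word appears within `window` tokens of another word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the word itself
                    counts[word][tokens[j]] += 1
    return counts


# Toy corpus, invented for this example.
toy_corpus = [
    "the king rules the land",
    "the queen rules the land",
    "the dog chases the cat",
]

counts = cooccurrence_counts(toy_corpus)
print(counts["king"])   # neighbours of "king" and their counts
print(counts["queen"])  # "queen" has the same neighbours as "king" in this corpus
```

In this toy corpus "king" and "queen" end up with identical neighbour counts, which is the kind of statistical signal that lets related words sit close together in the parameter space.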
The end result is that words can be thought of as vectors in this high-dimensional parameter space. The advantage is that they can then be treated mathematically: compared, added and subtracted like other vectors. This leads to vector relations like this one: king – man + woman = queen.
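The king and queen relation can be sketched with a handful of hand-made vectors. Everything below is an assumption for illustration: the three dimensions (roughly royalty, maleness, femaleness), the numbers and the helper names `cosine` and `analogy` are invented, whereas real word vectors have hundreds of dimensions learned from text.

```python
import math

# Hand-made toy vectors (hypothetical values, not learned from any corpus).
# Dimensions, loosely: royalty, maleness, femaleness.
vectors = {
    "king":  [0.9, 0.9, 0.1],
    "queen": [0.9, 0.1, 0.9],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def analogy(a, b, c, vectors):
    """Return the word whose vector is closest to a - b + c, excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


result = analogy("king", "man", "woman", vectors)
print(result)
```

Excluding the three input words from the candidates is a common convention in analogy tests, since the nearest vector to the result is otherwise often one of the inputs.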
This approach has been hugely successful. Google uses it for automatic language translation by assuming that word sequences in different languages represented by similar vectors are equivalent in meaning, and so are translations of each other.
But this approach has a well-known shortcoming: it assumes that each word has a single meaning represented by a single vector. Not only is that often not the case, but verbal tests tend to focus on words with more than one meaning as a way of making questions harder.
Huazheng and pals tackle this by taking each word and looking for other words that often appear nearby in a large corpus of text. They then use an algorithm to see how these words are clustered. The final step is to look up the different meanings of a word in a dictionary and then to match the clusters to each meaning.
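The three steps described above (collect nearby words, cluster them, match clusters to dictionary meanings) can be sketched roughly as follows. This is not the researchers' algorithm: a simple greedy word-overlap grouping stands in for whatever clustering method they used, and the sentences, stopword list, sense labels and glosses are all invented for illustration.

```python
def context_words(sentence, target, stopwords):
    """Words that appear alongside `target` in a sentence, minus stopwords."""
    tokens = sentence.lower().split()
    return {t for t in tokens if t != target and t not in stopwords}


def greedy_cluster(contexts, min_overlap=1):
    """Put each context in the first cluster sharing enough words, else start a new one."""
    clusters = []
    for ctx in contexts:
        for cluster in clusters:
            if len(cluster & ctx) >= min_overlap:
                cluster |= ctx  # merge this context into the existing cluster
                break
        else:
            clusters.append(set(ctx))
    return clusters


def match_to_senses(clusters, glosses):
    """Label each cluster with the dictionary sense whose gloss shares the most words."""
    return [
        max(glosses, key=lambda sense: len(cluster & set(glosses[sense].split())))
        for cluster in clusters
    ]


# Toy data, invented for this example.
sentences = [
    "she sat on the river bank fishing",
    "the bank approved the loan",
    "fishing boats lined the river bank",
    "the bank raised its loan rates",
]
stopwords = {"the", "on", "she", "its", "a", "of"}
glosses = {  # hypothetical dictionary entries for "bank"
    "bank_river": "sloping land beside a river or lake",
    "bank_finance": "institution that accepts deposits and makes a loan",
}

contexts = [context_words(s, "bank", stopwords) for s in sentences]
clusters = greedy_cluster(contexts)
labels = match_to_senses(clusters, glosses)
print(list(zip(labels, clusters)))
```

Each labelled cluster could then be given its own vector, so that a word such as "bank" is represented once per meaning rather than by a single blended vector.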