An Inside Update on Natural Language Processing

This article is an interview with computational linguist Jason Baldridge. It’s a good read for data scientists, researchers, software developers, and professionals working in media, consumer insights, and market intelligence. It’s for anyone who’s interested in, or needs to know about, natural language processing (NLP).

Jason and NLP go way back. As a linguistics graduate student at the University of Edinburgh, in 2000, Jason co-created the OpenNLP text-processing framework, now part of Apache. He joined the University of Texas linguistics faculty in 2005 and, a few years back, helped build a text-analytics system for social-media agency Converseon. Jason’s Austin start-up, People Pattern, applies NLP and machine learning for social-audience insights; he co-founded the company in 2013 and serves as chief scientist. Finally, he’ll keynote on “Personality and the Science of Sharing” and teach a tutorial at the 2016 Sentiment Analysis Symposium.

In sum, Jason is an all-around cool guy, and he deserves special recognition for providing the most thorough Q&A responses I have ever received to an interview request. The interview? This one, covering AI, neural networks, computational linguistics, Java vs. Scala, and accuracy evaluation, with a detour into Portuguese-English translation challenges:

Seth Grimes> Let’s jump in the deep end. What’s the state of NLP, of natural language processing?

Jason Baldridge> There’s work to be done.

The first thing to keep in mind is that many of the most interesting NLP tasks are AI-complete. That means we are likely to need representations and architectures that recognize, capture, and learn knowledge about people and the world in order to exhibit human-level competence in these tasks. Do we need to represent word senses, predicate-argument relations, discourse models, etc.? Almost certainly. An optimistic deep learning person might say “the network will learn all that,” but I’m skeptical that a generic model structure will learn all these things from the data that is available to it.

Jason> No, they are a great set of tools and techniques that are providing large improvements for many tasks. But they aren’t magic, and they won’t suddenly solve every problem we throw at them out of the box. When it comes to language, the only device we know of that is fully competent at processing human language, the human brain, is the result of hundreds of millions of years of evolution. That process has afforded it a complex architecture that dwarfs the relatively puny networks that are used for language and vision tasks today.

Humans learn language from a surprisingly small amount of data, and they go through different phases in that process, moving from memorization to generalization (including overgeneralization, e.g., “Mommy goed to the store”). Having said that, I love the boldness and confidence of the neural optimists, but I think we will need to figure out the architectures and the reward mechanisms by which a very deep network processes, represents, stores, and generalizes information, and how all of that relates to language. That will imply choices about how lexicons are stored, how morphological and syntactic regularities are captured, and so on.

Seth> Is there academic computational-linguistics work that you’d call out as interesting, whether or not it has surfaced in NLP software tools?

Jason> The vectorization of words and phrases is one of the big overall trends these days, with those vectors used as the inputs for NLP tasks. The good part is that the vectors are learned from large, unlabeled corpora. This injects knowledge into supervised learning tasks that have much less data.

For example, “pope,” “catholic,” and “vatican” will have similar vectors, so training examples that contain just one of these words will still contribute toward better learning of shared parameters. Without this, a classifier based on bags-of-words sees these words as being as separate as “apple,” “hieroglyph,” and “bucket.”
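
To make the contrast concrete, here is a minimal Python sketch. It is an editorial illustration, not something from the interview: the three-dimensional "embeddings" are hand-picked numbers chosen to mimic what learned vectors might look like, and the names `one_hot`, `toy_embedding`, and `similarity_table` are hypothetical. It compares cosine similarity under a bag-of-words (one-hot) representation with similarity under dense vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


vocab = ["pope", "catholic", "vatican", "apple", "hieroglyph", "bucket"]

# Bag-of-words view: every word gets its own dimension (a one-hot vector),
# so any two distinct words are orthogonal.
one_hot = {
    word: [1.0 if i == j else 0.0 for j in range(len(vocab))]
    for i, word in enumerate(vocab)
}

# ASSUMPTION: hand-picked toy vectors for illustration only.
# Real word vectors are learned from large unlabeled corpora and
# typically have hundreds of dimensions.
toy_embedding = {
    "pope":       [0.9, 0.1, 0.0],
    "catholic":   [0.8, 0.2, 0.1],
    "vatican":    [0.9, 0.0, 0.1],
    "apple":      [0.0, 0.9, 0.1],
    "hieroglyph": [0.1, 0.0, 0.9],
    "bucket":     [0.0, 0.5, 0.5],
}


def similarity_table(vectors, anchor="pope"):
    """Cosine similarity of every other word to the anchor word."""
    return {
        word: round(cosine(vectors[anchor], vec), 2)
        for word, vec in vectors.items()
        if word != anchor
    }


one_hot_sims = similarity_table(one_hot)
embedding_sims = similarity_table(toy_embedding)

if __name__ == "__main__":
    print("one-hot:  ", one_hot_sims)
    print("embedding:", embedding_sims)
```

Under the one-hot representation, "pope" is exactly as dissimilar from "vatican" as it is from "bucket". With the toy dense vectors, the three religion-related words sit close together, which is the property that lets a training example mentioning only one of them inform the parameters shared by all three.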
