Diving Into Natural Language Processing

This is the third installment of a new series called Deep Learning Research Review. Every couple of weeks or so, I'll be summarizing and explaining research papers in specific subfields of Deep Learning. This week focuses on applying Deep Learning to Natural Language Processing. The last post was about reinforcement learning, and the post before that was on generative adversarial networks.

Natural Language Processing (NLP) is all about creating systems that process or “understand” language in order to perform certain tasks, such as question answering, sentiment analysis, and machine translation.

The traditional approach to NLP involved a lot of domain knowledge of linguistics itself. Understanding terms such as phonemes and morphemes was pretty standard, as there are whole classes in linguistics dedicated to their study. Let’s look at how traditional NLP would try to understand the word “uninterested.”

Let’s say our goal is to gather some information about this word (characterize its sentiment, find its definition, etc.). Using our domain knowledge of language, we can break the word into three parts: the prefix “un,” the stem “interest,” and the suffix “ed.”

We understand that the prefix “un” indicates an opposing or opposite idea and we know that “ed” can specify the time period (past tense) of the word. By recognizing the meaning of the stem word “interest,” we can easily deduce the definition and sentiment of the whole word. Seems pretty simple, right? However, when you consider all the different prefixes and suffixes in the English language, it would take a very skilled linguist to understand all the possible combinations and meanings.
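To make that concrete, here’s a minimal sketch of what such a rule-based decomposition might look like. The affix tables are illustrative assumptions, not a real morphological analyzer, and a real system would need far more rules:

```python
# A toy, rule-based sketch of the traditional approach: strip known affixes
# to recover the stem. The affix tables here are illustrative assumptions.
PREFIXES = {"un": "opposite or negation"}
SUFFIXES = {"ed": "past tense"}

def decompose(word):
    prefix = next((p for p in PREFIXES if word.startswith(p)), None)
    suffix = next((s for s in SUFFIXES if word.endswith(s)), None)
    start = len(prefix) if prefix else 0
    end = len(word) - len(suffix) if suffix else len(word)
    return prefix, word[start:end], suffix

print(decompose("uninterested"))  # ('un', 'interest', 'ed')
```

Every new prefix, suffix, and exception means another hand-written rule, which is exactly why this approach scales so poorly.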

Deep Learning, at its most basic level, is all about representation learning. With CNNs, we see the composition of different filters used to classify objects into categories. Here, we’re going to take a similar approach by creating representations of words from large datasets.

This post is structured so that we’ll first go through the basic building blocks of deep networks for NLP and then discuss some applications through recent research papers. It’s normal not to know exactly why we’re using RNNs or why an LSTM is helpful at first, but hopefully, by the end, you’ll have a better sense of why Deep Learning techniques have helped NLP so much.

Since Deep Learning loves math, we’re going to represent each word as a d-dimensional vector. Let’s use d = 6.
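As a quick illustration (a sketch using NumPy, which the post itself doesn’t prescribe), a word’s representation is then just six numbers:

```python
import numpy as np

# A placeholder 6-dimensional word vector; the values don't mean anything yet.
word_vector = np.zeros(6)
print(word_vector)  # [0. 0. 0. 0. 0. 0.]
```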

Now, let’s think about how to fill in the values. We want the values to be filled in such a way that the vector somehow represents the word and its context, meaning, or semantics. One method is to create a co-occurrence matrix. Let’s say that we have the following sentence: “I love NLP and I like dogs.”

From this sentence, we want to create a word vector for each unique word: “I,” “love,” “NLP,” “and,” “like,” and “dogs.”

A co-occurrence matrix is a matrix that contains the counts of each word appearing next to every other word in the corpus (or training set). Let’s visualize this matrix for our sentence:
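Assuming the sentence above and counting immediate neighbors, the matrix looks like this (rows and columns ordered by first appearance):

```
        I  love  NLP  and  like  dogs
I       0    1    0    1     1     0
love    1    0    1    0     0     0
NLP     0    1    0    1     0     0
and     1    0    1    0     0     0
like    1    0    0    0     0     1
dogs    0    0    0    0     1     0
```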

Extracting the rows from this matrix can give us a simple initialization of our word vectors:
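Here’s a minimal sketch of the whole pipeline, assuming a context window of one word on each side (NumPy and the variable names are my own choices, not from the post):

```python
import numpy as np

# Tokenize the toy corpus: our single example sentence.
tokens = "I love NLP and I like dogs".split()
vocab = sorted(set(tokens), key=tokens.index)  # unique words, in order of first appearance
index = {word: i for i, word in enumerate(vocab)}

# Count immediate left/right neighbors to build the co-occurrence matrix.
d = len(vocab)  # d = 6 for this sentence
counts = np.zeros((d, d), dtype=int)
for left, right in zip(tokens, tokens[1:]):
    counts[index[left], index[right]] += 1
    counts[index[right], index[left]] += 1

# Each row of the matrix is a simple word vector.
print(vocab)                  # ['I', 'love', 'NLP', 'and', 'like', 'dogs']
print(counts[index["love"]])  # [1 0 1 0 0 0]
```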

Notice that even through this simple matrix, we’re able to gain pretty useful insights. For example, the words “love” and “like” both contain 1s for their counts with nouns (NLP and dogs). They also have 1s for their counts with “I,” indicating that these words must be some sort of verb. With a larger dataset than just one sentence, you can imagine that this similarity will become clearer, as “like,” “love,” and other synonyms will begin to have similar word vectors because they are used in similar contexts.

Now, although this is a great starting point, we notice that the dimensionality of each word vector will increase linearly with the size of the corpus. If we had a million words (not really a lot by NLP standards), we’d have a million-by-million matrix, which would be extremely sparse (lots of 0s). Definitely not the best in terms of storage efficiency. There have been numerous advancements in finding more optimal ways to represent these word vectors.
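To put a rough number on the storage problem, here’s a sketch using SciPy’s compressed sparse row format (my choice for illustration; the post doesn’t name a library, and the vocabulary size and nonzero count are made-up assumptions):

```python
import numpy as np
from scipy.sparse import csr_matrix

vocab_size = 10_000  # a modest vocabulary by NLP standards

# A dense float32 co-occurrence matrix stores every zero explicitly.
dense_bytes = vocab_size * vocab_size * 4  # ~400 MB, almost all of it zeros

# A sparse format stores only the nonzero counts plus their indices.
rng = np.random.default_rng(0)
rows = rng.integers(0, vocab_size, size=1_000_000)
cols = rng.integers(0, vocab_size, size=1_000_000)
counts = csr_matrix(
    (np.ones(len(rows), dtype=np.float32), (rows, cols)),
    shape=(vocab_size, vocab_size),
)
sparse_bytes = counts.data.nbytes + counts.indices.nbytes + counts.indptr.nbytes
print(f"dense: {dense_bytes / 1e6:.0f} MB, sparse: {sparse_bytes / 1e6:.0f} MB")
```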
