Diving Into Natural Language Processing

This is the third installment of a new series called Deep Learning Research Review. Every couple of weeks or so, I'll be summarizing and explaining research papers in specific subfields of Deep Learning. This week focuses on applying Deep Learning to Natural Language Processing. The last post was about reinforcement learning, and the post before that was on generative adversarial networks.

Natural Language Processing (NLP) is all about creating systems that process or “understand” language in order to perform certain tasks, such as question answering, sentiment analysis, and machine translation.

The traditional approach to NLP involved a lot of domain knowledge of linguistics itself. Understanding terms such as phonemes and morphemes was pretty standard, as there are whole classes in linguistics dedicated to their study. Let’s look at how traditional NLP would try to understand the word “uninterested.”

Let’s say our goal is to gather some information about this word (characterize its sentiment, find its definition, etc.). Using our domain knowledge of language, we can break the word into three parts: the prefix “un,” the stem “interest,” and the suffix “ed.”

We understand that the prefix “un” indicates an opposing or opposite idea and we know that “ed” can specify the time period (past tense) of the word. By recognizing the meaning of the stem word “interest,” we can easily deduce the definition and sentiment of the whole word. Seems pretty simple, right? However, when you consider all the different prefixes and suffixes in the English language, it would take a very skilled linguist to understand all the possible combinations and meanings.
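To make that concrete, here’s a minimal sketch of what such a rule-based decomposition might look like. The affix tables are illustrative assumptions, not a real morphological analyzer, and a real system would need far more rules:

```python
# A toy, rule-based sketch of the traditional approach: strip known affixes
# to recover the stem. The affix tables here are illustrative assumptions.
PREFIXES = {"un": "opposite or negation"}
SUFFIXES = {"ed": "past tense"}

def decompose(word):
    prefix = next((p for p in PREFIXES if word.startswith(p)), None)
    suffix = next((s for s in SUFFIXES if word.endswith(s)), None)
    start = len(prefix) if prefix else 0
    end = len(word) - len(suffix) if suffix else len(word)
    return prefix, word[start:end], suffix

print(decompose("uninterested"))  # ('un', 'interest', 'ed')
```

Every new prefix, suffix, and exception means another hand-written rule, which is exactly why this approach scales so poorly.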

Deep Learning, at its most basic level, is all about representation learning. With CNNs, we see the composition of different filters used to classify objects into categories. Here, we’re going to take a similar approach by creating representations of words from large datasets.

This post is structured so that we’ll first go through the basic building blocks of deep networks for NLP and then discuss some applications through recent research papers. It’s normal not to know exactly why we’re using RNNs or why an LSTM is helpful at first, but hopefully, by the end, you’ll have a better sense of why Deep Learning techniques have helped NLP so much.

Since Deep Learning loves math, we’re going to represent each word as a d-dimensional vector. Let’s use d = 6.
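As a quick illustration (a sketch using NumPy, which the post itself doesn’t prescribe), a word’s representation is then just six numbers:

```python
import numpy as np

# A placeholder 6-dimensional word vector; the values don't mean anything yet.
word_vector = np.zeros(6)
print(word_vector)  # [0. 0. 0. 0. 0. 0.]
```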

Now, let’s think about how to fill in the values. We want the values to be filled in such a way that the vector somehow represents the word and its context, meaning, or semantics. One method is to create a co-occurrence matrix. Let’s say that we have the following sentence: “I love NLP and I like dogs.”

From this sentence, we want to create a word vector for each unique word: “I,” “love,” “NLP,” “and,” “like,” and “dogs.”

A co-occurrence matrix is a matrix that contains the counts of each word appearing next to every other word in the corpus (or training set). Let’s visualize this matrix for our sentence:
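Assuming the sentence above and counting immediate neighbors, the matrix looks like this (rows and columns ordered by first appearance):

```
        I  love  NLP  and  like  dogs
I       0    1    0    1     1     0
love    1    0    1    0     0     0
NLP     0    1    0    1     0     0
and     1    0    1    0     0     0
like    1    0    0    0     0     1
dogs    0    0    0    0     1     0
```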

Extracting the rows from this matrix can give us a simple initialization of our word vectors:
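Here’s a minimal sketch of the whole pipeline, assuming a context window of one word on each side (NumPy and the variable names are my own choices, not from the post):

```python
import numpy as np

# Tokenize the toy corpus: our single example sentence.
tokens = "I love NLP and I like dogs".split()
vocab = sorted(set(tokens), key=tokens.index)  # unique words, in order of first appearance
index = {word: i for i, word in enumerate(vocab)}

# Count immediate left/right neighbors to build the co-occurrence matrix.
d = len(vocab)  # d = 6 for this sentence
counts = np.zeros((d, d), dtype=int)
for left, right in zip(tokens, tokens[1:]):
    counts[index[left], index[right]] += 1
    counts[index[right], index[left]] += 1

# Each row of the matrix is a simple word vector.
print(vocab)                  # ['I', 'love', 'NLP', 'and', 'like', 'dogs']
print(counts[index["love"]])  # [1 0 1 0 0 0]
```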

Notice that even through this simple matrix, we’re able to gain pretty useful insights. For example, the words “love” and “like” both contain 1s for their counts with nouns (NLP and dogs). They also have 1s for their counts with “I,” indicating that these words must be some sort of verb. With a larger dataset than just one sentence, you can imagine that this similarity will become clearer, as “like,” “love,” and other synonyms will begin to have similar word vectors because they are used in similar contexts.

Now, although this is a great starting point, we notice that the dimensionality of each word vector will increase linearly with the size of the corpus. If we had a million words (not really a lot by NLP standards), we’d have a million-by-million matrix, which would be extremely sparse (lots of 0s). Definitely not the best in terms of storage efficiency. There have been numerous advancements in finding more optimal ways to represent these word vectors.
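To put a rough number on the storage problem, here’s a sketch using SciPy’s compressed sparse row format (my choice for illustration; the post doesn’t name a library, and the vocabulary size and nonzero count are made-up assumptions):

```python
import numpy as np
from scipy.sparse import csr_matrix

vocab_size = 10_000  # a modest vocabulary by NLP standards

# A dense float32 co-occurrence matrix stores every zero explicitly.
dense_bytes = vocab_size * vocab_size * 4  # ~400 MB, almost all of it zeros

# A sparse format stores only the nonzero counts plus their indices.
rng = np.random.default_rng(0)
rows = rng.integers(0, vocab_size, size=1_000_000)
cols = rng.integers(0, vocab_size, size=1_000_000)
counts = csr_matrix(
    (np.ones(len(rows), dtype=np.float32), (rows, cols)),
    shape=(vocab_size, vocab_size),
)
sparse_bytes = counts.data.nbytes + counts.indices.nbytes + counts.indptr.nbytes
print(f"dense: {dense_bytes / 1e6:.0f} MB, sparse: {sparse_bytes / 1e6:.0f} MB")
```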
