Low-Budget Natural Language Processing

If you’ve ever talked to Siri or performed a Google search using colloquial language and gotten the right answer, you probably had that magical feeling of being understood by a machine. The discipline that studies the interactions between human languages and computers is called natural language processing (NLP), and it’s a very active field. Companies and computer scientists are developing amazing techniques for helping machines understand language better, but adding these features to our sites and apps can be very complex. Even great, free resources aren’t useful if you don’t have the time or skills to use them.

The good news is that we can take advantage of our human ability to analyze natural language and use really simple techniques to assist and amaze our users. I’ll explore a couple of ways to use these techniques in your own projects. These examples use web technologies but can be translated to other platforms and systems easily.

One of the goals of the Coral Project team while building Ask, a web product that enables news organizations to ask questions of their readers, was to build the form generation side of the project as an API.

One of the benefits of an API is that it allows developers to create their own integrations and user interfaces for creating and editing the forms. To showcase some possibilities, I built an alternative form creator targeting journalists and news devs who were setting up Ask for the first time.

When creating a form, it’s important to select the most appropriate UI input for each question. This helps the user understand what information to provide, and it helps us understand the data. Since every question in the questionnaire needs a title, I thought it was the perfect scenario for applying a silly but effective NLP technique. The idea is simple: as the user types a question’s title, analyze the text and automatically select the input type that best fits it, while still letting the user change it if the guess is wrong.

I used Preact to build this website (source code), just because I like over-engineering my experiments, but the same behavior is easy to implement with jQuery.
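The wiring logic itself is small, and the sketch below shows it without any framework. It makes two assumptions. First, a hypothetical `suggest` function maps a title to a question type, or returns `null` when it has no opinion. Second, the type names are hypothetical too. The watcher applies each new suggestion while the type is still automatic. It stops as soon as the user picks a type by hand, so the script never fights the user. In a browser, you could call `onTitleChange` from the title field's `input` handler (for example, jQuery's `.on('input', ...)`) and `onTypeChosen` from the type selector's `change` handler.

```typescript
// Hypothetical question types; a real form builder will define its own.
type QuestionType = "text" | "number" | "email" | "date" | "yes-no";

// Tracks the selected question type for one question while its title is edited.
class TitleWatcher {
  private readonly suggest: (title: string) => QuestionType | null;
  private readonly defaultType: QuestionType;
  private current: QuestionType;
  private userOverride = false;

  constructor(
    suggest: (title: string) => QuestionType | null,
    defaultType: QuestionType = "text",
  ) {
    this.suggest = suggest;
    this.defaultType = defaultType;
    this.current = defaultType;
  }

  // Call on every change to the title field; returns the type to display.
  onTitleChange(title: string): QuestionType {
    if (!this.userOverride) {
      // No matching rule means we fall back to the default type.
      this.current = this.suggest(title) ?? this.defaultType;
    }
    return this.current;
  }

  // Call when the user picks a type manually; automatic suggestions stop after this.
  onTypeChosen(type: QuestionType): void {
    this.userOverride = true;
    this.current = type;
  }

  get type(): QuestionType {
    return this.current;
  }
}

// Usage with a one-rule stand-in for the real algorithm.
const watcher = new TitleWatcher((t) => (/^\s*how many\b/i.test(t) ? "number" : null));
watcher.onTitleChange("How many pets do you have?"); // "number"
watcher.onTypeChosen("text");
watcher.onTitleChange("How many cats do you have?"); // stays "text"
```

Respecting the manual override is the key design choice: a suggestion that silently undoes the user's own selection would be worse than no suggestion at all.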

Of course, this part is easy because it leaves out the hardest piece: the algorithm.

If you want a real challenge, try this before looking at the finished algorithm: start creating a form yourself, go to the first question of the form you’re working on, and see if you can figure out what the algorithm might look like. Once you’ve given that a try, check out how I implemented it.

That’s it. That’s the way I modeled the English language for my use case. Even if you don’t know what a regular expression is, you can get an idea of how to implement your own model.
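To make the idea concrete, here is a minimal sketch of a rule-based classifier in the same spirit: an ordered list of regular expressions, each mapped to a question type. The first match wins, and `null` means "no opinion, keep the default". The type names and patterns are illustrative assumptions for English titles, not a reproduction of the original implementation.

```typescript
// Hypothetical question types; adapt to whatever your form builder supports.
type QuestionType = "email" | "date" | "number" | "yes-no" | "text";

// Ordered rules: the first pattern that matches the title wins.
// No `g` flag: with it, RegExp.prototype.test would keep state in lastIndex between calls.
// These patterns are illustrative assumptions, not an exhaustive model of English.
const RULES: ReadonlyArray<[RegExp, QuestionType]> = [
  [/\be-?mail\b/i, "email"],
  [/^\s*(when|what date)\b|\b(birthday|date of birth)\b/i, "date"],
  [/^\s*how (many|much|old)\b/i, "number"],
  [/^\s*(do|does|did|are|is|have|has|will|would|can|could)\b/i, "yes-no"],
];

// Returns a suggested type, or null when no rule matches (keep the default).
function suggestQuestionType(title: string): QuestionType | null {
  for (const [pattern, type] of RULES) {
    if (pattern.test(title)) {
      return type;
    }
  }
  return null;
}

console.log(suggestQuestionType("How many children do you have?")); // prints: number
```

Rule order encodes priority, so put the most specific patterns first. Anchoring patterns with `^` keeps short words like "do" or "is" from firing in the middle of a title.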

Does this algorithm cover every possibility? No. Will it work in every case? No. But this function runs in microseconds in our users’ browsers (I actually measured it), it’s really simple to implement, and it helps most users choose the right question type, saving time on form creation.
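A rough micro-benchmark is enough to check the "runs in microseconds" claim against your own rules. This is a minimal sketch, and it makes three assumptions. First, the runtime provides the global `performance.now()`, as modern browsers and Node.js 16+ do. Second, `classify` is a hypothetical one-rule stand-in for your real function. Third, the sample titles are made up. Absolute numbers will vary with the engine, the machine, and JIT warm-up.

```typescript
// Returns the average time per call, in microseconds, over `rounds` passes through `inputs`.
function averageMicroseconds(
  fn: (input: string) => unknown,
  inputs: readonly string[],
  rounds = 1000,
): number {
  if (inputs.length === 0 || rounds <= 0) {
    throw new RangeError("need at least one input and one round");
  }
  // Warm-up pass so the JIT has a chance to optimize `fn` before timing starts.
  for (const input of inputs) fn(input);

  const start = performance.now(); // milliseconds, with sub-millisecond precision
  for (let r = 0; r < rounds; r++) {
    for (const input of inputs) fn(input);
  }
  const elapsedMs = performance.now() - start;
  return (elapsedMs * 1000) / (rounds * inputs.length);
}

// Hypothetical stand-in for a regex-based classifier.
const classify = (title: string) => (/^\s*how (many|much)\b/i.test(title) ? "number" : null);

const titles = ["How many pets do you have?", "What is your name?", "Do you vote?"];
console.log(`${averageMicroseconds(classify, titles).toFixed(3)} µs per call`);
```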

Once you have your script working, you may want to know its “success rate”, which in this case could be something like: “In what percentage of the cases where the model chose a question type other than the default did the user keep that choice?”
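The success rate defined above can be computed from simple per-question records. This is a minimal sketch that assumes a hypothetical record shape. Each record stores the default type, the model's suggestion (equal to the default when no rule fired), and the type the question had when the form was saved.

```typescript
// Hypothetical record logged once per question when the form is saved.
interface SuggestionRecord {
  defaultType: string;
  suggestedType: string; // what the model chose; equals defaultType when no rule matched
  finalType: string;     // the question's type when the form was saved
}

// Of the questions where the model moved away from the default,
// the fraction whose suggested type the user kept. Null if the model never intervened.
function successRate(records: readonly SuggestionRecord[]): number | null {
  const intervened = records.filter((r) => r.suggestedType !== r.defaultType);
  if (intervened.length === 0) return null;
  const kept = intervened.filter((r) => r.finalType === r.suggestedType);
  return kept.length / intervened.length;
}

const sample: SuggestionRecord[] = [
  { defaultType: "text", suggestedType: "number", finalType: "number" }, // kept
  { defaultType: "text", suggestedType: "date", finalType: "text" },     // user changed it
  { defaultType: "text", suggestedType: "text", finalType: "email" },    // no suggestion: excluded
];
console.log(successRate(sample)); // 0.5
```

Questions where no rule fired are left out of the denominator, because the model made no choice there to succeed or fail at.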

How can you store all of these events? The easy way: if you already use an analytics solution, you probably get event tracking for free. I usually send this event to Google Analytics, where it’s easy to aggregate the results and compute the success rate. After all, this success rate measures how your users behave on the site.
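One way to structure this is to emit one event per saved question and let the analytics tool do the aggregation. This is a minimal sketch built around a hypothetical `send` callback that forwards an event to whatever analytics library you use. With Google Analytics' analytics.js, that would be a `ga('send', 'event', category, action, label)` call. The category and action names are made up for illustration.

```typescript
// Hypothetical event shape, mirroring analytics.js's category/action/label fields.
interface AnalyticsEvent {
  category: string;
  action: string;
  label: string;
}

type Send = (event: AnalyticsEvent) => void;

// Reports what happened to one question's suggested type when the form was saved.
function reportSuggestion(
  send: Send,
  defaultType: string,
  suggestedType: string,
  finalType: string,
): void {
  if (suggestedType === defaultType) return; // the model made no choice; nothing to measure
  send({
    category: "question-type-suggestion",
    action: finalType === suggestedType ? "kept" : "changed",
    label: `${suggestedType}->${finalType}`,
  });
}

// Usage with console.log standing in for the analytics library.
reportSuggestion(console.log, "text", "number", "number");
```

The success rate is then the count of `kept` events divided by the total number of events in that category. The label also shows which rules users most often override.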

You can always improve your model by adding, modifying, or removing rules. The good thing is that if your rules don’t match the user’s input, you simply haven’t helped that user: they get a normal form, and the app still works as intended. The only thing that can really hurt is a false positive, where a rule picks the wrong question type.
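Misses are harmless and false positives are not, so it helps to check every rule change against a small set of hand-labeled titles and count the two kinds of error separately. This is a minimal sketch with three assumptions. First, a hypothetical `classify` function returns a type or `null` when no rule matched. Second, `null` means the form keeps a default type. Third, the example titles are invented for illustration.

```typescript
type QuestionType = "text" | "number" | "email" | "date" | "yes-no";

interface LabeledTitle {
  title: string;
  expected: QuestionType;
}

interface RuleReport {
  correct: number;
  misses: string[];         // no rule fired and the default is wrong: harmless, user picks by hand
  falsePositives: string[]; // a rule fired with the wrong type: this is what hurts
}

function evaluateRules(
  classify: (title: string) => QuestionType | null,
  examples: readonly LabeledTitle[],
  defaultType: QuestionType = "text",
): RuleReport {
  const report: RuleReport = { correct: 0, misses: [], falsePositives: [] };
  for (const { title, expected } of examples) {
    const got = classify(title);
    if (got === null) {
      // No rule fired: the form falls back to the default type.
      if (expected === defaultType) report.correct++;
      else report.misses.push(title);
    } else if (got === expected) {
      report.correct++;
    } else {
      report.falsePositives.push(title);
    }
  }
  return report;
}

// Hypothetical two-rule classifier used only for this example.
const classify = (t: string): QuestionType | null =>
  /\bemail\b/i.test(t) ? "email" : /^\s*how many\b/i.test(t) ? "number" : null;

const report = evaluateRules(classify, [
  { title: "How many pets do you have?", expected: "number" },
  { title: "What is your email address?", expected: "email" },
  { title: "Is your email public?", expected: "yes-no" },
  { title: "What is your name?", expected: "text" },
  { title: "When did you move here?", expected: "date" },
]);
console.log(report);
```

Here the email rule fires on "Is your email public?", which is really a yes/no question. That false positive is worth fixing before shipping. The missed date question only means the user picks the type by hand.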

 


