The field of computer-based natural language processing and analytics first emerged in the 1950s. Today, it drives much of the automation in the mobile and computer applications that we use every day. And while natural language processing has improved dramatically through the years, it is still an evolving science.
Most of us have only to look as far as our word processors and mobile apps. Their built-in algorithms and learning processes help us countless times by interpreting spelling and vocabulary, but they can also interpret words incorrectly. (Example: I am writing this on a Mac, and my language interpreter just interpreted "as far" in the first sentence of this paragraph as "Safari," which is the Mac browser.)
We can work around these natural language processing limitations in big data applications, but the stakes get higher when, for example, algorithms and queries are run against big data in pharmaceutical analytics and come up against the ambiguities of human language.
One case concerning an online healthcare website was documented in a 2014 New York Times article. The goal of the website was to give consumers information about drug side effects and interactions. The website drew on data in many formats, culled from many sources and aggregated into a big data repository that internally developed analytics algorithms would then probe. Unfortunately, the same drug's side effects were described in different ways in different data sources (e.g., drowsiness, somnolence, sleepiness). These language ambiguities caused complications that compromised the algorithms' effectiveness and, ultimately, their accuracy.
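To make the synonym problem concrete, here is a minimal Python sketch of one common mitigation: mapping each variant term to a single canonical term before counting. This is an illustration only, not the method the website used. The `SYNONYMS` table, the `normalize` and `count_side_effects` functions, and the sample reports are all hypothetical, and a real system would likely rely on a curated medical vocabulary rather than a hand-written dictionary.

```python
from collections import Counter

# Hypothetical synonym table: every variant maps to one canonical term.
# The choice of "somnolence" as the canonical form is an assumption.
SYNONYMS = {
    "drowsiness": "somnolence",
    "sleepiness": "somnolence",
    "somnolence": "somnolence",
}


def normalize(term: str) -> str:
    """Return the canonical form of a side-effect term.

    Terms not found in the table are returned lowercased and trimmed.
    """
    key = term.strip().lower()
    return SYNONYMS.get(key, key)


def count_side_effects(reports):
    """Count side-effect mentions after normalizing each term."""
    return Counter(normalize(term) for term in reports)


# Made-up sample data standing in for terms pulled from different sources.
reports = ["Drowsiness", "somnolence", "Sleepiness ", "nausea"]
counts = count_side_effects(reports)
print(counts)
```

Without the normalization step, the three sleep-related terms would be counted as three separate side effects, which is the kind of ambiguity that undermined the analytics in the case above.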