The commercial Internet has now been around for twenty-some years, and the overall experience hasn’t changed much since the days of “You’ve Got Mail.”
The Internet started out as a research tool shared among government, universities and corporations. With the advent of hyperlinks, the Internet has been transformed into a commercial vehicle for the sale of goods and services.
As a research tool, the Internet of today is pathetic and has taken on a bias toward consumerism. Take this example: “show me all printers that use HP 950 Ink cartridges.” The expectation is a list of all printers that use the HP 950 ink cartridges. Instead you receive over 500,000 hits on Google and over a million on Bing, mostly links to sites selling printer ink cartridges. Yes, you do get a list of printers; however, this list is neither exhaustive nor exclusive to the printers that use the 950 ink cartridges.
Will artificial intelligence (AI) make the Internet smarter? Probably someday, but don’t look for it in the foreseeable future. Why? Because very little knowledge is captured today in a form that AI machines can directly ingest. This is where machine learning comes in, using the latest breakthroughs in neural network design that provide unsupervised machine learning capabilities. Unsupervised machine learning methods require feeding in large amounts of varied data on the features of a subject matter.
As an example, say you want to sell sweaters on the Internet and you want to use an AI machine to help increase your sales. The first thing you need to do is teach your AI machine about sweaters. You meticulously feed in all kinds of published features of sweaters from all of the various sources, including fashion magazines, top retail product catalogs, bloggers, etc. All these sources feed into your AI machine, which learns about the different sizes (bust, waist, hip) and styles that make up a sweater, including pullovers, v-necks, cardigans, turtlenecks, vests and full length skirts. Also, don’t forget the patterns, colors and materials such as wool, cashmere, cotton and synthetics. All of these are known as “empirical based” features, and they are well documented across the Internet.
What about features that could be considered inferred, or “soft features,” that are temporal or spatial in nature? Sweaters are garments associated with seasons or climates that are mostly cold. Would you sell sweaters in the middle of June and July? Absolutely: think of all the people who live in the southern hemisphere!
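The difference between published features and inferred ones can be made concrete with a small sketch. Everything here is an illustrative assumption rather than anything from a real system: the `Sweater` record, the `is_cold_season` helper, and especially the month ranges, which are a crude stand-in for actual climate data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sweater:
    # Published features: the kind listed in catalogs and magazines.
    style: str
    material: str
    color: str


def is_cold_season(month: int, hemisphere: str) -> bool:
    """Inferred "soft" feature: a rough guess at sweater weather.

    Assumption: cold months are November through February in the north
    and May through August in the south. Real climates vary far more.
    """
    northern_cold = {11, 12, 1, 2}
    southern_cold = {5, 6, 7, 8}
    if hemisphere == "north":
        return month in northern_cold
    if hemisphere == "south":
        return month in southern_cold
    raise ValueError(f"unknown hemisphere: {hemisphere!r}")


cardigan = Sweater(style="cardigan", material="wool", color="gray")

# June is a poor month for sweaters in the north, a good one in the south.
print(is_cold_season(6, "north"))  # False
print(is_cold_season(6, "south"))  # True
```

The published features sit in the record itself, while the seasonal rule depends on where and when the shopper is, which is exactly the kind of knowledge that is rarely written down in one place.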
Given that you will spend a tremendous amount of time and effort developing sufficiently large and diverse data sets to train your AI machine, what level of correctness do you need in order to have a productive AI machine? The data published on the Internet would probably be necessary and sufficient to answer 90 percent of the inquiries on sweaters.
After all, by Google’s own admission, its open source Natural Language Understanding (NLU) system, SyntaxNet, has an accuracy rate of just over 90 percent. That is a great accomplishment for natural language understanding.