Every once in a while I run into a little company that comes at an existing market as if the folks already in it are idiots -- and sometimes they are right. Here's the thing: What often happens is a company breaks out in a segment, and everyone groups around that company's ideas and emulates them.
Few initially stand up and say, "Wait a minute -- what if they're wrong?" Often we get so excited about building the market, we don't realize until much later that the initial attempt at solving a problem doesn't work. With big data and analytics, the typical project failure rate, depending on how you categorize the projects, can be as high as a whopping 80 percent.
The company I met with is Pneuron, and its approach is very different and, I think, way better -- at least when it comes to acquiring the data.
I'll wrap up with my product of the week: YouMail, an interesting free voicemail service that could solve our nasty robocall problem.
A few years back, I attended a talk by Harper Reed, the CTO of President Obama's reelection campaign. Most of the talk focused on how his team used analytics effectively to win, but in one part that stuck with me, he touched on why big data was stupid.
Reed argued -- and I agreed -- that the focus on big data caused people to do stupid things, like thinking up ways to aggregate and collect lots of data elements into mammoth repositories that were then too big and complex to analyze. He argued compellingly that the focus shouldn't be on collecting massive amounts of data, because that just creates a bigger potential problem. Rather, it should be on analyzing the data you have to obtain the answers to critical questions.
I think this explains why so many big data projects fail. Too much time and money is spent collecting massive amounts of data from systems that were never designed to talk to each other. Consequently, by the time the giant repositories are built, the data is out of date, corrupted, and damn near impossible to analyze.
But what if you didn't collect the data in the first place?
What if you left the data where it was, analyzed it in place, and then aggregated the analysis? In other words, rather than aggregating the data, you would aggregate the information you needed from it.
Taking that approach, you don't create huge redundant repositories, you don't experience the massive lag of having to move and translate data repositories, and you don't have the potential data corruption problems. What you do have is a project that costs a fraction of what it otherwise would.
If done right, you'd end up with a higher probability of being both more accurate and more timely. Your hardware costs would be dramatically lower, and because you were solving the problem in components, you'd actually be able to start getting value before the project's conclusion.
With each additional repository, the solution would get smarter, as it would be able to answer questions not only from the new repositories, but also from those previously provided. That is basically what Pneuron does.
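To make the idea concrete, here is a minimal Python sketch of analyzing data where it lives and aggregating only the results. It is an illustration under stated assumptions, not Pneuron's actual implementation: the names Summary, analyze_in_place, and merge, along with the sample order values, are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Summary:
    """Compact result of analyzing one repository where it lives."""
    count: int
    total: float

    @property
    def mean(self) -> float:
        # Guard against an empty repository.
        return self.total / self.count if self.count else 0.0


def analyze_in_place(values: Iterable[float]) -> Summary:
    """Runs next to the data; only the small Summary leaves the system."""
    count = 0
    total = 0.0
    for value in values:
        count += 1
        total += value
    return Summary(count, total)


def merge(summaries: Iterable[Summary]) -> Summary:
    """Aggregates the analyses rather than the raw records."""
    count = 0
    total = 0.0
    for summary in summaries:
        count += summary.count
        total += summary.total
    return Summary(count, total)


# Hypothetical repositories that were never designed to talk to each other.
crm_orders = [120.0, 80.0, 100.0]
erp_orders = [200.0, 50.0]

partials = [analyze_in_place(crm_orders), analyze_in_place(erp_orders)]
combined = merge(partials)
print(combined.count, combined.mean)
```

Because merge accepts summaries it has already produced, bringing in another repository later only means analyzing that one source and merging its summary with the existing result. Nothing collected earlier has to be moved or rebuilt.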
You'd have to be careful that bias didn't enter the process through the intermediate analysis, but the risk should be far lower than what naturally would occur when you slammed together data elements that came from very different systems and were likely of very different ages.
You massively simplify the problem you are trying to solve, and you are better set up for mergers and acquisitions.
An interesting side benefit to this approach is that typically when two companies merge, getting the systems to talk to each other is a nightmare.