Big data isn't new. We had fairly sophisticated data infrastructure long before Hadoop, Spark, and the like came into being. No, the big difference in big data is that all this fantastic data infrastructure is open source software running on commodity servers.
Over a decade ago, entrepreneur Joe Kraus declared that "There's never been a better time to be an entrepreneur because it's never been cheaper to be one," and he was right, though he couldn't have foreseen how much so. Though Kraus extolled the virtues of Linux, Tomcat, Apache HTTP server, and MySQL, today's startups have access to a dazzling array of the best big data infrastructure that money doesn't need to buy.
In this way, startups are able to put a target on the backs of much better-funded enterprise rivals.
Take Bidtellect, for example, an adtech startup. The Bidtellect platform helps advertisers, agencies, and media companies deliver targeted native ads across all devices, in any format. In practice, this means that Bidtellect must track and analyze the potential inventory of ad placements—which number in the millions daily—to see how each is affected by numerous variables. Once ads start running, it's essential to track their performance against client KPIs.
As Jeremy Kayne, Bidtellect's CTO, told me in an interview, Bidtellect is engaged in "a kind of arbitrage," whereby the company buys inventory on a per-impression (per-display) basis, but then sells ads on a per-click basis. To build a viable business rather than a candidate for bankruptcy protection, "it's essential that we're able to predict how many clicks an ad will generate on a given site, on a certain device type, at a certain time of day, and across scores of other variables, so we can price it right and make a fair profit."
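To make the arithmetic behind that arbitrage concrete, here is a minimal sketch in Python. It is not Bidtellect's pricing model: the function names, the cost per thousand impressions, the click-through rates, and the per-click price are all hypothetical values chosen for illustration. It only shows why the predicted click-through rate decides whether a placement makes or loses money.

```python
def break_even_cpc(cpm_cost: float, predicted_ctr: float) -> float:
    """Lowest per-click price that covers the cost of the impressions bought.

    cpm_cost      -- hypothetical cost per 1,000 impressions, in dollars
    predicted_ctr -- predicted click-through rate, as a fraction in (0, 1]
    """
    if not 0 < predicted_ctr <= 1:
        raise ValueError("predicted_ctr must be in (0, 1]")
    cost_per_impression = cpm_cost / 1000
    # Each click "uses up" 1 / predicted_ctr impressions on average.
    return cost_per_impression / predicted_ctr


def expected_profit(impressions: int, cpm_cost: float,
                    ctr: float, price_per_click: float) -> float:
    """Expected revenue from clicks minus what was paid for impressions."""
    cost = impressions * cpm_cost / 1000
    revenue = impressions * ctr * price_per_click
    return revenue - cost


if __name__ == "__main__":
    # Illustrative numbers only: $2.00 CPM, clicks sold at $0.50 each.
    print(f"break-even CPC at 0.5% CTR: ${break_even_cpc(2.00, 0.005):.2f}")
    print(f"profit at 0.5% CTR: ${expected_profit(1_000_000, 2.00, 0.005, 0.50):.2f}")
    print(f"profit at 0.3% CTR: ${expected_profit(1_000_000, 2.00, 0.003, 0.50):.2f}")
```

With these made-up inputs, a million impressions cost $2,000. At a 0.5% click-through rate the clicks bring in $2,500, but if the real rate turns out to be 0.3% the same placement brings in only $1,500. A small error in the click prediction is the difference between a profit and a loss, which is why the prediction has to hold up across site, device, and time of day.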
This is where big data comes in.
"To accurately make these predictions, identify viable advertising opportunities, and negotiate workable rates and pricing, we had to find a practical way to collect, manage, and understand the billions of transactions and data points involved," Kayne said.
The data that this system collects and tracks already amounts to petabytes. That is big, but it's about to get bigger.