Analytical databases are an increasingly critical part of businesses’ big data infrastructure. Specifically designed to offer performance and scalability advantages over conventional relational databases, analytical databases enable business users as well as data analysts and data scientists to easily extract meaning from large and complex data stores.
But to wring the most knowledge and meaning from the data your business is collecting every minute, if not every second, it’s important to keep some best practices in mind when you deploy your big data analytical database. Leading businesses that have deployed such databases point to five pitfalls you should avoid to stay on track as your big data initiatives mature.
Business users, analysts, and data scientists are very different people, says Chris Bohn, “CB,” a senior database engineer with Etsy, a marketplace where millions of people around the world connect, both online and offline, to make, sell, and buy unique goods. For the most part, data scientists are going to be comfortable working with Hadoop, MapReduce, Scalding, and Spark, whereas data analysts live in an SQL world. “If you put tools in place that your users don’t have experience with, they won’t use those tools. It’s that simple,” says Bohn.
Etsy made sure to consider its end users before choosing an analytical database, and those end users, it turned out, were mainly analysts. So Etsy picked a database that uses the same SQL as PostgreSQL, which gave end users a familiar environment and increased their productivity.
Big data has generated a lot of interest lately. CEOs are reading about it in the business press and expressing their desire to leverage enterprise data to do everything from customizing product offerings, to improving worker productivity, to ensuring better product quality. But too many companies begin their big data journeys with big budgets and even bigger expectations. They attempt to tackle too much. Then, 18 months down the road, they have very little to show.
It’s more realistic to think small. Focus on one particular business problem—preferably one with high visibility—that could be solved by leveraging data more effectively. Address that problem with basic data analytics tools—even Excel can work. Create a hypothesis and perform an exercise that analyzes the data to test that hypothesis. Even if you get a different result than you expected, you’ve learned something. Rinse and repeat. Do more and more projects using that methodology “and you’ll find you’ll never stop—the use cases will keep coming,” affirms HPE’s Colin Mahony, senior vice president and general manager for HPE Software Big Data.
Larry Lancaster, the former chief data scientist at a company offering hardware and software solutions for data storage and backup, agrees. “Just find a problem your business is having,” advises Lancaster. “Look for a hot button. Instead of hiring a new executive to solve that problem, hire a data scientist.”
Virtually all big data veterans warn about unanticipated data volumes.