Data now means more and does more within the enterprise than ever before. From mitigating financial risk by detecting fraud to creating recommendation engines and optimising the customer experience, data is helping companies solve increasingly complex problems.
What, then, have we learned over the past few years as data has moved to the forefront of organisations? With options ranging from proprietary software to cloud-based software and open source tools, today’s developers, architects, and IT professionals have many choices when it comes to large-scale analytics projects. Some require an expensive up-front investment. Others require many resources. And then there are tools that hit the sweet spot: they’re easy to implement and provide extensive features to prototype at scale.
Finding tools that increase project success and help you avoid common pitfalls is key. Here are five tips for selecting the right products for your large-scale analytics projects.
The biggest mistake companies make when embarking on an analytics project is to go too big, too soon. Often, especially when projects are driven from the top down, the temptation is to start by building a complex solution with no clearly defined outcome. This results in expensive and time-intensive projects.
Instead, start small and focus on quick, early ‘wins’ to build confidence with end users. Leverage modern open source technologies that don’t require large, up-front financial commitments and that enable your developers to get started quickly. A desired outcome is an application or prototype built in days or weeks.
Even though you may only be building a prototype, it is critical that you test for scalability as early as possible. Many projects fail because the application wasn’t built or tested with scalability in mind, or because the technologies selected were not designed to handle large data volumes.
Make sure performance testing is not an afterthought. Model out how much data you think you’ll be capturing over time. Test it, reference it, and build the right architecture to enable horizontal scaling with zero performance degradation as data volumes grow.