Five often-overlooked Hadoop, Big Data analytics project killers

When you’re getting ready to perform analytics on a data set, attention often gets focused on the software you’re going to use to analyze the data and create your reports. Companies often think about how they are going to store data and build visualizations for one project and one instance. However, to truly achieve maturity in your big data analytics projects, you have to think about the big picture. You must think through all the criteria that will carry those projects to success, both today and in the future.

Overlooked One: How will I get and manage the data?

In organizations where data management is immature, users and business units tend to hoard data. Business users often labor under the mistaken belief that if you own the data, you own the power. As IT professionals, we should move the organization toward data sharing – the enemy is not within; it is your data-savvy competitors. IT can help by introducing technologies that make it easy to democratize the data. By supporting technologies like Kafka, IT can set up a publish-and-subscribe infrastructure that helps break up the data fiefdoms.
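
To make that concrete, here’s a minimal sketch of the publish-and-subscribe pattern using the kafka-python client. The broker address, topic name, and consumer group are hypothetical placeholders.

```python
# Minimal publish-and-subscribe sketch with kafka-python.
# The broker address, topic name, and group id are hypothetical.
import json
from kafka import KafkaProducer, KafkaConsumer

# A business unit publishes its data once...
producer = KafkaProducer(
    bootstrap_servers="broker:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("sales-events", {"region": "EMEA", "amount": 1250.0})
producer.flush()

# ...and any team can subscribe independently; each consumer group
# receives its own copy of the stream, so no single unit "owns" it.
consumer = KafkaConsumer(
    "sales-events",
    bootstrap_servers="broker:9092",
    group_id="bi-dashboards",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for message in consumer:
    print(message.value)
```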

At HPE, of course, we support Kafka in our HPE Vertica platform. In addition, we’re working on the data democratization problem by supporting Hadoop file formats like ORC, Parquet and JSON, so that data can be loaded into the analytics platform and anyone can become a data consumer. The high performance of our analytics database is not only about speeds and feeds; it’s also about giving more end users the ability to leverage the data. We also rely on a strong partner network for ETL and data curation, including Informatica, Talend, Syncsort, Tamr and Pentaho, to name just a few.
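
As an illustration of that kind of format support, here is a minimal sketch that exposes Parquet files on HDFS to SQL users through a Vertica external table, via the vertica-python driver. The connection details, table definition and HDFS path are all hypothetical.

```python
# Minimal sketch: querying Parquet files in place through a Vertica
# external table. Connection details, columns and paths are hypothetical.
import vertica_python

conn_info = {
    "host": "vertica-node", "port": 5433,
    "user": "dbadmin", "password": "...", "database": "analytics",
}

conn = vertica_python.connect(**conn_info)
try:
    cur = conn.cursor()
    # Vertica reads the Parquet files where they live on HDFS;
    # nothing is copied into the database.
    cur.execute("""
        CREATE EXTERNAL TABLE web_clicks (
            user_id INT,
            url     VARCHAR(2048),
            ts      TIMESTAMP
        ) AS COPY FROM 'hdfs:///data/clicks/*.parquet' PARQUET
    """)
    cur.execute("SELECT COUNT(*) FROM web_clicks")
    print(cur.fetchone()[0])
finally:
    conn.close()
```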

Overlooked Two: Am I running the right hardware for the task?

Corporations often have banks of IT infrastructure they can draw upon. HPE has sold a ton of ProLiant DL380p servers over the years, offering a solid foundation for most IT tasks and a very predictable plan for power usage, management and operations. However, you should consider that different workloads in your project may have different requirements for compute, storage and latency. For example, in the Hadoop world, ETL jobs may require lots of storage and the fastest network connection to deliver performance, while BI dashboards rely on fast CPUs and lots of memory. By thinking through how the hardware is going to be used, you can optimize and save.

This is really what HPE is accomplishing with our recent announcements of big data reference architectures for Vertica SQL on Hadoop. We have been working with the open source community on reference architectures, on both the ProLiant and Apollo platforms, that can be optimized for the task. For example, if you need to turn up Hadoop compute resources, you can adjust a few settings in YARN and get them. If you need to turn up storage performance, adjust the YARN node labels and go. One way to verify the effect of that retuning is sketched below.
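
This is a minimal sketch that polls the YARN ResourceManager’s cluster metrics REST endpoint before and after retuning; the ResourceManager hostname is a hypothetical placeholder.

```python
# Minimal sketch: reading cluster capacity from the YARN ResourceManager
# REST API (/ws/v1/cluster/metrics). The hostname is hypothetical.
import requests

resp = requests.get("http://resourcemanager:8088/ws/v1/cluster/metrics",
                    timeout=10)
metrics = resp.json()["clusterMetrics"]

print("memory (MB): %d allocated / %d total"
      % (metrics["allocatedMB"], metrics["totalMB"]))
print("vcores:      %d allocated / %d total"
      % (metrics["allocatedVirtualCores"], metrics["totalVirtualCores"]))
```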

Overlooked Three: Is it scalable and elastic?

The big data reference architectures also help when you start to get killed by your own success. Project managers should consider what to do if the project is a wild success and brings more data, more users and more queries.