5 Big Data Projects You Can No Longer Overlook


Check out 5 Big Data projects that you are not likely to have seen before, but which may be useful to you, and perhaps even scratch an itch you didn't know you had.

The Big Data Ecosystem is big. Some say it's too damn big!

Consider, first, the behemoths in the space, the Big Data processing frameworks: Hadoop. Spark. Flink. Any of the other umpteen Apache projects. Google's platforms. Many others. They all work in the same general space, but with various differentiating factors.

Next consider the support tools in the various data processing ecosystems. Then have a look at the various data stores and NoSQL database engines available. Then think about all of the tools that fit particular niches, both "official" and unofficial, that grow out of both large companies and individuals' ingenuity.

It is this final category that we are concerned with herein. We will take a look at 5 Big Data projects that are outside of the mainstream, but which still have something to offer, perhaps unexpectedly so.

As always, finding overlooked projects is much more art than science. I collected these projects over an extended period of time spent online. The only criteria were that the projects were not alpha-level (subjective, no?), caught my eye for some particular reason, and had GitHub repos. The projects are not presented in any particular order; they are numbered mostly for ease of reference, but also because I like numbering things.

Luigi was originally developed at Spotify, and is used to craft data pipeline jobs. From its GitHub repository README:

Luigi stresses that it does not replace lower-level data-processing tools such as Hive or Pig, but is instead meant to create workflows between numerous tasks. Luigi supports Hadoop out of the box, which potentially makes it a much more attractive option for many users. It also provides file system abstractions for HDFS and local files that enforce operation atomicity, which is essential for keeping state consistent between pipeline tasks.
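To make the two ideas in that paragraph concrete (workflows built from dependent tasks, and atomic file output), here is a minimal sketch using only the Python standard library. It is not Luigi code and does not use Luigi's API: `Task`, `build`, `atomic_write`, `WriteNumbers`, and `SumNumbers` are hypothetical names invented for illustration, and the rule "a task is complete once its output file exists" is an assumption of this sketch.

```python
import os
import tempfile


def atomic_write(path, text):
    """Write text so readers see either no file or the complete file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then rename into place.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(text)
        # The rename is atomic when both paths are on the same filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


class Task:
    """Hypothetical base class: a task is done once its output file exists."""
    output_path = None

    def requires(self):
        return []

    def complete(self):
        return os.path.exists(self.output_path)

    def run(self):
        raise NotImplementedError


def build(task, log=None):
    """Run dependencies first; skip any task whose output already exists."""
    log = [] if log is None else log
    if task.complete():
        return log
    for dependency in task.requires():
        build(dependency, log)
    task.run()
    log.append(type(task).__name__)
    return log


class WriteNumbers(Task):
    def __init__(self, workdir):
        self.output_path = os.path.join(workdir, "numbers.txt")

    def run(self):
        atomic_write(self.output_path, "1\n2\n3\n")


class SumNumbers(Task):
    def __init__(self, workdir):
        self.workdir = workdir
        self.output_path = os.path.join(workdir, "sum.txt")

    def requires(self):
        return [WriteNumbers(self.workdir)]

    def run(self):
        with open(self.requires()[0].output_path) as handle:
            total = sum(int(line) for line in handle)
        atomic_write(self.output_path, str(total))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        print(build(SumNumbers(workdir)))  # first call runs both tasks
        print(build(SumNumbers(workdir)))  # second call finds the output
```

Because completion is judged by whether the output file exists, a half-written file would make a failed task look finished. Writing to a temporary file and renaming it closes that gap, which is why atomic output matters for state between pipeline tasks.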

Luigi also comes with a web interface for visualizing and managing your tasks:

Luigi is also gaining in popularity, and currently boasts nearly 5,000 repo stars on GitHub, which is impressive for something I'm categorizing as "not popular." If you are interested in seeing it in action, here is a tutorial on using Luigi together with Python to build data pipelines, written by Marco Bonzanini.

I'm a fan of pipelines; if you are too, Luigi may be a project worth checking out for managing your data processing tasks and workflows.

Developer Altamira reasons that the tools for exploiting data and extracting insight were not developed well enough, and so took it upon themselves to design Lumify, a tool to aggregate, organize, and extract insight from your data.

 
