The foundational role of Apache Hadoop in open analytics ecosystems is undisputed. It is the clear focus of data science, cognitive computing and big data analytics ecosystems everywhere. Hadoop provides an open platform on which today’s data scientists build innovative analytics. Collectively, Hadoop, Apache Spark, R and other open source tools and languages provide a growing stack of open source analytics code.
Several announcements at Strata + Hadoop World 2016 showed that this growing open source stack continues to develop. The key open analytics ecosystem milestones signal advances in interoperability, partnering and platform integration.
The interoperability milestone was the release by ODPi of core specifications for interoperability and certification. Specifically, ODPi, a nonprofit group in which IBM is a charter member, introduced the first runtime specification, test suite and reference build for its Hadoop interoperability framework. The new ODPi Runtime Specification 1.0 leverages and aligns with relevant open source initiatives under the Apache Software Foundation (ASF).
The new ODPi Test Suite links tests directly to lines in the ODPi Runtime Specification. And the new ODPi Reference Build helps developers ensure that their builds comply with the runtime specification.
Taken together, these new ODPi deliverables enable developers to build applications once and certify them to run across diverse Hadoop distributions. Later in 2016, ODPi plans to release its Operations Specification, a follow-on component that is expected to help users improve installation and management of Hadoop and Hadoop-based applications. This specification covers Apache Ambari, the open source project for provisioning, managing and monitoring Hadoop clusters.
The partnering milestone was IBM's announcement of the pilot of its Open Analytics Ecosystem, an initiative that launches at Spark Summit West in early June 2016. Under the program, IBM plans to build relationships within the open analytics community directly with business leaders, application makers and technology experts. Among open analytics codebases, this partnership program focuses principally on Spark.
By the end of 2016, IBM plans to have signed up more than 100 actively participating open ecosystem partners, and it is expected to reach out to a wide range of potential community members.