As big data initiatives gain steam at organizations, many companies are creating “data lakes” to provide a large number of users with access to the data they need. And as with almost every type of new IT initiative, this comes with a variety of security risks that enterprises must address.
Data lakes are storage repositories that hold huge volumes of raw data, kept in its native format until it’s needed. They’re becoming more common as organizations gather enormous amounts of data from a variety of sources.
The growing business demand for analytics is helping to fuel the move to large repositories of data. And data lakes are likely to take on even more significance with the growth of the internet of things (IoT), in which companies will gather data from and about countless networked objects.
“Businesses and consumers are creating data like never before,” says Mohit Aron, founder and CEO of data storage company Cohesity. “In turn, the number of siloed data lakes has exploded, meaning that enterprises are faced with the challenge of protecting separate security perimeters around each data lake.”
“For the executive, the idea of gaining competitive advantage, unique insight and anticipatory intelligence is compelling,” Hockenberry says. “However, in order to generate these outcomes the data scientist is advocating for a data lake. This lake is a combination of proprietary, open source and other datasets that can be analyzed in unique ways.”
It can also be a major target for cyber criminals. “Hacks into data lakes are a continual threat, one that is exacerbated by the large number of data lakes that enterprises have,” says Aron, who as a former Google engineer and lead architect of Google File System 2 has helped build and maintain some of the biggest data lakes in the world.
Considering the high business value of these information resources and the growing risks, security and IT executives need to make data lake security a high priority. To begin with, leaders at the highest levels of the organization need to understand why data stores must be protected to the greatest extent possible.
Unfortunately, this doesn’t always happen.
“The appeal of increased agility, reduced costs and removal of silos causes many organizations to jump headfirst into the data lake and ignore basic information governance best practices at their own peril,” says Jonathan Steenland, principal at Zyston CISO Advisory Services, where he is responsible for co-leading CISO advisory and consulting.
“Since data lakes are such a data-rich target, hackers will prioritize their efforts at exploiting these types of technologies and the users who connect to them,” says Steenland, who previously served as CISO at Fujitsu.
Data lakes should be managed as a highly valuable corporate asset, Hockenberry says. “In many cases, executives look at this as a ‘tech problem,’” he says. “However, a data lake should be seen as corporate IP [intellectual property] and if someone gains access to it, they could see strategic information that could affect shareholder value, compromise [research and development], and reveal plans and intentions that can create issues for a company.”
The best way to address these issues is to understand what data the enterprise is collecting, how it’s being analyzed, protected and disseminated, Hockenberry says. Business, IT and security executives need to build data-centric risk management strategies to ensure information is protected no matter where it resides, he says.