The answer is very few. And Avellanet has the numbers to quantify his thesis: “Of the 20 data integrity audits that I conducted just last year for clients, just one firm had a change control process that required data regression testing, and they’d just implemented it and weren’t certain yet how to do it. So, we’re making progress, but we’ve a long way to go.”
In the big data/SaaS world, Lucas Moody, CISO at Palo Alto Networks, says it can seem as if we’ve created a giant game of telephone, but in reality all the parties engaged have a vested interest in ensuring the integrity of the game and the final outcome.
And while integrity in big data environments has been debated in recent months, particularly in use cases involving massive compute operations, genetic research and clinical studies, among others, the pollution or injection of small amounts of data is oftentimes inconsequential when dealing with large data sets, as the law of large numbers would indicate, Moody says. That said, in environments where data integrity is paramount, data at rest, data in transit and controls over those who have the capability to manipulate data have to be considered in a comprehensive information security strategy, he says.
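Moody's point about scale can be illustrated with simple arithmetic. The Python sketch below is an illustration only, not anything Moody describes: the record counts, values and the `mean_shift` helper are hypothetical, and it looks at just one aggregate, the mean. It mixes the same 100 polluted records into a large and a small data set and compares how far the average moves.

```python
from statistics import fmean


def mean_shift(clean, injected):
    """Return how far the mean moves when injected records are mixed into clean ones."""
    return fmean(clean + injected) - fmean(clean)


# Hypothetical values: every clean record is 50.0, every injected record is 1000.0.
injected = [1000.0] * 100
large = [50.0] * 1_000_000  # a "big data" sized set
small = [50.0] * 1_000      # a small set polluted by the same 100 records

large_shift = mean_shift(large, injected)
small_shift = mean_shift(small, injected)

print(f"shift in large set: {large_shift:.4f}")
print(f"shift in small set: {small_shift:.4f}")
```

Under these assumptions the mean of the large set moves by less than 0.1, while the mean of the small set moves by more than 86. That only covers aggregate statistics; where individual records matter, as in the environments Moody calls out, a single altered record can still be significant.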
Protecting the integrity of big data is a much larger and more complex problem than that of traditional PII, says Michael Taylor, applications and product development lead at Rook Security. A single record of information about an individual may contain data like street address, date of birth and Social Security number, he points out. “In a big data context, a single user may generate many thousands of times that volume of data through their everyday use of a website, app or service. This larger volume of data will typically be generated and piped through several different resources.”
Verifying the integrity of data as it passes through multiple tools is where the increased complexity comes into play, he adds. “Ensuring that the data generated on the user application side has not been manipulated inadvertently or maliciously before arriving at the final data store requires external monitoring and sampling of the data in motion and at rest.”
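One way to picture the sampling Taylor describes is to fingerprint each record where it is generated and then spot-check records at the final data store against those fingerprints. The Python sketch below is a minimal illustration under stated assumptions, not Taylor's or Rook Security's method: the record layout, the `record_digest` and `sample_mismatches` helpers and the sample data are all hypothetical, and it assumes the source digests are kept somewhere an attacker cannot alter them.

```python
import hashlib
import json
import random


def record_digest(record: dict) -> str:
    """SHA-256 over a canonical JSON form, so key order does not change the digest."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def sample_mismatches(source_digests, stored_records, sample_size, seed=None):
    """Check a random sample of record ids; return the ids that are missing or altered."""
    rng = random.Random(seed)
    ids = sorted(source_digests)
    sampled = rng.sample(ids, min(sample_size, len(ids)))
    return sorted(
        rid for rid in sampled
        if rid not in stored_records
        or record_digest(stored_records[rid]) != source_digests[rid]
    )


# Hypothetical records as generated on the application side.
source = {
    "u1": {"user": "u1", "event": "login", "ts": 1},
    "u2": {"user": "u2", "event": "purchase", "amount": 20},
    "u3": {"user": "u3", "event": "logout", "ts": 9},
}
digests = {rid: record_digest(rec) for rid, rec in source.items()}

# Simulate the final data store, with one record manipulated along the way.
stored = {rid: dict(rec) for rid, rec in source.items()}
stored["u2"]["amount"] = 2000

mismatches = sample_mismatches(digests, stored, sample_size=len(stored))
print(mismatches)
```

Here every record is sampled, so the altered record is flagged; with a smaller sample, detection becomes a matter of probability. A plain hash also only helps if the digests themselves are trustworthy. If an attacker could rewrite both the data and the digests, a keyed construction such as HMAC with a protected key would be the more appropriate choice.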
The state of data integrity is not very good, says Tavakoli. “We’re in the early stages of understanding the implications of data integrity issues. While data engineering teams have been trained to cleanse data (throw some of it out because it lacks certain key fields) and normalize it, they have not been trained to look for signs of tampering with the data. It’s akin to the early days of cybersecurity when there were weaknesses in the way code was developed and the SDL acronym hadn’t been invented yet.”
Akamai’s Shaul says we have so many things to secure. Organizations need to take a holistic view of their data, he explains. “They must ask: Where is the sensitive information stored? How is it used, processed and transmitted? Who has access at each level – and more importantly, who should have access?”
“Security always starts with understanding your own estate and building a threat model that helps you understand what and where an attacker is likely to target,” Shaul says.