230 million patients. 3,300 hospitals. 900,000 healthcare professionals. 98 percent of U.S. pharmacies. More than 700 different electronic health record platforms. 764 million medication histories. 6.5 billion transactions processed last year alone.
"We've definitely had an opportunity to become experts in Big Data," said Paul Calatayud, CISO at Surescripts.
Surescripts is the country's largest health information network, storing and protecting one of the most valuable information treasure troves on the planet.
Calatayud said the company first turned to Big Data analytics to spot fraudulent activity by patients or doctors. Two years ago, that work was done with spreadsheets and pivot tables, he said, and it took about a year to move to Hadoop for data storage and Splunk for analytics.
Then, six months ago, Surescripts began using the same approach for internal security, processing incident and log data.
Logs and incident reports
Surescripts might be operating on a bigger scale than most companies, but enterprises of every size are dealing with a flood of data from firewalls, networks, email systems, individual workstations, servers, and other devices.
The information comes in fast, in large volumes, and in a wide variety of formats: the classic definition of Big Data.
Traditional data management systems would quickly run into scalability issues.
However, just having all the log and incident data in one place doesn't improve security if there's too much information there for people to manage, said Marcin Kleczynski, founder and CEO at Malwarebytes.
Big Data analytics helps companies process all this information, prioritize the most significant threats, and weed out random noise and false alerts. At least, that's the idea.
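To make "prioritize the most significant threats, and weed out random noise" concrete, here is a minimal sketch of the simplest version of the idea: collapse duplicate alerts, drop anything below a severity cutoff, and rank what is left. The `Alert` record, its fields, and the 1-to-5 severity scale are hypothetical assumptions for illustration; commercial analytics platforms use far richer correlation than this.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    source: str      # device that raised the alert (hypothetical field)
    signature: str   # detection rule name (hypothetical field)
    severity: int    # assumed scale: 1 (low) to 5 (critical)


def prioritize(alerts, min_severity=3):
    """Collapse duplicate alerts, drop low-severity noise, and rank the rest.

    Returns a list of (severity, count, source, signature) tuples.
    """
    # Identical alerts are merged into one entry with a repeat count.
    counts = Counter((a.source, a.signature, a.severity) for a in alerts)
    ranked = [
        (severity, count, source, signature)
        for (source, signature, severity), count in counts.items()
        if severity >= min_severity  # anything below the cutoff is treated as noise
    ]
    # Highest severity first; among equals, the most repeated alert first.
    ranked.sort(key=lambda r: (-r[0], -r[1], r[2], r[3]))
    return ranked


alerts = (
    [Alert("fw1", "port-scan", 3)] * 4
    + [Alert("mail", "phish-link", 5),
       Alert("ws7", "usb-insert", 1),
       Alert("srv2", "brute-force", 4)]
)
for row in prioritize(alerts):
    print(row)
```

Here seven raw alerts shrink to three ranked entries, and the low-severity USB event disappears. A fixed cutoff like this is exactly the kind of blunt rule that smarter analytics tries to improve on.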
"Lots of mysterious black-box technologies are offered for this," said Mike Lloyd, CTO at security analytics company RedSeal. They include genetic algorithms, machine learning, and artificial intelligence.
"What they have in common is that they are poorly understood, but powerful, and this makes them very appealing as silver bullets," he said. "But artificial intelligence has been frustratingly hard to build: computers just aren't all that smart."
"Data mountains need data mountaineers," he said. "The data won't analyze itself. Simply buying a big data warehouse and layering some Hadoop technologies on top isn't going to bring about enlightenment."
However, the automation technologies are evolving. They correlate more data feeds and are increasingly able to look at events from different perspectives.
For example, risk management, incident response and forensics ask different questions of the data, he said, and different technological approaches are being developed to meet these needs.
And that's just the start of what Big Data can do to improve security, said Jerry Irvine, CIO at Prescient Solutions.
Users behaving badly
Surescripts began looking at user behaviors and credentials three months ago.
"That's where things move more into unstructured data," said Surescripts' Calatayud.
To get the data analyzed, Calatayud is looking at an analytics platform from Gurucul, which specializes in identity access intelligence and user behavior analytics.
"They can slice up the data to specifically address my use cases," he said. "And it allows us to leverage industry expertise rather than trying to build up core competencies and strategies that might not directly become revenue opportunities."
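User behavior analytics generally starts by learning what is normal for each individual and then flagging departures from it. The sketch below shows one basic way to do that, and it is not Gurucul's method. It compares each user's event count today against that user's own history using a z-score. The input shape, the function name `flag_unusual_activity`, and the threshold of 3 standard deviations are all assumptions for illustration.

```python
import statistics


def flag_unusual_activity(history, today, threshold=3.0):
    """Flag users whose activity today is far above their own baseline.

    history: dict of user -> list of past daily event counts (assumed input)
    today:   dict of user -> today's event count
    Returns a dict of user -> z-score for users above the threshold.
    """
    flagged = {}
    for user, count in today.items():
        past = history.get(user)
        if not past or len(past) < 2:
            continue  # too little history to form a baseline
        mean = statistics.mean(past)
        stdev = statistics.stdev(past)
        if stdev == 0:
            stdev = 1.0  # perfectly steady history: fall back to a unit spread
        z = (count - mean) / stdev
        if z > threshold:
            flagged[user] = round(z, 2)
    return flagged


history = {"alice": [10, 12, 11, 9, 13], "bob": [40, 42, 38, 41, 39]}
today = {"alice": 55, "bob": 43, "carol": 500}
print(flag_unusual_activity(history, today))
```

Bob's 43 events would look alarming beside Alice's usual dozen, but the check is per user, so only Alice's jump is flagged. Carol is skipped because she has no baseline yet.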
It's not just people who can behave badly. Similar technology can be used to identify normal behaviors of individual endpoints, and recognize when they're doing something suspicious.
"Previous efforts in security analytics were unable to meaningfully represent the expected and normal behavior for connected endpoints," said Bryan Doerr, CEO at Observable Networks, one of the vendors offering this technology.
That challenge has only been getting harder, he added, as the number of endpoints has grown, along with the volume of data they generate that companies can now afford to store.
"Our big idea was to use the data avalanche as inputs to a modeling process," he said. "We use all this rich data about endpoints to maintain models of their behavior, so that we can recognize when they do things they should not do."
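The idea of keeping "models of their behavior" for each endpoint can be shown with a deliberately crude example. This is not Observable Networks' modeling approach. During a learning period, the sketch records which (destination, port) pairs each device talks to, and afterward it reports any connection the device has never made before. The class name `EndpointModel` and the flow-tuple format are hypothetical.

```python
from collections import defaultdict


class EndpointModel:
    """Per-endpoint baseline of observed (destination, port) pairs."""

    def __init__(self):
        # endpoint -> set of (destination, port) pairs seen during learning
        self.seen = defaultdict(set)

    def learn(self, flows):
        """Record normal traffic; flows are (endpoint, destination, port) tuples."""
        for endpoint, dest, port in flows:
            self.seen[endpoint].add((dest, port))

    def check(self, flows):
        """Return the flows an endpoint was never observed making while learning."""
        return [
            (endpoint, dest, port)
            for endpoint, dest, port in flows
            if (dest, port) not in self.seen.get(endpoint, set())
        ]


model = EndpointModel()
model.learn([("printer-3", "10.0.0.5", 9100), ("hvac-1", "10.0.0.8", 443)])
print(model.check([
    ("printer-3", "10.0.0.5", 9100),     # normal print traffic
    ("printer-3", "203.0.113.9", 22),    # a printer opening SSH to an outside host
]))
```

A printer that suddenly opens an SSH session to an outside address is the sort of thing "they should not do." Real products would also have to handle baselines that drift over time, which an exact-match set like this does not.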