Most security data science talks describe a specific algorithm used to solve a security problem, and the audience is often left wondering: how was the system tested when there is no labeled attack data, which metrics are monitored, and what do these systems actually look like in production? Academia and industry both focus largely on security detection, but the emphasis is almost always on the algorithmic machinery powering the systems.
Prior art on productizing these solutions is sparse: the problem has been studied from a machine-learning angle or from a security angle, but the two have not been explored jointly. Yet the intersection of operationalizing security and machine-learning solutions is important, not only because security data science solutions inherit complexities from both fields but also because each field has unique challenges. For instance, compliance restrictions that prohibit exporting data from specific geographic locations (a security constraint) have a downstream effect on model design, deployment, evaluation, and management strategies (a data science constraint).
Ram Shankar Siva Kumar and Andrew Wicker explain how to operationalize security analytics for production in the cloud, covering a framework for assessing the impact of compliance on model design, six strategies (and their trade-offs) for generating labeled attack data for model evaluation, key metrics for measuring security analytics efficacy, and tips for scaling anomaly detection systems in the cloud.