This week I had the opportunity to participate in a panel discussion at the ACM SIGKDD Conference on Knowledge Discovery and Data Mining. The panel was part of the “Special Session on Standards in Predictive Analytics In the Era of Big and Fast Data” organized by the DMG (Data Mining Group). The panel and the associated presentations covered in detail the challenges of operationalizing models.
Too often, once the data science team has created an analytical model, operationalizing it is lengthy and labor-intensive. In many instances, there is no turnkey strategy for deploying the model to a real-time scoring solution running elsewhere in the company. Often the production version of the model must be developed by hand in C++ or Java by an entirely separate team, with a Word document written by the data scientists serving as the model specification. As can be imagined, this process is error-prone and requires extensive testing, and it reduces an enterprise’s agility and its ability to rapidly deploy and update models.
This need to recode a model between training and deployment usually stems from an incompatibility between the model formats produced by the data scientists’ toolchains and the formats supported by production scoring engines. Ideally, a model generated by a data scientist would be directly consumable by the production scoring engine, with a guarantee that both sides interpret the model identically.
Luckily, open standards for describing predictive models do exist.