There is a lot of interest in Big Data, Business Intelligence, Predictive Analytics, and other data-related fields these days. Whether in distinctly non-legal areas like the Internet of Things or legal areas like jury selection, litigation finance, textual analytics, and hedge fund replication, techniques for using data are clearly changing many aspects of the business world.
This set of tools and techniques as a whole can be generically termed data analytics, and with major increases in computing power and better software interfaces, 2017 may well be the biggest year yet for data analytics advances. Still, for most novices in the field, there is a major misunderstanding of what data analytics can and cannot do.
To begin with, all data analytics processes start with a basic truism: garbage in, garbage out. If the data being analyzed is not accurate and representative of the world, it is not useful. This concept seems simple, but it is often forgotten. In a risk management function, for instance, people often think of data as useful for extrapolating the likelihood of future events, but that is only true if the events we are worried about appear in our data with the same frequency that they actually occur in the world.
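The frequency point above can be made concrete with a simple representativeness check: compare the event rate in the sample against the rate believed to hold in the world, and flag a large gap. This is a minimal sketch; the observed data, the 5% real-world rate, and the 10-point tolerance are all illustrative assumptions, not values from any real risk dataset.

```python
def base_rate_gap(events: list, population_rate: float) -> float:
    """Difference between the event frequency in our sample
    and the assumed real-world rate."""
    sample_rate = sum(events) / len(events)
    return sample_rate - population_rate

# Hypothetical: loss events observed in a risk dataset (1 = event occurred),
# checked against an assumed 5% real-world frequency.
observed = [1, 0, 0, 0, 1, 0, 0, 0, 0, 1]
gap = base_rate_gap(observed, population_rate=0.05)

if abs(gap) > 0.10:  # illustrative tolerance
    print("Warning: sample may not be representative of real-world frequency")
```

A dataset that fails a check like this would overstate (or understate) the likelihood of the events being extrapolated, which is exactly the garbage-in, garbage-out problem.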
Take jury selection, for example: we can use a statistical model called a probit model to estimate the probability that a particular juror will reach a given decision at the end of the case. To model that effectively, we need data on the juror: age, sex, employment, background, and so on. Once we have that data, we can estimate the decision that juror is likely to come to given the facts of the case, and, equally importantly, data analysis can tell us how statistically confident we are in that outcome. In other words, we might be 95% sure that juror XYZ would render a verdict of guilty, while we are only 63% sure that juror ABC would render such a verdict.
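The mechanics of a probit prediction can be sketched in a few lines: the juror's characteristics are combined into a linear index, and the standard normal CDF converts that index into a probability. This is a minimal sketch only; the coefficients, intercept, and juror features below are invented for illustration. In practice the coefficients would be estimated from data on past cases and jurors, which is the investment discussed next.

```python
from math import erf, sqrt

def normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def probit_probability(features: dict, coefficients: dict, intercept: float) -> float:
    """P(outcome) = Phi(intercept + x . beta) under a probit model."""
    index = intercept + sum(coefficients[name] * value
                            for name, value in features.items())
    return normal_cdf(index)

# Hypothetical coefficients -- in a real model these are estimated
# from data on past cases and the jurors who decided them.
beta = {"age": 0.02, "prior_jury_service": -0.40, "years_education": -0.05}

juror_xyz = {"age": 55, "prior_jury_service": 0, "years_education": 12}
p_guilty = probit_probability(juror_xyz, beta, intercept=0.8)
```

With these made-up numbers the linear index is 1.3, giving a predicted probability of roughly 0.90 that the juror votes guilty; a separate calculation (standard errors on the estimated coefficients) is what yields the confidence attached to that prediction.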
Yet to build this type of model, we need the right underlying data: the right data on the juror, and the right data on past cases decided by other jurors, along with the data about those jurors. In other words, building a data model requires an investment of time and money; in many cases it is not a simple one-off process.