The “big data” revolution has already transformed fields such as biology, astronomy and physics, but its impact on the social sciences has been far patchier.
To explore why this might be, the University of Essex has teamed up with Sage Publishing to produce a Sage white paper titled Who Is Doing Computational Social Science?: Trends in Big Data Research.
In the natural sciences, the authors point out, big data research relies on “high-throughput instruments” such as particle accelerators and genome sequencers that have been “designed specifically for analysis by scientists”.
By contrast, while social scientists may deal with equally large amounts of data, these “derive overwhelmingly from mixed sources (e.g., social media, unstructured text, digital sensors, financial and administrative transactions) not designed to produce valid and reliable data for social scientific analysis”. This leads to “the challenge of harmonizing and extracting meaningful features from a variety of data streams”.
To learn more about how researchers are meeting this and other challenges, Sage Publishing surveyed social scientists around the world and received 9,412 responses, mainly from the US (3,302 respondents) and the UK (728), followed by India (405) and Canada (353). Thirty-three per cent had already been involved in big data research, while, of those who had not, 49 per cent were “definitely planning on doing so in the future” or “might do so in the future”.
Along with “an appetite to engage with big data research”, the white paper identifies a number of barriers to entry, with “finding collaborators with the right skills” and the “time required to learn a new field” flagged up as the most significant. Some of those already using big data, meanwhile, said they had “a big problem” in “getting funding” (42 per cent) or “getting access to commercial or proprietary data” (32 per cent), while 61 per cent found “choosing an appropriate journal” a “big problem” or “something of a problem”. (One free-text response noted that “Several of the top journals in business school disciplines have not yet embraced Big Data Analytics.”)
Researchers’ own skills gaps were another significant problem, with 40 per cent of respondents wanting either “basic introductory training on big data analytics or data science” or a better understanding of “specific topics, such as text mining and R and Python programming [languages]”.
So how does this broad picture play out in the lives of individual researchers?
Laura Nelson, assistant professor of sociology and anthropology at Northeastern University, has worked on “analyz[ing] feminist movements in Chicago and New York City from 1865 to 1975”. After visiting archives across the country, collecting large quantities of text and then digitising them, she “started to incorporate techniques developed in computer science and computational linguistics to make the entire content analysis process more reliable, reproducible and scalable”.
Yet she encountered a number of obstacles. While she was at graduate school at the University of California, Berkeley, from 2006 to 2014, “there were few classes teaching the specific skills I needed. [Either classes] were taught in computer science and applied math departments and were largely impenetrable to me…or they focused on skills that were marginal to what I really needed”.