Annotation – Social Media Analytics – Challenges in Topic Discovery, Data Collection, and Data Preparation

So far my bibliography has included research on data mining terminology and techniques, and some of the processing methods commonly used to extract sentiment and knowledge from social media data sets. Based on my research, there is evidently no end to published articles on opinion mining, natural language processing, and social network analysis. Many scholars have reported on the development and use of open source tools to extract and analyze data from Twitter or from Facebook’s Graph API.

But what are the gaps in the research?

Today’s annotation summarizes an article published in late 2017 reporting on a systematic literature review of existing research on social media analytics. The authors seek to identify challenges that have not yet been solved by the models or the tools. I’ll give you a hint at what they find: We are very good at looking for specific needles in the social media haystack, but not so good at finding the needles we’re not looking for.

Stieglitz, Stefan, et al. “Social Media Analytics – Challenges in Topic Discovery, Data Collection, and Data Preparation.” International Journal of Information Management, vol. 39, Dec. 2017, pp. 156–68.

In this 2017 article for the International Journal of Information Management, Stieglitz, Mirbabaie, Ross, and Neuberger discuss social media analytics research and address a perceived gap in the literature on the stages of data discovery, collection, and preparation. They do so by conducting an extensive literature analysis to identify key challenges and solutions. In presenting their findings, they suggest ways to extend existing social media research frameworks to better serve scholars and practitioners.

The authors argue that while many research papers have been published on social media network analytics and qualitative data, they often represent isolated case studies bound by a specific subject and time frame. The authors note that the methods these studies use to extract useful information from social networks are often similar, yet they find a lack of comprehensive discussion of a general framework for social media analytics to guide future research. Much of the literature they review concerns specific methods for social media data analysis, such as opinion mining and social network analysis, but the authors assert that data analysis is only one step in the larger process of social media analytics. Their literature review therefore focuses on the challenges researchers face in discovering topics and in collecting and preparing social media data for analysis.

One of those challenges is deciding what to research. Most existing social media analysis models assume that topics of research are pre-defined, for example in political communications or crisis situations where specific topics are tracked. Such models may be useful for conducting sentiment analysis on known issues and trends, but are less useful for discovering new ones.

Given the size of “big data” in social media network communications, analytics is also challenged by the volume of storage space required, the velocity of data creation, the variety of data forms including unstructured and proprietary data, and the uncertain veracity of the data and its sources. The authors state that there has been little research on topic discovery that addresses these specific challenges.

In their literature review, the authors classify the phases of social media analysis research and usefully present the results in a table showing which phases have been addressed in papers by other scholars. Their analysis of existing research points to gaps that, they suggest, should be addressed in two ways: by defining a more generalizable model of social media analytics, and by developing new methods and tools to solve the existing challenges of topic discovery in a world of big data.