I met Ian Brooks and his family through the theatre program at Champaign’s Central High School, where our kids participated in numerous performances and the whole wonderful high school theatre thing. I knew he was doing research in the use of social media to inform public health interventions, as a faculty member of the University of Illinois iSchool and associate of the National Center for Supercomputing Applications. So when I took on this project he was the first person I wanted to consult for insight.
In a recorded interview in his office, Ian demonstrated and explained the tools his team is using to access, process, and analyze data from multiple social media networks, including Facebook, Twitter, YouTube, and Google Plus, as well as a large set of comment data from various sources. Access is handled by an application from the company Crimson Hexagon, for which the University of Illinois has an academic site license.
After the interview with Ian, I looked further into Crimson Hexagon and how social media analytics in general is currently being used on campus. Here are a few other resources for those interested:
- Commons Knowledge: Insights from the Scholarly Commons at the University of Illinois Library
- Commons Knowledge: Open Source Tools for Social Media Analysis
- Technology Services Blog: Telling Compelling Stories with the Social Media Analytics Team
Below is an annotation of my interview with Ian Brooks:
Brooks, Ian. Personal interview. 4 Apr. 2018.
In his work as a research scientist at the University of Illinois iSchool, Ian Brooks is investigating ways of leveraging social media data to inform and improve public health outcomes. Brooks provided a guided tour of tools he’s currently using to download and analyze data from multiple social media networks, including Twitter, Facebook, Reddit, Google Plus, Yelp, YouTube, and Instagram. He is working with a team of scholars with expertise in computer science, history, and digital humanities, using a Crimson Hexagon application to search, view, process, and analyze up to 10,000 social media posts per batch. The Crimson Hexagon application provides a user interface for constructing a compound search of public posts on selected social media networks. Brooks says the application searches only the “front end” of the social networks and does not access data through their APIs. Crimson Hexagon was founded as a for-profit “AI-powered consumer insights company” in 2007, and markets its applications and data products to a wide range of corporate, government, and educational entities. The University of Illinois licenses its applications for use by university-affiliated researchers and students.
Brooks uses the Crimson Hexagon tool to follow specific subject terms over time, so as to determine the context, sources, and emotional content of the terms as they are used in social media posts. For example, by following hashtags associated with the disease Ebola, he was able to track public reaction to the arrival of Americans infected with Ebola in Dallas and New Hampshire, including the spread of misinformation about the cases and Ebola itself through social channels. Brooks noted that the World Health Organization had conducted an earlier public information campaign on Ebola, but the campaign had ended by the time the first cases of Ebola were reported on American soil. The social media data showed that a variety of non-credible sources were filling this vacuum with misinformation about Ebola. Understanding these information flows over time makes “actionable” public health responses possible, e.g. by recognizing when public communication campaigns from official sources like the WHO are needed to counteract misinformation.
Brooks also demonstrated his current research on public sentiment concerning the skin disease psoriasis. He explains that social media data appears more comprehensive and reliable than traditional clinical sources, which are limited both by the number of responses and by the willingness of patients to complete surveys. His analysis of the social media data shows that in 2010, people affected by psoriasis began expressing less negative sentiment about the disease, possibly indicating that more effective medical interventions were introduced at that time.
Crimson Hexagon provides a number of utilities for filtering and analyzing the data, including export to CSV files. The application also provides its own sentiment analysis of the posts, displayed and filterable in various numerical and graphic representations. The posts searched by the application are downloadable for further processing and analysis. Brooks’ team has developed a number of Python scripts to identify and remove spam content, which he says is a major component of the data. They then use machine learning techniques such as support vector machine (SVM) algorithms and random forests to identify meaningful terms, entities, and expressions. An interesting aspect of studying social media is the personal nature of meaningful expressions, which requires adjustments to standard stopword lists, e.g. making sure not to exclude personal pronouns like “I” and “my.”
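To make the stopword adjustment concrete, here is a minimal Python sketch of the general idea. It is not the team's actual code, which I did not see in detail. The stopword list is a short hypothetical stand-in for a standard list, the names `STANDARD_STOPWORDS`, `KEEP_PRONOUNS`, and `remove_stopwords` are my own, and the sample post is made up for illustration.

```python
import re

# Hypothetical, abbreviated stand-in for a standard English stopword list.
STANDARD_STOPWORDS = {
    "a", "an", "and", "the", "of", "to", "in", "is", "it", "on", "for",
    "with", "i", "me", "my", "we", "our", "you", "your",
}

# First-person terms to keep, since they signal personal experience.
# (Assumption: which pronouns to keep is a judgment call for each study.)
KEEP_PRONOUNS = {"i", "me", "my"}

# The adjusted list is the standard list minus the pronouns we want to keep.
ADJUSTED_STOPWORDS = STANDARD_STOPWORDS - KEEP_PRONOUNS


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stopwords(text, stopwords):
    """Return the tokens of `text` that are not in the `stopwords` set."""
    return [tok for tok in tokenize(text) if tok not in stopwords]


# Made-up example post, not taken from any real data set.
post = "I finally found a cream that helps with my psoriasis"

standard_tokens = remove_stopwords(post, STANDARD_STOPWORDS)
adjusted_tokens = remove_stopwords(post, ADJUSTED_STOPWORDS)

print(standard_tokens)
print(adjusted_tokens)
```

With the standard list, the words that mark the post as a first-person account are stripped out. With the adjusted list, "i" and "my" survive and remain available to whatever analysis comes next.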
Brooks and his team are just beginning to use these tools as a foundation for improving decisions in public health interventions. He explains they are general purpose tools, and are also used by political campaigns and commercial entities for marketing purposes. Social media data and the tools to mine it and extract meaning are rapidly shifting the boundaries of what it is possible to know about how people actually think and communicate.