Annotation – A Bundle of Open Source Resources for Social Media Data Mining and Analysis

We’ve reached the final annotation in our series on Social Media Data Collection, Processing, and Use in Research, Marketing, and Political Communication. Toward the end of the project my research drifted from traditional academic sources to investigative journalism. We now veer further off-track into blog posts, GitHub repos, some videos, a course syllabus on “Data Science for Social Systems,” and assorted tools, documentation, and related sources that don’t fit neatly into any particular box. This isn’t so much an annotation as a grab bag of annotated links. I apologize in advance.

My bibliography for IS452 has officially become a network of small pieces loosely joined. But there’s fodder here for additional research, and a starting point not just for programming tasks but for understanding the social and political context of social media analytics as currently practiced.

I hope you find this interesting and possibly useful.


Facebook. “The Graph API.” https://developers.facebook.com/docs/graph-api

Where would we be without the Facebook Graph API? Probably still in the Paris Climate Accord. Anyway, this was the source of the data mined by the folks at Cambridge Analytica through the Facebook app created by Aleksandr Kogan, which harvested data from as many as 87 million Facebook profiles to build a dataset for sentiment analysis and psychographic messaging strategies. Facebook has since locked the API down to some extent, but it’s still pretty useful for social network analysis and opinion mining.

Klipfolio. “Using Facebook’s Graph API Explorer to Retrieve Insights Data.” Klipfolio.Com, 11 Apr. 2014, https://www.klipfolio.com/blog/facebook-graph-api-explorer.

This short tutorial is written for non-programmers and those unfamiliar with APIs. It provides step-by-step instructions for accessing and using the Graph API Explorer, setting up an access token, and retrieving insights from Facebook pages. Klipfolio is a commercial vendor that provides proprietary dashboard solutions for data analytics, and this blog post is couched in terms of feeding Facebook data into their product. It may still be useful to those who need a very basic introduction to the Facebook API and API Explorer.
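
Insights data comes back from the API as nested JSON, with each metric’s values tucked inside a list. Here is a minimal Python sketch (not from the Klipfolio post) that flattens such a response into simple rows. It assumes the Page Insights response shape of a `data` list whose entries carry `name`, `period`, and `values`, with each value holding `value` and `end_time`. The sample payload is hand-written and illustrative, not real data.

```python
import json

# Hand-written sample shaped like a Page Insights response (assumed shape);
# the metric name and numbers are illustrative, not real data.
SAMPLE_RESPONSE = """
{
  "data": [
    {
      "name": "page_impressions",
      "period": "day",
      "values": [
        {"value": 120, "end_time": "2018-04-01T07:00:00+0000"},
        {"value": 95,  "end_time": "2018-04-02T07:00:00+0000"}
      ]
    }
  ]
}
"""


def flatten_insights(payload: dict) -> list:
    """Turn nested metric/values JSON into flat (metric, period, end_time, value) rows."""
    rows = []
    for metric in payload.get("data", []):
        for point in metric.get("values", []):
            rows.append((metric["name"], metric.get("period"),
                         point.get("end_time"), point.get("value")))
    return rows


if __name__ == "__main__":
    for row in flatten_insights(json.loads(SAMPLE_RESPONSE)):
        print(row)
```

Flat rows like these are easy to write to CSV or load into a spreadsheet, whatever dashboard you use.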

Ranjan, Ravi. “How to Use Facebook Graph API and Extract Data Using Python?” Towards Data Science, 2016, https://towardsdatascience.com/how-to-use-facebook-graph-api-and-extract-data-using-python-1839e19d6999.

A data scientist explains how to extract data from the Facebook Graph API using Python. Ranjan walks through the process of getting an access token, which is required for making API calls. He references Graph API version 2.7, whereas the current version is 3.0, but the programming patterns are the same. (The Graph API Reference https://developers.facebook.com/docs/graph-api/reference/v2.7/ provides documentation for the current and past versions.) The guidance will be useful for anyone with some Python experience who is just beginning to explore what data can be mined from Facebook.
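
For readers who want to see the bare mechanics, here is a minimal standard-library sketch (not Ranjan’s code) of a Graph API call. It is a GET request to `https://graph.facebook.com/<version>/<node>` with `fields` and `access_token` query parameters. The default version string, the `me` node, and the `TOKEN` placeholder are assumptions for illustration. A real call needs a valid access token and network access.

```python
import json
import urllib.parse
import urllib.request

GRAPH_HOST = "https://graph.facebook.com"


def graph_url(node: str, fields: list, access_token: str, version: str = "v3.0") -> str:
    """Build a Graph API GET URL, e.g. /v3.0/me?fields=id,name&access_token=..."""
    # urlencode percent-encodes the comma in the field list as %2C.
    query = urllib.parse.urlencode({"fields": ",".join(fields),
                                    "access_token": access_token})
    return f"{GRAPH_HOST}/{version}/{node}?{query}"


def graph_get(url: str) -> dict:
    """Send the GET request and decode the JSON body (needs network and a real token)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `graph_url("me", ["id", "name"], "TOKEN")` produces `https://graph.facebook.com/v3.0/me?fields=id%2Cname&access_token=TOKEN`. Passing that URL to `graph_get` with a real token should return a JSON object containing the requested fields.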

Ferrara, Emilio. “Data Science for Social Systems.” http://www.emilio.ferrara.name/i400-590-mining-the-social-web/. Accessed 10 May 2018.

This site is a comprehensive syllabus by Prof. Emilio Ferrara, Research Assistant Professor in the Department of Computer Science at the University of Southern California, covering “how to unleash the full power and potential of the Social Web for research and business application purposes!” Topics include machine learning, Natural Language Processing, sentiment analysis, topic modeling, network visualization, and recommender systems, among many other areas of social media processing and analysis. The course has a strong Python orientation.

Shaik, Afiz. “Facebook Data Analysis Using Python: Explore GraphAPI Part 2.” 2018. YouTube, https://www.youtube.com/watch?v=o1qeNwoLh68.

Shaik walks through the process of using Jupyter Notebook and Python 3 to mine and process Facebook Graph API data. After setting up the development environment using Anaconda, he explains the use of Facebook access tokens and API queries. He then demonstrates how to work with the Graph API Explorer to pull specific data in JSON format. This brief tutorial may be useful for those who prefer learning from video sources.
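
Graph API list results typically arrive in pages: a `data` array plus a `paging` object whose `next` URL points to the following page. Below is a minimal sketch, assuming that response shape, of a generator that keeps following `next` until it disappears. `fetch` is a hypothetical stand-in for any function that performs the HTTP GET and decodes the JSON. The two fake pages are illustrative only.

```python
from typing import Callable, Iterator


def iter_pages(first_page: dict, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every item in `data`, following paging.next until it is absent."""
    page = first_page
    while True:
        yield from page.get("data", [])
        next_url = page.get("paging", {}).get("next")
        if not next_url:
            break
        page = fetch(next_url)  # hypothetical fetcher: GET the URL, return parsed JSON


if __name__ == "__main__":
    # Fake responses standing in for real API pages (illustrative data only).
    fake_pages = {"page2": {"data": [{"id": "3"}], "paging": {}}}
    first = {"data": [{"id": "1"}, {"id": "2"}], "paging": {"next": "page2"}}
    ids = [item["id"] for item in iter_pages(first, fake_pages.__getitem__)]
    print(ids)  # ['1', '2', '3']
```

Passing the fetcher in as a parameter keeps the paging logic testable offline. In a notebook, you would supply a function that calls the real API.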

Spring. “Accessing Facebook Data.” https://spring.io/guides/gs/accessing-facebook/. Accessed 20 Apr. 2018.

Spring.io presents a “getting started” guide to creating a web application that accesses Facebook data using Java. The guide walks through the requirements and the steps needed to develop working code. This resource will be more useful for those with at least intermediate programming experience, especially in Java.

bigdataenthusiast. “Mining Facebook Data Using R & Facebook API!” Data Enthusiast, Mar. 19, 2016. https://bigdataenthusiast.wordpress.com/2016/03/19/mining-facebook-data-using-r-facebook-api/.

A blog post by an enthusiastic programmer showing how to extract Facebook API data using the R programming language and the Rfacebook package. The author provides a detailed, step-by-step guide with screenshots and code examples. Even with recent changes to the Facebook Graph API, the author’s basic approach should still be valid.

Conkwright, William. “How to Get Public Data from Facebook with PHP.” Will Conkwright, June 14, 2017. https://www.willconkwright.com/how-to-get-public-data-from-facebook-with-php/.

Conkwright provides a guide to accessing Facebook API data using PHP, with an example of getting a “talking about” count for locations around Raleigh, North Carolina. First he shows how to use the Facebook API Explorer to generate queries. He explains the specifics of the query string with screenshots and breaks down the query URL to show its parameters. He then shows how to retrieve Facebook data using a custom PHP function, and links to a GitHub gist containing the PHP code snippet.
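
You can break a query URL into its parameters, as Conkwright does, with Python’s `urllib.parse` instead of PHP. The query below is hypothetical. The API version, the place-search parameters (`type`, `center`, `distance`), the approximate Raleigh coordinates, and the `talking_about_count` field are illustrative assumptions, not copied from his post.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical place-search query; version, coordinates, distance, and fields
# are illustrative assumptions, not taken from Conkwright's post.
QUERY = (
    "https://graph.facebook.com/v2.9/search"
    "?type=place&center=35.7796,-78.6382&distance=1000"
    "&fields=name,talking_about_count&access_token=TOKEN"
)


def breakdown(url: str) -> dict:
    """Split a query URL into host, path, and a flat dict of its parameters."""
    parts = urlsplit(url)
    # parse_qs returns lists of values; keep the first value for each key.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {"host": parts.netloc, "path": parts.path, "params": params}


if __name__ == "__main__":
    info = breakdown(QUERY)
    print(info["host"] + info["path"])
    for key, value in info["params"].items():
        print(f"{key} = {value}")
```

Listing the parameters this way makes it easier to see which piece of the URL controls what before writing the retrieval function.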

Computational Linguistics Research Group. “Pattern: Web Mining Module for Python, with Tools for Scraping, Natural Language Processing, Machine Learning, Network Analysis and Visualization.” Last commit 2017. https://github.com/clips/pattern.

This resource is a repo on the GitHub account of the Computational Linguistics Research Group at the University of Antwerp. Pattern is a Python module with tools for data mining, Natural Language Processing, machine learning, and network analysis. It supports a variety of methods for extracting syntactic, semantic, and sentiment information, including n-gram search, clustering, and SVM. Pattern appears well documented and includes bundled examples. The main branch supports Python 2.7, but a Python 3 version is available in the development branch. The documentation https://www.clips.uantwerpen.be/pages/pattern includes code examples and several case studies.
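
Pattern’s own n-gram and search functions are not reproduced here. As a plain-Python illustration of the concept only (not Pattern’s API), an n-gram is a run of n consecutive tokens, which can be extracted and counted like this. The crude regex tokenizer is an assumption and far simpler than Pattern’s parser.

```python
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Very rough lowercase word tokenizer (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens: list, n: int) -> list:
    """Return all runs of n consecutive tokens as tuples (empty if too few tokens)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


if __name__ == "__main__":
    text = "Social media data mining and social media analysis"
    bigrams = Counter(ngrams(tokenize(text), 2))
    print(bigrams.most_common(1))  # [(('social', 'media'), 2)]
```

A library like Pattern adds tagging, pattern matching, and much more on top of this basic idea.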

GW Libraries. “Social Feed Manager.” Social Feed Manager, https://gwu-libraries.github.io/sfm-ui/. Accessed 10 May 2018.

This site from George Washington University Libraries offers code, documentation, and how-to articles related to Social Feed Manager, an open source project that harvests social media data from a variety of sources. The project also maintains extensive documentation on Read the Docs at http://sfm.readthedocs.io/en/latest/.

Routley, Nick. “The Multi-Billion Dollar Industry That Makes Its Living From Your Data.” Visual Capitalist, Apr. 14, 2018. http://www.visualcapitalist.com/personal-data-ecosystem/.

Finally, here’s a fun little guide for consumers on how big tech companies and data aggregators mine and monetize our personal information. The article serves as a reminder that Facebook is only one company in a large industry that consumes data and excretes all the products of contemporary marketing and financial management. It covers the personal digital profiles compiled by data brokers like Acxiom and Experian, and suggests ways consumers can limit the exposure of their data.

News Sources

As if it wasn’t bad enough that I’m annotating YouTube videos and GitHub repos, here are some recent works of journalism that provide a social and political framework for what the technology now affords. I’ll let the titles speak for themselves. Don’t get too depressed!