We've arrived at the inevitable point of imperfection. I started out looking for authoritative resources on how social media data is harvested, processed, and used in commercial and political communication campaigns, and I sure did find them. The problem is that this realm is changing so fast that many of these sources are already out of date. Regardless, this research project must come to an end on May 11th, when I turn in my IS 452 final project. My proposed annotated bibliography has become a series of marginal blog posts. But the fight continues.
So far I've been looking at how researchers analyze social media networks and perform tasks like opinion or sentiment analysis to understand how people feel and think about various subjects and entities. With this annotation we're looking at a thing called content marketing, where influential users of a social network are first identified, then used to spread messages through their network of influence. Even in a huge network, a little bit of leverage in the right places can move products, and perhaps elections.
This is an academic research project and for the most part I'm focusing on academic resources. But I'm working to understand the specific tools and methods for mining social media data in order to effectively intervene using communications campaigns. The annotation offered here adds to these concepts by introducing "community mining" and techniques for analyzing key players, roles, and strong subgroup connections of communication and influence within a larger social media network. These are key concepts for understanding how opinions within a network are formed, shared, and spread. Or as someone might have once said, it's about influencing a group by influencing the influencers.
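To make the idea of "key players" a bit more concrete, here is a minimal sketch in Python. It is not taken from the annotated source. It ranks the users in a toy network by degree centrality, meaning the share of other users each one is directly connected to. The edge list and user names are invented for illustration, and real community mining relies on much richer measures than this one.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return {user: fraction of the other users they are directly linked to}."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {user: 0.0 for user in neighbors}
    return {user: len(peers) / (n - 1) for user, peers in neighbors.items()}


def top_players(edges, k=3):
    """Return the k best-connected users, ties broken alphabetically."""
    scores = degree_centrality(edges)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


# Hypothetical "who talks to whom" pairs, invented for this example.
EDGES = [
    ("ana", "ben"),
    ("ana", "cy"),
    ("ana", "dee"),
    ("ben", "cy"),
    ("dee", "eli"),
]

if __name__ == "__main__":
    for user, score in top_players(EDGES, k=2):
        print(user, score)
```

In this toy network "ana" comes out on top because she is connected to three of the four other users. That is the kind of leverage point a campaign would look for, though degree alone says nothing about roles or subgroups.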
I had no sense of how much research had been done on sentiment analysis, opinion mining, and machine learning. I suppose major progress had to wait until there were massive datasets of text content available to process. The explosion of data on social media has provided both the impetus and the fodder to develop increasingly sophisticated techniques that are beginning to actually work. In this annotation I present an overview of the techniques and tools of comparative opinion mining. It's great to know what people feel and think about one thing. It's probably twice as great to know what they think about two things, like two different cars, guitars, or political candidates.
The phrase "sentiment analysis" is high on the list of search terms for anyone seeking to understand how to process social media data. It's a component of Natural Language Processing (NLP), where a machine extracts (somewhat) accurate meaning from human language and textual information. This seems really hard, though I may be wrong, since the whole AI field, NLP included, seems to be moving forward fast once again. Here's an annotation of a journal article that provides a decent overview.
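For a rough feel of what "extracting sentiment" can mean at its very simplest, here is a toy lexicon-based scorer in Python. It is a sketch of my own rather than a method from the article, and the word lists are made up for the example. Real systems use large curated lexicons or trained models.

```python
import re

# Tiny hand-made word lists, invented for this example.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "sad"}
NEGATORS = {"not", "never", "no"}


def sentiment_score(text):
    """Add +1 or -1 per lexicon word; flip the sign of a lexicon word
    that directly follows a negator such as "not"."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    negate = False
    for token in tokens:
        if token in NEGATORS:
            negate = True
            continue
        value = (token in POSITIVE) - (token in NEGATIVE)
        if value:
            score += -value if negate else value
        negate = False  # a negator only affects the very next word
    return score


def label(text):
    """Map the numeric score to a coarse label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(label("I love this guitar, it sounds great"))
    print(label("This car is not good"))
```

The "(somewhat)" matters: this scorer reads "not very good" as positive, because the negation only reaches the word right after it. Sarcasm, context, and comparisons are well beyond it.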
I met Ian Brooks and his family through the theatre program at Champaign's Central High School, where our kids participated in numerous performances and the whole wonderful high school theatre thing. I knew he was doing research in the use of social media to inform public health interventions, as a faculty member of the University of Illinois iSchool and associate of the National Center for Supercomputing Applications. So when I took on this project he was the first person I wanted to consult for insight.
While mining the Information Science Virtual Library for academic papers on "social media" and "data mining," I came across Matthew Russell's O'Reilly book Mining the Social Web: Data Mining Facebook, Twitter, LinkedIn, Google+, GitHub, and More. The 2nd edition was published in October 2013, with a 3rd edition scheduled for publication next month. Because the book covers the specific techniques I'm after concerning data mining and analysis of social media, I decided to pull the trigger and buy the book right now. The book is basically a tutorial on data mining social media sites using Python. Alas, all the source code it references is in Python 2.7 and I've been working with version 3.6, but that's fine. It also covers using IPython Notebooks and even begins with a guide to setting things up on a virtual server. I'll probably wait to actually do that until I see what's new in the 3rd edition. But the book definitely makes the final cut for my annotated bibliography. With that as a given, I thought it would be useful to get started with the first annotation.
I've been spending time learning Python since January, and it's creating new problems. For example, suddenly I want to do things with Python. I want to write a program to process titles and filenames of media archive records from an Excel spreadsheet, and find the matching media files, which are stored on a network drive. I need to read a few thousand PBCore XML records and convert them to JSON. I want to take the JSON output from Google Speech-to-Text transcripts and convert it to WebVTT files. But I can take on just one project right now, and here it is.