We've reached the final annotation in our series on "Social Media Data Collection, Processing, and Use in Research, Marketing, and Political Communication." Toward the end of the project my research drifted from traditional academic sources to investigative journalism. Now we veer further off-track into blog posts and GitHub repos, some videos, a course syllabus on Data Science for Social Systems, and tools, documentation, and related sources that don't fit neatly into any particular box. This isn't so much an annotation as a grab bag of annotated links. I apologize in advance.
The phrase "sentiment analysis" ranks high on the list of search terms for anyone trying to understand how to process social media data. It's a component of Natural Language Processing (NLP), in which a machine extracts (somewhat) accurate meaning from human language and text. This seems really hard, though I could be wrong, since the whole AI field, NLP included, seems to be moving forward fast once again. Here's an annotation of a journal article that provides a decent overview.
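To make the idea a little more concrete, here is a deliberately tiny lexicon-based sentiment scorer in Python. It illustrates one of the simplest approaches to sentiment analysis and is not the method of the article annotated here. The word scores, the negator list, and the one-word negation rule are made-up assumptions for illustration; published lexicons such as AFINN or VADER score thousands of terms and handle many more cases.

```python
import re

# A tiny, hypothetical sentiment lexicon for illustration only.
# The scores below are invented; real lexicons are far larger.
LEXICON = {
    "love": 3, "great": 3, "good": 2, "happy": 2,
    "bad": -2, "sad": -2, "terrible": -3, "hate": -3,
}

# Hypothetical negators: each flips the polarity of the very next token only.
NEGATORS = {"not", "no", "never"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Sum lexicon scores, flipping the sign of a word right after a negator."""
    score = 0
    negate = False
    for token in tokenize(text):
        if token in NEGATORS:
            negate = True
            continue
        value = LEXICON.get(token, 0)
        score += -value if negate else value
        negate = False  # negation only reaches one token forward
    return score


def label(text):
    """Map the numeric score onto a coarse polarity label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for post in ["I love this show!", "Not good at all.", "The bus is late."]:
        print(post, "->", sentiment_score(post), label(post))
```

"Not good at all." comes out negative only because of the crude one-word negation rule; a phrase like "not at all good" would slip past it, which hints at why the real problem is so hard.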
I met Ian Brooks and his family through the theatre program at Champaign's Central High School, where our kids participated in numerous performances and the whole wonderful high school theatre thing. I knew he was doing research on the use of social media to inform public health interventions, as a faculty member of the University of Illinois iSchool and an associate of the National Center for Supercomputing Applications. So when I took on this project he was the first person I wanted to consult for insight.
While mining the Information Science Virtual Library for academic papers on "social media" and "data mining," I came across Matthew Russell's O'Reilly book Mining the Social Web: Data Mining Facebook, Twitter, LinkedIn, Google+, GitHub, and More. The 2nd edition was published in October 2013, with a 3rd edition scheduled for publication next month. Because the book covers exactly the social media data mining and analysis techniques I'm after, I decided to pull the trigger and buy it right now. The book is basically a tutorial on data mining social media sites using Python. Alas, all the source code it references is in Python 2.7 and I've been working with version 3.6, but that's fine. It also covers using IPython Notebooks and even begins with a guide to setting things up on a virtual server. I'll probably wait to actually do that until I see what's new in the 3rd edition. But the book definitely makes the final cut for my annotated bibliography. With that as a given, I thought it would be useful to get started with the first annotation.
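The book's own examples aren't reproduced here. As a rough idea of the kind of frequency analysis such tutorials tend to start with, here is a minimal Python 3 sketch that counts hashtags across a few posts. The sample posts and the helper names (`extract_hashtags`, `hashtag_counts`) are hypothetical, and splitting on whitespace is a simplification: a hashtag followed by punctuation would keep the punctuation attached.

```python
from collections import Counter

# Hypothetical sample posts; a real pipeline would pull these from a
# platform API, and each service returns its own response format.
posts = [
    "Reading #MiningTheSocialWeb with #Python tonight",
    "Porting old #python examples from 2.7 to 3.6",
    "#DataScience meetup recap #Python",
]


def extract_hashtags(text):
    """Return the hashtags in a post, lowercased so #Python and #python match."""
    return [word[1:].lower() for word in text.split()
            if word.startswith("#") and len(word) > 1]


def hashtag_counts(texts):
    """Count how often each hashtag appears across a collection of posts."""
    counts = Counter()
    for text in texts:
        counts.update(extract_hashtags(text))
    return counts


if __name__ == "__main__":
    # In Python 3 print() is a function; the 2.7 statement form
    # `print tag, n` would be a SyntaxError here.
    for tag, n in hashtag_counts(posts).most_common(3):
        print(tag, n)
```

`Counter.most_common` does the sorting, so the same few lines work whether there are three posts or three hundred thousand.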
I've been learning Python since January, and it's creating new problems. For one, I suddenly want to do things with Python. I want to write a program that processes titles and filenames of media archive records from an Excel spreadsheet and finds the matching media files stored on a network drive. I need to read a few thousand PBCore XML records and convert them to JSON. I want to take the JSON output from Google Speech-to-Text transcripts and convert it to WebVTT files. But I can take on just one project right now, and here it is.
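To give a sense of what the last of those ideas involves, here is a hypothetical Python 3 sketch of a Speech-to-Text-to-WebVTT converter. It is only an illustration, not necessarily the project this post goes on to tackle. It assumes the JSON was saved from a Speech-to-Text v1 REST `recognize` response with word time offsets enabled, so each result's first alternative has a `transcript` and a `words` list whose `startTime`/`endTime` values are duration strings such as "1.300s". Other API versions and the client libraries shape their output differently, so the field names should be checked against real output. The function names are made up for this sketch.

```python
import json


def parse_duration(value):
    """Convert an assumed duration string such as '1.300s' to float seconds."""
    return float(value.rstrip("s"))


def format_timestamp(seconds):
    """Format seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def transcript_to_webvtt(response):
    """Build WebVTT text with one cue per recognition result (an assumed layout)."""
    lines = ["WEBVTT", ""]
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        best = alternatives[0]  # take the top-ranked alternative
        words = best.get("words", [])
        if not words:
            continue  # no word timings, so the cue can't be placed
        start = parse_duration(words[0]["startTime"])
        end = parse_duration(words[-1]["endTime"])
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(best["transcript"].strip())
        lines.append("")  # blank line ends the cue
    return "\n".join(lines)


if __name__ == "__main__":
    sample = json.loads("""
    {"results": [{"alternatives": [{
        "transcript": "hello and welcome",
        "words": [
            {"startTime": "0.500s", "endTime": "0.900s", "word": "hello"},
            {"startTime": "0.900s", "endTime": "1.100s", "word": "and"},
            {"startTime": "1.100s", "endTime": "1.700s", "word": "welcome"}
        ]}]}]}
    """)
    print(transcript_to_webvtt(sample))
```

One cue per result keeps the sketch short, but a long result would become an unreadably long caption; a real converter would probably split cues every dozen words or so, using the per-word timings already in the `words` list.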