Critical data modeling and the basic representation model – Annotation & Notes

Chart showing Race disparities in US criminal justice system, late 2010s

Data models are foundational to information processing, and in the digital world they stand in for the real world. When machines are used to make algorithmically informed decisions, their algorithms are shaped by the data models they use. The data structured by those models is necessarily numerical, since machines perform logical operations, not creative interpretations. It follows that the data used in machine operations are machine-language translations of real-world phenomena, expressed in a data model designed for efficient processing. It should not be surprising, then, that as information systems increasingly make decisions that affect people and communities, their operations are in a very direct sense an extension of the messy human world. The result has been information systems that reflect human racism, sexism, and many other -isms, with real-world harm to individuals and communities. But given the black-box nature of “machine learning” algorithms, how do we know what happens inside the black box? How can we document machine bias so as to design algorithms that don’t perpetuate social harms?

Annotation – A Bundle of Open Source Resources for Social Media Data Mining and Analysis

We've reached the final annotation in our series on "Social Media Data Collection, Processing, and Use in Research, Marketing, and Political Communication." Toward the end of the project my research drifted from traditional academic sources to investigative journalism. We now veer further off-track into blog posts and GitHub repos, along with some videos and a course syllabus on Data Science for Social Systems: tools, documentation, and related sources that don't fit neatly into any particular box. This isn't so much an annotation as a grab bag of annotated links. I apologize in advance.

Annotation: Mining the Social Web

Chart of connections among Twitter users

While mining the Information Science Virtual Library for academic papers on "social media" and "data mining," I came across Matthew Russell's O'Reilly book Mining the Social Web: Data Mining Facebook, Twitter, LinkedIn, Google+, GitHub, and More. The 2nd edition was published in October 2013, with a 3rd edition scheduled for publication next month. Because the book covers the specific techniques I'm after for mining and analyzing social media data, I decided to pull the trigger and buy it right now.

The book is basically a tutorial on data mining social media sites using Python. Alas, all the source code it references is in Python 2.7, and I've been working with version 3.6, but that's fine. It also covers using IPython Notebooks and even begins with a guide to setting things up on a virtual server. I'll probably wait to actually do that until I see what's new in the 3rd edition. Either way, the book definitely makes the final cut for my annotated bibliography. With that as a given, I thought it would be useful to get started with the first annotation.