Critical data modeling and the basic representation model – Annotation & Notes

Reference: 

Wickett, Karen M. n.d. “Critical Data Modeling and the Basic Representation Model.” Journal of the Association for Information Science and Technology. Accessed August 22, 2023. https://doi.org/10.1002/asi.24745.

Data models are foundational to information processing, and in the digital world they stand in for the real world. When machines are used to make algorithmically informed decisions, their algorithms are informed by the data models they use. And the data structured by data models must be formally encoded, since machines perform logical operations, not creative interpretations. It follows that data used in machine operations are machine-readable translations of real-world phenomena, expressed in a data model designed for efficient processing.

But humans are involved in every step of the design of data systems, and every design decision is shaped by human assumptions. Humans judge which data are relevant to the envisioned machine operation, which data should be excluded, and how the data are represented. As Karen Wickett puts it:

“Within an information system, multiple levels of representation and encoding provide a path between our messy real world and the binary logic of computing. Information system design decisions may appear neutral and disconnected from the real world, especially when viewed in isolation from the wider sociotechnical context where they will play out for people and communities. However, these decisions are made by humans who are actively creating a lens on the world that structures the information within a system. The resulting system actions and data objects are therefore shaped by the worldviews of information system builders” (p.1). 

It should not be surprising, then, that as information systems increasingly make decisions that affect people and communities, their operations are in a very direct sense an extension of the messy human world. The result is information systems that reflect human racism, sexism, and other -isms, with real-world harm to individuals and communities.

We know this from observing the many documented examples of such harm [could insert many examples or citations here]. But given the black-box nature of “machine learning” algorithms, how do we know what happens inside the black box? How can we document machine bias so as to design algorithms that don’t perpetuate social harms? 

Karen Wickett introduces “critical data modeling” as an analytical method and demonstrates how everyday information objects can be critically analyzed using the Basic Representation Model. The model enables “technical close readings of information objects, where the lens of qualitative content analysis is applied to the layers of representation and meaning that instantiate digital objects” (p.6). Wickett then presents an example of using critical data modeling to analyze the selection of real-world events recorded as data in the Arrest Database maintained by the Los Angeles Police Department.

The Basic Representation Model provides distinct entry points for critical analysis by defining three types of things and three types of relationships between them in “machine learning” applications. Propositional content consists of assertions, that is, “propositions that may be true or false” (p.5). Propositions are semantic content, distinct from how they may be encoded or expressed in specific forms, e.g. a spreadsheet, an XML document, a JSON object, or even a narrative sentence. Wickett provides the example of a class roster comprising names, ID numbers, grades, etc. Each element can be regarded as a “fact” (whether true or false) that may have relevance outside the system where it is expressed, in whatever form.
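The distinction between propositional content and its encoding can be made concrete with a small sketch. The roster values below are invented for illustration (they are not from Wickett's paper); the point is that one and the same proposition can be expressed as a JSON object, a spreadsheet-style row, or a narrative sentence, while the semantic content stays constant:

```python
import json

# One proposition -- "the student with ID 1234 earned grade A" --
# held as semantic content, independent of any particular encoding.
# (All names and values here are hypothetical.)
proposition = {"student_id": "1234", "name": "A. Student", "grade": "A"}

# Three encodings of the same propositional content:
as_json = json.dumps(proposition)            # JSON object
as_csv_row = ",".join(proposition.values())  # spreadsheet-style row
as_sentence = (f"Student {proposition['name']} (ID {proposition['student_id']}) "
               f"received grade {proposition['grade']}.")  # narrative sentence

# The JSON encoding round-trips back to the same proposition,
# showing that the encoding and the content are separable.
assert json.loads(as_json) == proposition
```

Each encoding carries the same “fact,” which is why the proposition can remain relevant outside the system that happens to express it in one of these forms.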

In the case of the Arrest Database, the propositional content is a set of assertions about LAPD arrest events. The critical data modeling approach examines the selection of the recorded events. Do the recorded propositions fairly represent what happened? Do they leave out other relevant semantic content? Wickett’s critical analysis of the data takes up these questions directly.
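The selection problem can be illustrated with a hypothetical record. The field names below are invented for the sketch (they are not the actual LAPD Arrest Database schema); the point is that the fields a data model includes determine which propositions the system can express at all:

```python
# Hypothetical arrest-record schema (invented field names; not the
# actual LAPD schema). The chosen fields fix which assertions about
# the real-world event the system is able to record.
recorded = {
    "arrest_id": "A-0001",
    "date": "2020-06-01",
    "charge_code": "459",
    "area": "Central",
}

# Semantic content with no corresponding field is silently excluded:
# the data model simply has no place to express it.
unrepresentable = [
    "circumstances of the stop",
    "arrestee's account of events",
    "whether charges were later dropped",
]

for item in unrepresentable:
    assert item not in recorded  # no field exists for this content
```

This is the kind of design decision that, as Wickett argues, can look neutral in isolation but encodes a particular lens on the event: the record expresses the arresting institution's assertions and omits everything else.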

Wickett’s critical data model offers several other entry points for critical analysis of the data systems that increasingly impact our lives, best explored in the paper itself. It provides a methodology for making visible the hidden biases inside the black box, grounded in “the technical realities of computational systems and the social realities of our communities” (p.10).

As algorithmic systems are increasingly offered as technical solutions to messy human and social problems, it seems important to remember that the data used to train those systems reflect the same messy human and social problems. If a system uses messy data to focus police attention on specific communities, as in “predictive policing,” the messiness can perpetuate historical injustice.