Introduction to Feminism Data Science

This essay is based out of Santa Fe Institute’s Complexity Explorer’s Course offering on Humanities Analytics.

The following are abridged and/shorten summaries and notes of a lecture by Lauren F. Klein (Associate Professor of English and Quantitative Theory and Methods at Emory University) based on her book “Data Feminism” coauthored with Catherine D’Ignazio (Assistant Professor of Urban Science and Planning, MIT)

Get the book here (click on the pic):

Data Feminism (Strong Ideas)

The real POWER in the 21st century is DATA (and information): from deciding on granting or not granting bail to any accused (read ProPublica’s assessment of Machine Bias pre-trial risk algorithm), data-driven algorithms flagging parents suspected of child abuse often unfairly leading to grave consequences (read Virginia Eubanks on how such systems fail poor families), a more recent fiasco on using data-based algorithm to predict A-level scores in UK due to exams getting cancelled due to Covid (read Guardian’s A-Level Result fiasco), data-driven systems are implemented in rather crucial stages of courts, governance, administration, education amongst others often with serious consequences for the parties involved).

However this power is unilaterally and inequitably concentrated in a very small well-resourced institutions of the global elite (mostly white men). This is where INTERSECTIONAL FEMINISM can help. Intersectional Feminism is concerned with the structures of power, and when applied to data sciences can challenge this asymmetric concentration of power.

Operationalizing (or quantifying) attempts to merge Feminist ideologies with Data Science with an intention to provide a clear set of principles working in the sector or even if not, resulted in the following SEVEN PRINCIPLES OF DATA FEMINISM which can be used as a defining set of guidelines:

  1. Examine Power
  2. Challenge Power
  3. Rethink binaries and hierarchies
  4. Elevate emotion and embodiment
  5. Embrace pluralism
  6. Consider context
  7. Make labor visible

Invisible Labor as a feminist concept

This field of feminist labor studies draws its roots from the most common example invisible labor i.e. household. In a capitalist society, household work is
a) invisible (takes place inside homes and away from purview of labor markets)
b) deemed unproductive (as it doesn’t contribute to revenue generation).
However a new broader term of Reproductive Labor should be used while addressing housework and childcare, as it actually aides and allows the productive faction of the economy to thrive.

Principle: Make Labor Visible

The Abolitionist Movement (to end slavery) is a notable struggle of the American and World History. This section deals with the internal dynamics of the movement. The movement saw a coalition of multi-racial men and women in an otherwise fairly segmented society. Everyone was in agreement with the end (to end slavery) but the means to this were highly debated: whether a radical approach was favorable over a gradual one, whether it was wise to move to the centre to gather mobilization or was there no room for moderation.

The debates documented through newspapers of the era which often mimicked each other to show agreement, resemble a lot of the contemporary debates of today like the need for a police, the movement for racial equality amongst others. The digitized versions of newspapers document the abolitionist movement for the aid of current scholars.
One of the important things to ask about is what exactly are these newspapers talking about, what are the thematic and statistical differences in their approach with respect to the different editors or authors and the intended audiences.
The following paper by Lauren F. Klein borrows ideas from topic modelling and information theory to explain this: Dimensions of Scale.

Modelling Invisible Editorial Labor

Analysing the works of female editors Lydia Maria Child (1802-1880) and Mary Ann Shad (1823-1893) using pointwise mutual information (PMI) to identify uniquely significant topics for each editor, drawing out significance of each topic for the abolitionist movement and analyse the burdens of labor intersectionally i.e. how the larger institutions of power exacted different costs from both editors based on their racial background, Child being a white and Shad being a black. White abolitionist also had better access to resources like print media while the black community mainly depended on oral propagation of ideas, an example of which is the Color Conventions that used to take place. This led to a biased data set being available to today’s contemporary scholars researching on the abolitionist movement, and hence a lot of research is being directed towards analysing what is within these datasets rather than what is not.

Significant topics while Child was editor (Screenshots taken exactly as is from lecture video)
Image: Significant topics while Shad was editor (Screenshots taken exactly as is from lecture video)

A view into this unbalanced archive is provided in this paper Abolitionist Networks which develops a model of linguistic and semantic change and leadership, building on a semantic leadership score system to see which newspapers tended to be leaders in semantic leadership and which newspapers tended to be followers. The paper tends to contribute on these fronts:
HUMANISTIC
a) leadership dynamics of the movement
b) evidences from the biased dataset
TECHNICAL
a) model of semantic change
b) measure of semantic leadership
c) network analysis of leader-follower pairs.

Modelling Semantic Change

In studies to model semantic change, machine algorithms are trained to analyse certain keywords all from contextual, source and time (or period) perspective and map its evolution on the the usage front. Following screenshot from the lecture highlights modelling semantic change for justice and tracing its evolution on the grounds of context, source and time.

Measuring Semantic Leadership

Word embedding (where words and/or phrases are mapped onto vectors of real numbers, to correlate the similarities between the vectors with the words’ semantic similarity) and find out how a base embedding predicts a new embedding and the new embedding subsequently replaces the older (which is the case for Leadership).

Image: Measuring semantic leadership (Screenshot taken as is from lecture video)

Source for cover image: https://datafeminism.io/blog/book/data-feminism-reading-group/