The series ran daily on the front page of The Chicago Times in 1888. Helen Cusack-Carvalho posed as "Nell Nelson", a poor working girl, and reported in depth on the living conditions, wages, and work environments of industrialized 19th-century Chicago. The early publication of the series makes "Nelson" a pioneer of investigative reporting, with ground-breaking work exposing the mistreatment of women and children.
The success of her series is remarkable considering how sharply her exposé inverted the social norms of the time. Through satirical and witty writing she questioned the practices of the most prestigious men of Chicago, and she overcame the adversity of being a woman in the 19th century: the series gained the attention of Chicago's political representatives and the owners of the companies scrutinized.
After the initial publication of the series, two books were printed within the same year recognizing "Nelson" and her investigative work. In addition, Nelson was hired by The World to produce a similar series in New York. Although the significance of the series was recognized at the time of publication, Nelson's role in exposing the hardships of the "City Slave Girls" lacks proper representation in modern history. The series should be a monumental resource for women's history, labor history, and the history of undercover journalism.
Biographical information on Helen Cusack-Carvalho is elusive; nothing close to a complete biography of her currently exists. The primary source of biographical information is a journal article by Eric W. Liguori published in 2012. In this article, Liguori writes that Helen's career as a journalist allowed her to maintain the middle-class status she was born into. Liguori reports that in 1895 Helen married above her social class, to a man named Solomon Solis Carvalho. After her marriage, Helen's journalism career ended, and she spent the remainder of her life raising their two daughters. According to a 1945 obituary in the New York Times, Helen died in New Jersey. The obituary gives no special recognition to her investigative series, despite its influence on the labor reforms of the late 19th century.
She is the project manager and senior editor. She manages the project's site development, GitHub page, CodeBook, and team. She also contributes to research development, data visualizations, transcriptions, and encoding.
Since joining the team in Spring 2016, she has led project transcriptions and GIS research. She also contributes to the encoding of transcriptions.
As co-editor of the project, he initiated the parts-of-speech analysis. He contributes to computational analysis, proofreading of the encoded transcriptions, and the creation of data visualizations.
Dr. Elisa Beshero-Bondar, Dr. David Birnbaum, Prof. Greg Bondar, The Pittsburgh Supercomputing Center and our Pitt-Greensburg student contributors: Shane Daube, Brooke Stewart, Alexander Mielnicki, Kari Womack, and Cody Karch.
Because Rebecca's research showed that the series and Nelson lacked a comprehensive digital presence, she proposed to Dr. Beshero-Bondar's Fall 2014 Digital Humanities course a digital project restoring Nelson and her series. Shane Daube, a classmate, agreed to join her project. The project's first tasks were to transcribe and OCR the three publications of the series and to develop a research question, using an XML tag set to analyze the series.
The analysis of the texts led the team to ask: How did the book publications alter the clearly disgusted image that Nelson portrayed in the original Chicago Times publication? Shane Daube also showed interest in analyzing the connotation of the three main character types within the series: the male personae, the female personae, and Nelson. He wanted to compare (using the minimally transcribed data in development) the three different voices and the variation of voice connotations within the original articles versus the article parts in The White Slaves of Free America text. Using XQuery and SVG, Shane developed a graph visualization that supported interesting conclusions about the alterations made between versions. He provided variation data that otherwise would have emerged only after a close reading of both sources.
After the first round of transcribing, markup, and research visualizations, complications with the subjectivity of the research became more apparent. In the spring 2015 semester, Rebecca also started to think about ways of making the project last and concluded that the markup had to be altered to conform to the TEI guidelines so that the project could talk to other long-lasting projects.
In the fall 2015 semester, the major focus was overhauling the markup of the articles. We worked to establish a system consistent with the TEI that was as objective as possible. When evaluating our code, we considered three major ideas: the quantitative measure of dialogue by gender and speaker, the relationship between possessive adjectives and their corresponding nouns, and Nell's distinct pool of vocabulary within her articles. We also changed the CSS, added SSIs, and established a Schematron schema. Along with these changes, we found a second home at Newtfire, within the Pittsburgh Supercomputing Center.
Our first step was bringing our system of coding in line with the Text Encoding Initiative (TEI). This was a major task; in working with the prior tagging system, we found that nearly every element needed to be replaced or renamed. On top of that, the initial markup was subjective by design. The prior code was intended to mark up points of connotation (such as sexual, sarcastic, positive, or negative). As a group, we discovered significant inconsistency in this interpretation of text from coder to coder. For example, what one person considered sarcastic, another considered sexual. This also raised the question of how much of the text needed to be wrapped. If a whole paragraph is sarcastic, should all of the sentences be wrapped? What if just one word seemed sarcastic? Our initial concept was to map these inconsistencies between coders, but we realized that this, too, would be costly in time, since it would force every individual within the project to read and mark up each article. After spending time bringing this inconsistent set of tags into the TEI, we made the ultimate decision to change the focus of our system. In doing so, we removed all tags representing connotations.
The elements that we kept were mostly structural, along with others focusing on dialogue: p (representing paragraphs), said @who @ana (representing quoted speech; @who represents the speaker; @ana represents the gender of the speaker), rs @type=interruption (representing narration that interrupts a single quotation), placeName @type (for any reference to a place; @type including address, locRef, country, state, or city), persName (for names), and orgName @ref (for references to companies that Nell was exposing; @ref pointing to the exposed company). For all of the different people and speakers within the articles, we created a list of 'archetypes': nelNelson, workingGirl, foreperson, employer, employee, benefactor, messenger, and unidentified. We also created TEI headers for all articles, which include bibliographical information, attribution for transcription and coding, and more.
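To illustrate how this markup supports a quantitative measure of dialogue by speaker and gender, here is a minimal Python sketch. It is not the project's own code (the project worked in XQuery); the sample paragraph is invented, and the @ana values '#male' and '#female' are assumed for illustration, since the exact gender values are not listed here.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample, not a transcription from the series.
# The ana values (#male, #female) are assumptions for this sketch.
SAMPLE = """<body xmlns="http://www.tei-c.org/ns/1.0">
  <p><said who="#foreperson" ana="#male">You may begin at once,</said> he told me.</p>
  <p><said who="#workingGirl" ana="#female">We are paid by the piece,</said> said the girl.
     <said who="#nelNelson" ana="#female">And how much is that?</said></p>
</body>"""


def count_speech(xml_text):
    """Count said elements and their words per (who, ana) pair."""
    root = ET.fromstring(xml_text)
    utterances = Counter()
    words = Counter()
    for said in root.iterfind(".//tei:said", TEI_NS):
        key = (said.get("who"), said.get("ana"))
        utterances[key] += 1
        # itertext() gathers all text inside said, including nested elements;
        # narration marked with rs is not separated out in this sketch.
        words[key] += len("".join(said.itertext()).split())
    return utterances, words


utterances, words = count_speech(SAMPLE)
for key in sorted(utterances):
    print(key, utterances[key], words[key])
```

Grouping on the pair of attributes keeps speaker and gender together, so totals can later be summed by either one.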
Our next goal was to find a more objective way to mark up Nelson's articles. Thanks to a suggestion from Prof. Greg Bondar, we began marking up grammatical elements, beginning with nouns and adjectives. This alone proved to be immensely time-consuming. It was also complicated: each team member had to be instructed on the parts of speech, and at times nouns can act as adjectives (clothing factory). Although we decided not to tag pronouns (he, she, it, etc.), we did tag possessive pronouns (my purse, his watch, their employees) and included an attribute to record who or what each one referred to (using our set of 'archetypes'). This is the tagging system: w @type @subtype @ana. 'w' represents both nouns and adjectives, distinguished by @type. @ana held the stem of the word for an adjective (laziest: type='adj' ana='lazy'), or the 'archetype' of a noun if applicable (seamstress type='noun' ana='#workingGirl'). @subtype was used for adjectives that were possessive, in which case @ana held the 'archetype' (his type='adj' subtype='poss' ana='#employer'). To keep adjectives tied to the nouns that they modified, the words were wrapped in a 'seg' element.
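The w and seg conventions described above can be read back out of the markup with a few lines of code. The following Python sketch is only an illustration, not the project's own tooling; it reuses the attribute examples from the paragraph above inside an invented sentence and pairs every adjective in a seg with every noun in the same seg.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Invented sentence; the attribute values follow the examples in the text.
SAMPLE = """<p xmlns="http://www.tei-c.org/ns/1.0">The
  <seg><w type="adj" ana="lazy">laziest</w> <w type="noun" ana="#workingGirl">seamstress</w></seg>
  admired
  <seg><w type="adj" subtype="poss" ana="#employer">his</w> <w type="noun">watch</w></seg>.
</p>"""


def modifier_pairs(xml_text):
    """Yield one record per adjective-noun pairing inside each seg."""
    root = ET.fromstring(xml_text)
    for seg in root.iter(TEI + "seg"):
        words = seg.findall(TEI + "w")
        nouns = [w for w in words if w.get("type") == "noun"]
        adjectives = [w for w in words if w.get("type") == "adj"]
        for adj in adjectives:
            for noun in nouns:
                yield {
                    "adjective": adj.text,
                    "noun": noun.text,
                    "possessive": adj.get("subtype") == "poss",
                    # Stem for ordinary adjectives, archetype for possessives.
                    "ana": adj.get("ana"),
                }


pairs = list(modifier_pairs(SAMPLE))
for pair in pairs:
    print(pair)
```

Because the seg boundary is the only thing tying a modifier to its noun, the sketch assumes each seg contains the adjectives for a single noun phrase.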
However, even this system required us to make some concessions. The most prominent issue is the use of our 'seg' element to bind nouns and adjectives. One adjective is fine (lazy worker). Nouns with two or more adjectives also work with this system, as long as all of the adjectives modify the same noun (dumb lazy incompetent worker). Things become complicated when a noun is modified by an adjective that is itself modified by another adjective. Here is an example of a considerably complex case: 'pine board three feet long and sixteen inches wide'. Here, the base noun is board. Pine is a simple adjective modifying board. The issue comes when we consider the other modifiers: three feet long is meant to define the length of the board, but it is not easy to convey this through tags. 'Three' modifies 'feet', which modifies 'long', which modifies 'board'. 'Long' can be marked as an adjective that modifies 'board', but that implies that the board itself is long (which is not the correct implication). Labeling 'three' as an adjective is correct, but with our system, it would seem to modify 'board' instead. Ultimately, because we did not want to become tangled in 'seg' elements, we decided to 'ignore' these instances (three feet long @type='adj' @ana='long'). While this is still imprecise, with the exact measurement and unit simply ignored, the idea that the property of length is discussed remains intact.
This example also brings up the issue of units as adjectives: times, dates, ages, numerals, and prices. We created a system of rules to account for each case. For numerals of quantity or age, the @ana is the number represented in digits (thirty five @type='adj' @ana='35'). Dates, such as July 13th or thirteenth, are ignored. Prices and monetary values are handled as follows: for $5, $5.00, or five dollars, the number is represented as the attribute, like a quantity, and the noun is either the word or the symbol for dollar. Decimal values are allowed only if the original number includes them ($1.23 @ana='1.23'; $5.60 @ana='5.6'). The number is never converted to another value and depends on the unit that Nelson uses within the article; for example, cents should never be converted into dollars by the coder (63 cents is represented differently than $0.63). For time, instances like 6:00 and 6 o'clock are ignored; however, 6 hours past noon would be represented with a 'seg' around 6 hours, with 6 as the adjective and hours as the noun. Noon is tagged outside of that 'seg', as its own individual 'w' element.
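As a rough illustration of the numeral and price rules above, the following Python sketch derives an @ana value from a few input forms. The function names are hypothetical, the number-word table only covers values below one hundred, and treating $5.00 as '5' is our reading of the rule rather than something the rule states outright.

```python
from decimal import Decimal

# Hypothetical helper tables; only numbers below one hundred are covered.
UNITS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11,
    "twelve": 12, "thirteen": 13, "fourteen": 14, "fifteen": 15,
    "sixteen": 16, "seventeen": 17, "eighteen": 18, "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def ana_for_digits(text):
    """'5.60' -> '5.6', '1.23' -> '1.23'; trailing zeros are dropped."""
    value = Decimal(text.replace(",", ""))
    # format(..., 'f') avoids exponent notation for values such as 50.
    return format(value.normalize(), "f")


def ana_for_words(text):
    """'thirty five' -> '35' (simple sums of tens and units only)."""
    total = 0
    for token in text.lower().replace("-", " ").split():
        if token in UNITS:
            total += UNITS[token]
        elif token in TENS:
            total += TENS[token]
        else:
            raise ValueError(f"unsupported number word: {token}")
    return str(total)


def price_parts(text):
    """Split a price into (ana, noun) without converting between units."""
    text = text.strip()
    if text.startswith("$"):
        return ana_for_digits(text[1:]), "$"
    number, unit = text.rsplit(" ", 1)  # e.g. "63 cents", "five dollars"
    if number[0].isdigit():
        return ana_for_digits(number), unit
    return ana_for_words(number), unit


for example in ["$1.23", "$5.60", "five dollars", "63 cents", "$0.63"]:
    print(example, price_parts(example))
```

Keeping the unit as written means that 63 cents and $0.63 yield different pairs, which matches the rule that the coder never converts between units.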
With these rules and the tagging system established, we created two more sets of data visualizations. The first was a list that pulls all of the adjectives from the entire collection of articles and outputs a count for every adjective that appears more than twice. It displays the interesting choices of language that Nelson makes in her descriptions. The second is a table, for each individual article, of possessive nouns and pronouns and the nouns associated with them. Using our system of 'archetypes', we can see who or what these nouns and adjectives are associated with.
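The adjective list can be approximated in a few lines. This Python sketch is a stand-in for illustration, not the project's actual query; the two sample fragments are invented, and counting lower-cased surface forms (rather than @ana stems) is an assumption.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI = "{http://www.tei-c.org/ns/1.0}"
NS = 'xmlns="http://www.tei-c.org/ns/1.0"'

# Two invented fragments standing in for the collection of articles.
ARTICLES = [
    f'<p {NS}><w type="adj">dirty</w> <w type="noun">floor</w>, '
    f'<w type="adj">Dirty</w> <w type="noun">windows</w>, '
    f'<w type="adj">little</w> <w type="noun">light</w></p>',
    f'<p {NS}><w type="adj">dirty</w> <w type="noun">towels</w> '
    f'<w type="adj">little</w> <w type="noun">pay</w></p>',
]


def frequent_adjectives(documents, minimum=3):
    """Return (adjective, count) for adjectives seen at least `minimum` times."""
    counts = Counter()
    for xml_text in documents:
        root = ET.fromstring(xml_text)
        for w in root.iter(TEI + "w"):
            if w.get("type") == "adj" and w.text:
                counts[w.text.strip().lower()] += 1
    # "More than twice" means a count of three or higher.
    return [(word, n) for word, n in counts.most_common() if n >= minimum]


result = frequent_adjectives(ARTICLES)
print(result)
```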
We further honed our tagging system and completed a formal Codebook. For grammatical markup, we decided to focus on possessive nouns and adjectives and their associated nouns. In doing so, we removed tagging for verbs and for other nouns and adjectives. We also created a system for versioning: a system of tagging that records where parts of the article were amended between publications. For more information, visit the codebook (linked above). New data visualizations were created using our grammatical markup. By feeding nouns and possessive adjectives/nouns into Cytoscape, we produced directional network analysis graphs that show possessive relationships. A basic reading view was also created for the Barkley publication.
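One way to prepare possessive relationships for a network tool is to write them out as a delimited table with source and target columns, which is among the inputs Cytoscape can import. The Python sketch below assumes that approach; the pairs and the column names are hypothetical, and the project's actual export may have looked different.

```python
import csv
import io
from collections import Counter

# Hypothetical (possessor archetype, possessed noun) pairs.
PAIRS = [
    ("#employer", "watch"),
    ("#employer", "employees"),
    ("#nelNelson", "purse"),
    ("#employer", "watch"),
]


def edge_table(pairs):
    """Return CSV text with one weighted, directed edge per distinct pair."""
    weights = Counter(pairs)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["source", "target", "weight"])
    # The edge runs from the possessor to the thing possessed.
    for (possessor, noun), weight in sorted(weights.items()):
        writer.writerow([possessor, noun, weight])
    return buffer.getvalue()


table = edge_table(PAIRS)
print(table)
```

Repeated pairs are collapsed into a single edge with a weight, so the graph can show how often a relationship occurs rather than drawing duplicate edges.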
In the summer of 2016, the project was accepted to the ILiADS conference. At this conference the team redeveloped the website using Bootstrap and the suggestions of conference experts. Progress was also made on digitally reproducing a map of the locations Nelson visits in the articles.
As this project grows, the editors would like the opportunity to work with the originals of the newspaper articles. Before the archived material is completed, we hope to re-photograph the articles and render better images of them to display on this site alongside the transcriptions. Work left to be done includes finishing transcriptions and coding of all three sources, creating a map that visualizes all the places referenced in the articles, and generating new graphs and visualizations that further analyze the text.