newtFire {dh}
Creative Commons License. Last modified: Friday, 05-Feb-2021 04:38:28 UTC. Maintained by: Elisa E. Beshero-Bondar (eeb4 at …). Powered by firebellies.

The Text:

For this assignment, work with the plain-text screenplay of the 1998 version of Mulan. Download the file and open it in <oXygen/>. For our purposes, keep everything in this document. There is some source information near the top that you should keep; plan to revise its tagging by hand at the end of the autotagging process.

Your Task:

Your goal is to use Find/Replace operations to prepare an XML-encoded digital edition of the screenplay. This time, the specific markup tags you use are up to you, but we expect to see specific structural distinctions marked in your XML, as described below.

Use Find and Replace operations with regular expression patterns to create descriptive (rather than presentational) XML markup. Your XML should identify, hold, and nest the structural units of the screenplay so that the markup distinguishes speaking parts, speakers, and stage directions. You should not use manual tagging except in situations that occur so rarely that there is really no point in using an autotagging solution. (For example, you do not need to use an autotagging strategy to tag the title of the whole screenplay or to create a root element for your XML: just do that manually.)
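To make the idea of regex-driven autotagging concrete, here is a minimal sketch in Python rather than in the <oXygen/> Find/Replace window. It rests on assumptions that are not confirmed by the assignment: the sample lines are invented, speeches are assumed to sit on one line in the form "Speaker: words", stage directions are assumed to be whole lines in parentheses, and the element names sp, speaker, and stage are only one possible choice. The replacement syntax in <oXygen/> may differ from Python's.

```python
import re

# Invented sample lines; the real screenplay's layout may differ.
sample = "Mulan: Quiet & demure.\n(She checks her notes)\nFa Zhou: Mulan!"

# Step 1: escape XML reserved characters before adding any tags,
# ampersand first so the entities we create are not escaped again.
escaped = sample.replace("&", "&amp;").replace("<", "&lt;")

# Step 2: wrap each line of the assumed form "Speaker: words" in <sp>,
# capturing the speaker name and the spoken words separately.
speech_pattern = re.compile(r"^([A-Z][\w .'-]*?):[ \t]*(.+)$", re.MULTILINE)
tagged = speech_pattern.sub(r"<sp><speaker>\1</speaker>\2</sp>", escaped)

# Step 3: wrap whole-line parentheticals as stage directions.
stage_pattern = re.compile(r"^\((.+?)\)$", re.MULTILINE)
tagged = stage_pattern.sub(r"<stage>\1</stage>", tagged)

print(tagged)
```

Each step corresponds to one Find/Replace pass: the parentheses in the pattern capture the pieces you want to keep, and the replacement wraps those pieces in descriptive tags.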

Here is our output XML file as a suggested model for your XML output: mulan1998.xml. Your output can differ from ours, as long as you find a way to mark off stage directions, speeches, and speakers. You will find stage directions both outside of speeches and inside them; mark them consistently wherever they appear. Every speech contains some indication of its speaker, but a speech may span multiple lines.
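The two complications named above (stage directions inside speeches, and speeches that run over several lines) can be sketched as follows. This is an illustration under stated assumptions, not our actual solution: the sample text is invented, stage directions are assumed to be parenthesized, a new speech is assumed to begin with a capitalized name followed by a colon, and the element names are hypothetical. The final step parses the result to confirm that the tags nest properly.

```python
import re
import xml.etree.ElementTree as ET

# Invented sample with an inline stage direction and a two-line speech.
sample = (
    "Mulan: Quiet and demure (she blows on the ink)\n"
    "graceful, polite.\n"
    "(A rooster crows)\n"
    "Fa Zhou: Mulan!"
)

# Pass 1: tag every parenthetical the same way, inside or outside a speech.
staged = re.sub(r"\(([^()]+)\)", r"<stage>\1</stage>", sample)

# Pass 2: a speech starts at "Speaker:" and runs (across line breaks) until
# the next speaker line, a line holding only a stage direction, or the end.
speech = re.compile(
    r"^([A-Z][\w .'-]*?):[ \t]*(.+?)"
    r"(?=\n(?:[A-Z][\w .'-]*?:|<stage>[^<\n]*</stage>$)|\Z)",
    re.MULTILINE | re.DOTALL,
)
tagged = speech.sub(r"<sp><speaker>\1</speaker>\2</sp>", staged)

# Wrap in a manually added root element and check well-formedness.
root = ET.fromstring("<screenplay>" + tagged + "</screenplay>")
print(tagged)
```

Tagging stage directions first keeps them consistent everywhere, and the lookahead decides where a speech ends without consuming the text that follows it. A continuation line that happens to begin with a capitalized word and a colon would be mistaken for a new speaker, so results like these need to be checked against the real text.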


Consult our Guide to Autotagging with Regular Expressions and notes from class on regular expressions as you work. The TEI provides some helpful guidelines for tagging the XML structural units of plays.