newtFire {dh}
Creative Commons License. Last modified: Sunday, 27-Feb-2022 02:29:10 UTC. Maintained by: Elisa E. Beshero-Bondar (eeb4 at psu.edu). Powered by firebellies.

For our first XQuery exercise, we’ll be working with a special collection of Shakespeare’s plays, coded in TEI, that are part of our eXist XML database. Because the XML elements in this collection are in the TEI namespace, we need to begin by declaring TEI as our default element namespace (otherwise we will be unable to access the element nodes in the collection). Open eXide and a new XQuery window, and paste in the following line, up to and including the semicolon, to establish that we are working in the TEI namespace:

declare default element namespace "http://www.tei-c.org/ns/1.0";

You can then access this collection:

collection('/db/apps/shakespeare/data/')
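To see why the namespace declaration matters, here is a minimal, self-contained sketch that does not need the database. It builds one small, hypothetical TEI-namespaced element in memory and counts its title elements three ways. Instead of a default element namespace it binds an explicit prefix, tei (the prefix name is our own choice; only the namespace URI has to match), which is an alternative way of reaching TEI elements.

```xquery
xquery version "3.1";

(: No default element namespace is declared here, only a prefix bound to the TEI URI. :)
declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: A tiny, hypothetical TEI fragment. The xmlns attribute puts these elements in the
   TEI namespace, but it only applies inside the constructor, not to the paths below. :)
let $doc :=
  <TEI xmlns="http://www.tei-c.org/ns/1.0">
    <teiHeader><fileDesc><titleStmt><title>Sample Play</title></titleStmt></fileDesc></teiHeader>
  </TEI>
return (
  count($doc//title),     (: 0: an unprefixed name here means a title in no namespace :)
  count($doc//tei:title), (: 1: the prefixed name matches the TEI title :)
  count($doc//*:title)    (: 1: the wildcard matches a title in any namespace :)
)
```

The first count is 0 for the same reason an undeclared query over the Shakespeare collection finds nothing: the elements are there, but an unprefixed path step is looking in the wrong namespace.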

If your connection to our eXist-db doesn’t work, you can work with the same collection stored in our textAnalysis-Hub under Class-Examples > XQuery > shakespeare > data. Copy this directory to your own computer, and then either load it into a local installation of eXist-db or work with it in oXygen as a file directory. If you work in oXygen, open a new XQuery document for your expressions and save it somewhere outside the data directory: you will need to work out how to address the collection as a local file path on your computer, relative to where you saved your XQuery document. Work in the XQuery Debugger view, which you can toggle with the small XQuery button in the top right-hand corner of oXygen. Save your homework file with the .xquery extension.
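If you work in oXygen, collection() can take a path relative to your saved query file. The sketch below shows one possible setup, not the only one. It assumes your query file is saved in a folder that also holds your copy of the shakespeare/data directory (the file and folder layout in the comment is hypothetical), and it uses ?select=*.xml, a Saxon-specific option, to pick up only the XML files in that directory. Saxon is the engine behind oXygen's XQuery Debugger.

```xquery
xquery version "3.1";

declare default element namespace "http://www.tei-c.org/ns/1.0";

(: Assumed (hypothetical) layout; adjust the path to match your own folders:
     homework.xquery
     shakespeare/data/*.xml
   A relative URI is resolved against the location of the saved query file.
   "?select=*.xml" is a Saxon option that keeps only the .xml files. :)
let $plays := collection('shakespeare/data?select=*.xml')
return count($plays)
```

If the count comes back as the number of files you copied, the path is right and you can replace count($plays) with the expressions for the tasks below.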

As you work, refer to our XQuery tutorial page to look up how to access files in a collection and to see example queries. Write XQuery expressions for each of the following tasks in the eXide window of our eXist database, and test them by clicking the Eval button. Then paste your XQuery expressions into a markdown or text file, adding comments as needed. You will submit this markdown or text file on Canvas.

  1. Find the main titles of all 42 Shakespeare plays in the collection by stepping down the descendant axis from the collection. You will need to look at the TEI code of the collection first to see where the main titles are (hint: the play’s main title is coded near the top of the file, inside a special element called titleStmt). The simplest answer is a single XPath expression that starts with the collection() function and descends to the nodes you want. The output should look something like:
    1
    <title xmlns="http://www.tei-c.org/ns/1.0">Love's Labour's Lost</title>
    2
    <title xmlns="http://www.tei-c.org/ns/1.0">Macbeth</title>
    3
    <title xmlns="http://www.tei-c.org/ns/1.0">A Lover's Complaint</title>
    4
    <title xmlns="http://www.tei-c.org/ns/1.0">Pericles, Prince of Tyre</title>
    5
    <title xmlns="http://www.tei-c.org/ns/1.0">Cymbeline</title>
    6
    <title xmlns="http://www.tei-c.org/ns/1.0">Romeo and Juliet</title>
    7
    <title xmlns="http://www.tei-c.org/ns/1.0">All's Well That Ends Well</title>
    ...
                
  2. Modify your XPath above to return just the text of the titles, without the tags. You can do that by using text(), data(), or string(). Your output should look something like:
    1
    Love's Labour's Lost
    2
    Macbeth
    3
    A Lover's Complaint
    4
    Pericles, Prince of Tyre
    5
    Cymbeline
    6
    Romeo and Juliet
    7
    All's Well That Ends Well
                
  3. Write an XPath expression that isolates the root TEI element of each play. Notice that you can page through the results using the arrows at the top of the return window in eXide. We want to be able to isolate specific plays with interesting features, and to do that we will write filters on the root element of each one.
  4. Speeches are coded in the Shakespeare plays like this:
    <sp who="ID"><speaker>Name</speaker> text of the speech</sp>
    Write an expression using a predicate [ ] on the TEI element to locate the four plays that contain a speaker named Falstaff. Which plays are they? Record your XPath expression. Writing the predicate requires you to specify how to look down the tree from the TEI element, not from the top of the collection. Hint: this involves using a dot (.) or naming the descendant:: axis. (Notice the difference when you do not specify these things.)
  5. Modify your expression to return only the main titles of those four plays, and record your expression. Notice where the title elements are recorded in the document. Hint: you will need to modify your expression so that, instead of returning whole TEI elements, it steps down the tree and returns the element holding the main title of each play. (Note: in an XPath expression, you can step down the tree after your predicate filter on the TEI element.)
  6. Describe what changes in your results if you add text() or string() to your previous expression. (text() is a node test: it selects the text nodes that are direct children of an element. string() is an XPath function: it returns the string value of an element, which joins together all of its descendant text nodes.)
  7. Falstaff is one of the characters who turn up in multiple Shakespeare plays. So how often does Falstaff speak in the whole Shakespeare collection? Write a new XPath expression that returns all of the speeches spoken by Falstaff. Then modify your expression to use the count() function and return only the numerical count.
  8. XQuery FLWOR statement or XPath expression? Did you write your XQuery for finding all of Falstaff's speeches as one long XPath expression (reading from left to right, starting from collection())? Or did you write it as a FLWOR statement over multiple lines, storing information in variables and referring to them? (Review our tutorial for details and examples of FLWOR statements that use variables.) Whichever way you chose in the previous steps, try the other way and see if you can duplicate your results. Record your XPath and XQuery expressions in your text file.
  9. Optional challenge: Our last step, which we will explore together in class next time, is to return information about how often Falstaff speaks in each play. For this we will learn to write a FLWOR statement using a for loop that looks inside each play, finds Falstaff's speeches, counts them, and returns each count along with the play's title. (Feel free to try this yourself ahead of time, though it is not required for the homework.)
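The ideas behind tasks 4 through 8 (a predicate that looks down from each TEI element, text() versus string(), count(), and the XPath versus FLWOR ways of writing the same query) can be tried out on a tiny inline example before you turn to the real collection. This is a sketch under stated assumptions: the two "plays", their titles, and the speakers Ann and Ben are invented for illustration and say nothing about the Shakespeare files. To work on the real data you would replace $plays with a collection() call and change the names.

```xquery
xquery version "3.1";

declare default element namespace "http://www.tei-c.org/ns/1.0";

(: Two tiny, hypothetical TEI "plays". Because of the declaration above,
   these constructed elements are in the TEI namespace. :)
let $plays := (
  <TEI>
    <teiHeader><fileDesc><titleStmt><title>Toy Play <hi>One</hi></title></titleStmt></fileDesc></teiHeader>
    <text><body>
      <sp who="#Ann"><speaker>Ann</speaker><l>Good morrow.</l></sp>
      <sp who="#Ben"><speaker>Ben</speaker><l>And to you.</l></sp>
      <sp who="#Ann"><speaker>Ann</speaker><l>Farewell.</l></sp>
    </body></text>
  </TEI>,
  <TEI>
    <teiHeader><fileDesc><titleStmt><title>Toy Play Two</title></titleStmt></fileDesc></teiHeader>
    <text><body>
      <sp who="#Ben"><speaker>Ben</speaker><l>Alone again.</l></sp>
    </body></text>
  </TEI>
)
return (
  (: Predicate with a dot: keep only the plays in which Ann speaks,
     then step down to the title. [descendant::speaker = 'Ann'] works the same way. :)
  $plays[.//speaker = 'Ann']//titleStmt/title,
  (: text() selects only the title's own child text nodes: "Toy Play ".
     The word "One" sits inside <hi>, so it is not included. :)
  $plays[.//speaker = 'Ann']//titleStmt/title/text(),
  (: string() joins all descendant text nodes: "Toy Play One". :)
  $plays[.//speaker = 'Ann']//titleStmt/title/string(),
  (: One XPath expression wrapped in count(): 2. :)
  count($plays//sp[speaker = 'Ann']),
  (: The same count from a FLWOR statement: 2. :)
  count(
    for $sp in $plays//sp
    where $sp/speaker = 'Ann'
    return $sp
  )
)
```

The last two items return the same number, which is the kind of check task 8 asks for: the path expression and the FLWOR statement are two spellings of one query.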

When you have completed the assignment, copy and paste your expressions into a text file. Upload your text file containing your XPath and XQuery expressions to Canvas.