newtFire {dh}
Creative Commons License Last modified: Monday, 31-Oct-2022 21:15:33 UTC. Maintained by: Elisa E. Beshero-Bondar (eeb4 at psu.edu). Powered by firebellies.

The Fall 2020 DIGIT 400 James Bond project team has prepared XML for the screenplay Goldeneye, which you can access by right-clicking on the file and downloading it from here: Goldeneye.xml. Open the file in oXygen and work with the XPath Window set to version 3.1. Respond to the XPath questions below in a text or markdown file, and upload it to Canvas for this assignment when you’re finished. (Please use an attachment! If you paste your answer into the text box, Canvas may munch the code formatting.) Some of these tasks are thought-provoking, and even difficult. If you get stuck, do the best you can, and if you can’t get a working answer, give the answers you tried and explain where they failed to get the results you wanted. Sometimes doing that will help you figure out what’s wrong, and even when it doesn’t, it will help us identify the difficult moments.

You should consult The XPath Functions We Use Most page and especially its section 4 on Strings. As always, consult our class notes and our introductory guide Follow the XPath!. Be sure to give the XPath expression you used in your answer, and don’t just report your results. This way, if the answer is incorrect, we can help explain what went wrong.


First of all, skim through the document to get a sense of how it is coded. To warm up and familiarize yourself with the file, write some XPath expressions to find all the scenes, stage directions, speeches, and speakers.
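
If it helps to picture the markup before opening the real file, here is a minimal Python sketch of the same warm-up on an invented miniature screenplay. The element names Scene, Heading, sd, sp, and spk come from this assignment; the root element name, the sample text, and the speaker names are assumptions made up for illustration and are not taken from Goldeneye.xml. Python's ElementTree supports only a small subset of XPath 1.0, so this is not a substitute for the XPath 3.1 expressions you will write in oXygen.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature screenplay: element names follow the assignment,
# but the root name and all content are invented for illustration.
SAMPLE = """
<screenplay>
  <Scene>
    <Heading>INT. CONTROL ROOM - NIGHT</Heading>
    <sd>A technician studies a bank of monitors.</sd>
    <sp><spk>ALPHA</spk> Status report.</sp>
    <sd>She turns from the screen.</sd>
    <sp><spk>BETA</spk> All systems are online.</sp>
  </Scene>
  <Scene>
    <Heading>EXT. HARBOUR - DAY</Heading>
    <sp><spk>ALPHA</spk> We are late.</sp>
  </Scene>
</screenplay>
"""

root = ET.fromstring(SAMPLE)

# ElementTree paths are relative to the element they are called on,
# so they begin with "." rather than a bare "//".
scenes = root.findall(".//Scene")
directions = root.findall(".//sd")
speeches = root.findall(".//sp")
speakers = root.findall(".//sp/spk")

# The tag name of the first element child of each scene.
first_children = [scene[0].tag for scene in scenes]

print(len(scenes), len(directions), len(speeches), len(speakers))
print(first_children)
```

In this toy sample the second scene has no stage directions, which is the kind of case one of the questions below asks you to find in the real script.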

  1. Let’s start by exploring the sd elements. These contain the stage directions.
    1. Write an XPath to return a count of all the sd elements in the document.
    2. What XPath expression returns all the stage directions that contain the word (or partial word) "Russian"? How many are there?
    3. There is usually a pretty important stage direction after a scene change. Each scene is contained in a Scene element, and in each Scene the first element child is a Heading element. How can you reliably find the first stage direction immediately following that Heading element? (Hints: Take this in stages: First look for all of the Heading elements. Notice how the first sd element is positioned in relation to a Heading element: they are children of the same parent. Our solution uses the following-sibling:: axis and a numerical position predicate to indicate the first in a sequence.)
    4. Of these stage directions that come immediately following Heading elements, we are interested in the ones that feature computers in the scene. How can you find out which ones contain the string "computer"? (Hint: add a predicate.)
    5. Some unusual scenes in the Goldeneye script contain no stage directions at all. Write an XPath expression to isolate any Scene elements without sd elements inside. How many of these scenes are there? (Hint: use a predicate with the not() function.)
  2. This set of questions explores what you can find out with the XPath string() function, which pulls text strings out of XML nodes, and the string-length() function, which measures the number of text characters in the XML node that you visit.
    1. This time, let’s work with the speeches in the screenplay, coded in sp elements. Write an XPath to locate all of the speeches (and notice how they are coded with a spk element inside). Now, use the simple map ! operator to apply the string() function to each sp element one by one. How is this return with string() different from just returning the sp elements? (Respond with your XPath expression, and a brief explanation of what you are seeing in the return window: How did the string() function change your results?)
    2. Change the previous XPath expression to remove the string() function, and instead, step to the text() node child of sp. How does this change the results in the return window? (Note: text() selects a node in the XML tree, so it is not a function but a path step from parent to child. Technically, the text() node is a child of its parent element.)
    3. Now that we have isolated the speeches in the screenplay, write an XPath expression that returns their string-length(). What does this return?
    4. Send those results to the max() function to find out the longest length of a speech in the Goldeneye script.
    5. The string-length() and max() functions took us off the XML tree to yield calculated results. How can we write XPath to return the XML element sp that has the maximum string-length()? Hint: Try searching for sp elements with a predicate that checks to see if the string-length() is equal to the maximum string-length you found in the previous step.
  3. Now we will turn our attention to the spk elements, to return information about the speakers.
    1. Notice how spk elements are nested as children inside the sp elements. Write an XPath expression to return all the speakers who deliver speeches that contain the word "Iraq". (Hint: Try breaking this down: first return all of the speeches that contain "Iraq" and then take a step to return the spk element.)
    2. All the spk elements are entered in block caps. Use the XPath lower-case() function to return all the spk elements lower-cased instead and record your expression. Hint: For this special function, you will need to refer to the self:: node using the dot like this: lower-case(.)
    3. We don’t really want to make the speakers' names all lower-case. We just want to lower-case the letters after the first one, to change BOND to Bond. We can do that kind of string-surgery in XPath by working with substrings. Consult this page to learn about the XPath substring() function and see how to write it out. Now, see if you can apply the substring() function to isolate the 2nd letter onward in the spk elements. Then, lower-case() that substring!
    4. Now, if you could apply the substring() to isolate letters 2 to the end, you should be able to change it to return only the very first letter. (This time, we do not want to apply the lower-case function, because we want to preserve the upper case of the first letter.) Try it and record your expression.
    5. One last challenge. If we can isolate part of the speakers' names to lower-case the 2nd letter to the end, we should be able to connect the first (capital) letter to the rest of the lower-cased letters. For this we want to use the XPath concat() function, and there is a convenient shorthand for it in XPath 3.1 which sets two vertical bars || between the expressions you want to connect. However, we need to be careful because concatenation requires joining exactly one thing to exactly one other thing. (XPath can't figure out on its own how to concat (or tie together) the whole sequence of substrings of the first letter to the whole sequence of the substrings of the rest.) To help XPath to work one at a time over sequences of spk substrings, look up the for $i in (sequence) return ... XPath expression. (This is a for-loop in XPath, and $i is known as a range variable that isolates each member of the sequence, one by one.) With the for-loop, you can go one step at a time through the sequence of //spk nodes and return a concatenation of the substring functions you figured out, using $i as the first argument of your substring functions. See if you can work out how to write this XPath.
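
To check your reasoning about the string operations in parts 2 and 3, here is a hedged Python sketch of the same logic on two invented speeches. It is only a rough analogue of string(), string-length(), max(), substring(), lower-case(), and the for-loop, and it is not the XPath you are asked to write. The sample text, the speaker names, and the helper capitalize_name are hypothetical and are not taken from Goldeneye.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical speeches: invented content, not taken from Goldeneye.xml.
SAMPLE = """
<Scene>
  <sp><spk>ALPHA</spk> Status report.</sp>
  <sp><spk>BETA</spk> All systems are online.</sp>
</Scene>
"""

root = ET.fromstring(SAMPLE)
speeches = root.findall(".//sp")

# Rough analogue of string(): all text inside each element,
# including the text of descendants such as spk.
string_values = ["".join(sp.itertext()) for sp in speeches]

# Rough analogue of string-length() on each speech, then max() on the results.
lengths = [len(value) for value in string_values]
longest = max(lengths)

# Back "on the tree": keep the elements whose length equals the maximum.
# More than one element can qualify if there is a tie.
longest_speeches = [sp for sp, n in zip(speeches, lengths) if n == longest]


def capitalize_name(name: str) -> str:
    """Keep the first letter as-is and lower-case the 2nd letter onward."""
    # Python slices count from 0, while XPath's substring() counts from 1.
    return name[:1] + name[1:].lower()


# One name at a time, the way a range variable visits each member in turn.
names = [capitalize_name(spk.text) for spk in root.findall(".//sp/spk")]

print(string_values)
print(longest)
print(names)
```

The list comprehension at the end plays the role of the for-loop: the two pieces are joined for each name separately, rather than trying to join one whole sequence of first letters to another whole sequence of remainders.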