newtFire {dh}
Creative Commons License Last modified: Sunday, 27-Feb-2022 02:30:06 UTC. Maintained by: Elisa E. Beshero-Bondar (eeb4 at psu.edu). Powered by firebellies.

We continue working with our collection of Shakespeare’s plays stored in our eXist XML database. Record your responses to the following questions in a text or markdown file. In eXide or oXygen, keep your top line the same: it declares TEI as the default element namespace, which XQuery needs in order to read the files in the collection. Here is the code you need at the top:

declare default element namespace "http://www.tei-c.org/ns/1.0";
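
To see what the declaration does, here is a minimal sketch that runs without the database. The inline fragment and its speaker names are hypothetical stand-ins for a play file, not content from the collection.

```xquery
declare default element namespace "http://www.tei-c.org/ns/1.0";

(: Hypothetical inline fragment standing in for one TEI file :)
let $sample :=
    <TEI xmlns="http://www.tei-c.org/ns/1.0">
        <text><body>
            <sp><speaker>First Speaker</speaker></sp>
            <sp><speaker>Second Speaker</speaker></sp>
        </body></text>
    </TEI>
(: Because of the declaration, the unprefixed name "speaker" is read as the TEI speaker element :)
return count($sample//speaker)
```

With the declaration in place this should return 2. If you delete the first line, the path looks for speaker elements in no namespace and the count should drop to 0, which is why the top line has to stay.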

In our first XQuery exercise we isolated plays with Falstaff as the speaker. We're going to go back to querying the entire collection now instead of just that subset.

  1. For this exercise, let’s see if we can find seven special plays that contain a count of more than 50 unique (distinct) speakers!
    1. Start a FLWOR by defining a variable that points to the whole Shakespeare collection. (If you are running this in oXygen, adapt the path in collection() to your local file system as shown in class.) If you are working in eXist-db's eXide, it looks like this:
      let $plays := collection('/db/apps/shakespeare/data/')//TEI
    2. Now, use let to define a variable that finds all of the speakers in the plays, and return it to make sure you are seeing <speaker> elements. How many results do you see? (Try sending your sequence to the count() function with => count() and returning the number.)
    3. Take a look at what the distinct-values() function does. Write a variable that sends the sequence of speaker elements to the distinct-values() function, then return it and look at the results. The distinct-values() function removes duplicates from the list. From what you can see, did this function do a perfect job? Why or why not?
    4. Now write a variable to return a count() on those distinct-values(). How many are there?
    5. Okay: now that you have seen how these functions work, let’s apply them to answer our main question. Remember, we want to find the plays that contain a count of more than 50 distinct speakers. For this we need to:
      • Look inside each play one by one using a for loop in our FLWOR, like this:
        for $p in $plays
      • Now, we work with the $p variable. (This little variable made in the for statement is known as a range variable, and it works on one play at a time.) First, make a new variable that catches the main title of each $p (you will need this for your output!). Use return to make sure your new variable is working.
      • Continue working with $p: Return each play’s distinct-values() of speakers. Then return the count() of those distinct values. When you return the count, you should see 43 different results in eXide.
      • Use a where statement (the W in FLWOR) to ask for a count greater than 50.
      • Experiment with returning the count, then return the title. You should now be seeing just 7 results.
  2. Modify your solution to the preceding question to return just the text of the seven play titles, without the <title> tags. (You can take the same approach that you did in the previous homework exercise.)
  3. When retrieving a single file from a collection, the base-uri() function can be useful. Try appending base-uri() to your XQuery expression and running it. What result do you see in the output window, and what is it telling you?
  4. We can bundle (or concatenate) our results together using the concat() function, as shown in class. Write your XQuery to concatenate the count with the title and its filepath.
  5. Optional Challenge: What if we wanted to return only the file name with its file extension, the part after the last forward slash (/) in the preceding results of base-uri()? How could we remove the preceding string of text from our output? We would use the tokenize() function (which you can look up in the w3schools list of XPath functions or in the Michael Kay book). That function breaks apart a string of text by dividing it at a particular regex pattern, and in this case the pattern is the forward slash. The tokenize() function returns tokens, or broken-off pieces of a string: each chunk before and after the regex you enter. To isolate just the piece we want, we can identify the pieces by their position in the sequence of broken pieces: is it the first token, the second, the third, or the last one? To retrieve a token by its position, after you run the tokenize() function, you can add a predicate holding the position value: [1], [2], etc. To retrieve the last item in a sequence without knowing its numerical position, you can use the last() function (which you can read about in the same resources we mentioned above or in The XPath functions we use the most). Note: nothing goes inside the parentheses in last(). With this information, how would you write your XQuery to return just the last part of the results of the base-uri() function, the part that appears after the last forward slash character? (Concatenate your results as before, only this time with your trimmed reference to just the filename.)
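
As a warm-up for the optional challenge, here is a minimal sketch of tokenize() with positional predicates, run on a plain string instead of a real base-uri() result. The path in $path is a hypothetical example, not a file in our collection, and the variable names are only for illustration.

```xquery
(: Hypothetical path string standing in for one base-uri() result :)
let $path := '/db/apps/example/data/sample-play.xml'
(: Split the string at every forward slash :)
let $tokens := tokenize($path, '/')
return (
    count($tokens),     (: how many pieces the split produced :)
    $tokens[2],         (: a token picked by its numeric position :)
    $tokens[last()]     (: the final token, whatever its position :)
)
```

Try predicting the three results before you run it. Notice that because the string begins with a slash, the first token is an empty string, so counting positions from the front is easy to get wrong. Using last() sidesteps that problem, and applying the same idea to base-uri() inside your FLWOR is left to you.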

When you have completed the assignment, copy and paste your expressions into a text file. Upload your text file containing your XPath and XQuery expressions to Canvas.