newtFire {dh}
Creative Commons License Last modified: Thursday, 23-Mar-2023 22:32:57 UTC. Maintained by: Elisa E. Beshero-Bondar (eeb4 at psu.edu). Powered by firebellies.

For this exercise, we will work with a student-coded collection of XML files from the Disney Songs project. Our collection represents work by a DIGIT student project team from 2021 and contains 93 XML files. Please copy your queries into a text or markdown file as you work through this exercise so you can submit them on Canvas.

If you need to work with this on your local computer outside of our newtfire eXist-dB, the collection is stored in our textAnalysis-Hub at Class-Examples >> XQuery >> disneySongs. Please copy the directory to your own space (such as your own GitHub repo), and write your queries over it from oXygen. You can also work with your own local installation of eXist-dB: check with Dr. B about how to set this up and upload your own file directories.

You can access newtfire's eXist-dB directly by logging in to query the files. (Please alert Dr. B on Slack if you have trouble opening eXide or accessing the database.)

On newtFire's eXist, the Disney Songs collection is stored in this filepath '/db/disneySongs/', so we can access it with:

collection('/db/disneySongs/')
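
As a quick check that the collection is reachable, a minimal sketch like the following counts the documents in it. If the collection matches the description above, the result should be 93; a different number (or an error) suggests the path or your login needs a second look.

```xquery
(: count the document nodes stored in the collection :)
count(collection('/db/disneySongs/'))
```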

The song files contain metadata about song titles and movie origins. Here is a short sample file:

  <xml>
    <metadata>
        <title> One Jump Ahead (Reprise 2)</title>
        <origin>Lyrics from <movie>Aladdin (Live-Action)</movie>
        </origin>
        <author>Music by <composer ref="#Menken">Alan Menken</composer> Lyrics by <lyricist ref="#Pasek">Benj Pasek</lyricist> and <lyricist ref="#Paul">Justin Paul</lyricist>
        </author>
        <perform>Performed by <voiceActor ref="#Massoud" role="#Aladdin">Mena Massoud</voiceActor> as Aladdin</perform>
    </metadata>

    <song>
        <lg n="1">
            <ln n="1"> Riffraff! Street rat! Would they think that</ln>
            <ln n="2">If they look much closer</ln>
            <ln n="3">Still, I can't play a prince here</ln>
            <ln n="4">No, siree</ln>
        </lg>

        <lg n="2">
            <ln n="1">Gotta tell the truth</ln>
            <ln n="2">I can't pretend</ln>
            <ln n="3">Even if it means this dream will end</ln>
            <ln n="4">Even if she walks away from me</ln>
        </lg>
    </song>
</xml>

Let us begin by querying the database to return information about each individual song:

         let $disneySongs := collection('/db/disneySongs/')/*
         for $d in $disneySongs
         (: unfinished on purpose: this FLWOR still needs a return clause :)

(If you were to return $d at this point, you would return each of the 93 XML files in this collection.)
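
To see what $d holds on each pass without printing whole files, one option is to return something small about it. The sketch below returns the name of each root element together with the URI of the document it came from. name() and base-uri() are standard XPath functions; the exact URIs you see depend on how the database is set up.

```xquery
let $disneySongs := collection('/db/disneySongs/')/*
for $d in $disneySongs
(: name() gives the root element's name, base-uri() the file's location :)
return name($d) || ' : ' || base-uri($d)
```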

Let's start by building up a FLWOR that looks for information about each song:

  1. Working with $d, write and return a variable that retrieves the <title> element of each song. Record your FLWOR statement in your homework file.
  2. Write and return a variable for the count of lines in each song. (Use the count() function.)
  3. What if you wanted to measure the length of the songs a little differently, based on literally how much text they have? Let’s try out the string-length() function. Remember how string() looks down the tree from any element to return all the text() nodes inside? string-length() does the same thing, but returns a numerical count of the text characters (letters, punctuation marks, and white space such as spaces and line breaks). Try making a variable that captures the string-length() of the entire <song> element.
  4. Write an order by statement to organize your results in descending order, from highest to lowest string-length.
  5. Now, build up your return to concatenate the info we retrieved: bundle together each song title with its line count and its string length. (We can use the XPath concat() function, or its convenient shorthand ||, like this, if you were using these variable names: $title || ' string separator ' || $length.) Make it more readable by returning strings instead of element nodes for those titles.
  6. Okay, what if you want to return only the song information about the longest or the shortest song in the list? For this we will need to write some new variables to use the max() function, and the min() function. Write variables to store and return the max() and min() string-length values.
  7. You can set up yet another for loop to capture the title and other information of each song to return using concat(). This time, try writing a where statement to return results only where the string-length of a song equals the maximum string-length. Can you limit your return to one line holding the longest song’s information? And try doing the same with the shortest song’s information.
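
The functions named in these steps can be tried out on a tiny inline example before you point them at the real collection. The sketch below is only an illustration: the two toy songs, their titles, and the variable names are made up for this example and are not part of the Disney Songs files. It shows count(), string-length(), max(), a where clause, and the || shorthand working together.

```xquery
xquery version "3.1";

(: Hypothetical toy data standing in for the collection :)
let $songs := (
  <xml>
    <metadata><title>Toy Song A</title></metadata>
    <song><lg n="1"><ln n="1">la la</ln><ln n="2">la</ln></lg></song>
  </xml>,
  <xml>
    <metadata><title>Toy Song B</title></metadata>
    <song><lg n="1"><ln n="1">do re mi fa sol</ln></lg></song>
  </xml>
)
(: one string-length per song, then the largest of those values :)
let $lengths := $songs ! string-length(song)
let $max := max($lengths)
for $s in $songs
let $title := $s/metadata/title ! string()   (: a string, not an element node :)
let $lineCount := count($s//ln)
let $length := string-length($s/song)
where $length = $max                         (: keep only the longest song :)
return $title || ' | lines: ' || $lineCount || ' | length: ' || $length
```

With this toy data the query should return the single line for Toy Song B, since its song text is the longer of the two. Swapping max() for min() gives the shortest song instead. Your own query over the collection will use $d and the variable names you chose in the earlier steps.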

When you have completed the assignment, copy and paste your expressions into a text or markdown file. (If you have been working in oXygen, you may save your XQuery file as .xquery or .xql to submit as well.) Upload your file containing your XPath and XQuery expressions to Canvas.