newtFire {dh}
Creative Commons License Last modified: Tuesday, 16-Nov-2021 23:31:13 UTC. Maintained by: Elisa E. Beshero-Bondar (eeb4 at psu.edu). Powered by firebellies.

The input collection on our textEncoding-Hub

For this assignment and the next, you will be working with a digitized XML collection of Warren Behrend’s last correspondence with his parents in 1929. You will need to access this collection from our textEncoding-Hub, and the goal is for you to write XSLT to process a local directory of files rather than just one at a time as we have been doing up to this point. Here is how to access the directory:

Please be careful to copy rather than move the directory out of GitHub! If you move it out of your local copy of the repository, the next time you sync our textEncoding-Hub, GitHub will prompt you to commit the change and push it, which will effectively eliminate the WBLastLetters folder. I can easily put it back if that happens, but please alert me ASAP if something goes awry!

Working with a Collection of Files in XSLT

We can process a whole directory of files using the collection() function in XSLT, so we can represent content from a whole collection of XML files in one or more output HTML files. One useful application for working with a collection is to process several short XML files and unify them on a single HTML page designed to merge their content. For this assignment, we will transform the small collection of XML files and their associated images so that they output on one HTML page, which we will produce with a table of contents, followed by the full documents, formatted in HTML with numbered lines.

Since these documents are all encoded with the same structural elements, we can use the collection() function to reach into them as a group, and output their content one by one based on their XML hierarchy. Really, we are treating the collection itself as part of the hierarchy as we write our XSLT, so we move from the directory down into the document node of each file to do our XSLT processing.

Using Modal XSLT

Besides working with a collection of files, the other interesting new application in this assignment is modal XSLT, which lets you process the same nodes in your document in two different ways. How can you output the same element contents to sit as list items in a table of contents at the top of an HTML page, and also as headers positioned throughout the body of your document, below the table of contents? Wouldn’t it be handy to be able to have two completely different template rules that match exactly the same elements: one rule to output the data as list items in the table of contents, and the other to output the same data as headers? You can write two template rules that will match the same nodes (have the same value for their @match attribute), but how do you make sure that the correct template rule is handling the data in the correct place?

To permit us to write multiple template rules that process the same input nodes in different ways for different purposes, we write modal XSLT, and that is what you will be learning to write with this assignment. Modal XSLT allows you to output the same parts of the input XML document in multiple locations and treat them differently each time. That is, it lets you have two different template rules for processing the same elements or other nodes in different ways, and you use the @mode attribute to control how the elements are processed at a particular place in the transformation. Please read the explanation and view the examples in Obdurodon’s tutorial on Modal XSLT before proceeding with the assignment, so you can see where and how to set the @mode attribute and how it works to control processing.
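As a minimal sketch of the idea (not our solution), here are two template rules with the same @match value, kept apart only by @mode. The mode name toc is arbitrary, and the two calls shown in the comment are assumed to sit wherever the list and the headers belong in your output.

    <!-- Assumed calls elsewhere in the stylesheet:
           <xsl:apply-templates select="//title" mode="toc"/>   (inside a <ul>)
           <xsl:apply-templates select="//title"/>              (in the full text) -->

    <!-- Responds only to the call that sets mode="toc". -->
    <xsl:template match="title" mode="toc">
        <li><xsl:apply-templates/></li>
    </xsl:template>

    <!-- Responds only to the call that sets no mode. -->
    <xsl:template match="title">
        <h2><xsl:apply-templates/></h2>
    </xsl:template>

The same <title> elements are processed twice, once as list items and once as headers, and the processor never has to guess which rule applies.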

Overview of the assignment

For this assignment you want to produce our collection of letters and documents in one HTML page, and that page needs to have a table of contents at the top. The table of contents should have one entry for each document, which outputs the information we have encoded in the <title> element that is a descendant of the <meta> element in our XML source code, together with the first line. Below the full table of contents you should output a new section that renders the complete text, as encoded, of all the documents. In the full text, you should wrap <span> elements around any markup of interest (including tagging of unclear passages, as well as persons, places, etc.). Preserve some info from your source XML by outputting markup information in the HTML @class attribute. To generate the attribute value on @class, we used an Attribute Value Template, which you should review here.
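Here is a small sketch of an Attribute Value Template at work. The element names unclear, persName, and placeName are hypothetical stand-ins, so check the source XML for the names actually used in the collection. The curly braces tell the processor to evaluate the XPath expression inside them rather than output it literally.

    <!-- Hypothetical element names: substitute the ones in the source XML. -->
    <xsl:template match="unclear | persName | placeName">
        <!-- name() returns the name of the matched element, so the @class
             value records which kind of markup produced this span. -->
        <span class="{name()}">
            <xsl:apply-templates/>
        </span>
    </xsl:template>

One rule like this can handle several kinds of markup at once, because the @class value changes with whichever element is being processed.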

You can see our output at https://newtfire.org/courses/tutorials/WBColl-1.html, but you don’t have to deal with outputting the images yet or styling unless you want to. For this assignment, just concentrate on outputting the full text and the table of contents at the top.

Housekeeping with the stylesheet template and output line: From XML to XHTML

To ensure that the output would be in the XHTML namespace, we add a default namespace declaration (the xmlns attribute on <xsl:stylesheet> below). To output the required DOCTYPE declaration, we also created an <xsl:output> element as the first child of our root <xsl:stylesheet> element, and we needed to include an attribute there to omit the default XML declaration, because if we output that XML line in our XHTML output, it will not validate as HTML with the W3C validator and might produce quirky rendering problems in various web browsers. So, our modified stylesheet template and xsl:output line is this, and you should copy this into your stylesheet:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0"
    xmlns="http://www.w3.org/1999/xhtml">

    <xsl:output method="xhtml" html-version="5" omit-xml-declaration="yes"
        include-content-type="no" indent="yes"/>

</xsl:stylesheet>

How to begin

First of all, file locations will be important in this assignment. Save your XSLT file just inside the WBLastLetters directory that you copied to your local homework file space. The XSLT file should be sitting outside the images/ and xml/ directories.

Forget about the table of contents for the moment and concentrate now on just outputting the full text of the documents. Except for having to pull the contents from a collection of files, this is just like the XML-to-HTML transformations you have already written, and you’ll use regular template rules (without a @mode attribute) to perform the transformation.

The collection() function: Here is how we write and run XSLT to process a collection of files. Just ahead of the first template match, after the <xsl:output> element, we define a variable in XSLT, which simply sets up a convenient shorthand for something complicated that we need to use more than once, so we don’t have to keep retyping it.

<xsl:output method="xhtml" html-version="5" omit-xml-declaration="yes" include-content-type="no" indent="yes"/>
<xsl:variable name="WBColl" select="collection('XML/?select=*.xml')"/>

An xsl:variable works by designating an @name, which holds any name you like so you can refer to it later (we have used "WBColl" here to refer to the Warren Behrend Last Letter collection of files), and with @select it holds anything you wish: a complicated XPath expression, a function, or whatever is easier to store in a variable than to type out multiple times. We use variables to help keep our code easy to read! In this case, we are using a variable to define our collection, using the collection() function in the @select attribute. The collection() function is set to designate the directory location of the collection of documents in relation to the stylesheet I am currently writing. My XSLT is saved in the directory immediately above the XML/ directory, so I am simply instructing the XSLT parser to take a directory-path step down to it by designating XML/ inside the collection() function. Definitely keep the ?select=*.xml because it helps make sure that only XML files are included in the collection, screening out the Relax NG file and hidden files that Mac or Windows operating systems sometimes add to file directories.
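If you want to confirm that the variable is picking up the files you expect, a quick check along these lines may help. This is only a debugging sketch, assuming the Saxon processor that <oXygen/> uses (the ?select= syntax is Saxon's), and it is meant as a temporary template rule that you replace once the collection looks right.

    <xsl:variable name="WBColl" select="collection('XML/?select=*.xml')"/>

    <!-- Temporary rule for checking the collection; messages appear in the
         transformation's message output, not in the HTML result. -->
    <xsl:template match="/">
        <xsl:message>Documents found: <xsl:value-of select="count($WBColl)"/></xsl:message>
        <xsl:for-each select="$WBColl">
            <!-- base-uri() reports the file location of each document node. -->
            <xsl:message><xsl:value-of select="base-uri()"/></xsl:message>
        </xsl:for-each>
    </xsl:template>

A count of zero usually means the path inside collection() does not match the directory name or its location relative to the stylesheet.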

We will call this variable later in the XSLT file whenever we need it, to show how we are stepping into our collection of documents. That will happen in the first template rule, which matches on the document node. Open any one of the input XML files in the XML collection in <oXygen/> and you will see that the transcription contents are all coded within the <body> element, so we can write this stylesheet to look through the whole collection of files and process only the elements below <body>. You call or invoke the variable name for the collection by signalling it first with a dollar sign $, giving the variable name, and then simply step down the descendant axis straight to the <body> element in each file. Here is how the code looks to call or invoke the variable in our first template match:

<xsl:apply-templates select="$WBColl//body"/>

Note on running the transformation: Unlike other transformations we do on single XML files, when we run this XSLT in <oXygen/> it actually doesn’t matter what file we have selected in the XML input, because we have indicated in the stylesheet itself what we are processing, with the collection() function. We can even set a file that is outside of our collection as the input XML file (and we actually ran it successfully with the HTML file of the previous exercise selected). You do need to enter something in the input window, but when you work with the collection() function, your input file is just a dummy or placeholder that <oXygen/> needs to have entered so it can run your XSLT transformation.

In our HTML output (scroll down past the table of contents, to where the full text of the documents is rendered), the title of each document is inside an HTML <h2> element, and the paragraphs of each document are held and spaced apart using HTML <p> elements. To make each line of the text start on a new line, we add an empty HTML <br/> ([line] break) element at the end of each line within the paragraph. If you don’t include the <br/> elements, the lines will all wrap together in the browser. We have renumbered the lines in our sample output to make them consecutive from the start to the end of a document by using the count() function over the <ln> elements on the preceding:: axis. (We used the preceding:: axis instead of preceding-sibling:: because we wanted to number lines by counting them consecutively within each file rather than just inside each paragraph. You can read about the preceding:: axis in the Michael Kay book on page 612, or in the XPath axes section of our Follow the XPath Tutorial.) Here’s a sample of HTML output for one of our letter documents:

        <section class="doc" id="WtoP-Dec17">
          <!-- I output the xml:id from the title as an id on an HTML section element to organize my documents on the page. -->
            <div class="text">
               <h2>Letter from Warren Behrend to Mary and Ernst Behrend, 1929-12-17</h2>
               
               
               <div class="header">
                  
                  <div class="date">1929-12-17</div>
                  
                  <div class="greeting">Dear Father &amp; Mother,</div>
                  </div>
               
               <p>
                  <span class="lineNum">1</span>  It is raining hard and freezing on
                  <br/><span class="lineNum">2</span>  the trees; very beautiful indeed, but not
                  <br/><span class="lineNum">3</span>  so practical for people who would drive
                  <br/><span class="lineNum">4</span>  thier cars on a night like this. I
                  <br/><span class="lineNum">5</span>  hope you had nice weather so
                  <br/><span class="lineNum">6</span>  that you could enjoy driving South
                  <br/><span class="lineNum">7</span>  in the Duesenburg
                  </p>
                <!-- more of the letter follows here-->
                 <div class="closer">
                  <span class="lineNum">59</span>  Much love,
                  <br/><span class="lineNum">60</span>  Warren.
                  </div>
               </div>
         </section>
      
        

The fine print: Don’t worry if your HTML output isn’t wrapped the same way ours is, if it puts the empty line break elements at the beginnings of lines instead of at the ends, or if it serializes (spells out) those empty line break elements as <br></br> instead of as <br/>. (You may even choose not to output the line breaks at all!) You should open your HTML output in <oXygen/> and simply check to make sure that what you’re producing is valid HTML and renders the text appropriately.

More fine print: If you are outputting <br/> elements, a line break only makes sense between lines: You don’t need a <br/> element at the start of the first line if you are outputting <p> elements to wrap paragraphs in HTML anyway. In our solution we used an <xsl:if> element to check the @n value on the source XML’s ln element, and if it was not equal to 1, we output the <br/> so we wouldn't get an extra blank line at the top of a paragraph. You can look up <xsl:if> at http://www.w3schools.com/xsl/xsl_if.asp or by searching for xsl:if on Obdurodon’s XSLT Advanced Features tutorial, or looking it up in the Michael Kay book so you can perform this check yourself.
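To make the line handling concrete, here is one way the pieces described above could fit together in a single template rule. It is a sketch rather than our exact solution, and it assumes that each <ln> element carries an @n attribute that restarts at 1 in every paragraph.

    <xsl:template match="ln">
        <!-- Assumption: @n is 1 on the first line of each paragraph, so the
             break is output before every line except that first one. -->
        <xsl:if test="@n != 1">
            <br/>
        </xsl:if>
        <!-- Count every <ln> that comes earlier in this file and add 1 to get
             a line number that runs consecutively through the document. -->
        <span class="lineNum">
            <xsl:value-of select="count(preceding::ln) + 1"/>
        </span>
        <xsl:text>  </xsl:text>
        <xsl:apply-templates/>
    </xsl:template>

Because the preceding:: axis never leaves the current document, the numbering starts over at 1 for each file in the collection.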

Remember to output span elements for interesting markup in the texts that you can style (later) with CSS.

Once your documents are all being formatted correctly in HTML, you can add the functionality to create the table of contents at the top, using modal XSLT.

Adding the table of contents

The template rule for the document node in our solution, revised to output a table of contents with all the information we wish to show before the text of the documents, looks like the following:

    <xsl:variable name="WBColl" select="collection('XML/?select=*.xml')"/>

    <xsl:template match="/">
        <html>
            <head>
                <title>Warren Behrend’s Last Correspondence and Memorial</title>
                <link rel="stylesheet" type="text/css" href="style.css"/>
            </head>
            <body>
                <h1>Warren Behrend’s Last Correspondence and Memorial</h1>

                <section id="toc">
                    <h2>Contents</h2>
                    <ul>
                        <xsl:apply-templates select="$WBColl//xml" mode="toc"/>
                    </ul>
                    <!-- ebb: Here I am outputting the table of contents using the special mode attribute,
                         because I will need to output the same xml elements differently when I am outputting the full text below. -->
                </section>
                <section id="fulltext">
                    <xsl:apply-templates select="$WBColl//xml"/>
                </section>
                <!-- ebb: And here I am selecting the SAME xml elements for processing without the mode attribute. -->

            </body>
        </html>
    </xsl:template>
        

The <section id="toc"> element is what we added to include a table of contents, and the important line is <xsl:apply-templates select="$WBColl//xml" mode="toc"/>. This is going to apply templates to each document with the @mode attribute value set to toc. The value of the @mode attribute is up to you (we used toc for table of contents), but whatever you call it, setting the @mode to any value means that only template rules that also specify a @mode with that same value will fire in response to this <xsl:apply-templates> element. Now we have to go write those template rules!

What this means is that when you process the <meta> and <body> elements to output the titles with the full text of the documents, you use <xsl:apply-templates> and <xsl:template> elements without any @mode attribute. To create the table of contents, though, you can have <xsl:apply-templates> and <xsl:template> elements that select or match the same elements, but that specify a mode and apply completely different rules. A template rule for <meta> elements in table-of-contents mode will start with <xsl:template match="meta" mode="toc">, and you need to tell it to create an <li> element that contains the text of the <title> element. You can then use <xsl:apply-templates> with mode="toc" and write another template rule, also with mode="toc", to output the first line of text in each document. The rule for the full text of the same documents, not in any mode, will start with <xsl:template match="$WBColl//body"> (without the @mode attribute). That rule can create a <section> element for each document with an <h2> header to hold the text of the <title> element, and then output the full text of the document using <p> elements, with <br/> elements between the lines. In this way, you can have two sets of rules for the documents, one for the table of contents and one to output the full text, and we use modes to ensure that each is used only in the correct place.

Remember: both the <xsl:apply-templates>, which tells the system to process certain nodes, and the <xsl:template> that responds to that call and does the processing must agree on their mode values. For the main output of the full text of every document, neither the <xsl:apply-templates> nor the <xsl:template> elements specifies a mode. To output the table of contents, both specify the same mode.
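The table-of-contents rules might be sketched as follows. This is one possible arrangement, not necessarily how our solution does it, and it assumes that <meta> and <body> both sit inside the <xml> root element and that the first <ln> in a file is the first line of its text.

    <!-- One list item per document, built from its <meta> element. -->
    <xsl:template match="meta" mode="toc">
        <li>
            <xsl:apply-templates select="descendant::title" mode="toc"/>
            <xsl:text>: </xsl:text>
            <!-- Assumption: climb to the <xml> root, then take the first <ln>
                 below it as the first line of the document. -->
            <xsl:apply-templates select="ancestor::xml/descendant::ln[1]" mode="toc"/>
        </li>
    </xsl:template>

    <!-- Assumption: without this empty rule, the built-in rules would reach
         <body> in toc mode and pour all of its text into the list. -->
    <xsl:template match="body" mode="toc"/>

Every <xsl:apply-templates> here repeats mode="toc", so processing stays in the table-of-contents rules until the list item is complete.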

Completing and checking your work