newtFire {dh|ds}
Authored by: Rebecca J. Parker (rjp43 at pitt.edu | Twitter: @bcpkr396) Edited and maintained by: Elisa E. Beshero-Bondar (ebb8 at pitt.edu) Creative Commons License Last modified: Friday, 24-Feb-2017 10:51:10 EST. Powered by firebellies.

The input text

For this assignment we’ll be producing HTML from an XML file originally prepared by students in the Nell Nelson project team in the fall of 2015 and modified for use in this XSLT exercise. The XML file is available here: http://newtfire.org/dh/NelsonArticle_1888-07-30.xml. You should right-click on this link, download the file, and open it in <oXygen/>.

The usual housekeeping:

Because this document is not in a namespace, we do not need the @xpath-default-namespace attribute, and the only thing we need to add to <oXygen/>’s default XSLT stylesheet template is the XHTML namespace declaration on the root element. We also add our usual <xsl:output> line that we use when producing HTML (to make sure we produce valid HTML 5 in XHTML format). Here’s what we need:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math"
    exclude-result-prefixes="xs math"
    xmlns="http://www.w3.org/1999/xhtml" version="3.0">
    <xsl:output method="xhtml" encoding="utf-8" doctype-system="about:legacy-compat"
        omit-xml-declaration="yes"/>

    <!-- your template rules go here -->

</xsl:stylesheet>

Overview of the assignment

We’re going to work with this entire XML document (on all levels of the hierarchy), concentrating on processing the XML “salad” of mixed text and in-line elements to style them for presentation on the web in HTML’s limited tagset. You can use some of the basic HTML in-line elements, like <em> for emphasis or <strong> for strong emphasis, but you’ll also want to use CSS to set some elements to have different colors or background colors or to alter borders or fonts or font sizes or font styles (e.g., italic) or font weights (e.g., bold) or add text decoration (e.g., underlining) or text transformation (e.g., convert to all upper case) … well … really anything stylistically possible.

There are several types of in-line elements in our input XML document, but for the purposes of this assignment we will focus on those within the paragraphs (<p>) inside the <articleBody> element, which are to receive special tagging.

Some sit immediately inside <p> elements in the article body (like <workingConditions> and <unclear>). Others sit inside other elements, like the voice tags inside <dialogue> elements or the <headLine> elements inside the <head> element. You may not know at the outset which ones can be inside which other ones, or how deeply they can nest. Happily, with XSLT, unlike with many other programming languages, you don’t need to care about those questions!

An example of possible desired output can be found here http://newtfire.org/dh/1888-07-30_XSLTresults.html, though we did not style the body paragraphs in this output file. It is important to note that the majority of the styling choices on this file are controlled with a CSS file. You will make your own CSS and relate it to your XSLT; therefore, your stylistic choices might vary greatly from ours and your output may look completely different. What should look relatively similar is the underlying raw HTML, which is generated by running the XSLT. By viewing the page source of our output you can review the underlying raw HTML (view-source:http://newtfire.org/dh/1888-07-30_XSLTresults.html).

Guide to Approaching the Problem

In XSLT, processing something normally happens in two parts. You normally have an <xsl:apply-templates> element that tells the system what elements (or other nodes) you want to process, and you then have an <xsl:template> element that tells the system exactly how you want to process those elements, that is, what you want to do with them. If you find the image helpful, you can think of this as a situation where the <xsl:apply-templates> elements throw some nodes out into space and say “would someone please process these?” and the various <xsl:template> elements sit around watching nodes fly by, and when they match something, they grab it and process it.

Therefore, for this assignment, your XSLT transformation (after all the housekeeping) should have several template rules:

  1. Begin with a template rule for the document node (<xsl:template match="/">), in which the basic HTML output is created: the <html> element, <head> and its contents, and <body>. Inside the <body> element that you just created, use <xsl:apply-templates> and select the various <head> elements (using XPath expressions indicating where they are in the source XML for each of the values of the @select attributes) and then do the same to select the <articleBody> text.
  2. Then create separate template rules that match on each of the inline elements you are required to style, so each rule will be invoked as a result of the preceding <xsl:apply-templates> selection from our first template rule.

In this case, then, the @select on each <xsl:apply-templates> element inside the template rule for the document node tells the system which specific elements (using their XPath location in the source XML) you want to appear, and where in your output HTML you want them to appear. You set the order of the selections by placing the various <xsl:apply-templates> elements in the desired order inside that first template rule matching on the document node. This tells the system that you want to select only certain elements, at which point the template rule for the document node will call out what portions of the document need to be processed at this particular point. The processing work actually gets done by the other <xsl:template> rules, the ones that you write to match on the elements that need to be styled.
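As a rough sketch of the first template rule described above (not an official solution), the stylesheet below builds the HTML superstructure and uses @select to pull in first the headlines and then the article body. The XPath locations /root/head/headLine and /root/articleBody, the <div> wrappers and their class names, the page title, and the CSS file name style.css are all assumptions for illustration; check them against the source XML and your own files.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml" version="3.0">
    <xsl:output method="xhtml" encoding="utf-8" doctype-system="about:legacy-compat"
        omit-xml-declaration="yes"/>

    <!-- Template rule for the document node: builds the HTML superstructure. -->
    <xsl:template match="/">
        <html>
            <head>
                <!-- Placeholder title and hypothetical CSS file name. -->
                <title>Nell Nelson article</title>
                <link rel="stylesheet" type="text/css" href="style.css"/>
            </head>
            <body>
                <!-- First selection: the headlines (assumed path in the source XML). -->
                <div class="headlines">
                    <xsl:apply-templates select="/root/head/headLine"/>
                </div>
                <!-- Second selection: the article text (assumed path in the source XML). -->
                <div class="article">
                    <xsl:apply-templates select="/root/articleBody"/>
                </div>
            </body>
        </html>
    </xsl:template>
</xsl:stylesheet>
```

Run on its own, this relies on XSLT’s built-in rules to output only the text of whatever is selected; the in-line markup appears once you add the other template rules.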

Analysis of the task

How to process richly mixed content

Prose paragraphs with in-line elements that might contain other in-line elements are richly mixed content, with varied and unpredictable combinations of elements and plain text. This is the problem that XSLT was designed to solve. With a traditional procedural programming language, you’d have to write rules like “inside this paragraph, if there’s a <dialogue> do X, and, oh, by the way, check whether there’s a <nellVoice> or a <company> inside the <dialogue>,” etc. That is, in most programming languages you have to say what to look for at every step. The elegance of XSLT when dealing with this type of data is that all you have to say inside paragraphs and other elements is “I’m not worried about what I’ll find here; just process (apply templates to) all my children, whatever they might be.”

The way to deal with mixed content in XSLT is to have a template rule for every element and use it to output whatever HTML markup you want for that element and then, inside that markup, to include a general <xsl:apply-templates/>, not specifying a @select attribute. For example, if you want your <nellVoice> to be tagged with the HTML <strong> tags, which means strong emphasis and which is usually rendered in bold, you could have a template rule like:

<xsl:template match="nellVoice">
  <strong>
      <xsl:apply-templates/>
  </strong>
</xsl:template>

You don’t know or care whether <nellVoice> has any child nodes or, if it does, what they are. Whatever they are, this rule tells the system to try to process them, and as long as there’s a template rule for them, they’ll get taken care of properly somewhere else in the stylesheet. If there are no child nodes, the <xsl:apply-templates/> will apply vacuously and harmlessly. As long as every element tells you to process its children, you’ll work your way down through the hierarchy of the paragraph without having to know which elements can contain which other elements or text nodes.

Taking stock: when to use @select

In our XSLT tutorial we describe the use of <xsl:apply-templates select="…"/>, which specifies exactly what you want to process and where. That makes sense when your input and output are very regular in structure. Use the @select attribute when you know exactly what you’re looking for and where you want to put it. We will want to use <xsl:apply-templates select="…"/> to grab all of the <headLine> elements sitting inside of the <head> element and to output them inside of the <html> element at the beginning of your XSLT transformation, separate from the <articleBody> text. We will also want to use <xsl:apply-templates select="…"/> to place the rest of the source text sitting inside of <articleBody> into a <p> element below the headlines. By setting up these very specific selections, we decide where the headlines of the source document sit in relation to the rest of the text found in <articleBody> in our HTML output.

It would also be logical to add heading elements in the HTML portion of our XSLT to indicate the placement of the <newspaperTitle>, <seriesTitle>, <date>, and <byline> elements. Consider where each of these elements makes sense in relation to the <headLine> elements and the <p> element containing the text from the <articleBody> element. Don’t forget that what is represented in the <html> element of your XSLT is the basic superstructure of your output HTML document; therefore, the content inside of the HTML <head> element, including the <title> element, will not appear on the page unless the underlying HTML is being viewed. Hence the importance of creating visible heading elements (<h1>, <h2>, etc.) that contain the actual title and byline information.

For the rest of this assignment, you don’t know (and don’t need to know) the order and nesting hierarchy of whatever salad of elements and plain text you might find inside <articleBody>, <head> or its sub-elements. You just want to process whatever comes up whenever it comes up. <xsl:apply-templates/> without the @select attribute says apply templates to whatever you find. Omit the @select attribute where you don’t want to have to think about and cater to every alternative individually. (You can still treat them all differently because you’ll have different template rules to catch them, but when you assert that they should be processed, you don’t have to know what they actually are.)
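To illustrate the contrast, here is a minimal sketch of template rules that omit @select. It assumes that each <p> in the source should become an HTML <p> and, purely for illustration, turns <dialogue> into a <span> with a hypothetical class name; neither choice is prescribed by the assignment.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml" version="3.0">

    <!-- Paragraph: output an HTML <p>, then process every child node,
         whatever it turns out to be (text or in-line elements). -->
    <xsl:template match="p">
        <p>
            <xsl:apply-templates/>
        </p>
    </xsl:template>

    <!-- Dialogue: we do not list the voice tags it may contain;
         apply-templates without @select hands them on to their own rules. -->
    <xsl:template match="dialogue">
        <span class="dialogue">
            <xsl:apply-templates/>
        </span>
    </xsl:template>
</xsl:stylesheet>
```

These rules are meant to sit in the same stylesheet as the template rule for the document node; neither of them needs to know what is nested inside the element it matches.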

What should the output look like

HTML provides a limited number of elements for styling in-line text, which you can read about at http://www.w3schools.com/html/html_formatting.asp. You can use any of these in your output, but note that presentational elements, the kind that describe how text looks (e.g., <i> for italic), are generally regarded as less useful than descriptive tags, which describe what text means (e.g., <em> for emphasis). Both of the preceding are normally rendered in italics in the browser, but the semantic tag is more consistent with the spirit of XML than the presentational one.

The web would be a dull world if the only styling available were the handful of presentational tags available in vanilla HTML. In addition to those options, there are also ways to assign arbitrary style to a snippet of in-line text, changing fonts or colors or other features in mid-stream. To do that:

  1. Before you read any further in this page, read Obdurodon’s Using <span> and @class to style your HTML page.
  2. To use the strategies described at that page, create an XSLT template rule that transforms the element you want to style to an HTML <span> element with a @class attribute. For example, you might transform <nellVoice> in the input XML to <span class="nellVoice">...text node (represented in XSLT with <xsl:apply-templates/>) ...</span> in the output HTML. You can then specify CSS styling by reference to the @class attribute, as described in the page we link to above.

    Note that you can make your transformations very specific. For example, instead of setting all <workingConditions> elements to the same HTML @class, you can create separate template rules that match on <workingConditions> according to their attribute values. For example, <xsl:template match="workingConditions[@category='positive']"> uses a normal XPath predicate to match <workingConditions> elements only if they have a @category attribute with the value positive. Within that matching template rule you create a <span> element with a logical @class (let’s say positive) and then simply place the <xsl:apply-templates/> inside of the span. Then in the CSS make reference to the @class, again as described in the page we link to above.

  3. This next part really exercises your XPath skills! Note that directly inside the <root> element, the <toneElements> element lists the @category values of the <workingConditions> elements, each with an @id that corresponds to those @category values and a @tone attribute declaring a value of good, bad, or neutral. You can write a matching rule that will dereference the @category attribute on, say, <workingConditions category="positive">...text node...</workingConditions>, look up whether this is a good, bad, or neutral tone, and set the @class value accordingly. You could then make all good working conditions one color and all bad working conditions a different color, letting XPath look up the tone reference for you. Hint: In your XSLT matching template rule, set the @category of an in-line <workingConditions> element equal to the XPath steps that lead to the entries in the <toneElements> list with the specific associated @tone. Then proceed with the <span> and @class setup detailed above. A similar dereference can be made with the @connotation attribute on the variety of voice tags (<femVoice>, <nellVoice>, <mascVoice>).
  4. Setting the @class attributes in the output HTML makes it possible to style the various <span> elements differently according to the value of those attributes, but you need to create a CSS stylesheet to do that. Create the stylesheet (just as you’ve created CSS in the past), and specify how you want to style your <span> elements. Link the CSS stylesheet to the XSLT by creating the appropriate <link> element inside of the HTML <head> element of your XSLT (you can remind yourself of the <link> element format by referencing our CSS Tutorial).
  5. Besides wrapping your <xsl:apply-templates/> in <span> elements and other HTML elements (hint: use HTML heading elements if, say, you want each of your <headLine> elements to appear as an individual heading instead of block text), you might consider adding extra spaces or text outside some of these as well. For example, in our HTML output note that each of the voice tags has some added words appearing in front of the quoted speech (indicating the sex of the speaker, or that Nelson was the speaker). We also added double <br/> elements to add space around the blocks of dialogue. Use what you think looks best and provides the most readable HTML output.
  6. The element <unclear> will need a slightly different rule matching on it. If you refer back to the source document and XPath your way to the empty self-closing <unclear/> elements, you might notice that the purpose of this element in the original document was to hold the place of word(s) that the project team was unable to transcribe due to the poor quality of the original source images. We do not want to lose the information that there are words missing when we transform this document into HTML. Like all other XML markup, the <unclear> element will disappear in the transformation, and we would then no longer have the marker telling the reader that there is a word missing here! Since <unclear> is an empty element, it contains no text of its own to process, so if we want to output anything in our HTML for it, we need to generate that text ourselves. To remedy this we want to write a template rule matching on <unclear>, and inside of that rule we want to include some placeholder informational text, a kind of pseudomarkup, just to indicate there is a word missing. Here’s how we chose to present it (and you may choose to do this differently):
    <xsl:text>[missing word(s)]</xsl:text>
    Check out our sample output to see the result of that template rule more clearly.
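The steps in the list above can be sketched roughly as follows. This is one possible shape under stated assumptions, not the reference solution: we assume that the entries inside <toneElements> are child elements carrying the @id and @tone attributes (their element name is not given here, so the sketch uses *), that their @id values match the @category values exactly, and that the class names and the <span> around the placeholder text are our own illustrative choices.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml" version="3.0">

    <!-- Step 2: turn an in-line element into a <span> with a @class for CSS. -->
    <xsl:template match="nellVoice">
        <span class="nellVoice">
            <xsl:apply-templates/>
        </span>
    </xsl:template>

    <!-- Step 3 (assumed structure): match <workingConditions> whose @category
         equals the @id of a <toneElements> entry whose @tone is 'good'. -->
    <xsl:template
        match="workingConditions[@category = /root/toneElements/*[@tone = 'good']/@id]">
        <span class="good">
            <xsl:apply-templates/>
        </span>
    </xsl:template>

    <!-- Same lookup for the entries whose @tone is 'bad'. -->
    <xsl:template
        match="workingConditions[@category = /root/toneElements/*[@tone = 'bad']/@id]">
        <span class="bad">
            <xsl:apply-templates/>
        </span>
    </xsl:template>

    <!-- Step 6: <unclear/> is empty, so we generate placeholder text for it.
         The surrounding <span> is an optional, hypothetical styling hook. -->
    <xsl:template match="unclear">
        <span class="unclear">
            <xsl:text>[missing word(s)]</xsl:text>
        </span>
    </xsl:template>
</xsl:stylesheet>
```

If the @id values in the source differ in form from the @category values, the comparison in the predicates would need adjusting. Avoid combining these tone-based rules with a rule such as match="workingConditions[@category='positive']" in the same stylesheet unless you set priorities, since two rules of equal priority could then match the same element.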

Your Final Results

What you should produce, then, is:

Important