February 23, 2008

Optimized Presentation of XML Content

Filed under: Xml — admin @ 10:13 pm

Ivan Pepelnjak shows how to optimize the process of converting XML back-end data stored on a web server into HTML markup displayed in a web browser.

Introduction

The internal representation of static web site content or dynamic data displayed on a web site is increasingly stored in XML format due to ever-widening support of XML in scripting languages, web browsers, and SQL databases. Before this semantically structured data is presented to the end user, it’s almost always transformed into browser-friendly HTML markup.

There are many ways to perform XML-to-HTML transformation. Sometimes programmers use XML Document Object Model (DOM) calls in web server scripts to extract individual fields and attributes from the XML data and insert them into an HTML template. Similarly, some AJAX solutions download XML data to the browser and perform the copy/paste operations there. More advanced engineers usually tap the power of XSLT to handle XML-to-HTML transformations on the web server or within JavaScript code on browsers supporting XSLT.

All of these solutions are suboptimal for the following reasons:

  • If you perform XML-to-HTML conversion on the web server, you’re wasting server CPU cycles and increasing the page download time. (HTML output is almost always significantly larger than the corresponding XML data.)
  • If you perform the conversion on the web browser within an AJAX framework, the web pages are not visible to search engines, older browsers, or users who have disabled JavaScript.

In the following sections, you’ll learn how to use XSLT to reduce document download time and server CPU utilization for most of your visitors, while retaining compatibility with older browsers and ensuring visibility to search engines.

The Framework

The solution presented in this article is architecturally very simple:

  • Source data (page content or database-derived data) is in XML format.
  • XSLT is used to transform XML to HTML.
  • Transformed HTML is displayed in the web browser or returned to search engine spiders.
  • Ideally, the XML-to-HTML transformation is performed in an XSLT-capable browser; the web server performs it only for clients that don’t support XSLT.

The code examples we’ll use for this article also assume the following (although you can easily adapt them to fit your environment):

  • XSLT transformation results in a complete (X)HTML document. Alternatively, you can use AJAX-based solutions to render XML fragments in XSLT-capable browsers or on the web server.
  • XML data processed on the server should be in a format that allows local XSLT transformation. (In most cases, this requires parsing the XML data into a DOM tree.) Potential workarounds are available in case you serve static XML files.
  • A cookie is set on XSLT-capable browsers to tell the web server that it doesn’t have to perform the XSLT transformation. This cookie can be set on the fly (by the first page downloaded by the visitor) or in a dedicated browser-check web page.

To request the browser to perform the XSLT transformation, we’ll use the <?xml-stylesheet ?> processing instruction, with the type attribute set to text/xsl. Upon encountering this instruction in an XML document, XSLT-capable browsers download the referenced XSLT style sheet and use it to transform the XML data into HTML markup, resulting in the following benefits:

  • The XML-to-HTML processing is performed on the browser, reducing CPU utilization on the web server.
  • The XSL transformation achieves perfect separation of data (XML) and its presentation (HTML).
  • The web page formatting is stored in static .xsl files that are cached by the browser. The formatting rules are thus downloaded only once and used to transform many subsequent data pages.
  • The solution doesn’t involve JavaScript and hence works for all visitors with XSLT-capable browsers (including Internet Explorer 6 and above as well as Firefox), regardless of their security settings.

For visitors without XSLT-capable browsers or for search engine spiders, the XML-to-HTML transformation is performed on the web server, and the resulting HTML markup is served to the visitor.
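
The dispatch rule described above can be sketched as a small pure function. The sketch is in JavaScript purely for illustration (the article’s own listings use ASP). The helper names parseCookies and chooseResponseFormat are hypothetical; the cookie name XML and the value 1 follow the convention used in the rest of this article.

```javascript
// Hypothetical helper: parse a raw Cookie request header into an object.
function parseCookies(header) {
  const cookies = {};
  for (const part of (header || "").split(";")) {
    const eq = part.indexOf("=");
    if (eq < 0) continue; // skip empty or malformed pairs
    cookies[part.slice(0, eq).trim()] = part.slice(eq + 1).trim();
  }
  return cookies;
}

// Returns "xml" when the client has announced XSLT support through the
// XML cookie; every other client (older browsers, spiders) gets "html".
function chooseResponseFormat(cookieHeader) {
  return parseCookies(cookieHeader).XML === "1" ? "xml" : "html";
}
```

Because a missing cookie always falls through to "html", a client that never runs the browser check still receives a usable page.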

The Solution

The core component of the solution (the complete source is available here) is the OutputXMLResponse subroutine shown in Listing 1. It uses the UseXMLOutput function to check for the XML cookie, and outputs the XML document if the cookie is set or the transformed HTML markup if it is not. Its parameters are an XML document parsed into a DOM tree (XDoc) and the relative path to the XSLT style sheet (StyleURL).

NOTE

The code examples in the article use Microsoft XML (MSXML) with ASP. PHP DOM processing is very similar.

Listing 1 Main output function.

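Sub OutputXMLResponse(XDoc,StyleURL)
 Dim HText

 ' XSLT-capable client (or no style sheet given): send the XML document
 If UseXMLOutput Or IsNull(StyleURL) Then
  OutputXMLDocument XDoc,StyleURL
 Else
  ' All other clients get HTML transformed on the server
  HText = LocalXSLTransform(XDoc,StyleURL)
  OutputUTFtext HText, "text/html"
 End If
End Sub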

The UseXMLOutput function, shown in Listing 2, checks for the presence of the XML cookie. Due to incomplete support of XSLT in Internet Explorer 5, it always returns False if the user agent is IE5.

Listing 2 Select XML or HTML response.

Function UseXMLOutput
 UseXMLOutput = False
 ' IE5 has incomplete XSLT support, so it always gets server-side HTML
 If InStr(Request.ServerVariables("HTTP_USER_AGENT"),"MSIE 5") > 0 Then Exit Function
 UseXMLOutput = Request.Cookies("XML") = "1"
End Function

The OutputXMLDocument subroutine in Listing 3 is conceptually simple: It inserts the xml-stylesheet processing instruction before the XML root element, and outputs the value of the xml property of the DOM document by using the OutputUTFtext subroutine. A minor adjustment is needed to ensure that the <?xml ?> declaration is always present in the output XML stream; see the source code for the complete subroutine.

NOTE

The PHP equivalent for the xml property is the DomDocument->dump_mem method in the PHP 4 DOM XML extension, or DOMDocument::saveXML() in PHP 5 and later.

Listing 3 Send XML text to the client.

Sub OutputXMLDocument(XDoc,StyleURL)
 Dim XPI
 If StyleURL <> "" Then
  Set XPI = XDoc.createProcessingInstruction("xml-stylesheet", "href='" & StyleURL & "' type='text/xsl'")
  XDoc.insertBefore XPI, XDoc.firstChild.nextSibling
 End If
 OutputUTFtext XDoc.xml,"text/xml"
End Sub

When the XML cookie is not set, the LocalXSLTransform function in Listing 4 performs the XSLT transformation. It creates a new DOM document, loads the XSL style sheet into it, and uses the style sheet to transform the input DOM document. You could also enhance its performance with server-cached XSLT processors.

NOTE

Several options are set on the input XML document and the XSL style sheet with the SetXMLOptions subroutine to enable the use of the <xsl:import> tag and the XSL document() function.

Listing 4 Perform local XSLT transformation.

Function LocalXSLTransform(XDoc,StyleURL)
 Dim XSLT
 Set XSLT = Server.CreateObject(DOMClass)
 SetXMLOptions XDoc : SetXMLOptions XSLT
 If Not XSLT.Load (Server.MapPath(StyleURL)) Then RaiseError "XSL stylesheet load failed " & StyleURL
 If XSLT.parseError.errorCode <> 0 Then RaiseError "XML parsing failed: " & XSLT.parseError.reason
 LocalXSLTransform = XDoc.transformNode(XSLT)
End Function

Sub SetXMLOptions (XD)
 XD.Async = False
 XD.setProperty "ServerHTTPRequest",true
 XD.setProperty "AllowDocumentFunction",true
End Sub

Sub RaiseError(Txt)
 Err.Raise vbObjectError+1, "XML library",Txt
End Sub

Finally, the OutputUTFtext subroutine in Listing 5 returns the XML text string or HTML markup to the client. It clears the output buffer (to erase any previous debugging messages or HTML markup embedded in the script), sets the content-type header, changes the codepage property to the UTF-8 code page, writes the text to the response, and stops response processing.

NOTE

The HTML markup produced by the local XSLT transformation doesn’t include the META http-equiv=content-type tag, to prevent a mismatch between the HTTP response header and the META tag. The source code for the LocalXSLTransform function includes a fix for a bug related to META tags in MSXML version 4.0.

Listing 5 Output UTF-8 text to the client.

Sub OutputUTFtext (txt,contentType)
 Response.Clear
 Response.ContentType = contentType
 Response.Charset = "utf-8"
 Response.Codepage = 65001
 Response.Write txt
 Response.End
End Sub

Browser Checking

The code needed to check the browser’s XSLT support is conceptually simple:

  1. A small XML file requesting an XSLT style sheet is downloaded from the web server (see Listing 6).

    Listing 6 Browser-check XML file.

    <?xml version="1.0" ?>
    <?xml-stylesheet href="browserCheck.xsl" type="text/xsl"?>
    <root />
  2. The resulting HTML markup contains JavaScript code to set the XML cookie (see Listing 7). Therefore, if the browser processes the <?xml-stylesheet ?> directive correctly, the XML cookie will be set. (You can also include the Internet Explorer 5 check here.)

    Listing 7 Browser-check XSL style sheet.

    <?xml version="1.0" encoding="utf-8" ?>
    <xsl:stylesheet
      version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    
    <xsl:output method="html" />
    
    <xsl:template match="/">
     <html>
      <head><script src="xmlPresentation.js" type="text/javascript"><!-- contains xSetCookie --></script></head>
      <body onload="xSetCookie('XML','1')">Cookie set</body>
     </html>
    </xsl:template>
    
    </xsl:stylesheet>
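
The xSetCookie function lives in xmlPresentation.js, which isn’t shown in this article. A minimal sketch of what such a function might look like follows; the days and doc parameters are assumptions added for illustration (the style sheet in Listing 7 passes only the name and the value).

```javascript
// Hypothetical implementation of xSetCookie; the real one is part of
// xmlPresentation.js and may differ.
// days (optional) makes the cookie persistent; without it the cookie
// lasts for the browser session.
// doc (optional) defaults to the browser's document and exists only so
// the function can be exercised outside a browser.
function xSetCookie(name, value, days, doc) {
  const target = doc || document;
  let cookie = encodeURIComponent(name) + "=" + encodeURIComponent(value) + "; path=/";
  if (days) {
    const expires = new Date(Date.now() + days * 24 * 60 * 60 * 1000);
    cookie += "; expires=" + expires.toUTCString();
  }
  // Assigning to document.cookie adds or updates a single cookie.
  target.cookie = cookie;
  return cookie;
}
```

Setting path=/ makes the cookie visible to every page on the site, so a single successful browser check covers all subsequent requests.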

NOTE

To support clients that have disabled JavaScript, you could include an IFRAME tag in the XSL style sheet to trigger another request to the server. The response to the second request would set the XML cookie.

The simplest way to integrate the browser-check code with your web pages is to include a hidden IFRAME in every page:

<IFRAME src="browserCheck.xml" style="height: 0px; width: 0px;"></IFRAME>

If you use this solution, make sure that you attend to these details:

  • The browser-check XML file and XSL style sheet must have very long expiration times (you might have to use dynamic scripts to set the Expires header), so that they’re not continuously reloaded from the web server.
  • The robots.txt file on your web server should prevent search engines from accessing the browser-check XML file; otherwise, because every page links to it, it could become the highest-ranking page on your web site.
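
As a rough illustration of the first point, the following sketch computes far-future caching headers. It is written in JavaScript for illustration only; the function name farFutureHeaders and the lifetime are assumptions, and the Cache-Control header is an addition not discussed in this article.

```javascript
// Build far-future caching headers for the browser-check files.
// now is a Date; days is the cache lifetime (an assumed value).
function farFutureHeaders(now, days) {
  const seconds = days * 24 * 60 * 60;
  return {
    // toUTCString() produces the date format used in HTTP headers.
    "Expires": new Date(now.getTime() + seconds * 1000).toUTCString(),
    "Cache-Control": "public, max-age=" + seconds
  };
}
```

A server-side script would copy these two values into the response headers before sending browserCheck.xml or browserCheck.xsl.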

You can also implement a number of other solutions:

  • Set an XSL transformation parameter or a special attribute in the XML root element on server transforms, and include the IFRAME code only if the transformation is performed on the server.
  • Check for the presence of an XML cookie with JavaScript, and dynamically generate the IFRAME on the browser if the XML cookie is not set.
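
The second alternative could be sketched as follows. The function names hasCookie and ensureBrowserCheck are hypothetical, and the optional doc parameter exists only so that the code can be tried outside a browser.

```javascript
// Returns true when a cookie called name appears in a cookie string
// such as document.cookie ("lang=en; XML=1").
function hasCookie(cookieString, name) {
  return cookieString.split(";").some(function (pair) {
    return pair.trim().split("=")[0] === name;
  });
}

// Adds the hidden browser-check IFRAME unless the XML cookie is already set.
// Returns the new element, or null when no check is needed.
function ensureBrowserCheck(doc) {
  const d = doc || document;
  if (hasCookie(d.cookie || "", "XML")) return null;
  const frame = d.createElement("iframe");
  frame.src = "browserCheck.xml";
  frame.style.width = "0px";
  frame.style.height = "0px";
  frame.style.border = "0";
  d.body.appendChild(frame);
  return frame;
}
```

Calling ensureBrowserCheck() from the page’s onload handler means that visitors whose browsers already passed the check never request browserCheck.xml again.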

Static XML Files

The static XML files residing on your server must be handled in three different ways:

  • If the XML file contains the <?xml-stylesheet ?> directive, it can be sent to XSLT-capable browsers directly.
  • If the desired XSLT style sheet has to be inserted into the XML data, the file has to be read and processed by a server-side script and sent to XSLT-capable browsers as an XML data stream.
  • For all other clients (including search engines), the XSLT transformation is performed on the server.

Due to the varying client requirements, the static XML files cannot be served directly, but instead have to be processed by a server-side script accepting the filename and XSLT style sheet as input parameters. The URL to download a static XML file would thus be similar to this:

sendXML.asp?file=filename&xsl=stylesheet

The easiest way to implement this script is to load the XML data from the source file into a DOM tree structure and output the result using the OutputXMLResponse function described earlier. A simplistic implementation of this script (with no error checking) is included in Listing 8. Alternatively, you can implement the same functionality by reading the source XML file and inserting the <?xml-stylesheet ?> directive as a string in the output stream.

NOTE

A complete implementation of the file-serving script would include setting the Last-Modified HTTP header and processing the If-Modified-Since header. See my article “Reap the Benefits of Web Caching, Part 2: Reduce the Download Time” for more details.
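
The header handling mentioned in this note can be sketched as a pure function. It is shown in JavaScript for illustration rather than as part of the ASP source; the name conditionalGet and the shape of the returned object are assumptions.

```javascript
// Decide between a full response and 304 Not Modified.
// lastModified is the file's modification time (a Date);
// ifModifiedSince is the raw request header value, or undefined.
function conditionalGet(lastModified, ifModifiedSince) {
  // HTTP dates have one-second resolution, so drop the milliseconds.
  const modified = Math.floor(lastModified.getTime() / 1000) * 1000;
  const headers = { "Last-Modified": new Date(modified).toUTCString() };
  const since = Date.parse(ifModifiedSince); // NaN if absent or unparsable
  if (!isNaN(since) && modified <= since) {
    return { status: 304, headers: headers };
  }
  return { status: 200, headers: headers };
}
```

When the function returns status 304, the script can skip loading and transforming the XML file altogether.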

Listing 8 Return XML or HTML response from a static XML file.

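' The file parameter is used unchecked here; validate it in production code
Set XDoc = Server.CreateObject("MSXML2.DOMDocument.5.0")
XDoc.Async = False ' make Load finish before the document is used
XDoc.Load(Server.MapPath(Request("file")))
xmlStyleSheet = Null
If Request("xsl") <> "" Then xmlStyleSheet = Request("xsl")
OutputXMLResponse XDoc,xmlStyleSheet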
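
The string-based alternative mentioned before Listing 8 could look like the following sketch, shown in JavaScript for illustration. The function name addStylesheetPI is hypothetical, and the code assumes the source text doesn’t already contain an xml-stylesheet instruction.

```javascript
// Insert an xml-stylesheet processing instruction into XML text.
// It goes right after the <?xml ... ?> declaration when one is present,
// otherwise at the very start of the document.
function addStylesheetPI(xmlText, styleUrl) {
  // Escape characters that would break the quoted href value.
  const href = styleUrl
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;");
  const pi = '<?xml-stylesheet href="' + href + '" type="text/xsl"?>';
  // Requires whitespace after "<?xml", so other instructions don't match.
  const decl = /^\s*<\?xml\s[^?]*\?>/.exec(xmlText);
  if (decl) {
    return decl[0] + "\n" + pi + xmlText.slice(decl[0].length);
  }
  return pi + "\n" + xmlText;
}
```

This approach avoids parsing the file into a DOM tree, but it only helps XSLT-capable browsers; other clients still need the server-side transformation.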

For XML files already containing the <?xml-stylesheet ?> directive (or whenever sendXML.asp is called without the style sheet parameter), you can implement another optimization technique: Rather than returning the raw XML data, the server-side script can return the 301 (Moved Permanently) HTTP status code, which ensures that the browser won’t call the server-side script in the future but instead will fetch the XML file directly.
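
A sketch of this redirect decision follows, in JavaScript for illustration. The function name planStaticResponse and the returned object are hypothetical. Restricting the redirect to clients with the XML cookie is an assumption of this sketch, made because all other clients still need the server-side transformation.

```javascript
// Plan the response of the static-file script.
// file and xsl mirror the query parameters of sendXML.asp;
// wantsXml is true when the client's XML cookie is set.
function planStaticResponse(file, xsl, wantsXml) {
  if (!xsl && wantsXml) {
    // Nothing to insert: point the browser at the file itself, permanently.
    return { status: 301, headers: { "Location": file } };
  }
  return { status: 200, action: wantsXml ? "send-xml" : "transform-on-server" };
}
```

A production script would typically send an absolute URL in the Location header.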

Summary

XSLT is a powerful tool to transform server-side XML data into client-side HTML markup. Most commonly, it’s used on the web server, where its usage increases the server CPU utilization as well as the web page download time, since the HTML markup is usually significantly larger than the underlying XML data.

In this article, you’ve learned how to use the <?xml-stylesheet ?> processing instruction to request XML-to-HTML transformation on the web browser. Unlike AJAX-based solutions, this approach doesn’t require JavaScript, so it’s also available to visitors who have disabled JavaScript in their browsers.

With this solution, server-side XML-to-HTML transformation is performed for visitors without XSLT-capable browsers, as well as for search engine spiders. The framework presented in this article uses a cookie set by an automatic browser check to identify XSLT-capable browsers; all other visitors receive traditional HTML markup generated on the server.

