Thursday, December 17, 2015

Transforming DocBook XML contents into HTML and PDF in C# (Part 1)

In previous articles on my blog I discussed generating DocBook-standard XML and converting that XML to HTML and PDF.

The XSDs for the DocBook version 1.71.0 standard can be downloaded from the SourceForge repository.
The DocBook stylesheet distribution is described at http://wiki.docbook.org/topic/DocBookXslStylesheets.

The System.Xml.Xsl namespace provides XSL transformation support in C#, and the XslCompiledTransform class is the recommended way to perform such a transform. Before implementing the transformation logic in code, the XSL stylesheets need to be compiled in advance using the xsltc tool, which comes with Visual Studio. This avoids a possible StackOverflowException when loading the XSL into XslCompiledTransform, since the DocBook stylesheets are rather large.
Open the Visual Studio command prompt, navigate to the /docbook-xsl-1.71.0/fo/ folder, and pre-compile the stylesheet with the following command.

\docbook-xsl-1.71.0\fo>xsltc /c:DocBookV5FoXsl docbook.xsl /settings:script+,dtd,document /out:DocBookV5FoXsl.dll

Likewise, navigate into the /docbook-xsl-1.71.0/html/ folder and execute a similar command to get the class DocBookV5HtmlXsl inside DocBookV5HtmlXsl.dll.
Next, copy the generated DLL files into your solution and add them to your project as references. You can now perform the transformation as follows.

using System.Xml;
using System.Xml.Linq;
using System.Xml.Xsl;

XmlReaderSettings settings = new XmlReaderSettings();
// Allow DTD processing when reading the input document.
settings.DtdProcessing = DtdProcessing.Parse;

// Object to hold the output XML
XDocument newTree = new XDocument();

using (XmlWriter writer = newTree.CreateWriter())
{
    XslCompiledTransform xslt = new XslCompiledTransform();
    // Load the pre-compiled DocBook-to-HTML stylesheet (class generated by xsltc).
    xslt.Load(typeof(DocBookV5HtmlXsl));
    // Execute the transform and output the results to the writer.
    xslt.Transform(XmlReader.Create(@"path to your DocBook XML file", settings), writer);
}
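
The snippet above collects the result in an XDocument. When the goal is an HTML file on disk, one option is to create the writer from the stylesheet's own output settings, so that its xsl:output declaration is honored. The following is only a minimal sketch under stated assumptions: the HtmlTransformSketch class, the TransformToFile method, the temp file names, and the tiny inline stylesheet are all made up for illustration, and in the real case the pre-compiled DocBookV5HtmlXsl type would be loaded instead.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper, not part of the original article.
public static class HtmlTransformSketch
{
    // Transforms inputPath into outputPath using an already loaded stylesheet.
    public static void TransformToFile(XslCompiledTransform xslt, string inputPath, string outputPath)
    {
        var readerSettings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Parse };

        using (XmlReader reader = XmlReader.Create(inputPath, readerSettings))
        // OutputSettings is derived from the stylesheet's xsl:output element.
        using (XmlWriter writer = XmlWriter.Create(outputPath, xslt.OutputSettings))
        {
            xslt.Transform(reader, writer);
        }
    }

    public static void Main()
    {
        // Stand-in stylesheet so the sketch runs without the DocBook distribution.
        const string xsl = @"<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='html'/>
  <xsl:template match='/article'>
    <html><body><h1><xsl:value-of select='title'/></h1></body></html>
  </xsl:template>
</xsl:stylesheet>";

        string input = Path.Combine(Path.GetTempPath(), "sketch-input.xml");
        string output = Path.Combine(Path.GetTempPath(), "sketch-output.html");
        File.WriteAllText(input, "<article><title>Hello DocBook</title></article>");

        var xslt = new XslCompiledTransform();
        using (XmlReader xslReader = XmlReader.Create(new StringReader(xsl)))
        {
            // With the pre-compiled assembly this would instead be:
            // xslt.Load(typeof(DocBookV5HtmlXsl));
            xslt.Load(xslReader);
        }

        TransformToFile(xslt, input, output);
        Console.WriteLine(File.ReadAllText(output));
    }
}
```

Passing the loaded XslCompiledTransform in as a parameter keeps the helper independent of how the stylesheet was loaded, so the same method should work with either the inline stand-in or the compiled DocBook class.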

It is important to pre-compile the XSL and load it from the generated assembly, because loading the stylesheet directly, as shown below, results in a StackOverflowException. (reference)

The scenario that causes the StackOverflowException is as follows:
XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Parse;
  
XslCompiledTransform xslt = new XslCompiledTransform();
var xmlresolver = new MyXMLResolver();
xslt.Load(XmlReader.Create(@"path to your docbook.xsl file", settings), null, xmlresolver);

Note: Since the DocBook XSL contains include statements such as <xsl:include href="../lib/lib.xsl"/>, it is important to create the XmlReader by passing the XSL file location, as in the line xslt.Load(XmlReader.Create(@"<path to your docbook.xsl file>", settings), null, xmlresolver);. If you create the XmlReader from a Stream of the XSL file instead, the reader has no base location against which to resolve the relative includes, and loading fails with a FileNotFoundException.
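
The difference comes down to the reader's base URI, which is what relative xsl:include and xsl:import references are resolved against. The sketch below is only an illustration using a throwaway temp file (the BaseUriSketch class and the file name are made up). It compares the BaseURI of a reader created from a path, from a bare stream, and from a stream with an explicit base URI supplied through the XmlReader.Create(Stream, XmlReaderSettings, string) overload.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical demo class, not part of the original article.
public static class BaseUriSketch
{
    public static void Main()
    {
        // Throwaway stylesheet file, used only to have something to open.
        string path = Path.Combine(Path.GetTempPath(), "base-uri-sketch.xsl");
        File.WriteAllText(path,
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'/>");
        var settings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Parse };

        // Created from a path: the reader knows where the file lives.
        using (XmlReader fromPath = XmlReader.Create(path, settings))
        {
            Console.WriteLine("From path:   '" + fromPath.BaseURI + "'");
        }

        // Created from a bare stream: no location information is available.
        using (FileStream stream = File.OpenRead(path))
        using (XmlReader fromStream = XmlReader.Create(stream, settings))
        {
            Console.WriteLine("From stream: '" + fromStream.BaseURI + "'");
        }

        // Created from a stream plus an explicit base URI.
        using (FileStream stream = File.OpenRead(path))
        using (XmlReader withBase = XmlReader.Create(stream, settings, new Uri(path).AbsoluteUri))
        {
            Console.WriteLine("With base:   '" + withBase.BaseURI + "'");
        }
    }
}
```

If the stylesheet has to come from a stream, supplying the base URI this way may allow the relative includes to be located, although that has not been verified here against the full DocBook stylesheets.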

References:
XSLT Compiler (xsltc.exe)
How to: Perform an XSLT Transformation by Using an Assembly

In Part 2 of this article I will discuss the options for using FOP from C# to convert the generated XSL-FO file into PDF.