Thursday, December 17, 2015

Transforming DocBook XML contents into HTML and PDF in C# (Part 1)

In previous articles on this blog, I discussed generating DocBook-standard XML and converting XML to HTML and PDF.

The XSDs for DocBook version 1.71.0 can be downloaded from the SourceForge repository.
The DocBook stylesheet distribution is available at http://wiki.docbook.org/topic/DocBookXslStylesheets.

The System.Xml.Xsl namespace supports XSL transformations in C#, and the XslCompiledTransform class is the recommended way to perform them. Before implementing the transformation logic in code, compile the XSL stylesheets in advance with the xsltc tool that comes with Visual Studio. Pre-compiling avoids a possible StackOverflowException when loading the XSL into XslCompiledTransform, since the DocBook stylesheets are fairly large.
Open the Visual Studio command prompt, navigate to the /docbook-xsl-1.71.0/fo/ folder, and pre-compile the stylesheet with the following command.

\docbook-xsl-1.71.0\fo>xsltc /c:DocBookV5FoXsl docbook.xsl /settings:script+,dtd,document /out:DocBookV5FoXsl.dll

Likewise, navigate to the /docbook-xsl-1.71.0/html/ folder and run a similar command to produce the class DocBookV5HtmlXsl inside DocBookV5HtmlXsl.dll.
Next, copy the generated DLL files into your solution and add them to your project as references. You can now perform the transformation as follows.

// Requires: using System.Xml; using System.Xml.Linq; using System.Xml.Xsl;
XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Parse;

// Object to hold the output XML
XDocument newTree = new XDocument();

using (XmlWriter writer = newTree.CreateWriter())
{
    XslCompiledTransform xslt = new XslCompiledTransform();
    // Load the pre-compiled DocBook-to-HTML stylesheet class from DocBookV5HtmlXsl.dll.
    xslt.Load(typeof(DocBookV5HtmlXsl));
    // Execute the transform and write the results to the writer.
    // Pass the path to your DocBook XML file as the first argument of XmlReader.Create.
    xslt.Transform(XmlReader.Create(@"", settings), writer);
}

It is important to pre-compile the XSL and load it from the compiled type, because loading it directly from the file, as shown below, results in a StackOverflowException. (reference)

The scenario that causes the StackOverflowException is as follows:
XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Parse;
  
XslCompiledTransform xslt = new XslCompiledTransform();
// MyXMLResolver is a custom XmlResolver subclass (definition not shown here).
var xmlresolver = new MyXMLResolver();
xslt.Load(XmlReader.Create(@"path to your docbook.xsl file", settings), null, xmlresolver);

Note: Because the DocBook XSL files contain include statements such as <xsl:include href="../lib/lib.xsl"/>, the XmlReader must be created from the stylesheet's file location, as in xslt.Load(XmlReader.Create(@"<path to your docbook.xsl file>", settings), null, xmlresolver); If you create the XmlReader from a Stream over the XSL file instead, the reader has no base URI against which to resolve the relative include paths, and loading fails with a FileNotFoundException.
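
If you do have to read a stylesheet from a Stream, one possible workaround is to pass a base URI to XmlReader.Create so that relative includes can still be resolved. The sketch below is only an illustration with two tiny stylesheets written to a temporary folder; the file names (main.xsl, lib.xsl) and the class name are made up for the example, and I have not verified this approach against the full DocBook stylesheets, which may still hit the StackOverflowException described above.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class IncludeBaseUriDemo
{
    // Loads a stylesheet from a Stream, passing the file's URI as the base URI
    // so that relative <xsl:include> hrefs can be resolved.
    public static XslCompiledTransform LoadFromStream(string xslPath)
    {
        string baseUri = new Uri(Path.GetFullPath(xslPath)).AbsoluteUri;
        var xslt = new XslCompiledTransform();
        using (Stream stream = File.OpenRead(xslPath))
        using (XmlReader reader = XmlReader.Create(stream, new XmlReaderSettings(), baseUri))
        {
            xslt.Load(reader, XsltSettings.Default, new XmlUrlResolver());
        }
        return xslt;
    }

    // Writes two small stylesheets (hypothetical names) and runs the transform.
    public static string RunDemo()
    {
        string root = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(Path.Combine(root, "html"));
        Directory.CreateDirectory(Path.Combine(root, "lib"));

        // The included stylesheet lives in a sibling folder, as in DocBook.
        File.WriteAllText(Path.Combine(root, "lib", "lib.xsl"),
            @"<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:template name='greet'><p>included template</p></xsl:template>
</xsl:stylesheet>");

        string mainPath = Path.Combine(root, "html", "main.xsl");
        File.WriteAllText(mainPath,
            @"<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:include href='../lib/lib.xsl'/>
  <xsl:template match='/'><html><xsl:call-template name='greet'/></html></xsl:template>
</xsl:stylesheet>");

        XslCompiledTransform xslt = LoadFromStream(mainPath);
        using (var output = new StringWriter())
        using (XmlReader input = XmlReader.Create(new StringReader("<doc/>")))
        {
            xslt.Transform(input, null, output);
            return output.ToString();
        }
    }

    public static void Main()
    {
        Console.WriteLine(RunDemo());
    }
}
```

The key point is the third argument of XmlReader.Create: without it, the resolver has nothing to combine "../lib/lib.xsl" with.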

Reference:
XSLT Compiler (xsltc.exe)
How to: Perform an XSLT Transformation by Using an Assembly

In Part 2 of this article, I will discuss FOP implementation options in C# for converting the generated FO file into PDF.

Thursday, November 12, 2015

DocBook XML to PDF Conversion via FOP - Manual Approach with Oxygen XML Editor

In my previous post, I discussed how to convert DocBook xml to html. The easiest way to generate pdf from DocBook xml is to use a pdf tool which can convert html to pdf.

If we use FOP (Formatting Objects Processor) to generate the PDF, the conversion is docbook5 --(xsl)--> xml.fo --(fop)--> pdf. The commands involved are xsltproc and fop. The following figure illustrates the pipeline.

Xml to Pdf conversion process via Fop

In this example, I have used DocBook version 1.71.0 for the transformation. The XSDs for DocBook version 1.71.0 can be downloaded from the SourceForge repository.

From the DocBook 1.71.0 distribution, the following stylesheets can be used to transform XML files into HTML or XSL-FO.

XML to HTML: \docbook-xsl-1.71.0\docbook-xsl-1.71.0\html\docbook.xsl
XML to XML FO: \docbook-xsl-1.71.0\docbook-xsl-1.71.0\fo\docbook.xsl

For better support with XML editing and XSLT debugging, there are commercial tools such as Oxygen XML Editor. Oxygen XML Editor also ships with sample DocBook XML files, DocBook XSDs, XSLT stylesheets and transformation tools by default.

Below is how to use Oxygen XML Editor to do the transformation manually.

  • Open a sample DocBook XML file in Oxygen XML Editor. You may use the sample DocBook version 5 files that are included with the editor.
  • In the XML, add the line below after the XML declaration.

<?xml-stylesheet type="text/xsl" href="<Path To>\docbook-xsl-1.71.0\docbook-xsl-1.71.0\fo\docbook.xsl"?>

  • Using Oxygen XML Editor, perform the XML to XSL-FO transformation. You may save the output file with the extension .fo
  • To produce PDF output from XSL-FO, we need a formatting objects processor (FOP). A few commercial tools support .NET for this. In the Java world, the Apache Software Foundation maintains an open source project called Apache FOP, which can generate PDF from XSL-FO. For the POC, we used the Apache FOP command line tool, which can be downloaded from here.
  • After downloading Apache FOP, extract it to a folder. In a command prompt, navigate to the Apache FOP installation folder and execute the following command.
fop -fo -pdf
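
If you would rather trigger the same command from C#, a rough sketch using System.Diagnostics.Process could look like the following. It assumes the fop launcher script is reachable at the path you pass in and uses the command form fop -fo <input file> -pdf <output file>; the class name, method names and file names are hypothetical, chosen only for this illustration.

```csharp
using System;
using System.Diagnostics;

public static class FopRunner
{
    // Builds the start info for: fop -fo <foPath> -pdf <pdfPath>
    public static ProcessStartInfo BuildStartInfo(string fopExecutable, string foPath, string pdfPath)
    {
        return new ProcessStartInfo
        {
            FileName = fopExecutable,
            Arguments = "-fo \"" + foPath + "\" -pdf \"" + pdfPath + "\"",
            UseShellExecute = false,
            RedirectStandardError = true,
            CreateNoWindow = true
        };
    }

    // Starts the process, waits for it to finish and returns its exit code.
    public static int Run(ProcessStartInfo startInfo)
    {
        using (Process process = Process.Start(startInfo))
        {
            // Read stderr before waiting so a full pipe cannot block the child process.
            string errors = process.StandardError.ReadToEnd();
            process.WaitForExit();
            if (process.ExitCode != 0)
            {
                Console.Error.WriteLine(errors);
            }
            return process.ExitCode;
        }
    }

    public static void Main(string[] args)
    {
        if (args.Length != 3)
        {
            Console.WriteLine("Usage: FopRunner <path to fop> <input.fo> <output.pdf>");
            return;
        }
        int exitCode = Run(BuildStartInfo(args[0], args[1], args[2]));
        Console.WriteLine("fop exited with code " + exitCode);
    }
}
```

This only wraps the command line tool; it does not make FOP itself a .NET library.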

Apache FOP is an efficient tool which will generate the pdf within seconds.

Monday, November 2, 2015

Transforming DocBook XML Contents into HTML in C#

In my previous article, I talked about generating DocBook XML in C#. Once you have a set of DocBook XML files, you need to transform them into a presentation format to make the book data readable. Here we will look at how to transform DocBook XML files into HTML.
One of the main advantages of DocBook is that DocBook XML content files can be converted to many formats through XSL stylesheet transformations. It is as easy as writing a few lines of code.

string xslMarkup = @"<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>
    <xsl:template match='/Parent'>
        <Root>
            <C1>
            <xsl:value-of select='Child1'/>
            </C1>
            <C2>
            <xsl:value-of select='Child2'/>
            </C2>
        </Root>
    </xsl:template>
</xsl:stylesheet>";
XDocument xmlTree = new XDocument(
    new XElement("Parent",
        new XElement("Child1", "Child1 data"),
        new XElement("Child2", "Child2 data")
    )
);
             
//The output xml
XDocument newTree = new XDocument();
using (XmlWriter writer = newTree.CreateWriter())
{
    // Load the style sheet.
    XslCompiledTransform xslt = new XslCompiledTransform();
    xslt.Load(XmlReader.Create(new StringReader(xslMarkup)));
    // Execute the transform and output the results to a writer.
    xslt.Transform(xmlTree.CreateReader(), writer);
}
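
The example above writes the result into an XDocument. When the stylesheet produces HTML, you may prefer to get the result as text. Here is a minimal sketch of one way to do that, assuming the stylesheet declares <xsl:output method='html'/>; it creates the XmlWriter from xslt.OutputSettings so the writer follows what the stylesheet asked for. The class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Xsl;

public static class HtmlOutputDemo
{
    // Runs the stylesheet over the input and returns the serialized result.
    public static string TransformToString(string xslMarkup, XDocument input)
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader xslReader = XmlReader.Create(new StringReader(xslMarkup)))
        {
            xslt.Load(xslReader);
        }

        var output = new StringBuilder();
        // OutputSettings reflects the <xsl:output> element of the loaded stylesheet.
        using (XmlWriter writer = XmlWriter.Create(output, xslt.OutputSettings))
        using (XmlReader inputReader = input.CreateReader())
        {
            xslt.Transform(inputReader, writer);
        }
        return output.ToString();
    }

    // Small self-contained run using the same Parent/Child1 data as above.
    public static string RunDemo()
    {
        string xslMarkup = @"<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>
    <xsl:output method='html'/>
    <xsl:template match='/Parent'>
        <html><body><p><xsl:value-of select='Child1'/></p></body></html>
    </xsl:template>
</xsl:stylesheet>";

        var xmlTree = new XDocument(
            new XElement("Parent",
                new XElement("Child1", "Child1 data")));

        return TransformToString(xslMarkup, xmlTree);
    }

    public static void Main()
    {
        Console.WriteLine(RunDemo());
    }
}
```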


You can find the DocBook XSL stylesheet distribution at http://wiki.docbook.org/topic/DocBookXslStylesheets
Also have a look at the list of available DocBook tools at http://wiki.docbook.org/DocBookTools

Tuesday, September 1, 2015

Generating DocBook xml from C#.net

DocBook is a semantic markup language for technical documentation. As a semantic language, DocBook enables its users to create document content in a presentation-neutral form that captures the logical structure of the content; that content can then be published in a variety of formats, including HTML, XHTML, EPUB, PDF, man pages, Web help and HTML Help, without requiring users to make any changes to the source. The conversion from DocBook to other formats can be performed via XSLT transformations. Hence DocBook is quite useful in content authoring as a generic format for storing data.

If maintainability and ease of implementation are the priorities, one good approach is to use a C#.net equivalent of (Java) JAXB. With JAXB, POJO (Plain Old Java Object) classes can be generated from XML schema definition files. The generated POJOs can then be used to serialize data into XML files and deserialize data from them.

In C#, tools such as xsd.exe and Xsd2Code (both discussed below) can be used to generate POCO (Plain Old CLR Object) classes from the DocBook schema files.


In my first attempt, I tried xsd.exe, which is available with the VS SDK, to generate C# classes from the DocBook XSD file. Here is how to use it.

  1. Download the xsd schema file from http://www.docbook.org/xml/5.0/xsd/docbook.xsd and place it in a folder. (I have used DocBook version 5.0 schema)
  2. Open the Visual Studio command prompt and cd to the folder which contains docbook.xsd
  3. Run the command xsd docbook.xsd /c /l:CS

If you get the following errors while generating classes from the DocBook XSD, the easiest solution is to comment out the lines named in the warnings and run the same xsd.exe command again :)

Microsoft (R) Xml Schemas/DataTypes support utility
[Microsoft (R) .NET Framework, Version 4.0.30319.18020]
Copyright (C) Microsoft Corporation. All rights reserved.
Schema validation warning: The 'http://www.w3.org/1999/xlink:href' attribute is not declared. Line 46, position 6.
Schema validation warning: The 'http://www.w3.org/1999/xlink:type' attribute is not declared. Line 47, position 6.
Schema validation warning: The 'http://www.w3.org/1999/xlink:role' attribute is not declared. Line 48, position 6.
Schema validation warning: The 'http://www.w3.org/1999/xlink:arcrole' attribute is not declared. Line 49, position 6.
Schema validation warning: The 'http://www.w3.org/1999/xlink:title' attribute is not declared. Line 50, position 6.
Schema validation warning: The 'http://www.w3.org/1999/xlink:show' attribute is not declared. Line 51, position 6.
Schema validation warning: The 'http://www.w3.org/1999/xlink:actuate' attribute is not declared. Line 52, position 6.
Schema validation warning: The 'http://www.w3.org/1999/xlink:label' attribute is not declared. Line 7515, position 8.
Schema validation warning: The 'http://www.w3.org/1999/xlink:from' attribute is not declared. Line 7522, position 8.
Schema validation warning: The 'http://www.w3.org/1999/xlink:to' attribute is not declared. Line 7523, position 8.
Warning: Schema could not be validated. Class generation may fail or may produce incorrect results.
Error: Error generating classes for schema 'docbookV5'.
  - The attribute href is missing.
If you would like more help, please type "xsd /?".

Though I was able to generate POCOs from xsd.exe, the generated classes resulted in StackOverflowException when trying to initialize with XmlSerializer(). 

As a remedy, Xsd2Code can be used instead of xsd.exe to generate the POCOs successfully. The steps to use Xsd2Code can be found at http://xsd2code.codeplex.com/. The classes generated by Xsd2Code worked well without any issue.

After generating the POCOs, you can include them in your Visual Studio project and use them to write to and read from DocBook XML files by serializing and deserializing. System.Xml.Serialization.XmlSerializer can be used for this purpose, and the following utility class wraps it.

using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

namespace DocBookXml
{
    public static class XmlConverter<T>
    {
        private static readonly XmlSerializer _serializer;

        /// <summary>
        /// Static constructor that initialises the serializer for this type
        /// </summary>
        static XmlConverter()
        {
            _serializer = new XmlSerializer(typeof(T));
        }

        /// <summary>
        /// Deserialize the supplied XML into an object
        /// </summary>
        /// <param name="xml">The XML text to deserialize</param>
        /// <returns>The deserialized object of type T</returns>
        public static T ToObject(string xml)
        {
            using (var reader = new StringReader(xml))
            {
                return (T) _serializer.Deserialize(reader);
            }
        }

        /// <summary>
        /// Serialize the supplied object into XML
        /// </summary>
        /// <param name="obj">The object to serialize</param>
        /// <returns>The XML text for the object</returns>
        public static string ToXML(T obj)
        {
            using (var memoryStream = new MemoryStream())
            {
                _serializer.Serialize(memoryStream, obj);

                return Encoding.UTF8.GetString(memoryStream.ToArray());
            }
        }
    }
}
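
To show the serialize/deserialize round trip on its own, here is a minimal sketch that uses XmlSerializer directly. The Article class is a hypothetical stand-in for one of the generated DocBook POCOs (the real generated classes are far larger and use the DocBook namespace), and the demo class and method names are also made up for the illustration.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-in for a generated DocBook class.
[XmlRoot("article")]
public class Article
{
    [XmlElement("title")]
    public string Title { get; set; }
}

public static class RoundTripDemo
{
    // Serializes the object into an XML string.
    public static string ToXml(Article article)
    {
        var serializer = new XmlSerializer(typeof(Article));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, article);
            return writer.ToString();
        }
    }

    // Deserializes an XML string back into an object.
    public static Article FromXml(string xml)
    {
        var serializer = new XmlSerializer(typeof(Article));
        using (var reader = new StringReader(xml))
        {
            return (Article)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        string xml = ToXml(new Article { Title = "Getting started" });
        Console.WriteLine(xml);

        Article copy = FromXml(xml);
        Console.WriteLine(copy.Title);
    }
}
```

With the generated classes, the same two calls would go through the utility class above instead.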

Reference documentation for DocBook XSL transforms
HTML edition of book explaining the use of DocBook XSL