Thursday, November 12, 2015

DocBook XML to PDF Conversion via FOP - Manual Approach with Oxygen XML Editor

In my previous post, I discussed how to convert DocBook XML to HTML. The easiest way to generate a PDF from DocBook XML is to use a tool that converts HTML to PDF.

If we instead use FOP (Formatting Objects Processor) to generate the PDF, the conversion is docbook5 --(xsl)--> xml.fo --(fop)--> pdf. The commands involved are xsltproc and fop. The following figure illustrates the pipeline.

Figure: XML to PDF conversion process via FOP

In this example, I have used version 1.71.0 of the DocBook XSL stylesheets for the transformation. The docbook-xsl 1.71.0 distribution can be downloaded from the SourceForge repository.

From the docbook-xsl 1.71.0 distribution, the following stylesheets can be used to transform XML files into HTML or XML FO.

XML to HTML: \docbook-xsl-1.71.0\docbook-xsl-1.71.0\html\docbook.xsl
XML to XML FO: \docbook-xsl-1.71.0\docbook-xsl-1.71.0\fo\docbook.xsl
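
The two-step pipeline described above can be sketched as a small shell function. This is only a sketch under a few assumptions: a Unix-like shell (on Windows, something like Git Bash or Cygwin), xsltproc and fop available on the PATH, and placeholder file names. The function name docbook_to_pdf is hypothetical and not part of any tool.

```shell
#!/bin/sh
# Sketch: DocBook XML -> XSL-FO -> PDF in two steps.
# Assumes xsltproc and fop are on the PATH; all paths are placeholders.

docbook_to_pdf() {
    src_xml=$1                      # DocBook source file, e.g. book.xml
    fo_xsl=$2                       # fo/docbook.xsl from the docbook-xsl distribution
    out_pdf=$3                      # PDF file to produce
    fo_file="${out_pdf%.pdf}.fo"    # intermediate XSL-FO file, named after the PDF

    # Step 1: apply the FO stylesheet to produce the XSL-FO file.
    xsltproc --output "$fo_file" "$fo_xsl" "$src_xml" || return 1

    # Step 2: let Apache FOP render the XSL-FO file as PDF.
    fop -fo "$fo_file" -pdf "$out_pdf"
}

# Example call (hypothetical paths):
# docbook_to_pdf book.xml docbook-xsl-1.71.0/fo/docbook.xsl book.pdf
```

Keeping the intermediate .fo file on disk makes it easy to inspect the XSL-FO when the PDF does not look as expected.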

For better support for XML editing and XSLT debugging, there are commercial tools such as Oxygen XML Editor. Oxygen XML Editor also ships with sample DocBook XML files, DocBook XSDs, XSLT stylesheets, and transformation tools by default.

Below is how to use Oxygen XML Editor to do the transformation manually.

  • Open a sample DocBook XML file in Oxygen XML Editor. You may use the sample DocBook version 5 files that are included with Oxygen XML Editor.
  • In the XML file, add the line below after the XML declaration.

<?xml-stylesheet type="text/xsl" href="<Path To>\docbook-xsl-1.71.0\docbook-xsl-1.71.0\fo\docbook.xsl"?>

  • Using Oxygen XML Editor, perform the XML to XML FO transformation. You may save the output file with the extension .fo
  • To generate PDF output from XML FO, we need a formatting objects processor (FOP). There are a few commercial tools that support .NET for this. On the Java side, the Apache Foundation maintains an open source project called Apache FOP, which can generate PDF from XML FO. For the POC, we used the Apache FOP command line tool, which can be downloaded from the Apache FOP project site.
  • After downloading Apache FOP, extract it to a folder. In a command prompt, navigate to the Apache FOP installation folder and execute the following command, passing the .fo file as input and the PDF file to create as output.
fop -fo <input.fo> -pdf <output.pdf>

Apache FOP is an efficient tool and generates the PDF within seconds.
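
As a variation on the manual steps above, the fop command line can also take the XML source and a stylesheet directly (its -xml and -xsl options), so no separate .fo file is needed. The sketch below assumes a Unix-like shell and fop on the PATH; the function name and file names are hypothetical. In this mode FOP runs the XSLT step with its own Java XSLT processor rather than Oxygen or xsltproc, so it is worth checking that the result matches the two-step output for your documents.

```shell
#!/bin/sh
# Sketch: DocBook XML -> PDF in one FOP invocation.
# Assumes fop is on the PATH; all paths are placeholders.

docbook_to_pdf_single_step() {
    src_xml=$1    # DocBook source file, e.g. book.xml
    fo_xsl=$2     # fo/docbook.xsl from the docbook-xsl distribution
    out_pdf=$3    # PDF file to produce

    # FOP applies the stylesheet itself, then renders the resulting XSL-FO.
    fop -xml "$src_xml" -xsl "$fo_xsl" -pdf "$out_pdf"
}

# Example call (hypothetical paths):
# docbook_to_pdf_single_step book.xml docbook-xsl-1.71.0/fo/docbook.xsl book.pdf
```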

Monday, November 2, 2015

Transforming DocBook XML Contents into HTML in C#

In my previous article, I talked about generating DocBook XML in C#. Once you have a set of DocBook XML files, you need to transform them into a presentation format to make the book data readable. Here we will look at how to transform DocBook XML files into HTML.
One of the main advantages of DocBook is that DocBook XML content files can be converted to many formats through XSL stylesheet transformations. It is as easy as writing a few lines of code. The example below shows the mechanics on a small inline XML tree and stylesheet.

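using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Xsl;

// A small inline stylesheet that maps Parent/Child1 and Parent/Child2 to Root/C1 and Root/C2.
string xslMarkup = @"<?xml version='1.0'?>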
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>
    <xsl:template match='/Parent'>
        <Root>
            <C1>
            <xsl:value-of select='Child1'/>
            </C1>
            <C2>
            <xsl:value-of select='Child2'/>
            </C2>
        </Root>
    </xsl:template>
</xsl:stylesheet>";
XDocument xmlTree = new XDocument(
    new XElement("Parent",
        new XElement("Child1", "Child1 data"),
        new XElement("Child2", "Child2 data")
    )
);
             
//The output xml
XDocument newTree = new XDocument();
using (XmlWriter writer = newTree.CreateWriter())
{
    // Load the style sheet.
    XslCompiledTransform xslt = new XslCompiledTransform();
    xslt.Load(XmlReader.Create(new StringReader(xslMarkup)));
    // Execute the transform and output the results to a writer.
    xslt.Transform(xmlTree.CreateReader(), writer);
}


You can find the DocBook XSL stylesheets distribution at http://wiki.docbook.org/topic/DocBookXslStylesheets
Also have a look at the list of available DocBook tools at http://wiki.docbook.org/DocBookTools