Thursday, November 12, 2015

DocBook XML to PDF Conversion via FOP - Manual Approach with Oxygen XML Editor

In my previous post, I discussed how to convert DocBook XML to HTML. The easiest way to generate a PDF from DocBook XML is to use a tool that converts that HTML to PDF.

If we use FOP (Formatting Objects Processor) to generate the PDF instead, the conversion is docbook5 --(xsl)--> xml.fo --(fop)--> pdf. The commands involved are xsltproc and fop. The following figure illustrates the pipeline.

[Figure: XML to PDF conversion process via FOP]
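
As a rough command-line sketch of the same pipeline, the two steps could look like the script below. The file name book.xml and the stylesheet location are placeholders, not taken from a real project, and the script only prints the commands unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch of the two-step pipeline: DocBook XML -> XSL-FO -> PDF.
# All paths are placeholders (assumptions); adjust them to your setup.
XSL="${XSL:-docbook-xsl-1.71.0/fo/docbook.xsl}"
SRC="${SRC:-book.xml}"
FO="${SRC%.xml}.fo"    # intermediate XSL-FO file
PDF="${SRC%.xml}.pdf"  # final output

# run: print the command, and execute it only when DRY_RUN is not 1.
run() {
  echo "+ $*"
  [ "${DRY_RUN:-1}" = "1" ] || "$@"
}

# Step 1: apply the DocBook FO stylesheet with xsltproc.
run xsltproc -o "$FO" "$XSL" "$SRC"
# Step 2: render the XSL-FO file to PDF with Apache FOP.
run fop -fo "$FO" -pdf "$PDF"
```

Running it with DRY_RUN=0 executes both commands, provided xsltproc and fop are on the PATH.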

In this example, I used version 1.71.0 of the DocBook XSL stylesheets for the transformation. The 1.71.0 distribution can be downloaded from the SourceForge repository.

The DocBook 1.71.0 distribution includes the following stylesheets for transforming XML files into HTML or XSL-FO:

XML to HTML: \docbook-xsl-1.71.0\docbook-xsl-1.71.0\html\docbook.xsl
XML to XML FO: \docbook-xsl-1.71.0\docbook-xsl-1.71.0\fo\docbook.xsl
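
If the transformation is driven from a script, the choice between the two stylesheets can be wrapped in a small helper. This is only a sketch: the function name stylesheet_for and the DOCBOOK_XSL variable are hypothetical, and the default directory is a placeholder for wherever the distribution was extracted.

```shell
# Placeholder: directory where the docbook-xsl-1.71.0 distribution was extracted.
DOCBOOK_XSL="${DOCBOOK_XSL:-docbook-xsl-1.71.0}"

# stylesheet_for TARGET: print the stylesheet path for "html" or "fo".
stylesheet_for() {
  case "$1" in
    html|fo) echo "$DOCBOOK_XSL/$1/docbook.xsl" ;;
    *) echo "unknown target: $1" >&2; return 1 ;;
  esac
}

# Example use (not executed here):
#   xsltproc -o book.html "$(stylesheet_for html)" book.xml
stylesheet_for html
stylesheet_for fo
```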

For better support with XML editing and XSLT debugging, commercial tools such as Oxygen XML Editor are available. Oxygen XML Editor also ships with sample DocBook XML files, DocBook XSDs, XSLT stylesheets, and transformation tools by default.

Below is how to use Oxygen XML Editor to do the transformation manually.

  • Open the sample DocBook XML files in Oxygen XML Editor. You may use the sample DocBook version 5 files that are included with Oxygen XML Editor.
  • In the XML, add the line below after the XML declaration.

<?xml-stylesheet type="text/xsl" href="<Path To>\docbook-xsl-1.71.0\docbook-xsl-1.71.0\fo\docbook.xsl"?>

  • Using Oxygen XML Editor, perform the XML to XSL-FO transformation. You may save the output file with the extension .fo.
  • To turn the XSL-FO file into a PDF, we need a formatting objects processor (FOP). There are a few commercial tools that support .NET for this. On the Java side, the Apache Software Foundation maintains an open source project called Apache FOP, which can generate PDF from XSL-FO. For the POC, we used the Apache FOP command-line tool, which can be downloaded from the Apache FOP project site.
  • After downloading Apache FOP, extract it to a folder. In a command prompt, navigate to the Apache FOP installation folder and execute the following command, substituting your own input and output file names.
fop -fo <input.fo> -pdf <output.pdf>
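
The stylesheet-association step in the list above can also be scripted instead of edited by hand. The helper below is a minimal sketch: the name add_stylesheet_pi is hypothetical, and it assumes the XML declaration sits alone on the first line of the file.

```shell
# add_stylesheet_pi FILE HREF: print FILE with an xml-stylesheet
# processing instruction inserted after the first line.
# Assumption: line 1 of FILE contains only the XML declaration.
add_stylesheet_pi() {
  head -n 1 "$1"    # the XML declaration
  printf '<?xml-stylesheet type="text/xsl" href="%s"?>\n' "$2"
  tail -n +2 "$1"   # the rest of the document, unchanged
}
```

For example, `add_stylesheet_pi book.xml "<Path To>/fo/docbook.xsl" > book-with-pi.xml` writes the modified copy to a new file and leaves the original untouched; both file names are placeholders.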

Apache FOP is an efficient tool and generates the PDF within seconds.
