Tuesday, February 2, 2016

Transforming DocBook XML contents into HTML and PDF in C# (Part 2)

This is a continuation of my previous article on transforming DocBook XML contents into HTML and PDF. In this article, I compare some of the FOP (Formatting Objects Processor) implementation options available from C#.

The .NET Framework does not have its own implementation for handling XSL-FO processing. There are a few commercial libraries capable of FO to PDF conversion, but they seem to be too expensive. I could not find an actively maintained, open source, native .NET FOP implementation. On the Java side, Apache actively maintains an open source project called Apache FOP, which is easy to use and performs well. The following options are available for generating PDF with FOP from .NET.

FO.NET [link]
NFop [link]
Apache FOP using IKVM
Calling Apache FOP through command prompt commands

Using FO.NET

FO.NET consists of 100% C# managed code. It is not just a .NET integration layer but a full port of Apache FOP to the .NET environment, based on version 0.20.4 of FOP, and it does not follow FOP's subsequent progress. How well FO.NET handles large amounts of data depends on the complexity of the XSL-FO. For reasonably complex XSL-FO, the developers report having seen processing times of about 1 second per page.

This is an open source project, but it seems it is no longer actively maintained.

To use FO.NET, download the FO.NET binaries and place them in a folder. For manual conversion you can use the command line tool; a sample command is shown below. If it fails to create the output PDF file, create an empty PDF file on the file system first and pass its location as the output PDF path. The tool will overwrite the file with the generated PDF content.

fonet -fo <path to fo xml file>  <path to output pdf file>
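
The workaround of creating an empty output file first can also be done from code before the tool is invoked. The following is only a minimal sketch and is not part of FO.NET; the class name OutputFileHelper and the method name EnsureEmptyFile are hypothetical names chosen for illustration.

```csharp
using System.IO;

public static class OutputFileHelper
{
    // Hypothetical helper: creates a zero-byte file at the given path if none
    // exists yet, creating the parent folder when needed.
    // Returns true if a new file was created, false if one was already there.
    public static bool EnsureEmptyFile(string path)
    {
        if (File.Exists(path))
        {
            return false;
        }

        string folder = Path.GetDirectoryName(Path.GetFullPath(path));
        if (!string.IsNullOrEmpty(folder))
        {
            Directory.CreateDirectory(folder);
        }

        // File.Create returns an open stream; dispose it immediately so that
        // the converter is able to open and overwrite the file afterwards.
        using (File.Create(path)) { }
        return true;
    }
}
```

Calling OutputFileHelper.EnsureEmptyFile with the intended output path just before running the command should leave a placeholder file that the tool can overwrite.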

Generating a three-page PDF from a previously generated sample FO file took the FO.NET command line tool around 10000 ms.

To use FO.NET in your managed code, add fonet.dll, which comes with the command line tool, as a reference to your project. A simple implementation of PDF generation using the file system is shown below.

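// Requires: using System; using Fonet;
private void GeneratePdf()
{
    FonetDriver driver = FonetDriver.Make();
    // Close the streams the driver opens once rendering has finished.
    driver.CloseOnExit = true;
    driver.OnInfo += new Fonet.FonetDriver.FonetEventHandler(OnInfo);
    driver.OnWarning += new Fonet.FonetDriver.FonetEventHandler(OnWarning);
    // Errors are routed to the same handler as warnings in this sample.
    driver.OnError += new Fonet.FonetDriver.FonetEventHandler(OnWarning);
    // Read the FO file from disk and write the PDF to disk.
    driver.Render(@"<path to fo file>", @"<path to pdf output file>");
}

// Writes informational messages from the driver to the console.
private static void OnInfo(object driver, FonetEventArgs e)
{
    Console.WriteLine(e.GetMessage());
}

// Writes warning (and, in this sample, error) messages to the console.
private static void OnWarning(object driver, FonetEventArgs e)
{
    Console.WriteLine(e.GetMessage());
}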

Reference: Creating PDF documents from XML [link]
It seems that the performance of the library can become an issue when converting a large amount of data to PDF.

Using NFop

NFop is a Formatting Objects Processor (FOP) for XSL-FO that runs on the .NET Framework. It is a port of the Apache XML Project's FOP Java source to .NET's Visual J#. This makes it great for pure .NET reporting modules.
To use NFop in your project, you need to add nfop.dll, which comes with the NFop release, and vjslib.dll from the Visual J# distribution, which must be downloaded separately.

Reference: Generating PDF reports using nfop [link]

NFop seems to perform about the same as FO.NET, at least in the command line tool. NFop is also an open source project that is no longer actively maintained. It is older than FO.NET.

Using Apache FOP

The Apache FOP standalone command line tool performs better than both the FO.NET and NFop tools. In addition, the Apache project is actively maintained.

Integrating Apache FOP with IKVM
http://balajidl.blogspot.com/2006/01/net-calling-apache-fop.html

Calling Apache FOP as a command line app
http://stackoverflow.com/questions/20518905/opening-apache-fop-command-line-from-c-net-application

Sample Implementation Using FO.NET

// Requires: using System.IO; using System.Xml; using System.Xml.Linq; using System.Xml.Xsl; using Fonet;
XmlReaderSettings settings = new XmlReaderSettings();
// DocBook sources declare a DTD, so DTD processing has to be allowed.
settings.DtdProcessing = DtdProcessing.Parse;
XDocument newTree = new XDocument();
using (XmlWriter writer = newTree.CreateWriter())
{
    // Load the style sheet.
    XslCompiledTransform xslt = new XslCompiledTransform();
    // Custom resolver defined elsewhere; note that it is not passed to the reader or the transform in this snippet.
    var xmlresolver = new MyXMLResolver();
    // DocBookFoXsl is the class of the compiled stylesheets.
    xslt.Load(typeof(DocBookFoXsl));
    // Execute the transform and write the resulting XSL-FO into newTree.
    // If the XML is generated via XML serialization, provide that XML document here instead.
    xslt.Transform(XmlReader.Create(Server.MapPath("<path to given sample xml>"), settings), writer);
}
using (Stream fileStream = System.IO.File.Create(@"<path to output pdf>"))
{
    FonetDriver driver = FonetDriver.Make();
    driver.CloseOnExit = true;
    driver.OnInfo += new Fonet.FonetDriver.FonetEventHandler(OnInfo);
    driver.OnWarning += new Fonet.FonetDriver.FonetEventHandler(OnWarning);
    // Errors are routed to the same handler as warnings in this sample.
    driver.OnError += new Fonet.FonetDriver.FonetEventHandler(OnWarning);
    // Render the in-memory XSL-FO document straight into the PDF file stream.
    driver.Render(newTree.CreateReader(), fileStream);
}

This implementation took 9 - 30 seconds to convert the same sample XML file I used for the evaluation.
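
Timings like the ones quoted in this article can be collected with System.Diagnostics.Stopwatch. The following is a minimal sketch of one way to do it, not necessarily how the figures here were measured; the class name ConversionTimer and the method name MeasureMs are hypothetical.

```csharp
using System;
using System.Diagnostics;

public static class ConversionTimer
{
    // Hypothetical helper: runs the given action once and returns the
    // elapsed wall-clock time in milliseconds.
    public static long MeasureMs(Action conversion)
    {
        if (conversion == null)
        {
            throw new ArgumentNullException("conversion");
        }

        Stopwatch watch = Stopwatch.StartNew();
        conversion();
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }
}
```

Any conversion call can be wrapped in a lambda and passed to MeasureMs. Since the first call in a process also pays for assembly loading and JIT compilation, it may be worth timing several runs rather than a single one.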

Sample Implementation Using Apache FOP as a Command Line Tool

The following sample code uses the Apache FOP command line tool by invoking it as a process.

// Requires: using System; using System.Diagnostics; using System.IO; using System.Xml; using System.Xml.Linq; using System.Xml.Xsl;
XmlReaderSettings settings = new XmlReaderSettings();
// DocBook sources declare a DTD, so DTD processing has to be allowed.
settings.DtdProcessing = DtdProcessing.Parse;
XDocument newTree = new XDocument();
using (XmlWriter writer = newTree.CreateWriter())
{
    // Load the style sheet.
    XslCompiledTransform xslt = new XslCompiledTransform();
    // Custom resolver defined elsewhere; note that it is not passed to the reader or the transform in this snippet.
    var xmlresolver = new MyXMLResolver();
    // DocBookFoXsl is the class of the compiled stylesheets.
    xslt.Load(typeof(DocBookFoXsl));
    // Execute the transform and write the resulting XSL-FO into newTree.
    // If the XML is generated via XML serialization, provide that XML document here instead.
    xslt.Transform(XmlReader.Create(Server.MapPath("<path to given sample xml>"), settings), writer);
}

// Apache FOP runs as a separate process, so the XSL-FO is handed over through a temp file.
newTree.Save(@"<path to save generated xml fo as a temp file>");
var process = new Process();
string output;
try
{
    process.StartInfo.FileName = @"<path to apache fop installation folder>\fop.cmd";
    process.StartInfo.Arguments = @"-fo <path to temp saved xml fo file> -pdf <path to temp output pdf file>";
    // Required in order to redirect the standard output stream.
    process.StartInfo.UseShellExecute = false;
    process.StartInfo.RedirectStandardOutput = true;
    process.Start();
    // Synchronously read the standard output of the spawned process.
    StreamReader reader = process.StandardOutput;
    output = reader.ReadToEnd();
    // Write the redirected output to this application's window. In production code, use a logger instead.
    Console.WriteLine(output);
    process.WaitForExit();
}
catch (Exception e)
{
    // Handle exceptions (for example, log e and report the failed conversion).
}
finally
{
    process.Close();
}

The above implementation took 3-7 seconds to convert the sample xml file to pdf.
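
The process-based approach can be made a little more defensive by quoting paths that contain spaces, capturing the error stream as well, and returning the exit code. The following is only a sketch under assumptions: the class and method names (FopCommandLine, Quote, BuildArguments, Run) are hypothetical, and it assumes that the FOP launcher script passes a non-zero exit code through on failure, which is worth verifying for your installation.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class FopCommandLine
{
    // Wraps a path in double quotes so that spaces survive argument parsing.
    public static string Quote(string path)
    {
        return "\"" + path + "\"";
    }

    // Builds the "-fo <input> -pdf <output>" argument string used in this article.
    public static string BuildArguments(string foPath, string pdfPath)
    {
        return "-fo " + Quote(foPath) + " -pdf " + Quote(pdfPath);
    }

    // Runs the FOP launcher (for example fop.cmd) and returns its exit code.
    // Everything the process writes to stdout and stderr is collected in log.
    public static int Run(string fopLauncher, string foPath, string pdfPath, out string log)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = fopLauncher,
            Arguments = BuildArguments(foPath, pdfPath),
            UseShellExecute = false,          // required for stream redirection
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            CreateNoWindow = true
        };

        var buffer = new StringBuilder();
        using (var process = new Process { StartInfo = startInfo })
        {
            // Reading both streams asynchronously avoids the deadlock that can
            // occur when one redirected stream fills up while the other is read.
            process.OutputDataReceived += (s, e) => { if (e.Data != null) lock (buffer) buffer.AppendLine(e.Data); };
            process.ErrorDataReceived += (s, e) => { if (e.Data != null) lock (buffer) buffer.AppendLine(e.Data); };
            process.Start();
            process.BeginOutputReadLine();
            process.BeginErrorReadLine();
            process.WaitForExit();
            log = buffer.ToString();
            return process.ExitCode;
        }
    }
}
```

A caller would pass the full path to the launcher script plus the temporary FO and PDF paths, then treat a non-zero return value as a failed conversion and write the collected log to its logger.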

In conclusion, the implementation that invokes Apache FOP as a command line tool to process the FO XML was faster and more stable than the other options. Of course, I did not consider commercial .NET FOP libraries here.
