Tuesday, February 2, 2016

Transforming DocBook XML contents into HTML and PDF in C# (Part 2)

This is a continuation of my previous article on transforming DocBook XML content into HTML and PDF. In this article, I compare some of the FOP implementation options available in C#.

The .NET Framework does not have its own implementation for handling FOP processing. There are a few commercial FOP libraries capable of FO to PDF conversion, but they seem to be too expensive. I did not find any actively maintained open source native .NET FOP implementation. On the Java side, Apache actively maintains an open source project called Apache FOP, which is easy to use and has good performance. The following options are available for generating PDF, either through .NET ports of FOP or by integrating Apache FOP.

FO.NET [link]
NFop [link]
Apache FOP using IKVM
Calling Apache FOP command line commands

Using FO.NET

FO.NET consists of 100% C# managed code. It is not just a .NET integration, but a full port of Apache FOP to the .NET environment. The port is based on version 0.20.4 of FOP and does not follow FOP's subsequent progress. How well FO.NET handles large amounts of data depends on the complexity of the XSL-FO. For reasonably complex XSL-FO, the developers say they have seen processing times of about 1 second per page.

This is an open source project, but it seems it is no longer actively maintained.

To use FO.NET, download the FO.NET binaries and place them in a folder. For manual conversion, you may use the command line tool. A sample command is shown below. If it fails to create the output PDF file, you may create an empty PDF file on the filesystem first and give its location as the output PDF file location. The tool will overwrite the file with the generated PDF content.

fonet -fo <path to fo xml file>  <path to output pdf file>

Generating a three-page PDF from a previously generated sample FO file took around 10,000 ms with the FO.NET command line tool.
To use FO.NET in your managed code, you must add fonet.dll, which comes with the command line tool, as a reference to your project. A simple implementation of PDF generation using the filesystem is shown below.

private void GeneratePdf()
{
    FonetDriver driver = FonetDriver.Make();
    driver.CloseOnExit = true;
    driver.OnInfo += new Fonet.FonetDriver.FonetEventHandler(OnInfo);
    driver.OnWarning += new Fonet.FonetDriver.FonetEventHandler(OnWarning);
    driver.OnError += new Fonet.FonetDriver.FonetEventHandler(OnWarning);
    driver.Render(@"<path to fo file>", @"<path to pdf output file>");
}
  
private static void OnInfo(object driver, FonetEventArgs e)
{
    Console.WriteLine(e.GetMessage());
}

private static void OnWarning(object driver, FonetEventArgs e)
{
    Console.WriteLine(e.GetMessage());
}

Reference: Creating PDF documents from XML [link]
It seems that the performance of the library can be an issue when processing a large amount of data into PDF.

Using NFop

NFop is a Formatting Objects Processor (FOP) for XSL-FO that runs on the .NET Framework. It is a port from the Apache XML Project's FOP Java source to .NET's Visual J#. This makes it great for pure .NET reporting modules.
To use NFop in your project, you need to add nfop.dll which comes with this release; and vjslib.dll from Visual J# Distribution which can be downloaded from here.

Reference: Generating PDF reports using nfop [link]

NFop seems to have about the same performance as FO.NET, at least in the command line tool. NFop is also an open source project that is no longer actively maintained. It is older than FO.NET.

Using Apache FOP

The Apache FOP standalone command line tool performs better than both the FO.NET and NFop tools. In addition, the Apache project is actively maintained.

Integrating Apache fop with IKVM
http://balajidl.blogspot.com/2006/01/net-calling-apache-fop.html

Calling Apache fop as a command line app
http://stackoverflow.com/questions/20518905/opening-apache-fop-command-line-from-c-net-application

Sample Implementation Using FO.NET

XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Parse;
// Object to hold the generated XSL-FO document
XDocument newTree = new XDocument();
using (XmlWriter writer = newTree.CreateWriter())
{
    // Load the style sheet.
    XslCompiledTransform xslt = new XslCompiledTransform();
    var xmlresolver = new MyXMLResolver();
    // DocBookFoXsl is the class of the pre-compiled stylesheets
    xslt.Load(typeof(DocBookFoXsl));
    // Execute the transform and output the results to a writer.
    // If the XML file is generated via XML serialization, provide that XML document here instead.
    xslt.Transform(XmlReader.Create(Server.MapPath("<path to given sample xml>"), settings), writer);
}
using (Stream fileStream = System.IO.File.Create(@"<path to output pdf>"))
{
    FonetDriver driver = FonetDriver.Make();
    driver.CloseOnExit = true;
    driver.OnInfo += new Fonet.FonetDriver.FonetEventHandler(OnInfo);
    driver.OnWarning += new Fonet.FonetDriver.FonetEventHandler(OnWarning);
    driver.OnError += new Fonet.FonetDriver.FonetEventHandler(OnWarning);
    // Render the XSL-FO document produced above into the PDF stream
    driver.Render(newTree.CreateReader(), fileStream);
}

This implementation took 9 to 30 seconds to convert the same sample XML file I used for the evaluation.
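The timings quoted in this article can be reproduced in several ways; the article does not depend on a particular one. As a minimal sketch, assuming wall-clock time is what matters, System.Diagnostics.Stopwatch can wrap any conversion call. The names ConversionTimer and MeasureMilliseconds are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: measures how long an arbitrary action takes.
public static class ConversionTimer
{
    // Runs the action once and returns the elapsed wall-clock time in milliseconds.
    public static long MeasureMilliseconds(Action action)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.ElapsedMilliseconds;
    }
}
```

For example, long elapsed = ConversionTimer.MeasureMilliseconds(() => GeneratePdf()); would time one conversion. A single run includes JIT and assembly loading costs, so repeating the measurement a few times gives a fairer picture.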

Sample Implementation Using Apache FOP as a Command Line Tool

The following sample code uses the Apache FOP command line tool by invoking it as a process.

XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Parse;
XDocument newTree = new XDocument();
using (XmlWriter writer = newTree.CreateWriter())
{
    // Load the style sheet.
    XslCompiledTransform xslt = new XslCompiledTransform();
    var xmlresolver = new MyXMLResolver();
    // DocBookFoXsl is the class of the pre-compiled stylesheets
    xslt.Load(typeof(DocBookFoXsl));
    // Execute the transform and output the results to a writer.
    // If the XML file is generated via XML serialization, provide that XML document here instead.
    xslt.Transform(XmlReader.Create(Server.MapPath("<path to given sample xml>"), settings), writer);
}


newTree.Save(@"<path to save generated xml fo as a temp file>");
var process = new Process();
string output;
try
{
    process.StartInfo.FileName = @"<path to apache fop installation folder>\fop.cmd";
    process.StartInfo.Arguments = @"-fo <path to temp saved xml fo file> -pdf <path to temp output pdf file>";
    process.StartInfo.UseShellExecute = false;
    process.StartInfo.RedirectStandardOutput = true;
    process.Start();
    // Synchronously read the standard output of the spawned process.
    StreamReader reader = process.StandardOutput;
    output = reader.ReadToEnd();
    // Write the redirected output to this application's window. In production code, use logger instead
    Console.WriteLine(output);
    process.WaitForExit();
}
catch (Exception e)
{
    //Handle exceptions
}
finally
{
    process.Close();
}

The above implementation took 3 to 7 seconds to convert the sample XML file to PDF.

In conclusion, the implementation that uses Apache FOP as a command line tool to process FO XML is faster and more stable than the other options. Of course, I did not consider commercial .NET FOP libraries here.
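The process call above reads only standard output and does not quote the paths. As a hedged sketch, assuming FOP may also report problems on the error stream and that paths may contain spaces, the invocation could be wrapped as follows. FopRunner, Quote, BuildArguments and Run are hypothetical names, and whether fop.cmd passes on a meaningful exit code is an assumption to verify against your FOP installation.

```csharp
using System;
using System.Diagnostics;
using System.Text;

// Hypothetical wrapper around the Apache FOP command line tool.
public static class FopRunner
{
    // Wraps a path in double quotes so that spaces do not split the argument.
    public static string Quote(string path)
    {
        return "\"" + path + "\"";
    }

    // Builds the "-fo ... -pdf ..." argument string used in the article.
    public static string BuildArguments(string foPath, string pdfPath)
    {
        return "-fo " + Quote(foPath) + " -pdf " + Quote(pdfPath);
    }

    // Runs FOP and returns the process exit code. Standard output is read
    // synchronously; standard error is collected asynchronously so that the
    // two redirected streams cannot block each other.
    public static int Run(string fopCmdPath, string foPath, string pdfPath,
                          out string stdout, out string stderr)
    {
        StringBuilder errors = new StringBuilder();
        using (Process process = new Process())
        {
            process.StartInfo.FileName = fopCmdPath;
            process.StartInfo.Arguments = BuildArguments(foPath, pdfPath);
            process.StartInfo.UseShellExecute = false;
            process.StartInfo.RedirectStandardOutput = true;
            process.StartInfo.RedirectStandardError = true;
            process.ErrorDataReceived += (sender, e) =>
            {
                if (e.Data != null) errors.AppendLine(e.Data);
            };
            process.Start();
            process.BeginErrorReadLine();
            stdout = process.StandardOutput.ReadToEnd();
            process.WaitForExit();
            stderr = errors.ToString();
            return process.ExitCode;
        }
    }
}
```

Collecting the error stream makes it easier to log why a conversion failed instead of silently ending up without a PDF file.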

Thursday, December 17, 2015

Transforming DocBook XML contents into HTML and PDF in C# (Part 1)

In previous articles on my blog, I discussed generating DocBook standard XML and converting XML to HTML and PDF.

The DocBook version 1.71.0 XSDs can be downloaded from the SourceForge repository.
The DocBook stylesheets distribution is available at http://wiki.docbook.org/topic/DocBookXslStylesheets.

The System.Xml namespace supports XSL transformation in C#. The XslCompiledTransform class is recommended for performing such a transform. To implement the transformation logic in code, the XSL stylesheets need to be compiled in advance using the xsltc tool, which comes with Visual Studio. This avoids a possible StackOverflowException when loading the XSL into XslCompiledTransform, since the DocBook stylesheets are quite large.
Open the Visual Studio command prompt, navigate to the location of /docbook-xsl-1.71.0/, and pre-compile the XSL as follows.
Navigate into the /docbook-xsl-1.71.0/fo/ folder and execute the following command in the VS command prompt.

\docbook-xsl-1.71.0\fo>xsltc /c:DocBookV5FoXsl docbook.xsl /settings:script+,dtd,document /out:DocBookV5FoXsl.dll

Likewise, navigate into the /docbook-xsl-1.71.0/html/ folder and execute a similar command to get the class DocBookV5HtmlXsl inside DocBookV5HtmlXsl.dll.
Next, copy the generated DLL files into the solution and add them to your project as references. You may now perform the transformation as follows.

XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Parse;
 
//Object to hold output xml
XDocument newTree = new XDocument();

using (XmlWriter writer = newTree.CreateWriter())
{
    // Load the pre-compiled style sheet.
    XslCompiledTransform xslt = new XslCompiledTransform();
    var xmlresolver = new MyXMLResolver();
    // Load the pre-compiled DocBook to HTML XSL (the class generated by xsltc above)
    xslt.Load(typeof(DocBookV5HtmlXsl));
    // Execute the transform and output the results to a writer.
    xslt.Transform(XmlReader.Create(@"", settings), writer);
}

It is important to pre-compile the XSL and load the compiled class, because loading the stylesheet as shown below results in a StackOverflowException. (reference)

The XSL transformation scenario that causes the StackOverflowException is as follows:
XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Parse;
  
XslCompiledTransform xslt = new XslCompiledTransform();
var xmlresolver = new MyXMLResolver();
xslt.Load(XmlReader.Create(@"path to your docbook.xsl file", settings), null, xmlresolver);

Note: Since the DocBook XSL contains xsl:include statements such as <xsl:include href="../lib/lib.xsl"/>, it is important to create the XmlReader by passing the XSL file location, as in the line xslt.Load(XmlReader.Create(@"<path to your docbook.xsl file>", settings), null, xmlresolver);. If you create the XmlReader from a Stream of the XSL file instead, the relative includes cannot be located and a FileNotFoundException is thrown.
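The reason the file location matters is that relative includes are resolved against the base URI of the stylesheet, and a bare Stream carries no base URI. The sketch below illustrates this with a resolver that logs each resolution. LoggingXmlResolver is a hypothetical class written for this illustration; it is not the MyXMLResolver used in the listings, whose implementation is not shown here.

```csharp
using System;
using System.Xml;

// Hypothetical resolver: behaves like XmlUrlResolver but logs every
// include it resolves, which helps when diagnosing missing files.
public class LoggingXmlResolver : XmlUrlResolver
{
    public override Uri ResolveUri(Uri baseUri, string relativeUri)
    {
        // The relative href is combined with the base URI of the including file.
        Uri resolved = base.ResolveUri(baseUri, relativeUri);
        Console.WriteLine("Resolving '" + relativeUri + "' against '" + baseUri + "' -> " + resolved);
        return resolved;
    }
}
```

With a base URI ending in /fo/docbook.xsl, the href ../lib/lib.xsl resolves to the lib folder next to fo. Without a base URI, the same href is resolved relative to whatever location the process happens to be running from, which is usually not where the stylesheets live.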

Reference:
XSLT Compiler (xsltc.exe) [link]
How to: Perform an XSLT Transformation by Using an Assembly [link]

I will discuss FOP implementation options in C# for converting the generated FO file into PDF in Part 2 of this article.

Thursday, November 12, 2015

DocBook XML to PDF Conversion via FOP - Manual Approach with Oxygen XML Editor

In my previous post, I discussed how to convert DocBook xml to html. The easiest way to generate pdf from DocBook xml is to use a pdf tool which can convert html to pdf.

If we are to use FOP (Formatting Objects Processor) to generate pdf, the conversion would be docbook5 —(xsl)—> xml.fo —(fop)—> pdf . Commands involved: xsltproc, fop. The following figure illustrates the pipeline.

Xml to Pdf conversion process via Fop

In this example, I have used DocBook version 1.71.0 for the transformation. The DocBook version 1.71.0 XSDs can be downloaded from the SourceForge repository.

From the DocBook 1.71.0 distribution, the following stylesheets can be used to transform XML files into HTML or XML FO.

XML to HTML: \docbook-xsl-1.71.0\docbook-xsl-1.71.0\html\docbook.xsl
XML to XML FO: \docbook-xsl-1.71.0\docbook-xsl-1.71.0\fo\docbook.xsl

For better support for XML editing and XSLT debugging, there are commercial tools such as Oxygen XML Editor. Oxygen XML Editor also ships with sample DocBook XML files, DocBook XSDs, XSLT stylesheets and transformation tools by default.

Below is how to use Oxygen XML Editor to do the transformation manually.

  • Open the sample DocBook XML files in Oxygen Editor. You may use the sample DocBook version 5 files which are included in Oxygen Editor.
  • In the XML, append the line below after the XML declaration tag.

<?xml-stylesheet type="text/xsl" href="<Path To>\docbook-xsl-1.71.0\docbook-xsl-1.71.0\fo\docbook.xsl"?>

  • Using the Oxygen Editor, perform the XML to XML FO transformation. You may save the output file with the extension .fo
  • From XML FO, we need a formatting objects processor (FOP) to generate PDF output. There are a few commercial tools which support .NET for this. On the Java side, the Apache foundation maintains an open source project called Apache FOP which can be used to generate PDF from XML FO. For the POC, we have used the Apache FOP command line tool which can be downloaded from here.
  • After downloading Apache FOP, extract it to a folder. In the command prompt, navigate to the Apache FOP installation folder and execute the following command.
fop -fo <path to xml fo file> -pdf <path to output pdf file>

Apache FOP is an efficient tool which will generate the pdf within seconds.

Monday, November 2, 2015

Transforming DocBook XML Contents into HTML in C#

In my previous article, I talked about generating DocBook XML in C#. When you have a set of DocBook XML files, you need to transform them into a presentation format to make the book data readable. Here we will look at how to transform DocBook XML files into HTML.
One of the main advantages of DocBook is that DocBook XML content files can be converted to many formats by XSL stylesheet transformations. It is as easy as writing a few lines of code.

string xslMarkup = @"<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>
    <xsl:template match='/Parent'>
        <Root>
            <C1>
            <xsl:value-of select='Child1'/>
            </C1>
            <C2>
            <xsl:value-of select='Child2'/>
            </C2>
        </Root>
    </xsl:template>
</xsl:stylesheet>";
XDocument xmlTree = new XDocument(
    new XElement("Parent",
        new XElement("Child1", "Child1 data"),
        new XElement("Child2", "Child2 data")
    )
);
             
//The output xml
XDocument newTree = new XDocument();
using (XmlWriter writer = newTree.CreateWriter())
{
    // Load the style sheet.
    XslCompiledTransform xslt = new XslCompiledTransform();
    xslt.Load(XmlReader.Create(new StringReader(xslMarkup)));
    // Execute the transform and output the results to a writer.
    xslt.Transform(xmlTree.CreateReader(), writer);
}


You can find the DocBook XSL stylesheets distribution at http://wiki.docbook.org/topic/DocBookXslStylesheets
Also have a look at the list of DocBook tools available at http://wiki.docbook.org/DocBookTools

Tuesday, September 1, 2015

Generating DocBook xml from C#.net

DocBook is a semantic markup language for technical documentation. As a semantic language, DocBook enables its users to create document content in a presentation-neutral form that captures the logical structure of the content; that content can then be published in a variety of formats, including HTML, XHTML, EPUB, PDF, man pages, Web help and HTML Help, without requiring users to make any changes to the source. The conversion from DocBook to other formats may be performed via XSLT transformations. Hence DocBook is quite useful in content authoring as a generic format to store data.

If maintainability and ease of implementation are considered, one good approach would be to use an equivalent of (Java) JAXB in C#.NET. Using JAXB, POJO (Plain Old Java Object) classes can be generated from XML schema definition files. The generated POJOs can be used to serialize and deserialize data to and from XML files.

Following libraries and tools can be used to generate POCO (Plain Old C# Objects) from DocBook schema files.


In my attempt, I tried xsd.exe, which is available with the VS SDK, to generate C# classes from the DocBook XSD file. Here is how to use it.

  1. Download the XSD schema file from http://www.docbook.org/xml/5.0/xsd/docbook.xsd and place it in a folder. (I have used the DocBook version 5.0 schema)
  2. Open the Visual Studio command prompt and cd to the folder which contains docbook.xsd
  3. Run the command xsd docbook.xsd /c /l:CS

If you get the following errors while generating classes from the DocBook XSD, the easiest solution is to comment out the lines stated in the warnings and execute the same xsd.exe command again :)

Microsoft (R) Xml Schemas/DataTypes support utility
[Microsoft (R) .NET Framework, Version 4.0.30319.18020]
Copyright (C) Microsoft Corporation. All rights reserved.
Schema validation warning: The 'http://www.w3.org/1999/xlink:href' attribute is not declared. Line 46, position 6.
Schema validation warning: The 'http://www.w3.org/1999/xlink:type' attribute is not declared. Line 47, position 6.
Schema validation warning: The 'http://www.w3.org/1999/xlink:role' attribute is not declared. Line 48, position 6.
Schema validation warning: The 'http://www.w3.org/1999/xlink:arcrole' attribute is not declared. Line 49, position 6.
Schema validation warning: The 'http://www.w3.org/1999/xlink:title' attribute is not declared. Line 50, position 6.
Schema validation warning: The 'http://www.w3.org/1999/xlink:show' attribute is not declared. Line 51, position 6.
Schema validation warning: The 'http://www.w3.org/1999/xlink:actuate' attribute is not declared. Line 52, position 6.
Schema validation warning: The 'http://www.w3.org/1999/xlink:label' attribute is not declared. Line 7515, position 8.
Schema validation warning: The 'http://www.w3.org/1999/xlink:from' attribute is not declared. Line 7522, position 8.
Schema validation warning: The 'http://www.w3.org/1999/xlink:to' attribute is not declared. Line 7523, position 8.
Warning: Schema could not be validated. Class generation may fail or may produce incorrect results.
Error: Error generating classes for schema 'docbookV5'.
  - The attribute href is missing.
If you would like more help, please type "xsd /?".

Though I was able to generate POCOs with xsd.exe, the generated classes resulted in a StackOverflowException when trying to initialize an XmlSerializer with them.

As a remedy, Xsd2Code can successfully generate the POCOs instead of xsd.exe. The steps to use Xsd2Code can be found at http://xsd2code.codeplex.com/. The classes generated by Xsd2Code worked well without any issue.

After generating the POCOs, you can include them in your Visual Studio project and use them to write to and read from DocBook XML files by serializing and deserializing. System.Xml.Serialization.XmlSerializer can be used for this purpose. The following utility class can be used for that.

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Web;
using System.Xml.Serialization;

namespace DocBookXml
{
    public static class XmlConverter<T>
    {
        private static readonly XmlSerializer _serializer = null;

        /// <summary>
        /// Static constructor that initialises the serializer for this type
        /// </summary>
        static XmlConverter()
        {
            _serializer = new XmlSerializer(typeof(T));
        }

        /// <summary>
        /// Deserialize the supplied XML into an object
        /// </summary>
        public static T ToObject(string xml)
        {
            return (T) _serializer.Deserialize(new StringReader(xml));
        }

        /// <summary>
        /// Serialize the supplied object into XML
        /// </summary>
        public static string ToXML(T obj)
        {
            using (var memoryStream = new MemoryStream())
            {
                _serializer.Serialize(memoryStream, obj);

                return Encoding.UTF8.GetString(memoryStream.ToArray());
            }
        }
    }
}
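To show what serializing and deserializing with XmlSerializer looks like end to end, here is a minimal round-trip sketch. The Note class is a hypothetical stand-in that is far simpler than the classes generated from the DocBook schema, and RoundTripDemo is likewise a name made up for this example.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-in for a generated DocBook class.
public class Note
{
    public string Title { get; set; }
    public string Body { get; set; }
}

public static class RoundTripDemo
{
    // Serializes a Note into an XML string.
    public static string Serialize(Note note)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Note));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, note);
            return writer.ToString();
        }
    }

    // Reads a Note back from an XML string.
    public static Note Deserialize(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Note));
        using (StringReader reader = new StringReader(xml))
        {
            return (Note)serializer.Deserialize(reader);
        }
    }
}
```

Calling RoundTripDemo.Deserialize(RoundTripDemo.Serialize(note)) should give back an object with the same Title and Body. The same pattern applies to the generated DocBook classes, only with a much larger object graph.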

Reference documentation for DocBook XSL transforms
HTML edition of book explaining the use of DocBook XSL

Friday, December 6, 2013

The Speech I made When I Finished my District Presidency Term at 8th Annual Leo District Conference 306 A2

Mark Twain once said, "The man who does not read good books has no advantage over the man who cannot read them"

Chief Guest of the Occasion District Governor Elect, Lion Asanga Samarasekara MJF JP, all the other dignitaries at the head table, Leo leaders, Lion leaders, Leos, Lions, parents and my friends.
In life, if we do not achieve results, even if we have superb skills, we will be equivalent to those who don’t have them. Leos, we are in a movement that teaches us to achieve results. That’s how Leos get differentiated among others.

11 months back, at a similar District Conference, I made an acceptance speech when I was appointed as the District President of 306 A2. And I have brought that script today. If you can remember, I said there that this will be the “Year of Leadership and Fellowship”. And I asked everyone to take the leadership of their roles, and become “Leaders without Titles”. Leos, as pledged, we now have lots of new blood taking leadership within our district. I was counting the number of new Leos we have inducted during the year; it was around 351 members. I must thank every Leo member of my district, for taking the leadership and for working like one big family for a year.

Looking back at the time we have passed, I feel very happy and proud of being able to lead such a wonderful set of brothers and sisters in 306 A2. We have extended 9 Leo Clubs, and expanding our horizons to reach the boundary of A2, Monaragala, is a milestone in the history. That was the highest number of club extensions in the Multiple 306 A2 for the year and we were awarded for it at the Lions Multiple Convention. We brought the first ever school clubs to the district, which was in our dreams for many years. Now 306 A2 has reached the milestone of 30 Leo Clubs.

Not only that, Leoistic year 2012/2013 has added a lot more milestones to the history. We were able to keep a very strong administration in the district. The actual credit for that goes to all my Club Presidents, Council Officers and District Chairman for Leos for the wonderful job done on that. They are the people who have done the hard work, and I just cheered them on. The Presidents’ Camp was held in Kitulgala at the very beginning of the year, where we empowered Council Officers and Club Presidents with knowledge and the right attitude, and built that strong bond of friendship between them.
Leos, the year we have just passed is full of achievements. The Leo Camp held in February was another record breaking event in the history. We had over 120 Leos taking part in the camp in Matale, making it the camp with the highest participation in the history. The first ever A2 Sports Day, Healing with Talents – talents show, and the newsletter “HeartBeat” have added colors to it. Remember Leos, during this short time, we had 2 Leo Camps, a Sports Day, a Talents Show while maintaining all the other annual events.

Not only locally, we have achieved a lot at the international level also. The District President’s Malaysia visit to attend the International Relations Camp was a peak of it. You can see we have brought benefits from there to our Leos, so that we are sending the Leo of the Year and the Most Outstanding Leo Club President to Malaysia for a Youth Exchange Program next July. Our October Membership campaign was a great success, bringing 156 new members to the district. We received 52 October Membership Growth Awards for that. As leaders, we can be truly proud of our world class achievements during this year, especially being the world’s top Leo District to win the highest number of Leo October Membership Growth Awards. Today, I can very confidently say; as a Leo District, we are among the top 10 Leo Districts in the world, if properly evaluated. I owe a lot to my Leos and Lions of A2 for their commitment and the support extended to me in achieving those results. Thank you very much again Lions, Leos and friends for the wonderful work.

So, as I said in the beginning, Leos’ success is all about using our skills properly to achieve goals. And we have achieved lots of milestones this year. I am sure by now, our Lions are feeling very proud of their Leo District. As a District President, I am very proud of A2. I know it was not an easy job, especially for me, being a guy from outstation, and without having the luxury of transport, having to walk and travel by bus. But be happy, we all see the results of those today.
We have read good books this year.

Thank You.


The Speech I Made When I Took Over District Presidency in 8th Annual Leo District Installation 306 A2

United We Stand, Divided We Fall

Dignitaries of the head table, my dear Leos, Lions, parents and friends. Well, happy. I am happy to take the captainship of this wonderful district A2. I am even happier to have two special people in my Leoistic career as the chief guest and guest of honor today. Governor Duminda, he is like a father to my own Leo club, Leo Club of University of Moratuwa. And on the other side, Shehan Kumar, he was the person who inducted me into Leoism. Dear Sirs, I am honored to have you two at this head table.

Leos, this year is the “year of leadership and fellowship”. Leadership at every level is one of the key focuses for me. For that, we need to build proper relationships among us. It is easier to work with friendly and open minded people than with strangers. So that is where fellowship helps in grooming leaders.

Dear friends, today, as a country, Sri Lanka has a problem with genuine leaders. For over 30 years, we have suffered enough from wrong decisions and private agendas. Though we live in a resourceful country with talented people appreciated worldwide, we have been a third world country for generations.

We need to ask this question from our hearts, Leos, do you want to tell your children one day, that we are still living in a third world country? So what is the key; groom yourself first; and the rest of the world will then change. Grooming does not mean just having some leadership training stuff, but having correct attitudes, and the correct level of knowledge and skills set. There comes another key focus of mine for this year, education and awareness. Leos, I will send you a detailed program later, but I need you to remember these points when you do projects.

Now I want to introduce two new Leo Clubs which joined our family yesterday.

As you may already know, these clubs were inducted at yesterday’s Lions Cabinet installation. I want to extend my special thanks to Governor Duminda, PDG Lion Kamal and our orientation committee, headed by Leo Charith, and Lions Clubs of Pepiliyana Metro and Katuwawala who put lots of effort into making it a reality.

Now I may call the new clubs, please give a round of applause to welcome them. Leo Club of Pepiliyana Metro and Leo Club of Katuwawala.

Leos, if you are still wandering around, your council has already started its engine; as the international president says. This year, one of my targets is to bring five quality Leo clubs to the district. And we have already laid the foundation stones by inducting two clubs. Leos, please keep 2-3-4th February for Camp and 15th of June for Conference. We have lots more plans for you. So, start your engines now.

Finally I need to say thank you to a set of special people. District presidency is a crown. When it’s worn, it shines. But it has some burden on it. These people helped me a lot in bearing that burden so far. Leo Charith, Leo Lochana, Leo Kasun Pathirana, Leo Kasun Lokugamage, Leo Damith, Leo Naveen, Leo Hiranya, Leo Randika, Leo Sadeepa, Leo Nadeesha; sorry if I missed any names; I owe you a lot brothers and sisters. Lion Keerthi uncle, thank you too for being there for me all the time. Lion Kamal uncle, he was a key person behind these new clubs as well. Thank you very much uncle for guiding me in the correct path for shaping things up for this year. And my dear Leos and past Leos of UOM, thank you very much for being a great strength to me always.

Finally I want to request from all my Leos, please make this year a tough one for me, otherwise I will make it a tough one for you. Anyway, I am expecting your support throughout the year. Remember, individually we are drops. But together we are an ocean. Let’s not be drops; let’s be an ocean. May the triple gem bless you all.

Thank you.