IT Services And Support

Email : ren@techsolus.co.uk
Mobile Phone : 0788 68 41 411
Answerphone : 01204 469683

iTextSharp HTML to PDF, VB.Net...The Basics - 09/11/2011

Oh...My...Goodness!

People write the most complex, convoluted and incredibly unnecessary code.  All I wanted to do was take some HTML from a screen scrape and save it as a PDF file, whilst using ASP.Net on a website.  There are some commercial bits of software out there and I'm sure they're super clever and super easy to use...but I'm tight and my customer is tight too, so we did not want to spend any money.

iTextSharp is free to use.  It also appears to be the most popular.  But of course I've never used it before so I'm at the bottom of a steep learning curve.  I spent the best part of a day trawling through endless lines of complex code, forums full of patronising experts and dead-end paths to spamming pages.  Here's what I've come up with...

At the top of the page....

<%@ Page Language="vb" Debug="true" ValidateRequest="false" %>
<%@ Import Namespace="System.Net" %>
<%@ Import Namespace="System.IO" %>
<%@ Import Namespace="iTextSharp" %>
<%@ Import Namespace="iTextSharp.text" %>
<%@ Import Namespace="iTextSharp.text.pdf" %>
<%@ Import Namespace="iTextSharp.text.html" %>

And the code...

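Sub Page_Load()

    ' Do some code here to get the HTML; that HTML is saved in the string variable "myhtml"

    ' A4 page with 20 point margins (left, right, top, bottom)
    Dim mydoc As New Document(PageSize.A4, 20, 20, 20, 20)

    ' Attach a writer that sends the PDF to invoice.pdf in the \x\pdf\ folder
    PdfWriter.GetInstance(mydoc, New FileStream(Server.MapPath("\x\pdf\") & "invoice.pdf", FileMode.Create))

    mydoc.Open()

    ' HTMLWorker turns the HTML into PDF content and adds it to mydoc
    Dim myhtmlworker As New simpleparser.HTMLWorker(mydoc)

    myhtmlworker.Parse(New StringReader(myhtml))

    ' Closing the document finishes off the PDF file
    mydoc.Close()

End Sub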

So that's the code, but that may not mean a lot to you.  The first thing is to install iTextSharp into your application.  

Some forums said if you don't know how to put a .DLL file into your site and reference it then you shouldn't be coding!  Well I've been coding for 30 years now, and I've never done it, so screw them.  What you do is go to http://itextpdf.com/ and follow the links to download a DLL file.  There's a handful of choices, you want the DLL thingy.  A DLL is a "Dynamic Link Library"...no...me neither...but it's basically a bit of compiled code that's ready to run.  Now if, like mine, your "application" is really just a website running on a Windows box then the DLL file needs to be put in the "bin" folder.  The bin folder lives in the root of the application, which is typically the root of the web folder.

So I've put the DLL file in the bin folder, smashing.  Everyone says I need to reference it, I'm thinking the app needs to know it's there and I somehow have to tell something somewhere that it's there.  Nope.  ASP.NET picks up any DLL in the bin folder automatically.  Just be sure to have the right "Import" lines at the top of the page, or if ya use code behind then "Imports" (the C# equivalent is "using")

<%@ Import Namespace="iTextSharp" %>
<%@ Import Namespace="iTextSharp.text" %>
<%@ Import Namespace="iTextSharp.text.pdf" %>
<%@ Import Namespace="iTextSharp.text.html" %>

In this instance I'm trying to convert HTML to PDF.  I do some code to get the HTML, in this case it's a screen scrape, yours might be hard coded HTML, compiled and whatnot, but you need to end up with a string that contains the HTML.  Not going to show that, that's for another article.  We are going to take the HTML and turn it into a PDF...

dim mydoc As New Document(PageSize.A4, 20, 20, 20, 20)

mydoc is the variable or object that becomes the PDF file.  "New Document" is iTextSharp's way of starting a new document, here an A4 page with four margins of 20 points each

PdfWriter.getInstance(mydoc, new FileStream(server.mappath("\x\pdf\")  & "invoice.pdf", FileMode.Create))

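PdfWriter is another iTextSharp class.  This line ties mydoc to an output file, and the file is created right here: FileMode.Create makes a new invoice.pdf, or overwrites the old one if it already exists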

mydoc.Open()

Here we're opening the PDF document so we can put something in there; nothing can be added until it's open

dim myhtmlworker as new simpleparser.htmlWorker(mydoc)

Here we create an object, myhtmlworker, that can turn HTML into PDF speak and add the result to mydoc

myhtmlWorker.Parse(new StringReader(myhtml))

Make the HTML to PDF translator translate the HTML string

mydoc.Close()

Make sure we close the document.  Closing is what finishes writing the PDF to the file, and if you skip it smelly geeks will tell you off for leaving untidy bits of file lying around in memory

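If you want the tidying up to happen even when the parse blows up halfway, something like the sketch below might do it.  It is only a rough sketch under assumptions: SaveHtmlAsPdf and outputPath are names made up for illustration (they are not part of iTextSharp), and it assumes the module sits somewhere the site compiles it, such as App_Code.

```vbnet
Imports System.IO
Imports iTextSharp.text
Imports iTextSharp.text.pdf
Imports iTextSharp.text.html.simpleparser

Public Module PdfSketch

    ' Hypothetical helper (not part of iTextSharp): writes the HTML in myhtml
    ' to a PDF file at outputPath.
    Public Sub SaveHtmlAsPdf(ByVal myhtml As String, ByVal outputPath As String)

        ' Using releases the file handle even if something below throws
        Using fs As New FileStream(outputPath, FileMode.Create)

            Dim mydoc As New Document(PageSize.A4, 20, 20, 20, 20)
            PdfWriter.GetInstance(mydoc, fs)
            mydoc.Open()

            Try
                Dim worker As New HTMLWorker(mydoc)
                worker.Parse(New StringReader(myhtml))
            Finally
                ' Close() finishes the PDF. Note it can throw itself if
                ' nothing was ever added to the document.
                mydoc.Close()
            End Try

        End Using

    End Sub

End Module
```

From Page_Load you would then call SaveHtmlAsPdf(myhtml, Server.MapPath("\x\pdf\") & "invoice.pdf") in place of the inline code.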
 

So...er...how well does it work?  Not brilliantly.  For simple HTML it works quite well, but so far it does not even attempt to parse CSS, inline or otherwise.  It seems to handle an image though, which is good.  There are countless forums and reams of documentation out there; this page is just to get you started rather than wallowing through lines upon lines of super complicated code.

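One possible workaround for the missing CSS is the StyleSheet class that lives alongside HTMLWorker in the simpleparser namespace, which lets you attach attributes to tags by hand.  Treat the following as a sketch under assumptions rather than a recipe: SaveStyledHtmlAsPdf and outputPath are made-up names, the "color" and "face" keys are the worker's own attribute names rather than real CSS properties, and how well they are honoured may vary between iTextSharp versions.

```vbnet
Imports System.IO
Imports iTextSharp.text
Imports iTextSharp.text.pdf
Imports iTextSharp.text.html.simpleparser

Public Module StyledPdfSketch

    ' Hypothetical helper (not part of iTextSharp): same job as the basic
    ' example, but with a few tag styles applied by hand.
    Public Sub SaveStyledHtmlAsPdf(ByVal myhtml As String, ByVal outputPath As String)

        ' Assumption: these tag-level attributes stand in for the CSS
        ' that the HTML worker does not read.
        Dim styles As New StyleSheet()
        styles.LoadTagStyle("h1", "color", "#336699")
        styles.LoadTagStyle("pre", "face", "courier")

        Dim mydoc As New Document(PageSize.A4, 20, 20, 20, 20)
        PdfWriter.GetInstance(mydoc, New FileStream(outputPath, FileMode.Create))
        mydoc.Open()

        ' ParseToList hands back the parsed elements instead of adding them
        ' to a document itself, so we add each one to mydoc ourselves.
        For Each element As IElement In HTMLWorker.ParseToList(New StringReader(myhtml), styles)
            mydoc.Add(element)
        Next

        mydoc.Close()

    End Sub

End Module
```

This only covers styling you are willing to spell out tag by tag; it does not make the worker read a stylesheet or style attributes from the HTML itself.
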
Comments

Gibo: thanks for the info. quick, simple idiot proof and it worked for me.
Raz: Awesome Man. Thank you.
Paul: Thanks very much for this - just what I needed in 2015 :-) Sometimes the simplest solutions are the best!