Generate Thumbnail Images from PDF Documents in .NET

Using the Code

The code is quite simple with a try/catch over the main body. It is purposely in one large block so it's easy to see what it happening and to step through and examine with the debugger.

Initially we create an instance of AcroExch.PDDoc using late-binding. The referenced Adobe Acrobat 5.0 Type Library (Acrobat.tlb from C:\Program Files\Adobe\Acrobat 5.0 SDK\InterAppCommunicationSupport\Headers) does not expose a COM class you can create using early-binding. By referencing the type library we can get the Intellisense and strong-typing of the other Acrobat objects.

Pass the filename of the PDF documents to be opened to the PDDoc object, which can then be accessed to get metadata on the document; GetNumPages() and GetInfo() for custom document properties.

' Create the document (Can only create the AcroExch.PDDoc object using
' late-binding)
pdfDoc = CreateObject("AcroExch.PDDoc")

' Open the document
ret = pdfDoc.Open(inputFile)

If ret = False Then
Throw New FileNotFoundException
End If
' Get the number of pages
pageCount = pdfDoc.GetNumPages()

Set a reference to the first page of the document as pdfPage, which is of type Acrobat.CAcroPDPage. From this we can get a rectangle object of the actual page dimensions. One strange point to notice here is that the Adobe Acrobat SDK documentation seems incorrect, as the PDFRect that is returned from the GetSize() method has IDispatch properties x, y but the PDFRect we need to supply to CopyToClipboard must have left, right, top, bottom.

Finally we render the PDF page to the clipboard at full size. We could have Acrobat scale the image down for us by a percentage, but we can get better visual results using the .NET scaling algorithms of the Bitmap class.

It would have been more efficient to render directly to an off-screen bitmap, and also not have overwritten what ever was previously on the clipboard, but I found the clipboard method the most stable way to get a rendered bitmap of the page using Acrobat.

Although it looks like the pdfPage object has a DrawEx method that can take an H<CODE>DC I couldn't get the method to work in a consistently successful way. Calling DrawEx in the paint event of a Windows Forms application did work but it still wouldn't write to an off-screen bitmap directly. Therefore the clipboard method is used and if the process runs on a batch server it won't cause too much worry.

Note: the Draw method is deprecated, as it only works on Win16 systems where hWnd was unique to Windows and not to each process as on NT.

' Get the first page
pdfPage = pdfDoc.AcquirePage(0)

' Get the size of the page
' This is really strange bug/documentation problem
' The PDFRect you get back from GetSize has properties
' x and y, but the PDFRect you have to supply CopyToClipboard
' has left, right, top, bottom
pdfRectTemp = pdfPage.GetSize

' Create PDFRect to hold dimensions of the page
pdfRect = CreateObject("AcroExch.Rect")

pdfRect.Left = 0
pdfRect.right = pdfRectTemp.x
pdfRect.Top = 0

pdfRect.bottom = pdfRectTemp.y

' Render to clipboard, scaled by 100 percent (ie. original size)
' Even though we want a smaller image, better for us to scale in .NET
' than Acrobat as it would greek out small text
' see http://www.adobe.com/support/techdocs/1dd72.htm
Call pdfPage.CopyToClipboard(pdfRect, 0, 0, 100)

Dim clipboardData As IDataObject = Clipboard.GetDataObject()

Grab the rendered page bitmap from the clipboard and based on the pdfRectTemp object determine if it's a portait or landscape document. Set the correct file to load as the template, and if it is landscape, switch the width and height.

Dim pdfBitmap As Bitmap = clipboardData.GetData(DataFormats.Bitmap)

' Size of generated thumbnail in pixels
Dim thumbnailWidth As Integer = 38
Dim thumbnailHeight As Integer = 52
Dim templateFile As String
' Switch between portrait and landscape
If (pdfRectTemp.x < pdfRectTemp.y) Then

templateFile = templatePortraitFile
Else
templateFile = templateLandscapeFile
' Swap width and height (little trick not using third temp variable)
thumbnailWidth = thumbnailWidth Xor thumbnailHeight
thumbnailHeight = thumbnailWidth Xor thumbnailHeight
thumbnailWidth = thumbnailWidth Xor thumbnailHeight

End If

Load the template file as as Bitmap and as an Image. We use both because the Bitmap class supports MakeTransparent and the image can easily be passed to the Graphics.DrawImage() method. It is slightly inefficent but speed isn't the primarly objective for this application.

Render the pdfImage using the GetThumbnailImage() method of the .NET Framework Bitmap class, this provides a very smooth scaled version of the image.

Next create a blank bitmap with room for the template border. Set the templateBitmap to use the bottom-left pixel of the image as the transparency colour using calling MakeTransparent(). See an article on Chris Sells website for more on transparencies in .NET.

Using the new blank bitmap, draw the rendered pdf page image to it and then the template with transparency directly over the top. Because it is transparent the main area of the page template will still appear through.

Finally, save the composited image back as a .png or .gif file, although .png does look better.

' Load the template graphic
Dim templateBitmap As Bitmap = New Bitmap(templateFile)

Dim templateImage As Image = Image.FromFile(templateFile)

' Render to small image using the bitmap class
Dim pdfImage As Image = pdfBitmap.GetThumbnailImage(thumbnailWidth, _
thumbnailHeight, _
Nothing, Nothing)


' Create new blank bitmap (+ 7 for template border)
Dim thumbnailBitmap As Bitmap = New Bitmap(thumbnailWidth + 7, _
thumbnailHeight + 7, _
Imaging.PixelFormat.Format32bppArgb)

' To overlayout the template with the image, we need to set the transparency
' http://www.sellsbrothers.com/writing/default.aspx?
' content=dotnetimagerecoloring.htm
templateBitmap.MakeTransparent()

Dim thumbnailGraphics As Graphics = Graphics.FromImage(thumbnailBitmap)

' Draw rendered pdf image to new blank bitmap
thumbnailGraphics.DrawImage(pdfImage, 2, 2, thumbnailWidth, thumbnailHeight)


' Draw template outline over the bitmap (pdf with show through the
' transparent area)
thumbnailGraphics.DrawImage(templateImage, 0, 0)

' Save as .png file
thumbnailBitmap.Save(outputFile, Imaging.ImageFormat.Png)

Write some feedback to the console as we work through each of the files.

Then actively release the reference code to the COM objects as Acrobat it isn't the best suited application to opening and closing multiple PDF documents without falling over. Luckily the code doesn't cause Acrobat to display any UI that might cause the process to hang waiting for user interaction.

Console.WriteLine("Generated thumbnail... {0}", outputFile)

thumbnailGraphics.Dispose()

pdfDoc.Close()
Marshal.ReleaseComObject(pdfPage)
Marshal.ReleaseComObject(pdfRect)
Marshal.ReleaseComObject(pdfDoc)

You might also like...

Comments

Contribute

Why not write for us? Or you could submit an event or a user group in your area. Alternatively just tell us what you think!

Our tools

We've got automatic conversion tools to convert C# to VB.NET, VB.NET to C#. Also you can compress javascript and compress css and generate sql connection strings.

“Better train people and risk they leave – than do nothing and risk they stay.” - Anonymous