Python converting PDF to Image

I have a task: generate thumbnails of uploaded PDFs. There seems to be no really solid solution yet, just garbage on the surface of the Google results. After googling for a while I found many ways to do it, e.g. running an external command-line tool from Python through stdin/stdout. That might work for you too, but it did not seem very pythonic to me, so I searched for a better solution. My current one is to install ImageMagick and the MagickWand binding.

Install ImageMagick.

I used PIL a while ago to work with images. It made me cry until I met sorl-thumbnail, which helped me a lot. But now I have to deal with PDFs, and ImageMagick seems like a complete solution to handle it all. It has to convert PDF to the format I actually want: JPEG. To install ImageMagick I used brew, like this:

brew install imagemagick

There are many other ways to install it, depending on the platform, but I strongly recommend looking at brew.

Anyway, installing ImageMagick is tricky. For it to work with PDFs we also need the freetype and ghostscript packages. If ghostscript is missing, you may get an error like this:
wand.exceptions.DelegateError: Postscript delegate failed 'file.pdf': No such file or directory @ error/pdf.c/ReadPDFImage/682
If freetype is missing, your PDF will be rendered without fonts. So make sure both of them are installed.
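
One way to catch a missing Ghostscript before the first conversion fails is to look for its executable on the PATH. This is only a minimal sketch and not part of Wand: `find_ghostscript` is a hypothetical helper name, and the executable names (`gs` on macOS and Linux, `gswin64c` or `gswin32c` on Windows) are the usual ones but are an assumption about your install. It says nothing about freetype, which has no executable to look for.

```python
import shutil

# Usual Ghostscript executable names (an assumption; adjust for your install).
GHOSTSCRIPT_NAMES = ("gs", "gswin64c", "gswin32c")


def find_ghostscript(names=GHOSTSCRIPT_NAMES):
    """Return the path of the first Ghostscript executable on PATH, or None."""
    for name in names:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    gs_path = find_ghostscript()
    if gs_path is None:
        print("Ghostscript not found: reading PDFs will probably fail")
    else:
        print("Ghostscript found at", gs_path)
```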

Installing Wand

There are several high-level ImageMagick bindings for Python, but I have chosen Wand.
Installation strongly depends on the platform, but nowadays, fortunately, I can do:

pip install Wand

And I'm happy with it.
Wand is simple enough for my task: I can convert a PDF to an image and apply simple transformations of my choice.
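
To confirm that Wand actually found the ImageMagick library after `pip install`, something like the following may help. It is a sketch: `imagemagick_version` is a hypothetical helper name, and it relies on `wand.version.MAGICK_VERSION`, which Wand only defines when it can load the library.

```python
def imagemagick_version():
    """Return the ImageMagick version string Wand is linked against.

    Returns None when Wand is not installed or cannot load ImageMagick.
    """
    try:
        # The import fails if Wand is missing or the MagickWand library is not found.
        from wand.version import MAGICK_VERSION
    except ImportError:
        return None
    return MAGICK_VERSION


if __name__ == "__main__":
    version = imagemagick_version()
    if version is None:
        print("Wand could not load ImageMagick")
    else:
        print(version)
```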

Working with it

Now that we have everything installed, we can convert a PDF into an image and resize it afterwards.

from wand.image import Image
# Converting first page into JPG
with Image(filename="/thumbnail.pdf[0]") as img:
    img.save(filename="/temp.jpg")
# Resizing this image
with Image(filename="/temp.jpg") as img:
    img.resize(200, 150)
    img.save(filename="/thumbnail_resize.jpg")

I'm sure there are better solutions. Note that this is a simplified example to show the main idea of the method.
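
One possible refinement, sketched under a few assumptions, is to skip the temporary file and keep the page's aspect ratio, since `resize(200, 150)` stretches the page to exactly that size. The names `fit_within` and `pdf_thumbnail` are hypothetical, the code assumes Python 3, and the default `resolution=150` is an arbitrary choice (a commenter reports that 300 worked well for OCR).

```python
def fit_within(width, height, max_width, max_height):
    """Scale (width, height) to fit inside the box, keeping the aspect ratio."""
    scale = min(max_width / width, max_height / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


def pdf_thumbnail(pdf_path, out_path, max_width=200, max_height=150, resolution=150):
    """Render the first page of pdf_path and save a thumbnail to out_path."""
    # Imported here so that fit_within can be used without Wand installed.
    from wand.image import Image

    # "[0]" selects the first page; resolution is the rendering DPI.
    with Image(filename=pdf_path + "[0]", resolution=resolution) as img:
        new_width, new_height = fit_within(
            img.width, img.height, max_width, max_height
        )
        img.resize(new_width, new_height)
        img.save(filename=out_path)


# Example: a portrait 612x792 page fits into a 200x150 box as 116x150.
print(fit_within(612, 792, 200, 150))
```

Calling `pdf_thumbnail("/thumbnail.pdf", "/thumbnail_resize.jpg")` would then replace both steps of the example above, assuming ImageMagick and Ghostscript are installed.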

Feel free to suggest a better solution in the comments.

Comments

Mark Henry (Sun May 31, 08:37:00 AM GMT+3)
Hey Lurii! Thanks for sharing this incredible piece of information. Let me tell you about my experience. I have been using this JPG to PDF converter. It is quite nice and, moreover, it's free and cool.
Veechy (Mon Jun 01, 10:53:00 PM GMT+3)
Thanks for the post! I noticed that when I did this, the image quality was rather poor. I tried resizing the image before saving it, but that ruined the image. It basically looked like a blank white page, with just a few specks here and there. Do you have any advice? Thanks!
Unknown (Wed Jun 10, 06:56:00 PM GMT+3)
On some systems you will also need to install Ghostscript in order to read PDFs in ImageMagick.
Unknown (Thu Jul 30, 07:25:00 PM GMT+3)
I had a problem with quality as well when converting a multi-page PDF to a bunch of PNGs. Try this:

with Image(filename="/thumbnail.pdf[0]", resolution=300) as img:

300 worked fine for me since I'm just using OCR on the images; you may need to bump the resolution up.
Unknown (Mon Dec 14, 06:57:00 PM GMT+2)
PDFDelegateFailed `The system cannot find the file specified.
' @ error/pdf.c/ReadPDFImage/800

I am getting this error! What should I do, man?
Unknown (Mon Dec 14, 06:58:00 PM GMT+2)
PDFDelegateFailed `The system cannot find the file specified.
' @ error/pdf.c/ReadPDFImage/800

I am getting some path error. How should I solve this?
Unknown (Tue Dec 15, 06:26:00 AM GMT+2)
Okay, I solved the problem. Sometimes the PDF is in binary form, so we need to give commands like this.

from wand.image import Image

# PDFTemplateResponse is assumed to come from django-wkhtmltopdf;
# request and render_data come from the surrounding Django view.
response = PDFTemplateResponse(
    request,
    template='reports/report_email.html',
    filename='weekly_report.pdf',
    context=render_data,
    cmd_options={'load-error-handling': 'ignore'})

# Converting the in-memory PDF (bytes) into JPG
with Image(blob=response.rendered_content) as img:
    img.save(filename="/temp.jpg")
# Resizing this image
with Image(filename="/temp.jpg") as img:
    img.resize(200, 150)
    img.save(filename="/thumbnail_resize.jpg")
Unknown (Fri Apr 01, 02:39:00 PM GMT+3)
Awesome thanks
Unknown (Sun Dec 18, 11:11:00 AM GMT+2)
Dude I want to hug you :)....


Programmer blog
