
Python converting PDF to Image

I have a task to generate thumbnails of uploaded PDFs, and there seem to be no really solid solutions yet, just garbage on the surface of the Google results. After googling for a while I found many ways to do it, e.g. using Python to run an external command-line tool through stdin/stdout. That might work for you too, but it did not seem very Pythonic to me, so I searched for a better solution. My current one is to install ImageMagick and Wand, a Python binding for its MagickWand API.

Install ImageMagick.

I used PIL a while ago to work with images, and it made me cry until I met sorl-thumbnail, which helped me a lot. But now I have to deal with PDFs, and ImageMagick looks like a complete solution that handles it all. It has to convert PDF to the format I actually want, JPEG. To install ImageMagick I used brew, like this:
brew install imagemagick
There are many other ways to do so, depending on the platform, but I strongly recommend looking at brew.

Anyway, installing ImageMagick is tricky. For it to work with PDFs we also need the freetype and ghostscript packages. If ghostscript is missing you can get an error like this:
wand.exceptions.DelegateError: Postscript delegate failed 'file.pdf': No such file or directory @ error/pdf.c/ReadPDFImage/682
If freetype is missing, your PDF will be rendered without fonts. So make sure both are installed.
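Since a missing Ghostscript only shows up as a delegate error at conversion time, it can help to check for it up front. The sketch below is one way to do that with the standard library only. The function name `find_ghostscript` and the list of executable names are my assumptions (`gs` is the usual name on macOS and Linux, the `gswin*` names are common on Windows), so adjust them for your setup.

```python
import shutil

# Executable names Ghostscript is commonly installed under (an assumed list).
GHOSTSCRIPT_NAMES = ("gs", "gswin64c", "gswin32c")


def find_ghostscript(names=GHOSTSCRIPT_NAMES):
    """Return the path of the first Ghostscript executable on PATH, or None."""
    for name in names:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    path = find_ghostscript()
    if path:
        print("Ghostscript found at", path)
    else:
        print("Ghostscript not found; PDF conversion will probably fail")
```

This only tells you that an executable is on PATH. It does not prove that ImageMagick was built with the delegate enabled, so treat it as a first sanity check.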

Installing Wand

There are several high-level ImageMagick bindings for Python, but I have chosen Wand as my favorite.
How you install it strongly depends on the platform, but nowadays, fortunately, I can do:
pip install Wand
And I'm happy with it.
Wand is simple enough for my task: I can convert a PDF to an image and apply simple transformations of my choice.
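To confirm that the install worked, a quick check like the following may be useful. It is only a sketch: `wand_status` is a hypothetical helper name, and it relies on Wand raising `ImportError` when the package or the ImageMagick shared library cannot be found.

```python
def wand_status():
    """Describe whether Wand and the ImageMagick library behind it can be loaded."""
    try:
        # Fails with ImportError if Wand is not installed or if it cannot
        # locate the ImageMagick shared library.
        from wand.version import MAGICK_VERSION, VERSION
    except ImportError as exc:
        return "Wand is not usable: {}".format(exc)
    return "Wand {} using {}".format(VERSION, MAGICK_VERSION)


print(wand_status())
```

If the message says Wand is not usable, the problem is usually in the ImageMagick install, not in the pip package itself.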

Working with it

Now that we have those things installed, we can convert a PDF into an image and resize it afterwards.
from wand.image import Image
# Converting first page into JPG
with Image(filename="/thumbnail.pdf[0]") as img:
    img.save(filename="/temp.jpg")
# Resizing this image
with Image(filename="/temp.jpg") as img:
    img.resize(200, 150)
    img.save(filename="/thumbnail_resize.jpg")
I'm sure there are better solutions. Note that this is a simplified example to show the point of the method.
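One possible refinement, sketched here under my own assumptions, is to skip the temporary file and keep the page's aspect ratio instead of forcing it to 200x150. The helper names `fit_within` and `pdf_thumbnail`, the default box size, and the 150 DPI default are all hypothetical choices. The code assumes Python 3, because it relies on true division.

```python
def fit_within(width, height, max_width, max_height):
    """Scale (width, height) down to fit inside the box, keeping the aspect ratio."""
    # Never scale up: the factor is capped at 1.0.
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


def pdf_thumbnail(pdf_path, out_path, max_width=200, max_height=150, resolution=150):
    """Render the first page of pdf_path and save a JPEG thumbnail to out_path."""
    # Imported here so fit_within() stays usable without Wand installed.
    from wand.image import Image

    # "[0]" selects the first page; resolution is the rendering density in DPI.
    with Image(filename=pdf_path + "[0]", resolution=resolution) as img:
        img.format = "jpeg"
        img.resize(*fit_within(img.width, img.height, max_width, max_height))
        img.save(filename=out_path)
```

For example, a US Letter page rendered at 612x792 pixels would come out as 116x150 with the default box, and `pdf_thumbnail("/thumbnail.pdf", "/thumbnail_resize.jpg")` would do the whole conversion in one step.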

Feel free to suggest a better solution in the comments.

Comments

  1. Hey Lurii! Thanks for sharing this incredible piece of information. Let me tell you about my experience. I have been using this JPG to PDF converter. It is quite nice and moreover, its free and cool.

  2. Thanks for the post! I noticed that when I did this, the image quality was rather poor. I tried resizing the image before saving it, but that ruined the image. It basically looked like a blank white page, with just a few specks here and there. Do you have any advice? Thanks!

  3. On some systems you will also need to install Ghostscript in order to read PDFs in ImageMagick.

  4. I had a problem with quality as well converting a multi-page pdf to a bunch of PNGs. Try this:

    with Image(filename="/thumbnail.pdf[0]", resolution=300) as img:

    300 worked fine for me since I'm just using OCR on the images, you may need to bump resolution up.

  5. PDFDelegateFailed `The system cannot find the file specified.
    ' @ error/pdf.c/ReadPDFImage/800

    this error is coming !! what should i do man !?

  6. PDFDelegateFailed `The system cannot find the file specified.
    ' @ error/pdf.c/ReadPDFImage/800

    i am getting some path error , how should i solve this ???

  7. okay i solved the problem, sometimes the pdf is in binary form so we need to give commands like this.

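    from wand.image import Image
    # PDFTemplateResponse is assumed to come from django-wkhtmltopdf
    from wkhtmltopdf.views import PDFTemplateResponse

    # Render the PDF first; request and render_data come from the Django view
    response = PDFTemplateResponse(
        request,
        template='reports/report_email.html',
        filename='weekly_report.pdf',
        context=render_data,
        cmd_options={'load-error-handling': 'ignore'})

    # Read the PDF from its bytes instead of a file path, then keep the first page
    with Image(blob=response.rendered_content, format='pdf') as pdf:
        with Image(image=pdf.sequence[0]) as img:
            img.format = 'jpeg'
            img.save(filename="/temp.jpg")
    # Resizing this image
    with Image(filename="/temp.jpg") as img:
        img.resize(200, 150)
        img.save(filename="/thumbnail_resize.jpg")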

