It is often convenient to pour a series of JPEG (or PNG, or GIF) files into a PDF, for example for printing or for e-mailing. Given the power of the Linux command line, this is surprisingly difficult, but I found a fairly straightforward way to do it. Skip to the bottom if you just want the oneliner.
Many websites will tell you the following:
convert *.jpg output.pdf
Easy, no? Don't do this. Why? Look at this:
-rw-r--r-- 1 thomas thomas 129826204 2009-06-15 15:29 output.pdf
-rw-r--r-- 1 thomas thomas    947022 2009-06-15 15:04 page1.jpg
-rw-r--r-- 1 thomas thomas    962956 2009-06-15 15:05 page2.jpg
-rw-r--r-- 1 thomas thomas    925291 2009-06-15 12:54 page3.jpg
-rw-r--r-- 1 thomas thomas    952717 2009-06-15 12:54 page4.jpg
-rw-r--r-- 1 thomas thomas    642471 2009-06-15 15:08 page5.jpg
The original JPG files are less than 5 MB altogether, but the resulting PDF is a whopping 124 MB! Clearly, convert (from the otherwise excellent ImageMagick bundle) re-encodes the images somehow, instead of embedding them straight into the PDF file.
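If you want to check these numbers on your own files, a small helper along these lines could do the adding up for you. This is only a sketch: the function name total_bytes is made up for illustration and is not part of ImageMagick or any other tool mentioned here.

```shell
# Print the combined size in bytes of all files given as arguments.
# total_bytes is a hypothetical helper name chosen for this example.
total_bytes() {
    cat -- "$@" | wc -c | tr -d ' '
}

# Example usage (commented out so the snippet is safe to paste anywhere):
# total_bytes page*.jpg     # size of all input images together
# total_bytes output.pdf    # size of the generated PDF
```

If the second number is many times larger than the first, the images were almost certainly re-encoded rather than embedded as they are.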
Enter the little-known utility sam2p. It comes in an Ubuntu package of the same name. In its simplest form, it converts a single image file into a PDF by embedding the image file into the PDF file. For example:
sam2p page1.jpg page1.pdf
One of the shortcomings of sam2p is that it does not allow you to set the page size directly, so you'll end up with PDFs that exactly fit the original images.
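Since sam2p handles one image at a time, converting a whole directory means calling it once per file. A minimal sketch of such a loop follows, assuming the images are named page*.jpg (or another extension passed as the first argument). The function name to_single_pdfs is hypothetical; the only sam2p invocation used is the "input output" form shown above.

```shell
# Convert every page*.<ext> image in the current directory to a PDF with the
# same base name, e.g. page1.jpg -> page1.pdf. The extension defaults to jpg.
# to_single_pdfs is a hypothetical helper name, not part of sam2p.
to_single_pdfs() {
    ext=${1:-jpg}
    for img in page*."$ext"; do
        [ -e "$img" ] || continue              # glob matched nothing
        sam2p "$img" "${img%.*}.pdf" || return 1
    done
}

# Example usage:
# to_single_pdfs        # converts page*.jpg
# to_single_pdfs png    # converts page*.png
```

Stripping the old extension with ${img%.*} gives output names like page1.pdf, and the extension only has to be specified in one place.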
Now we can generate all the pages as separate PDFs, but sam2p cannot create a PDF with multiple pages. Enter pdfjoin from the pdfjam package (available in Ubuntu under that name). It is simple to use:
pdfjoin page*.pdf --outfile output.pdf
This will use a consistent page size, so it is no problem that sam2p spit out pages of arbitrary size. It defaults to A4 paper; specify --paper letterpaper to use the Letter format.
Because I'm lazy, I wrote a little bash oneliner to do the trick, then let my readers improve upon it (thanks Mark, thanks Eamon!). It is now a twoliner, but who cares:
find . -maxdepth 1 -iname 'page*.jpg' -exec sam2p '{}' '{}'.pdf \;
pdfjoin page*.pdf --outfile output.pdf
This assumes that your input images are named page1.jpg, page2.jpg etcetera, and that there are no files named like page*.pdf in the current directory. If you have more than 9 pages, remember to prefix a zero to the single-digit page numbers (page01.jpg, page02.jpg, ...) to keep them in order. If you want to do this for PNG or other images, remember to change the extension in the find pattern.
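Prefixing the zero by hand gets tedious, so here is a rough sketch of a helper that does it for up to 99 pages. It assumes the page*.jpg naming used above; the name pad_pages is invented for this example. Run it before converting, so that the generated PDFs inherit the padded names.

```shell
# Rename page1.jpg ... page9.jpg to page01.jpg ... page09.jpg so that the
# shell's alphabetical glob order matches the page order (up to 99 pages).
# pad_pages is a hypothetical helper name; the extension defaults to jpg.
pad_pages() {
    ext=${1:-jpg}
    for f in page[0-9]."$ext"; do
        [ -e "$f" ] || continue            # glob matched nothing
        n=${f#page}                        # e.g. "7.jpg"
        target="page0$n"                   # e.g. "page07.jpg"
        [ -e "$target" ] && continue       # never overwrite an existing file
        mv -- "$f" "$target"
    done
}

# Example usage:
# pad_pages        # pads page1.jpg .. page9.jpg
# pad_pages png    # pads page1.png .. page9.png
```

Files that already have two digits, such as page10.jpg, do not match the page[0-9] pattern and are left alone.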