Gnuritas

aug 17, 2009

Recompressing (optimising) PDF files

There are at least three ways to do this in Ubuntu. You will need the packages ghostscript (for all methods, but installed by default) and pdftk (for method 2), and optionally a Java Runtime Environment (for method 3).

Method 1: ps2pdf

The ps2pdf script that comes with Ghostscript is meant to convert PostScript to PDF, but it will happily take PDF files as input. Just try: ps2pdf input.pdf output.pdf

You can add Ghostscript options to control the PDF output; they go before the input file name. To get smaller files, try adding one of the presets: -dPDFSETTINGS=/ebook or -dPDFSETTINGS=/screen
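As a rough sketch, the presets could be wrapped in a small shell helper. The function name shrink_pdf is made up for this example. The accepted names are the five values documented for Ghostscript's -dPDFSETTINGS switch (screen, ebook, printer, prepress, default).

```shell
# Recompress a PDF with one of Ghostscript's -dPDFSETTINGS presets.
# shrink_pdf is a hypothetical helper name, not something Ghostscript provides.
# Usage: shrink_pdf PRESET INPUT.pdf OUTPUT.pdf
shrink_pdf() {
    sp_preset=$1
    sp_in=$2
    sp_out=$3
    # Reject anything that is not a known preset before calling ps2pdf.
    case $sp_preset in
        screen|ebook|printer|prepress|default) ;;
        *) echo "unknown preset: $sp_preset" >&2; return 2 ;;
    esac
    ps2pdf "-dPDFSETTINGS=/$sp_preset" "$sp_in" "$sp_out"
}
```

For example: shrink_pdf ebook input.pdf output.pdf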

For more advanced settings: http://pages.cs.wisc.edu/~ghost/doc/cvs/Ps2pdf.htm

Note that the screen preset converts all images to sRGB and produces PDF 1.3, which does not support all types of gradients and transparency. In some cases this may cause text to be converted into an image. Also, Ghostscript does not seem to do a great job of converting CMYK colours to sRGB, so if your colours come out looking wrong after conversion, your original document probably used CMYK colours. In such cases I first convert to a PDF file with device-independent colours (where possible; otherwise they should be kept as CMYK), and then run a second pass in which I compress the file but keep it as PDF 1.4 (or a higher version, which you should specify with a switch such as -dCompatibilityLevel=1.5):

ps2pdf -dColorConversionStrategy=/UseDeviceIndependentColor -dUseCIEColor input.pdf input-ciecolor.pdf

ps2pdf -dColorConversionStrategy=/UseDeviceIndependentColor -dUseCIEColor -dColorImageDownsampleType=/Bicubic -dColorImageResolution=72 -dGrayImageResolution=72 -dMonoImageResolution=300 -dDownsampleColorImages=true -dDownsampleGrayImages=true -dDownsampleMonoImages=true -dOptimize=true -dProcessColorModel=/DeviceCMYK input-ciecolor.pdf output-screen.pdf

At least this shouldn’t make things worse with regard to colour, and it also retains your text as text.
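The two passes can be chained in one helper so that the intermediate file is cleaned up automatically. This is only a sketch: the function name recompress_keep_colour is invented here, the temporary file location is an assumption, and the switches are copied unchanged from the two commands above.

```shell
# Two-pass recompression: device-independent colour first, then downsampling.
# recompress_keep_colour is a hypothetical name. Usage: recompress_keep_colour IN.pdf OUT.pdf
recompress_keep_colour() {
    rk_src=$1
    rk_dst=$2
    [ -f "$rk_src" ] || { echo "no such file: $rk_src" >&2; return 1; }
    # Intermediate file for the colour-converted copy (location is an assumption).
    rk_tmp=$(mktemp "${TMPDIR:-/tmp}/ciecolor.XXXXXX") || return 1
    rk_status=0
    # Pass 1: convert to device-independent colour.
    ps2pdf -dColorConversionStrategy=/UseDeviceIndependentColor -dUseCIEColor \
        "$rk_src" "$rk_tmp" || rk_status=$?
    if [ "$rk_status" -eq 0 ]; then
        # Pass 2: downsample images while keeping text as text.
        ps2pdf -dColorConversionStrategy=/UseDeviceIndependentColor -dUseCIEColor \
            -dColorImageDownsampleType=/Bicubic \
            -dColorImageResolution=72 -dGrayImageResolution=72 -dMonoImageResolution=300 \
            -dDownsampleColorImages=true -dDownsampleGrayImages=true \
            -dDownsampleMonoImages=true -dOptimize=true \
            -dProcessColorModel=/DeviceCMYK \
            "$rk_tmp" "$rk_dst" || rk_status=$?
    fi
    rm -f "$rk_tmp"
    return "$rk_status"
}
```

For example: recompress_keep_colour input.pdf output-screen.pdf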

Method 2: The compress-newsletter script

The above method works well, but you will probably lose any metadata in the document, including bookmarks and author information.

If you need to keep metadata, you can try the following Perl script. It was written specifically to recompress Scribus PDF output without losing metadata:

http://www.capca.ucalgary.ca/~wdobler/utils/compress-newsletter.html
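To see whether a recompression pass actually dropped bookmarks, one option is to compare the output of pdftk's dump_data operation before and after. A minimal sketch, assuming dump_data prints one line starting with "BookmarkTitle:" per bookmark; count_bookmarks is a made-up helper name:

```shell
# Count bookmark entries in pdftk dump_data output read from standard input.
# Assumes one "BookmarkTitle:" line per bookmark; count_bookmarks is hypothetical.
count_bookmarks() {
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    grep -c '^BookmarkTitle:' || true
}
```

Run pdftk input.pdf dump_data | count_bookmarks and then the same for output.pdf; if the second number is lower, bookmarks were lost.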

Method 3: Multivalent

Multivalent is a Java-based browser and toolbox for digital documents: http://multivalent.sourceforge.net/

You’ll need to download the jar file. It has a recompression class that you can run from the command line. For example:

CLASSPATH=":/usr/local/lib/Multivalent20060102.jar" java tool.pdf.Compress -jpeg input.pdf

(This assumes, of course, that you’ve moved the jar file to /usr/local/lib first, e.g. sudo mv ~/Desktop/Multivalent20060102.jar /usr/local/lib)
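If you use this often, the call can be wrapped in a function that first checks that the jar file is really there. This is a sketch only: mv_compress and the MULTIVALENT_JAR variable are names invented for this example, and java -cp is used in place of setting CLASSPATH.

```shell
# Default jar location; override by setting MULTIVALENT_JAR before calling.
# Both MULTIVALENT_JAR and mv_compress are hypothetical names.
MULTIVALENT_JAR=${MULTIVALENT_JAR:-/usr/local/lib/Multivalent20060102.jar}

# Usage: mv_compress INPUT.pdf
mv_compress() {
    # Fail early with a clear message instead of a Java class-not-found error.
    [ -f "$MULTIVALENT_JAR" ] || { echo "jar not found: $MULTIVALENT_JAR" >&2; return 1; }
    java -cp "$MULTIVALENT_JAR" tool.pdf.Compress -jpeg "$@"
}
```

For example: mv_compress input.pdf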

This method is easy, but may or may not work well depending on the input file, so be sure to always check your output!
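Part of checking the output is seeing how much space was actually saved. A small sketch of that one step, with percent_saved as a made-up helper name; it only compares file sizes, so you still need to open the result and look at it.

```shell
# Print how many percent smaller the second file is than the first (integer, rounded down).
# percent_saved is a hypothetical helper. Usage: percent_saved ORIGINAL.pdf RECOMPRESSED.pdf
percent_saved() {
    [ -f "$1" ] && [ -f "$2" ] || { echo "usage: percent_saved ORIGINAL RECOMPRESSED" >&2; return 1; }
    # wc -c may pad its output with spaces; arithmetic expansion normalises it.
    ps_before=$(( $(wc -c < "$1") ))
    ps_after=$(( $(wc -c < "$2") ))
    [ "$ps_before" -gt 0 ] || { echo "empty file: $1" >&2; return 1; }
    echo $(( (ps_before - ps_after) * 100 / ps_before ))
}
```

For example: percent_saved input.pdf output.pdf. A negative number means the "compressed" file came out larger than the original.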