PDF Tools: Merging, Extracting and Cropping

13 Apr 2012

Category: utility. Tags: pdf, ghostscript, pdfcrop.

PDF manipulations such as merging, extracting and cropping are everyday tasks. Yet many people find them difficult simply because they are unaware of some excellent PDF tools. I’m going to introduce a few of them.

Ghostscript

The first tool (or toolkit, more precisely) is Ghostscript. On Arch Linux, simply install it with pacman: pacman -S ghostscript.

Ghostscript is very powerful and complicated; PDF manipulation is only a small part of it. The main concept in the Ghostscript toolkit is the DEVICE, which determines the output format. Fortunately, we only need to know the pdfwrite device to manipulate PDFs.

Merging and Extracting

The command line to merge PDFs:

gs -sDEVICE=pdfwrite -dBATCH -dNOPAUSE -dQUIET -sOutputFile=out.pdf in1.pdf in2.pdf

out.pdf is the name of the output PDF, and in1.pdf and in2.pdf are the input PDFs, merged in the order given. The command line to extract pages x through y from a PDF:

gs -sDEVICE=pdfwrite -dBATCH -dNOPAUSE -dQUIET -dFirstPage=x -dLastPage=y -sOutputFile=out.pdf in.pdf
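If you use these two commands often, they can be wrapped in small shell functions. The following is only a sketch: the names pdf_merge and pdf_extract are made up for this post and are not part of Ghostscript, and the argument checks are an addition of mine. The gs options are exactly the ones shown above.

```shell
# Hypothetical helpers; only the gs options come from the commands above.

# pdf_merge OUT IN1 [IN2 ...]: concatenate the inputs into OUT, in order.
pdf_merge() {
    if [ "$#" -lt 2 ]; then
        echo "usage: pdf_merge OUT IN1 [IN2 ...]" >&2
        return 2
    fi
    out=$1
    shift
    gs -sDEVICE=pdfwrite -dBATCH -dNOPAUSE -dQUIET \
        -sOutputFile="$out" "$@"
}

# pdf_extract FIRST LAST IN OUT: copy pages FIRST..LAST of IN into OUT.
pdf_extract() {
    if [ "$#" -ne 4 ]; then
        echo "usage: pdf_extract FIRST LAST IN OUT" >&2
        return 2
    fi
    # Reject anything that is not a plain non-negative integer.
    for n in "$1" "$2"; do
        case "$n" in
            ''|*[!0-9]*)
                echo "pdf_extract: page numbers must be integers" >&2
                return 2
                ;;
        esac
    done
    # Pages are counted from 1, and the range must not be reversed.
    if [ "$1" -lt 1 ] || [ "$1" -gt "$2" ]; then
        echo "pdf_extract: need 1 <= FIRST <= LAST" >&2
        return 2
    fi
    gs -sDEVICE=pdfwrite -dBATCH -dNOPAUSE -dQUIET \
        -dFirstPage="$1" -dLastPage="$2" \
        -sOutputFile="$4" "$3"
}
```

For example, pdf_merge book.pdf ch1.pdf ch2.pdf joins two chapters, and pdf_extract 3 7 book.pdf part.pdf copies pages 3 to 7 into part.pdf (the file names here are placeholders).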

Bounding Box

There is another interesting device in Ghostscript: bbox. This device outputs the bounding box of each page in a PDF. The bounding box is the smallest rectangle that contains all the content on a page.

gs -sDEVICE=bbox -dBATCH -dNOPAUSE -dQUIET -dFirstPage=x -dLastPage=y in.pdf
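The bbox device reports each page as a %%BoundingBox: line holding four integers (left, bottom, right, top, in points), followed by a %%HiResBoundingBox: line with fractional values. To get a feel for the numbers, here is a small sketch that turns that report into a width and height per page. The function name bbox_sizes is hypothetical, and the sketch assumes the usual output format just described.

```shell
# bbox_sizes: read Ghostscript bbox output on stdin and print
# "page N: WIDTH x HEIGHT pt" for every %%BoundingBox line.
# N counts the pages that were processed, so with -dFirstPage=x
# "page 1" means page x of the original file.
bbox_sizes() {
    awk '/^%%BoundingBox:/ {
        page++
        # Fields: $2=left $3=bottom $4=right $5=top
        printf "page %d: %d x %d pt\n", page, $4 - $2, $5 - $3
    }'
}
```

As far as I know, Ghostscript writes the bbox report to standard error, so it has to be redirected before piping: gs -sDEVICE=bbox -dBATCH -dNOPAUSE -dQUIET in.pdf 2>&1 | bbox_sizes. A line such as %%BoundingBox: 36 36 576 756 is reported as page 1: 540 x 720 pt.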

PDFCrop

The second tool is PDFCrop, a Perl script shipped with TeX distributions. It’s different from another tool that is also called PDFCrop. PDFCrop can automatically trim the whitespace margins from PDF pages. By default, PDFCrop uses the Ghostscript bbox device to determine the bounding box of each page and crops the page to that box. However, it’s also possible to specify one bounding box for all pages instead of detecting it for every page. You may also set different bounding boxes for odd and even pages, or add margins to each page. For more usage information, refer to pdfcrop --help.
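As a rough sketch of those options, the wrappers below use pdfcrop's --margins, --bbox, --bbox-odd and --bbox-even flags. The function names are hypothetical and exist only for illustration. Values are in points (bp); a bounding box is given as "left bottom right top" inside one quoted argument. Check pdfcrop --help on your system, since option details may differ between versions.

```shell
# Hypothetical wrappers around the TeX pdfcrop script.

# crop_with_margin MARGIN IN OUT: auto-detect each page's bounding box,
# then keep MARGIN points of whitespace around it.
crop_with_margin() {
    pdfcrop --margins "$1" "$2" "$3"
}

# crop_fixed_bbox "L B R T" IN OUT: skip detection and crop every page
# to the same bounding box.
crop_fixed_bbox() {
    pdfcrop --bbox "$1" "$2" "$3"
}

# crop_odd_even "ODD_BBOX" "EVEN_BBOX" IN OUT: use one box for odd pages
# and another for even pages (handy for two-sided scans).
crop_odd_even() {
    pdfcrop --bbox-odd "$1" --bbox-even "$2" "$3" "$4"
}
```

For instance, crop_with_margin 10 original.pdf cropped.pdf leaves a 10 pt border around the detected content, and crop_fixed_bbox "50 60 545 782" original.pdf cropped.pdf applies one box to all pages (the numbers are arbitrary examples).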

Cropping

To crop a PDF with PDFCrop:

pdfcrop original.pdf cropped.pdf

But there is a new problem: the cropped PDF has a different page size from the original. We should resize the cropped PDF back to the original page size. This is easy with Ghostscript:

gs -sDEVICE=pdfwrite -dBATCH -dNOPAUSE -dQUIET -sPAPERSIZE=a4 -dPDFFitPage -sOutputFile=resized.pdf cropped.pdf

In the above example, the page size of the original PDF is assumed to be A4. The -dPDFFitPage option makes sure the cropped content is scaled to fit the page, rather than the page merely being enlarged around it.
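If the original page size is not a named paper size, the target can be given in points instead. The sketch below is a variation on the command above, not a replacement for it: pdf_fit_to_points is a hypothetical name, and -dDEVICEWIDTHPOINTS, -dDEVICEHEIGHTPOINTS and -dFIXEDMEDIA are Ghostscript options that I have added on top of the original command. -dFIXEDMEDIA is there on the assumption that the page size requested on the command line should win over the size stored in the PDF.

```shell
# pdf_fit_to_points WIDTH HEIGHT IN OUT: scale every page of IN to fit a
# WIDTH x HEIGHT page (both in points, 1/72 inch) and write OUT.
pdf_fit_to_points() {
    if [ "$#" -ne 4 ]; then
        echo "usage: pdf_fit_to_points WIDTH HEIGHT IN OUT" >&2
        return 2
    fi
    gs -sDEVICE=pdfwrite -dBATCH -dNOPAUSE -dQUIET \
        -dDEVICEWIDTHPOINTS="$1" -dDEVICEHEIGHTPOINTS="$2" \
        -dFIXEDMEDIA -dPDFFitPage \
        -sOutputFile="$4" "$3"
}
```

For example, pdf_fit_to_points 612 792 cropped.pdf resized.pdf targets US Letter, which is 612 x 792 points; A4 is roughly 595 x 842 points.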