PDF Split & Merge with Perl

by Atul Singh on February 20, 2017 in Administration, perl, Script, Shell Script

Sharing two utility scripts (Perl), which can be used for extracting only few pages from a PDF file and also for combining different pdf files into a single pdf files. There might be lot of situation where in, we might require only few pages from a big PDF files. So instead of storing / carrying such a big PDF files, we could extract the page or pages which we require from the original big PDF files. Also we could combine different pdf files into a single pdf file using the second script. As an example, if we have extracted pages 10-15 and pages 100-120 from a big pdf file using the pdf_extract.pl and
we can combine these two pdf files (i.e. pdf which contains pages 10-15 and pdf which contains
pages 100-120) into a single pdf file using pdf_merge.pl

NOTE : These two perl scripts uses a perl module called PDF::API2. If this is not present on your system as part of the perl installation, you can download these modules from www.cpan.org and install. Please see the installation section for more details.

These two scripts can be used on windows, unix or linux. Currently tested on Windows with active perl 5.8.8, but it should work on unix and linux as well. For the pdf_extract.pl script to work on unix and linux, please change the variable called "path_separator" to "/" instead of "\\". This variable can be seen at the starting of the script. pdf_merge.pl can be used both on windows and unix/linux without any modification

Usage:

1) pdf_extract.pl

perl pdf_extract.pl -i <input pdf file> -p <page or page range which needs to be extracted>

where
-i : Please give the full path to the input PDF file
-p : Page Number or Page range which needs to be extracted from the input PDF

example : To extract pages 3 to 5, execute

perl pdf_extract.pl -i /tmp/abc.pdf -p 3-5

example : To extract only page 3, execute

perl pdf_extract.pl -i /tmp/abc.pdf -p 3

Executing with -h option will display the usage onto the screen

Example : perl ./pdf_extract.pl -h

2) pdf_merge.pl

perl pdf_merge.pl <output pdf file with full path> <input pdf file 1> <input pdf file 2> etc

Execute the script with all the pdf file which needs to be merged.
Script will merge in the same order which is given in the input

i.e. If you execute like pdf_merge.pl /tmp/out.pdf /tmp/abc.pdf /tmp/xyz.pdf

then pages from xyz.pdf will be after pages from abc.pdf

Executing with -h option will display the usage onto the screen

Example : ./pdf_merge.pl -h

CodeBase:

README File:
pdf_extract.pl
pdf_merge.pl

Like the below page to get update
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/

About Atul Singh
I am a Data Consultant at a Canadian financial firm. My keen interests varies from Data Analytics, ML, Kubernetes, NLP to ETL. I love to blog and travel in my spare time. If you’d like to get in touch, feel free to say hello through any of the social links.

Disclaimer

The postings on this site are my own and don't necessarily represent IBM's or other companies positions, strategies or opinions. All content provided on this blog is for informational purposes and knowledge sharing only.

The owner of this blog makes no representations as to the accuracy or completeness of any information on this site or found by following any link on this site. The owner will not be liable for any errors or omissions in this information nor for the availability of this information. The owner will not be liable for any losses, injuries, or damages from the display or use of his information.

DataGenX - Atul's Scratchpad

Breaking

Monday, February 20, 2017

PDF Split & Merge with Perl

-

Follow Us

Search This Blog

Blog Archive

Disclaimer