Here are two Perl utility scripts: one extracts a few pages from a PDF file, and the other combines several PDF files into a single PDF file. There are many situations where we need only a few pages from a large PDF file. Instead of storing or carrying around the whole file, we can extract just the page or pages we need. The second script combines different PDF files into one. For example, if we have extracted pages 10-15 and pages 100-120 from a large PDF file using pdf_extract.pl, we can combine these two PDF files (i.e. the PDF containing pages 10-15 and the PDF containing pages 100-120) into a single PDF file using pdf_merge.pl.
NOTE: Both scripts use a Perl module called PDF::API2. If it is not already present in your Perl installation, you can download it from www.cpan.org and install it. Please see the installation section for more details.
Both scripts can be used on Windows, Unix or Linux. So far they have been tested only on Windows with ActivePerl 5.8.8, but they should work on Unix and Linux as well. For pdf_extract.pl to work on Unix or Linux, change the variable called "path_separator" to "/" instead of "\\". This variable is defined near the top of the script. pdf_merge.pl can be used on Windows and Unix/Linux without any modification.
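As an aside on the path separator: a script can avoid a hand-edited separator by letting Perl's core File::Spec and File::Basename modules build paths for the current platform. The sketch below is only an illustration and is not taken from pdf_extract.pl; the helper name output_path_for and the "<name>_pages_<range>.pdf" naming scheme are assumptions made for the example.

```perl
use strict;
use warnings;
use File::Spec;
use File::Basename qw(fileparse);

# Hypothetical helper (not from pdf_extract.pl): build an output file name
# in the same directory as the input file, using the separator that is
# correct for the platform the script is running on.
sub output_path_for {
    my ($input, $range) = @_;

    # Split the input path into base name and directory, dropping ".pdf".
    my ($name, $dir) = fileparse($input, qr/\.pdf/i);

    # catfile joins the pieces with the platform's own separator.
    return File::Spec->catfile($dir, "${name}_pages_${range}.pdf");
}
```

With this approach the same code runs unchanged on Windows, Unix and Linux, because the separator is never written into the script.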
Usage:
1) pdf_extract.pl
perl pdf_extract.pl -i <input pdf file> -p <page or page range which needs to be extracted>
where
-i : Full path to the input PDF file
-p : Page number or page range to be extracted from the input PDF
Example : To extract pages 3 to 5, run
perl pdf_extract.pl -i /tmp/abc.pdf -p 3-5
Example : To extract only page 3, run
perl pdf_extract.pl -i /tmp/abc.pdf -p 3
Running the script with the -h option displays the usage on the screen.
Example : perl ./pdf_extract.pl -h
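For readers curious how such an extraction can be done with PDF::API2, here is a minimal sketch. It is not the code of pdf_extract.pl; the sub names parse_page_spec and extract_pages are hypothetical, and the validation rules (pages start at 1, the range must be ascending) are assumptions. It uses the long-standing PDF::API2 methods open, pages, importpage and saveas.

```perl
use strict;
use warnings;

# Turn a -p value such as "3" or "3-5" into a list of page numbers.
# Dies if the value is not a page number or an ascending range.
sub parse_page_spec {
    my ($spec) = @_;
    die "Invalid page or page range: '$spec'\n"
        unless $spec =~ /\A(\d+)(?:-(\d+))?\z/;
    my $first = $1 + 0;
    my $last  = defined $2 ? $2 + 0 : $first;
    die "Invalid page or page range: '$spec'\n"
        unless $first >= 1 && $first <= $last;
    return ($first .. $last);
}

# Copy the requested pages from $input into a new PDF saved as $output.
sub extract_pages {
    my ($input, $output, $spec) = @_;
    require PDF::API2;    # CPAN module, loaded only when actually needed

    my @pages  = parse_page_spec($spec);
    my $source = PDF::API2->open($input);
    my $total  = $source->pages;    # number of pages in the input
    die "$input has only $total page(s)\n" if $pages[-1] > $total;

    my $target = PDF::API2->new;
    # A target position of 0 appends each imported page at the end.
    $target->importpage($source, $_, 0) for @pages;
    $target->saveas($output);
    return;
}
```

A call such as extract_pages("/tmp/abc.pdf", "/tmp/abc_3-5.pdf", "3-5") would then write pages 3 to 5 to the second file. Newer PDF::API2 releases may prefer differently named equivalents of these methods, so check the documentation for the version you have installed.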
2) pdf_merge.pl
perl pdf_merge.pl <output pdf file with full path> <input pdf file 1> <input pdf file 2> etc
Run the script with all the PDF files that need to be merged.
The script merges them in the order in which they are given on the command line,
i.e. if you run pdf_merge.pl /tmp/out.pdf /tmp/abc.pdf /tmp/xyz.pdf
then the pages from xyz.pdf will come after the pages from abc.pdf
Running the script with the -h option displays the usage on the screen.
Example : ./pdf_merge.pl -h
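The merge behaviour described above (output file first, then the inputs in order) can be sketched with PDF::API2 roughly as follows. This is an illustration under assumptions, not the code of pdf_merge.pl; the sub name merge_pdfs is hypothetical.

```perl
use strict;
use warnings;

# Append every page of each input PDF, in the order given, to a new PDF
# and save it as $output.
sub merge_pdfs {
    my ($output, @inputs) = @_;
    die "Need at least one input PDF file\n" unless @inputs;
    require PDF::API2;    # CPAN module, loaded only when actually needed

    my $merged = PDF::API2->new;
    for my $file (@inputs) {
        my $source = PDF::API2->open($file);
        # A target position of 0 appends the page at the end, so the
        # pages of later files follow the pages of earlier files.
        $merged->importpage($source, $_, 0) for 1 .. $source->pages;
    }
    $merged->saveas($output);
    return;
}
```

Calling merge_pdfs("/tmp/out.pdf", "/tmp/abc.pdf", "/tmp/xyz.pdf") would place the pages of xyz.pdf after those of abc.pdf, matching the ordering described in the usage notes.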
CodeBase:
README File:
pdf_extract.pl
pdf_merge.pl