Archive

Posts Tagged ‘perl’

-e to execute from command line

September 23rd, 2009

One of the useful tools in the unix world is the command line. Often you may have a request like

give me a list of all the unique Agencies specified in all the XML files in a given directory tree where an agency is shown in an element <agency id="IJKXYZ">

This can easily be achieved (for pretty-printed XML documents, with each element on its own line) with a variety of unix tools, like so:

find . -name '*.xml' | xargs grep 'agency.id' | awk -F '"' '{print $2}' | sort | uniq -c

Find all the *.xml files under the current directory, then use xargs to run grep over that list, looking for the agency.id pattern (the ‘.’ matches any single character, in this case the space). Use awk, splitting on double quotes, to print out the IDs (this assumes id is the first quoted attribute on the line), then sort them and run them through uniq with the -c flag to get the number of occurrences of each agency ID. Yes, there are no doubt other ways of doing this, and different sorts of optimisations, but this is just the way my brain thinks and wires these commands together.
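To see the pipeline run end to end, here is a small self-contained sketch. The file names and agency IDs are made up for illustration, count_agencies is a hypothetical helper name, and find is called with -name '*.xml' so that only XML files reach grep.

```shell
#!/bin/sh
# count_agencies DIR: count agency IDs in pretty-printed XML files under DIR.
# Assumes file paths without spaces or quotes, as xargs and awk split on them.
count_agencies() {
    find "$1" -name '*.xml' -print \
      | xargs grep 'agency.id' \
      | awk -F '"' '{print $2}' \
      | sort | uniq -c
}

# Build a throwaway directory tree with hypothetical sample files.
dir=$(mktemp -d)
mkdir -p "$dir/a/b"
printf '<doc>\n  <agency id="IJKXYZ">\n  </agency>\n</doc>\n' > "$dir/a/one.xml"
printf '<doc>\n  <agency id="IJKXYZ">\n  </agency>\n</doc>\n' > "$dir/a/b/two.xml"
printf '<doc>\n  <agency id="ABCDEF">\n  </agency>\n</doc>\n' > "$dir/three.xml"

# Lists ABCDEF with a count of 1 and IJKXYZ with a count of 2.
count_agencies "$dir"

rm -rf "$dir"
```

The closing </agency> lines are not counted, because the pattern needs "agency", one more character, and then "id".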

As already mentioned, this will only work if the XML is pretty printed, not if all the white space has been removed. There may also be many cases where the request is more complicated, yet opening up a text editor to write a program still feels like more than should be necessary. This is where the -e option of many scripting languages comes in: it lets you pass the program text on the command line and run it directly.
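As a rough sketch of that idea, the agency count can be done with perl and its -n and -e flags, which run the given code once per input line. Because the regular expression is applied repeatedly to each line, it no longer matters whether the XML is pretty printed. This assumes perl is on the PATH; count_agencies_perl and the sample file are made up for illustration.

```shell
#!/bin/sh
# count_agencies_perl DIR: count agency IDs in XML files under DIR,
# even when several <agency> elements share one line.
count_agencies_perl() {
    # Inside the single quotes, $1 is Perl's first capture group, not a shell argument.
    find "$1" -name '*.xml' -print \
      | xargs perl -ne 'print "$1\n" while /<agency\s+id="([^"]*)"/g' \
      | sort | uniq -c
}

# Hypothetical sample: a whole document squeezed onto a single line.
dir=$(mktemp -d)
printf '<doc><agency id="IJKXYZ"/><agency id="ABCDEF"/><agency id="IJKXYZ"/></doc>\n' > "$dir/flat.xml"

# Lists ABCDEF with a count of 1 and IJKXYZ with a count of 2.
count_agencies_perl "$dir"

rm -rf "$dir"
```

This is still pattern matching rather than real XML parsing, so it shares the original pipeline's assumption that id is the first attribute of the element.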

The basic hello world in a few such languages:

groovy -e "println 'hi'"
perl -e 'print "hi\n"'
ruby -e 'puts "hi\n"'
jruby -e 'puts "hi\n"'

As a more complex example, you may want to find out the day of the week for a given date. This can be done like so:

$ groovy -e 'println Date.parse("MM/dd/yyyy", "12/13/1974").format("EEEE")'
Friday

Of course, again, there may be a unix command out there which is more concise, or which you are more familiar with, in which case great, use that instead. If, on the other hand, the code solution comes to you immediately, then why not use it? If this is the way your brain is wired, and you are more competent with your programming language of choice, then why go reading man pages when you can do this simply with a scripting language and the -e flag?

Counter to that argument, I had a snippet in PowerShell, the Microsoft Windows shell language, for finding the day of the week for a given date. It goes like this:

PS> (get-date "12/13/1974").DayOfWeek

Very concise indeed.
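For comparison on the unix side, a similarly short sketch is possible with the date command. This assumes GNU date, since the -d option is not available in every date implementation; day_of_week is just an illustrative helper name.

```shell
#!/bin/sh
# day_of_week YYYY-MM-DD: print the English weekday name for a date.
# Assumes GNU date; LC_ALL=C keeps the day name in English.
day_of_week() {
    LC_ALL=C date -d "$1" +%A
}

day_of_week 1974-12-13   # prints Friday
```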

Of course, the real power of scripting languages shows when you use them in conjunction with pipes and unix tools. In this case I want a histogram of how many files I last modified on each day of the week, for a given directory.

groovy -e 'new File(".").eachFile{file -> println new Date(file.lastModified()).format("EEEE")}' \
 | sort | uniq -c | sort -rn
 
19 Monday
15 Tuesday
10 Wednesday
5 Friday
4 Thursday

Given that it was a work directory, seeing only work days is understandable. Also, as I am running this on a Wednesday morning, that may skew the “last modified” results if I have recently modified most files. Still, it seems that Thursday and Friday are the low parts of the week.
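The same histogram can be sketched with perl -e in place of groovy, which may suit a machine without a JVM. This assumes perl with its core POSIX module is available; weekday_histogram is a made-up helper name, and unlike the Groovy version, glob("*") skips hidden files.

```shell
#!/bin/sh
# weekday_histogram DIR: count the entries in DIR by the weekday of their
# last modification, most frequent day first.
weekday_histogram() {
    # (stat $_)[9] is the modification time in seconds since the epoch.
    ( cd "$1" && LC_ALL=C perl -MPOSIX -e \
        'print strftime("%A\n", localtime((stat $_)[9])) for glob("*")' ) \
      | sort | uniq -c | sort -rn
}

weekday_histogram .
```

The subshell around cd keeps the caller's working directory unchanged.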

Feel free to add how you use scripting languages from the command line in the comments. Do be careful cutting and pasting any code samples and make sure you know what you are doing prior to doing so.

I will look at a few common examples I use regularly in future posts and I hope to see more people using scripting languages from the command line to solve their problems.

How to “stick” several PDFs into 1 document

February 7th, 2009

I had some paperwork that I needed to scan and email out for my son’s kinder. Scanning was easy, even with some OCR, but the scanner automatically created 4 separate PDF documents. Now to bring them into 1 document, hmm? Let’s try Google.

Sure, every man and his dog is suggesting methods under searches like “merge pdf documents”, from how-to articles that say to click on the “combine” menu item (which seems not to apply to my Mac) all the way through to a bunch of shareware/freeware/paid software. But how do you know which is best? Which one is safe? And which one is worth the purchase? For something this small, none of them. I don’t want to go through the pain of working this out, trying some software, only to find out it will watermark the result or something.

Why not write it myself? I am playing around with the Groovy programming language at the moment, so why not try that? I searched for “groovy pdf library”; of course, with a name like that, I found a lot of groovy (“really cool”) PDFs. As Groovy works seamlessly with Java and the JVM, the next stop was searching for “java pdf library”. A few other libraries made the front page, but everything seemed to point to iText. It looks pretty comprehensive, and there is even an “iText in Action” book available. I started looking at the documentation and decided it was too much. I just want to merge 4 PDFs into 1 and I don’t want to use any more than 4 lines!

Time to have a look at the root of my scripting knowledge, Perl. Soon enough I found PDF::Reuse and its prDoc function. The result is:

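#!/usr/bin/perl
use warnings;
use strict;
use PDF::Reuse;

# Open the output file that will hold the merged document.
prFile("kinderNewsletter.pdf");

# Append every page of each scanned PDF, in order.
prDoc('kinderScan_01.pdf');
prDoc('kinderScan_02.pdf');
prDoc('kinderScan_03.pdf');
prDoc('kinderScan_04.pdf');

# Finish writing and close the output file.
prEnd();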

I later also found that there is a unix command line tool, pdftk, which would have done the same, but I would need to install it and work out how to use it. Maybe next time.
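For that next time, pdftk merges files with its cat operation, in the form pdftk in1.pdf in2.pdf cat output out.pdf. Below is a minimal sketch wrapping that call, assuming pdftk is installed and on the PATH. merge_pdfs is a made-up helper name, and I have not run this against the scans above.

```shell
#!/bin/sh
# merge_pdfs OUTPUT INPUT...: concatenate the INPUT PDFs, in order, into OUTPUT.
# Assumes the pdftk command line tool is installed.
merge_pdfs() {
    out=$1
    shift
    pdftk "$@" cat output "$out"
}

# Example call, using the file names from the Perl script:
# merge_pdfs kinderNewsletter.pdf kinderScan_01.pdf kinderScan_02.pdf kinderScan_03.pdf kinderScan_04.pdf
```

Taking the output name first lets any number of input files follow it, including a shell glob such as kinderScan_0[1-4].pdf.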