
Automating the Web with WWW::Mechanize

And yes, the double colon does mean Perl. However, Python has a mechanize library modeled after the Perl module, so even if py- is your favorite prefix, this should still be useful.

WWW::Mechanize gives you basic access to a “web browser” from your Perl scripts. It has the concept of getting, putting, ticking and clicking. You can use an image map, or enter text into a text box. It even has a back button! Using all these and more, you can script almost anything you would otherwise do by hand in a browser. I’ve used it before to write a script that logged into a Google Search Appliance and downloaded a backup file (for some reason, there is no way to push backups from within a GSA). More recently, I decided to automate the downloading of PDF statements from my bank’s website. This is a popular use for WWW::Mechanize, and I’ll go through a quick script which does just this.

Let’s start like any good Perl script should, and also include some needed modules…

use strict;
use warnings;

use WWW::Mechanize;
use HTTP::Cookies;

my $robut = WWW::Mechanize->new();
# look like a real person (agent() takes the header value only, without a "User-Agent=" prefix)
$robut->agent('Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv: Gecko/20091102 Firefox/3.5.5');
# we need cookies
$robut->cookie_jar(HTTP::Cookies->new());

This simply creates a new Mechanize object and sets a sane user-agent string. Also, we need to save cookies, so we need to create a new cookie jar.
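The same setup can also be passed to the constructor in one go. This is a minimal sketch, assuming you want cookies kept in a file between runs; the file name cookies.txt and the shortened agent string are placeholders of mine, not part of the original script.

```perl
use strict;
use warnings;

use WWW::Mechanize;
use HTTP::Cookies;

# Hypothetical cookie file; any writable path works.
my $cookie_file = 'cookies.txt';

my $robut = WWW::Mechanize->new(
    # sent as the value of the User-Agent header
    agent      => 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US) Firefox/3.5.5',
    # file-backed jar; autosave writes it out when the jar goes away
    cookie_jar => HTTP::Cookies->new(file => $cookie_file, autosave => 1),
    # die on failed requests instead of checking success() by hand
    autocheck  => 1,
);

print $robut->agent, "\n";
```

Note that a file-backed jar only helps with cookies the site marks as persistent; session cookies are normally discarded when the script ends.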

Next, we’ll load the first page and set the credentials. Hopefully, the rest of the code (and the bank I use) is self-explanatory…

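# we start at login
# NOTE: the login URL is not shown in this post; this one is a placeholder
my $login_url = 'https://bank.example.com/login';
$robut->get($login_url);
$robut->success or die "login GET fail";
my $user = 'woooobar';
my $pass = 'piglet';

# find and fill out the login form
my $login = $robut->form_name("logon");
$login->value('USERID' => $user);
$robut->submit();
$robut->success or die "login POST fail";

print "Login done\n";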

The form_name call selects a form by the “name” attribute it has in the HTML of the page and returns it as a “form” object. Using Firebug in Firefox, this information is easily had.
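If you would rather not dig through the page in a browser, the form and field names can also be listed from Perl. The sketch below parses a made-up login page with HTML::Form (the module WWW::Mechanize uses for its form objects); the HTML, the field names and the bank.example.com base URL are all invented for illustration.

```perl
use strict;
use warnings;

use HTML::Form;

# Made-up login page; a real script would pass $robut->content instead.
my $html = <<'HTML';
<html><body>
<form name="search" action="/search" method="get">
  <input type="text" name="q">
</form>
<form name="logon" action="/login" method="post">
  <input type="hidden" name="token" value="abc123">
  <input type="text" name="USERID">
  <input type="submit" name="go" value="Continue">
</form>
</body></html>
HTML

# The second argument is the base URI used to resolve each form's action.
my @forms = HTML::Form->parse($html, 'https://bank.example.com/');

for my $form (@forms) {
    my $name = $form->attr('name') // '(unnamed)';
    print "form: $name -> ", $form->method, " ", $form->action, "\n";
    for my $input ($form->inputs) {
        printf "  %-8s %s\n", $input->type, $input->name // '(no name)';
    }
}
```

With a live Mechanize object, $robut->forms returns the same kind of objects for the current page.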

And because my bank has the really annoying feature of prompting me to answer yet another question, I have the next bit of code to handle that…

my $response = $robut->response->content;

# we have another step
# (index returns -1 when the string is not found)
if (index($response, 'Answer your other Question') >= 0) {
	print "Answer needed...";
	my $ans;
	# and we need to figure out which question was asked
	if (index($response, 'what goes well with foo') >= 0) { $ans = 'more foo'; }
	if (index($response, 'where does your mother live') >= 0) { $ans = 'not here'; }

	$login = $robut->form_name('challenge');
	$login->value('ANSWER' => $ans);
	$login->value('CHALLENGEANSWER' => $ans);
	$robut->submit();
	$robut->success or die "challenge POST fail";
	print "Question done\n";
}

This part uses $robut->response->content. This is the HTML of the response, and I’m searching it for various strings to decide on what to do next. Again, Firebug’s net feature was helpful in determining what I needed to submit to the form. WWW::Mechanize will use any default values provided by the page, so you don’t need to repeat every form item. One important thing to note is WWW::Mechanize doesn’t do JavaScript. So, if you have a bunch of JS going on before the form actually posts, you need to make sure your script accounts for that.
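To see the default values being carried along, you can build the request offline and look at what would be posted. This is only a sketch: the form below, including its hidden SESSION field, is invented, and it uses HTML::Form directly rather than a live Mechanize session.

```perl
use strict;
use warnings;

use HTML::Form;

# Made-up challenge form with a hidden field that the page pre-fills.
my $html = <<'HTML';
<form name="challenge" action="/challenge" method="post">
  <input type="hidden" name="SESSION" value="xyz789">
  <input type="text" name="ANSWER">
  <input type="submit" name="submit" value="Continue">
</form>
HTML

# In scalar context, parse returns the first form found.
my $form = HTML::Form->parse($html, 'https://bank.example.com/');

# Only the field we care about is set; SESSION keeps the page's value.
$form->value('ANSWER' => 'more foo');

# make_request builds the HTTP::Request that submitting the form would send.
my $request = $form->make_request;
print $request->method, " ", $request->uri, "\n";
print $request->content, "\n";
```

The printed body should contain both the hidden field and the answer, even though the script only set one of them.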

Next comes the password page, which is filled in and submitted the same way.

# time for the password
$login = $robut->form_name('password');
$login->value('PSWD' => $pass);
$robut->submit();
$robut->success or die "password POST fail";
print "Password done\n";

Now we can “click” on the link on the main account page. The follow_link method will locate the first link with the given “text”. This is the text between the <a> tags.

$robut->follow_link(text => 'Online Statements');
$robut->success or die "stmts LINK fail";

From here on out, you can code up logic to determine if you are missing any statements, and download them. Or maybe navigate to a page which offers an OFX file download, which you then load into GnuCash because you do a very good job at keeping track of your finances.
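Here is one way the "missing statements" logic could look. It is a sketch under some loud assumptions: a temporary directory with two fake PDFs stands in for the bank so that the example runs offline, and the file names and the .pdf link pattern are made up. In the real script, the loop would run against the logged-in $robut right after follow_link.

```perl
use strict;
use warnings;

use File::Temp qw(tempdir);
use File::Spec;
use URI::file;
use WWW::Mechanize;

# Stand-in for the bank: a temp directory with a statements page and fake PDFs.
my $site = tempdir(CLEANUP => 1);
# Where statements get saved locally.
my $save = tempdir(CLEANUP => 1);

sub write_file {
    my ($path, $data) = @_;
    open my $fh, '>', $path or die "cannot write $path: $!";
    print {$fh} $data;
    close $fh or die "cannot close $path: $!";
}

write_file(File::Spec->catfile($site, '2009-10.pdf'), "fake october statement\n");
write_file(File::Spec->catfile($site, '2009-11.pdf'), "fake november statement\n");
write_file(File::Spec->catfile($site, 'statements.html'), <<'HTML');
<html><body>
<a href="2009-10.pdf">October 2009</a>
<a href="2009-11.pdf">November 2009</a>
<a href="help.html">Help</a>
</body></html>
HTML

# Pretend the October statement was already downloaded on an earlier run.
write_file(File::Spec->catfile($save, '2009-10.pdf'), "fake october statement\n");

my $robut = WWW::Mechanize->new();
my $page  = URI::file->new_abs(File::Spec->catfile($site, 'statements.html'));
$robut->get($page->as_string);

my @fetched;
# The link list is collected once, before any further get() changes the page.
for my $link ($robut->find_all_links(url_regex => qr/\.pdf$/i)) {
    my ($name) = $link->url =~ m{([^/]+)$};
    my $target = File::Spec->catfile($save, $name);
    next if -e $target;    # already have this one

    # ':content_file' writes the response body straight to disk
    $robut->get($link->url_abs, ':content_file' => $target);
    push @fetched, $name;
}

print "fetched: @fetched\n";
```

Checking for the file on disk keeps the script safe to run repeatedly, which matters once cron is the one running it.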

The human must click
With mechanize write a script
Now CRON does the work
