Crawl you website including login form with Phantomjs

With PhantomJS, we start a headless WebKit and pilot it with our own scripts. Said differently, we write a script in JavaScript or CoffeeScript which controls an Internet browser and manipulates the webpage loaded inside. In the past, I’ve used a similar solution called Selenium. PhantomJS is much faster, it doesn’t start a graphical browser (that’s what headless stands for) and you can inject your own JavaScript inside the page (I can’t remember that we could do such a thing with Selenium).

PhantomJS is commonly used for testing websites and HTML-based applications which content is dynamically updated with JavaScript events and Ajax requests. The product is also popular to generate screenshot of webpages and build website previews, a usage illustrated below.

The official website present PhantomJS as:

  • Headless Website Testing: Run functional tests with frameworks such as Jasmine, QUnit, Mocha, Capybara, WebDriver, and many others.
  • Screen Capture: Programmatically capture web contents, including SVG and Canvas. Create web site screenshots with thumbnail preview.
  • Page Automation: Access and manipulate webpages with the standard DOM API, or with usual libraries like jQuery.
  • Network Monitoring: Monitor page loading and export as standard HAR files. Automate performance analysis using YSlow and Jenkins.

In my case, I’ve used it to simulate users behaviors under high load to create user logs and populate a system like Google Analytics. More specifically, I will introducted a project architecture composed of 3 components:

  1. User-written PhantomJS scripts that I later call “actions”. An action simulates user interactions and could be chained with other actions. For example a first action could login a user and a second one could update its personal information.
  2. A generic PhantomJS script to run sequencially multiple actions passed as arguments.
  3. A Node.js script to pilot PhantomJS and simulate concurrent user loads.

To make things more interesting, the user-written scripts will show you how to simulate a user login, or any form submission. Please don’t use it as a basis to login into your (boy|girl)friend Gmail account.

The user-written scripts

I will write 2 scripts for illustration purpose. The first will login the user on a fake website and the second will go to two user information pages. Those scripts are written in CoffeeScript and interact with the PhantomJS API which borrow a lot from the CommonJs specification. Keep in mind that even if it looks a lot like Node.js, it’s JavaScript after all, it will run in a completely different environment.

The login action

Put it all together

In the end, you might create a Node.js project (simply a directory with a package.json file inside), place all the files described above inside the new directory, declare your “phantomjs” and “each” module dependencies (inside the package.js file), install them with npm install and run your “run.js” script with the command node run.js.

Note about PhantomJs cookies

This is a personal section covering my experience on using the cookies support. PhantomJS accept a “cookies-file” argument with a file path as a value. Basically, a PhantomJS command would look like phantomjs --cookies-file=#{cookies} {more_arguments} {script_path} {script_arguments}.

After a few trials, I wasn’t able to use the cookies file efficiently. Trying to run a second script will not honored the persisted session. However, if I don’t exit PhantomJS with phantom.exit() and force quit the application instead, then the cookie file will work as expected.

This is one of the two reasons why I came up with such an architecture in which I can chain multiple actions. The other reason is speed since the headless Webkit instance is started fewer times. I don’t blame PhantomJS, it could be something I pass over in the documentation.[/fusion_builder_column][/fusion_builder_row][/fusion_builder_container]

By |2018-06-05T22:37:11+00:00July 27th, 2013|Categories: Hack|0 Comments

About the Author:

Passionate with programming, data and entrepreneurship, I participate in shaping Adaltas to be a team of talented engineers to share our skills and experiences.

Leave A Comment