A search spider in Ruby using Capybara-webkit.

When I first looked at Nokogiri, it was a redefining moment (at least for me!) in how to screen scrape. Recently I fell in love with cucumber and capybara-webkit. For newbies, capybara-webkit is a capybara driver which simulates a webkit browser for running tests. Perks? You get a simulated browser running in headless mode, it supports javascript and it's bloody fast! For more info, please check out a previous article on how to get started. I was extremely bored this weekend, and all of a sudden an idea was born. I created a simple search spider using capybara-webkit which fetches search results from google. And here is how I did it.

require 'rubygems'
require 'capybara'
require 'capybara/dsl'
require 'capybara-webkit'

Capybara.run_server = false
Capybara.current_driver = :webkit
Capybara.app_host = "http://www.google.com/"

module Spider 
  class Google
    include Capybara::DSL

    def search
      visit('/')
      fill_in "q", :with => ARGV[0] || "I love Ruby!"
      click_button "Google Search"
      all("li.g h3").each do |h3|
        a = h3.find("a")
        puts "#{h3.text}  =>  #{a[:href]}"
      end
    end
  end
end

spider = Spider::Google.new
spider.search

The above code is self explanatory and here is how the results looked.


Faster loading times with Ruby 1.9.3 and Rails 3.2

A lot of performance improvements got pushed with the release of Ruby 1.9.3 and Rails 3.2. I am writing about 2 major changes which have improved the Rails boot time by up to 36% and made development mode faster.

Problem with Ruby 1.9.2
When you look at the codebase of Rails, it is pretty huge and it has a lot of ‘require’ statements hanging around. So what is the problem? Ruby’s require statement by itself is quite interesting. How? When you do a require in ruby, ruby loads the file only if it has not been required before. Which means that if you require something which was already required, ruby skips it and returns false.
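Here is a minimal sketch of that behaviour, assuming a throwaway file written to a temp directory. The file name require_demo.rb, the $require_demo_runs counter and the require_twice helper are all made up for illustration. The first require runs the file and returns true, the second one skips it and returns false.

```ruby
require 'tmpdir'

# Requires the same (hypothetical) file twice and returns
# [first return value, second return value, times the file body ran].
def require_twice
  $require_demo_runs = 0
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'require_demo.rb')
    # The file only bumps a counter, so we can see how often it really ran.
    File.write(path, "$require_demo_runs += 1\n")
    first  = require(path) # loads the file
    second = require(path) # already loaded, so ruby skips it
    [first, second, $require_demo_runs]
  end
end

p require_twice
```

The counter stays at 1 even though require was called twice, which is exactly the "skip if already required" bookkeeping the rest of this post is about.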

When there were talks on how to improve the rails load time, they found that the problem was with Ruby core itself. The way ruby managed the list of required files was quite awful. Thanks to Xavier for his patch, which did the magic.

#ruby 1.9.2 way of requiring the files
def require(file)
  $loaded.each do |x|
    return false if x == file
  end
  load(file)
  $loaded.push(file)
end

#ruby 1.9.3 way of requiring the files
def require(file)
  return false if $loaded[file]
  load(file)
  $loaded[file] = true
end

The above code is a rubyish representation of what Xavier’s patch actually does. It was that hash look-up which entirely changed the way ruby handled the require(s) and improved the performance by up to 36%. I did a benchmark myself with both 1.9.2 and 1.9.3, and I found 1.9.3 to be noticeably faster. Check out my screenshots below. Awesome! So upgrade to 1.9.3 and get an up to 36% faster rails boot time.
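If you want to get a feel for why the look-up matters, here is a rough sketch (not the benchmark mentioned above) that times an array scan against a hash look-up over a made-up list of file names. The helper names and the file names are assumptions for illustration only, and the numbers you get will depend on your machine and ruby version.

```ruby
require 'benchmark'

# Builds a hypothetical list of already-required files.
def build_loaded(count)
  Array.new(count) { |i| "lib/feature_#{i}.rb" }
end

# 1.9.2 style: scan the whole array for every look-up.
def time_array_scan(files)
  Benchmark.realtime { files.each { |f| files.include?(f) } }
end

# 1.9.3 style: one hash look-up per file.
def time_hash_lookup(files)
  index = files.each_with_object({}) { |f, h| h[f] = true }
  Benchmark.realtime { files.each { |f| index.key?(f) } }
end

files = build_loaded(2_000)
puts format('array scan:   %.4fs', time_array_scan(files))
puts format('hash look-up: %.4fs', time_hash_lookup(files))
```

The array version does more work for every extra file in the list, while the hash version stays roughly flat per look-up, so the gap grows with the number of requires.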

Faster development mode in Rails 3.2
At last I am happy that it was not just me who got pissed off with the slow development mode in Rails. ActiveReload is the new candidate in Rails 3.2 which speeds up your development mode. How? Well, Rails is a little stupid: it forgets the code it loaded once it has processed the request, and for the next request it loads your code all over again. I hope you know this is how you have been skipping server restarts in dev. Now the big deal: imagine you have changed only 1 out of 10 models. Rails is going to load all 10 models again for the next request. ActiveReload does something interesting: it tells rails to reload only the files which have changed. So only the changed file gets loaded again. That's cool!! Check out the Rails 3.2 release notes for further information. Happy coding.
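The "reload only what changed" idea can be sketched in plain ruby by remembering each file's modification time. This is only an illustration of the idea, not how ActiveReload or Rails implement it. The ChangedFileTracker class, the demo_changed_files helper and the model file names are made up.

```ruby
require 'tmpdir'

# Remembers the mtime of each path and reports which ones changed since.
class ChangedFileTracker
  def initialize(paths)
    @mtimes = {}
    paths.each { |path| @mtimes[path] = File.mtime(path) }
  end

  # Returns the paths modified since the last check and records their new mtimes.
  def changed
    @mtimes.keys.select do |path|
      current = File.mtime(path)
      next false if current == @mtimes[path]
      @mtimes[path] = current
      true
    end
  end
end

# Creates three fake models, "edits" one of them and asks the tracker what changed.
def demo_changed_files
  Dir.mktmpdir do |dir|
    models = %w[user.rb blog.rb comment.rb].map { |name| File.join(dir, name) }
    models.each { |path| File.write(path, "# model\n") }
    tracker = ChangedFileTracker.new(models)
    later = Time.now + 60
    File.utime(later, later, models[1]) # simulate saving blog.rb
    tracker.changed.map { |path| File.basename(path) }
  end
end

p demo_changed_files
```

Only blog.rb comes back from the tracker, so a reloader built on top of it would leave the other two models alone.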


cancan permissions on backbone.js

Writing a plugin/gem which does authorization is quite challenging, and it gets even more complicated with a multi-tenant application. Some time back we wrote a plugin overriding authlogic, which takes care of both authentication and setting up authorization rules for a user. We called it watchman. It's pretty old now. And then we fell in love with cancan. While working on a backbone.js project, we needed the can? method provided by cancan on the javascript layer. That application involved user permissions which are controlled by a site admin for different roles. Basically we store the permissions in the database as a set of Booleans. The first approach was to write a javascript helper which does an ajax call to find out whether the user has access. It was slowwwww. The next approach was to create a backbone model which holds the permissions. It worked really well!! Here is an example of how we did that.

-# views/layouts/application.html.haml
- if current_user
  :javascript
    $(document).ready(function() {
      window.Permission = Backbone.Model.extend({
        defaults: {
          "canDestroyBlog": #{can? :destroy, Blog},
          "canCreateComment": #{can? :create, Comment},
          "canEditComment": #{can? :edit, Comment},
          "canCreateProject": #{can? :create, Project},
          "canViewProject": #{can? :view, Project}
        }
      });
    });
// in your javascript
var permission = new window.Permission();
permission.get("canEditComment");
// => true

permission.get("canCreateProject");
// => false

PS: The permissions don’t get updated unless there is a page refresh at the user side.
This may not be a perfect solution, but it reduces much noise and complexity at the javascript layer.
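If the list of permissions grows, the defaults could also be built as a hash on the ruby side and dumped as JSON. The sketch below is one possible way to do that in plain ruby, not what we actually shipped. PERMISSION_CHECKS and permission_defaults are made-up names, the subjects are plain strings instead of model classes, and the block stands in for cancan's can? so the example runs without Rails.

```ruby
require 'json'

# Hypothetical mapping from the javascript attribute name to [action, subject].
PERMISSION_CHECKS = {
  'canDestroyBlog'   => [:destroy, 'Blog'],
  'canCreateComment' => [:create, 'Comment'],
  'canEditComment'   => [:edit, 'Comment'],
  'canCreateProject' => [:create, 'Project'],
  'canViewProject'   => [:view, 'Project']
}

# Asks the given block (a stand-in for can?) about every check
# and returns a hash of plain booleans.
def permission_defaults(checks)
  checks.each_with_object({}) do |(key, (action, subject)), result|
    result[key] = yield(action, subject) ? true : false
  end
end

# Stand-in rules: this imaginary role may only create and edit comments.
allowed = [[:create, 'Comment'], [:edit, 'Comment']]
defaults = permission_defaults(PERMISSION_CHECKS) { |action, subject| allowed.include?([action, subject]) }
puts defaults.to_json
```

In a real layout the block would simply call can? with the action and the model class, and the resulting JSON would go into the defaults of the backbone model.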


Git bisect is my friend

Recently when I did a merge on one of my projects, a large number of commits got merged. I found that my tests were broken. It became a tedious task to find out exactly which of the merged commits broke the tests.

Into the scene: Vagmi Mudumbai (CTO of Dharana)

“Git-bisect should be your friend.”

I had never tried git-bisect, so I thought I would give it a try and taste the awesomeness. Git-bisect takes a good commit (a commit where your tests passed) and a bad commit (a commit where your tests failed, mostly HEAD). It then checks out commits in between and asks you to mark each one good or bad after you run your tests, which narrows down your search. Aaahh, now I see you remember binary search! Yeah, that's exactly what git-bisect does, and here is how you do it.

$ git bisect start
$ git bisect good 9ef9b9a64a
$ git bisect bad b5e38ab73b
> Bisecting: 60 revisions left to test after this (roughly 6 steps)

What happens now is a little interesting: git checks out a commit that is somewhere in between your good and bad commits. Now all you have to do is run the tests and find out whether they still fail. Let's assume your tests failed, so you have to tell git that this commit is bad. (If they pass, you have to say it's good.)

$ git bisect bad
> Bisecting: 29 revisions left to test after this (roughly 5 steps)

You have now got a different commit to test, and you have only 29 commits left to check. The idea is that you don't have to step through every single commit to find the bad one. Now run the tests again and tell git if this commit is good or bad. When you continue doing this you will eventually find the commit which broke the tests, and, wait, git also estimates roughly how many steps are required to find it. Awesome!
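The binary search behind all this can be sketched in a few lines of ruby. This is a toy model and not git's actual implementation (real history is a graph, not a flat list). The first_bad_commit helper, the commit names and the position of the breaking commit are made up.

```ruby
# commits is ordered oldest to newest; the first one is known good and the
# last one is known bad. The block plays the role of "run the tests" and
# returns true when the tests pass on the given commit.
def first_bad_commit(commits)
  good = 0
  bad = commits.length - 1
  steps = 0
  while bad - good > 1
    mid = (good + bad) / 2
    steps += 1
    if yield(commits[mid])
      good = mid # tests pass here, so the culprit is later
    else
      bad = mid  # tests fail here, so the culprit is this commit or earlier
    end
  end
  [commits[bad], steps]
end

commits = (0..61).map { |i| "c#{i}" }
broken_from = 40 # pretend commit c40 broke the tests
culprit, steps = first_bad_commit(commits) { |c| commits.index(c) < broken_from }
puts "#{culprit} is the first bad commit (found in #{steps} steps)"
```

Every test run halves the range that is left, which is why a merge with dozens of commits needs only a handful of good/bad answers.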

> 6247d67284798e40182034500305b282dccf4a6a is the first bad commit

Tadaaaaa, now you can go and fight with that guy who broke all your tests. Finally if you see that your boss was the one who broke your tests, and you want to go back to current state and fix it all by yourself, you have to say,

$ git bisect reset

Happy fighting, happy testing.


Cucumber + Capybara show rails exceptions

Whenever an exception is raised in Rails, Capybara shows only a blank page on the simulated browser. It becomes quite difficult to find out what went wrong. To get the exception on your console while running the test, just add this patch to your features/support/env.rb.

# If you are running mongrel
Capybara.server do |app, port|
  require 'rack/handler/mongrel'
  Rack::Handler::Mongrel.run(app, :Port => port)
end

# If you are running thin
module Thin::Logging
  def log_error(e=$!)
    STDERR.print "#{e}\n\t" + e.backtrace.join("\n\t")
  end
end

Happy testing.


Cucumber + Capybara driver gotchas

Cucumber and Capybara have always been our best buddies when it comes to writing integration tests. We use them quite actively on most of our projects. The power of Capybara is that it gives you several browser simulators to choose from, equipping you with a flexible toolkit for testing all parts of your application, from the simplest pages to the most complex, javascript heavy ones.

Capybara comes with “rack_test” as the default driver, which doesn’t support javascript. I am not going to talk much about which driver to choose; it is up to you based on your need. You can change which driver you want to use in features/support/env.rb. It can be quite confusing when you find out that there are 3 different Capybara settings which decide the driver used while running a test.

# Driver settings
Capybara.default_driver     # :rack_test by default
Capybara.javascript_driver  # :selenium by default
Capybara.current_driver

# features/sample.feature
Scenario: Creating a user
  Given I go to the new user page
  And I fill in "Name" with "TDD Ninja"
  And I fill in "Email" with "gmail@ninja.com"
  And I press "Create"
  Then I should see "User created"

@javascript
Scenario: AJAX pagination
  Given 10 blogs exist
  When I go to the blogs page
  Then I should see 5 blogs
  When I follow "More"
  Then I should see 10 blogs

In the above example, the first scenario runs with “rack_test” and the second scenario runs with “selenium”. Suppose you want to set :webkit as your javascript driver. All you have to do is set

Capybara.javascript_driver = :webkit

And from then on, whenever you use @javascript in your scenario, Capybara will use :webkit as its javascript_driver instead of :selenium. Capybara also allows you to change your driver temporarily using the Capybara.current_driver setting.

Capybara.current_driver = :webkit # temporarily select different driver
... tests ...
Capybara.use_default_driver # switch back to default driver
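How the three settings play together can be modelled in plain ruby. The DriverSettings class below is a made-up toy that mimics the behaviour described here (current_driver falls back to default_driver, and a @javascript scenario temporarily switches to javascript_driver). It is not Capybara's own code and it does not need capybara installed.

```ruby
# Toy model of Capybara's driver settings, for illustration only.
class DriverSettings
  attr_accessor :default_driver, :javascript_driver
  attr_writer :current_driver

  def initialize
    @default_driver = :rack_test
    @javascript_driver = :selenium
    @current_driver = nil
  end

  # Falls back to the default driver when nothing was selected explicitly.
  def current_driver
    @current_driver || default_driver
  end

  # Forgets the temporary selection.
  def use_default_driver
    @current_driver = nil
  end

  # Mimics what tagging a scenario with @javascript does around that scenario.
  def with_javascript
    self.current_driver = javascript_driver
    yield
  ensure
    use_default_driver
  end
end

settings = DriverSettings.new
settings.javascript_driver = :webkit
puts settings.current_driver                             # plain scenario
settings.with_javascript { puts settings.current_driver } # @javascript scenario
puts settings.current_driver                             # back to the default
```

Run as is, this prints rack_test, then webkit, then rack_test again, which is the same sequence the two scenarios above would go through once the javascript driver is set to :webkit.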

Please note that switching between drivers will create a new session, so be careful where you switch! The ideal place to switch is in the Before hook. Capybara makes it really convenient to switch between drivers. Apart from that, you can also tweak your own settings using the API it exposes. This is how we tweaked selenium to use chrome.

#features/support/env.rb

Capybara.register_driver :selenium_chrome do |app|
  Capybara::Selenium::Driver.new(app, :browser => :chrome)
end

Capybara.javascript_driver = :selenium_chrome

There is tons of other information in the Selenium wiki.


Gearing up cucumber and capybara for testing javascript.

With the explosion of rich internet applications, javascript is becoming more or less mandatory for web applications these days. We at Dharana work extensively on backbone.js, which is a javascript MVC framework. While we worked on backbone.js, we also found that writing tests for javascript can be extremely exciting and challenging. Being cucumber and capybara fan boys, we looked at many browser simulators which support javascript, like selenium, celerity, culerity, envjs and akephalos.

We found selenium to work well with our previous projects. But the problem with selenium is that it is a little heavy, and most of all it drives an entire browser with its GUI, which makes it extremely slow. Thanks to thoughtbot for releasing capybara-webkit. Capybara-webkit is just a rendering engine coupled with a DOM implementation and full javascript support. And it runs in headless mode (no GUI). Here is why you should use capybara-webkit:

1. Runs bloody fast.

2. Runs on a headless mode.

3. console.log support. You can see the javascript console.log outputs while running the tests from your  console.

4. Uses webkit.

5. Setting up is super easy.


capybara-webkit depends on qtwebkit, so make sure you have qtwebkit before you install capybara-webkit.
qtwebkit:
apt-get install libqt4-dev libqtwebkit-dev


Gemfile:
gem 'capybara-webkit'


features/support/env.rb:
Capybara.javascript_driver = :webkit

In my next post I will probably write about some useful tips which I learnt while writing javascript tests. Thanks.

