A search spider in Ruby using Capybara-webkit.

When I first looked at Nokogiri, it was a defining moment (at least for me!) in how I screen scrape. Recently I fell in love with Cucumber and capybara-webkit. For newbies: capybara-webkit is a Capybara driver that simulates a WebKit browser for running tests. Perks? You get a simulated browser running in headless mode, it supports JavaScript, and it's bloody fast! For more info, please check out a previous article on how to get started. I was extremely bored this weekend, and all of a sudden an idea was born: a simple search spider, built with capybara-webkit, that fetches search results from Google. Here is how I did it.

require 'rubygems'
require 'capybara'
require 'capybara/dsl'
require 'capybara-webkit'

# We are scraping a remote site, so don't boot a local Rack app
Capybara.run_server = false
Capybara.current_driver = :webkit
Capybara.app_host = "http://www.google.com/"

module Spider
  class Google
    include Capybara::DSL

    # Searches Google for the first command-line argument (or a default query)
    # and prints each result title followed by its link
    def search(query = ARGV[0] || "I love Ruby!")
      visit('/')
      fill_in "q", :with => query
      click_button "Google Search"
      all("li.g h3").each do |h3|
        a = h3.find("a")
        puts "#{h3.text}  =>  #{a[:href]}"
      end
    end
  end
end

spider = Spider::Google.new
spider.search

The above code is self-explanatory, and here is how the results looked.


Faster loading times with Ruby 1.9.3 and Rails 3.2

Ruby 1.9.3 and Rails 3.2 shipped with lots of performance improvements. I am writing about two major changes: one that improves Rails boot time by up to 36%, and a faster development mode.

Problem with Ruby 1.9.2
The Rails codebase is pretty huge, and it has a lot of ‘require’ statements hanging around. So what is the problem? Ruby's require statement is quite interesting by itself: Ruby loads a file only if it has not been required before. In other words, if you require something that was already required, Ruby skips it. Let me show you an example.

When people discussed how to improve Rails load time, they found that the problem was in Ruby core itself: the way Ruby kept track of already-required files was quite awful. Thanks to Xavier for his patch, which did the magic.

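# Ruby 1.9.2 way of requiring files (simplified): $loaded is an Array,
# so every require scans the whole list of loaded files
def require(file)
  $loaded.each do |x|
    return false if x == file
  end
  load(file)
  $loaded.push(file)
  true
end

# Ruby 1.9.3 way of requiring files (simplified): $loaded is a Hash,
# so checking for an already-loaded file is a single lookup
def require(file)
  return false if $loaded[file]
  load(file)
  $loaded[file] = true # the assignment also returns true
end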

The code above is a simplified, Ruby-flavoured picture of what Xavier's patch does. That hash lookup completely changed the way Ruby handles require calls and improved Rails boot time by up to 36%. I ran a benchmark myself with both 1.9.2 and 1.9.3, and 1.9.3 was noticeably faster. Check out my screenshots below. Awesome! So upgrade to 1.9.3 and get up to 36% faster Rails boot times.
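To see why the lookup matters, here is a small standalone benchmark that compares the two checks from the simplified code above. It uses made-up file names and only Ruby's standard `benchmark` library. It does not measure Rails boot time, and your numbers will depend on your machine.

```ruby
require 'benchmark'

# Hypothetical file names standing in for everything a big app requires at boot.
FILES = (1..2_000).map { |i| "lib/feature_#{i}.rb" }

# 1.9.2-style check (simplified): scan an Array, O(n) per check.
def already_loaded_array?(loaded, file)
  loaded.include?(file)
end

# 1.9.3-style check (simplified): one Hash lookup, O(1) on average.
def already_loaded_hash?(loaded, file)
  loaded.key?(file)
end

loaded_array = []
loaded_hash = {}

Benchmark.bm(6) do |x|
  # "Require" every file once; each require first checks whether it is loaded.
  x.report('array') do
    FILES.each { |f| loaded_array << f unless already_loaded_array?(loaded_array, f) }
  end
  x.report('hash') do
    FILES.each { |f| loaded_hash[f] = true unless already_loaded_hash?(loaded_hash, f) }
  end
end
```

The array version does more work with every file already loaded, so its total cost grows quadratically with the number of files. The hash version stays roughly linear.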

Faster development mode in Rails 3.2
At last! I am happy it was not just me who got pissed off with the slow development mode in Rails. Rails 3.2 makes development mode faster, inspired by ActiveReload. How? Well, Rails is a little stupid: after it has processed a request, it forgets the code it loaded, and for the next request it loads your code all over again. That is how you have been getting away without restarting the server in development. The catch is that even if you changed only 1 of your 10 models, or nothing at all, Rails reloads all 10 models on the next request. Following ActiveReload's idea, Rails 3.2 reloads your code only when a file has actually changed, so requests made without any code changes no longer pay the reload cost. That's cool!! Check out the Rails 3.2 release notes for further information. Happy coding.
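The "reload only when something changed" idea can be sketched in plain Ruby. The sketch remembers each watched file's modification time and reports a change only when one of those times differs. `ChangeChecker` is a hypothetical class, loosely in the spirit of ActiveSupport::FileUpdateChecker; it is not Rails' actual code.

```ruby
# Hypothetical helper: decides whether a reload is needed by comparing mtimes.
class ChangeChecker
  def initialize(paths)
    @paths = paths
    @last_seen = snapshot
  end

  # True if any watched file was modified, created or deleted since the last call.
  def updated?
    current = snapshot
    changed = current != @last_seen
    @last_seen = current
    changed
  end

  private

  # Map each path to its modification time (nil if the file is missing).
  def snapshot
    @paths.each_with_object({}) do |path, times|
      times[path] = File.exist?(path) ? File.mtime(path) : nil
    end
  end
end

# Usage idea: only reload application code when something changed.
# checker = ChangeChecker.new(Dir["app/**/*.rb"])
# reload_code! if checker.updated?   # reload_code! is hypothetical
```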


cancan permissions on backbone.js

Writing a plugin or gem that does authorization is quite challenging, and it gets even more complicated in a multi-tenant application. Some time back we wrote a plugin extending Authlogic, which took care of both authentication and setting up authorization rules for a user. We called it watchman. It's pretty old now. Then we fell in love with CanCan. While working on a backbone.js project, we needed CanCan's can? method in the JavaScript layer. That application had user permissions for different roles, controlled by a site admin, and we store those permissions in the database as a set of booleans. Our first approach was a JavaScript helper that made an AJAX call to find out whether the user had access. It was slowwwwwwwwwwwwwww. The next approach was to create a Backbone model that holds the permissions. It worked really well!! Here is an example of how we did that.

-# app/views/layouts/application.html.haml
- if current_user
  :javascript
    $(document).ready(function() {
      window.Permission = Backbone.Model.extend({
        defaults: {
          "canDestroyBlog": #{can? :destroy, Blog},
          "canCreateComment": #{can? :create, Comment},
          "canEditComment": #{can? :edit, Comment},
          "canCreateProject": #{can? :create, Project},
          "canViewProject": #{can? :view, Project}
        }
      });
    });

// in your JavaScript
var permission = new window.Permission();
permission.get("canEditComment");
// => true

permission.get("canCreateProject");
// => false

PS: The permissions are not updated until the user refreshes the page.
This may not be a perfect solution, but it removes a lot of noise and complexity from the JavaScript layer.


Git bisect is my friend

Recently, a merge on one of my projects brought in a large number of commits, and I found that my tests were broken. Finding out exactly which of the merged commits broke the tests was a tedious task.

Enter Vagmi Mudumbai (CTO of Dharana):

“Git-bisect should be your friend.”

I had never tried git-bisect, so I thought I would give it a try. Sweet taste of awesomeness! Git-bisect takes a good commit (a commit where your tests passed) and a bad commit (a commit where your tests fail, usually HEAD). It then checks out commits in between, lets you run your tests, and asks you to mark each one good or bad, which narrows down the search. Aaahh, I see you remember binary search!!!! Yeah, that's exactly what git-bisect does, and here is how you do it.

$ git bisect start
$ git bisect good 9ef9b9a64a
$ git bisect bad b5e38ab73b
> Bisecting: 60 revisions left to test after this (roughly 6 steps)

What happens next is a little interesting: git checks out a commit somewhere between your good and bad commits. Now all you have to do is run the tests and see whether they still fail. Let's assume your tests failed, so you tell git that this commit is bad. (If they pass, you say it is good.)

$ git bisect bad
> Bisecting: 29 revisions left to test after this (roughly 5 steps)

You now have a different commit to test, and only 29 revisions left to check. The idea is that you don't have to step through every single commit to find the bad one. Run the tests again and tell git whether this commit is good or bad. Keep doing this and you will eventually find the commit that broke the tests; git even estimates roughly how many steps are left. Awesome!!!!!

> 6247d67284798e40182034500305b282dccf4a6a is the first bad commit
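Here is a toy model of the binary search that git-bisect performs. It assumes a straight-line history (real git bisect also copes with merges). `first_bad_commit` is a hypothetical helper whose block plays the part of running your tests. It returns true when the tests fail on that commit.

```ruby
# Toy model of git bisect on a linear history.
# commits: oldest -> newest; commits.first is known good, commits.last known bad.
# The block answers "do the tests fail on this commit?".
def first_bad_commit(commits)
  good_i = 0
  bad_i = commits.size - 1
  steps = 0
  # Invariant: commits[good_i] is good and commits[bad_i] is bad.
  while bad_i - good_i > 1
    mid = (good_i + bad_i) / 2
    steps += 1
    if yield(commits[mid])
      bad_i = mid   # like `git bisect bad`
    else
      good_i = mid  # like `git bisect good`
    end
  end
  [commits[bad_i], steps]
end

commits = (1..120).map { |i| "commit#{i}" }
culprit, steps = first_bad_commit(commits) { |c| c[/\d+/].to_i >= 57 }
puts "#{culprit} is the first bad commit (found in #{steps} test runs)"
```

With 120 commits this finds commit57 in 7 test runs, instead of stepping through the commits one by one.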

Tadaaaaa, now you can go and fight with the guy who broke all your tests. Finally, if it turns out your boss was the one who broke them and you want to go back to where you started and fix it all yourself, run:

$ git bisect reset

Happy fighting, happy testing.


Cucumber + Capybara show rails exceptions

Whenever Rails raises an exception, Capybara shows only a blank page in the simulated browser, which makes it quite difficult to find out what went wrong. To see the exception in your console while the tests run, just add this patch to your features/support/env.rb.

# If you're running Mongrel
Capybara.server do |app, port|
  require 'rack/handler/mongrel'
  Rack::Handler::Mongrel.run(app, :Port => port)
end

# If you're running Thin: print the exception and its backtrace to STDERR
module Thin::Logging
  def log_error(e = $!)
    STDERR.print "#{e}\n\t" + e.backtrace.join("\n\t")
  end
end

Happy testing.


Cucumber + Capybara driver gotchas

Cucumber and Capybara have always been our best buddies when it comes to writing integration tests, and we use them actively on most of our projects. The power of Capybara is that it supports several browser drivers, giving you a flexible toolkit for testing all parts of your application, from the simplest pages to the most complex, JavaScript-heavy ones.

Capybara comes with “rack_test” as the default driver, which doesn’t support JavaScript. I am not going to talk much about which driver to choose; that is up to you and your needs. You can change the driver in features/support/env.rb. It can be confusing at first that there are three different Capybara settings that decide which driver runs a test:

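# Driver settings
Capybara.default_driver     # :rack_test unless you change it
Capybara.javascript_driver  # :selenium unless you change it; used for @javascript scenarios
Capybara.current_driver     # the driver in use right now; falls back to default_driver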

# features/sample.feature
Scenario: Creating a user
  Given I go to the new user page
  And I fill in "Name" with "TDD Ninja"
  And I fill in "Email" with "gmail@ninja.com"
  And I press "Create"
  Then I should see "User created"

@javascript
Scenario: AJAX pagination
  Given 10 blogs exist
  When I go to the blogs page
  Then I should see 5 blogs
  When I follow "More"
  Then I should see 10 blogs

In the above example, the first scenario runs with rack_test and the second with Selenium. If you want to use :webkit as your JavaScript driver, all you have to do is set

Capybara.javascript_driver = :webkit

From then on, whenever you tag a scenario with @javascript, Capybara will use :webkit instead of :selenium. Capybara also lets you change the driver temporarily using the Capybara.current_driver setting.

Capybara.current_driver = :webkit # temporarily select a different driver
# ... tests ...
Capybara.use_default_driver       # switch back to the default driver
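To keep the three settings straight, here is a toy model in plain Ruby of how they relate. `DriverSettings` and `driver_for` are hypothetical names and this is not Capybara's code. It only mirrors the behaviour described above: current_driver falls back to default_driver, and a @javascript scenario picks javascript_driver.

```ruby
# Toy model of how Capybara's three driver settings relate (not Capybara's code).
class DriverSettings
  attr_accessor :default_driver, :javascript_driver
  attr_writer :current_driver

  def initialize
    @default_driver = :rack_test    # Capybara's out-of-the-box default
    @javascript_driver = :selenium  # used for @javascript scenarios
    @current_driver = nil           # nil means "use default_driver"
  end

  # Falls back to default_driver when nothing was selected explicitly.
  def current_driver
    @current_driver || @default_driver
  end

  def use_default_driver
    @current_driver = nil
  end

  # Roughly what a Cucumber Before hook decides for a scenario's tags.
  def driver_for(scenario_tags)
    scenario_tags.include?('@javascript') ? javascript_driver : current_driver
  end
end

settings = DriverSettings.new
settings.javascript_driver = :webkit
settings.driver_for([])              # => :rack_test
settings.driver_for(['@javascript']) # => :webkit
```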

Please note that switching drivers creates a new session, so be careful where you switch! The ideal place to switch is a Before hook. Capybara makes it really convenient to switch between drivers, and you can also tweak your own settings using the API it exposes. This is how we tweaked Selenium to use Chrome:

# features/support/env.rb

Capybara.register_driver :selenium_chrome do |app|
  Capybara::Selenium::Driver.new(app, :browser => :chrome)
end

Capybara.javascript_driver = :selenium_chrome

There is tons of other information in the Selenium wiki.


Gearing up cucumber and capybara for testing javascript.

With the explosion of rich internet applications, JavaScript is becoming practically mandatory for web applications these days. We at Dharana work extensively with backbone.js, a JavaScript MVC framework. While working with backbone.js, we found that writing tests for JavaScript can be extremely exciting and challenging. Being Cucumber and Capybara fanboys, we looked at many JavaScript-capable browser simulators, such as Selenium, Celerity, Culerity, env.js and Akephalos.

Selenium worked well on our previous projects. The problem with Selenium is that it is a little heavy and, above all, it drives an entire browser with its GUI, which makes it extremely slow. Thanks to thoughtbot for releasing capybara-webkit. Capybara-webkit is just a rendering engine coupled with a DOM implementation and full JavaScript support, and it runs in headless mode (no GUI). Here is why you should use capybara-webkit:

1. Runs bloody fast.

2. Runs in headless mode.

3. console.log support. You can see the output of JavaScript console.log calls in your console while the tests run.

4. Uses webkit.

5. Setting up is super easy.


capybara-webkit depends on QtWebKit, so make sure QtWebKit is installed before you install capybara-webkit.
QtWebKit (Debian/Ubuntu):
sudo apt-get install libqt4-dev libqtwebkit-dev


Gemfile:
gem 'capybara-webkit'


features/support/env.rb:
Capybara.javascript_driver = :webkit

In my next post I will probably write about some useful tips I learnt while writing JavaScript tests. Thanks.


Polymorphic Many-to-Many Associations in Rails

Rails has various Active Record associations that can be defined between models.

First of all, why do we need associations between models? Associations make common operations simpler and easier in our Rails code.

Rails supports six types of associations:

  • belongs_to
  • has_one
  • has_one :through
  • has_many
  • has_many :through
  • has_and_belongs_to_many

Of these, has_many and belongs_to are the most commonly used Active Record associations in Rails.

Recently I learnt about another kind of association in Rails, the polymorphic association. I found it very interesting and would like to share it.

Polymorphic Association

With a polymorphic association, a model can belong to more than one other model through a single association.

In one of the projects I was working on, I had to implement a polymorphic many-to-many association between models. Implementing a polymorphic many-to-many association in Rails is very simple.

Let me explain polymorphic many-to-many associations with a simple example.

Consider an example application with two models, Book and Article. Both are associated with a common model called Tag. Books and articles have many tags, and a tag can have many books as well as many articles. So we have a many-to-many relationship.

The implementation of above scenario will be like below.

First we have the models Book and Article


class Book < ActiveRecord::Base
  # :as => :taggable makes Book one of the polymorphic "taggable" types
  has_many :taggings, :as => :taggable
  has_many :tags, :through => :taggings
end

class Article < ActiveRecord::Base
  has_many :taggings, :as => :taggable
  has_many :tags, :through => :taggings
end

Then we have the Tag model


class Tag < ActiveRecord::Base
  has_many :taggings
  has_many :books, :through => :taggings, :source => :taggable, :source_type => "Book"
  has_many :articles, :through => :taggings, :source => :taggable, :source_type => "Article"
end

Finally we have our join model Tagging


class Tagging < ActiveRecord::Base
  belongs_to :tag
  belongs_to :taggable, :polymorphic => true
end

With the above implementation we can access the tags of books and articles:

book.tags
article.tags

We can also access the books and articles for a tag:

tag.books
tag.articles

We can also get the taggings of a tag, and from each tagging its taggable record (a Book or an Article):

tag.taggings
tag.taggings.map(&:taggable)

In the above example, the join model Tagging has the columns tag_id, taggable_id and taggable_type. The taggable_type column stores which model (Book or Article) the tagged record belongs to, and taggable_id stores that record's id.

Finally, let me explain how we get the books or articles for a tag. In the Tag model we defined has_many associations to books and articles, along with a few extra options.

The :through option specifies the association through which we reach the books or articles. In the above example it is taggings.

The :source option tells Rails to use the taggable association on Tagging, instead of looking for an association named after books or articles there.

The :source_type option specifies which class of the polymorphic taggable association we want to retrieve; Rails uses it to pick only the taggings whose taggable_type matches.
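Here is a plain-Ruby toy that mimics the taggable_type/taggable_id pair, with no ActiveRecord and no database. The Struct classes and the `taggable`/`tagged` helpers are hypothetical illustrations, not what Rails generates. Both tagged records share the id 1 on purpose: only the type tells them apart.

```ruby
# Toy stand-ins for the models (NOT ActiveRecord classes).
Book    = Struct.new(:id, :title)
Article = Struct.new(:id, :title)
# Mirrors the taggings table columns: tag_id, taggable_id, taggable_type.
Tagging = Struct.new(:tag_id, :taggable_id, :taggable_type)

# Records grouped by class name, like separate books and articles tables.
RECORDS = {
  'Book'    => { 1 => Book.new(1, 'A Ruby book') },
  'Article' => { 1 => Article.new(1, 'A Rails article') }
}

TAGGINGS = [
  Tagging.new(10, 1, 'Book'),    # tag 10 on book 1
  Tagging.new(10, 1, 'Article')  # tag 10 on article 1 (same id, different type)
]

# Like tagging.taggable: the type picks the "table", the id picks the row.
def taggable(tagging)
  RECORDS.fetch(tagging.taggable_type).fetch(tagging.taggable_id)
end

# Like tag.books / tag.articles: keep only taggings whose taggable_type matches
# the :source_type, then follow each one to its taggable record.
def tagged(tag_id, source_type)
  TAGGINGS.select { |t| t.tag_id == tag_id && t.taggable_type == source_type }
          .map { |t| taggable(t) }
end

p tagged(10, 'Book').map(&:title)
p tagged(10, 'Article').map(&:title)
```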

The above is a simple example of implementing a polymorphic many-to-many association in Rails. I hope you found this article interesting and useful.


Sending Meeting Requests with Rails and Action Mailer

As a part of our Recruitment Tracking Solution, we send interview appointments as meeting requests to integrate with various calendar solutions like Outlook, Apple iCal and Google Calendar.

If you want to send a meeting request by email, you can do it in two ways. First, you can mention the meeting date and time directly in the email body, which is what people usually do. Second, you can send a calendar request along with the email, which looks like this

Isn’t it cool!!

To implement this in Ruby on Rails, you need the icalendar gem, a Ruby library for working with iCalendar files (extension .ics). It’s pretty simple to implement. Add the following line to your Gemfile:

gem 'icalendar'

Then run bundle.

Suppose you have a mailer called MeetingNotification; you just need to add a format.ics block as shown below:

class MeetingNotification < ActionMailer::Base
  def meeting_request_with_calendar
    mail(:to => "any_email@example.com", :subject => "iCalendar",
         :from => "any_email@example.com") do |format|
      format.ics {
        ical = Icalendar::Calendar.new
        e = Icalendar::Event.new
        e.start = DateTime.now.utc
        e.start.icalendar_tzid = "UTC" # set timezone as "UTC"
        e.end = (DateTime.now + 1.day).utc
        e.end.icalendar_tzid = "UTC"
        e.organizer "any_email@example.com"
        # unique_value is a placeholder: use something unique to this meeting
        e.uid "MeetingRequest-#{unique_value}"
        e.summary "Scrum Meeting"
        e.description <<-EOF
          Venue: Office
          Date: 16 August 2011
          Time: 10 am
        EOF
        ical.add_event(e)
        ical.publish
        # render the serialized calendar, not the Calendar object itself
        render :text => ical.to_ical, :layout => false
      }
    end
  end
end

If you want to know a little more about how it works, let's delve into it. If you run the above code in the Rails console, you will see

It’s almost self-explanatory, but I would like to point out a few things. DTSTART and DTEND have a Z appended (indicating UTC) because we set the timezone to UTC.
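To see what those lines look like without the gem, here is a hand-rolled sketch of the iCalendar (RFC 5545) text for a single event. `ics_event` is a hypothetical helper, not the icalendar gem's API. The PRODID value and the example UID are placeholders.

```ruby
# Hypothetical helper, NOT the icalendar gem: builds the iCalendar (RFC 5545)
# text for a single event so you can see what DTSTART/DTEND/UID look like.
def ics_event(uid:, organizer:, summary:, starts_at:, ends_at:)
  # UTC timestamps are written as YYYYMMDDTHHMMSS with a trailing "Z".
  utc = ->(time) { time.getutc.strftime('%Y%m%dT%H%M%SZ') }
  lines = [
    'BEGIN:VCALENDAR',
    'VERSION:2.0',
    'PRODID:-//Example//Meeting Request Sketch//EN', # placeholder product id
    'BEGIN:VEVENT',
    "UID:#{uid}",
    "DTSTAMP:#{utc.call(Time.now)}",
    "DTSTART:#{utc.call(starts_at)}",
    "DTEND:#{utc.call(ends_at)}",
    "ORGANIZER:mailto:#{organizer}",
    "SUMMARY:#{summary}",
    'END:VEVENT',
    'END:VCALENDAR'
  ]
  lines.join("\r\n") + "\r\n" # iCalendar lines end with CRLF
end

start = Time.utc(2011, 8, 16, 10, 0, 0)
puts ics_event(uid: 'MeetingRequest-42', organizer: 'any_email@example.com',
               summary: 'Scrum Meeting', starts_at: start, ends_at: start + 3600)
```

For the 10 am UTC start used here, the output contains the line DTSTART:20110816T100000Z.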

The icalendar gem auto-generates the UID from the date and time, but Google recently updated its code and no longer parses it properly, so you have to set the UID explicitly. Also give e.uid a unique value, like “MeetingRequest-#{unique_value}”, so that the request you send and any later update can be told apart from other requests; otherwise an update may end up modifying the wrong request.


Speed up your TDD red-green-refactor cycles

Having recently been bitten by the TDD bug, I’ve been researching best practices for testing and for speeding up the red-green-refactor process, and I recently came across a really neat way to speed up the cycles even further.

Guard is a command-line tool that lets you handle events on file modifications. In other words, you can configure it to watch specific files on your file system and perform corresponding actions when they change. It is a great utility and can be used for a wide variety of tasks.

There is a big list of ‘guards’ available, which are extensions of Guard that handle specific types of files; you can find the list in the Guard wiki.

Installation:

I recommend using Bundler to manage your gem dependencies. If you do, simply add the following to your Gemfile:

gem 'guard'
gem 'guard-rspec'
gem 'guard-cucumber'

gem 'rb-inotify' # file-system events on Linux
gem 'libnotify'  # optional: notifications on Linux; for other OSes see https://github.com/guard/guard#readme

Or, if you’re not using Bundler, simply install the gems above manually. Once that's done, run the following commands to set up Guard and configure it for both RSpec and Cucumber:

    $ guard init          # This creates an empty Guardfile in the project's root, which it uses for configurations.
    $ guard init rspec    # This adds rspec guard to the Guardfile
    $ guard init cucumber # This adds cucumber guard to the Guardfile

If you now look at your Guardfile, you will see something like this:

    # A sample Guardfile
    # More info at https://github.com/guard/guard#readme

    guard 'cucumber' do
      watch(%r{^features/.+\.feature$})
      watch(%r{^features/support/.+$}) { 'features' }
      watch(%r{^features/step_definitions/(.+)_steps\.rb$}) { |m| Dir[File.join("**/#{m[1]}.feature")][0] || 'features' }
    end

    guard 'rspec', :version => 2 do
      watch(%r{^spec/.+_spec\.rb$})
      watch(%r{^lib/(.+)\.rb$}) { |m| "spec/lib/#{m[1]}_spec.rb" }
      watch('spec/spec_helper.rb') { "spec" }

      # Rails example
      watch(%r{^spec/.+_spec\.rb$})
      watch(%r{^app/(.+)\.rb$}) { |m| "spec/#{m[1]}_spec.rb" }
      watch(%r{^lib/(.+)\.rb$}) { |m| "spec/lib/#{m[1]}_spec.rb" }
      watch(%r{^app/controllers/(.+)_(controller)\.rb$})  { |m| ["spec/routing/#{m[1]}_routing_spec.rb", "spec/#{m[2]}s/#{m[1]}_#{m[2]}_spec.rb", "spec/acceptance/#{m[1]}_spec.rb"] }
      watch(%r{^spec/support/(.+)\.rb$}) { "spec" }
      watch('spec/spec_helper.rb') { "spec" }
      watch('config/routes.rb') { "spec/routing" }
      watch('app/controllers/application_controller.rb') { "spec/controllers" }
      # Capybara request specs
      watch(%r{^app/views/(.+)/.*\.(erb|haml)$}) { |m| "spec/requests/#{m[1]}_spec.rb" }
    end

The Guardfile is quite self-explanatory, but you can have a look at https://github.com/guard/guard#readme for more details.
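Here is a toy re-implementation in plain Ruby of a few of the watch rules above. Each rule pairs a pattern with an optional block, and a changed path is turned into the spec path or paths to run. It only mimics the Guardfile semantics; `Rule` and `paths_to_run` are hypothetical names, not Guard's API.

```ruby
# Toy model of Guardfile watch rules (not Guard's implementation).
# pattern: a Regexp or an exact path; action: optional block given the match.
Rule = Struct.new(:pattern, :action)

RULES = [
  Rule.new(%r{^spec/.+_spec\.rb$}),                                  # spec changed: run it
  Rule.new(%r{^lib/(.+)\.rb$}, ->(m) { "spec/lib/#{m[1]}_spec.rb" }),
  Rule.new(%r{^app/(.+)\.rb$}, ->(m) { "spec/#{m[1]}_spec.rb" }),
  Rule.new('spec/spec_helper.rb', ->(_m) { 'spec' })                 # rerun everything
]

# Returns the list of spec paths to run for one changed file.
def paths_to_run(changed_path)
  RULES.flat_map do |rule|
    match =
      if rule.pattern.is_a?(Regexp)
        rule.pattern.match(changed_path)
      elsif rule.pattern == changed_path
        [changed_path]
      end
    next [] unless match

    # Without a block the changed file itself is run, as with a bare watch(...).
    rule.action ? Array(rule.action.call(match)) : [changed_path]
  end.uniq
end

paths_to_run('app/models/user.rb')  # => ["spec/models/user_spec.rb"]
paths_to_run('spec/spec_helper.rb') # => ["spec"]
```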

Once that's done, all you need to do is start Guard by running:

guard [start] # start is optional

Now, whenever you create or modify your spec or Cucumber feature files, Guard automatically runs the tests for the individual file that was added or modified. This is really cool, since all you need to do is write your tests and save the files.

If you have installed the notification library (libnotify on Linux), you will instantly see a notification with an icon indicating whether the tests passed or failed. This is indeed a faster way to follow the red-green-refactor TDD cycle.

