A search spider in Ruby using Capybara-webkit.

When I first looked at Nokogiri, it was a redefining moment(atleast for me!) on how to screen scrap. Recently I found my love with cucumber and capybara-webkit. For newbies to capybara-webkit, it is a capybara driver which simulates a webkit browser for running tests. Perks? You get a simulated browser running in a headless mode, it supports javascript and its bloody fast! For more info, please checkout a previous article on how to get started. I was extremely bored this weekend, and all of a sudden an idea was born. I created a simple search spider using capybara-webkit which would fetch search results from google. And here is how I did it.

require 'rubygems'
require 'capybara'
require 'capybara/dsl'
require 'capybara-webkit'

Capybara.run_server = false
Capybara.current_driver = :webkit
Capybara.app_host = "http://www.google.com/"

module Spider 
  class Google
    include Capybara::DSL

    def search
    visit('/')
    fill_in "q", :with => ARGV[0] || "I love Ruby!"  
    click_button "Google Search"
    all("li.g h3").each do |h3| 
      a = h3.find("a")
      puts "#{h3.text}  =>  #{a[:href]}"
    end
    end
  end
end

spider = Spider::Google.new
spider.search

The above code is self explanatory and here is how the results looked.



Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.