Syntax Highlighting with vim for Keynote

Syntax highlighting code in a presentation can be a boring ordeal. Vim already supports the :TOhtml command, which renders the current buffer as HTML. But that file has to be saved, opened in a browser, copied to the clipboard and pasted into Keynote to get the same syntax highlighting. This is too cumbersome. I wanted something where I could visually select a block of code, run a command against it to copy it to the clipboard as RTF, and then paste it straight into Keynote. I know the cool kids using TextMate have Dr Nic's "Copy as RTF" bundle, but I wanted this to work with the editor that I love. So I wrote the rtf-highlight vim plugin. It works with the "highlight" command line tool. If you like it, fork it and submit patches.

BTW, this works for OpenOffice as well.


Managing Background Jobs with Foreman and Upstart

We use Passenger 3 and MySQL to run our Rails app, and they are started automatically when the machine boots. But every Rails app has background jobs like parsing documents, sending mails and such, and these need to be monitored and restarted. There are several libraries for this, like bluepill, monit and god, but the one we are using is Foreman. Foreman simply provides a way to start a bunch of processes. It is quite simple to use, though it does not monitor a service and restart it when it goes down. However, it allows us to export our jobs as Upstart services. Upstart is the new way most Linux distros (including Ubuntu) deal with daemons. The thing to note is that Foreman only works with processes that interact with the console; it will not work with processes that detach from it, like *cough cough* soffice in headless mode. Getting started with Foreman is really simple. You simply have to add the following line to your Gemfile.

gem 'foreman'

You then have to create a simple text file called Procfile in the root of your Rails application. Our Procfile for KeepRecruiting looks something like this.

mailer: RAILS_ENV=production kr_stalk lib/mailer_jobs.rb
converter: RAILS_ENV=production kr_stalk lib/converter_jobs.rb
parser: RAILS_ENV=production jruby_kr_stalk lib/parser_jobs.rb

We use RVM on our production boxes and use multiple rubies for the same project. For example, KeepRecruiting runs primarily on Ruby 1.9.2 but also uses JRuby 1.6 to parse Word/PDF documents to make them searchable by our fulltext search engine (Sphinx). So there are three stalker jobs: one to send out mails, another to convert resumes to PDF and SWF for viewing in the browser, and a third to parse the content of each resume and make it accessible via search. To have the "stalk" script available from multiple rubies without having to switch gemsets, we use RVM wrappers. An RVM wrapper sets up the necessary environment to execute the target script in the chosen gemset. We did something like this to create our wrappers.

deploy@server $ rvm wrapper 1.9.2-p180@keeprecruiting kr stalk
deploy@server $ rvm wrapper jruby-1.6.0@keeprecruiting jruby_kr stalk

This way both kr_stalk and jruby_kr_stalk point to the right version of the stalk script. You can now export these jobs to Upstart and have them managed with the service command.

deploy@server $ rvmsudo foreman export upstart /etc/init -a keeprecruiting -u deploy
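The export writes one Upstart job file per Procfile entry under /etc/init, plus a master job named after the application. A generated file looks roughly like the sketch below — the install path is illustrative, not what foreman actually emitted on our box.

```
# /etc/init/keeprecruiting-mailer.conf (illustrative sketch)
start on starting keeprecruiting
stop on stopping keeprecruiting
respawn

exec su - deploy -c 'cd /var/www/keeprecruiting; export RAILS_ENV=production; kr_stalk lib/mailer_jobs.rb'
```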

You can now manage all your services like this.

$ sudo service keeprecruiting start
$ sudo service keeprecruiting-mailer restart
$ sudo service keeprecruiting-parser stop

That said, monit and god have their place. Upstart is not designed to do the things god or monit can do, for example automatically restarting a process when it crosses some arbitrary memory limit. But for everything simple, foreman and Upstart are pretty cool.


Setting up a Ruby based DNS Server

While you are in development or staging, you might want to set up servers that are only accessible internally, like yourproduct.dev, yourproduct.test, yourproduct.demo and so on. We wanted this for KeepRecruiting as we isolate our development, staging and demo environments. But having servers running on a physical box inside the office network is super inefficient. So we usually set these up on Linodes and add entries to our /etc/hosts files. As the number of subdomains grows, maintaining these hosts files across the entire team becomes really hard.

That is when we started looking at setting up our own DNS server with BIND. But we quickly got buried under tons of boring documentation. We just wanted something really simple, preferably in Ruby. The alternative we found was RubyDNS. RubyDNS provides a very simple syntax to set up a fully functional DNS server and also allows forwarding to a standard DNS server like Google's. Now you can have your internal rubygems server, or any number of fake servers sitting publicly on the internet, behind nonexistent domains like myproduct.gems, sub.myproduct.dev, sub.myproduct.test and sub.myproduct.demo.

The following piece of code is shamelessly stolen from RubyDNS’ documentation and slightly altered for readability. Thanks to Sam for producing a brilliant piece of work.

#!/usr/bin/env ruby

# rubydns_server.rb - The Ruby DNS server

# Copyright (c) 2009 Samuel Williams. Released under the GNU GPLv3.
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.

require 'rubygems'

require 'rexec'
require 'rexec/daemon'
require 'rubydns'

require 'timeout'

# Run as user "daemon"
RUN_AS="daemon"

# Cache DNS entries for 5 minutes
CACHE_TIME=60*5

# We need to be root in order to bind to privileged port
if RExec.current_user != "root"
  $stderr.puts "Sorry, this command needs to be run as root!"
  exit 1
end

# Helper
Name = Resolv::DNS::Name

YOURIP = "LINODEIP"
GOOGLE = "8.8.8.8"

# The Daemon itself
class Server < RExec::Daemon::Base
  @@var_directory = File.dirname(__FILE__)

  def self.run
    # Don't buffer output (for debug purposes)
    $stderr.sync = true

    # Use upstream DNS (Google Public DNS) for name resolution
    $R = Resolv::DNS.new(:nameserver => GOOGLE)

    $CACHE = {}

    # Start the RubyDNS server
    RubyDNS::run_server do
      on(:start) do
        RExec.change_user(RUN_AS)
      end

      # set up A records for custom.dev, custom.test, *.custom.dev and *.custom.test
      # brilliant ruby goodness
      match(/(.*\.)?custom\.(dev|test)$/, :A) do |match, transaction|
        transaction.respond!(YOURIP)
      end

      # Default DNS handler
      otherwise do |transaction|
        key = [transaction.name, transaction.resource_class]
        cache = $CACHE[key]

        if cache and (Time.now - cache[1]) < CACHE_TIME
          logger.info "Cached: #{transaction.name}..."
          transaction.answer.merge!(cache[0])
        else
          logger.info "Lookup: #{transaction.question.to_s}"
          transaction.passthrough!($R) do |reply, reply_name|
            $CACHE[key] = [reply, Time.now]
          end
        end
      end
    end
  end
end

# RExec daemon runner
Server.daemonize
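The caching in the otherwise block above is just a hash of answers stamped with the time they were fetched. Stripped of the DNS plumbing, the idea reduces to a few lines (the class and method names here are mine, not RubyDNS'):

```ruby
# A minimal time-based cache mirroring the $CACHE logic above:
# store [value, timestamp] and serve the value only while it is fresh.
class TTLCache
  def initialize(ttl)
    @ttl = ttl
    @store = {}
  end

  def fetch(key)
    value, stored_at = @store[key]
    if stored_at && (Time.now - stored_at) < @ttl
      value                           # fresh: serve from cache
    else
      fresh = yield                   # stale or missing: recompute...
      @store[key] = [fresh, Time.now] # ...and remember when we got it
      fresh
    end
  end
end

cache = TTLCache.new(300)
lookups = 0
2.times { cache.fetch("example.com") { lookups += 1; "93.184.216.34" } }
puts lookups # the block ran only once; the second hit came from the cache
```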

Now, if you are using RVM and installed rubydns into a non-system ruby, just remember to use rvmsudo instead of plain sudo. Alternatively, you can generate a wrapper, but that is a topic for another blog post.

$ rvmsudo ruby rubydns_server.rb start
$ rvmsudo ruby rubydns_server.rb stop

Securing your Ubuntu VPS for hosting a Rails Application

So you have got your shiny new VPS and are all set to deploy your Rails application. However, you need to guard your server against the bad guys. The Rails framework does a lot of things right to secure your application, but you still need to manage the security of the server itself. With a few simple rules you can easily secure it. I have written this guide for a Linode VPS running Ubuntu 10.04, but the steps should be fairly similar for other distributions. The first and foremost step is to set up a deploy user and give it sudo access. This is just to ensure that you do not accidentally run scripts as root.

# adduser deploy
(answer the questions it asks)
# visudo
(add the line "deploy ALL=(ALL) ALL" just below root's entry)

SSH into the VPS as deploy and verify that you are able to log in and successfully sudo. Now to change a few SSH details. You do not want clear-text passwords being used for server logins: it encourages people to share passwords and makes access control difficult. A better approach is SSH public key authentication. To set it up, add the public key from your machine to the deploy user's ~/.ssh/authorized_keys file. Ensure that you are able to log in with your key before proceeding.

It is also better to disallow root from logging in directly via SSH. To do this, edit the /etc/ssh/sshd_config file.

$ sudo vim /etc/ssh/sshd_config

Look for the lines containing PubkeyAuthentication, PermitRootLogin and PasswordAuthentication, and change them so they read as below.

PermitRootLogin no
PubkeyAuthentication yes
PasswordAuthentication no

If you lock yourself out, don't worry: most VPS providers let you log in as root from a web console. Restart the SSH service and verify that you can log in with your key and that password logins are refused.

The next step is the single most important one: setting up a firewall. It is easier to run the following commands as root, so switch to a root shell.

$ sudo su - root

Allow SSH, HTTP/S and PING incoming connections. Accept all incoming connections from 127.0.0.1.

# iptables -A INPUT -p tcp --dport 22 -j ACCEPT
# iptables -A INPUT -p tcp --dport 80 -j ACCEPT
# iptables -A INPUT -p tcp --dport 443 -j ACCEPT
# iptables -A INPUT -p icmp -j ACCEPT
# iptables -A INPUT -s 127.0.0.1 -j ACCEPT

Do not drop packets belonging to connections that are already established.

# iptables -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
# iptables -A FORWARD -i eth0 -m state --state RELATED,ESTABLISHED -j ACCEPT
# iptables -A OUTPUT -m state --state NEW,RELATED,ESTABLISHED -j ACCEPT

Reject everything else.

# iptables -A INPUT -j REJECT
# iptables -A FORWARD -j REJECT

The problem with iptables is that it forgets your rules once you reboot. You need to save them and restore them at boot, when the network interface comes up. First, dump all the rules to a file using iptables-save.

# iptables-save > /etc/iptables.rules
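With the rules above in place, /etc/iptables.rules will contain something along these lines (iptables-save also writes packet counters and a few extra match options, trimmed here for readability):

```
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -p tcp --dport 22 -j ACCEPT
-A INPUT -p tcp --dport 80 -j ACCEPT
-A INPUT -p tcp --dport 443 -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -s 127.0.0.1/32 -j ACCEPT
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -j REJECT
-A FORWARD -i eth0 -m state --state RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -j REJECT
-A OUTPUT -m state --state NEW,RELATED,ESTABLISHED -j ACCEPT
COMMIT
```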

Now you need to restore these rules just before the network interface comes up. You can do that by editing the /etc/network/interfaces file.

# vim /etc/network/interfaces

Just after the definition of the eth0 interface, add a line for pre-up. This runs the specified command just before bringing the interface up. The last couple of lines of the file should now look something like this.

iface eth0 inet dhcp
  pre-up iptables-restore < /etc/iptables.rules

Now you can reboot the system and verify if the rules apply after reboot.

$ sudo iptables -L

Chain INPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:ssh
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:www
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:https
ACCEPT     icmp --  anywhere             anywhere
ACCEPT     all  --  localhost.localdomain  anywhere
ACCEPT     all  --  anywhere             anywhere            state RELATED,ESTABLISHED
REJECT     all  --  anywhere             anywhere            reject-with icmp-port-unreachable

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
ACCEPT     all  --  anywhere             anywhere            state RELATED,ESTABLISHED
REJECT     all  --  anywhere             anywhere            reject-with icmp-port-unreachable

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     all  --  anywhere             anywhere            state NEW,RELATED,ESTABLISHED

Now, we’re all done.


Sthaladharshanam – explore places en route

One of our clients wanted us to summarize the knowledge we had about Google Maps in a document. But we are hackers. Why write a document when you can write code? We were at RubyConfIndia in Bangalore for the weekend and had a couple of hours to kill before the night train. Shanthi had mentioned that, as a child, she used to get dragged to various temples by her family. So I thought I could make a quick app out of that. Thus was born SthalaDarshanam. Given two cities, it finds the temples in the vicinity and helps you plan a route to cover them in the most efficient way possible.

  • Visit Sthala Dharshanam.
  • Enter cities in "from" and "to" (for example, Madurai and Trichy) and view a list of temples along the way.
  • Click on the temples in the list and let us plan the route for you.
  • If you change your mind about a temple after selecting it, remove it from your itinerary by clicking its name in the list below the map. The route will be recalculated instantly.

You can find the code for this app on sthaladharshanam’s github repository. This is the story of how we did it.

We were hanging around the Royal Orchid hotel after the conference and got slightly hungry by 7 PM. We decided to walk to Papa John's and work on this while feasting on a pizza. But our bags were a little heavy and we were too lazy and tired to walk that far. We found a place en route called the Cafette restaurant. Thanks to its notoriously bad service and really long wait times, the restaurant was empty and quiet. It turned out to be a perfect office for the rest of the evening. I was with Shanthi, Dhruv, Dhiren and Deepak.

As we started hacking, we ordered some food. I can positively say that everything in the restaurant tastes sweet, save the green chillies. I did not pay too much attention to the food, as Shanthi graciously managed the ordering for the team while I was busy hacking with Dhruv and Deepak. By then, the restaurant staff had become edgy. It is likely they were not accustomed to customers actually enjoying their experience there. They even gave us the bill as a polite gesture to throw us out. But we were busy hacking and in no mood to leave. We said we would order a couple of beverages later (which we did) and hacked on. The guy who brought us the bill was visibly upset by this, but we were having too much fun to head out. After three hours and an almost empty laptop battery, we managed to push the basic search for places and the directions service up to the github repo. By then our cab had arrived to drop us at the railway station.

The 2AC coach that we got into had a broken power plug and I could not get it to work until the next morning. Once I had the power socket working, I spent about 30 minutes getting the waypoint directions API integrated. I then spent another hour cleaning up the listing by ignoring duplicates. By nightfall I had spent another hour or so styling it and adding support for removing waypoints. In all, the project took about 6 hours of work. The client was understandably pleased with the effort we put in. It was way more effective than a document, and I had a lot of fun putting it together.

Hope you enjoy it. I thank Dhruv and Deepak for patiently sitting beside me and helping me out on the most difficult parts of the project, Shanthi for ordering the right food and prompting for the right features, and most of all the client, who was open to accepting working code over a document. I am indeed fortunate to work with such fine people and such clients.


I am speaking at the Chennai Java Summit

I will be speaking tomorrow at the Chennai Java Summit.

The Chennai Java Summit is an initiative by JUG-Chennai to bring Java enthusiasts of various levels together for a two-day, action-packed event. The organizers are trying to present as much valuable technology as possible with the help of speakers who are mentors, hardcore professionals and evangelists in their respective areas, coming from various corners of the world to enrich us all with their expertise in the field.

I will be giving a talk on JRuby. I will post the slides here after the talk.


A 404, Maybe – A Heisenbug Story

We were deploying our webapp on one of our client's servers today. Our webapp is fronted by Nginx, which proxies to a bunch of Thin servers. Initially everything just worked. We cloned the repo from github, copied the nginx config to sites-enabled, ran the rake task to compile our Sass files to CSS and started the thin servers. The app ran without a hitch. All this time we had been logged in to the system, watching htop, tailing logs and such. Once we were convinced that everything was working fine, we shot an email to the client saying they could start using the server, and logged out of the ssh session.

The moment we logged out, the stylesheets and other static assets stopped being served by nginx. The client called up and said the UI looked screwy. We visited the site and could not believe that all the static resources were throwing up 404s. We logged in to the machine again, wondering if any of us had accidentally deleted the files, but they were all there. We checked the server again and everything was working. The logs showed that while we were logged in, the requests were served directly by nginx; once we logged out of the ssh session, nginx could not serve them, and the fallback attempt to serve them from the thin cluster failed as well.

This was really odd. They were just static assets – images, JS and CSS. And the worst part was that any attempt to study the bug rectified it. We knew we were dealing with a Heisenbug. We figured there had to be something that ran when we logged in and was killed when we logged out. We looked at .bashrc, .profile and everywhere else. We could not find it. Out of sheer guesswork, we looked at /etc/mtab. There it was: the user's home directory was encrypted with eCryptfs. We maintain different apps on the servers under different user accounts, and during the Ubuntu Server installation the person at the client's end had chosen to encrypt the home folder by mistake. eCryptfs unmounts the home directory when the last session logs out, so Thin kept serving fine, having already loaded the code into memory, while nginx, which reads files on demand, was unable to serve any static assets.

All we had to do was move the code out of the home folder and everything was served fine again. One Heisenbug successfully squashed. Or as Cecelia likes to put it, we squished it with a 10 ton hammer.

Comic shamelessly stolen from PhD Comics.

 


Frankenware

A recent post on McSweeney's blog struck a chord with me. We are building a Personal Health Record for one of our clients, and we are having a tough time integrating with the HIS of a particularly large hospital chain in India. The HIS app runs on Windows and Microsoft .NET, with code written in VB.NET that was auto-migrated from VB 6. They construct SQL strings from form input, and the app has 1500 tables of which only 50-70 are actually used. The application is rife with all sorts of security issues and is a nightmare to maintain. The design, if I may call it that, looks more like a result of natural selection than of engineering. Just as humans carry several unnecessary body parts and mechanisms as vestiges of our evolution, so does the HIS: most of the tables and forms are there because they were part of the evolutionary process and fought a brutal evolutionary cycle to survive on the front desks and the database server.

On closer inspection of the code, one can only conclude that the HIS was written by a bunch of intoxicated simians. It seems their manager was aiming for the Ballmer Peak but overshot it by a fair amount.

XKCD: Ballmer Peak

There are a few gems that I want to share with you. Guess where the bill details are stored:

  • bill_main
  • bill_details
  • opd_bill_main
  • opd_bill_main_details
  • opbill
  • ipbill
  • ipd_bill_main
  • ipd_bill_main_details
  • dbo.xxxx_bill (where xxxx is a random 0-7 characters string)

The answer: it depends on when the patient enrolled.

This table is for storing “special” test results.

  • lab_regno
  • some
  • irrelevant
  • fields
  • item1
  • value1
  • item2
  • value2
  • item100
  • value100

The values may or may not be stored in contiguous item numbers. I was interested in how this came about, and after a little digging around I got the answer. They copy-pasted existing forms to create new forms and, not wanting to alter the bindings of the old controls, created new ones and deleted the old ones. In short, you need to know which columns to select depending on the test that you want to load.

This one is a classic. Each module is a separate executable but is supposed to act like a single app: click a menu item or a button and it launches a new executable. Guess where the session state is stored. Exactly… the registry. If you locate an individual executable and open it directly: no authentication, voilà. But the IT department has given strict orders that the front-desk folks log in via the shortcut on the desktop.

In the end, you feel like gouging out your eyes after seeing such frankenware.

I cannot believe that people are dumb enough to buy such software. So, what should an IT manager do before buying software? Read the Joel Test. If the company does not follow these practices, do not buy their software. I repeat: DO NOT BUY THEIR SOFTWARE. You will be making this world a much better place and saving a lot of good developers from undeserved grief.

Images licensed under Creative Commons. Click on the images to see the source.

 


Jack and the Beanstalkd

Every webapp needs to do some background tasks like sending emails, processing stats, creating PDF documents and so on. The rule of thumb is that if a request takes more than a couple of seconds to process, turn it into a background job. There are several alternatives in the Ruby world, from delayed_job to RabbitMQ. However, we at Dharana love simplicity, and one really simple solution for processing background jobs is beanstalkd.

Beanstalk is a simple, fast work queue.

Its interface is generic, but was originally designed for reducing the latency of page views in high-volume web applications by running time-consuming tasks asynchronously.

It has a simple text-based protocol, inspired by memcached, for putting in messages that can then be processed by workers. It is persistent, and most of its clients support connections to multiple beanstalkd servers. There are several Ruby clients to choose from: besides the standard ruby beanstalk client there is an asynchronous EventMachine version called em-jack. Take your pick depending on the type of jobs. If the jobs are IO-bound and light on processing, EM-Jack is a good choice. If the jobs are fairly CPU-intensive, you are better off using the standard looping constructs or implementing your own threaded client.

Tubes

In addition to the standard global queue, beanstalk supports the concept of tubes. Tubes are independent job queues. You can imagine an app having multiple producers each pushing jobs to a queue and multiple consumers consuming and processing the jobs from these queues. This is not as exhaustive as RabbitMQ’s queues, exchanges and keys but it is simple enough for most web apps.
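Conceptually, tubes are nothing more than named, independent FIFO queues. A toy in-memory model of the idea (the real thing is a network daemon, of course):

```ruby
# Each tube is its own queue; producers and consumers that name
# different tubes never see each other's jobs. Queue is built in.
tubes = Hash.new { |h, name| h[name] = Queue.new }

tubes["emailtube"] << { :to => "a@example.com" } # producer 1
tubes["pdftube"]   << { :doc => "resume.doc" }   # producer 2

job = tubes["emailtube"].pop # a consumer watching "emailtube"
puts job[:to]                # prints a@example.com
```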

Usage

Connecting to beanstalkd and creating a job is fairly simple.

require 'beanstalk-client'

email = { :to => ['email@example.com'], :subject => 'Some Subject', :body => body }
beanstalk = Beanstalk::Pool.new(['localhost:11300'])
beanstalk.use "emailtube"
beanstalk.yput(email) # yput serializes the object to YAML before putting it on the tube

Processing the job is equally simple.

require 'beanstalk-client'
require 'yaml'

beanstalk = Beanstalk::Pool.new(['localhost:11300'])
beanstalk.watch "emailtube" # consumers watch a tube; producers use it
loop do
  job = beanstalk.reserve
  email = YAML.load(job.body)
  # send the email here
  job.delete
end

We have used beanstalkd in a couple of applications as a replacement for RabbitMQ, and so far we have had a great experience with it. Try it out and let me know if it works for you.


How to store passwords?

Recently Gawker's database got hacked, and about 1.3 million passwords are out in the open. This has led to much furore on various sites. In fact, some sites (http://nakedsecurity.sophos.com/2010/12/13/gawker-gizmodo-lifehacker-password-change/) have suggested that users set strong passwords and not reuse them.

Although this is good advice for users, I would like to give 3 important points for developers storing passwords in their database.

  • Use BCrypt
  • Use BCrypt
  • Use BCrypt

BCrypt is a hashing scheme based on the Blowfish algorithm. It is excellent for password hashing precisely because it is dog slow. General-purpose hashing algorithms like MD5 or SHA1 are really fast: they are meant to hash a lot of content and return a unique hash efficiently. That makes them an easy target for brute force attacks. Bcrypt, on the other hand, is very slow. On my machine, it is about four orders of magnitude slower than MD5 when the number of rounds is 10. This is what Wikipedia has to say about bcrypt.

Blowfish is notable among block ciphers for its expensive key setup phase. It starts off with subkeys in a standard state, then uses this state to perform a block encryption using part of the key, and uses the result of that encryption (really, a hashing) to replace some of the subkeys. Then it uses this modified state to encrypt another part of the key, and uses the result to replace more of the subkeys. It proceeds in this fashion, using a progressively modified state to hash the key and replace bits of state, until all subkeys have been set.

Provos and Mazieres took advantage of this, and actually took it further. They developed a new key setup algorithm for Blowfish, dubbing the resulting cipher “Eksblowfish” (“expensive key schedule Blowfish”). The key setup begins with a modified form of the standard Blowfish key setup, in which both the salt and password are used to set all subkeys. Then there is a configurable number of rounds in which the standard Blowfish keying algorithm is applied, using alternately the salt and the password as the key, each round starting with the subkey state from the previous round. This is not cryptographically significantly stronger than the standard Blowfish key schedule; it’s just very slow.

The number of rounds of keying is a power of two, which is an input to the algorithm. The number is encoded in the textual hash.
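To see the size of that gap on your own machine, here is a quick stdlib-only benchmark. It uses PBKDF2 rather than bcrypt itself (so it needs no gems), but it makes the same point: a tunable work factor makes every guess expensive.

```ruby
require 'openssl'
require 'digest'
require 'benchmark'

password = "s3cret"
salt     = OpenSSL::Random.random_bytes(16)

# One general-purpose hash: practically free, great for brute-forcing.
fast = Benchmark.realtime { Digest::MD5.hexdigest(salt + password) }

# A deliberately iterated KDF: every guess pays for 100,000 rounds.
slow = Benchmark.realtime do
  OpenSSL::KDF.pbkdf2_hmac(password, salt: salt, iterations: 100_000,
                           length: 32, hash: "sha256")
end

puts format("md5:    %.6fs", fast)
puts format("pbkdf2: %.6fs", slow)
```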

Ruby developers can use bcrypt-ruby. bcrypt-ruby also salts your passwords to keep them safe from rainbow table attacks. You can install it using the following command.

$ gem install bcrypt-ruby

And here is an example of how to use it.

require 'bcrypt'

class User
  include BCrypt
  attr_accessor :password_hash

  def password
    @password ||= Password.new(password_hash)
  end

  def password=(passwd)
    @password = Password.create(passwd)
    self.password_hash = @password # without self, this would assign a local variable
  end
end

# usage

user = User.new
user.password = "password"

user.password == "password" # true

Let’s make the web safer for our users.

