How I Made My Ruby Project 10x Faster

This post is about how I got a 10x speedup for my Ruby gem, contracts.ruby.

contracts.ruby is my project to add code contracts to Ruby. Here’s what it looks like:

  Contract Num, Num => Num
  def add(a, b)
    a + b
  end

Now, whenever add is called, the parameters and return value will get checked. Cool!

20 seconds

This weekend I benchmarked this library and found that it has abysmal performance:

                                       user     system      total        real
testing add                      0.510000   0.000000   0.510000 (  0.509791)
testing contracts add           20.630000   0.040000  20.670000 ( 20.726758)

This is after running both functions 1,000,000 times on random input.

So adding contracts on a function results in a massive (40x) slowdown. I dug into why.

8 seconds

I got a big win right away. When a contract passes, I call a function called success_callback. The function is completely empty. Here’s the full definition:

  def self.success_callback(data)
  end

That’s something I had put in “just in case” (yay futureproofing!). Turns out, function calls are very expensive in Ruby. Just removing this call saved me 8 seconds!

                                       user     system      total        real
testing add                      0.520000   0.000000   0.520000 (  0.517302)
testing contracts add           12.120000   0.010000  12.130000 ( 12.140564)
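To see the cost of an extra call for yourself, here's a standalone benchmark sketch (my own toy code, not the library's): it times a bare add against one that also invokes an empty hook, the same shape as success_callback.

```ruby
require 'benchmark'

N = 1_000_000

# An empty hook, the same shape as the removed success_callback.
def success_callback(data)
end

def add(a, b)
  a + b
end

def add_with_hook(a, b)
  success_callback(nil)  # the "just in case" call that does nothing
  a + b
end

Benchmark.bm(15) do |x|
  x.report("plain add") { N.times { add(1, 2) } }
  x.report("with hook") { N.times { add_with_hook(1, 2) } }
end
```

On my understanding of MRI, the hook version pays the full overhead of a method dispatch per call even though the body is empty.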

Removing a lot of other extraneous function calls took me to 9.84 -> 9.59 -> 8.01 seconds. This library is already more than twice as fast!

Now things get a little more complex.

5.93 seconds

There are multiple ways to define a contract: lambdas, classes, plain ol’ values, etc. I had a big case statement that checked what kind of contract it was and did different things based on its type. I saved some time by changing this to an if statement, but I was still walking this decision tree every time the function was called:

  if contract.is_a?(Class)
    # check arg
  elsif contract.is_a?(Hash)
    # check arg
  ...

I changed this to go through the tree once, when the contract is defined, and build lambdas:

  if contract.is_a?(Class)
    lambda { |arg| ... }  # check arg
  elsif contract.is_a?(Hash)
    lambda { |arg| ... }  # check arg
  ...

Then I would validate an argument by passing it into this pre-calculated lambda, bypassing the branching logic completely. This saved me another 1.2 seconds.

                                       user     system      total        real
testing add                      0.510000   0.000000   0.510000 (  0.516848)
testing contracts add            6.780000   0.000000   6.780000 (  6.785446)
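A minimal sketch of the pre-computed-lambda idea (make_validator is my own illustrative name, not the library's API):

```ruby
# Walk the type-dispatch tree once, at contract-definition time,
# and return a lambda that skips it entirely at call time.
def make_validator(contract)
  if contract.is_a?(Class)
    lambda { |arg| arg.is_a?(contract) }
  elsif contract.is_a?(Proc)
    lambda { |arg| contract.call(arg) }
  else
    lambda { |arg| arg == contract }  # plain ol' value: exact match
  end
end

num_check = make_validator(Numeric)
num_check.call(42)     # => true -- no branching at call time
num_check.call("hi")   # => false
```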

Pre-computing some other if statements saved me almost another second:

                                       user     system      total        real
testing add                      0.510000   0.000000   0.510000 (  0.516527)
testing contracts add            5.930000   0.000000   5.930000 (  5.933225)

5.09 seconds

Switching out .times for .zip saved me almost another second:

                                       user     system      total        real
testing add                      0.510000   0.000000   0.510000 (  0.507554)
testing contracts add            5.090000   0.010000   5.100000 (  5.099530)

Turns out,

  args.zip(contracts).each do |arg, contract|

is slower than

  args.each_with_index do |arg, i|

which is slower than

  args.size.times do |i|

.zip spends unnecessary time copying and creating a new array. And I think .each_with_index is slower because it just yields to .each in the background, so it involves two yields instead of one.
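The three iteration styles are easy to compare head-to-head. This benchmark sketch is mine, not the post's original test script:

```ruby
require 'benchmark'

args      = [1, 2, 3]
contracts = [Numeric, Numeric, Numeric]
N = 1_000_000

Benchmark.bm(16) do |x|
  x.report(".zip")            { N.times { args.zip(contracts).each { |a, c| a.is_a?(c) } } }
  x.report("each_with_index") { N.times { args.each_with_index { |a, i| a.is_a?(contracts[i]) } } }
  x.report(".size.times")     { N.times { args.size.times { |i| args[i].is_a?(contracts[i]) } } }
end
```

All three do the same checks; only the .zip version allocates a new array of pairs on every call.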

4.23 seconds

Now we’re getting into some nitty-gritty stuff. The way the contracts library works is, for every method, a new method is added using class_eval (class_eval is faster than define_method). This new method contains a reference to the old method. When the new method is called, it checks the arguments, then calls the old method with the arguments, then checks the return value, and then returns the return value. All of this happens with two method calls to the Contract class: check_args and check_result. I eliminated these two method calls and did the checks right in the new method instead. This got me another .9 seconds:

                                       user     system      total        real
testing add                      0.530000   0.000000   0.530000 (  0.523503)
testing contracts add            4.230000   0.000000   4.230000 (  4.244071)
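Here's a toy version of that inlining, entirely my own sketch (and using define_method for brevity, even though the post notes class_eval is faster): the wrapper validates the arguments and the return value itself instead of delegating to check_args/check_result helpers.

```ruby
class Calculator
  def add(a, b)
    a + b
  end
end

# Grab the original method before redefining it.
old_add = Calculator.instance_method(:add)

# The wrapper does the checks inline -- no extra calls out to
# check_args / check_result helper methods.
Calculator.send(:define_method, :add) do |*args|
  args.each do |arg|
    raise ArgumentError, "expected Numeric, got #{arg.class}" unless arg.is_a?(Numeric)
  end
  result = old_add.bind(self).call(*args)
  raise "expected Numeric result, got #{result.class}" unless result.is_a?(Numeric)
  result
end
```

Calculator.new.add(1, 2) still returns 3, while Calculator.new.add("1", 2) now raises ArgumentError before the original body ever runs.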

2.94 seconds

Earlier I had explained how I was creating lambdas based on the type of Contract and then using those to check arguments. I switched this out to generate code instead, which would then get eval’d when I used class_eval to create my new method. An awful hack! But it avoided a bunch of method calls and saved me another 1.25 seconds:

                                       user     system      total        real
testing add                      0.520000   0.000000   0.520000 (  0.519425)
testing contracts add            2.940000   0.000000   2.940000 (  2.942372)
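A sketch of the code-generation trick (check_fragment and unchecked_add are my own names, not the library's): each contract becomes a fragment of Ruby source, which gets concatenated straight into the method body that class_eval compiles.

```ruby
# Turn a contract into a snippet of Ruby source instead of a lambda.
def check_fragment(contract, var)
  if contract.is_a?(Class)
    "#{var}.is_a?(#{contract})"
  else
    "#{var} == #{contract.inspect}"
  end
end

class Adder
  def add(a, b)
    a + b
  end
end

checks = [check_fragment(Numeric, "args[0]"),
          check_fragment(Numeric, "args[1]")]

# The checks are baked into the generated method as source text,
# so there are no lambda calls left at runtime.
Adder.class_eval %{
  alias_method :unchecked_add, :add
  def add(*args)
    raise ArgumentError unless #{checks.join(' && ')}
    unchecked_add(*args)
  end
}
```

An awful hack indeed, but the generated add contains a plain boolean expression where the lambda dispatch used to be.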

1.57 seconds

Finally, I changed how I called the overridden method. My earlier approach used a reference to the unbound method:

  # simplification
  old_method = instance_method(name)

  class_eval %{
    def #{name}(*args)
      old_method.bind(self).call(*args)
    end
  }

I changed it to use alias_method:

  alias_method :"original_#{name}", name

  class_eval %{
    def #{name}(*args)
      self.send(:"original_#{name}", *args)
    end
  }

This bought me a crazy 1.4 seconds. I have no idea why alias_method is so fast…I’m guessing because it skips a method call as well as a call to .bind.

                                       user     system      total        real
testing add                      0.520000   0.000000   0.520000 (  0.518431)
testing contracts add            1.570000   0.000000   1.570000 (  1.568863)

Results

We managed to get from 20 seconds to 1.5 seconds! Is it possible to do better than this? I don’t think so. I wrote this test script that shows that a wrapped add method will be 3x as slow as a regular add method, so these numbers are really good.

But it’s only 3x as slow because add is so simple that most of the time goes to method-call overhead rather than the method body. Here’s a more realistic example: a function that reads a file 100,000 times:

                                       user     system      total        real
testing read                     1.200000   1.330000   2.530000 (  2.521314)
testing contracts read           1.530000   1.370000   2.900000 (  2.903721)

Very little slowdown! And I think most functions will see very little slowdown…the add function is an outlier.

I decided not to use alias_method because it pollutes the namespace and those aliased functions would show up everywhere (documentation, auto-complete for IDEs, etc).

Some takeaways:

  1. Method calls are very slow in Ruby! I enjoy making my code very modular and reusable, but it might be time for me to start inlining more code.
  2. Benchmark your code! Removing a simple unused method took me from 20 seconds to 12 seconds.

Other things to try

Method combinators

One feature that didn’t make it into Ruby 2.0 was method combinators, which would have allowed you to write

  class Foo
    def bar:before
      # will always run before bar, when bar is called
    end

    def bar:after
      # will always run after bar, when bar is called
      # may or may not be able to access and/or change bar's return value
    end
  end

This would’ve made it easier to write decorators, and might have been faster too.

keyword old

Another feature that didn’t make it into Ruby 2.0 was the old keyword, which would have allowed you to reference an overridden method:

  class Foo
    def bar
      'Hello'
    end
  end

  class Foo
    def bar
      old + ' World'
    end
  end

  Foo.new.bar # => 'Hello World'

redefining a method with redef

This one got rejected by Matz who said:

To eliminate alias_method_chain, we introduced Module#prepend. There’s no chance to add redundant feature in the language.

So if redef is a redundant feature, maybe prepend could be used to write a decorator?
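For example, a contract-style decorator with Module#prepend might look like this (ChecksArgs is my own illustrative module, not anything from contracts.ruby):

```ruby
# The prepended module sits in front of Calc in the ancestor chain,
# so its add runs first and `super` falls through to the original --
# no alias needed.
module ChecksArgs
  def add(*args)
    args.each do |arg|
      raise ArgumentError, "expected Numeric, got #{arg.class}" unless arg.is_a?(Numeric)
    end
    super
  end
end

class Calc
  prepend ChecksArgs

  def add(a, b)
    a + b
  end
end

Calc.new.add(1, 2)  # => 3
```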

Other implementations

So far, all of this has been tested on YARV. Maybe Rubinius would allow me to optimize more?
