Introducing Codebench

April 18, 2009

Codebench is a Kohana module for benchmarking PHP code. It comes in really handy when tweaking regular expressions.

For a long time I have been using a quick-and-dirty benchmark.php file to optimize bits of PHP code, often regex-related. The file contained little more than a for loop wrapped in gettimeofday calls. It worked, albeit not very efficiently. Something more solid was needed. I set out to create a far more usable piece of software to aid in the everlasting quest to squeeze every millisecond out of those regular expressions.
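A rough sketch of what such a quick-and-dirty script might look like. The variable names and the benchmarked call are assumptions for illustration, not the original benchmark.php:

```php
<?php
// Hypothetical stand-in for a throwaway benchmark script.
$subject = '123digits';
$loops   = 100000;

// gettimeofday(TRUE) returns the current Unix time as a float, in seconds.
$start = gettimeofday(TRUE);

for ($i = 0; $i < $loops; $i++)
{
    // The code under test; swap in another variant by hand to compare.
    $result = preg_replace('/^\d+/', '', $subject);
}

$elapsed = gettimeofday(TRUE) - $start;

printf("%d loops took %.4f s\n", $loops, $elapsed);
```

Every variant has to be swapped in and timed by hand, which is exactly the tedium Codebench is meant to remove.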

Codebench goals

Benchmark multiple regular expressions at once

Being able to compare the speed of an arbitrary number of regular expressions would be tremendously useful. In case you are wondering: yes, I had been writing down benchmark times for each regex, uncommenting them one by one. You get the idea. Those days should be gone forever now.

Benchmark multiple subjects at once

What gets overlooked too often when testing and optimizing regular expressions is the fact that speed can vastly differ depending on the subjects, also known as input or target strings. Just because your regular expression matches, say, a valid email address quickly, does not necessarily mean it will quickly realize when an invalid email is provided. I plan to write a follow-up article with hands-on regex examples to demonstrate this point. Anyway, Codebench allows you to create an array of subjects which will be passed to each benchmark.
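To illustrate the idea outside Codebench, here is a minimal sketch that times one pattern against several subjects. The email pattern is a deliberately simplified, hypothetical one, and the subjects are made up for the example. Actual timings will depend on the machine and PHP version:

```php
<?php
// Hypothetical, simplified email pattern; for illustration only.
$pattern  = '/^[\w.+-]+@[\w-]+(?:\.[\w-]+)+$/';
$subjects = array
(
    'user@example.com',     // valid
    'user@example',         // invalid: no dot in the domain
    str_repeat('a', 200),   // invalid: long input without "@"
);
$loops   = 10000;
$timings = array();

foreach ($subjects as $subject)
{
    $start = microtime(TRUE);

    for ($i = 0; $i < $loops; $i++)
    {
        $matched = preg_match($pattern, $subject);
    }

    // Record the outcome and the elapsed time per subject.
    $timings[$subject] = array
    (
        'matched' => (bool) $matched,
        'seconds' => microtime(TRUE) - $start,
    );
}

foreach ($timings as $subject => $t)
{
    printf("%-30.30s %-8s %.4f s\n", $subject, $t['matched'] ? 'match' : 'no match', $t['seconds']);
}
```

Comparing the per-subject times side by side shows whether the failing cases cost more than the matching one.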

Make it flexible enough to work for all PCRE functions

Initially I named the module “Regexbench”. I quickly realized, though, it would be flexible enough to benchmark all kinds of PHP code, hence the change to “Codebench”. While tools specifically built to help profile PCRE functions, like preg_match or preg_replace, definitely have their use, more flexibility was needed here. You should be able to compare all kinds of constructs, such as combinations of PCRE functions and native PHP string functions.

Create clean and portable benchmark cases

Throwing valuable benchmark data away every time I needed to optimize another regular expression had to stop. A clean file containing the complete set of all regex variations to compare, together with the set of subjects to test them against, would be more than welcome. Moreover, it would be easy to exchange benchmark cases with others.

Visualize the benchmarks

Obviously, providing a visual representation of the benchmark results, via simple graphs, would make interpreting them easier. Not having to think about Internet Explorer for once made writing CSS a whole lot easier and more fun. It resulted in some fine graphs which are fully resizable.

Below are two screenshots of Codebench in action. Valid_Color is a class made for benchmarking different ways to validate hexadecimal HTML color values, e.g. #FFF. If you are interested in the story behind the actual regular expressions, take a look at this topic in the Kohana forums.

Codebench screenshot: Benchmarking seven ways to validate HTML color values

Codebench screenshot: Collapsible results per subject for each method

Working with Codebench

Download Codebench at GitHub. Since Codebench is a module for the Kohana PHP framework, I suggest you install Kohana first if you have not already done so. Add Codebench to the list of activated modules in application/config/config.php.
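As a sketch, activating the module might look like the fragment below. It assumes a Kohana 2.x layout, where the module list lives in application/config/config.php, and that the download was unpacked into a modules/codebench directory; adjust the path if yours differs:

```php
// application/config/config.php (Kohana 2.x layout assumed)
$config['modules'] = array
(
    // Assumed location of the unpacked Codebench download.
    MODPATH.'codebench',
);
```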

Creating your own benchmarks is just a matter of creating a library that extends the Codebench class. Put the code parts you want to compare into separate methods. Be sure to prefix those methods with “bench”; other methods will not be benchmarked. Glance at Valid_Color.php in the download for a real example.

Here is another short example with some extra explanations.

// libraries/Ltrim_Digits.php
class Ltrim_Digits extends Codebench {
 
    // Some optional explanatory comments about the benchmark file.
    // HTML allowed. URLs will be converted to links automatically.
    public $description = 'Trimming leading digits: regex vs ltrim.';
 
    // How many times to execute each method per subject.
    // Total loops = loops * number of methods * number of subjects
    public $loops = 100000;
 
    // The subjects to supply iteratively to your benchmark methods.
    public $subjects = array
    (
        '123digits',
        'no-digits',
    );
 
    public function bench_regex($subject)
    {
        return preg_replace('/^\d+/', '', $subject);
    }
 
    public function bench_ltrim($subject)
    {
        return ltrim($subject, '0..9');
    }
}
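Before reading too much into the timings, it is worth confirming that both methods return the same result for every subject. A minimal standalone check in plain PHP, independent of Codebench; the extra subjects are hypothetical additions for the example:

```php
<?php
// Subjects from the benchmark above, plus a few assumed edge cases.
$subjects   = array('123digits', 'no-digits', '007', '42 is the answer', '');
$mismatches = array();

foreach ($subjects as $subject)
{
    $regex = preg_replace('/^\d+/', '', $subject);
    $ltrim = ltrim($subject, '0..9');

    // Collect every subject for which the two approaches disagree.
    if ($regex !== $ltrim)
    {
        $mismatches[] = $subject;
    }
}

echo empty($mismatches)
    ? "Both methods agree on all subjects.\n"
    : 'Mismatch for: '.implode(', ', $mismatches)."\n";
```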

And the winner is… ltrim. Happy benchmarking!

4 comments

  1. spirit —May 7, 2009

    Really nice! Template looks good too! What about sharing it on the projects repository?

  2. Geert De Deckere —June 11, 2009

    Thanks, spirit. I’ll be sharing it on github only, though. Having two repositories to keep in sync doesn’t really sound appealing. Maybe a link section could be created on the Kohana projects site, to point to external projects.

  3. Doru —June 24, 2009

    Great stuff. Thanks for sharing :)

  4. mikaweb —September 27, 2009

    Nice stuff.
    It looks very good.
    And it works well too.
