Saturday, June 14, 2008

Diminishing returns

With some minor changes to my Wide Finder 2 submission, I've been able to eek out an almost 10% improvement in performance. The first change was pretty straightforward, and is a lesson in trusting blind premature optimizations (or optimisations, as the case may be. The scalac compiler has a switch called -optimise which is described in the help text as "Generates faster bytecode by applying optimisations to the program." I have no idea what this option actually does, and a quick google search revealed no helpful information. I had used it blindly expecting that it would at least do no harm. Well, I can say for sure that it didn't help. My fastest times yet were achieved without using this switch. Whether it reduced performance is debatable, but I don't plan to use it again until I learn more about what it does.

The next change was to move the hit detection out of my main regular expression and into a spot that reduced the number of times it was applied (only for 200 and 304 status requests), and then to reduce its effect further by conditionally applying it only when the requested URL satisfies two requirements that are very efficient to check:

  1. The URL is at least 30 characters long, and
  2. the 28th character is a digit

By applying these checks first, many of the non-matching URLs can be quickly disqualified without the need to do any regular expression matching.

The final change was to revisit the decision to not use AtomicLongs in the Counter. Originally the difference in performance was negligible and the increase in complexity substantial. For example, in the comparator used for sorting map keys, I would have to change from:

map(b) - map(a)

to either this:


or this:

map(b).longValue - map(a).longValue

This wasn't the only place in the code where such changes would be required. I really didn't like having so much of the implementation details of Counter leak out into the rest of the code. I even considered various ways to wrap a Map[String,AtomicLong] in a MapWrapper[String,Long], but the incompatible type parameter complicated things. Given that the performance difference was negligible at the time, and other parts of the code required more attention, I just left the counter using Long values.

I was reminded of all of this when Erik Engbrecht left me a comment asking why I hadn't used AtomicLongs. So I went back and took another look, and sure enough, there is a simple way to use AtomicLongs in the Counter without leaking them all over the rest of the code, and by now, performance was tight enough that the occasional lock contention incurred in ConcurrrentHashMap.replace() was becoming noticeable. Here's how I did it:


import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicLong

import Config.{counters,segments}

class Counter(initSize: Int, segs: Int) {
    def this() = this(counters,segments)
    val h = new ConcurrentHashMap[String,AtomicLong](initSize,0.75f,segs)
    def size = h.size

    def inc(k:String, i:Int) =
        if(h.containsKey(k) || h.putIfAbsent(new String(k), new AtomicLong(i)) != null)
    def apply(key:String) = {
        val ret = h.get(key)
        if(ret == null) 0L else ret.longValue
    def keys = new Iterator[String] {
        val i = h.keySet.iterator
        def next =
        def hasNext = i.hasNext

Basically, I realized that Counter doesn't need to be a Map, so it doesn't need to implement MapWrapper. It just needs a few Map-like methods, namely size, apply, and keys. The apply method is then free to translate the underlying AtomicLong to a Long and the caller is none the wiser. Thanks Erik!

The best performance to date with this new code so far is:

real    14:04.6
user  6:58:06.7
sys      9:35.3

Given the diminishing returns I'm getting from small tweaks here and, I've been looking at changing something more fundamental, and hope to post about that effort soon.

No comments:

Post a Comment