I've read several articles describing Shark, the tool which comes with the CHUD toolkit from Apple. I've never used it before, but I've heard that it's quite a powerful tool.
First of all, I decided to use the Rails application I've been working on lately as a test-bed for measuring the performance. I started it under ruby-debug using the rdebug script:
$ rdebug ./script/server
When my application was up and running I launched Shark.

Notice that on the main screen I selected my running ruby process from the dropdown menu. There are several profiles available, but the default Time Profile is exactly what I needed. It's possible to change several options in the config pane, but I decided that the defaults would work well for me.
Then I simply pressed the Start button and started using my application while Shark was sitting in the background collecting profiling information.
In thirty seconds Shark presented me with the following screen:

Well, that's nice. It loaded the symbol table successfully, and I can see the call stacks as well. At the bottom of the screen you can select the top-down or bottom-up view. It just worked, and I didn't have to recompile my Ruby and ruby-debug with some special compiler directives or do some other magic.
But wait a minute, I didn't expect to see my debug_event_hook function somewhere at the bottom of the list. All I could see were memory management functions along with the garbage_collect function. It means that a lot of objects are being created somewhere, which triggers a great deal of garbage collection overhead. I also saw my lovely debug_event_hook function right there in the call stack of the garbage_collect function.
Let's see if I can get more info about it. I right-clicked on the debug_event_hook function and selected Focus Symbol debug_event_hook. Shark made my function the root of the call tree and removed the call stacks that do not contain my function.

OK, that's better. Next I double-clicked on the debug_event_hook function in the call stack and Shark displayed this screen:

Wow! Notice these two lines, 522 and 523. I didn't expect them to impose so much performance overhead. Now it's clear: too many String objects are being created, and most of the time these objects are no longer referenced after this function returns.
OK, I started one of my favorite text editors, and after a couple of hours and several similar iterations I got this picture:
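To make the effect concrete, here is a minimal Ruby sketch of how a hook that builds a temporary String on every event generates garbage, compared with one that avoids the allocation. This is not ruby-debug's actual code (debug_event_hook lives in the C extension); the method names, the constants and the "file:line" key format are all hypothetical, chosen only for illustration. It assumes a Ruby version where `GC.stat(:total_allocated_objects)` is available.

```ruby
# Hypothetical stand-ins for a per-event hook; NOT ruby-debug's real code.
BREAK_FILE = "app.rb".freeze
BREAK_LINE = 42
BREAK_KEY  = "app.rb:42".freeze

# Builds a fresh String on every event just to compare it, then drops it.
def hit_allocating?(file, line)
  "#{file}:#{line}" == BREAK_KEY
end

# Compares the pieces directly, so no temporary String is created.
def hit_lean?(file, line)
  line == BREAK_LINE && file == BREAK_FILE
end

# Returns how many objects were allocated while the block ran.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

EVENTS = 10_000
allocating = allocations { EVENTS.times { |i| hit_allocating?(BREAK_FILE, i) } }
lean       = allocations { EVENTS.times { |i| hit_lean?(BREAK_FILE, i) } }

puts "allocating hook: #{allocating} objects"
puts "lean hook:       #{lean} objects"
```

The first variant allocates at least one String per event, and every one of them becomes garbage as soon as the method returns, which is the kind of pressure that shows up as garbage_collect in a profile.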

Now, this is much better! There is no garbage_collect in sight, and debug_event_hook spends most of its time searching for the current thread info. I could probably change the algorithm and the data structure that I use to keep this information, but it's already good enough. My benchmarks now show that instead of 300% overhead, ruby-debug imposes only 90%. It's still slower than Cylon though, and it would be good to know the implementation details of the Cylon debugger.
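For a sense of what those percentages mean in wall-clock terms, here is a small Ruby sketch. It assumes that overhead is measured relative to the same run without the debugger attached; the helper names are hypothetical and are not part of ruby-debug or any benchmark suite.

```ruby
# Overhead in percent, relative to the run without the debugger.
# (Assumed definition; both helper names are made up for this sketch.)
def overhead_percent(baseline_seconds, debugged_seconds)
  (debugged_seconds - baseline_seconds) / baseline_seconds.to_f * 100.0
end

# How many times longer the debugged run takes for a given overhead.
def slowdown_factor(overhead_pct)
  1.0 + overhead_pct / 100.0
end

[300, 90].each do |pct|
  puts "#{pct}% overhead => #{slowdown_factor(pct).round(2)}x the baseline time"
end
```

Under that definition, 300% overhead means the code runs at four times the baseline time, while 90% overhead means a little under twice the baseline.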
When you are profiling, you need to know when to stop! And I'm going to repeat it: premature optimization is the root of all evil. It makes your code more obscure and should be used with discretion.
And the bottom line is: Shark is a very powerful tool. It's free and very easy to use.
Kent, nice write up and great work on ruby-debug. I can't wait to try out the new, faster version. Keep it up!
(Have you talked to the rubinius folks yet? I'd think this kind of tool would be a great reason for them to get RNI finished. Hmm, I'll bet the JRuby guys would be interested too. I know some of them are working on debugging too.)
Yeah, rubinius looks very interesting and I'm planning to take a closer look at it.