Measure Twice, Code Once


I have had the pleasure of spending the last few days improving application performance. Specifically, my job is to improve the ‘speed’ dimension of the application. In doing this, I’ve been getting reacquainted with some well-used tools: SQL Query Analyzer and dotTrace Profiler. (For those of you who remember when SQL Query Analyzer was last available under that name: yes, I’m working with a completely functional, happy SQL Server 2000 installation.) It’s been quite a few months since I last did this. Given that not many people get a chance to do performance analysis and improvement on a regular basis, now seems like a good time to rehash some common mistakes and the ways to fix them.

Mistake 1: Lack of Focus

When asked to work on performance, a developer will often be told simply to make the product faster. This typically leads the developer to work on the areas of the application they know best. Even then, the focus may land on features that rarely get used. At the end of the day, some obscure button click works a bit faster and no user, no manager, no customer gets a better product. How do you get focus?

To find out where to make things faster, I start by identifying which features are used most frequently. I then find out which of those features seem slow to respond. Next, I work with a user representative to rank those features in order of importance. Finally, I do some analysis to see which features share common code paths. If five of the top ten items on the list all run through a common code path, I can deliver a big improvement with a smaller effort. At this point, I have some focus and I’m ready to make mistake 2.

Mistake 2: Failure to Measure

A lot of times, a developer will not have the tools, or the knowledge of how to use them, to measure their code. Instead, they will single-step through the code looking for lines that seem to take “a while” to execute. The developer then takes pains to make the associated code take “less time” to execute. For what it’s worth, I’m not exaggerating here: “a while” and “less time” are actual measures I see folks use. When I hear a co-worker using this type of measurement, I know who I’ll be mentoring for the next few days. (Over the last ten or so years, I’ve discovered that cursing at people and calling them ‘morons’ doesn’t make me feel any better and doesn’t improve the situation. Mentoring builds relationships and improves the strength of the team. “Be part of the solution” and all that ☺.) To overcome this issue, we have to address why people don’t perform good measurements in the first place: they don’t know which measurement will give them the right answer. So, what measurement is that?

I will start out by saying that I learned the fine art of measurement from the very patient and capable performance team on WCF v3.0. The first thing they taught me was to spread as wide a net as is reasonable and collect lots of data. Here, we can measure working set, execution time, or both. Working set measures how many resources, typically memory, the application consumes. Execution time measures the clock time to complete a task. On desktop and enterprise/web applications, the focus is typically on execution time. So, where do you spread the net?
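Before firing up a profiler, it can help to see what those two measurements look like in code. The following is only a minimal sketch, not any particular tool’s API: Stopwatch captures execution time, Process.WorkingSet64 captures working set, and LoadCustomerGrid is a hypothetical stand-in for one of your slow features.

using System;
using System.Diagnostics;

class MeasurementSketch
{
    static void Main()
    {
        Process process = Process.GetCurrentProcess();
        long workingSetBefore = process.WorkingSet64;

        Stopwatch stopwatch = Stopwatch.StartNew();
        LoadCustomerGrid();                    // the feature under investigation
        stopwatch.Stop();

        process.Refresh();                     // re-read the process counters
        long workingSetAfter = process.WorkingSet64;

        Console.WriteLine("Elapsed: {0} ms", stopwatch.ElapsedMilliseconds);
        Console.WriteLine("Working set delta: {0:N0} bytes",
            workingSetAfter - workingSetBefore);
    }

    // Hypothetical placeholder for real application work.
    static void LoadCustomerGrid()
    {
        System.Threading.Thread.Sleep(250);
    }
}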

I fire up a tool like dotTrace Profiler, NProf, or, when I was a Microsoftie, OffProf (aka Office Profiler; Microsoft, please release this tool into the wild as part of the Debugging Tools for Windows!). All of these tools collect data about method execution times, time relative to the rest of the call tree, and so on. With the tool attached to your target, cause the application to follow one of the ‘slow paths’ and then stop data capture. You’ve just measured and identified the parts of the code that are interesting. For example, these tools will show when a particular method consumes an inordinate amount of processing time in the code path. Following the code path, you may see some of the following behaviors:

  1. Method gets called an inordinately large number of times.
  2. Method consumes a large percentage of overall processing time.
  3. Method is waiting a long time for some synchronous item to return. (Think database, web service, or other out-of-process call.)

All of the tools I mention will show processing times, the percentage of data-collection time spent in a method, and the number of times a method gets called. Because the tools gather data, they will make things take longer to execute. That said, they make everything except out-of-process calls take proportionately longer to execute (every call pays the same n% penalty), so the relative rankings still hold. Armed with this information, you can go and make the application faster.
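To make that output concrete, here is a rough sketch of the kind of bookkeeping those tools do for you: call counts and cumulative time per named scope. PoorMansProfiler is a hypothetical helper, not part of dotTrace, NProf, or OffProf; a real profiler gathers this per method across the whole call tree without you editing any code.

using System;
using System.Collections.Generic;
using System.Diagnostics;

// Crude approximation of profiler bookkeeping: call counts and total time per scope.
static class PoorMansProfiler
{
    static readonly Dictionary<string, long> CallCounts = new Dictionary<string, long>();
    static readonly Dictionary<string, long> TotalTicks = new Dictionary<string, long>();

    // Wrap a suspect method body: using (PoorMansProfiler.Time("LoadCustomerGrid")) { ... }
    public static IDisposable Time(string name)
    {
        return new Scope(name);
    }

    public static void Report()
    {
        foreach (var pair in TotalTicks)
        {
            double ms = pair.Value * 1000.0 / Stopwatch.Frequency;
            Console.WriteLine("{0}: {1} calls, {2:F1} ms total",
                pair.Key, CallCounts[pair.Key], ms);
        }
    }

    sealed class Scope : IDisposable
    {
        readonly string _name;
        readonly Stopwatch _watch = Stopwatch.StartNew();

        public Scope(string name) { _name = name; }

        public void Dispose()
        {
            _watch.Stop();
            long calls, ticks;
            CallCounts.TryGetValue(_name, out calls);
            TotalTicks.TryGetValue(_name, out ticks);
            CallCounts[_name] = calls + 1;
            TotalTicks[_name] = ticks + _watch.ElapsedTicks;
        }
    }
}

Wrapping a suspect method body in a using block and calling Report() at shutdown surfaces the same signals listed above: too many calls, too much cumulative time, or a long synchronous wait.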

Mistake 3: Fearing Database Analysis

Once people measure their code, they may see that a database call causes the execution speed issues. At this point, they start looking at the query. Most queries are under 30 lines of SQL. As a result, the developer falls back on the same tactics as before: they look at the code and guess where the issues lie. The thing is, SQL typically performs set-based operations over thousands or millions of records. Reading the SQL will not tell you that the query requires a table scan, a bookmark lookup, or an unusually large hash table to compute the result set. In this case, you need to take out a tool like SQL Query Analyzer for SQL Server 2000 or the Database Engine Tuning Advisor for SQL Server 2005/2008. In Query Analyzer, turn on ‘Show Execution Plan’ and then execute the queries that are running slowly. You will be looking for the operations that take up a significant part of the execution time and then tune those: add new indexes, precompute values, and so on. As with your source code, your measurements will tell you what to address. The Database Engine Tuning Advisor goes a few steps further and suggests the changes you should make.
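If you would rather capture hard numbers for the query from client code instead of eyeballing it, SQL Server will also return I/O and timing statistics as informational messages. Here is a minimal sketch assuming System.Data.SqlClient and the Northwind sample database; the connection string and query are placeholders for your own slow query, and the real tuning decisions still come from the execution plan.

using System;
using System.Data.SqlClient;

class QueryStatisticsSketch
{
    static void Main()
    {
        // Placeholder connection string; point this at your own server and database.
        const string connectionString =
            "Data Source=.;Initial Catalog=Northwind;Integrated Security=SSPI";

        using (var connection = new SqlConnection(connectionString))
        {
            // SQL Server delivers STATISTICS IO/TIME output as informational messages.
            connection.InfoMessage += (sender, e) => Console.WriteLine(e.Message);
            connection.Open();

            using (var command = connection.CreateCommand())
            {
                // Placeholder query; substitute the statement your profiler flagged as slow.
                command.CommandText =
                    "SET STATISTICS IO ON; SET STATISTICS TIME ON; " +
                    "SELECT OrderID, OrderDate FROM dbo.Orders WHERE CustomerID = @id";
                command.Parameters.AddWithValue("@id", "ALFKI");

                using (var reader = command.ExecuteReader())
                {
                    while (reader.Read()) { /* drain the result set */ }
                }
            }
        }
    }
}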

Do you have any tips you’d like to share?
