Let’s Do Some Engineering Pt. 2: Software Metrics

By Loren Segal on April 6th, 2010 at 1:52 AM

The Common Metrics

These are the really basic metrics that most of us know about: Lines of Code (LoC), Cyclomatic Complexity (VG), and Test Coverage (TC).

These metrics don’t really give you all that much useful information. However, they do make up a good set of building blocks for some numbers that can give us much more insight into our code. For all the Rubyists out there, you might be familiar with tools like RCov, Flog, Flay, Roodi, and the like. These tools basically check the above metrics (there’s also a metric for "duplication" in there, that one’s a little less basic).

Let’s talk about these guys:

Lines of Code (LoC or KLoC)

This is probably the hardest metric to interpret because the way we collect this information can vary greatly. We already know that the very same algorithm can have wildly different LoC counts depending on which language it is implemented in. We also know that an algorithm can be rewritten in the same language with wildly different LoC counts. This means from the outset, LoC can be an unreliable metric depending on how it is used.

There are actually a few ways to count lines of code. The simplest is to count the number of non-comment, non-blank lines in your source files. However, everyone but a Python programmer knows that we can stuff a lot of complexity into a single line of source, so this is not telling us the whole story. The more accurate measure of LoC (good metrics tools will do this) is to count "statements" rather than "lines". This allows us to treat each statement as if it were on its own line (something Python programmers already expect). For example, the following Ruby code could represent either 1 LoC or 5 LoC:

10.times { result = flip_coin; tell_user(result); again = gets.chomp; break if again == "N" }

Yes, we can golf that Ruby down to 3 LoC, but that’s not something we’ll worry about. In fact, by counting statements rather than lines we can compare this value to our complexity and get a pretty useful result, but I won’t ruin the surprise. We’ll talk about that later. The point here is that there are many ways to define what a "line" is, so you should know how your tools define this metric. This might change how you can make use of the data.
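To make the difference between the two definitions concrete, here is a minimal sketch that counts both for the same source. It leans on Ruby’s standard Ripper parser; the names StatementCounter, statement_count, and line_count are made up for this illustration, and real metrics tools may well define a "statement" differently.

```ruby
require "ripper"

# Counts statements by listening to Ripper's parser events.
# (Hypothetical helper for illustration, not taken from any metrics tool.)
class StatementCounter < Ripper
  attr_reader :count

  def initialize(source)
    super
    @count = 0
  end

  # Empty statements (for example an empty method body) are tagged
  # so that they can be skipped below.
  def on_void_stmt
    :void
  end

  # Fired once for every statement appended to any statement list,
  # including the ones nested inside blocks.
  def on_stmts_add(list, stmt)
    @count += 1 unless stmt == :void
    list
  end
end

# Statement-based LoC: parse the source (it is never executed) and count.
def statement_count(source)
  counter = StatementCounter.new(source)
  counter.parse
  counter.count
end

# Line-based LoC: physical lines that are neither blank nor pure comments.
def line_count(source)
  source.each_line.count do |line|
    stripped = line.strip
    !stripped.empty? && !stripped.start_with?("#")
  end
end

# The coin-flip one-liner; chomp strips the newline that gets returns.
one_liner = '10.times { result = flip_coin; tell_user(result); ' \
            'again = gets.chomp; break if again == "N" }'

puts "lines: #{line_count(one_liner)}, statements: #{statement_count(one_liner)}"
```

For the one-liner this should report 1 line and 5 statements: the `10.times` call itself plus the four statements inside its block.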

As I pointed out, LoC can vary greatly, but that does not make it useless. Each codebase has its own patterns and conventions, so as long as you maintain consistency in your code, LoC should be able to give you useful information; but it will only apply to your code, not anyone else’s.

Cyclomatic Complexity (VG)

In case you’re confused, Cyclomatic Complexity is abbreviated here as VG because it is written as the function v(G) in the static analysis tool McCabe IQ. Cyclomatic Complexity, for those who don’t know, is a metric introduced by Thomas McCabe in 1976, so v(G) is what I would consider the "official" abbreviation.

In short, cyclomatic complexity is defined as "the number of linearly independent paths through a function", which in practice works out to "the number of decision points (conditional branches) + 1". It’s a fairly easy one to calculate, though realize that languages with closures are a little harder to deal with since we can’t immediately tell whether a closure will run conditionally or unconditionally. This is an important point for dynamic languages with closures (like Ruby), because it means our VG value will be a little less reliable, and we need to take that into account.
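As a rough illustration of the "decision points + 1" rule, the sketch below walks the syntax tree produced by Ruby’s standard Ripper library and counts branching nodes. Which node types count as a decision point is an assumption of this sketch (tools disagree on details such as `&&` and `rescue`), the names decision_points and cyclomatic_complexity are hypothetical, and blocks are deliberately not counted, which is exactly the closure problem described above.

```ruby
require "ripper"

# Node types treated as decision points in this sketch (an assumption;
# different tools draw this line differently).
DECISION_NODES = %i[if elsif unless while until if_mod unless_mod
                    while_mod until_mod when ifop rescue].freeze
SHORT_CIRCUIT_OPS = %i[&& || and or].freeze

# Recursively counts branching nodes in a Ripper s-expression.
def decision_points(node)
  return 0 unless node.is_a?(Array)

  count = 0
  count += 1 if DECISION_NODES.include?(node.first)
  # Short-circuit operators show up as [:binary, left, operator, right].
  count += 1 if node.first == :binary && SHORT_CIRCUIT_OPS.include?(node[2])
  count + node.sum { |child| decision_points(child) }
end

# VG estimate for a piece of source: decision points + 1.
def cyclomatic_complexity(source)
  tree = Ripper.sexp(source) or raise ArgumentError, "source does not parse"
  decision_points(tree) + 1
end

source = <<~RUBY
  def classify(n)
    if n < 0
      :negative
    elsif n == 0
      :zero
    else
      :positive
    end
  end
RUBY

puts cyclomatic_complexity(source) # one `if` and one `elsif`, plus 1
```

The `classify` method has two decision points (the `if` and the `elsif`), so the sketch reports a VG of 3, matching its three possible paths.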

The great thing about an accurate VG measure is that it tells you how many tests to aim for: it is the number of linearly independent paths through the code, so it is an upper bound on the number of tests needed for complete C1 (branch) coverage, and exactly the number you need if every independent path gets its own test. This means we can spot under-tested modules by comparing this value to the number of tests we have written for said module (again, ballpark, there are other factors). Note that while runtime tools like Ruby’s RCov can tell you how much of your code your tests executed, they cannot tell you if your tests are organized properly. Mapping each code path to a separate test will give you better organized tests, and VG is the number that tells you how many such paths there are.
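The comparison between VG and test count can be sketched in a few lines. Everything here is hypothetical: the Metric struct, the under_tested helper, and the sample numbers are invented for illustration, and in practice the VG and test counts would come from whatever tools you use.

```ruby
# One record per method or module: its VG and how many tests target it.
Metric = Struct.new(:name, :vg, :tests)

# Returns the entries with fewer tests than VG, largest gap first.
def under_tested(metrics)
  metrics.select { |m| m.tests < m.vg }
         .sort_by { |m| m.tests - m.vg }
end

# Made-up numbers, purely for illustration.
metrics = [
  Metric.new("parse",  7, 3),
  Metric.new("render", 2, 2),
  Metric.new("save",   4, 1)
]

under_tested(metrics).each do |m|
  puts "#{m.name}: VG #{m.vg}, tests #{m.tests}"
end
```

With these sample values, `parse` (4 tests short) is listed first, then `save` (3 tests short), while `render` is left out. Treat the result as a list of places to look, not as a verdict.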

Note that Ruby’s "Flog" tool is not a true "Cyclomatic Complexity" measurement. It uses its own heuristics to give you a "score", but you cannot use this number to verify against your tests.

Test Coverage (TC)

I said we would only discuss static code metrics, but I think this one deserves mentioning because it’s probably one of the most useful. TC is a runtime metric and requires a tool that can profile your tests and tell you how much of your codebase they cover (as a percentage). The Ruby tool for this is RCov as we previously mentioned. There’s not much to say here except that TC gives you a good place to start looking when trying to find under-tested components. Remember, this only gives you test coverage, not test quality, so 100% coverage is not equal to "no bugs". Also, as mentioned in the previous section, coverage does not mean your tests are well organized. If you want to properly organize your tests, you should look at mapping each test to an individual code path, which requires the VG metric.
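For a sense of where the percentage comes from, here is a small sketch that turns per-line hit counts into a coverage figure. The input format is an assumption modeled on what Ruby’s standard Coverage module reports in its default mode (one entry per source line, nil for lines that are not executable); the coverage_percent helper and the sample data are made up.

```ruby
# line_hits: one entry per source line; nil means "not executable"
# (blank lines, comments, `end`), an Integer is how often the line ran.
def coverage_percent(line_hits)
  relevant = line_hits.compact
  return nil if relevant.empty? # nothing executable, nothing to report

  covered = relevant.count { |hits| hits > 0 }
  (100.0 * covered / relevant.size).round(1)
end

# Invented hit counts for an 8-line file.
hits = [1, 1, nil, 0, 3, nil, 0, 1]
puts coverage_percent(hits) # 4 of 6 executable lines ran
```

Four of the six executable lines ran at least once, which works out to 66.7%. Notice that the number says nothing about whether those four lines were checked by meaningful assertions, which is the "coverage is not quality" caveat above.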

These are the basic metrics that most of us probably make use of daily. But as I said, these kinds of metrics are really just the building blocks for the really fun stuff, so let’s talk about those.

Questions? Comments? Follow me on Twitter (@lsegal) or email me.