Let’s Do Some Engineering Pt. 2: Software Metrics
Better Metrics
We can get some pretty useful information out of our source code that some people might not think about tracking: coupling (afferent and efferent), module stability, essential complexity, essential LoC, defects per LoC, weighted methods per class, and lack of cohesion of methods. Most of these metrics come from some pretty great static analysis tools such as McCabe IQ and NDepend. These tools support most statically typed languages (plus Perl for some reason), but don’t have much support for dynamic languages like Python or Ruby. That’s unfortunate, but hopefully tools like these can be built for those languages. Anyway, let’s talk about these metrics.
Afferent Coupling (Ca) and Efferent Coupling (Ce)
Coupling is the measure of how much a module depends on, or is depended on by, other modules. Afferent Coupling is the measure of dependencies on a module, and Efferent Coupling is the measure of how many other modules a module depends on. This great graphic from codebetter.com should illustrate the two:
As you see, Afferent Coupling tells us how integral a module might be to the functioning of other modules. One thing you would want to make sure is that modules with a high Ca value are properly tested, as bugs in these modules are likely to have much higher impact. A high Ce value would tell you that perhaps your module has too many responsibilities and should be split up. Of course "high" here means relative to other classes in your codebase. As with most metrics, you should pay most attention to statistical outliers on a relative scale, not absolute scales.
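To make the two counts concrete, here is a minimal sketch in Python. The module names and the DEPENDENCIES map are made up for illustration; in practice you would extract the dependency graph from your own codebase with whatever tooling you have.

```python
from collections import defaultdict

# Hypothetical dependency map: module -> set of modules it depends on.
DEPENDENCIES = {
    "billing": {"db", "logging", "mail"},
    "reports": {"db", "logging"},
    "mail": {"logging"},
    "db": {"logging"},
    "logging": set(),
}


def coupling(dependencies):
    """Return {module: (Ca, Ce)} for a module -> dependencies mapping."""
    # Invert the graph: for each module, collect the modules that depend on it.
    afferent = defaultdict(set)
    for module, targets in dependencies.items():
        for target in targets:
            afferent[target].add(module)

    modules = set(dependencies) | set(afferent)
    return {
        m: (len(afferent[m]), len(dependencies.get(m, ())))
        for m in sorted(modules)
    }


for name, (ca, ce) in coupling(DEPENDENCIES).items():
    print(f"{name:8} Ca={ca} Ce={ce}")
```

In this toy graph, logging has the highest Ca (everything depends on it), so it is the module you would want the best test coverage on, while billing has the highest Ce.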
Module Stability, or Instability (I)
Now that we’ve discussed Ca and Ce, we can talk about instability. This is where we see how these building block metrics really shine. While Ca and Ce are useful on their own, the Instability metric can give us some extra insight.
Instability is simply defined as: I = Ce / (Ca + Ce), or in plain English, the share of a module’s total coupling that is made up of outward dependencies. The value ranges from 0 (nothing but incoming dependencies, as stable as it gets) to 1 (nothing but outgoing dependencies, as unstable as it gets). As the metric name suggests, what this tells us is how stable a module is. How does that really work? Well, if a module has many dependencies (both in and out) and most of them are outward dependencies (Ce), it’s not going to be resilient to changes in your system as a whole. On the other hand, if only a small share of the module’s total coupling is outward dependencies, it’s going to be more resilient to changes in your system.
In short, this metric can help you identify classes to double check when you plan on doing larger refactoring runs.
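As a quick sketch of the formula, assuming you already have Ca and Ce for each module (the numbers below are hypothetical), the calculation is a one-liner plus a guard for modules with no coupling at all:

```python
def instability(ca, ce):
    """I = Ce / (Ca + Ce). Returns None when the module has no coupling."""
    total = ca + ce
    if total == 0:
        return None  # the ratio is undefined for a completely isolated module
    return ce / total


# Hypothetical (Ca, Ce) pairs for three modules.
MODULES = {"logging": (4, 0), "db": (2, 1), "billing": (0, 3)}

for name, (ca, ce) in MODULES.items():
    print(name, instability(ca, ce))
```

Returning None for an isolated module is a choice made for this sketch; a tool could just as reasonably report 0 or skip the module.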
Essential Complexity (ev(G))
This one is specifically of interest to those of us who use dynamic languages. McCabe IQ defines this metric as:
Essential Complexity (ev(G)) is a measure of the degree to which a module contains unstructured constructs. This metric measures the degree of structuredness and the quality of the code. It is used to predict the maintenance effort and to help in the modularization process.
“Unstructured constructs” is a little vague, and, just like a LoC metric, the definition might differ depending on the tool (and language). In Ruby, however, we would consider unstructured constructs to be things like lambdas/procs, blocks/closures and eval(). Methods that use them would be seen as more complex and harder to test, so they should get extra attention. In static languages (and those without closures/blocks) the acceptable threshold for this value would be a lot lower, and we would probably plan on refactoring any of the outliers.
Essential LoC (eLoC)
I don’t know of any tools that explicitly perform this metric calculation, though it is something you may have calculated informally before. Essential LoC is defined on a per-method basis as eLoC = LoC / v(G), or in plain English, the number of statements in the method divided by its cyclomatic complexity (a.k.a. the number of independent code paths). This tells us, on average, how many statements our methods have per code path. A well-refactored codebase would have this ratio as close to 1 as possible (even though it might have many code paths). The idea is also that methods with similar complexity should have similar LoC counts. The threshold on this would be lower than usual, but using this kind of metric, it is easy to identify methods that have similar complexity but take more LoC to perform the task. Outliers in a comparison of methods with equal complexity are likely to give you methods that could be refactored more easily than outliers on a pure LoC metric.
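Here is a rough sketch of that comparison. The method names and measurements are invented, and the LoC and v(G) numbers are assumed to come from whatever tool you already use; the code only does the division and groups methods of equal complexity so that outliers stand out.

```python
from collections import defaultdict


def eloc(loc, vg):
    """eLoC = LoC / v(G) for a single method."""
    if vg < 1:
        raise ValueError("cyclomatic complexity is at least 1")
    return loc / vg


# Hypothetical per-method measurements: (name, LoC, v(G)).
METHODS = [
    ("parse_header", 12, 4),
    ("parse_body", 14, 4),
    ("parse_footer", 40, 4),
    ("render", 6, 2),
]


def by_complexity(methods):
    """Group methods by v(G) so equal-complexity methods can be compared."""
    groups = defaultdict(list)
    for name, loc, vg in methods:
        groups[vg].append((name, loc, eloc(loc, vg)))
    return dict(groups)


for vg, group in sorted(by_complexity(METHODS).items()):
    print(f"v(G) = {vg}")
    for name, loc, ratio in group:
        print(f"  {name:14} LoC={loc:3} eLoC={ratio:.1f}")
```

In this made-up data, parse_footer has the same complexity as its two siblings but takes roughly three times the lines, which makes it the first candidate to look at.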
Defects Per LoC
Again, this one isn’t really used formally in tools because there’s no way to automate the calculation, but the metric is pretty useful. Basically, it is the ratio of the number of reported defects to the total LoC. This data can be collected on a more granular basis, i.e. per team member, per iteration, per feature. A good table showing an example can be found here. Although this one is a little more tedious to implement, it can give you some good numbers about team productivity, code quality and effectiveness. If you do Agile/Scrum, tracking these numbers is a great way to keep your team motivated on self-improvement (so long as you keep the environment positive and not overly competitive). These numbers also give you great data with which to estimate the duration of future projects. If you make your living working on many similar short-term contracts, this metric can make a big difference to your bottom line.
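A small sketch of tracking this per iteration follows. The iteration names and counts are invented, and scaling the ratio to defects per 1,000 lines is a choice made here for readability rather than anything the definition above requires.

```python
def defects_per_kloc(defects, loc):
    """Reported defects per 1,000 lines of code."""
    if loc <= 0:
        raise ValueError("LoC must be positive")
    return defects * 1000 / loc


# Hypothetical records: (iteration, reported defects, LoC written).
ITERATIONS = [
    ("sprint-1", 9, 4500),
    ("sprint-2", 6, 4000),
    ("sprint-3", 2, 2500),
]

for name, defects, loc in ITERATIONS:
    print(f"{name}: {defects_per_kloc(defects, loc):.1f} defects/KLoC")
```

The same function works unchanged if you slice the data per feature or per team member instead of per iteration.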
Weighted Methods Per Class (WMC)
WMC is the sum of the v(G) values of all the methods in a class. It is essentially cyclomatic complexity viewed at a coarser level of detail, which should make it easier to pinpoint your complex modules at a glance, rather than sifting through individual methods.
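Since tooling for dynamic languages is thin, here is a rough sketch of WMC for Python source using the standard ast module. The v(G) estimate is an approximation assumed for this example (1 plus the number of decision points it knows about), not the exact count a tool like McCabe IQ would produce, and the Cart class is made up.

```python
import ast

# Node types counted as one decision point each (an approximation).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic(func):
    """Approximate v(G): 1 + number of decision points inside the function."""
    score = 1
    for node in ast.walk(func):
        if isinstance(node, BRANCH_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra paths.
            score += len(node.values) - 1
    return score


def wmc(source):
    """Return {class name: sum of v(G) over the methods defined in its body}."""
    result = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            methods = [
                n for n in node.body
                if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))
            ]
            result[node.name] = sum(cyclomatic(m) for m in methods)
    return result


SAMPLE = '''
class Cart:
    def total(self, items):
        amount = 0
        for item in items:
            if item.taxable and item.price > 0:
                amount += item.price * 1.2
            else:
                amount += item.price
        return amount

    def is_empty(self, items):
        return not items
'''

print(wmc(SAMPLE))  # {'Cart': 5}
```

Here total scores 4 (the loop, the if, and the extra path from the "and") and is_empty scores 1, giving the class a WMC of 5.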
Lack of Cohesion of Methods (LCOM)
Cohesion tells us the degree to which a module implements one single function, which is the idea behind the Single Responsibility Principle (SRP). LCOM, although a confusing name, tells us the opposite: the degree to which a class implements more than one function. NDepend has a great overview of this metric, but in short, lower is better, and a class whose value sits well above the rest of your codebase is a candidate for splitting up.
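There are several LCOM variants. The sketch below assumes one widely used formulation, LCOM = 1 - sum(MF) / (M * F), where M is the number of methods, F the number of instance fields, and MF the number of methods that touch a given field. The method-to-fields maps are hypothetical; a real tool would derive them from the source.

```python
def lcom(method_fields):
    """LCOM = 1 - sum(MF) / (M * F) for a {method: set of fields used} map."""
    methods = len(method_fields)
    fields = set().union(*method_fields.values())
    if methods == 0 or not fields:
        return 0.0  # nothing to measure; treated as cohesive in this sketch
    # Summing fields per method gives the same total as summing methods per field.
    accesses = sum(len(used) for used in method_fields.values())
    return 1 - accesses / (methods * len(fields))


# Hypothetical class where every method works on the same field.
COHESIVE = {"deposit": {"balance"}, "withdraw": {"balance"}}

# Hypothetical class mixing two unrelated responsibilities.
MIXED = {
    "deposit": {"balance"},
    "withdraw": {"balance"},
    "render_html": {"template"},
    "set_template": {"template"},
}

print(lcom(COHESIVE))  # 0.0
print(lcom(MIXED))     # 0.5
```

The second class scores higher because half of its methods never touch half of its fields, which is the signal that two classes may be hiding inside one.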
