What’s Missing from TomDoc

By Loren Segal on May 05, 2010

TomDoc was made public by @mojombo yesterday. It’s an interesting documentation markup syntax that tries to make docs more readable. For the most part it does a good job, but some shortcomings make it more limiting than the more robust YARD syntax I’ve been pushing over the years.

Why YARD Still Wins

First, let’s look at some of the features YARD provides, because TomDoc does not cover all of the same use cases. YARD’s syntax might not focus on readability, but the tag-style syntax has real benefits.

1. Ubiquity

YARD’s syntax is the same one used by countless tools, from Javadoc to Doxygen to Apple’s Objective-C documentation and many JavaScript tools. It’s a de facto standard and easy to follow, so I don’t buy the argument that its “strictness” leads to confusion. In fact, TomDoc’s syntax is stricter than YARD’s, requiring a very specific ordering and presence of sections (we’ll see that later).

2. Extensibility

@tag syntax has the benefit of being able to represent arbitrary metadata. TomDoc, on the other hand, is limited to a very specific set of declarations and cannot be easily extended unless the specification is changed. Your mileage may vary here, but there are actually a handful of YARD plugins in the wild that make good use of extended metadata, like yard-sinatra, yard-rest and more. People are extending the syntax all the time, so who are we really helping by taking this functionality away?
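To make the extensibility point concrete, here is a minimal sketch of why a tag syntax extends so easily. It is a toy parser (hypothetical, not YARD’s actual implementation) that collects every @tag line into a hash, so a new tag such as the made-up @url below needs no change to any specification. It assumes one tag per line and ignores multi-line tag text.

```ruby
# Toy docstring parser (hypothetical, not YARD's implementation).
# Collects "@tag text" lines into a hash keyed by tag name, so any
# new tag is picked up without changing the parser.
def parse_tags(comment)
  tags = Hash.new { |hash, key| hash[key] = [] }
  comment.each_line do |line|
    # Strip the leading "#" and surrounding whitespace.
    text = line.sub(/\A\s*#\s?/, "").strip
    if (match = text.match(/\A@(\w+)\s*(.*)\z/))
      tags[match[1]] << match[2]
    end
  end
  tags
end

doc = <<~DOC
  # Handles the listing page.
  # @url /posts
  # @return [String] the rendered page
DOC

tags = parse_tags(doc)
p tags["url"]     # ["/posts"]
p tags["return"]  # ["[String] the rendered page"]
```

A real plugin would also register the tag with the documentation tool so templates can render it; that part is not shown here.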

3. Succinctness

So it’s not as “natural”, but why can’t that be a good thing? Even Ruby uses shorthand like def rather than “define”, or attr instead of “attribute”. Are Rubyists really that averse to shorthand? That doesn’t seem right. The real question is whether the shorthand is understandable, and in my opinion it is.

Let’s take TomDoc’s syntax for specifying a return value. TomDoc would say:

# Returns a String that ...

In YARD’s syntax, it’s simply:

# @return [String] ...

But why bother with the capitalization, the “a” and other natural English syntax when you can just stick to the stuff that matters? Everything in YARD’s syntax is there for a reason (yes, even the []), so the stuff that isn’t is gone. YARD lets you focus on the documentation itself, not the grammar of the syntax. I have enough faith in Rubyists that I don’t believe they need full sentences to be able to read documentation in a source file.

The other beautiful thing about YARD is that you can sometimes combine the @return tag and the description into one simple message. TomDoc requires the same information to be written out twice:

# Duplicate some text an arbitrary number of times.
#
# ...
#
# Returns the duplicated String.

YARD can do this in one single line:

# @return [String] the text duplicated +count+ times.

In fact, this form is recommended in situations like these. Again, you cannot do this in TomDoc, since the description section and the returns section are both required. The TomDoc version might be easier on the eyes, but YARD’s syntax has way less noise because it does not enforce a fixed form.
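For reference, here is roughly what the one-line form looks like on an actual method. The method name multiplex and its body are illustrative only; the point is that the single @return line doubles as the method’s description.

```ruby
# @param [String] text the text to duplicate
# @param [Integer] count the number of copies to make
# @return [String] the text duplicated +count+ times.
def multiplex(text, count)
  text * count
end

puts multiplex("Tom", 3)  # prints "TomTomTom"
```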

Specific TomDoc Omissions

The current TomDoc specification is also missing solutions to a few common problems that are likely to put you in tough spots. Let’s look at a few places where TomDoc misses the mark.

1. No Overload Support

Often your method behaves differently depending on the number (or types) of arguments passed in. In other languages we call this method overloading. Because Ruby does not directly support overloading, people hack it with splat args in a single method, which makes such methods especially hard to document. YARD handles this cleanly with an @overload tag, and RDoc has similar, though less robust, support through its call-seq: directive. TomDoc has no support for these methods.

The problem is that most frameworks and libraries have tons of these methods. Think of Rails’ find, which can be called as Model.find(1) or Model.find(:all, ...). In YARD we would document this as:

# @overload find(index)
#   @param [Fixnum] index the row index in the database
#   @return [Base] the model corresponding to the row
#   @return [nil] if the row is not found
# @overload find(which, hash = {})
#   @param [Symbol] which either :all or :first
#   (...hash options here...)
#   @return [Array<Base>] if :all is specified, a list of objects 
#   @return [Base] if :first is specified, the first object in the list

There is more to the above docstring, but you get the idea. Currently I see no way to document such a method cleanly in TomDoc. You would have to fall back on your description to explain this all in plaintext, which is not the ideal mechanism for declaring arguments and types.
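To show the kind of method @overload is meant for, here is a minimal, self-contained sketch of a find-like method that dispatches on its arguments through a splat. The Record class and its in-memory ROWS table are hypothetical stand-ins for a database; this is not how Rails implements find.

```ruby
# Hypothetical in-memory model used only to illustrate overloading.
class Record
  ROWS = %w[alpha beta gamma].freeze

  # @overload find(index)
  #   @param [Integer] index the row index
  #   @return [String] the row at that index
  #   @return [nil] if the row is not found
  # @overload find(which, options = {})
  #   @param [Symbol] which either :all or :first
  #   @param [Hash] options extra options (ignored in this sketch)
  #   @return [Array<String>] if :all is specified, every row
  #   @return [String] if :first is specified, the first row
  def self.find(*args)
    first = args.first
    case first
    when Integer then ROWS[first]  # out-of-range index gives nil
    when :all    then ROWS.dup
    when :first  then ROWS.first
    else raise ArgumentError, "unsupported arguments: #{args.inspect}"
    end
  end
end

p Record.find(1)       # "beta"
p Record.find(:all)    # ["alpha", "beta", "gamma"]
p Record.find(:first)  # "alpha"
```

Each call form gets its own parameter list and its own return types in the docstring, even though Ruby only sees one method.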

2. Multiple Returns

We just saw this above. YARD elegantly allows you to list as many @return tags as you like, which lets you separate each return semantically according to the type of value you’d expect. Using TomDoc you’d once again have to fall back on plaintext to explain what happens in case X or case Y, all in natural language. This is all too common with RDoc-style syntax, where we would often see:

# Returns a list of Base objects if :all is specified, or a 
# Base object corresponding to the first object in the list if 
# :first is specified.

Were you really able to parse that as quickly as the YARD version?
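Because each @return tag carries its own bracketed types, a tool can recover every case mechanically. Here is a minimal sketch, assuming simple one-line tags and a hypothetical return_tags helper (not part of YARD), that turns the tags from the find example into type/condition pairs:

```ruby
# Hypothetical helper: extracts every "@return [Types] text" tag from a
# comment. The type list is split naively on commas, so nested types
# that contain commas (e.g. Hash{String, Symbol}) are not handled.
def return_tags(comment)
  comment.scan(/@return\s+\[([^\]]+)\]\s*(.*)$/).map do |types, text|
    { types: types.split(",").map(&:strip), text: text.strip }
  end
end

doc = <<~DOC
  # @return [Array<Base>] if :all is specified, a list of objects
  # @return [Base] if :first is specified, the first object in the list
DOC

return_tags(doc).each do |tag|
  puts "#{tag[:types].join(' or ')}: #{tag[:text]}"
end
```

Extracting the same two cases from the English sentence above would require guessing where each type and each condition begins and ends.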

3. Deprecated Is Not A Modifier

The specification says to begin any description with “Deprecated:” to indicate that it is, in fact, deprecated. So you have your existing method that says:

# Duplicate some text an arbitrary number of times.

And then you deprecate it. From the sound of the spec it should now look like this:

# Deprecated: Duplicate some text an arbitrary number of times.

It’s deprecated because you duplicate some text? The spec never tells you to give a reason (or an alternative), which is exactly what YARD’s @deprecated tag is for. It should look like this:

# Deprecated: Use THISOTHERMETHOD instead.
# 
# Duplicate some text an arbitrary number of times.

I hope this is what the specification intends.
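For comparison, here is the same deprecation written with YARD’s @deprecated tag, whose text states the reason or the replacement. This is a minimal sketch: duplicate_text and repeat_text are hypothetical names, and the runtime warn call is an optional extra that YARD neither requires nor generates.

```ruby
# Duplicate some text an arbitrary number of times.
#
# @deprecated Use {#repeat_text} instead.
# @param [String] text the text to duplicate
# @param [Integer] count the number of copies
# @return [String] the text duplicated +count+ times.
def duplicate_text(text, count)
  # Optional: also alert callers at runtime (written to stderr).
  warn "duplicate_text is deprecated; use repeat_text instead"
  repeat_text(text, count)
end

# @return [String] the text duplicated +count+ times.
def repeat_text(text, count)
  text * count
end
```

The description stays untouched, and the deprecation reason lives in its own tag where a template can highlight it.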

4. Doesn’t Respect Visibility

TomDoc takes an opt-in approach: methods are not part of the public API unless they are explicitly declared as such. The weird part about this one is that we already know whether a method is public from Ruby’s own visibility rules. YARD plays well with Ruby and respects visibility so as to minimize the work for a documentation writer. YARD assumes that if a method is public in your class, it is public in your API (unless otherwise noted). TomDoc does not do this.

In my obviously biased opinion, YARD has the right idea here. This makes you think about your code as spec, rather than your doc as spec. A method that is accessible in Ruby-land but private in your API simply does not make much sense (excluding a few very specific outlying cases). More importantly, by ignoring visibility TomDoc makes you redo work you already did: after deciding on a method’s visibility in Ruby, you have to explicitly re-declare it as public in your documentation. If you need to say that a method is both public to Ruby and public to your API, we might as well throw Ruby’s visibility out the window, since Ruby visibility is barely enforced anyway.
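The visibility information YARD relies on is already available from Ruby itself. Here is a minimal sketch with a hypothetical Parser class: Ruby’s reflection methods report exactly which methods are public, the same distinction a documentation tool can reuse without any extra markup.

```ruby
# Hypothetical class: only #parse is meant to be part of the API.
class Parser
  # @return [Array<String>] the whitespace-separated tokens in +source+
  def parse(source)
    tokenize(source)
  end

  private

  # Internal helper; Ruby's visibility already marks it as non-API.
  def tokenize(source)
    source.split
  end
end

# Passing false limits the lists to methods defined in Parser itself.
p Parser.public_instance_methods(false)   # [:parse]
p Parser.private_instance_methods(false)  # [:tokenize]
```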

Actually, on that note, the TomDoc article mentioned @private being used to hide information from documentation. Although @private can do this, you have to jump through a few hoops to do so, it’s strongly discouraged, and it’s not the main use case. The main use case of @private is to declare an object as private when Ruby’s visibility cannot apply. Classes, modules and constants are the main example, since Ruby has no visibility rules for those objects. Again, you should not use @private to arbitrarily hide methods in YARD.

5. Doesn’t Understand Complex API

Of course, an API is more than just public and private. Large frameworks have developer APIs, test APIs, plugin APIs and more, each exposing separate methods to separate people and use cases. You may even want to generate separate HTML documentation for each of these use cases. YARD supports arbitrary APIs using the @api tag:

# @api developer (or whatever)

The field is free-form text, so you can specify any API you’d like using any convention/terminology you like. There is no equivalent of this with TomDoc; you only have two choices, public or “not public”.
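Because the @api value is free-form, a tool can group documented objects by it and, for example, generate one set of docs per audience. Here is a minimal sketch of that grouping; the method paths and docstrings are hypothetical, treating untagged objects as public is an assumption of this sketch, and YARD’s own query and template machinery is not shown.

```ruby
# Hypothetical docstrings keyed by method path.
DOCS = {
  "Router#route"        => "# Routes a request.\n# @api public",
  "Router#add_hook"     => "# Registers a plugin hook.\n# @api plugin",
  "Router#reset_routes" => "# Clears all routes between tests.\n# @api test"
}.freeze

# Groups method paths by the first word of their @api tag.
# Assumption: objects without an @api tag are grouped as "public".
def group_by_api(docs)
  docs.group_by { |_path, doc| doc[/@api\s+(\S+)/, 1] || "public" }
      .transform_values { |pairs| pairs.map(&:first) }
end

group_by_api(DOCS).each do |api, paths|
  puts "#{api}: #{paths.join(', ')}"
end
```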

6. Backtracking On Types

A lot of the good that YARD does comes from being very strict about how types are specified, which makes them parseable by a machine. Keeping type information separate from the description matters because a) users can get type information more consistently, b) you can format it in templates however you like, and c) you can use it in other places (testing, tooling). A @return tag can even be checked automatically against what the method actually returns.
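As an illustration of that last point, here is a minimal sketch of testing a value against a docstring’s @return tag. It is hypothetical tooling, not part of YARD: it resolves plain class names with Object.const_get, treats nil specially, checks only the outer class of generics such as Array&lt;String&gt;, and does not handle duck-types (unknown names raise NameError).

```ruby
# Returns true if +value+ matches one of the types in the docstring's
# first @return tag. Only plain class names and "nil" are handled.
def matches_return_tag?(docstring, value)
  types = docstring[/@return\s+\[([^\]]+)\]/, 1]
  raise ArgumentError, "no @return tag" unless types

  types.split(",").map(&:strip).any? do |type|
    next value.nil? if type == "nil"
    # Drop generic parameters, e.g. "Array<String>" -> "Array".
    Object.const_get(type.sub(/<.*\z/, "")) === value
  end
end

doc = "# @return [String, nil] the user's name, if known"
p matches_return_tag?(doc, "Loren")  # true
p matches_return_tag?(doc, nil)      # true
p matches_return_tag?(doc, 42)       # false
```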

TomDoc’s spec only goes halfway: it tells you that your type “SHOULD” be listed in the description, but not where. That makes it impossible to reliably extract this information, which brings us back to the days of RDoc, when types were not specified consistently. Type information becomes readable only by humans. That can end up biting you, and it sets us back in moving towards consistent docs.

It’s also important to point out that YARD’s type specification accepts duck-types. TomDoc accepts them too, but the only way to specify such a type there is to write it out in plain English:

# myarg - An object that responds to #read that is used to...

By dropping the rules that English grammar imposes, YARD makes describing these types way simpler:

# @param [#read] myarg used to ...

The same goes for multiple and complex types. You can compare YARD’s syntax with the English equivalents on yardoc.org’s type parser page, an experimental parser that shows what a YARD type specification probably means. In many cases, the English version is way more complex than YARD’s notation.
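To give a feel for what such a type parser does, here is a very small sketch that turns a few YARD type strings into English. It is hypothetical and much simpler than the parser on yardoc.org: it handles only comma-separated alternatives, #method duck-types, nil, and Array&lt;...&gt; generics.

```ruby
# Hypothetical, simplified translator from YARD type strings to English.

# Splits "A, B<C, D>" on top-level commas only: ["A", "B<C, D>"].
def split_types(list)
  parts = [String.new]
  depth = 0
  list.each_char do |ch|
    depth += 1 if ch == "<"
    depth -= 1 if ch == ">"
    if ch == "," && depth.zero?
      parts << String.new
    else
      parts.last << ch
    end
  end
  parts.map(&:strip)
end

# Describes a list of alternatives, joined with "or".
def describe_list(list)
  split_types(list).map { |type| describe_type(type) }.join(" or ")
end

# Describes a single type.
def describe_type(type)
  case type
  when /\A#(\w+[?!=]?)\z/ then "an object that responds to ##{$1}"
  when "nil"              then "nil"
  when /\AArray<(.+)>\z/  then "an Array of (#{describe_list($1)})"
  else
    article = type.match?(/\A[AEIOU]/) ? "an" : "a"
    "#{article} #{type}"
  end
end

puts describe_list("#read")                  # an object that responds to #read
puts describe_list("String, nil")            # a String or nil
puts describe_list("Array<String, Symbol>")  # an Array of (a String or a Symbol)
```

Even this toy version shows the asymmetry: the notation on the left stays short as types get more complex, while the English keeps growing.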

7. Missing Constant Support

Not sure if this is just another omission, but there’s no support for documenting constants in TomDoc. I wonder why?

8. Attributes Get Messy

TomDoc says that using attr_accessor is bad form and that, to get documentable code, you should split each one into an attr_reader and attr_writer pair. This is overly verbose and unnecessary, since we can infer from a read/write accessor that it both returns and sets an object of the same type. In fact, YARD handles this elegantly:

# @return [String] the filename to write to
attr_accessor :file

The above YARD syntax can understand that #file returns a String and #file= sets a String. In fact, using a separate reader/writer in this case would be a waste of your time. TomDoc uses the justification that separating the declarations makes you think about your API, but I don’t buy it.
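A quick runnable check of the underlying Ruby behavior: a single attr_accessor defines both the reader and the writer, which is why one @return tag can describe both. The Exporter class is hypothetical.

```ruby
# Hypothetical class with one documented read/write attribute.
class Exporter
  # @return [String] the filename to write to
  attr_accessor :file
end

exporter = Exporter.new
exporter.file = "out.txt"
p exporter.file                                 # "out.txt"
p Exporter.public_instance_methods(false).sort  # [:file, :file=]
```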

Verbose

All in all, TomDoc will probably be of use to many who are looking for a human-readable alternative to RDoc syntax. However, I think what you get out of TomDoc is actually more verbose than what you get with YARD, verbose not being a good thing in this case. Documentation should not be too brief, but making your docs seem more “readable” by injecting more noise in the form of grammar is not going to improve documentation. Thinking about what you are documenting and being clear about your API will.

But You Can Use It If You Want

All that said, YARD is great because it lets you customize how you use the tool. It makes few rules and assumptions. If you don’t like YARD-style syntax, you don’t need to use it. You can override YARD’s docstring parsing with a plugin that supports TomDoc syntax (or any syntax you want). And hey, I wrote it... in like 20 minutes... and like 12 lines of code (thanks to @defunkt for providing the parser stuff). Check it out on GitHub: http://github.com/lsegal/yard-tomdoc.

It doesn’t really get you as much flexibility as YARD syntax does (because of the above caveats), but feel free to fork and improve the support.
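Purely to illustrate the idea of swapping in a different docstring syntax, here is a hypothetical toy translator (not the yard-tomdoc code, which builds on an existing TomDoc parser) that rewrites two TomDoc conventions into YARD tags: "name - The Type description" argument lines and a "Returns ..." line. It guesses each type as the first capitalized word that isn’t an article, which is exactly the kind of heuristic the types section complains about.

```ruby
# Hypothetical toy: translates a few TomDoc lines into YARD tags.
ARTICLES = %w[A An The].freeze

# Guesses a type as the first capitalized non-article word.
def guess_type(text)
  text.scan(/\b[A-Z]\w*/).find { |word| !ARTICLES.include?(word) } || "Object"
end

def tomdoc_to_yard(comment)
  comment.each_line.map do |line|
    text = line.sub(/\A\s*#\s?/, "").rstrip
    if (m = text.match(/\A(\w+)\s+-\s+(.*)\z/))
      "@param [#{guess_type(m[2])}] #{m[1]} #{m[2]}"
    elsif (m = text.match(/\AReturns\s+(.*)\z/))
      "@return [#{guess_type(m[1])}] #{m[1]}"
    else
      text
    end
  end.join("\n")
end

doc = <<~DOC
  # Duplicate some text an arbitrary number of times.
  #
  # text  - The String to be duplicated.
  # count - The Integer number of times to duplicate the text.
  #
  # Returns the duplicated String.
DOC

puts tomdoc_to_yard(doc)
```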

Questions? Comments? Follow me on Twitter (@lsegal) or email me.