Readability and Naming Things

Many people think that the readability of code has to do with the letters and symbols used. They believe it is the adding, removing, or changing of those symbols that makes code more readable. In some sense, they’re right. However, the underlying principle is:

Readability of code depends primarily on how space is occupied by letters and symbols.

What does that mean? Well, it means two things:

Code should have the proper amount of white space around it. Not too much, not too little.

There should be the proper amount of space within a line of code to separate out the different parts. Separate actions should generally be on separate lines. Indentation should be used appropriately to group blocks of code.

With this principle, it’s actually the absence of code that makes things readable. This is a general principle of life. For example, if there were no space at all between letters and words in a book, it would be hard to read. On the other hand, it’s easy to see the moon against a clear night sky, because there’s a lot of clear black space that isn’t the moon. Similarly, when your code has the right amount of space in it, you can easily tell where the code is and what it is.

For example, this code is hard to read:

x=1+2;y=3+4;z=x+y;print"hello world";print"z is"+z;if(z>y+x){print"error";}

Whereas with the proper spacing in, around, and between the lines, it becomes easy to read:

x = 1 + 2;
y = 3 + 4;
z = x + y;
print "hello world";
print "z is" + z;
if (z > y + x) {
    print "error";
}

There can also be too much or wrong space, however. This code is also hard to read:

    x            =          1+        2;
y = 3            +4;


  z = x    +      y;
print    "hello world"         ;
 print "z is " + z;
if (z  >     y+x)
 {        print "error" ;
        }

Code itself should take up space in proportion to how much meaning it has.

Basically, tiny symbols that mean a lot make code hard to read. Very long names that don’t mean much also make code hard to read. The amount of meaning and the space taken up should be closely related to each other.

For example, this code is unreadable because the names are too small:

q = s(j, f, m);
p(q);

The space those names take up is very little compared to how much meaning they have. However, with appropriately-sized names, it becomes more apparent what that block of code is doing:

quarterly_total = sum(january, february, march);
print(quarterly_total);

On the other hand, if the names are too long compared to how much meaning they represent, then the code becomes hard to read again:

quarterly_total_for_company_x_in_2011_as_of_today = add_all_of_these_together_and_return_the_result(january_total_amount, february_total_amount, march_total_amount);
send_to_screen_and_dont_wait_for_user_to_respond(quarterly_total_for_company_x_in_2011_as_of_today);

This principle applies just as well to entire blocks of code as it does to individual names. We could replace the entire block of code above with a single function call:

print_quarterly_total();

And that is even more readable than any of the previous examples. Even though the name we used–print_quarterly_total–is a bit longer than our other names for things, that’s okay because it represents more meaning than other pieces of code do. In fact, it’s even more readable than our block of code was, by itself. Why is that? Because the code block took up a lot of space for, effectively, very little meaning, and the function takes up a more reasonable amount of space for the same meaning.
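As a rough sketch of what could sit behind that one call, here is a version in Python (picked only for illustration, since the examples above are pseudocode). The monthly figures and the MONTHLY_TOTALS name are invented assumptions, not part of the original example:

```python
# Hypothetical monthly figures, invented purely for illustration.
MONTHLY_TOTALS = {"january": 100, "february": 150, "march": 125}


def quarterly_total(totals):
    """Add up the three monthly totals for the first quarter."""
    return totals["january"] + totals["february"] + totals["march"]


def print_quarterly_total(totals=MONTHLY_TOTALS):
    """One short name standing in for the whole summing-and-printing block."""
    print(quarterly_total(totals))


print_quarterly_total()
```

The caller only ever sees print_quarterly_total(), so the detail takes up space in one place instead of everywhere the total is needed.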

If a block of code takes up a lot of space but doesn’t actually have much meaning, then it’s a good candidate for refactoring. For example, here’s a block of code that handles some user input:

x_pressed = false;
y_pressed = false;
if (input == "x") {
    print "You pressed x!";
    x_pressed = true;
}
else if (input == "y") {
    if (not y_pressed) {
        print "You pressed y for the first time!";
        y_pressed = true;
        if (x_pressed) {
            print "You pressed x and then y!";
        }
    }
}

If that were our whole program, that would probably be readable enough. However, if this is within a lot of other code, we could make it more readable like this:

x_pressed = false;
y_pressed = false;
if (input == "x") {
    handle_x(x_pressed);
}
else if (input == "y") {
    handle_y(x_pressed, y_pressed);
}

And we could make it even more readable by reducing it to this:

handle_input(input);

Reading “handle_input” in the middle of our code is much easier than trying to read that whole first block, above, because “handle_input” is taking up the right amount of space, and the block is taking up too much space. Note, however, if we’d done something like h(input) instead, that would be confusing and unreadable because “h” is too short to properly tell us what the code is doing. Also, handle_this_input_and_figure_out_if_it_is_x_or_y_and_then_do_the_right_thing(input) would not only be annoying for a programmer to type, but would also make for unreadable code.
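To make that refactoring concrete, here is one possible sketch in Python. The InputHandler class is an assumption of mine: the pseudocode above doesn’t say where x_pressed and y_pressed live between calls, so this version simply keeps them on an object.

```python
class InputHandler:
    """Remembers which keys have been pressed so far (an assumed design)."""

    def __init__(self):
        self.x_pressed = False
        self.y_pressed = False

    def handle_x(self):
        print("You pressed x!")
        self.x_pressed = True

    def handle_y(self):
        if self.y_pressed:
            return  # Only the first press of y is reported.
        print("You pressed y for the first time!")
        self.y_pressed = True
        if self.x_pressed:
            print("You pressed x and then y!")

    def handle_input(self, key):
        # The only line most callers ever need to read.
        if key == "x":
            self.handle_x()
        elif key == "y":
            self.handle_y()


handler = InputHandler()
for key in ["x", "y", "y"]:
    handler.handle_input(key)
```

Each name takes up about as much space as the meaning it carries, and the nested conditions now live in one small method instead of in the middle of other code.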

Naming Things

A famous programmer (the remark is usually attributed to Phil Karlton) once said that naming things is one of the hardest problems in computer science. These principles of readability give us some good clues on how to name things, though. Basically, the name of a variable, function, etc. should be long enough to fully communicate what it is or does, without being so long that it becomes hard to read.

It’s also important to think about how the function or variable is going to be used. Once we start putting it into lines of code, will it make those lines of code too long for how much meaning they actually have? For example, if you have a function that is only called once, on one line all by itself, with no other code in that line, then it can have a fairly long name. However, a function that you’re going to use frequently in complex expressions should probably have a name that is short (though still long enough to fully communicate what it does).
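Here is a small Python sketch of that trade-off. Both function names are made up for illustration: dist is used three times in one expression, so it stays short, while the reporting function is called once, on its own line, and can afford a longer name.

```python
import math


def dist(a, b):
    """Straight-line distance between two (x, y) points."""
    return math.hypot(b[0] - a[0], b[1] - a[1])


def report_perimeter_of_triangle(p, q, r):
    """Called once, alone on its line, so a longer name costs little."""
    # dist appears three times in one expression; a long name here
    # would stretch the line without adding any meaning.
    perimeter = dist(p, q) + dist(q, r) + dist(r, p)
    print("perimeter:", perimeter)
    return perimeter


report_perimeter_of_triangle((0, 0), (3, 0), (3, 4))
```

Try renaming dist to something like compute_straight_line_distance_between and the perimeter line gets much longer without saying anything new.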

-Max

40 Comments

  1. Excellent “Space Coding Principle”.
    I think this is one chapter of your upcoming book.
    You may need more examples, such as comparing two pieces of code:

    Is it better to code

    SELECT *
    FROM table
    WHERE x = y

    or

    SELECT *
    FROM table
    WHERE x = y

    and why?

    Thanks,

    Ahmed.

    • Thanks! Yeah, one of my goals with all of my blog posts is to state exactly what these things that some of us “just know” are, and in doing so to help us understand them more exactly and be able to communicate them to everybody.

      -Max

  2. A similar example where I find spacing makes a big difference in readability is lining up similar lines of code:

    firstX = rect.width + 5;
    lastX = rect.width + 50;
    firstY = rect.height + 10;
    lastY = rect.height + 20;

    This is especially useful when writing several lines of related formulas; it lets you see what is similar and different between them almost at a glance. Code like this almost begs to be put into some kind of tabular format; wouldn’t it be nice if your IDE, code editor, or some other tool could do that for you?

    • There are editors that do that. I know TextMate does (Cmd-Alt-]), and unless my memory is really foggy, Emacs also gives you that option.

  3. Argh, it looks like the code tag doesn’t preserve whitespace like pre does, and it looks like pre isn’t allowed in comments. (Maybe you could add something like “code { white-space: pre }” to your css? And maybe add a “Preview Comment” feature to your blog?).

    Here is my code example again, with whitespace converted to underscores:

    firstX_=_rect.width__+__5;
    _lastX_=_rect.width__+_50;
    firstY_=_rect.height_+_10;
    _lastY_=_rect.height_+_20;

  4. I have never understood this “lining up” argument, although I have seen it over and over. All it does for me is make my eyes want to read the code vertically, when its semantic meaning is to be gained horizontally. My code is not columnar; I don’t understand why it should be formatted as such.

    • To Michael Campbell:

      Actually, I think that you are hitting on the exact reason why this kind of vertical layout IS important — because it makes it clear when the semantic meaning is vertical rather than horizontal.

      Consider the code, as given:


      firstX_=_rect.width__+__5;
      lastX__=_rect.width__+_50;
      firstY_=_rect.height_+_10;
      lastY__=_rect.height_+_20;

      From the point of view of the compiler, we have issued 4 assignment statements, which may be completely unrelated to each other. But that is not the TRUE semantic meaning of the code. The true semantic meaning of the code is something closer to “set bounds on all 4 sides of the point given by the rect, with a certain amount of padding”.

      In a really expressive programming language we might be able to write this differently… something like this:


      xyBounds = boundsWithMargin(center=rect, leftPad=5, rightPad=50, topPad=10, bottomPad=20);

      I think we could all agree that this is much more clear, as well as being less prone to bugs, but unfortunately most of the time I use languages that aren’t powerful enough to use this clear a syntax. Lining up the fields in columns is a poor-man’s way of indicating that the lines are strongly related… are really just different pieces of the same larger process.

      • I agree, but even with expressive language syntax there are still cases where you will have several strongly related lines of code, e.g. in the implementation of your boundsWithMargin function, or in cases like this:


        xyBounds1 = boundsWithMargin(center=rect1, leftPad=5, rightPad=50, topPad=10, bottomPad=20);
        xyBounds2 = boundsWithMargin(center=rect2, leftPad=0, rightPad=15, topPad=20, bottomPad=10);

        I’m used to coding up a lot of mathematical formulas; that’s a case where you often have similar lines of code, and seeing the symmetry between them is important.

  5. I agree about white space management, but most of the time we don’t deviate much from the output of a pretty printer.

    I disagree, however, with your criterion for the length of the names. My rule would be to correlate the length of the name with the frequency of their use, and the scope of the named thing.

    The more frequent the use, the shorter the name. The shorter the scope, the shorter the name. For instance, a loop index can easily be one letter, because it’s a frequent idiom, and the scope is very short (inside the loop).

    There are things that bear much meaning, which nevertheless have short names. map and fold are a good example, quite meaningful, broad scope (they often are global names in a standard library), but very frequently used.

    Now this is nitpicking: there is indeed a very strong negative correlation between the complexity behind a name, and its usage frequency. So your rule will mostly work.

    • Hey Loup. I did talk about the frequency of the use of the name, if you look at the end of the blog.

      I disagree with “the shorter the scope, the shorter the name” as a general principle–sometimes even loop variables need longer names to disambiguate them (although I would agree that that is an unusual case).

      “map” and “fold” actually have very little meaning, relatively. Having a short name for them is fine.

      -Max

      • I’m with Loup on the “smaller scope lets you use shorter names” thing; When you can see every use of the variable in a single screenful of text, what’s the advantage of having a name longer than a letter or two? That’s just clutter. We use longer variable names for things with larger scope because we have less of the context in which the variable is used.

        I’m also rather astonished that you’d say “map” and “fold” have little meaning; that strikes me as like saying that the words “addition” and “multiplication” have little meaning. Map and fold are basic building blocks of operation on sets of values. (Admittedly, many programmers seem not to know about this, and prefer things like “for” loops which produce code that’s longer, more complex and more difficult to understand.)

        cjs@cynic.net

  6. The other common misconception is that readability comes entirely from comments.

    Comments can often be useful, but like identifiers, they should be just long enough to communicate the vital meaning. Also, they should only be used when a clarifying statement is really necessary. Good code should largely speak for itself.

    When I see code which has a 10 line comment block above every function name, and a comment on every other line describing what it’s doing, I die a little inside. That’s not readable code. That’s just verbosity. The actual structure gets lost in the comments.

    • Yeah, it’s funny that you mention that. In my book, this whole article was originally a single paragraph talking about how space is important. Right below that paragraph is another paragraph talking about how comments should generally only explain *why* you did something, not *what* it is doing.

      I think people adding so many useless comments is a great example of how oversimplification actually leads to complexity, which I wrote about in a blog a while back as a single sentence in one paragraph, but haven’t really expanded upon since then.

      -Max

      • I mostly agree, but there are times when explaining what you’re doing is useful. Primarily, when dealing with very performance-critical pieces of code, where you’re forced to structure it in a bizarre and non-intuitive way, for maximum speed. Also, where you’re just doing something particularly complex. In these cases, the actual function of the code – what it does – may not be immediately apparent, no matter how good your identifier names and whitespace may be. In these cases, a comment that explains what you’re doing, not just why you’re doing it, can be very valuable.

        I also find comments are sometimes handy for breaking up somewhat ungainly blocks of code. When you have a function that’s 200 lines long (sometimes it’s unavoidable), breaking it up into a few blocks and placing a short comment in front of each block, giving its basic purpose, can do wonders for readability. In this case, the comment is as much about visually spacing things out and breaking them up as about the actual content.

        But comments, like anything, can be easily overused, and a lot of people who make a lot of noise about good practices seem to champion comment strategies that just seem insane to me. They leave the code so cluttered with comments that its actual structure is lost.

        As a side note, I’m currently working on implementing the Perlin noise function (and moving on to Perlin simplex noise once I have that working). Reading through Ken Perlin’s reference implementation is painful. He would have benefitted greatly from this article. (See here: http://www.flipcode.com/archives/Perlin_Noise_Class.shtml)

        • Oh yeah, I totally agree with you that explanation is frequently required. For example, in Perl we use a lot of regular expressions, and those often need comments to be easily understandable.

          I agree about the breaking up a long function, too. I’ve had some of those, for sure. (Although I do usually manage to break them up at least a bit, but not always, it’s true.)

          Hahaha, that’s too bad about the Perlin algorithm. 🙂

          -Max

  7. Maybe you can rethink this a little bit.
    I’m pretty sure there is a cultural aspect to this topic.
    Other times (in medieval times, for example, there were no spaces between words)
    and other cultures (Japanese, anyone?) have different “defaults” for spacing.

    It also depends on the context.
    If you’re coding as (and see yourself as) a computer scientist, your sense of spacing is like
    a mathematician’s. If you’re coding as a programmer, you have a different context.

    See the coding conventions in, for example, LISP-like languages, where the functions look like they were taken straight from a scientific paper.

    Compare that with the Java code conventions. Way more whitespace.

  8. With the commenting, I’m sadly in the middle of the group. I tend to explain at the top of the code what the basics are, when I wrote it, and what changes I made in case I need to revert at some point. Then, going through, I try to comment code in blocks to say in ten words or less what the next block of lines is going to do. I think that’s pretty standard.

    One thing that slips through many times in the world of outsourcing, however, is to name your variables in the language of your client. It doesn’t matter how clear your names are if the individual supporting the code doesn’t grasp your language. I had the pleasure of taking over support of an xBase program where a contracted employee had written several thousand lines of code for some serious data modeling, but did all the code with Hungarian notation in a mix of English and Slovak. Absolutely brilliant code, but having to debug with a Slovak-English dictionary was quite time-consuming.

  9. […] On the other hand, it is a real pleasure to crack open software that reads like prose. Even if I don’t understand the nuance or logic of every method, I get the gist by simply reading the variable, object and method names. The consistent application of style and indents make it look beautiful on the page as well as make the logical structure obvious. I admire code like this for the care and workmanship its author imparted to it. Readability is certainly one factor that goes into highly-crafted, artistic code. Here is a great blog on the readability of code with some good and bad examples: http://www.codesimplicity.com/post/readability-and-naming-things/. […]

  10. All the advice given in the Clean Code book about naming conventions holds good. I was initially confused by long but descriptive names, but they have their merit, provided you choose a really good long name that reads like a sentence.
