Measuring Developer Productivity

Almost as long as I have been working to make the lives of software engineers better, people have been asking me how to measure developer productivity. How do we tell where there are productivity problems? How do we know if a team is doing worse or better over time? How does a manager explain to senior managers how productive the developers are? And so on and so on.

In general, I have tended to focus on code simplicity first, and put a lower priority on measuring every single thing that developers do. Almost all software problems can be traced back to some failure to apply software engineering principles and practices. So even without measurements, if you simply get good software engineering practices applied across a company, most productivity problems and development issues disappear.

Now, that said, there is tremendous value in measuring things. It helps you pinpoint areas of difficulty, allows you to reward those whose productivity improves, justifies spending more time on developer productivity work where that is necessary, and has many other advantages.

But programming is not like other professions. You can’t measure it like you would measure some manufacturing process, where you could just count the number of correctly-made items rolling off the assembly line.

So how would you measure the production of a programmer?

The Definition of “Productivity”

The secret is in appropriately defining the word “productivity.” Many people say that they want to “measure productivity,” but have never thought about what productivity actually is. How can you measure something if you haven’t even defined it?

The key to understanding productivity is realizing that it has to do with products. A person who is productive is a person who regularly and efficiently produces products.

The way to measure the productivity of a developer is to measure the product that they produce.

That statement alone probably isn’t enough to resolve the problem, though. So let me give you some examples of things you wouldn’t measure, and then some things you would, to give you a general idea.

Why Not “Lines of Code?”

Probably the most common metrics that the software industry has attempted to develop have been centered around how many lines of code (abbreviated LoC) a developer writes. I understand why people have tried to do this—it seems to be something that you can measure, so why not keep track of it? A coder who writes more code is more productive, right?

Well, no. Part of the trick here is:

“Computer programmer” is not actually a job.

Wait, what? But I see ads all over the place for “programmer” as a job! Well, yes, but you also see ads for “carpenter” all over the place. But what does “a carpenter” produce? Unless you get more specific, it’s hard to say. You might say that a carpenter makes “cut pieces of wood,” but that’s not a product—nobody’s going to hire you to pointlessly cut or shape pieces of wood. So what would be a job that “a carpenter” could do? Well, the job might be furniture repair, or building houses, or making tables. In each case, the carpenter’s product is different. If he’s a Furniture Repairman (a valid job) then you would measure how much furniture he repaired well. If he was building houses, you might measure how many rooms he completed that didn’t have any carpentry defects.

The point here is that “computer programmer,” like “carpenter,” is a skill, not a job. You don’t measure the practice of a skill if you want to know how much a person is producing. You measure something about the product that that skill produces. To take this to an absurd level—just to illustrate the point—part of the skill of computer programming these days involves typing on a keyboard, but would you measure a programmer’s productivity by how many keys they hit on the keyboard per day? Obviously not.

Measuring lines of code is less absurd than measuring keys hit on a keyboard, because it does seem like one of the things a programmer produces—a line of code seems like a finished thing that can be delivered, even if it’s small. But is it really a product, all by itself? If I estimated a job as taking 1000 lines of code, and I was going to charge $1000 for it, would my client pay me $1 if I only delivered one line of code? No, my client would pay me nothing, because I didn’t deliver any product at all.

So how would you apply this principle in the real world to correctly measure the production of a programmer?

Determining a Valid Metric

The first thing to figure out is: what is the program producing that is of value to its users? Usually this is answered by a fast look at the purpose of software—determine what group of people you’re helping do what with your software, and figure out how you would describe the result of that help as a product. For example, if you have accounting software that helps individuals file their taxes, you might measure the total number of tax returns fully and correctly filed by individuals using your software. Yes, other people contribute to that too (such as salespeople) but the programmer is primarily responsible for how easily and successfully the actual work gets done. One might want to pick metrics that focus closely on things that only the programmer has control over, but don’t go overboard on that—the programmer doesn’t have to be the only person who could possibly influence a metric in order for it to be a valid measurement of their personal production.

There could be multiple things to measure for one system, too. Let’s say you’re working on a shopping website. A backend developer of that website might measure something about the number of data requests successfully filled, whereas a frontend developer of a shopping cart for the site might measure how many items are put into carts successfully, how many people get through the checkout flow successfully every day, etc.

Of course, one would also make sure that any proposed metric aligns with the overall metric(s) of the whole system. For example, if a backend developer is just measuring “number of data requests received at the backend” but not caring whether they are correctly filled or how quickly they are filled, they could design a poor API that requires too many calls and that actually harms the overall user experience. So whatever metric you’re looking at, make sure you compare it to the reality of helping your actual users. In this particular case, a better solution might be to count, say, how many “submit payment” requests are processed correctly, since that’s the end result. (I wouldn’t take that as the only possible metric for the backend of a shopping website, by the way; that’s just one possible thought.)
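
To make the difference between those two backend metrics concrete, here is a minimal sketch. The `Request` record, the endpoint names, and the sample log are all hypothetical, invented purely for illustration; a real system would read these from its own logs.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """One backend request record (hypothetical log shape)."""
    endpoint: str
    succeeded: bool


def requests_received(log):
    """The weak metric: counts every request, whether or not it helped anyone."""
    return len(log)


def payments_processed(log):
    """The stronger metric: counts only correctly processed payment submissions."""
    return sum(1 for r in log if r.endpoint == "submit_payment" and r.succeeded)


# A chatty API inflates the first number without helping a single extra user.
log = [
    Request("get_cart", True),
    Request("get_cart", True),
    Request("get_cart", True),
    Request("submit_payment", True),
    Request("submit_payment", False),
]

print(requests_received(log))   # 5
print(payments_processed(log))  # 1
```

Adding more `get_cart` calls raises the first number but leaves the second untouched, which is why the second is closer to what users actually get out of the system.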

What About When Your Product Is Code?

There are people who deliver code as their product. For example, a library developer’s product is code. But it’s rarely a single line of code—it’s more like an entire function, class, or set of classes. You might measure something like “Number of fully-tested public API functions released for use by programmers” for a library developer. You’d probably have to do something to count new features for existing functions in that case, too, like counting every new feature for a function that improves its API as being a whole new “function” delivered. Of course, since the original metric says “fully tested,” any new feature would have to be fully tested as well, to count. But however you choose to measure it, the point here is that even for the small number of people whose product is code, you’re measuring the product.
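
A rough sketch of how that library metric might be counted follows. The `Deliverable` record and the function names in the sample data are made up for illustration; the only rules taken from the text above are that a new feature on an existing function counts as a whole new "function," and that nothing counts unless it is fully tested.

```python
from dataclasses import dataclass


@dataclass
class Deliverable:
    """A public API function, or a new feature on an existing one (hypothetical record)."""
    name: str
    released: bool
    fully_tested: bool


def library_metric(deliverables):
    """Count released deliverables that are fully tested; anything else does not count."""
    return sum(1 for d in deliverables if d.released and d.fully_tested)


work = [
    Deliverable("parse()", released=True, fully_tested=True),
    # A new feature on an existing function counts as a whole new "function".
    Deliverable("parse(): streaming input", released=True, fully_tested=True),
    Deliverable("render()", released=True, fully_tested=False),    # untested: not counted
    Deliverable("validate()", released=False, fully_tested=True),  # unreleased: not counted
]

print(library_metric(work))  # 2
```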

What About People Who Work on Developer Productivity?

That does leave one last category, which is people who work on improving developer productivity. If it’s your job to help other developers move more quickly, how do you measure that?

Well, first off, most people who work on developer productivity do have some specific product. Either they work on a test framework (which you would measure in a similar fashion to how you would measure a library) or they work on some tool that developers use, in which case you would measure something about the success or usage of that tool. For example, one thing the developers of a bug tracking system might want to measure is number of bugs successfully and rapidly resolved. Of course, you would modify that to take into account how the tool was being used in the company—maybe some entries in the bug tracker are intended to live for a long time, so you would measure those entries some other way. In general, you’d ask: what is the product or result that we bring about in the world by working on this tool? That’s what you’d measure.
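
As one possible sketch of the bug tracker example: the `Bug` record, the `long_lived` flag, and the seven-day threshold for "rapidly" are all assumptions made for illustration, and each company would pick its own definitions.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass
class Bug:
    """One tracker entry (hypothetical shape)."""
    long_lived: bool                      # entry intended to stay open a long time
    time_to_resolve: Optional[timedelta]  # None while the entry is still open


def rapidly_resolved(bugs, limit=timedelta(days=7)):
    """Count ordinary bugs resolved within `limit`; long-lived entries are excluded."""
    return sum(
        1
        for b in bugs
        if not b.long_lived
        and b.time_to_resolve is not None
        and b.time_to_resolve <= limit
    )


bugs = [
    Bug(long_lived=False, time_to_resolve=timedelta(days=2)),   # counted
    Bug(long_lived=False, time_to_resolve=timedelta(days=30)),  # resolved, but not rapidly
    Bug(long_lived=True, time_to_resolve=timedelta(days=1)),    # measured some other way
    Bug(long_lived=False, time_to_resolve=None),                # still open
]

print(rapidly_resolved(bugs))  # 1
```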

But what if you don’t work on some specific framework or tool? In that case, perhaps your product has something to do with software engineers themselves. Maybe you would measure the number of times an engineer was assisted by your work. Or the amount of engineering time saved by your changes, if you can reliably measure that (which is rarely possible). In general, though, this work can be much trickier to measure than other types of programming.

One thing that I have proposed in the past (though have not actually attempted to do yet) is, if you have a person who helps particular teams with productivity, measure the improvement in productivity that those teams experience over time. Or perhaps measure the rate at which the team’s metrics improve.

For example, let’s say that we are measuring a product purely in terms of how much money it brings in. (Note: it would be rare to measure a product purely by this metric; this is an artificial example to demonstrate how this all works.) Let’s say in the first week the product brought in $100. Next week $101, and next week $102. That’s an increase, so it’s not that bad, but it’s not that exciting. Then Mary comes along and helps the team with productivity. The product makes $150 that week, then $200, then $350 as Mary continues to work on it. It’s gone from increasing at a rate of $1 a week to increasing by roughly $50, then $50, then $150 a week. That seems like a valid thing to measure for Mary. Of course, there could be other things that contribute to that metric improving, so it’s not perfect, but it’s better than nothing if you really do have a “pure” productivity developer.
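
The week-over-week increases can be computed directly from the weekly totals in the example. This is only a small sketch; the function and variable names are invented here.

```python
def weekly_increases(revenue):
    """Return the change from each week to the next."""
    return [later - earlier for earlier, later in zip(revenue, revenue[1:])]


before_mary = [100, 101, 102]
# Starts from the last week before Mary so her first week's increase is included.
with_mary = [102, 150, 200, 350]

print(weekly_increases(before_mary))  # [1, 1]
print(weekly_increases(with_mary))    # [48, 50, 150]
```

What you would track for Mary is the second list: how fast the team's own metric is improving, not the raw dollar totals.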

Conclusion

There are lots of other things to know about how to measure production of employees, teams, and companies in general. The above points are only intended to discuss how to take a programmer and figure out what general sort of thing you should be measuring. There’s a lot more to know about the right way to do measurements, how to interpret those measurements, and how to choose metrics that don’t suck. Hopefully, though, the above should get you started on solving the great mystery of how to measure the production of individual programmers, teams, and whole software organizations.

-Max

14 Comments

  1. This is a nice blog. As a software developer, I can track my self-development on a day-to-day basis to become more productive. Thanks for writing.

  2. Great blog. I would define it another way: given an unknown piece of code in a programming language a programmer understands, and a bug report, how fast can they understand the source of the bug and fix it? How fast a person can debug problems, and how well a programmer is able to work on a task in a focused manner: by these we can judge developer productivity. Nice article, though; thanks for sharing.

  3. Great Post! You skirt around a key point in your post and that is that productivity only matters if value is derived from the end result. Said another way, if what is built is not useful and does not provide value to the target audience then it does not matter how productive someone is.

    I started https://www.codemonkey.ai to measure developer productivity by connecting into the entire software development toolchain and measuring developers based on the work items (Features & Bugs from Issue Tracking Systems) they complete via Commits (Source Control) and the impact those changes have to users (Application Performance Management/Logging).

  4. Really great article. Determining a valid metric has always been the most difficult part for our company since we work on so many different types of projects (and usually with various partners who may influence certain metrics).

  5. Great article. Got me started thinking…

    I know the metrics you described are just examples, but are you really measuring productivity when you look at how many items are put into the shopping cart successfully or how many people get through a checkout process without error? That’s more like measuring the quality or success of the product and not how regularly and efficiently the team produces products (or product increments).

    Don’t get me wrong: I fully agree that a team should be measured by such success metrics! The team should probably even be motivated to achieve good success metrics by providing incentives like gamification mechanics or plain bonuses on their paycheck.

    But to measure productivity I would rather fall back to metrics like “number of features put into production per week” or “number of bugs fixed per week”. And to motivate the team to make those features and bug fixes high quality (instead of just quick and dirty just to be productive), I would combine them with a success metric like “low error rate” or “successfully finished business transactions” like those checkouts.

    However, I have no clue yet how to implement that combination of productivity and success metrics in a specific project. I will think on that a little longer… :). Thoughts anybody?

    • “Number of features put into production per week” and “number of bugs fixed per week” are both traditional metrics that have been tried for decades and fail for various reasons.

      If you look up “function points” you will find the most complex system ever devised for trying to metricize “features.” It is so complex that nobody can actually do it reliably, making it a poor metric. Less complex systems fail because “a feature” is not a well-defined object, and “features” all come in different sizes.

      Measuring the number of bugs fixed is problematic because people don’t track them well, different bugs are of different sizes, some bug fixes are more valuable to users than others, sometimes it’s more valuable to do feature work, people game the metric, etc. There can be reasons to keep track of the number of open issues and the rate at which they are getting handled, but it should not be used to measure developer productivity.

      If you really do want to know productivity, you have to define the word itself and measure that, and you will end up after much searching over many years with the blog post that I wrote above. I would guess that between me and others, I have been involved in hundreds of person-years spent on this problem, and what I wrote above is the best solution I’m aware of.

      -Max

  6. What would you do with such a metric? Start from any objective and I believe I can find something that will come closer to achieving that objective than a faux productivity metric.
