Legacy code retreat

Have you ever worked with code that literally brought tears to your eyes? Not in the good sense, mind you. I’m talking about code that is such a hassle to work with that it makes you rethink some of your career choices. If so, a legacy code retreat might be just what you need to overcome your fear of legacy code and start appreciating the opportunities for improvement it provides. Legacy code can be a joy to work with, if you tackle it the right way.

A couple of weeks ago I attended a legacy code retreat facilitated by Erik Talboom. This event has the same format as a regular code retreat, but the focus lies less on good software design and practices like Test-Driven Development (TDD) and more on specific techniques to safely work with legacy code. You, like me, may have heard wonderful things about clean code, unit testing and TDD and might be tempted to try out these techniques. Mastering these techniques is a daunting task on its own. On top of that, you’re probably working on an existing codebase that was not written with these things in mind, which makes the challenge even greater. In this case a legacy code retreat might be just what you need to turn the tide and finally get started.

Legacy Code?

How exactly do we define legacy code? Right now you might be picturing some ancient application that’s humming along on a mainframe in the basement of your office. Michael Feathers (who wrote the book on legacy code) states that all code that is not covered by automated tests can be considered legacy code. Record scratch. That’s quite the bold statement, but there is some truth in it. I for one feel a whole lot more comfortable when I’m working with code that is properly covered by automated checks to make sure I don’t break any existing behaviour. Other people have a more relaxed definition of the term, namely: profitable code that we’re afraid to change. So what happens when we are asked to fix a bug or implement a new feature in profitable code that we’re afraid to change? That’s where these legacy code rescue techniques come into play. Instead of learning these techniques on the job, you can use a legacy code retreat as the ideal training ground to practice them.

Before we dive into the nitty-gritty details of how to tango with legacy code, I’ll issue a reminder that I keep on my monitor every time I work with legacy code: always keep in mind that everyone is trying their very best given their current context, knowledge and time constraints. Bob wasn’t trying to screw you over when he wrote the code you’re fighting with today. Alice was under a lot of pressure when she pushed that quick hack to production. Instead of yelling out “What idiot wrote this piece of crap?!”, try to remember we’re all still learning. We’ve all caved under the pressure to deliver and set aside our disciplines at some point in time.

Outline of the day

A legacy code retreat differs from a standard code retreat. The focus lies less on test-driving new code based on the four rules of simple design and more on retrofitting characterization tests onto a tangled codebase (J.B. Rainsberger’s horrific trivia codebase, which was written specifically for this purpose) and then safely refactoring it. Unlike a regular retreat, a legacy code retreat does not impose the typical constraints on sessions. Rather, each session focuses on one of the techniques for working with legacy code, which we’ll discuss below. After each of the sessions everyone participates in a small retrospective. At the end of the day there’s a larger retrospective to summarize everything we’ve learned and to discuss what we’ll be taking with us to our regular jobs the next day.

The day started off with a session that was new for me: code reading. We split up into pairs and spent the first half hour just reading the code. One person acted as the reader and talked the other through his train of thought as he tried to make sense of the code. The other half of the pair listened intently and took notes. I found this session particularly enlightening. It was the first time I saw another developer trying to make sense of a completely new codebase. Some people started at the program’s Main method, others randomly started opening classes to try and make sense of it all. This reminded me of the times I joined an ongoing project and had to hit the ground running by fixing some low-impact bugs. Another thing that struck me as odd was that no one actually tried to build or even run the code. Running the code is how I generally get started on an unknown codebase that has zero test coverage and isn’t immediately suited for characterization tests: I just randomly click buttons (preferably following some manual regression testing scenarios) and try to follow the entire application flow through the debugger.

The next sessions were more traditional. We added golden master tests as black box end-to-end tests before attempting to refactor the codebase. A golden master is a set of outputs (like log statements or print-outs of system state) generated by random but reproducible stimuli to the system. Whenever we change the system we can re-run our golden master tests, which compare the current output against the previously approved “golden master” results to make sure we didn’t break anything while refactoring. We used ApprovalTests to quickly generate a large number of human-readable golden masters and to allow for automated verification against them. We also checked code coverage with NCover to verify that our golden masters provided a large enough safety net.
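To make the idea concrete, here is a minimal hand-rolled sketch of a golden master in Python. It does not use ApprovalTests, and `play_game` is a hypothetical stand-in for the legacy system, not the trivia codebase from the retreat. The seeded random generator supplies the "random but reproducible" stimuli.

```python
import contextlib
import io
import random


def play_game(rng):
    """Hypothetical stand-in for the legacy system: prints its state as it runs."""
    position = 0
    for _ in range(5):
        roll = rng.randint(1, 6)
        position = (position + roll) % 12
        print(f"rolled {roll}, now at {position}")


def capture_run(seed):
    """Run the system with a seeded generator and return everything it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        play_game(random.Random(seed))
    return buffer.getvalue()


def record_golden_master(seeds):
    """Record the output for every seed once, before any refactoring."""
    return {seed: capture_run(seed) for seed in seeds}


def verify_against_golden_master(master):
    """Return the seeds whose current output differs from the recorded output."""
    return [seed for seed, expected in master.items() if capture_run(seed) != expected]
```

In this sketch the master lives in memory. In practice you would probably write it to a file that is checked in next to the tests, record it once, and then call `verify_against_golden_master` after every refactoring step. An empty list means the observable behaviour did not change for any of the recorded seeds.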

Why go through the trouble of writing these black box end-to-end tests? As Hamlet D’Arcy put it bluntly: refactoring without tests is just changing sh*t. Simply put: automated tests are a precondition for refactoring safely. I myself prefer fine-grained unit tests, which provide much more valuable feedback when things break, over these high-level integrated tests. But as you may have experienced yourself: getting legacy code unit tested sometimes requires invasive changes to the code. There’s always a chicken-and-egg moment when covering a system with tests: you probably have to change the production code in some way to allow for repeatable automated tests, but ideally you would like a safety net of tests before you start touching the code. The golden master technique allows you to get your system under test without performing too much invasive surgery on the production code. This safety net allows you to make more changes later on that allow the code to be properly unit tested. In most cases, you’ll have to break a dependency that makes the current code impossible to unit test, like accessing a real database or an external web service. I like to apply Michael Feathers’ advice here: try to minimize unsafe “pre-factorings” in order to introduce the basic seams that allow for unit testing.

After we had our safety net of golden masters in place we started refactoring the codebase by simply replacing magic numbers with constants and extracting code into explanatory variables. We moved over to extracting methods and pure functions and other techniques to better reveal the intent of the code. It’s amazing how a few well-chosen names can increase a program’s understandability.
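As a rough illustration of these small refactorings, the sketch below shows a before and after version of the same logic in Python. The snippet is hypothetical and only loosely inspired by a board game; the names `BOARD_SIZE`, `next_position` and `stays_in_penalty_box` are made up for this example.

```python
# Before: magic numbers and an unexplained condition (hypothetical legacy snippet).
def move_before(position, roll, in_penalty_box):
    if in_penalty_box and roll % 2 == 0:
        return position
    position = position + roll
    if position > 11:
        position = position - 12
    return position


# After: a named constant, explaining variables and an extracted pure function.
BOARD_SIZE = 12


def next_position(position, roll):
    """Pure function: the result depends only on its arguments."""
    return (position + roll) % BOARD_SIZE


def move_after(position, roll, in_penalty_box):
    roll_is_even = roll % 2 == 0
    stays_in_penalty_box = in_penalty_box and roll_is_even
    if stays_in_penalty_box:
        return position
    return next_position(position, roll)
```

Both versions behave identically for positions on the board and rolls of a single die, which is exactly what a golden master should confirm after each step.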

We even had a session where we combined “TDD as if you meant it” with the extract pure function refactoring. This was an interesting exercise, but we just ended up rewriting part of the codebase, this time with proper unit test coverage.

We did not touch on some legacy-code classics such as extract-and-override to isolate a dependency that makes a class hard to test, a technique I rely upon heavily as an intermediary refactoring step when dealing with legacy code at my current project. Instead, we went directly for full-on dependency inversion through constructor injection. Whenever the refactoring to constructor injection seems too big a step, I like to take a smaller step with extract-and-override.
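The difference between the two step sizes can be sketched as follows. The `InvoiceReminder` classes are hypothetical and stand in for any class with a hard-wired dependency, here the system clock.

```python
import datetime


class InvoiceReminder:
    """Hypothetical legacy class that depends directly on the system clock."""

    def is_overdue(self, due_date):
        return self.today() > due_date

    def today(self):
        # Extracted method: the only place that touches the real clock.
        return datetime.date.today()


class FixedDateInvoiceReminder(InvoiceReminder):
    """Subclass for tests that overrides the extracted method with a fixed date."""

    def __init__(self, fixed_today):
        self._fixed_today = fixed_today

    def today(self):
        return self._fixed_today


class InjectedInvoiceReminder:
    """The larger step: the clock is handed in through the constructor."""

    def __init__(self, clock=datetime.date.today):
        self._clock = clock

    def is_overdue(self, due_date):
        return self._clock() > due_date
```

Extract-and-override only adds one method to the production class and leaves every existing caller alone, which is why it works well as an intermediate step. Constructor injection changes how the object is built, so every place that creates it may need attention, although the default argument in this sketch softens that.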

So, what’s in it for me?

Is it worth sacrificing an entire Saturday to refactor an unknown tangled codebase with total strangers? Below I’ll list some points that, for me personally, make a legacy code retreat worth the investment.

You’ll get to practice the basic techniques to handle legacy code in a safe and welcoming environment. You will get to learn about and effectively practice techniques like golden master, introduce explaining variable, extract pure function, extract-and-override and dependency inversion. There’s no looming deadline or other external pressure during a code retreat, so you can focus on the techniques themselves and perfect the motions. Next time you encounter such a situation during your day job, you just have to go through those motions again.

You’ll pair and learn with complete strangers. It’s always fascinating to work with other developers. Perhaps you already practice pair programming in your day-to-day team. That’s great, but I can guarantee that you’ll still learn something new when you pair with a complete stranger. If you don’t practice pair programming at work, prepare to have your mind blown. It’s a really fun way to crank out quality code and it’s one of the fastest ways to share knowledge that I have experienced. It might also be a very humbling experience for some people. The first time I saw someone that really knew his IDE inside-out write code, I was awestruck and practiced on code katas for months on end to become just as proficient with mine.

You’ll get to meet passionate developers. Just like you, all these people are sacrificing their Saturday to practice writing code. If that doesn’t say something about their motivation, I don’t know what will. These are the people whose brains you’ll want to pick about how to write code properly.

It’s free. You’ll only be paying for lunch and drinks afterward. That’s hands-down the cheapest technical training I have ever attended.

Takeaways

Below I list some of the most interesting points we discussed at this legacy code retreat.

1. Know your tools.

I had never heard of tools like ApprovalTests, NCover or NCrunch before my first legacy code retreat. Just knowing what these tools can do for you might be enough to finally get some nasty legacy code that keeps sprouting new bugs under test.

It’s not just important to know these tools exist. You have to be efficient with them. I intentionally practice with these tools on code katas so that when I recognize a problem on the job, I can quickly fall back on something I’ve worked with before.

Even something as simple as mastering your IDE’s shortcuts can take you a long way. The moment you don’t need your mouse when writing code your brain starts to become the new bottleneck. That’s a good thing.

2. Take small steps.

This is sound advice in the whole field of software development, but doubly so when working with legacy code. I can’t keep track of all the times I started making a “tiny change”, only to notice several hours later (and without being able to compile) that I had made a mistake somewhere in the process and had no option but to throw away all that work or start debugging my way out of the problem. Micro-committing is key here. A distributed version control system like git can also help here to avoid cluttering the change history, as will a suite of lightning-fast automated tests that keeps your feedback cycles nice and short.

3. (Temporary) duplication is your friend.

When working with particularly nasty legacy code, duplication is your friend. Many developers start incessantly DRY-ing up code the moment duplication arises. When working with legacy code, this might be a sub-optimal approach.

As an example, imagine the following situation: you need to add an argument on a method that’s being called many, many times in the codebase. Rather than performing the automatic “add parameter”-refactoring your IDE supports, consider just copy-pasting the original method and creating an overloaded method. This way, you avoid having to migrate all clients of this method at the same time before being able to even compile. Now you can move one client at a time over to the new method. Also, you are able to compile in between and run your tests to make sure you haven’t broken anything. This helps keep your steps small. Once all clients have been migrated you can safely remove the old method.
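A minimal sketch of this copy-then-migrate approach is shown below. Python has no method overloading, so the copy gets a second name (`shipping_cost_with_surcharge`); in a language like C# or Java it could be a true overload with the same name. The class and both callers are hypothetical.

```python
class ShippingCalculator:
    """Hypothetical class with a method that is called all over the codebase."""

    def shipping_cost(self, weight_kg):
        # Original method: left untouched so every existing caller keeps working.
        return 5 + 2 * weight_kg

    def shipping_cost_with_surcharge(self, weight_kg, surcharge):
        # Copy of the original with the new argument added.
        return 5 + 2 * weight_kg + surcharge


def legacy_caller(calculator):
    # Not migrated yet: still uses the old method.
    return calculator.shipping_cost(3)


def migrated_caller(calculator):
    # Migrated: passes the new argument explicitly.
    return calculator.shipping_cost_with_surcharge(3, 0)
```

The code compiles and the tests can run after each individual caller is migrated. Once nothing calls `shipping_cost` any more, it can be deleted and the new method can take over its name.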

Fun fact: if you pick up your dusty copy of Fowler’s book on refactoring, many of the refactorings described utilize this copy-paste-then-migrate-clients-one-by-one strategy. We’re not inventing anything new here, but it’s easy to forget the exact mechanics if you have tools that do all the heavy lifting for you. It’s nice to take a step back once in a while and remember how things work under the hood.

4. Tension between refactoring freedom and test feedback.

Adding golden master tests is a pretty safe step. The changes required to the production codebase are minimal compared to getting a piece of code properly isolated from external dependencies to get it under unit test coverage. Golden masters are however slow as they exercise the entire system and provide little feedback to pinpoint exactly where a bug was introduced when the tests fail.

Isolated unit tests on the other hand provide very detailed feedback when they fail and run blazingly fast. When a piece of code was not written with testability in mind however, it might require Herculean effort to get it under test.

Since unit tests are finer-grained, they are more tightly coupled to internal implementation details of the system than larger-scale tests like golden masters. As a result, you have to fix or change unit tests every once in a while when performing large-scale refactorings.

A lot has been written about this tension between feedback and refactoring freedom. Proponents of all possible combinations can be found (only unit tests, only tests at the public API, a combination of both). Personally, I still haven’t found the sweet spot on my projects, but I prefer to go for isolated unit tests whenever I have the option.

5. Test-first is easier than test-after.

Getting legacy code under test requires a very specific skillset and is more difficult than writing tests before you churn out new code. Writing the tests first makes the code testable by design, which in turn tends to highlight design problems very early in the development process. If you haven’t given test-first programming a shot, I strongly recommend that you try it out on your next pet project. If it turns out not to be your cup of tea, at least you can say that you gave it a shot.

6. Learning TDD & unit testing takes time but is well worth the investment.

99% of all developers I talk to think that yes, unit testing sounds like a good idea, but no, they don’t really have the time for that on their current job. There’s a little graph I like to draw in my head every time I get this response:

Drowning in technical debt versus a virtuous cycle of continuous improvement

Those developers are absolutely right. It’s yet another thing to learn next to a new hot language like F# or a new architectural style like event sourcing. But time and time again I’ve seen the delivery of features grind to a screeching halt when a project grows to a certain size and technical debt starts to pile up. It becomes nearly impossible to introduce a new feature in the existing design without weeks of (re)work and a guarantee that you will break some existing functionality you won’t even know about until some of your users start complaining. It might take months to be able to practice these techniques proficiently, or even years to master them. In the beginning, you will be less productive than before. Once I struggled through this initial hurdle, however, I did start feeling the benefits. These techniques really start paying for themselves on a long-term project. And let’s be honest, what enterprise software today is written with an expected lifetime of less than a month? Less than a year even? Moreover, these techniques are universal. Once you know how to design clean code or how to write a proper unit test, you carry this knowledge with you wherever you go. Pay once, benefit forever!

Conclusion

When armed with the right tools and skills, legacy code can be quite fun to work with. A legacy code retreat provides the perfect opportunity to learn how to work with legacy code. Better yet, instead of adding to the problem, limit the amount of legacy code you write yourself. Instead of writing legacy code, aim to leave a legacy you can be proud of.

See you at the next legacy code retreat?

Featured image by threepanelsoul.
