Commit History: Your Project’s Only Real Documentation

In today’s fast-paced world of software development, more and more teams work in a lean and agile mode. Maintaining documentation of low-level designs is often considered waste under those methodologies. But even if your team does not follow a strict development process, keeping design documents up to date with your code base is hard and tedious work. It is hard to force a team of 50 people to update a wiki page every time they change the code. And what is the value of that? Barely anyone reads those documents even when they exist. There are much more pleasant ways to onboard new people or to learn a new system. We can do pair programming, ask questions in Slack channels, chat with a co-worker over a beer, or read the source code. Writing some new tests can also be a much more enjoyable experience than reading through tons of outdated design documents.

A design document is outdated even before it is implemented.

In short: the only artefacts that stand the test of time are your Git commit history and your test suite. If crafted well, these can serve as the most extensive, up-to-date documentation of your project’s design decisions and business use cases. In this blog post I will focus on the first aspect - the importance of a clean Git commit history.

As this is not a Git tutorial, I assume that the audience has some basic knowledge of working with Git. If the techniques used to build the commit history we want, such as cherry-picking, rebasing and interactive rebasing, do not ring any bells with you, it would be much more beneficial to get familiar with Git first and then revisit this blog post to get the most out of it.

Craft your Git history just like you craft your code.

Git commit messages are a means of communication within the team. But even if you work alone on a project, they are still a means of communication between your current and your future self. Write them with the utmost care. You write readable code because code is written once but read many times afterwards. The same holds true for Git commit messages: they are written once but read many times later on. You can easily see what the whole dev team is doing by simply browsing through the Git commit history.

What is a good Git commit message?

A commit should ALWAYS have a message body

Always, always, always! Even for small commits you can elaborate on why you are making the change. Do not try to explain what the code does. That should be clear from the code itself. If it is not, rework that code. Do not focus on WHAT, focus on WHY. For example, saying “Fix a minor style issue” is not helpful at all. Explain what the issue was and why it is fixed in that particular way. Why now?

Don’t leave “Fix the build” commits in your project history. Rewrite your commit history and squash them. Use fixup commits and autosquash to delegate the heavy lifting to Git.
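As a rough sketch of that workflow, the commands below build a throwaway repository, record a late fix as a fixup commit and let autosquash fold it into the commit it belongs to. The file names, commit titles and the temporary directory are made up for illustration, and GIT_SEQUENCE_EDITOR=true is only there to accept the generated rebase plan without opening an editor.

```shell
set -e

# Throwaway repository so the example does not touch any real project.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

echo "skeleton" > parser.txt
git add parser.txt
git commit -q -m "Add parser skeleton"

echo "parse numbers" >> parser.txt
git commit -q -am "Parse numeric literals"
target=$(git rev-parse HEAD)   # the commit we will need to fix later

echo "docs" > README.txt
git add README.txt
git commit -q -m "Document parser usage"

# A late fix that logically belongs to "Parse numeric literals".
echo "handle empty input" >> parser.txt
git commit -q -a --fixup="$target"

# Autosquash moves the fixup next to its target and melds the two.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash HEAD~3

git --no-pager log --oneline
```

After the rebase the history should contain three commits and no “fixup!” entry: the fix now lives inside the commit it corrects.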

Never consider writing a longer commit message a waste. Most probably you will be the one trying to get more context about a change several months later while debugging an issue, and you will thank your past self a lot for spending the time on a good explanatory message. Git blame with good messages will give you all the context you need to truly understand the problem and solve it quickly and cleanly. So if you are using git commit -m, now may be the right time to troll your own shell and alias that into something that slaps you :)

A commit message should reveal the intent.

Keep the title short (where short is under 50 characters). That helps a lot when someone runs git log --oneline, as they can skim through small, focused messages to quickly find what is needed. Once the commit is found by title, they can simply git show that commit and get all the context they need about the change. Try to save your own and your colleagues’ time when searching and reading through the commit history.
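To make the title and body split concrete, here is a minimal sketch in a throwaway repository. The file, the title and the body text are invented for the example; the first -m becomes the title and the second -m becomes the body.

```shell
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

echo "retries=3" > upload.conf
git add upload.conf

# Short title (well under 50 characters), WHY in the body.
git commit -q -m "Retry failed uploads three times" \
  -m "Uploads over flaky mobile connections often fail on the first attempt. Retrying three times hides most of those failures from the user without hammering the server."

# Skim the titles only.
git --no-pager log --oneline

# Read the full context of the commit that looks relevant.
git --no-pager show HEAD
```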

A commit should be small and focused on one change only

Again - the same principles as with writing good code: keep things small. Small is good, small fits in your head, small is simple. Just as you want small objects and functions that do one thing only, you should want the same from commits. Keep them small and focused. One commit should describe one change only. Long commits are hard to manage (rebase, merge, resolve conflicts). Long commits also fail to provide you with context when doing git blame, as you have to read through a whole lot and filter out only what you need. It will take you much longer to get into the context of a change if the commit message covers numerous other changes as well. Long commits result in large pull requests, and those are a pain to review as well as a pain to merge.

Include a link to the requirements document

It is good practice to link a ticket or a card holding the client requirements to a pull request (or to the commit message if you do not have a code review process in place). Anyone who needs more context than what they already found in the commit messages can follow that reference to get the full picture, including comments in the ticket, discussions and scoping documents attached to it.

You want to be as descriptive as possible in your commit messages, but if they are several pages long then nobody will read them. And even if somebody does, it will be a waste of their time to read through tons of text only to filter out the valuable information. Save your time. Save your teammates’ time. Do not copy and paste the scoping documents or Jira/Trello card descriptions into commit messages. Simply refer to them, so that anyone who needs more context than what is already present in the commit message can go to that card or document and get it from there. Keep the commit message related to the small change it introduces. You don’t need to tell the whole story here. You can either link it or tell it in the pull request description. A long commit message is a smell. It suggests that maybe too much is being done in a single commit.

But beware! A link to an external resource is not a substitute for a good commit message. Project tracking software may get replaced by another tool, and then all the links break. Anyone doing git blame on the code base should be able to understand “why” a change was made without going to external sources. Only if that is not enough, and a developer needs the whole picture about the feature, should they have to follow external references to collect that information.

Stay concise when writing a commit message.

How to achieve a clean Git commit history?

First and foremost - learn Git. Pro Git is an excellent book on Git. There are many videos and tutorials out there. Even if you know the basics, go beyond that. Make sure interactive rebasing is something you do naturally and effortlessly. Always review your pull request commit messages before merging into master. Rewrite your commits with interactive rebasing and force push your feature branch as often as needed until you are completely happy with the story your commits tell about your changes. It should all read like a story. It will help reviewers, it will help your team. The time taken is never wasted. It is a very good means of communication and knowledge transfer within your team. Even team members who are not required as reviewers will understand what you are doing and, most importantly, why you are doing it.
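One possible shape of that rewrite-and-force-push loop is sketched below, under a few assumptions: the “remote” is a local bare repository in a temporary directory, the branch and file names are invented, and git commit --amend stands in for a full interactive rebase so the example runs without an editor. The sketch uses --force-with-lease rather than a plain force push; that is my choice, not something this post prescribes, and it makes Git refuse the push if the remote branch has moved since you last saw it.

```shell
set -e

work=$(mktemp -d)

# A local bare repository plays the role of the shared remote.
git init -q --bare "$work/remote.git"

git init -q "$work/clone"
cd "$work/clone"
git config user.email "dev@example.com"
git config user.name "Dev"
git remote add origin "$work/remote.git"

git checkout -q -b feature
echo "1.0: first public release" > CHANGELOG.txt
git add CHANGELOG.txt
git commit -q -m "wip"
git push -q -u origin feature

# Rewrite the unhelpful message before anyone merges the branch.
git commit -q --amend -m "Add changelog for the 1.0 release" \
  -m "Users asked what changed between builds, so we start tracking it."

# The branch history changed, so a normal push would be rejected.
git push -q --force-with-lease origin feature

git --no-pager log --oneline
```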

Communicate as if you are a remote company

Being part of a remote company teaches you that getting communication right is very important. Things need to be written down, as people do not share an office and cannot always communicate directly. You cannot just walk up to somebody, tap them on the shoulder, interrupt their work and ask why the heck they did that thing in that way. Pull requests and commit messages are a great way to communicate with your team and to keep everybody on the same page. So even if you are not a remote company, you may still benefit from the good practices employed in one. Keep in mind that the person who wrote that code won’t always be nearby for you to tap on the shoulder and ask “why”. Keep in mind that the person who does not understand the code a year from now could be you. And you will very much thank your past self for writing a good explanatory commit message.

An example: git add --patch

I spent a lot of time thinking about what would make a good example of crafting clean history, as one could easily write a book full of examples. Writing example commit messages may not be that beneficial or interesting. Yet another interactive rebase tutorial is not needed either. So instead I decided to focus on something that I see more and more developers getting wrong - splitting changes.

Splitting changes that reside in a single file

Sometimes people introduce two separate changes in a single file while absorbed in their work. Later on, when reviewing their commit messages and making sure the commit history is clean, they notice that one commit message describes two distinct changes. So they decide to split them up. So far so good. The issue arises when the changes to be split are in the same file. That feels unnatural and hard to fit in one’s head, as it would mean that one file has to be both staged and unstaged at the same time. How could that even be possible? Let’s see.

Once we are done with our changes, we stage and then commit them. However, we may need a bit more control over staging, as git add can be coarser than what we need. Git add --patch to the rescue. It presents you with an interface that asks how to deal with each change. Your changes are split into hunks, and Git prompts you for what to add and what to leave behind. Hit ? at this point and you will see all available commands with their descriptions.

The most common ones, which you will want to remember and use regularly, are:

  • y => stage hunk
  • s => split hunk
  • n => leave unstaged

So far so good. You use those three commands to split your changes like a pro. But what if the lines that you want to split are too close together and Git does not know how to split them? For example, you have:

this line goes into commit #1
this line goes into commit #2

Git does not know how to split this “hunk” as the lines are adjacent and to Git they look like a single hunk. At this point you need something like a secret editor where you can tell Git manually, line by line, what goes into staging. Luckily, Git provides us with such a secret editor. Simply hit e and you will enter edit mode, where you can “edit” what is staged and what is unstaged line by line. In short: replacing the leading “-” of a line with a space keeps that line, i.e. its deletion won’t be staged, while deleting a “+” line leaves that addition out, i.e. it won’t be staged. Help is inlined at the bottom of that editor, so you don’t need to remember those rules. I don’t, and I could have messed them up in here :)

# To remove '-' lines, make them ' ' lines (context).
# To remove '+' lines, delete them.

To see whether you have staged the right changes, run git diff --cached. If everything looks good, make the first commit from what you have staged. Proceed in the same way with the next set of changes. And please - don’t freak out if you see the same file listed as both staged and unstaged. That’s normal :)

Remember

Your project is more than just the code. The code only tells you what the software is doing. It does not tell you why the system behaves that way. It takes a team time and effort to keep a clean project history. But I hope I’ve convinced you that this time is worth it.

Takeaway

Craft your commits as you craft your code. Keep them small, focused and revealing intent.
