Commit History: Your Project's Only Real Documentation


It takes a good deal of hard work to keep your internal design documents up to date as your codebase evolves. It is even harder to enforce that practice within a large team. Sometimes people forget to update a wiki page after making a change to the code. Sometimes they skip the update on purpose because they believe no one will ever read those documents. And sometimes they are just being lazy.

To put things in perspective, I am talking here only about the project’s internal design documents that are used by the team to communicate implementation details and architectural decisions. I am NOT talking about documenting a public API.

A design document is outdated even before it is implemented.

Obsolescence, the process of becoming obsolete or outdated, is what happens to your design documents right after they are “done”. A design document is never finished. It should change as the implementation changes. But does it? The only artifacts that stand the test of time are your commit history and your test specifications. If crafted well, these two can serve as the most extensive, up-to-date documentation of your project's design decisions and business rules. In this blog post, I will focus on the first of them: the importance of a clean Git commit history.


I will use Git for the examples because it is the most popular source control system, but the principles discussed are fundamental and apply to any code management system you may use. As this is not a Git tutorial, I will assume that the audience has some basic knowledge of working with Git. If Git techniques such as cherry-picking, rebasing, and interactive rebasing, which are used to build the commit history that we want, do not ring any bells, you will get much more out of this post if you first get familiar with Git and then come back to it.

Craft your Git history just like you craft your code.

Good commit messages

Git commit messages are a means of communication within the team. Even if you work alone on a project, they are still a means of communication between your current and your future self. Write them with the utmost care. You write readable code because code is written once but read many times afterwards. The same holds true for Git commit messages: they are written once but read many times later on. You can easily see what the whole dev team is doing by simply browsing through the Git commit history.

A commit should ALWAYS have a message body

Always, always, always! Even for small commits, you can elaborate on why you are making that change. Do not try to explain what the code does. That should be clear from the code itself. If it is not, then re-work that code. Do not focus on WHAT, focus on WHY. For example, saying “Fix a minor style issue” is not helpful at all. Explain what the issue was and why it is fixed in that particular way. Why now? Don’t leave “Fix the build” commits in your project history. Re-write your commit history and squash them. Use fixup commits and autosquash to let Git do that work for you.
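A minimal sketch of the fixup and autosquash flow, run in a throwaway repository. The file name and commit messages are hypothetical, and the setup commits use -m only to keep the sketch short. GIT_SEQUENCE_EDITOR=true is there so the script runs unattended; in day-to-day work you would leave it out and review the todo list in your editor.

```shell
#!/bin/sh
# Sketch: fold a late "fix" into the commit it belongs to.
# Repository, file name and messages are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

echo "base" > app.txt
git add app.txt
git commit -q -m "Add application skeleton"

echo "feature" >> app.txt
git commit -q -a -m "Add retry to the export job"

# Later we notice a mistake that belongs to the previous commit.
# --fixup records a commit titled "fixup! Add retry to the export job".
echo "feature fix" >> app.txt
git commit -q -a --fixup=HEAD

# --autosquash reorders the todo list so the fixup is melded into
# its target. GIT_SEQUENCE_EDITOR=true accepts the list unchanged.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash HEAD~2

commit_count=$(git rev-list --count HEAD)
git log --oneline
```

After the rebase the history holds two commits again, and the "fixup!" commit no longer appears in it.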

Never consider writing a longer commit message a waste. Most probably you will be the one trying to get more context about a change several months later while debugging an issue, and you will thank your past self for spending the time on a good explanatory message. git blame with good messages will give you all the context you need to truly understand the problem and solve it quickly and cleanly. So in case you are using git commit -m, now may be the right time to troll your own shell and alias that into something that slaps you across the face.

A commit message should reveal the intent.

Keep the title short (where short is under 50 characters). That helps a lot when someone runs git log --oneline, as they can skim through small focused titles to quickly find what is needed. Once the commit is found by its title, they can simply git show that commit and get all the context they need about the change. Try to save your own and your colleagues' time when searching and reading through the commit history.
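As an illustration, here is a small sketch of a commit with a short title and a body that explains the why, followed by the two commands used to skim and then read it. The repository, file, and message text are made up for the example.

```shell
#!/bin/sh
# Sketch: short title, explanatory body, then skim and inspect.
# Repository, file and message text are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

echo "timeout=30" > settings.conf
git add settings.conf

# Title on the first line, a blank line, then the "why" in the body.
# "-F -" reads the message from standard input.
git commit -q -F - <<'EOF'
Raise export timeout to 30 seconds

Large customer exports regularly ran longer than the previous
limit and were cut off half way. Raising the limit is the smallest
change that unblocks them until exports are streamed.
EOF

# Skim: one line per commit, title only.
titles=$(git log --oneline)
echo "$titles"

# Read: full message plus a summary of the touched files.
git show --stat HEAD

title=$(git log -1 --format=%s)
```

The one-line log shows only the title, while git show prints the whole message, so the body can be as explanatory as it needs to be without cluttering the overview.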

A commit should be small and focused on one change only

The same principles apply as with writing good code: keep things small. Small is easy to understand, small fits in your head, small is simple. Just as you want small objects and functions that do one thing only, you want the same from commits. Keep them small and focused. One commit should describe one change only. Long commits are hard to manage (think of resolving conflicts). Long commits also fail to provide you with context when doing git blame, as you have to read through a whole lot and filter out only what you need. It will take you much longer to get into the context of a change if that commit contains numerous other changes as well. Long commits result in large pull requests, and those are a pain to review as well as a pain to merge.

It is a good practice to link a ticket or a card holding the client requirements to a pull request (or commit message in case you do not have a code review process in place). Anyone who needs more context than what he or she already found in the commit messages can follow that reference to get the full picture including comments in the ticket, discussions, and scoping documents attached to it.

You want to be as descriptive as possible in your commit messages, but if they are several pages long then nobody will read them. And even if somebody does, it will be a waste of their time to read through tons of text only to filter out the valuable information. Save your time. Save your teammates' time. Do not copy and paste the scoping documents or Jira/Trello card descriptions into commit messages. Simply refer to them, so that anyone who needs more context than what is already present in the commit message can go to that card or document and get it from there. Keep the commit message related to the small change it introduces. You don’t need to tell the whole story here. You can either link it or tell it in the pull request description. A long commit message is a smell. It points out that maybe too much is being done in a single commit.

But beware! A link to an external resource is not a substitute for a good commit message. Project tracking software can get replaced by another tool, and then all the links are broken. Anyone doing git blame on the codebase should be able to understand “why” a change was made without going to external sources. Only if that is not enough and a developer needs the whole picture about the feature should they have to follow external references to collect that information.

How to achieve a clean commit history?

First and foremost: learn Git. Pro Git is an excellent book on Git. There are many videos and tutorials out there. Even if you know the basics, go beyond that. Make sure interactive rebasing is something you do naturally, without effort. Always review your pull request's commit messages before merging into master. Re-write your commits with interactive rebasing and force push your feature branch as often as needed until you are completely happy with the story your commits tell about your changes. It should all read like a story. It will help reviewers, and it will help your team. The time taken is never wasted. It is a very good way of communicating and transferring knowledge within your team. Even team members who are not required as reviewers will understand what you are doing and, most importantly, why you are doing it.
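A rough sketch of the rewrite-and-force-push loop on a feature branch, using a local bare repository as a stand-in for the team remote. Branch, file, and message text are hypothetical. It amends only the latest commit to stay non-interactive; for older commits you would run git rebase -i and mark them as reword, squash, or fixup. It also uses --force-with-lease, a more cautious form of force push that refuses to overwrite remote work you have not seen; that choice is mine, not something the text above prescribes.

```shell
#!/bin/sh
# Sketch: rewrite an unclear commit on a feature branch, then update
# the already-pushed branch. All names and messages are hypothetical.
set -e
work=$(mktemp -d)
git init -q --bare "$work/remote.git"   # stand-in for the team remote
git init -q "$work/local"
cd "$work/local"
git config user.email "dev@example.com"
git config user.name "Dev"
git remote add origin "$work/remote.git"

echo "base" > notes.txt
git add notes.txt
git commit -q -m "Add notes file"

git checkout -q -b feature
echo "draft release steps" >> notes.txt
git commit -q -a -m "wip"
git push -q origin feature

# Replace the meaningless "wip" message with one that explains why.
git commit -q --amend -F - <<'EOF'
Document the draft release steps

The steps were only known to one person. Writing them down lets
anyone on the team cut a release.
EOF

# History was rewritten, so a plain push would be rejected.
# --force-with-lease overwrites the remote branch only if it still
# points at the commit we last saw there.
git push -q --force-with-lease origin feature

local_head=$(git rev-parse feature)
remote_head=$(git ls-remote origin refs/heads/feature | cut -f1)
git log --oneline
```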

Communicate as a remote company

Being part of a remote company teaches you that getting communication right is very important. Things need to be written down, as people do not share an office and cannot always communicate directly. You cannot just walk over to somebody, tap them on the shoulder, interrupt their work and ask why the heck they did that thing in that way. Pull requests and commit messages are a great way to communicate with your team and to keep everybody on the same page. So, even if you are not a remote company, you may still benefit from the good practices employed in one. Keep in mind that the person who wrote that code won’t always be around for you to tap on the shoulder and ask “why”. Keep in mind that the person who does not understand the code after a year could be you. And you will very much thank your past self for writing a good explanatory commit message.

An example: git add --patch

I put a lot of thought into what would make a good example of crafting a clean history, as one could easily write a book full of examples. Writing example commit messages may not be that beneficial or interesting. Yet another interactive rebase tutorial is not needed either. So instead I decided to focus on something that I see more and more developers getting wrong: splitting changes.

Splitting changes that reside in a single file

Sometimes people introduce two separate changes in a single file while absorbed in their work. Later on, when reviewing their commit messages and making sure the commit history is clean, they notice that a commit message describes two distinct changes. So they decide to split them up. So far so good. The issue arises when the changes to be split are in the same file. That feels unnatural and hard to fit in one’s head, as it would mean that one file has to be both staged and unstaged at the same time. How could that even be possible? Let’s see.

Once we are done with our changes we stage and then commit them. However, we may need a bit more control over staging, as plain git add can be coarser than what we need. git add --patch to the rescue. It presents you with an interface that asks how to deal with each change. Your changes are split into hunks, and Git asks you for each one whether to stage it or leave it behind. Type ? at the prompt and you will see all available commands with their descriptions.

The ones you will want to remember and use regularly are:

y = stage hunk
s = split hunk
n = leave unstaged

So far so good. You use those three commands to split your changes like a pro. But what if the lines that you want to split are too close together and Git does not know how to split them? For example, you have:

this line goes into commit #1
this line goes into commit #2

Git does not know how to split this “hunk”, as the lines are adjacent and to Git they look like a single hunk. At this point, you need a secret editor where you can manually tell Git, line by line, what goes into staging. Luckily Git provides us with such a secret editor. Simply hit e and you will enter edit mode, where you can “edit” what is staged and what is left unstaged line by line. In short: to keep a removal out of staging, replace the leading “-” of that line with a space, which turns it back into an unchanged context line; to keep an addition out of staging, delete the “+” line. Help is inlined at the bottom of that editor, so you don’t need to remember those rules, as I don’t.

# To remove '-' lines, make them ' ' lines (context).
# To remove '+' lines, delete them.

To see whether or not you have staged the right changes you can do: git diff --cached. If everything looks great then you can do the first commit from what you have staged. Proceed in the same way with the next set of changes. And please - don’t freak out if you see the same file both as staged and unstaged. That’s normal :)


Craft your commits as you craft your code. Keep them small, focused, and revealing intent. Your project is more than just the code. The code only tells you what the software is doing. But it is not telling you why the system is behaving in that way. It takes time and effort for a team to keep a clean project history. But I hope I’ve convinced you that this time is worth it.