This blog post aims to explain the “theory” behind version control (Git and Github) in plain English so that you understand the big picture of how software engineers work. No code. Nothing to download. No muss. No fuss. Just words and some not-so-pretty doodles drawn by yours truly.
I’ve always been amazed at the sheer number of tutorials there are to learn anything online. Git and Github are no different: there are plenty of great resources to get you started with either or both. Here are some of my favourites:
However, I’ve always found that these tutorials skip a lot of the theory and go straight into explaining how to use Git from the command line or via the Github desktop app. Frankly, if all you want is to understand what the hell the developers on your team are talking about, these tutorials go way more in-depth than needed. As stated above, my goal is simply to explain the big picture of version control and hopefully impart upon you how very cool it is.
Let’s start at the very beginning: Version Control
Image credit: weebletheringskite, Wordpress
Version control: learn it, love it, live it. Does pretty much what it says on the tin. Version Control is any system that allows you to understand the history of a file and how it has progressed. Back in my former life as a graphic designer I often had files that looked something like this:
Look familiar? The above system may not have been a good one, but it was definitely some sort of version control system. More sophisticated examples would be something like Google Docs’ “Revision History” or Photoshop’s “History” tool.
Git is a version control system specifically designed to work well with text files. Because ultimately, that’s all code really is: loads of text files hooked up together in some fashion. Git is a program you install which allows you to annotate the changes you make to create an easily navigable system history.
(PS: “Git” is also what happens when you let engineers name products. We’re sorry, marketing department.)
So what does Git do that simply saving a file normally doesn’t? At its very core, saving a file is a simplified version control system, but frankly it’s not a very good one because it only allows you to move forward in time. Sure, you could argue that the “undo” button lets you move backwards in the “life” of the file, but we all know the “undo” button has certain limitations, the most obvious being that the file’s past is usually lost the minute you close the file.
Plus, saving a file is very individualistic. It says nothing of the history of the system at large, only the history of that file. While at this point you may be thinking, “Well I’m not an engineer so I don’t need to be fussed about systems”, I’d like to take a moment to explain how a lot of things that you may not think of as “systems” are in fact exactly that.
Meet Sally. She’s an author writing the next big adventure fantasy book series. Sally has written the first book already, and has passed it on to her editor. Additionally, because she’s such an overachiever, she has also written the first three chapters of the second book whilst waiting for her editor’s feedback. Each book is stored in a separate Word file.
One merry day, Sally’s editor actually gets back to her about the first book. He’s worried that younger readers won’t want to read a series exclusively about Orcs, and asks her to introduce some Elves into the story. At this point, Sally sighs but soon realises that her new Elven characters will allow her to include some previously unplanned conflict and plot twists. She then does the following:
- Writes new characters and plot changes into the first book
- Off the back of that, makes the necessary modifications to the second book
- Realises that due to all the changes, she’ll need to introduce a certain location in the first book instead of the second
- Re-edits the first book to include the new location
Finally, she pushes her keyboard away, confident that she has properly incorporated the Elves into her magical world.
You see, Sally is actually dealing with a system. Her two books influence each other. Characters, locations, and plot points flow between the two. However, sadly, a month from now, there is nothing in her file system, Word’s “Document History” tool, or the post-its she has stuck around the edges of her screen that will tie all these changes together.
This is where Git shines. If Sally had been using Word combined with Git, she could’ve grouped all those changes into one nice neat little summary of “Introduction of Elves into the series”. She could’ve seen the changes made across pages, chapters, files, and books, allowing her to truly understand the impact of the Elves on her fantasy series. This “neat little summary” is what we, in Git-land, refer to as a commit.
Recap. Git is a program that allows you to annotate the history of a system (or group) of files via commits, which are “snapshots” of the differences made to the system at a given point in time.
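This post promised no code, so feel free to skip this bit. For readers who find a few lines easier to picture than a paragraph, here is a toy model in Python of what a commit is: one summary tied to changes across several files. It is only a sketch of the idea, not how Git works under the hood, and the file names and the `commit` helper are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    message: str   # the "neat little summary"
    changes: dict  # hypothetical file name -> what changed in that file


history = []  # oldest commit first, newest last


def commit(message, changes):
    """Record one annotated snapshot covering every file it touches."""
    snapshot = Commit(message, dict(changes))
    history.append(snapshot)
    return snapshot


commit("Finish first draft of book one", {
    "book-one.docx": "all chapters written",
})
commit("Introduction of Elves into the series", {
    "book-one.docx": "new Elven characters and a new location",
    "book-two.docx": "plot changes in the first three chapters",
})

# Print each summary alongside the files it touched.
for entry in history:
    print(entry.message, "->", sorted(entry.changes))
```

The point to notice is the second commit: a single message covers edits to both books, which is exactly what Sally's file system and post-its could not do.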
So if I’m Sally, my commit history might look something like this:
So far so good, but what happens if Sally has two computers? Good point. Enter Github. Not to be confused with merely Git, Github takes that lovely commit history and hosts it online so that you can access it from any computer. You do this by pushing commits from your local machine (i.e.: the computer you’re currently using) up to Github, and then pulling those commits down from the new/different computer.
Let’s take the image above as an example workflow for Sally. She starts her day working on her home desktop computer (on the left, orange) and writes a few chapters, does some editing, etc. Along the way she takes three “snapshots” (Git commits) of her work at strategic points.
In the afternoon, Sally often likes to take her laptop (on the right in the image above, blue) to a café and write there. Today is no different, so before she shuts down her home desktop computer, she makes sure to push her current Git commit history up online to Github. Once on Github, the commits live in a remote repository.
Let’s break down the geek-speak here: remote merely means online (vs local, which as we learned earlier means whichever computer you’re currently using). As for repository (often shortened to “repo”), it’s basically a folder with Git superpowers.
So, Github lets you store your work (which is annotated by Git commits) in a special online folder (repo). See? Simple.
It’s after lunch now and Sally pulls out her laptop at her local café. She obviously wants to pick up where she left off at home, so she grabs her latest work from her Github repository. This process of “grabbing her work from Github” is called pulling. Looking at the image above again, you’ll see that Sally pulled down the three commits she made at home.
Sally now has a perfectly up-to-date copy of her whole system (the text files containing her fantasy series) on her laptop and can proceed from where she left off. She writes some more and strategically “snapshots” (commits) her work two more times. Finally, Sally ends her day by pushing up those two commits to Github, so that she can access her latest work from her home desktop in the morning.
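Entirely optional, but Sally's day can be sketched as a toy model in Python, where each repository is nothing more than a list of commit messages. The messages and the `push` and `pull` helpers are invented for illustration; real Git and Github keep track of far more than this.

```python
def copy_missing(source, destination):
    """Append every commit that source has and destination lacks, in order."""
    for item in source:
        if item not in destination:
            destination.append(item)


def push(local, remote):
    """Send local commits up to the online repository."""
    copy_missing(local, remote)


def pull(local, remote):
    """Bring online commits down to the local machine."""
    copy_missing(remote, local)


desktop, laptop, github = [], [], []

# Morning at home: three commits, then push before shutting down.
desktop += ["Write chapter 4", "Edit chapter 2", "Write chapter 5"]
push(desktop, github)

# Afternoon at the café: pull, write, commit twice, push.
pull(laptop, github)
laptop += ["Write chapter 6", "Edit chapter 5"]
push(laptop, github)

# Next morning at home: pull the café work down.
pull(desktop, github)

print(desktop == laptop == github)  # all three copies now match
```

Nothing moves between the three lists until Sally explicitly pushes or pulls.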
OK, this all makes sense. But Sally, cool as she is, is only one person. How do engineering teams make sure their work doesn’t overlap?
In short, by making branches. Think of your Git commit history as a tree. The trunk is what we refer to as the master branch. In order for teams to avoid stepping on each other’s toes, they need to work in isolation from everyone else (on a feature branch), whilst ultimately still contributing back to the main codebase (the master branch).
Now, back to Sally. She’s joined a Fantasy Writers’ Guild, where everyone is collaborating to write a single book, “The Fantastic Dictionary of Fantasy Creatures”. Much like a textbook, the Dictionary will have multiple authors: her, Tom and Adam.
Let’s have a look at the online Github repo for “The Fantastic Dictionary of Fantasy Creatures”, as it stands now:
As you can see above, the tree analogy applies perfectly to how the Fantasy Writers’ Guild is collaborating on this project, with the history of the repo moving upwards along the master trunk. The general workflow starts with each author branching off master when they’re doing a piece of work (maybe writing a chapter, or editing one). When the rest of the co-authors have approved the changes, the branch gets merged into the master trunk (keep in mind, it’s the contents of the master branch which will ultimately form the published book).
When a branch gets merged, its changes are folded into master. Any edits it makes to existing content supersede what was there before (if someone else has changed the same passage in the meantime, Git flags a conflict for a human to sort out). And any new content will be added in too, of course. In essence, when a branch gets merged into master, its commits get added to the top of the master history.
However, you may be thinking, how does this tie in with the fact that people do their work locally and only then push their changes up to Github?
At this point, it’s important to remember that your remote repository on Github is a mirror of what you have locally. This means that on your computer you have a local Git repository for that project (i.e.: a folder configured to allow Git commits). Inside this local Git repo (again, a fancy term for a specific Git-enabled folder on your computer), you have all the files that are part of this project, in this case, “The Fantastic Dictionary of Fantasy Creatures”.
It works almost like Dropbox does: you have local folders, on different devices (your home computer, your office computer, etc) where you do your work and update your files. This ultimately syncs up to a location online. However, as we know, there are a couple of extra steps involved in the Git/Github workflow. You have to consciously decide to take a “snapshot” of your work as it stands at that moment in time (a commit) and furthermore, you have to deliberately choose to push that commit up to Github. Only then is the work “synced” to the online location (the Github repo).
So why not automate it? Why not make it like Dropbox and have the file update on Github as you update it locally? Well, many reasons. But mainly, because of bugs. In the world of software engineering, as in the world of publishing, you don’t always want to keep everything you write. Sometimes you want to experiment, and should your experiment fail, you want an easy way to go back to the last correctly working state. Which is why a good rule of thumb is that the minute something is working the way you want it to, commit it, before you try to edit it or experiment with a different approach. There’s no harm in committing small chunks of work very frequently, and in fact, many engineers pride themselves on doing exactly that.
Now back to “The Fantastic Dictionary of Fantasy Creatures”. Due to her in-depth knowledge of Orcs, Sally has been chosen to write that particular chapter. But because she doesn’t want to modify the book without her co-authors’ approval, she creates a local branch and starts writing and committing on that branch. She then pushes the local branch up to Github, where, true to form, the remote repo acts as a mirror and updates to show that Sally has created a branch containing certain commits in it (see image below).
As she continues working on the chapter, Sally writes more commits and pushes them up to the online mirror branch on Github. Eventually, she’s ready for Tom and Adam to review her work. So she creates a Pull Request, which is a Github feature that allows her to explain the changes that her branch makes to the master branch. It also provides an easy place where the co-authors can have a discussion about the branch’s content and request that Sally make any changes necessary before the branch becomes part of the master trunk.
After requesting a few changes, as can be seen above, Tom and Adam are happy with Sally’s branch and decide to merge her work into the master branch on Github. The moment they do this, Sally’s formerly isolated commits are added to the top of the master branch’s history:
At this point, Sally can move to (or “check out”) the master branch on her local machine and pull down the very commits that were previously isolated in a feature branch (the Orc Chapter branch). She’s now back at square one, with an updated master, and from here can proceed to create a new local branch for her next piece of work, helping poor Tom edit his chapter on Goblins. And so, the process would start all over again:
- Create local branch
- Write some commits within local branch
- Push to Github
- Create Pull Request explaining changes that branch contains
- Merge to master
- Pull down new master commits locally
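For anyone who would like to see those six steps laid out mechanically, here is an optional toy model in Python. Each repository is a dictionary of branch names mapped to lists of commit messages. The branch name `orc-chapter` and all the commit messages are invented, and this is a deliberately simplified picture rather than what Git and Github actually do internally.

```python
# Two copies of the project: Sally's machine and the repo on Github.
local = {"master": ["Add chapter on Elves"]}
remote = {"master": ["Add chapter on Elves"]}

# 1. Create local branch: it starts as a copy of master's history.
local["orc-chapter"] = list(local["master"])

# 2. Write some commits within the local branch.
local["orc-chapter"].append("Draft Orc chapter")
local["orc-chapter"].append("Add section on Orc diets")

# 3. Push to Github: the remote now mirrors the branch.
remote["orc-chapter"] = list(local["orc-chapter"])

# 4. Pull Request: show what the branch has that master lacks.
proposed = [c for c in remote["orc-chapter"] if c not in remote["master"]]
print("Proposed:", proposed)

# 5. Merge to master: the approved commits go on top of master's history.
remote["master"].extend(proposed)

# 6. Pull down the new master commits locally.
local["master"] = list(remote["master"])
print("Local master:", local["master"])
```

Until step 5, master is untouched on both sides, which is what lets Tom and Adam review the Orc chapter before it becomes part of the book.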
As you can see, this is a very smooth workflow, the perfect combination of working in isolation as well as collaboratively. Git, on your local machine, provides a brilliant way to create multiple versions of your work, via a rich annotated history, controlled and curated by you. Github, online, is a fantastic resource for not only storing and providing a clear visualisation of said history but also for collaboration and quality control.
In conclusion, I hope I’ve convinced you to try using Git and Github for any type of project in future. There's no reason why only engineers should benefit from such great tools. The world totally needs more Orcs, after all.
I’d like to thank Common Craft for inspiring the doodles and explanation style of this blog post. And also, for saving me from the horror of having to explain Twitter to my mum.
Version Control: any system that allows you to understand the history of a file and how it has progressed
Git: a version control program which allows you to annotate the changes you make to create an easily traversable system history
Commit: an annotated “snapshot” of the differences made to the system at a given point in time
Local: refers to the computer you’re working on this very minute
Remote: refers to an online location
Repository (repo): a special folder configured with Git superpowers containing all the files pertaining to your project/system
Github: takes your local commit history and hosts it remotely so that you can access it from any computer
Pushing: the action of taking local Git commits (and whatever work these encompass) and putting them online on Github
Pulling: the action of taking online Github commits and bringing them into your local machine
Master (branch): the “trunk” of the commit history “tree”; contains all approved content/code
Feature branch: an isolated location, based off of master, where you can write a new piece of work safely before reincorporating said changes back to master
Pull Request: a Github tool that allows users to easily see the changes (the difference or “diff”) that a feature branch is proposing as well as discuss any tweaks that said branch might require before it is merged into master
Merging: the action of taking the commits from a feature branch and adding them to the top of master’s history
Checking out: the action of moving from one branch to another