Let's create a Git repository. On a command line, type:
>git init test
You should see something like:
Initialized empty Git repository in C:/Some/Path/test/.git/
It's interesting to note that the response refers to the .git directory as the repository. The parent directory, "test", is called the "working directory" in Git documentation (newer documentation calls it the "working tree"). It is tracked by Git, but it is not the repository; it's what the repository tracks and controls. By contrast, on a centralized (non-distributed) version control system, the repository would reside in an external location, typically on a server.
Here are the contents of the .git directory in a new, empty repository:
12/21/2015 06:06 AM 157 config
12/21/2015 06:06 AM 73 description
12/21/2015 06:06 AM 23 HEAD
12/21/2015 06:06 AM <DIR> hooks
12/21/2015 06:06 AM <DIR> info
12/21/2015 06:06 AM <DIR> objects
12/21/2015 06:06 AM <DIR> refs
Let's look at how these files and directories make Git work. The file HEAD contains text that shows what the current branch is:
>type .git\HEAD
ref: refs/heads/master
These 23 bytes of ASCII text are how Git knows what you are working on. The "refs/heads/master" file it points to is also a text file, or will be as soon as you make a commit. It contains 40 hexadecimal characters: the SHA-1 hash of the current commit.
When you create and switch to a new branch (for example, with git checkout -b), Git does two very simple things: (1) it creates a new file in "refs\heads\" named after your new branch, containing the 40-character hash of the current commit, and (2) it updates HEAD to "ref: refs/heads/<your-new-branch>". Contrast this with SVN or TFS, where creating a branch is a much heavier operation that creates a new directory hierarchy on the server. In Git, creating a branch just means writing a 41-byte file (the 40-character hash plus a newline), which is pretty fast.
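To make those two steps concrete, here is a minimal Python sketch that performs them on a throwaway directory laid out like .git. The helper names resolve_head and create_and_switch_branch are hypothetical, and the sketch assumes loose refs (no packed-refs file) and a branch name without slashes. Real Git also takes lock files and writes a reflog, so on an actual repository use git branch or git checkout -b instead.

```python
import os
import tempfile

def resolve_head(git_dir):
    """Return the commit hash HEAD points to (assumes loose refs, no packed-refs)."""
    with open(os.path.join(git_dir, "HEAD")) as f:
        head = f.read().strip()
    if head.startswith("ref: "):
        # Symbolic ref such as "ref: refs/heads/master": read the branch file.
        ref = head[len("ref: "):]
        with open(os.path.join(git_dir, *ref.split("/"))) as f:
            return f.read().strip()
    return head  # detached HEAD: the file holds the commit hash itself

def create_and_switch_branch(git_dir, name):
    """Hypothetical helper mimicking the two ref updates behind 'git checkout -b'."""
    commit = resolve_head(git_dir)
    # (1) refs/heads/<name>: 40 hex characters plus a newline = 41 bytes.
    with open(os.path.join(git_dir, "refs", "heads", name), "w", newline="\n") as f:
        f.write(commit + "\n")
    # (2) Point HEAD at the new branch.
    with open(os.path.join(git_dir, "HEAD"), "w", newline="\n") as f:
        f.write("ref: refs/heads/" + name + "\n")
    return commit

# Demo on a throwaway directory that only mimics the relevant parts of .git.
demo = tempfile.mkdtemp()
os.makedirs(os.path.join(demo, "refs", "heads"))
with open(os.path.join(demo, "HEAD"), "w", newline="\n") as f:
    f.write("ref: refs/heads/master\n")
with open(os.path.join(demo, "refs", "heads", "master"), "w", newline="\n") as f:
    f.write("914ebae549d6f4070184c7db9e1ddbaaf80e1d3b\n")
create_and_switch_branch(demo, "feature")
print(resolve_head(demo))
```

Checking the size of the new branch file should give 41 bytes, and HEAD now names the new branch while still resolving to the same commit.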
The real meat of the repository is the "objects" directory, where Git stores the objects it tracks. Let's put some things in there.
>copy con droid1.txt
R2-D2
^Z
>copy con droid2.txt
C-3PO
^Z
So now we have two files. If Git is going to manage these files, it needs a way to identify their contents, to refer to them, and to track changes. Git uses SHA-1 hashes to do this: they are fast to compute, and accidental collisions are so unlikely that, in practice, each hash identifies exactly one content.
You can see the SHA-1 hash Git would assign to any file (even one not tracked by Git) using git hash-object.
>git hash-object droid1.txt
668b9c33030c59db9c0f11f777029cc3fc0fdaf1
>git hash-object droid2.txt
27c825d7d5393f79c5b14cf0dd719e3dbb391c4e
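What hash-object computes is not the SHA-1 of the file alone: Git hashes a short header ("blob", a space, the size in bytes, and a NUL byte) followed by the contents. Here is a minimal Python sketch of that calculation using only hashlib; the function names are illustrative, not part of any Git API.

```python
import hashlib

def git_blob_hash(data: bytes) -> str:
    """SHA-1 of a blob as Git computes it: header 'blob <size>' + NUL + raw bytes."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()

def git_hash_file(path: str) -> str:
    """Roughly 'git hash-object --no-filters <path>': hash the file's exact bytes."""
    with open(path, "rb") as f:  # binary mode, so CR/LF line endings are preserved
        return git_blob_hash(f.read())

print(git_blob_hash(b""))  # the well-known hash of an empty blob
```

Because this hashes the raw bytes, it corresponds to git hash-object --no-filters. By default Git may first apply conversions such as core.autocrlf, so on Windows, where copy con leaves CRLF line endings, git_hash_file("droid1.txt") may or may not match the value shown above.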
This form of the command just prints the 40-character value, but you can also use the "-w" option to write the object into the "objects" directory:
>git hash-object droid1.txt -w
Git splits the hash into a two-character directory name and a 38-character file name, so you would now find a directory named ".git\objects\66" with a file named "8b9c33030c59db9c0f11f777029cc3fc0fdaf1". That file holds the contents, preceded by a very short header stating that the object is a "blob" and how many bytes long it is, all compressed with zlib. Of course, you normally would not use "hash-object" to do this; you would use "git add" or a GUI tool. But this is what Git does behind the scenes.
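The storage step can be sketched the same way. This simplified Python version (the helper names write_loose_object and read_loose_object are hypothetical) builds the header plus contents, names the file by its hash split 2/38, and zlib-compresses it; the reader reverses the process, which is roughly what git cat-file does for a loose object. Real Git additionally writes through a temporary file and makes objects read-only, and its compressed bytes may differ depending on the compression level.

```python
import hashlib
import os
import tempfile
import zlib

def write_loose_object(git_dir: str, data: bytes, obj_type: str = "blob") -> str:
    """Simplified 'git hash-object -w': store data as a zlib-compressed loose object."""
    store = b"%s %d\x00" % (obj_type.encode("ascii"), len(data)) + data
    sha = hashlib.sha1(store).hexdigest()
    # The first 2 hex characters name the directory, the remaining 38 the file.
    obj_dir = os.path.join(git_dir, "objects", sha[:2])
    obj_path = os.path.join(obj_dir, sha[2:])
    if not os.path.exists(obj_path):  # same content, same name: nothing to do
        os.makedirs(obj_dir, exist_ok=True)
        with open(obj_path, "wb") as f:
            f.write(zlib.compress(store))
    return sha

def read_loose_object(git_dir: str, sha: str):
    """Decompress a loose object and split off its header; returns (type, content)."""
    with open(os.path.join(git_dir, "objects", sha[:2], sha[2:]), "rb") as f:
        raw = zlib.decompress(f.read())
    header, _, content = raw.partition(b"\x00")
    obj_type, size = header.split(b" ")
    if int(size) != len(content):
        raise ValueError("corrupt object: size in header does not match content")
    return obj_type.decode("ascii"), content

demo = tempfile.mkdtemp()  # stand-in for a .git directory
sha = write_loose_object(demo, b"test content\n")
print(sha, read_loose_object(demo, sha))
```

Writing the same content twice is a no-op, since the file name is derived from the content itself.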
Let's go back to normal Git commands and add these files to Git.
>git add *
>git commit -m "These are not the droids you're looking for"
[master (root-commit) 914ebae] These are not the droids you're looking for
2 files changed, 2 insertions(+)
create mode 100644 droid1.txt
create mode 100644 droid2.txt
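Up to now, the hashes in this post should match yours, as long as your files contain exactly the same bytes (line endings included). That will not be true of this commit hash, because the commit records my name, email address, and the date and time. So Git uses one hash to guarantee the contents of a file, and a different hash to guarantee the commit record itself: which files it points to, who is recorded as adding them, and when. (The hash protects that record against tampering, but it does not prove identity; anyone can configure any name and email.)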
Let's take a look at the artifacts created by this commit.
>type .git\HEAD
ref: refs/heads/master
No change there. I am still on the "master" branch.
>type .git\refs\heads\master
914ebae549d6f4070184c7db9e1ddbaaf80e1d3b
Ah, now there is a hash value. A more user-friendly way to see it is the git log command:
>git log
commit 914ebae549d6f4070184c7db9e1ddbaaf80e1d3b
Author: dsolovay <dsolovay@gmail.com>
Date: Mon Dec 21 06:54:17 2015 -0500
These are not the droids you're looking for
But what does the commit contain? We can use another Git plumbing command, git cat-file, to inspect it:
>git cat-file -p 914e
tree 9b8c255bff7beb1440cc726ebe3346816dc04d67
author dsolovay <dsolovay@gmail.com> 1450698857 -0500
committer dsolovay <dsolovay@gmail.com> 1450698857 -0500
These are not the droids you're looking for
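The commit object is just the text above (with a blank line before the message), hashed with a "commit <size>" header in the same way as a blob. The simplified Python sketch below uses a hypothetical git_commit_hash helper and ignores signatures and other extra headers. It shows why the same files committed at a different moment get a different hash: changing only the timestamp changes the result.

```python
import hashlib

def git_commit_hash(tree, author, committer, message, parents=()):
    """SHA-1 of a commit object built from its text fields (simplified: no
    signatures or extra headers). author/committer look like
    'Name <email> <unix-seconds> <utc-offset>'."""
    lines = ["tree " + tree]
    lines += ["parent " + p for p in parents]  # a root commit has no parent line
    lines += ["author " + author, "committer " + committer]
    body = ("\n".join(lines) + "\n\n" + message).encode("utf-8")
    return hashlib.sha1(b"commit %d\x00" % len(body) + body).hexdigest()

tree = "9b8c255bff7beb1440cc726ebe3346816dc04d67"
msg = "These are not the droids you're looking for\n"
who = "dsolovay <dsolovay@gmail.com> 1450698857 -0500"
later = "dsolovay <dsolovay@gmail.com> 1450698858 -0500"  # one second later

print(git_commit_hash(tree, who, who, msg))
print(git_commit_hash(tree, later, later, msg))  # same files, different hash
```

Because the tree hash and any parent hashes are part of the text being hashed, a commit hash also pins down every file and all of the history behind it.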
You can see the value of this guarantee when you start, for example, a mongod instance: its startup output includes the hash of the Git commit the running version was built from. This shows (sorry) that this is the mongod you're looking for:
2015-12-21T07:12:33.466-0500 Hotfix KB2731284 or later update is not installed, will zero-out data files
2015-12-21T07:12:33.472-0500 [initandlisten] MongoDB starting : pid=8748 port=27017 dbpath=\data\db\ 64-bit host=DanSolovay-PC
2015-12-21T07:12:33.472-0500 [initandlisten] targetMinOS: Windows 7/Windows Server 2008 R2
2015-12-21T07:12:33.473-0500 [initandlisten] db version v2.6.11
2015-12-21T07:12:33.473-0500 [initandlisten] git version: d00c1735675c457f75a12d530bee85421f0c5548
And now let's inspect the tree that the commit referred to:
>git cat-file -p 9b8c
100644 blob 668b9c33030c59db9c0f11f777029cc3fc0fdaf1 droid1.txt
100644 blob 27c825d7d5393f79c5b14cf0dd719e3dbb391c4e droid2.txt
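Trees are hashed the same way, but their entries are binary: each one is the mode (100644 means a regular, non-executable file), a space, the file name, a NUL byte, and the blob hash as 20 raw bytes rather than 40 hex characters. Here is a minimal Python sketch with a hypothetical git_tree_hash helper that only handles a flat directory of regular files. If the encoding is right, the two entries listed above should hash to the tree ID 9b8c255bff7beb1440cc726ebe3346816dc04d67 that the commit referred to.

```python
import hashlib

def git_tree_hash(entries):
    """SHA-1 of a tree object built from (mode, name, blob_hex) triples.

    Simplified: assumes a flat directory of regular files, where sorting names
    bytewise matches Git's order (Git sorts subtrees as if named "name/").
    """
    body = b""
    for mode, name, sha in sorted(entries, key=lambda e: e[1].encode("utf-8")):
        # Each entry: "<mode> <name>", a NUL, then the hash as 20 raw bytes (not hex).
        body += b"%s %s\x00" % (mode.encode("ascii"), name.encode("utf-8"))
        body += bytes.fromhex(sha)
    return hashlib.sha1(b"tree %d\x00" % len(body) + body).hexdigest()

print(git_tree_hash([
    ("100644", "droid1.txt", "668b9c33030c59db9c0f11f777029cc3fc0fdaf1"),
    ("100644", "droid2.txt", "27c825d7d5393f79c5b14cf0dd719e3dbb391c4e"),
]))
```

Sorting inside the function means the tree hash depends only on which files are present, not on the order they were added.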
If you found this exploration entertaining, I highly recommend Chapter 10 of Pro Git, which builds a commit by hand. Note that the Ruby script at the end of section 2 of that chapter does not format correctly on the website, so if you want to walk through it, use the PDF version. And if you would like a deeper dive into the .git directory, this blog post is very useful.