Git from the Bottom Up – Blobs and Trees - Coding Blocks

Git from the Bottom Up – Blobs and Trees

August 14, 2022 • 102 mins

It’s surprising how little we know about Git as we continue to dive into Git from the Bottom Up, while Michael confuses himself, Joe has low standards, and Allen tells a joke.

The full show notes for this episode are available at https://www.codingblocks.net/episode191.

News

Thanks for all the great feedback on the last episode and for sticking with us!

Directory Content Tracking

Put simply, Git just keeps a snapshot of a directory’s contents.
Git represents your file contents in blobs (binary large objects), which are leaf nodes in a structure similar to a Unix directory, called a tree.
- A blob is named by the SHA-1 hash of its size and contents.
  - Because the ID is derived from the contents, a blob with a given ID can never have different contents.
  - The same contents will ALWAYS be represented by the same blob no matter where it appears, be it across commits, repositories, or even the Internet.
  - If multiple trees reference the same blob, it works much like a hard link to that blob.
  - As long as there’s one link to a blob, it will continue to exist in the repository.
A blob stores no metadata about its content.
- Metadata such as the file name is kept in the tree that contains the blob.
- Interesting tidbit about this: you could have any number of files that are all named differently but have the same content and size and they’d all point to the same blob.
  - For example, even if one file were named abc.txt and another was named passwords.bin in separate directories, they’d point to the same blob.
- This allows for compact storage.
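To make the naming rule concrete, here is a minimal Python sketch of how a blob ID can be computed, assuming a repository that uses Git's default SHA-1 object format. The function name `git_blob_id` and the two file paths are made up for illustration; the header layout (`blob <size>\0` followed by the raw bytes) is what Git hashes.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 ID Git would assign to a blob holding `content`."""
    # Git hashes a short header ("blob <size>\0") followed by the raw bytes.
    # The file name is not part of the input at all.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Two hypothetical files with different names but identical bytes.
files = {
    "docs/abc.txt": b"hello world\n",
    "secrets/passwords.bin": b"hello world\n",
}

ids = {name: git_blob_id(data) for name, data in files.items()}
for name, blob_id in ids.items():
    print(blob_id, name)
```

Both lines print the same ID, which is why identical content is only ever stored once.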

Introducing the Blob

This is worth following along and trying out.

The author creates a file and then calculates the ID of the file using git hash-object filename.
- If you were to do the same thing on your system, assuming you used the same content as the author, you’d get the same hash ID, even if you named the file differently than they did.
git cat-file -t hashID will show you the Git type of the object, which should be blob.
git cat-file blob hashID will show you the contents of the file.
The commands above are looking at the data at the blob level, not even taking into account which commit contained it, or which tree it was in.
Git is all about blob management, as the blob is the fundamental data unit in Git.
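The two `git cat-file` commands above can be imitated without Git at all. The sketch below assumes the loose-object layout (a zlib-compressed `<type> <size>\0<body>` file stored under a two-character directory) and uses a temporary directory as a stand-in for `.git/objects`. The helper names `write_loose_object` and `read_loose_object` are hypothetical, and packed objects are not handled.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path
from typing import Tuple


def write_loose_object(objects_dir: Path, obj_type: str, body: bytes) -> str:
    """Store an object the way Git stores a loose object and return its ID."""
    store = f"{obj_type} {len(body)}".encode("ascii") + b"\0" + body
    object_id = hashlib.sha1(store).hexdigest()
    # The first two hex characters become the directory, the rest the file name.
    path = objects_dir / object_id[:2] / object_id[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(store))
    return object_id


def read_loose_object(objects_dir: Path, object_id: str) -> Tuple[str, bytes]:
    """Return (type, body), roughly `git cat-file -t` plus `git cat-file blob`."""
    path = objects_dir / object_id[:2] / object_id[2:]
    raw = zlib.decompress(path.read_bytes())
    header, _, body = raw.partition(b"\0")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return obj_type, body


with tempfile.TemporaryDirectory() as tmp:
    objects_dir = Path(tmp) / "objects"  # stand-in for .git/objects
    blob_id = write_loose_object(objects_dir, "blob", b"hello world\n")
    obj_type, body = read_loose_object(objects_dir, blob_id)
    print(blob_id, obj_type, body)
```

Note that nothing in the lookup needs a commit, a tree, or a file name; the ID alone is enough to find the content.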

Blobs are Stored in Trees

Remember there’s no metadata in the blobs, and instead the blobs are just about the file’s contents.
Git records the structure of the files in the repository by attaching blobs as leaf nodes within a tree.
git ls-tree HEAD lists the entries in the tree of the latest commit, as seen from the current directory.
git rev-parse HEAD resolves HEAD to the commit ID it references.
git cat-file -t HEAD verifies the type for the alias HEAD (should be commit).
git cat-file commit HEAD will show metadata about the commit including the hash ID of the tree, as well as author info, commit message, etc.
To see that Git is maintaining its own set of information about the trees, commits, blobs, etc., use find .git/objects -type f and you’ll see the same IDs that were shown in the output from the previous Git commands, with the first two characters of each ID used as the directory name and the rest as the file name.
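As a rough illustration of where the file names live, the following Python sketch builds and parses the body of a tree object. It assumes the SHA-1 object format and a flat tree containing only regular files (mode `100644`); subdirectories need a slightly different sort order and are left out. The helper names and the two file names are hypothetical.

```python
import hashlib


def object_id(obj_type: str, body: bytes) -> str:
    """Hash `<type> <size>\\0<body>` the way Git names its objects."""
    store = f"{obj_type} {len(body)}".encode("ascii") + b"\0" + body
    return hashlib.sha1(store).hexdigest()


def tree_body(entries):
    """Serialize (mode, name, hex_id) entries into a tree object body."""
    body = b""
    # Entries are sorted by name; this simple sort is only right for flat trees.
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1].encode("utf-8")):
        body += mode.encode("ascii") + b" " + name.encode("utf-8") + b"\0"
        body += bytes.fromhex(hex_id)  # 20 raw bytes, not 40 hex characters
    return body


def parse_tree(body: bytes):
    """Recover the (mode, name, hex_id) entries from a tree object body."""
    entries, pos = [], 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\0", space)
        mode = body[pos:space].decode("ascii")
        name = body[space + 1:nul].decode("utf-8")
        hex_id = body[nul + 1:nul + 21].hex()
        entries.append((mode, name, hex_id))
        pos = nul + 21
    return entries


# Two differently named files with identical content share one blob ID.
blob = object_id("blob", b"hello world\n")
entries = [("100644", "abc.txt", blob), ("100644", "passwords.bin", blob)]
body = tree_body(entries)
tree_id = object_id("tree", body)
parsed = parse_tree(body)
for mode, name, hex_id in parsed:
    print(mode, "blob", hex_id, name)
print("tree", tree_id)
```

The printed entries resemble the output of `git ls-tree`: the mode and name come from the tree, while the blob ID points at content that knows nothing about its own name.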

How Trees are Made

There’s a notion of an index, which is what you use to initially create blobs out of files.
If you just do a git add without a
