What are git's thin packs?

Accepted answer
Score: 32

For the record, the man page (index-pack) states:

It is possible for git-pack-objects to build "thin" pack, which records objects in deltified form based on objects not included in the pack to reduce network traffic.
Those objects are expected to be present on the receiving end and they must be included in the pack for that pack to be self contained and indexable.
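To make "deltified form based on objects not included in the pack" concrete, here is a toy Python model of the sending side. It is a sketch under simplifying assumptions, not git's pack format: object IDs are placeholder strings, a delta is modeled as "keep the first N bytes of the base, then append literal bytes" (git's real delta format is a sequence of copy and insert instructions), and the flat 20 bytes per delta in `rough_size` only stands in for naming the base object. `build_pack` and `rough_size` are hypothetical helpers.

```python
from __future__ import annotations


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def build_pack(to_send: dict[str, bytes],
               base_hint: dict[str, str],
               receiver_has: dict[str, bytes],
               thin: bool) -> dict[str, object]:
    """Toy pack builder (hypothetical, not git's algorithm).

    Each entry is either full content (bytes) or a delta tuple
    (base_oid, keep, insert) meaning base[:keep] + insert.
    """
    pack: dict[str, object] = {}
    for oid, content in to_send.items():
        base_oid = base_hint.get(oid)
        if thin and base_oid in receiver_has:
            # Thin: the base is NOT put in the pack; the receiver must already have it.
            base = receiver_has[base_oid]
            keep = common_prefix_len(base, content)
            pack[oid] = (base_oid, keep, content[keep:])
        else:
            # Not thin: the base is not in the pack, so send the full content.
            pack[oid] = content
    return pack


def rough_size(pack: dict[str, object]) -> int:
    """Very rough wire size: full bytes, or 20 bytes to name the base + literal bytes."""
    return sum(len(e) if isinstance(e, bytes) else 20 + len(e[2])
               for e in pack.values())


old = b"line\n" * 1000            # version the receiver already has
new = old + b"extra\n"            # version being sent
receiver_has = {"v1": old}
thin_pack = build_pack({"v2": new}, {"v2": "v1"}, receiver_has, thin=True)
full_pack = build_pack({"v2": new}, {"v2": "v1"}, receiver_has, thin=False)
print(rough_size(thin_pack), rough_size(full_pack))
```

The thin pack only works because `v1` is on the receiving end; shipped to a repository without it, that pack could not be resolved or indexed.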

That complements the git push man page entry for the --thin option:

Thin transfer spends extra cycles to minimize the number of objects to be sent and meant to be used on slower connection

So a "slow network" in this case is a connection over which you want to send as little data as possible.

See more at "Git fetch for many files is slow against a high-latency disk".


In this thread, Jakub Narębski explains a bit more (in the context of using git gc on the remote side as well as on the local side):

Git does deltification only in packfiles.
But when you push via SSH, git would generate a pack file with commits the other side doesn't have, and those packs are thin packs, so they also have deltas...
but the remote side then adds bases to those thin packs making them standalone.

More precisely:

On the local side:
git-commit creates loose (compressed, but not deltified) objects. git-gc packs and deltifies.
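The "compressed, but not deltified" part shows up in how a loose object is stored. Assuming a SHA-1 repository (the default; newer git can also use SHA-256), a loose object is the zlib-compressed bytes of a `"<type> <size>\0"` header followed by the content, and its ID is the SHA-1 of that uncompressed data. The sketch below reproduces that; `loose_object` is a made-up helper name, not a git API.

```python
from __future__ import annotations

import hashlib
import zlib


def loose_object(content: bytes, obj_type: str = "blob") -> tuple[str, str, bytes]:
    """Return (object ID, path under .git/objects, compressed bytes) for a loose object."""
    raw = f"{obj_type} {len(content)}\0".encode() + content
    oid = hashlib.sha1(raw).hexdigest()
    path = f"{oid[:2]}/{oid[2:]}"          # fanned out by the first two hex digits
    return oid, path, zlib.compress(raw)   # whole content, compressed, never a delta


oid, path, data = loose_object(b"")
print(oid)    # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the well-known empty blob
print(path)
```

Because every version is stored whole like this, two nearly identical versions of a file cost two full compressed copies until git-gc packs and deltifies them.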

On the remote side (for smart protocols, i.e. git and ssh):
git creates thin pack, deltified;
on the remote side git either makes pack thick/self contained by adding base objects (object + deltas), or explodes pack into loose object (object).
You need git-gc on remote server to fully deltify on remote side. But transfer is fully deltified.
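The two choices on the receiving side (thicken the pack, or explode it into loose objects) can be sketched as a toy model. In git the choice depends on how many objects were received compared with the `receive.unpackLimit` setting (which falls back to `transfer.unpackLimit`); here that is just an `unpack_limit` parameter, and no default is assumed. Pack entries use a simplified representation (full bytes, or a `(base_oid, keep, insert)` tuple meaning `base[:keep] + insert`), and `resolve` / `receive_push` are hypothetical names.

```python
from __future__ import annotations


def resolve(oid: str, pack: dict, loose: dict[str, bytes]) -> bytes:
    """Rebuild full content; a base missing from the pack must already be local."""
    entry = pack.get(oid)
    if entry is None:
        return loose[oid]                       # base already on the receiving side
    if isinstance(entry, bytes):
        return entry
    base_oid, keep, insert = entry
    return resolve(base_oid, pack, loose)[:keep] + insert


def receive_push(pack: dict, loose: dict[str, bytes], unpack_limit: int):
    """Return (new loose objects, pack to store or None)."""
    if len(pack) < unpack_limit:
        # Explode: every received object becomes a full loose object, no deltas kept.
        return {oid: resolve(oid, pack, loose) for oid in pack}, None
    # Keep the pack, but thicken it: append each missing delta base in full,
    # so the stored pack is self contained and indexable.
    thick = dict(pack)
    for entry in pack.values():
        if isinstance(entry, tuple) and entry[0] not in pack:
            thick[entry[0]] = loose[entry[0]]
    return {}, thick


loose = {"v1": b"hello world\n"}          # what the server already has
thin_pack = {"v2": ("v1", 6, b"git\n")}   # delta against v1, which is not in the pack

print(receive_push(thin_pack, loose, unpack_limit=100))
print(receive_push(thin_pack, loose, unpack_limit=1))
```

In the exploded case nothing stays deltified on the server, which is why git-gc is still needed there.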

On the remote side (for dumb protocols, i.e. rsync and http):
git finds required packs and transfers them whole.
So the situation is like on local side, but git might transfer more than really needed because it transfers packs in full.
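The cost of transferring packs in full can be estimated with a toy calculation: any pack containing at least one needed object is downloaded whole. This is a simplification for illustration only; the hypothetical `dumb_fetch_bytes` ignores pack indexes and loose objects on the server, and assumes each object lives in exactly one pack.

```python
from __future__ import annotations


def dumb_fetch_bytes(packs: dict[str, dict[str, int]], needed: set[str]) -> tuple[int, int]:
    """packs maps pack name -> {object id: size in bytes}.

    Return (bytes downloaded, bytes of objects actually needed), assuming every
    pack holding a needed object is fetched whole.
    """
    downloaded = 0
    useful = 0
    for objects in packs.values():
        hit = objects.keys() & needed
        if hit:
            downloaded += sum(objects.values())          # whole pack goes over the wire
            useful += sum(objects[oid] for oid in hit)   # only these were wanted
    return downloaded, useful


packs = {
    "pack-a": {"c1": 100, "t1": 50, "b1": 4000},
    "pack-b": {"c2": 120, "b2": 30},
}
print(dumb_fetch_bytes(packs, {"c2", "b2"}))   # (150, 150): nothing wasted
print(dumb_fetch_bytes(packs, {"c1"}))         # (4150, 100): whole pack-a for one commit
```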


The problem above was related to the use (or non-use) of git push --thin: when should you use it?
It turns out you need to carefully manage your binary objects if you want git to take advantage of thin packs:

  1. Create the new file by just copying the old one (so the old blob is reused)
  2. commit
  3. PUSH
  4. copy the real new file over it
  5. commit
  6. PUSH

If you omit the middle PUSH in step 3, neither "git push" nor "git push --thin" can realize that this new file can be "incrementally built" on the remote side (even though git-gc totally squashes it in the pack).

In fact, thin packs work by storing deltas against base objects that are not included in the pack.
The objects that are not included but are used as delta bases are currently only the previous versions of files that are part of the update being pushed or fetched.
In other words, there must be a previous version under the same name for this to work.
Doing otherwise wouldn't scale if the previous commit had thousands of files to test against.

Those thin packs were designed with different versions of the same file in mind, not different files with almost the same content. The issue is deciding which preferred delta bases to add to the list of objects. Currently only objects with the same path as those being modified are considered.
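The same-path rule, and why the six-step workaround helps, can be modeled in a few lines. This sketch encodes only the rule stated above (the candidate base for a blob is the blob at the same path in the commit the receiver already has); trees are simplified to `{path: blob id}` dicts, and `thin_bases` is a made-up name, not part of git.

```python
from __future__ import annotations


def thin_bases(old_tree: dict[str, str], new_tree: dict[str, str]) -> dict[str, str | None]:
    """Map each blob that must be sent to the base a thin pack may delta against.

    Only the blob at the same path in old_tree (already on the receiver) qualifies;
    None means no base, so the blob is sent in full. Blobs the receiver already
    has are not sent at all.
    """
    have = set(old_tree.values())
    bases: dict[str, str | None] = {}
    for path, blob in new_tree.items():
        if blob in have:
            continue                          # nothing to send for this path
        bases[blob] = old_tree.get(path)      # previous version under the same name
    return bases


pushed = {"big.bin": "A"}                         # already on the remote
direct = {"big.bin": "A", "big-v2.bin": "B"}      # new file under a new name
copied = {"big.bin": "A", "big-v2.bin": "A"}      # steps 1-3: plain copy, pushed

print(thin_bases(pushed, direct))   # no middle push: B has no base
print(thin_bases(pushed, copied))   # the copy reuses blob A, nothing to send
print(thin_bases(copied, direct))   # steps 4-6: B can be a delta against A
```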

Score: 3

Note from git 1.8.5 (Q4 2013):

You would think that push --no-thin disables the thin option?
You would be wrong until 1.8.5:

"git push --no-thin" actually disables the "thin pack transfer" optimization.


See commit f7c815c for all the gory details, thanks to "pclouds" -- Nguyễn Thái Ngọc Duy:

push: respect --no-thin

  • From the beginning of push.c in 755225d, 2006-04-29, "thin" option was enabled by default but could be turned off with --no-thin.

  • Then Shawn changed the default to 0 in favor of saving server resources in a4503a1, 2007-09-09. --no-thin worked great.

  • One day later, in 9b28851, Daniel extracted some code from push.c to create transport.c. He (probably accidentally) flipped the default value from 0 to 1 in transport_get().

From then on --no-thin is effectively no-op because git-push still expects the default value to be false and only calls transport_set_option() when "thin" variable in push.c is true (which is unnecessary).
Correct the code to respect --no-thin by calling transport_set_option() in both cases.

receive-pack learns about --reject-thin-pack-for-testing option, which only is for testing purposes, hence no document update.
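The bug boils down to a default set in one place and a flag that is only forwarded when true. Here is a Python paraphrase of that logic; it is not git's C code, and `transport_get` / `transport_set_option` below only borrow the names from the commit message, with simplified, hypothetical signatures.

```python
from __future__ import annotations


class Transport:
    def __init__(self) -> None:
        self.thin = False


def transport_get() -> Transport:
    """Hypothetical stand-in: after 9b28851 the default was (accidentally) 1."""
    transport = Transport()
    transport.thin = True
    return transport


def transport_set_option(transport: Transport, name: str, value: bool) -> None:
    setattr(transport, name, value)


def push_before_fix(thin: bool) -> bool:
    transport = transport_get()
    if thin:                  # only forwarded when true: assumes the default is false
        transport_set_option(transport, "thin", True)
    return transport.thin


def push_after_fix(thin: bool) -> bool:
    transport = transport_get()
    transport_set_option(transport, "thin", thin)   # forwarded in both cases
    return transport.thin


print(push_before_fix(False))   # True: --no-thin is ignored
print(push_after_fix(False))    # False: --no-thin is respected
```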

Score: 0

My understanding is that it's an optimization for transmitting objects between two repositories.

I think you'd only use it when implementing your own git services that don't use send-pack and receive-pack.
