Differences between eg and git

In short, eg is a thin wrapper over git, with all the same abilities and most of the same command-line flags and meanings (since eg mostly just passes its command-line arguments on to git). It tweaks some defaults, cleans up some output, and fills in a big gap in git's documentation. Its goal is to make git easier to learn.

Since EasyGit started as a brainstorming session about how to improve git's user interface, I have also tried out some bad ideas along the way. I hope I have removed all of them, but you never know. Either way, I hope this page helps explain what changes I made in eg and why I made them.

Themes

  1. Layer concepts -- introduce as little as possible while still letting users accomplish something useful
  2. Reinforce concepts -- reuse the same terminology where possible, and make the important concepts and details appear wherever they make sense
  3. Prevent common errors when possible -- users can forgive inconsistencies, poor documentation, and even some non-intuitive behavior, but making choices that leave them error-prone is a sin approaching corrupting their data
  4. Reduce the work done by the user -- requiring multiple steps or lots of unnecessary typing tends to aggravate users
  5. Use consistent conventions -- inconsistent conventions leave users spending lots of time wondering what you mean
  6. Knowledge transfer -- try to make knowledge that users have gained from similar programs, and from other areas of the same program, transfer elsewhere (but watch out for common errors due to differences)

Summary

The changes fall into three groups: general changes, command-specific changes, and gratuitous svn-compatibility changes.

Details

Documentation

eg tries to plug the one big gap remaining in git's documentation right now. git has a user guide (online), a tutorial (online), and a set of comprehensive, highly technical, command-specific reference manuals (the git manpages). What git lacks is decent short, command-specific tutorial pages for its built-in help; it currently "fills the gap" by reusing (arguably misusing) the manpages for this purpose. eg has its own built-in help, which refers users to the git manpages for further detail.

eg's documentation focuses on getting users doing useful things as quickly as possible, rather than on providing a comprehensive set of possibilities. Thus, each page refers to the git manpages for more information. Also, as part of trying to be user-friendly, the eg help pages strive to make it clearer which parts can be skipped when first getting started, to define possibly unfamiliar (or misleading) terminology, to reinforce common terms and concepts, to use consistent notation, and to group commands into types so that users can more quickly find the commands that will be useful to them.

bundle

git bundle's create subcommand requires arguments specifying what to include. For many first-time users, the answer is "everything". However, it is difficult for them to figure out how to specify everything, since they are pointed to the hard-to-parse help for git-rev-list. Further, even after they eventually find "--all", they will discover that it is not enough; they also need to specify "HEAD" manually in order to be able to clone from the bundle. So, if no revision arguments are passed to eg bundle create, it defaults to passing "--all HEAD" on to git bundle create.

Another common need is creating a new bundle that contains everything new since the last bundle the user sent. This is not easy for new users to figure out, and even for experienced users it takes a few steps (parsing the output of git bundle list-heads run on the old bundle). To make this easier and faster, eg provides a create-update subcommand to bundle, of the form:

eg bundle create-update NEWBUNDLENAME OLDBUNDLE [EXTRA REFERENCES]
By default, this command includes everything in the current repository that is not in the previous bundle, while still allowing the extra references to modify what is included. (In detail, eg bundle create-update calls git bundle create with the arguments --all HEAD ^SHA1NUM1_FROM_OLD_BUNDLE ^SHA1NUM2_FROM_OLD_BUNDLE... EXTRA REFERENCES.)
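
A minimal shell sketch of how the create-update arguments could be assembled. It assumes the old bundle's heads arrive on stdin in the "SHA1 REFNAME" form printed by git bundle list-heads; the function name eg_bundle_update_args is hypothetical and is not eg's code. Duplicate SHA1s (for example HEAD and the branch it points to) are harmless to git.

```sh
# Hypothetical helper (not part of eg): print the revision arguments that
# "eg bundle create-update NEW OLD [EXTRA REFERENCES]" hands to git bundle create.
#   stdin: output of "git bundle list-heads OLD" (one "SHA1 REFNAME" per line)
#   $@:    the EXTRA REFERENCES, if any
eg_bundle_update_args() {
  # Start from "everything", the same default eg uses for a plain bundle create.
  printf '%s' '--all HEAD'
  # Exclude every commit the old bundle already contains; the refname is read
  # only so that it is split off from the SHA1.
  while read -r sha ref; do
    [ -n "$sha" ] && printf ' ^%s' "$sha"
  done
  # Extra references come last so they can adjust the selection.
  for extra in "$@"; do
    printf ' %s' "$extra"
  done
  printf '\n'
}
```

In a real repository this might be used as git bundle create NEW.bundle $(git bundle list-heads OLD.bundle | eg_bundle_update_args), relying on word splitting since none of the arguments contain spaces.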

commit

The default behavior of git commit has come up many times. There is a question in the GitFaq specifically about it, with links to two long threads. Somewhat surprisingly, these discussions virtually always seem to focus on whether to make the -a behavior of git commit the default, and people then spend their time arguing why one choice or the other is bad. I agree with both sides; I don't like either choice.

Explanation from the git angle:

Explanation from the used-to-other-vcs angle:

The current git default behavior is:

If there are no staged changes -> warn user: nothing to commit
else if there are unmerged files -> warn user: unmerged files
else -> commit the staged changes

In eg I have changed this to:

If the working copy is clean -> warn user: nothing to commit
else if there are unmerged files -> warn user: unmerged files
else if there are new untracked files -> warn user: new unknown files
else if both "staged" and unstaged changes present[1] -> warn user: mix of staged and unstaged changes
else -> run git commit (-a)
In the last step, -a is passed only if unstaged changes are the only changes present and the user didn't ask to commit specific files. This essentially boils down to preventing a couple of extra common pitfalls and requiring less work of the user in the common case. Note that one item here is unrelated to the index: the warning about new untracked files. It never ceases to amaze me how many experienced VCS users forget to add new files they create before committing; this check is designed to guard against that. Since it applies only to new untracked files rather than to the mere existence of untracked files, it doesn't seem to get in the way.
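
To make the decision order concrete, here is a minimal shell sketch that models it as a pure function of the repository state. The function name and its 0/1 arguments are hypothetical; a real implementation would derive them from git, and the --staged, -b, and --amend cases are not modeled. "Clean" is taken here to mean no staged and no unstaged changes.

```sh
# Hypothetical model of eg's default commit decision. Each argument is 0 or 1:
#   $1 unmerged files present       $2 new untracked files present
#   $3 staged changes present       $4 unstaged changes present
#   $5 user named specific files to commit
eg_commit_action() {
  unmerged=$1 new_untracked=$2 staged=$3 unstaged=$4 named_files=$5
  if [ "$staged" = 0 ] && [ "$unstaged" = 0 ]; then
    echo "warn: nothing to commit"
  elif [ "$unmerged" = 1 ]; then
    echo "warn: unmerged files"
  elif [ "$new_untracked" = 1 ]; then
    echo "warn: new unknown files"
  elif [ "$staged" = 1 ] && [ "$unstaged" = 1 ]; then
    echo "warn: mix of staged and unstaged changes"
  elif [ "$unstaged" = 1 ] && [ "$named_files" = 0 ]; then
    echo "git commit -a"   # only unstaged changes: commit them all
  else
    echo "git commit"      # only staged changes, or files named explicitly
  fi
}
```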

Committing just the staged changes can be done with the new --staged flag (aliases: --dirty and -d), which also has the side effect of bypassing the new-untracked-files check. Bypassing just the new-untracked-files check can be done with the new --bypass-unknown-check (or -b) flag.

For a potential reason against making this change to the default commit behavior, look at the diff section.

[1] The reason for putting "staged" in quotes comes from the case of running "eg commit --amend" when you have local unstaged changes. Does the user want to merely amend the prior commit message or add their changes to the previous commit? It is not clear what the user wants, so we warn and ask them to use -a or --staged. (Even though the index may match HEAD at this time, we are committing relative to HEAD^1, so effectively we should treat this case as though there were staged changes.)

diff

There are two sets of changes to eg diff relative to git diff:

Different defaults for what to diff relative to

Changing the defaults slightly for diff has two motivations: consistency with the changes to commit, and avoiding some common gotchas (see my blog post on "Limbo"). To cut to the chase, I changed the defaults as follows:

eg diff <=> git diff HEAD (*)
eg diff --staged <=> git diff --cached
eg diff --unstaged <=> git diff
(*) INCOMPLETE MERGE CASE: A plain "eg diff" will abort, telling the user that it can't show changes since the last commit because there is no single last commit, and will instead suggest a number of alternatives. (In particular, it suggests extra flags the user can pass to diff, as well as separate log and show commands that may be helpful.) [EXTRA TECHNICALITIES: We also don't want to automatically insert "HEAD" in our call to git diff if the user passed the --no-index option to eg diff.]
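
A minimal sketch of the default selection, as a hypothetical shell function that prints the git diff arguments eg diff would use. It takes a 0/1 flag for whether a merge is in progress plus at most one eg diff mode flag; paths, explicit revisions, and other options are deliberately left out.

```sh
# Hypothetical model of eg diff's defaults (not eg's code).
#   $1: 1 if an incomplete merge is in progress, else 0
#   $2: optional mode flag: --staged, --unstaged, --no-index, or nothing
eg_diff_args() {
  merging=$1
  mode=$2
  case "$mode" in
    --staged)   printf '%s\n' --cached ;;     # index vs HEAD
    --unstaged) printf '\n' ;;                # plain git diff: working copy vs index
    --no-index) printf '%s\n' --no-index ;;   # never insert HEAD here
    "")
      if [ "$merging" = 1 ]; then
        echo "error: no single last commit during an incomplete merge" >&2
        return 1
      fi
      printf '%s\n' HEAD ;;                   # working copy vs last commit
    *)
      echo "not modeled in this sketch: $mode" >&2
      return 1 ;;
  esac
}
```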

While this removes a number of gotchas for new users (and helps them discover the usefulness of the index more readily), it may be hard to adopt for the following reason. Going back to the "reduce the work done by the user" theme, this change works against that goal for people in an integrator role (such as Linus or Junio): those who do lots of merging must pass an extra flag to see the unstaged changes, an annoyance that will bother them eventually, if not at first. The same objection may carry over to commit, since the commit and diff changes are connected through a shared, consistent mental model.

Providing a more consistent double-dot operator

The .. operator of git diff (e.g. git diff master..devel) means what the ... operator of git log means, and vice versa. This causes lots of confusion. We fix this by making the .. operator of eg diff do exactly what the ... operator of git diff does. To see why:

Meanings of git commands, as a reminder (A and B are revisions):

  git diff A..B  <=> git diff A B                      # Endpoint difference
  git diff A...B <=> git diff $(git merge-base A B) B  # One-sided difference to common ancestor

Why this is confusing (compare to above):

  git log A..B  <=> git log ^$(git merge-base A B) B   # One-sided difference to common ancestor
  git log A...B <=> git log ^$(git merge-base A B) A B # Endpoint difference

So, my translation:

  eg diff A B   <=>  git diff A B    <=> git diff A..B
  eg diff A..B  <=>  git diff A...B
  eg diff A...B <=>  git diff A...B
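
The translation can be sketched as a hypothetical shell function that rewrites one revision argument. It assumes the argument is a revision or range rather than a path (a path such as ../foo would be mangled), so it is an illustration of the mapping, not eg's argument parser.

```sh
# Hypothetical rewrite of one eg diff revision argument into git diff syntax:
# eg's A..B becomes git's A...B; A...B and single revisions pass through.
eg_diff_range() {
  case "$1" in
    *...*) printf '%s\n' "$1" ;;                        # already three dots
    *..*)  printf '%s\n' "${1%%..*}...${1#*..}" ;;      # A..B -> A...B
    *)     printf '%s\n' "$1" ;;                        # plain revision
  esac
}
```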

Reasons for this change:

log

I've heard many users refer to the arcaneness or weirdness of git's use of 40-character hexadecimal strings as revision identifiers (even when they know these can be shortened to the first 8 or so unique characters). While the repository-integrity aspects of this feature are cool, it still comes across as harsh and uninviting to many. I don't believe these identifiers should be hidden or removed at all, but I do think they need to be better supplemented by something "friendlier".

It can also be harder to do a number of tasks using only sha1sums as identifiers. For example, if a new user wants the difference between two nearby revisions, they will likely find it harder to do in git. In cvs, for example, they could just look up one revision number, copy it, note that the other was three smaller, do the subtraction in their head, and type the other revision themselves. Sure, you can do the analogous thing in git using revision specifiers, but git passes up its best opportunity to teach the revision-specification methods from git-rev-parse by not showing any in git log. As a result, git users often believe that they need to copy two separate identifiers and paste both onto a git diff command line, which is slow and cumbersome (and produces a daunting-looking command line).

In eg, I simply made

eg log ARGS...
essentially mean
git log ARGS... | git name-rev --stdin --refs=$(git symbolic-ref HEAD) | less
except that I implemented the name-rev functionality internally to eg to avoid the need to walk all of history before showing any commits in the common case. (If HEAD is a symbolic-ref, and a revision which is not fully merged into HEAD is specified on the command line, then eg log can be slow on large repositories since it will walk the entire current branch before showing anything. This particular issue means that the eg log behavior probably does not make sense for git log, except as an option.)

This change: (1) provides users with human-meaningful and easy to manipulate revision identifiers, (2) teaches them about revision specifier methods in git, (3) reinforces the ideas of branches and the current branch, and (4) shows the relationships of commits and gives the user a feeling for the DAG structure of git (something otherwise lost on users unless/until they try out gitk separately).

One could ask whether always using the current branch (as I did with the HEAD symbolic-ref) is correct. For example, if the user specifies

eg log DIFFERENT_BRANCH
should I use refs/heads/DIFFERENT_BRANCH as the value for --refs in the git-name-rev call? That would be reasonable but I argue against it on the following grounds:

pull

I introduced a --branch flag to pull, to delay the need to understand refspecs and to make example command lines more self-documenting and memorable. Sounds simplistic, but it does wonders for the documentation.

I also have one extra fallback for the choice of branch to merge when none was specified by the user (either on the command line or in the config file); namely, if the remote repository only has one branch, then I use it. If the remote repository has more than one branch in such a situation, then rather than simply telling the user that they need to specify which one to merge, I tell them what the possible choices are.
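
A hypothetical sketch of that fallback, reading git ls-remote --heads output (a SHA1, a tab, and refs/heads/NAME per line). The function name and the message wording are illustrative, not eg's.

```sh
# Hypothetical fallback for choosing the branch to merge.
#   stdin: output of "git ls-remote --heads REMOTE"
# Prints the branch name if the remote has exactly one branch; otherwise
# lists the choices on stderr and returns 1.
eg_pull_default_branch() {
  # Keep only the branch names, dropping the SHA1 and the refs/heads/ prefix.
  branches=$(sed -n 's|^[0-9a-f]*[[:space:]]*refs/heads/||p')
  count=$(printf '%s\n' "$branches" | grep -c .)
  if [ "$count" -eq 1 ]; then
    printf '%s\n' "$branches"
  else
    echo "Which branch should be merged? Choices are:" >&2
    for b in $branches; do
      printf '  %s\n' "$b" >&2
    done
    return 1
  fi
}
```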

push

I introduced a --branch flag to push, much as I did for pull, to delay the need to understand refspecs and to make example command lines more self-documenting and memorable. A simple change that does wonders for making the documentation simpler.

Another simple change was made to default to only pushing the current branch, rather than all branches that already exist both locally and in the remote repository. A --matching-branches option was added to obtain the default behavior of git push. This simplifies the mental model somewhat, and also makes the behavior more closely related to that of eg pull (which only merges changes in to the current branch).

Finally, I also added a check for whether the working copy is clean before doing a push (including a check for whether new untracked files are present). While such a check is not necessary for correct operation, it helps keep users from forgetting to commit changes they want included in a push (similar to cvs/svn users forgetting to add files before committing and then breaking everyone else...). Since some users are simply not aware that only committed changes are pushed, this check can also serve to educate them. Of course, the check can be bypassed with the new --bypass-modification-check (or -b) flag.

I also added a check to prevent pushing into non-bare repositories when the other repository is accessible from the local filesystem or via ssh (I don't know how to check whether a remote repository is bare in other cases), unless the user explicitly specified both source and destination refs (i.e. gave a refspec containing a ':' character on the push command line).
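
A minimal sketch of two of these push checks as hypothetical shell functions. The modification check reads git status --porcelain output; because that output cannot tell new untracked files from old ones, the sketch treats any untracked file as blocking, which is stricter than eg. The refspec test only looks for a ':' character, as described above; detecting a non-bare remote is not modeled.

```sh
# Hypothetical pre-push modification check (not eg's code).
#   stdin: output of "git status --porcelain" ("XY PATH" per line, "??" = untracked)
#   $1:    1 if --bypass-modification-check / -b was given, else 0
eg_push_modification_check() {
  if [ "$1" = 1 ]; then
    return 0
  fi
  status=$(cat)
  if [ -n "$status" ]; then
    echo "Refusing to push: uncommitted changes or untracked files present" >&2
    return 1
  fi
}

# Succeeds if a refspec names both source and destination (contains ':').
has_explicit_destination() {
  case "$1" in
    *:*) return 0 ;;
    *)   return 1 ;;
  esac
}
```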

rebase

In git, it is common to see a command such as

git rebase master
which is somewhat confusing at first, since master is not the branch being rebased; it is the branch being rebased against. The current branch is the one being rebased. To make this clearer, the same command in eg can be written as
eg rebase --against master
though the old syntax without --against is still accepted in eg. The word "against" does not fit well when the --onto option is specified, so --since is provided as an alias for --against.
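
The spelling change can be sketched as a hypothetical argument translator that turns eg's --against/--since into git rebase's positional upstream argument, placed after options such as --onto. It is an illustration of the mapping only; the function name is made up.

```sh
# Hypothetical translation of eg rebase arguments into git rebase arguments.
# --against X and --since X both become the positional upstream X.
eg_rebase_args() {
  upstream=""
  opts=""
  while [ $# -gt 0 ]; do
    case "$1" in
      --against|--since)
        upstream=$2
        shift ;;
      --onto)
        opts="$opts --onto $2"
        shift ;;
      *)
        opts="$opts $1" ;;   # old syntax (eg rebase master) passes through
    esac
    shift
  done
  result="$opts${upstream:+ $upstream}"
  printf '%s\n' "${result# }"
}
```

For example, eg rebase --since A --onto B would correspond to git rebase --onto B A.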

revert

I have many times seen people confused by the different ways to "revert" things in git, including the sometimes subtle distinctions between revert, the second form of reset, the second form of checkout, and show. Apparently "how do I undo my local changes?" is one of the most frequently asked questions in #git, and there are git users who are unhappy with the multiple different meanings of reset and checkout.

Additionally, every other major VCS (svn, hg, bzr, darcs) gives revert the same meaning, which differs from the one git uses. Junio seems to be (or at least was) inclined to change revert so that such users can transition to git more easily.

Somewhat interestingly, alternative git porcelains often try to provide svn-like revert behavior, but in my testing of three or four of them (including earlier versions of eg), all of them got it wrong. It took me three tries (over a few months) of writing and modifying eg revert before I finally got it right. If porcelain authors have difficulty implementing this behavior correctly, I do not see how we can expect end users to work it out for themselves.

In eg, I made revert provide the kind of behavior that svn, hg, bzr, and darcs revert all do (and also extended the command to allow working with either just unstaged changes or just staged changes). Thus, in eg you'll find commands like:

  $ eg revert foo.cc baz.py
  $ eg revert --since REVISION -- foobar.pl bambam.tcl
  $ eg revert --in REVISION -- foo.cc baz.py

Finally, any valid syntax for git revert is invalid for eg revert, so if users who are familiar with git try eg revert with git syntax, they'll be given a message telling them to run eg cherry-pick -R instead.
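
For readers who want to see roughly what each eg revert form does in plain git terms, here is a hypothetical mapping that prints an approximately equivalent git command. It is not eg's implementation: as noted above, getting this right is subtle, and these approximations ignore cases such as files added or deleted since REVISION.

```sh
# Hypothetical, approximate plain-git equivalents of the eg revert forms.
# Usage: eg_revert_equivalent [--since REV | --in REV] [--] PATH...
eg_revert_equivalent() {
  case "$1" in
    --since)
      rev=$2; shift 2
      [ "$1" = -- ] && shift
      # Restore the files (index and working copy) to their state at REV.
      printf 'git checkout %s -- %s\n' "$rev" "$*" ;;
    --in)
      rev=$2; shift 2
      [ "$1" = -- ] && shift
      # Undo only the changes REV itself made to the files.
      printf 'git diff %s^ %s -- %s | git apply -R\n' "$rev" "$rev" "$*" ;;
    *)
      [ "$1" = -- ] && shift
      # Discard staged and unstaged changes relative to the last commit.
      printf 'git checkout HEAD -- %s\n' "$*" ;;
  esac
}
```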

status

The changes to status are cosmetic, but do serve to make eg feel more inviting.

I felt the leading hash marks in git status were weird and harsh, and they seem to serve no purpose when not being used in a commit message. So, I simply removed them. This is probably the biggest usability improvement I made to the status command. To go along with this change, I put the branch name in parentheses.

To help reinforce consistent concepts, I made minor tweaks to the section headings:

Changes to be committed => Changes ready to be committed ("staged")
Changed but not updated => Changed but not updated ("unstaged")
Untracked files => Unknown files
The slight changes to the first two titles help reinforce the common terminology I try to use throughout eg to help users learn the index faster, while the change to the last title is more of an attempt at disambiguation (does "tracked" mean "tracked in the index", "something related to remote-tracking branches", "some fancy new monitoring scheme unique to git that other VCSes do not have", or "something else"?). Nothing will be perfect here, but I believe "unknown" is less likely to be read as carrying extra meanings.

Finally, I removed the extraneous blank lines and the unnecessary command suggestions. The command suggestions accompanying each section do happen to be necessary in git, due to the various gotchas and discoverability issues associated with its current defaults; there they are a helpful usability feature that mitigates other problems, but with eg's defaults they seem more like a distraction.
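
A rough approximation of these cosmetic changes as a sed filter over git status output. It assumes the older git status format that prefixed every line with "# "; the function name is hypothetical, and the exact "(On branch NAME)" wording is an assumption about how eg parenthesizes the branch.

```sh
# Hypothetical filter approximating eg's status cosmetics (not eg's code).
eg_status_filter() {
  sed -e 's/^# \{0,1\}//' \
      -e 's/^Changes to be committed:/Changes ready to be committed ("staged"):/' \
      -e 's/^Changed but not updated:/Changed but not updated ("unstaged"):/' \
      -e 's/^Untracked files:/Unknown files:/' \
      -e 's/^On branch \(.*\)/(On branch \1)/' \
      -e '/^[[:space:]]*(use "git /d' \
      -e '/^[[:space:]]*$/d'
  # Rules, in order: strip the leading hash, rename the three headings,
  # parenthesize the branch line, drop command hints, drop blank lines.
}
```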

update

An update command is something that has been discussed many times. Users who were familiar with cvs and svn have long asked for such a command. However, while it has been discussed, it is not clear what update should do, particularly given the potential for local commits to exist. (For reference, both hg and bzr have implemented such a command, yet the two have made different choices about how the command should behave.)

I suspect that the main reason users request an update command is that the built-in documentation for git pull is far too complex. However, I believe that even with simpler built-in documentation for pull, a command that helps users learn better workflows can be of benefit.

In eg, update is primarily a command to help users learn better workflows. In cvs or svn, update is the only method of getting new changes from other developers; unfortunately, cvs/svn update trashes your own local changes by munging them with those of other people, providing no method of undoing the update if things don't merge well. This is a horrible usability wart of cvs/svn that needs to be fixed, and which users of those systems need to be trained away from. Additionally, there are extra features of git ready to be learned, and, in my opinion, an update command is a good place to teach them.

I implemented eg update as follows:

If there are local-only commits -> warn that we don't know how they want the update performed due to their local commits, and that they will need to merge with upstream changes (or rebase on top of them); tell them to use eg pull instead
If the user provides arguments to eg update -> warn the user to use eg switch to check out an older revision, or eg revert to undo changes to a file
If there are locally deleted files -> tell the user to use eg revert to undo local changes to a file (and that they do not need to delete the file first as they did with cvs)
If there are locally modified files (instead of or in addition to the deletion case covered above) -> warn that updating is unsafe, and provide two options: tell the user about committing before pulling updates, and about stashing changes away and applying them later
If there is no default repository to pull from -> warn the user that we don't know where to pull changes from, and suggest they use "eg remote add origin REPOSITORY_URL"
If there is no default branch to pull from (i.e. config option branch.BRANCH.merge is not set) and there is more than one branch at the remote end -> warn the user that we don't know which branch to pull changes from, and suggest they use eg pull or run "eg config branch.BRANCH.merge BRANCHNAME"
Otherwise -> fetch the remote branch directly into the current branch and then do a git reset --hard BRANCH
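
A minimal sketch that encodes the checks above, in order, as a hypothetical shell function. It assumes each check stops at the first match and takes the repository state as plain arguments instead of querying git; the messages are shortened paraphrases, not eg's wording.

```sh
# Hypothetical model of the eg update checks (not eg's code).
#   $1 local-only commits exist (0/1)     $2 number of arguments given to eg update
#   $3 locally deleted files (0/1)        $4 locally modified files (0/1)
#   $5 default remote configured (0/1)    $6 branch.BRANCH.merge is set (0/1)
#   $7 number of branches at the remote end
eg_update_action() {
  if [ "$1" = 1 ]; then echo "use eg pull (merge or rebase local commits)"
  elif [ "$2" -gt 0 ]; then echo "use eg switch or eg revert"
  elif [ "$3" = 1 ]; then echo "use eg revert to restore deleted files"
  elif [ "$4" = 1 ]; then echo "commit or stash local changes first"
  elif [ "$5" = 0 ]; then echo "run eg remote add origin REPOSITORY_URL"
  elif [ "$6" = 0 ] && [ "$7" -gt 1 ]; then echo "use eg pull or set branch.BRANCH.merge"
  else echo "fetch, then git reset --hard"
  fi
}
```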