All opinions expressed are those of the authors and not necessarily those of OSNews.com, our sponsors, or our affiliates.

published by noreply@blogger.com (Wojtek Ziniewicz) on 2017-08-14 13:49:00 in the "git" category

Ever wondered how to split your Git repo into two repos?


 

First you need to find out what files and directories you want to move to separate repos. In the above example we're moving dir3, dir4 and dir7 to repo A, and dir1, dir2, dir5 and dir8 to repo B.

Steps

What you need to do is go through each and every commit in the git history of every branch and rewrite it so that the directories you don't care about in your new repo are removed. The only flaw of this method is that it leaves behind, in the history, the commits that touched only those directories, now empty.

Track all branches

First we need to start tracking all branches locally:

for i in $(git branch -r | grep -vE "HEAD|master" | sed 's/^ *//'); do
  git checkout --track "$i"
done

Then copy your original repo to two separate dirs: repo_a and repo_b.

cp -a source_repo repo_a
cp -a source_repo repo_b

Filter the history

The following command deletes all dirs that belong exclusively to repo B, which leaves us with repo A. Filtering is not limited to directories: you can also pass relative paths to individual files.

cd repo_a
git filter-branch --index-filter 'git rm --cached -r dir8 dir2 || true' -- --all

cd ../repo_b
git filter-branch --index-filter 'git rm --cached -r dir3 dir4 dir7 || true' -- --all

Note that the `|| true` keeps filter-branch from failing on commits early in the history, where the dirs mentioned in the `rm` clause did not exist yet and `git rm` would otherwise exit with an error.
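If you would rather not keep the empty commits mentioned earlier, `git filter-branch` also has a `--prune-empty` option. The function below is a minimal sketch wrapping the same index filter with it; the name `drop_paths` is hypothetical, `--ignore-unmatch` takes the place of `|| true`, and paths containing spaces are not handled. Be aware that `--prune-empty` leaves merge commits in place.

```shell
# drop_paths: hypothetical helper. Removes the given paths from every commit
# on every ref and discards commits that end up empty. Run it from the top
# level of a throwaway copy such as repo_a or repo_b.
drop_paths() {
    [ "$#" -gt 0 ] || { echo "usage: drop_paths PATH..." >&2; return 1; }
    # --ignore-unmatch: leave commits alone where a path does not exist yet.
    # --force: allow re-running over an earlier backup in refs/original/.
    # FILTER_BRANCH_SQUELCH_WARNING=1 skips the warning (and pause) that
    # newer Git versions print before filter-branch runs.
    FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force \
        --index-filter "git rm --cached -r -q --ignore-unmatch $*" \
        --prune-empty -- --all
}
```

For repo_a above, that would be `drop_paths dir8 dir2`.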

Look at the list of branches once again (in both repos):

git branch -l

Set new origins and push

In every repo, we need to remove the old origin and set up a new one. Once that's done, we're ready to push.

Remove old origin:

git remote rm origin

Add new origin:

git remote add origin git@github.com:YourOrg/repo_a.git

Push all tracked branches:

git push origin --all
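The three steps above can be bundled into a small function to run once in each repo. This is a sketch: `repoint_and_push` is a hypothetical name, and the extra `git push origin --tags` is an addition, since `--all` pushes branches but not tags.

```shell
# repoint_and_push: hypothetical helper. Replaces the "origin" remote with
# NEW_URL, then pushes every local branch and every tag to it.
repoint_and_push() {
    new_url=$1
    # Ignore the error if there is no "origin" remote to remove.
    git remote rm origin 2>/dev/null || true
    git remote add origin "$new_url" &&
        git push origin --all &&
        git push origin --tags
}
```

Usage: `repoint_and_push git@github.com:YourOrg/repo_a.git`.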

That's it!



published by noreply@blogger.com (Greg Sabino Mullane) on 2014-11-10 22:07:00 in the "git" category

When using git, being able to track down a particular version of a file is an important debugging skill. The common use case for this is when someone is reporting a bug in your project, but they do not know the exact version they are using. While normal software versioning resolves this, bug reports often come in from people using the HEAD of a project, and thus the software version number does not help. Finding the exact set of files the user has is key to being able to duplicate the bug, understand it, and then fix it.

How you get to the correct set of files (which means finding the proper git commit) depends on what information you can tease out of the user. There are three classes of clues I have come across, each of which is solved a different way. You may be given clues about:

  1. Date: The date they downloaded the files (e.g. last time they ran a git pull)
  2. File: A specific file's size, checksum, or even contents.
  3. Error: An error message that helps guide to the right version (especially by giving a line number)

Finding a git commit by date

This is the easiest one to solve. If all you need is to see how the repository looked around a certain point in time, you can use git checkout with git rev-list to get it. I covered this in detail in an earlier post, but the best answer is below. For all of these examples, I am using the public Bucardo repository, which you can clone with git clone git://bucardo.org/bucardo.git

$ DATE='Sep 3 2014'
$ git checkout `git rev-list -1 --before="$DATE" master`
Note: checking out '79ad22cfb7d1ea950f4ffa2860f63bd4d0f31692'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b new_branch_name

HEAD is now at 79ad22c... Need to update validate_sync with new columns

Or if you prefer xargs over backticks:

$ DATE='Sep 3 2014'
$ git rev-list -1 --before="$DATE" master | xargs -Iz git checkout z
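If you do this often, the lookup can live in a small shell function. The sketch below (the name `commit_before` is hypothetical) only prints the commit ID that `git rev-list` picks, so you can inspect or tag it before checking it out; the branch defaults to master as in the examples above.

```shell
# commit_before: hypothetical helper. Prints the newest commit reachable
# from BRANCH (default: master) whose committer date is before DATE.
commit_before() {
    git rev-list -1 --before="$1" "${2:-master}"
}
```

Usage: `git checkout "$(commit_before 'Sep 3 2014')"`.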

What about the case in which there were multiple important commits on the given day? If the user doesn't know the exact time, you will have to make some educated guesses. You might add the -p flag to git log to examine what changes were made and how likely they are to interact with the bug in question. If it is still not clear, you may just want to have the user mail you a copy or a checksum of one of the key files, and use the method below.

Once you have found the commit you want, it's a good idea to tag it right away. This applies to any of the three classes of clues in this article. I usually add a lightweight git tag immediately after doing the checkout. Then you can easily come back to this commit simply by using the name of the tag. Give it something memorable and easy, such as the bug number being reported. For example:

$ git checkout `git rev-list -1 --before="$DATE" master`
## Give a lightweight tag to the current commit
$ git tag bug_23142
## We need to get back to our main work now
$ git checkout master
## Later on, we want to revisit that bug
$ git checkout bug_23142
## Of course, you may also want to simply create a branch

Finding a git commit by checksum, size, or exact file

Sometimes you can find the commit you need by looking for a specific version of an important file. One of the "main" files in the repository that changes often is your best bet for this. You can ask the user for the size, or just a checksum of the file, and then see which repository commits have a matching entry.

Finding a git commit when given a checksum

As an example, a user in the Bucardo project has encountered a problem when running HEAD, but all they know is that they checked it out sometime in the last four months. They also run "md5sum Bucardo.pm" and report that the MD5 of the file Bucardo.pm is 767571a828199b6720f6be7ac543036e. Here's the easiest way to find what version of the repository they are using:

$ SUM=767571a828199b6720f6be7ac543036e
$ git log --format=%H \
  | xargs -Iz sh -c \
    'echo -n "z "; git show z:Bucardo.pm | md5sum' \
  | grep -m1 $SUM \
  | cut -d " " -f 1 \
  | xargs -Iz git log z -1
xargs: sh: terminated by signal 13
commit b462c256e62e7438878d5dc62155f2504353be7f
Author: Greg Sabino Mullane 
Date:   Fri Feb 24 08:34:50 2012 -0500

    Fix typo regarding piddir

I'm using variables in these examples both to make copy and paste easier, and because it's always a good idea to save away constant but hard-to-remember bits of information. The first part of the pipeline grabs a list of all commit IDs: git log --format=%H.

We then use xargs to feed the list of commit IDs one by one to a shell. The shell grabs a copy of the Bucardo.pm file as it existed at the time of that commit, and generates an MD5 checksum of it. We also echo the commit ID on the same line, as we will need it later on. So each output line now holds a commit hash followed by the MD5 of Bucardo.pm at that commit.

Next, we pipe this list to grep so we only match the MD5 we are looking for. We use -m1 to stop processing once the first match is found (this is important, as the extraction and checksumming of files is fairly expensive, so we want to short-circuit it as soon as possible). Once we have a match, we use the cut utility to extract just the commit ID, and pipe that back into git log. Voila! Now we know the very last time the file existed with that MD5, and can checkout the given commit. (The "terminated by signal 13" is normal and expected)
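The same search can be packaged as a function that takes the file name as a parameter. This is a sketch, not the pipeline above verbatim: it assumes GNU md5sum, uses a while loop instead of xargs (so there is no signal 13 message), and skips commits where the file does not exist. The name `find_commit_by_md5` is hypothetical; PATH is relative to the top of the repository.

```shell
# find_commit_by_md5: hypothetical helper. Prints the newest commit on the
# current branch whose copy of PATH has the given MD5 checksum.
find_commit_by_md5() {
    path=$1 sum=$2
    git rev-list HEAD | while read -r commit; do
        # Skip commits in which PATH does not exist.
        git cat-file -e "$commit:$path" 2>/dev/null || continue
        actual=$(git show "$commit:$path" | md5sum | cut -d ' ' -f 1)
        if [ "$actual" = "$sum" ]; then
            echo "$commit"
            break   # stop at the first (newest) match
        fi
    done
}
```

Usage: `find_commit_by_md5 Bucardo.pm "$SUM"`.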

You may wonder if a sha1sum would be better, as git uses those internally. Sadly, the process remains the same, as the algorithm git uses to generate its internal SHA1 checksums is sha1("blob " . length(file) . "\0" . contents(file)), and you can't expect a random user to compute that and send it to you! :)
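To see that formula in action, the sketch below builds the header by hand and pipes it through sha1sum; its output should match `git hash-object` for the same file. It assumes GNU coreutils (`sha1sum`), and `blob_sha1` is a hypothetical name.

```shell
# blob_sha1: hypothetical helper. Computes git's blob ID for FILE without git:
# sha1("blob " + size in bytes + NUL + contents).
blob_sha1() {
    size=$(wc -c < "$1" | tr -d ' ')
    # \000 is the NUL byte that separates the header from the contents.
    { printf 'blob %s\000' "$size"; cat "$1"; } | sha1sum | cut -d ' ' -f 1
}
```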

Finding a git commit when given a file size

Another piece of information the user can give you very easily is the size of a file. For example, they may tell you that their copy of Bucardo.pm weighs in at 167092 bytes. As this file changes often, it can be a unique-enough marker to help you determine when they checked out the repository. Finding the matching size is a matter of walking backwards through each commit and checking the file size of every Bucardo.pm as it existed:

$ SIZE=167092
$ git rev-list --all |
  while read commit
  do if git ls-tree -l -r $commit |
    grep -q -w $SIZE
  then echo $commit
  break
  fi
  done
d91807d59a6326e48077311e96e4d5730f24304c

The git ls-tree command generates a list of all blobs (files) for a given commit. The -l option tells it to also print the file size, and the -r option asks it to recurse. So we use git rev-list to generate a list of all the commits (by default, these are output from newest to oldest). Then we pass each commit to the ls-tree command, and use grep to see if that number appears anywhere in the output. If it does, grep returns true, making the if statement fire the echo, which shows us the commit. The break ensures we stop after the first match. We now have the (probable) commit that the user checked the file out of. As we are not matching by filename, it's probably a good idea to double-check by running git ls-tree -l -r on the given commit.
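One way to avoid that double-check is to look only at the named file. The sketch below (hypothetical name `find_commit_by_size`) passes the path to `git ls-tree -l` and compares the size column exactly instead of grepping the whole listing. PATH is relative to the top of the repository.

```shell
# find_commit_by_size: hypothetical helper. Walking all refs from newest to
# oldest, prints the first commit in which PATH is exactly SIZE bytes.
find_commit_by_size() {
    path=$1 size=$2
    git rev-list --all | while read -r commit; do
        # ls-tree -l prints "<mode> <type> <object> <size>\t<path>";
        # awk picks out the fourth field, the size.
        actual=$(git ls-tree -l "$commit" "$path" | awk '{print $4}')
        if [ "$actual" = "$size" ]; then
            echo "$commit"
            break
        fi
    done
}
```

Usage: `find_commit_by_size Bucardo.pm 167092`.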

Finding a git commit when given a copy of the file itself

This is very similar to the size method above, except that we are given the file itself, not the size, so we need to generate some metadata about it. You could run a checksum or a filesize and use one of the recipes above, or you could do it the git way and find the SHA1 checksum that git uses for this file (aka a blob) by using git hash-object. Once you find that, you can use git ls-tree as before, as the blob hash is listed next to the filename. Thus:

$ HASH=`git hash-object ./bucardo.clue`
$ echo $HASH
639b247aab027b79bda788182c8b6785ed319662
$ git rev-list --all |
  while read commit
  do if git ls-tree -r $commit |
    grep -F -q $HASH
  then echo $commit
  break
  fi
  done
cd1d776307204cb77a731aa1b15c3c43a275c70e
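Wrapped up as a function, the same idea looks like the sketch below. The name `find_commit_with_file` is hypothetical; the only change from the commands above is that awk compares the object-ID column of `git ls-tree -r` exactly instead of grepping the whole line.

```shell
# find_commit_with_file: hypothetical helper. Walking all refs from newest
# to oldest, prints the first commit containing a blob identical to FILE
# (at any path).
find_commit_with_file() {
    hash=$(git hash-object "$1") || return 1
    git rev-list --all | while read -r commit; do
        # ls-tree -r prints "<mode> <type> <object>\t<path>"; compare the
        # object ID (third field) exactly.
        if git ls-tree -r "$commit" | awk -v h="$hash" '$3 == h { found = 1 } END { exit !found }'; then
            echo "$commit"
            break
        fi
    done
}
```

Usage: `find_commit_with_file ./bucardo.clue`.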

Finding a git commit by error message

Sometimes the only clue you are given is an error message, or some other snippet that you can trace back to one or more commits. For example, someone once mailed the list to ask about this error that they received:

DBI connect('dbname=bucardo;host=localhost;port=5432',
  'bucardo',...) failed: fe_sendauth: no password supplied at 
  /usr/local/bin/bucardo line 8627.

A quick glance at line 8627 of the file "bucardo" in HEAD showed only a closing brace, so it must be an earlier version of the file. What was needed was to walk backwards in time and check that line for every commit until we find one that could have triggered the error. Here is one way to do that:

$ git log --format=%h \
  | xargs -I{} sh -c \
  "echo -n {}; git show {}:bucardo | head -8627 | tail -1" \
  | less
## About 35 lines down:
379c9006     $dbh = DBI->connect($BDSN, 'bucardo'...

Therefore, we can do a "git checkout 379c9006" and see if we can solve the user's problem.
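A variation on that command, sketched as a function with a hypothetical name: it lists only the commits that changed the file (`git log -- PATH`), which removes the long runs of identical lines, and uses `sed -n` to print the single line. Each output line is the abbreviated commit ID, a tab, then that line of the file.

```shell
# line_at_each_commit: hypothetical helper. For each commit that changed
# PATH (newest first), prints the abbreviated commit ID, a tab, and line
# LINENO of PATH as it was in that commit.
line_at_each_commit() {
    path=$1 lineno=$2
    git log --format=%h -- "$path" | while read -r commit; do
        # 2>/dev/null hides the error from a commit that deleted PATH.
        line=$(git show "$commit:$path" 2>/dev/null | sed -n "${lineno}p")
        printf '%s\t%s\n' "$commit" "$line"
    done
}
```

Usage: `line_at_each_commit bucardo 8627 | less`.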

These are some of the techniques I use to hunt down specific commits in a git repository. Are there other clues you have run up against? Better recipes for hunting down commits? Let me know in the comments below.



published by noreply@blogger.com (Greg Sabino Mullane) on 2014-07-09 20:22:00 in the "git" category

I work with a lot of open source projects, and I use the command-line for almost everything. It often happens that I need to examine a file from a project, and thanks to bash, Github, and curl, I can do so easily, without even needing to have the repo handy. One of the things I do sometimes is compare a file across versions to see what has changed. For example, I needed to see what changes were made between versions 1.21 and 1.22 to the file includes/UserMailer.php, which is part of the MediaWiki project. For this trick to work, the project must be on Github, and must label their versions in a consistent manner, either via git branches or git tags.

MediaWiki exists on Github as wikimedia/mediawiki-core. The MediaWiki project tags all of their releases in the format X.Y.Z, so in this example we can use the git tags 1.21.0 and 1.22.0. Github is very nice because you can view a specific file at a certain commit (or tag), and even grab it over the web as a plain text file. The format is:

https://raw.githubusercontent.com/PROJECTNAME/BRANCH-OR-TAG/FILE

Note that you can use a tag OR a branch! So to compare these two files, we can use one of these pairs:

https://raw.githubusercontent.com/wikimedia/mediawiki-core/REL1_21/includes/UserMailer.php
https://raw.githubusercontent.com/wikimedia/mediawiki-core/REL1_22/includes/UserMailer.php

https://raw.githubusercontent.com/wikimedia/mediawiki-core/1.21.0/includes/UserMailer.php
https://raw.githubusercontent.com/wikimedia/mediawiki-core/1.22.0/includes/UserMailer.php

All that is left is to treat git as a web service and compare the two files at the command line ourselves. The program curl is a great tool for downloading the files, as it dumps to stdout by default. We will add a -s flag (for "silent") to prevent it from showing the progress meter as it usually does. The last bit of the puzzle is to use <(), bash's process substitution feature, to trick diff into comparing the curl outputs as if they were files. So our final command is:

diff <(curl -s https://raw.githubusercontent.com/wikimedia/mediawiki-core/1.21.0/includes/UserMailer.php) \
<(curl -s https://raw.githubusercontent.com/wikimedia/mediawiki-core/1.22.0/includes/UserMailer.php) \
| more

Voila! A quick and simple glance at what changed between those two tags. This should work for any project on Github. You can also replace the branch or tag with the word "master" to see the current version. For example, the PostgreSQL project lives on github as postgres/postgres. They use the format RELX_Y_Z in their tags. To see what has changed since release 9.3.4 in the psql help file (as a context diff), run:

diff -c <(curl -s https://raw.githubusercontent.com/postgres/postgres/REL9_3_4/src/bin/psql/help.c) \
<(curl -s https://raw.githubusercontent.com/postgres/postgres/master/src/bin/psql/help.c)
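If you do this regularly, a pair of helper functions saves some typing. This is a sketch with hypothetical names; it swaps process substitution for temporary files so that it also runs under a plain POSIX sh, and passes `-f` to curl so that an HTTP error (such as a mistyped tag) makes the command fail instead of diffing an error page.

```shell
# raw_url: hypothetical helper. Builds the raw.githubusercontent.com URL
# for OWNER/REPO at BRANCH-OR-TAG and PATH.
raw_url() {
    printf 'https://raw.githubusercontent.com/%s/%s/%s\n' "$1" "$2" "$3"
}

# ghdiff: hypothetical helper. Unified diff of PATH between two refs of a
# GitHub project. Usage: ghdiff OWNER/REPO OLD_REF NEW_REF PATH
# Exit status: 0 = same, 1 = different, 2 = download or diff trouble.
ghdiff() {
    old_file=$(mktemp) new_file=$(mktemp)
    if curl -fsS "$(raw_url "$1" "$2" "$4")" > "$old_file" &&
       curl -fsS "$(raw_url "$1" "$3" "$4")" > "$new_file"; then
        diff -u "$old_file" "$new_file"
        status=$?
    else
        status=2
    fi
    rm -f "$old_file" "$new_file"
    return "$status"
}
```

Usage: `ghdiff postgres/postgres REL9_3_4 master src/bin/psql/help.c | more`.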

You are not limited to diff, of course. For a final example, let's see how many times Tom Lane is mentioned in the version 9 release notes:

for i in {0,1,2,3,4}
do grep -Fc 'Tom Lane' \
<(curl -s https://raw.githubusercontent.com/postgres/postgres/master/doc/src/sgml/release-9.$i.sgml)
done
272
206
174
115
16

The last number is so low relative to the rest because 9.4 is still under development. Rest assured Tom's contributions have not slowed down! :) Thanks to Github for providing such a useful service for so many open source projects, and for providing the raw text to allow useful hacks like this.



published by noreply@blogger.com (Greg Sabino Mullane) on 2014-05-19 17:18:00 in the "git" category

There are times when you need to view a git repository as it was at a certain point in time. For example, someone sends your project an error report and says they were using the git head version from around January 17, 2014. The short (and wrong!) way to do it is to pass the date to the checkout command like so:

$ git checkout 'HEAD@{Jan 17 2014}'

While I used to rely on this, I no longer do so, as I consider it somewhat of a footgun. To understand why, you first have to know that the ability to checkout using the format above only works for a short window of time, as defined by the git parameter gc.reflogExpire. This defaults to a measly 90 days. You can view yours with git config gc.reflogExpire (no output means the default is in effect). The problem is that when you go past the 90-day limit, git outputs a warning, but then spews a mountain of output as it performs the checkout anyway! It uses the oldest entry it has in the reflog (e.g. 90 days ago). This commit has no relation at all to the date you requested, so unless you catch the warning, you have checked out a repository that is useless to your efforts.
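Before trusting a `HEAD@{date}` lookup, you can check how far back your local reflog actually goes. A small sketch (the function name is hypothetical): `git reflog show` lists entries newest first, and passing `--date=iso` makes it show each entry's timestamp instead of its index, so the last line is the oldest entry.

```shell
# oldest_reflog_entry: hypothetical helper. Prints the oldest reflog entry
# for REF (default: HEAD), with its timestamp in ISO format.
oldest_reflog_entry() {
    git reflog show --date=iso "${1:-HEAD}" | tail -n 1
}
```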

For example, the Bucardo project can be cloned via:

$ git clone git://bucardo.org/bucardo.git/

Now let's say we want to examine the project as it looked on January 17, 2014. As I am writing this, the date is May 19, 2014, so that date occurred about four months ago: well over 90 days. Watch what happens:

$ git checkout 'HEAD@{Jan 17 2014}'
warning: Log for 'HEAD' only goes back to Sat, 22 Feb 2014 11:47:33 -0500.
Note: checking out 'HEAD@{Jan 17 2014}'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b new_branch_name

HEAD is now at d7f89dd... Bucardo now accepts pg_service for databases

So, we get the warning that HEAD only goes back to Feb 22, but then git goes ahead and checks us out anyway! If you were not paying attention - perhaps because you only glanced over that perfectly ordinary looking last line - you might not realize that the checkout you received is not what you requested.

Since this behavior cannot, to my knowledge, be turned off, I avoid this method and use other ways to checkout the repo as it existed on a certain date. The simplest is to find the closest commit by viewing the output of git log. In smaller projects, you can simply do this in a text editor and search for the date you want, then find a good commit sha-1 hash to checkout (e.g. git log > log.txt; emacs log.txt). Another somewhat canonical way is to use git-rev-list:

$ git checkout `git rev-list -1 --before="Jan 17 2014" master`

This command works fine, although it is a little clunky and hard to remember. It requests a list of all commits on the master branch that happened before the given date, ordered by date, and stops once a single row has been output. Since I deal with SQL all day, I think of this as:

SELECT repository WHERE commit_id = 
  (SELECT commit
   FROM rev-list
   WHERE commit_date <= 'Jan 17, 2014'
   AND branch = 'master'
   ORDER BY commit_date DESC
   LIMIT 1
  );

This is one of the cases where the date IS inclusive. With git, you should always test when using date ranges if the given date is inclusive or exclusive, as reading the fine manual does not always reveal this information. Here is one way to prove the date is inclusive for the rev-list command:

$ git rev-list -1 --before="Jan 17 2014" master --format=medium
commit d4b565bf46b6f478b969a378578b0cff3b24e82d
Author: Greg Sabino Mullane 
Date:   Fri Jan 17 10:49:09 2014 -0500

    Make our statement_chunk_size default match up.

As a final nail in the coffin for doing a checkout via the reflog date, the reflog actually is local to you and will pull the date of the repo as it existed for you at that point in time. This may or may not line up with the commits, depending on how often you are syncing with other people via git pull or other methods! So play it safe and request a specific commit by sha-1 hash, or use the rev-list trick.



published by noreply@blogger.com (Greg Sabino Mullane) on 2014-05-10 01:13:00 in the "git" category

Image by Flickr user Susan Drury

Upgrading MediaWiki can be a challenging task, especially if you use a lot of extensions. While the core upgrade process usually goes smoothly, it's rare you can upgrade a major version or two without having to muddle with your collection of extensions. Extensions are bits of code that extend what MediaWiki can do. Only a few are packaged with and maintained alongside MediaWiki itself - the great majority are written by third-party developers. When the MediaWiki API changes, it is up to those developers to update their extension so it works with the new version of MediaWiki. This does not always happen. Take for example one of the more common errors seen on a MediaWiki upgrade since 1.21 was released:


[Tue May 06 11:21:52 2014] [error] [client 12.34.56.78] PHP Fatal error: Call to undefined function wfLoadExtensionMessages() in /home/beckett/mediawiki/extensions/PdfExport/PdfExport.php on line 83, referer: http://test.ziggy.com/wiki/Main_Page

This is because the wfLoadExtensionMessages function, which many extensions use, has been deprecated since MediaWiki version 1.16 and was finally removed in 1.21, resulting in the error seen above. Luckily, this function has been a no-op since 1.16, so it is safe to comment it out and/or make a dummy function in your LocalSettings.php file (see below).

Sadly, the release notes for 1.21 make no mention of this fairly major change. Let's walk through as if we didn't know anything about it and see how we could solve the given error with the help of git. For this example, we'll use the Pdf Export extension, which allows you to export your wiki pages into PDF form. A pretty handy extension, and one which completely fails to work in MediaWiki version 1.21 or better.

First, let's verify that wfLoadExtensionMessages does not exist at all in version 1.21 of MediaWiki. For these examples, I've checked out the MediaWiki code via git, and am relying on the fact that lightweight git tags were made for all the versions we are interested in.

$ git clone https://github.com/wikimedia/mediawiki-core.git mediawiki
$ cd mediawiki
$ git grep wfLoadExtensionMessages 1.21.0
1.21.0:HISTORY:* (bug 12880) wfLoadExtensionMessages does not use $fallback from MessagesXx.php

A nice feature of git-grep is the ability to simply use a tag after the search string. In this case, we see that the only mention of wfLoadExtensionMessages in the entire codebase is an old mention of it in the history file. Let's see what version that bug is from:

$ git grep -n wfLoadExtensionMessages 1.21.0
1.21.0:HISTORY:5280:* (bug 12880) wfLoadExtensionMessages does not use $fallback from MessagesXx.php
$ git show 1.21.0:HISTORY | head -5280 | tac | grep '===' -m1
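The head/tac/grep trick can be wrapped into a reusable function. The sketch below uses awk in place of tac (which is not available everywhere) to remember the last `===` heading seen up to the given line; the name `section_for_line` is hypothetical.

```shell
# section_for_line: hypothetical helper. Prints the last line containing
# "===" at or above line LINENO of PATH as it exists in REV.
section_for_line() {
    git show "$1:$2" |
        awk -v n="$3" 'NR > n { exit } /===/ { heading = $0 } END { if (heading != "") print heading }'
}
```

Usage: `section_for_line 1.21.0 HISTORY 5280`.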
=== Bug fixes in 1.12 ===

That message is from way back in version 1.12, and doesn't concern us. Let's take a look at what tags exist in the 1.20 branch so we can scan the latest one:

$ git tag | grep '^1.20'
1.20.0
1.20.0rc1
1.20.0rc2
1.20.1
1.20.2
1.20.3
1.20.4
1.20.5
1.20.6
1.20.7
1.20.8

Now we can peek inside version 1.20.8 and see what that function did before it was removed. By using the -A and -B (after and before) arguments to grep, we can see the entire function in context:

$ git grep wfLoadExtensionMessages 1.20.0
1.20.0:HISTORY:* (bug 12880) wfLoadExtensionMessages does not use $fallback from MessagesXx.php
1.20.0:includes/GlobalFunctions.php:function wfLoadExtensionMessages() {
$ git show 1.20.8:includes/GlobalFunctions.php | 
  grep -B6 -A2 LoadExtensionMessages
/**
 * Load an extension messages file
 *
 * @deprecated since 1.16, warnings in 1.18, remove in 1.20
 * @codeCoverageIgnore
 */
function wfLoadExtensionMessages() {
    wfDeprecated( __FUNCTION__, '1.16' );
}

Thus wfLoadExtensionMessages was basically a no-op in MediaWiki version 1.20, with the caveat that it will write a deprecation warning to your error log (or, in modern versions, the debug log unless $wgDevelopmentWarnings is set). Next we want to find the last time this function did something useful - which should be version 1.15 according to the comment above. Thus:

$ git show 1.15.0:includes/GlobalFunctions.php | 
  grep -A4 LoadExtensionMessages
function wfLoadExtensionMessages( $extensionName, $langcode = false ) {
    global $wgExtensionMessagesFiles, $wgMessageCache, $wgLang, $wgContLang;

    #For recording whether extension message files have been loaded in a given language.
    static $loaded = array();

So, it's a pretty safe bet that unless you are upgrading from 1.15.0 or earlier, it should be completely safe to remove it. When was 1.16.0 released? There are no dates in the HISTORY file (shame), but the date it was tagged should be a good guess:

$ git show 1.16.0 | grep -m1 Date
Date:   Wed Jul 28 07:11:03 2010 +0000

So what should you do with extensions that are still using this deprecated function? There are two quick solutions: comment it out inside the extension, or add a dummy function to your version of MediaWiki.

Changing the extension itself is certainly quick and easy. To get the PdfExport extension to work, we only have to comment out two calls to wfLoadExtensionMessages inside of the file PdfExport.php, and one inside of PdfExport_body.php. The diff:

$ git difftool -y -x "diff -u1"
--- /tmp/7YqvXv_PdfExport.php 2014-05-08 12:45:03 -0400
+++ PdfExport.php             2014-05-08 12:34:39 -0400
@@ -82,3 +82,3 @@
   if ($img_page > 0 || $img_page === false) {
-        wfLoadExtensionMessages('PdfPrint');
+        //wfLoadExtensionMessages('PdfPrint');
                $nav_urls['pdfprint'] = array(
@@ -92,3 +92,3 @@
 function wfSpecialPdfToolbox (&$monobook) {
-          wfLoadExtensionMessages('PdfPrint');
+          //wfLoadExtensionMessages('PdfPrint');
           if (isset($monobook->data['nav_urls']['pdfprint']))
--- /tmp/7gO8Hz_PdfExport_body.php   2014-05-08 12:45:03 -0400
+++ PdfExport_body.php               2014-05-08 12:34:44 -0400
@@ -44,3 +44,3 @@
            // For backwards compatibility
-             wfLoadExtensionMessages('PdfPrint');
+             //wfLoadExtensionMessages('PdfPrint');
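Rather than editing each file by hand, the same change can be scripted with sed. A sketch with a hypothetical function name: it prefixes `//` to any line that begins (after indentation) with a wfLoadExtensionMessages call and keeps a `.bak` copy of each file. Calls that appear later on a line are not touched, so review the result with git diff before committing.

```shell
# comment_out_calls: hypothetical helper. In each FILE, prefixes "//" to
# lines whose first non-blank text is a wfLoadExtensionMessages call, and
# keeps the original as FILE.bak.
comment_out_calls() {
    sed -i.bak 's|^\([[:space:]]*\)wfLoadExtensionMessages|\1//wfLoadExtensionMessages|' "$@"
}
```

Usage: `comment_out_calls PdfExport.php PdfExport_body.php`.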

A better way is to add a dummy function to LocalSettings.php. This ensures that any extension we add in the future will continue to work unmodified. Just throw this at the bottom of your LocalSettings.php:

function wfLoadExtensionMessages() { }

Probably the best overall solution is to not only add that to your LocalSettings.php, but to try to get the extension changed as well. You can notify the author, or try to fix it yourself and release a new version if the extension has been abandoned. You might also look to see if the extension has been superseded by a different extension, as sometimes happens.

While there may be other compatibility issues when upgrading MediaWiki, for some extensions (such as PdfExport), this is the only change needed to make it work again on newer versions of MediaWiki!



published by noreply@blogger.com (Spencer Christensen) on 2014-05-02 21:40:00 in the "git" category

There was a significant blog post some years ago. It introduced a "successful" workflow for using Git. This workflow was named Gitflow. One of the reasons this blog post was significant is that it was the first structured workflow that many developers had been exposed to for using Git. Before Gitflow was introduced, most developers didn't work with Git like that. And if you remember when it was introduced back in 2010, it created quite a buzz. People praised it as "the" way to work with Git. Some adopted it so quickly and wholeheartedly that they dismissed any other way to use Git as immature or childish. It became, in a way, a movement.

I start with this little bit of history to talk about the void that was filled by Gitflow. There was clearly something that drew people to it that wasn't there before. It questioned the way they were working with Git and offered something different that worked "successfully" for someone else. I suppose many developers didn't have much confidence or strong feelings about their use of Git before they heard of Gitflow. And so they followed someone who clearly did have confidence and strong feelings about a particular workflow. Some of you may be questioning your current Git workflow now and can relate to what I'm describing. However, I'm not going to prescribe a particular workflow for you as "the" way to do it.

Instead, let's talk about the purpose of a workflow. Let's reword that so we're clear: the purpose of a software development workflow using Git. What is the purpose? Let's back up and ask: what is the purpose of software? The purpose of software is to help people. Period. Yes, it can help servers, and networks, and robots, and telephones, etc. But help them do what? Help people. They are tools to help us (people) do things better, faster, simpler, etc. I submit to you that the purpose of a software development workflow using Git should be the same. It should help people release software. Specifically, it should help match the software development process with business expectations for the people responsible for the software. That list of people responsible for the software should include more than just the developers. It also includes operations engineers, project managers, and certainly business owners.

Does your Git workflow help your business owners? Does it help your project managers or the Operations team? These are questions you should be thinking about. And by doing so, you should realize that there is no "one size fits all" workflow that will do all that for every case. There are many different workflows based on different needs and uses. Some are for large complex projects and some are extremely simple. What you need to ask is: what will best help my team/project/organization to develop, release, and maintain software effectively? Let's look at a few workflow examples and see.

GitHub Flow

GitHub's own workflow, their internal workflow, is quite different from what everyone else who uses GitHub does. It is based on a set of simple business choices:

  • Anything in the master branch is deployable
  • To work on something new, create a descriptively named branch off of master (e.g. new-oauth2-scopes)
  • Commit to that branch locally and regularly push your work to the same named branch on the server
  • When you need feedback or help, or you think the branch is ready for merging, open a pull request
  • After someone else has reviewed and signed off on the feature, you can merge it into master
  • Once it is merged and pushed to "master" on the origin, you can and should deploy immediately

They release to production many times per day using this model. They branch off master for every change they make; hot fixes and features are treated the same. Then they merge back into master and release. They have even automated their releases using an IRC bot.
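For reference, the git side of that list comes down to a handful of commands, sketched below as hypothetical helper functions. Opening the pull request, reviewing it, and deploying happen outside of git and are not shown; on GitHub the merge itself is usually done from the pull request page rather than locally, so treat the local merge here as an illustration only.

```shell
# Hypothetical helpers for the git side of GitHub Flow.
# gh_flow_start NAME: create a descriptively named branch off master.
gh_flow_start() {
    git checkout -q -b "$1" master
}
# gh_flow_share NAME: push the branch to the same name on origin.
gh_flow_share() {
    git push -q -u origin "$1"
}
# gh_flow_finish NAME: after review and sign-off, merge into master and
# push master to origin, ready to deploy.
gh_flow_finish() {
    git checkout -q master &&
        git merge -q --no-ff -m "Merge branch '$1'" "$1" &&
        git push -q origin master
}
```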

Skullcandy's workflow

When I worked for Skullcandy we used a workflow loosely based on the GitHub Flow model, but altered a bit. We used a Scrum Agile methodology with well defined sprints of work and deliverables at the end of each sprint. The workflow followed these business choices:

  • A userstory or defect in our tracking system represented a single deliverable, and a Git branch was created for each userstory or defect. We used a naming convention for branches (skdy/schristensen/US1234-cool-new-feature, for example). Yes, you can use "/" characters in branch names.
  • Everything branches off master. Features and hot fixes are treated the same.
  • After code review, the branch was merged into a QA branch and deployed to the QA environment where business owners tested and approved the changes.
  • The QA branch is just another branch off master and can be blown away and recreated when needed at any time.
  • We released once a week, and only those changes that have been approved by the business owners in QA got merged into master and released.
  • Since branch names and items in our issue tracking system were tied together, we could easily verify the status of a change (the who, what, when, and why of it) and even automate things, like auto-merging of approved branches and deployment, auto-updating tickets in the tracking system, and notifying developers of any merge issues or when their branch got released.

Master only workflow

Not every team or project is going to work like this, and it may be too complicated for some. It may be appropriate to just work on master without branching and merging. I do this now with some of the clients I work with.

  • Each feature or hot fix is worked on in a dev environment that is similar to production and gives the business owner direct access for testing and approval. Changes are committed locally.
  • Once approved by the business owner, commit and push changes to master on origin, and then deploy to production immediately.

You may not be working for a business, so the term "business owner" may not fit your situation. But there should always be someone who approves the changes as acceptable for release, and that person should be the same one who requested the change in the first place.

Gitflow

On the other end of the spectrum from a master-only workflow is Gitflow. Here there are at least three main branches: develop (or development), release, and master. There are other branches as well, for features and hot fixes. Many of these are long-running. For example, you merge develop into the release branch, but then you continue working on develop and add more commits. The workflow looks like this:

  • All work is done in a branch. Features are branched off develop. Hot fixes are treated differently and are branched off master.
  • Features are merged back into develop after approval.
  • Develop is merged into a release branch.
  • Hot fixes are merged back into master, but also must be merged into develop and the release branch.
  • The release branch is merged into master.
  • Master is deployed to production.
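The hot fix rule is the part of Gitflow that is easiest to get wrong, so here is a small sketch of it in a scratch repository. The branch names match the list above except hotfix-1.0.1, which is a hypothetical example:

```shell
# Scratch repository with the long-running Gitflow branches.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "v1" > app.txt
git add app.txt
git commit -q -m "Release 1.0"
git branch develop
git branch release

# Feature work continues on develop.
git checkout -q develop
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# Hot fixes branch off master, not develop.
git checkout -q -b hotfix-1.0.1 master
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Hot fix"

# The fix must land in master and also in develop and release.
for target in master develop release; do
  git checkout -q "$target"
  git merge -q --no-ff -m "Merge hotfix-1.0.1 into $target" hotfix-1.0.1
done
```

Note that master gets the fix without picking up the unreleased feature from develop.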

Backcountry workflow

When I worked for Backcountry.com, we used a similar workflow, but with different names for the branches. All development happened on master: feature branches were branched off and then merged back into master. Then we branched master to create a new release branch, and then we merged the release branch into a branch called "production". Since master is just a branch and doesn't have to be special, you can use a branch named whatever you want for your production code.

Guidelines

There are many other examples we could go over and discuss, but these should be enough to get you thinking about different possibilities. There are a few guidelines that you should consider for your workflow:

  • Branches should be used to represent a single deliverable request from the business, like a single user story or bug fix: something that can be approved by the business and that contains everything needed for that single request to be released, and nothing more!
  • The longer a feature branch lives without getting merged in for a release, the greater the risk of merge conflicts and deployment challenges. Short-lived branches merge and deploy more cleanly.
  • Business owner involvement in your workflow is essential. Don't merge, don't deploy, don't work without their input. Otherwise pain and tears will ensue (or worse).
  • Avoid reverts. Test, test, test your branch before a merge. When merging, use git merge --no-ff, which always records a merge commit and so makes it easier to revert the whole merge if you really need to.
  • Your workflow should fit how you release. Do you release continually, multiple times a day? Do you have 2 week sprints with completed work to release on a regular schedule? Do you have a business Change Control Board where all released items must get reviewed and approved first? Does someone else run your releases, like the Operations team or a Release manager? Your branching and merging strategy needs to make releasing easier.
  • Complicated workflows drive people crazy. Keep it simple. Review your workflow and ask how you can simplify it. By actively simplifying things, you will also make them easier to understand and work with, as well as easier for others to adopt and maintain.
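To illustrate the --no-ff advice, here is a sketch in a scratch repository (feature-x is a hypothetical branch name). Because --no-ff records a merge commit even when a fast-forward was possible, the whole feature can be backed out with one git revert -m 1, where -m 1 tells git to keep the first parent, master as it was before the merge, as the mainline:

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# A feature made of two commits.
git checkout -q -b feature-x
echo "one" > one.txt
git add one.txt
git commit -q -m "Part one"
echo "two" > two.txt
git add two.txt
git commit -q -m "Part two"

# Merge with --no-ff so the feature is a single, identifiable merge commit.
git checkout -q master
git merge -q --no-ff -m "Merge feature-x" feature-x
merge_commit=$(git rev-parse HEAD)

# If the feature has to come out, revert the merge as one unit.
git revert --no-edit -m 1 "$merge_commit" >/dev/null
```

With a fast-forward merge there would be no merge commit to point at, and you would have to revert each feature commit individually.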

These guidelines should help you adjust your software development workflow so that Git fulfills its purpose of helping people. Helping you.

Further Reading

There is a lot more you can read about on this topic, and here are several good places to start:



published by noreply@blogger.com (Steph Skardal) on 2013-09-16 19:51:00 in the "git" category

Git is a tool that all of us End Pointers use frequently. I was recently reviewing history on a server that I work on frequently, and I took note of the various git commands I use. I put together a list of the top git commands (and/or techniques) that I use with a brief explanation.

git commit -m "****"
This is a no-brainer, as it commits a set of changes to the repository. I always use -m to set the commit message instead of using an editor to do so. Edit: Jon recommends that new users not use -m, and that more advanced users use it sparingly, for good reasons described in the comments!

git checkout -b branchname
This is the first step to setting up a local branch. I use this one often, as I set up local branches to separate changes for the various tasks I work on. This command creates the new branch and moves you to it. Of course, if your branch already exists, git checkout branchname will simply switch you to that existing local branch.

git push origin branchname
After I've done a bit of work on my branch, I push it to the origin to a) back it up in another location (if applicable) and b) provide the ability for others to reference the branch.

git rebase origin/master
This one is very important to me, and our blog has featured a couple of articles about it (#1 and #2). A rebase rewinds your current changes (on your local branch), applies the changes from origin/master (or whatever branch you are rebasing against), and then reapplies your changes one by one. If there are any conflicts along the way, you are asked to resolve the conflicts, skip the commit, or abort the rebase. Using a rebase allows you to avoid those pesky merge commits which are not explicit in what changes they include and helps you keep a cleaner git history.

git push -f origin branchname
I use this one sparingly, and only if I'm the only one working on branchname. It comes up when you've rebased one of your local branches, resulting in an altered history of branchname. When you attempt to push it to origin, git will reject the push because your local branch and origin/branchname have diverged. This command forcefully pushes your branch to origin and overwrites its history there.
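A sketch of the rebase-then-force-push cycle, using a local bare repository as a stand-in for origin. It swaps in git push --force-with-lease, a more cautious variant of -f that refuses to overwrite the remote branch if it has moved since you last fetched it (for example, because someone else pushed to it after all):

```shell
demo=$(mktemp -d)
git init -q --bare "$demo/origin.git"          # stand-in for the real origin
git init -q "$demo/work"
cd "$demo/work" || exit 1
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"
git remote add origin "$demo/origin.git"
git push -q origin master

# Publish a personal branch.
git checkout -q -b branchname
echo "mine" > mine.txt
git add mine.txt
git commit -q -m "My change"
git push -q origin branchname

# Meanwhile master moves on.
git checkout -q master
echo "upstream" > upstream.txt
git add upstream.txt
git commit -q -m "Upstream change"
git push -q origin master

# Rebase the branch onto the new origin/master; this rewrites its commits.
git checkout -q branchname
git fetch -q origin
git rebase -q origin/master
# A plain push would now be rejected; overwrite the remote copy, but only if
# origin/branchname is still where we last saw it.
git push -q --force-with-lease origin branchname
```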

git merge --squash branchname
After you've done a bit of work on branchname and you are ready to merge it into the master branch, you can use the --squash argument to squash/smush/combine all of your commits into one clump of changes. This command does not perform the commit itself, therefore it must be followed by a) review of the changes and b) git commit.
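A small sketch of the squash-merge sequence in a scratch repository: three work-in-progress commits on branchname end up as one ordinary commit on master, with the review step in between:

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# Several small commits on the branch.
git checkout -q -b branchname
for n in 1 2 3; do
  echo "$n" > "file$n.txt"
  git add "file$n.txt"
  git commit -q -m "WIP $n"
done

git checkout -q master
# Stage the combined changes of all three commits; nothing is committed yet.
git merge --squash branchname
# a) review what is staged...
git diff --cached --stat
# b) ...then commit it as a single change.
git commit -q -m "Add files 1-3 (squashed from branchname)"
```

The resulting commit has only one parent, so git does not consider branchname merged; that matters for the -d/-D distinction in the next command.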

git branch -D branchname
If you are done with all of your work on branchname and it has been merged into master, you can delete it with this command! Edit: Phunk tells me that there is a difference between -D and -d, as with the latter option, git will refuse to delete a branch with unmerged changes, so -d is a safer option.
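A quick demonstration of the -d versus -D difference in a scratch repository (unmerged-work is a hypothetical branch name):

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"

git checkout -q -b unmerged-work
echo "wip" > wip.txt
git add wip.txt
git commit -q -m "Unfinished work"
git checkout -q master

# -d refuses: unmerged-work has a commit that is not reachable from HEAD.
if git branch -d unmerged-work; then d_result=deleted; else d_result=refused; fi
# -D (short for --delete --force) deletes it anyway.
git branch -D unmerged-work
```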

git push origin :branchname
Want to delete branchname from the origin? Run this command. You can leave branchname on the origin repository if you want, but I like to keep things clean with this command.
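The colon form works because it pushes "nothing" into the remote branch; git also accepts the more readable git push origin --delete. A sketch of both against a local bare repository standing in for origin (other-branch is a hypothetical name):

```shell
demo=$(mktemp -d)
git init -q --bare "$demo/origin.git"          # stand-in for the real origin
git init -q "$demo/work"
cd "$demo/work" || exit 1
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"
git remote add origin "$demo/origin.git"

# Publish master plus two branches to clean up afterwards.
git branch branchname
git branch other-branch
git push -q origin master branchname other-branch

# Empty source before the colon: delete the branch on the remote.
git push -q origin :branchname
# Same effect, spelled out.
git push -q origin --delete other-branch
```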

git checkout -t origin/someone_elses_branch
Use this command to set up a local branch that tracks another developer's branch. As the acting technical project manager for one of my clients, I use this command to track Kamil's branch, in combination with the next command (cherry-pick), to get his work cleanly merged into master.

git cherry-pick hashhashhash
Git cherry-pick applies changes from a single commit (identified by hash) to your current working branch. As noted above, I typically use this after I've set up a local tracking branch from another developer to cherry-pick his or her commits onto the master branch in preparation for a deploy.
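Combining the last two commands, here is a sketch with a local bare repository as origin and a second clone playing the other developer. The commit messages and the "kamil" clone are invented for the demo; the point is that only the commit that is ready gets picked onto master:

```shell
demo=$(mktemp -d)
git init -q --bare "$demo/origin.git"
git --git-dir="$demo/origin.git" symbolic-ref HEAD refs/heads/master
git init -q "$demo/me"
cd "$demo/me" || exit 1
git symbolic-ref HEAD refs/heads/master
git config user.name "Me"
git config user.email "me@example.com"
echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"
git remote add origin "$demo/origin.git"
git push -q origin master

# Simulate the other developer's clone and branch.
git clone -q "$demo/origin.git" "$demo/kamil"
(
  cd "$demo/kamil" || exit 1
  git config user.name "Kamil"
  git config user.email "kamil@example.com"
  git checkout -q -b someone_elses_branch
  echo "ready" > ready.txt
  git add ready.txt
  git commit -q -m "Ready for release"
  echo "wip" > wip.txt
  git add wip.txt
  git commit -q -m "Not ready yet"
  git push -q origin someone_elses_branch
)

# Back in my clone: track the branch, then pick only the finished commit.
git fetch -q origin
git checkout -q -t origin/someone_elses_branch
ready=$(git log --format=%H --grep="Ready for release" -n 1)
git checkout -q master
git cherry-pick "$ready" >/dev/null
```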

git stash, git stash apply
I only learned about git stash in the last year, but it has become a go-to tool of mine. If I have some working changes that I don't want to commit, but a client asks me to commit another quick change, I will often stash the current changes (save them without committing them), run a rebase to get my branch up to date, push out the commit, and then run git stash apply to restore my uncommitted changes.
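A minimal version of that stash round trip in a scratch repository (the rebase and push are left out, since there is no remote here):

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "stable code" > app.txt
git add app.txt
git commit -q -m "Initial commit"

echo "half-finished idea" >> app.txt     # work in progress, not ready to commit
git stash >/dev/null                     # save it and get a clean working tree back
echo "quick fix" > fix.txt               # the client's quick change
git add fix.txt
git commit -q -m "Quick change for client"
git stash apply >/dev/null               # bring the work in progress back
```

git stash apply leaves the entry in git stash list; git stash pop does the same restore and then drops the entry.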

Admittedly, several of my coworkers are git experts and have many more git tools in their toolboxes; I should ask one of them to follow up on this article with additional advanced git commands I should be using! Also note that for us End Pointers, DevCamps may influence our git toolbox, because it allows us to have multiple instances (and copies of the production database) running at a time, which may require less management of git branches.



published by noreply@blogger.com (Mike Farmer) on 2012-06-21 16:00:00 in the "git" category

Perhaps you’ve made the same mistake I have. You’re right in the middle of developing a feature when a request comes up to fix a different, completely unrelated problem. So you jump right in and fix the issue, and then you realize you forgot to start a new git feature branch. Now you need to merge just the fix you made, without merging the commits from the feature you’re working on.

Git rocks at manipulating branches, and I knew this, but I wasn’t sure how to move just one commit to the master branch. After some digging and a little trial and error, I finally figured it out. This may not be the simplest approach, but it worked for me and I wanted to share it.

The branches I’ll be working with are master and feature. In the current scenario, the feature branch is 4 commits ahead of master, and the commit that I want to bring over is just the most recent one.

First things first, I need to ensure my master branch is up to date.

git checkout master
git pull origin master

Then I’ll checkout my feature branch and make sure it’s completely up to date with the master branch.

git checkout feature
git rebase origin/master

Next, I’ll create a temporary feature branch that I’ll use later on to bring over the commit that I want.

git checkout -b feature_tmp

I’ll do the same for master so that I can perform my merging and rebasing in isolation from the master branch.

git checkout master
git checkout -b master_tmp

Now I’m going to merge the two tmp branches so that I have a history that contains all of my commits. This will give me the history that I want, but will include the 3 commits I don’t want.

git merge feature_tmp

Here’s where the magic happens. I’m going to rebase this branch using interactive mode. I want to rebase everything back to the last commit on the master branch. For simplicity in the commands here, we’ll just use SHA-MASTER in place of the actual SHA1 hash.

git rebase -i SHA-MASTER

This loads the commits into my editor and from here I just delete the 3 commits that I didn’t want on my master branch. This will give me the history I want with the 4th commit coming right after the last commit on the master branch. After deleting the commits, I just save and quit my editor.

Next, I merge my tmp branch into the master branch.

git checkout master
git merge master_tmp
git log

Now in the log, I can see the history is in the correct order, just how I wanted it. To finish things up, I’ll just push my changes and then rebase my feature branch which will reorder my commits to match the master branch and place my feature commits as the last three commits in the log.

git push origin master
git checkout feature
git rebase origin/master
git log

The last thing to do is delete my tmp branches.

git branch -D master_tmp
git branch -D feature_tmp
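For a single commit, git cherry-pick is likely a shorter route to the same end state: copy the newest commit from feature onto master, then rebase feature onto master. Git's rebase skips commits whose changes are already upstream, so feature ends up with just its own three commits on top. A sketch in a scratch repository, with invented commit messages:

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# Three feature commits, then the unrelated fix on top.
git checkout -q -b feature
for n in 1 2 3; do
  echo "$n" > "feature$n.txt"
  git add "feature$n.txt"
  git commit -q -m "Feature work $n"
done
echo "fixed" > bugfix.txt
git add bugfix.txt
git commit -q -m "Unrelated fix"

# Copy only the newest commit on feature onto master.
git checkout -q master
git cherry-pick feature >/dev/null
# Rebase feature onto master; the fix is dropped as already applied.
git checkout -q feature
git rebase -q master
```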


published by noreply@blogger.com (Szymon Guz) on 2012-05-02 20:30:00 in the "git" category

Git is great, but it's not always easy to use. For example, undoing a commit is a very nice feature, and there are git commands for undoing a commit that has not been pushed to the main repository. However, after pushing it, things are not so easy.

While I was working for one of our clients, I made about 20 commits and then pushed them to the main repository. After that I realised that I had been working on the wrong branch. The new branch I should have used hadn't been created yet. I had to revert all my commits, create the new branch, and load all my changes into it.

Creating the branch named NEW_BRANCH is as easy as:

$ git branch NEW_BRANCH

Now the harder part: how to delete the commits pushed to the main repo. After reading through tons of documentation, it turned out that you cannot just delete a pushed commit, at least not without rewriting history that others may already have pulled. However, you can do something else.

As an example of this, I created a simple file, added a couple of lines there, and made four commits. The git log looks like this:

$ git log
commit dc47a884f7b303fc8b207550104f5a1de192c91c
Author: Szymon Guz 
Date:   Mon Apr 30 12:14:21 2012 +0200

    replaced b with d

commit 68f56d3321324bd14cd1e73d003b1e151c4d43b4
Author: Szymon Guz 
Date:   Mon Apr 30 12:14:05 2012 +0200

    added c

commit a77427d8151f143cacb85f00eb6c8170079dc290
Author: Szymon Guz 
Date:   Mon Apr 30 12:13:58 2012 +0200

    added b

commit 73e586bb6d401f4049cf977703f25bf47c93b227
Author: Szymon Guz 
Date:   Mon Apr 30 12:13:49 2012 +0200

    added a

Now let's move the last 3 commits to another branch. I will create one diff for reverting the changes and one for replaying them on the new branch. Let's call these the 'down' and 'up' diff files: 'down' for reverting, and 'up' for recreating the changes.

The up diff can be created with:

$ git diff 73e586bb6d401f4049cf977703f25bf47c93b227 dc47a884f7b303fc8b207550104f5a1de192c91c
diff --git a/test b/test
index 7898192..3171744 100644
--- a/test
+++ b/test
@@ -1 +1,3 @@
 a
+d
+c

The down diff can be created using exactly the same command, but with switched parameters:

$ git diff dc47a884f7b303fc8b207550104f5a1de192c91c 73e586bb6d401f4049cf977703f25bf47c93b227
diff --git a/test b/test
index 3171744..7898192 100644
--- a/test
+++ b/test
@@ -1,3 +1 @@
 a
-d
-c

I saved the diffs into files called 'up.diff' and 'down.diff'.

On the old branch I want to revert the changes. After doing this I will just commit, and the branch will look like it did before all the commits. However, all the commits stay in the branch's history; the result is something like a revert commit.

I reverted the changes on current branch with:

$ patch -p1 < down.diff 
patching file test
$ git commit -a -m "reverted the changes, moved to another branch"

Now let's move the changes into the new branch. I need to create the new branch from the repo after the first commit:

$ git branch NEW_BRANCH 73e586bb6d401f4049cf977703f25bf47c93b227

Switch to the new branch:

$ git checkout NEW_BRANCH

Apply the up.diff patch to the new branch:

$ patch -p1 < up.diff

And commit the changes:

$ git commit -a -m "Applied changes from the other branch"

I know that all of these steps could be replaced with different ones, but this solution worked pretty well for me, without any problems.
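For comparison, git itself can do both halves without hand-made diff files: git revert --no-commit accepts a range of commits and stages their combined inverse (the "down" diff), and git cherry-pick accepts the same range and replays it elsewhere (the "up" diff). A sketch that rebuilds the four-commit test file from the log above in a scratch repository, so the hashes will differ:

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'a\n' > test
git add test
git commit -q -m "added a"
first=$(git rev-parse HEAD)
printf 'a\nb\n' > test
git commit -q -am "added b"
printf 'a\nb\nc\n' > test
git commit -q -am "added c"
printf 'a\nd\nc\n' > test
git commit -q -am "replaced b with d"
last=$(git rev-parse HEAD)

# "down": undo everything after the first commit as one new commit on master.
git revert --no-commit "$first..$last"
git commit -q -m "reverted the changes, moved to another branch"

# "up": start NEW_BRANCH at the first commit and replay the three commits there.
git checkout -q -b NEW_BRANCH "$first"
git cherry-pick "$first..$last" >/dev/null
```

A side benefit is that NEW_BRANCH keeps the three original commit messages instead of one combined commit.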



published by noreply@blogger.com (Josh Williams) on 2011-12-15 17:31:00 in the "git" category

In a number of places we've started tracking configuration files in git. It's great for Postgres configs, Apache or nginx, DNS zone files, Nagios, all kinds of things. A few clients have private offsite repos we push to, like at GitHub, but for the most part they're independent repos. It's still great for keeping track of what was changed when, and by whom.

In one case we have a centralized Nagios instance that does little more than receive passive checks from a number of remote systems. I'd set up the checks on the remote systems but hadn't loaded that configuration yet. However, while getting the central system set up, muscle memory kicked in and I suddenly had a half-red console as it loaded stale data.

We don't need a flood of false alerts over email, but I don't want to completely revert the config and lose all those services...

[root nagios]# git stash; service nagios restart; git stash apply
Saved working directory and index state WIP on master: 0e9113b Made up commit for blog
HEAD is now at 0e9113b Made up commit for blog
Running configuration check...done.
Stopping nagios: .done.
Starting nagios: done.
# On branch master
# (etc)

Green! A small victory, for sure, but it shows one more advantage of modern SCMs.



published by david@endpoint.com (David Christensen) on 2011-06-14 21:07:00 in the "git" category

Just a note to everyone that the source code repository for DBD::Pg, the official DBI driver for PostgreSQL, has moved from its old home in SVN to a git repository. All development has now moved to this repo.

We have imported the SVN revision history, so it's just a matter of pointing your git clients to:

$ git clone git://bucardo.org/dbdpg.git

For those who prefer, there is a github mirror:

$ git clone git://github.com/bucardo/dbdpg.git

Git is available via many package managers or by following the download links at http://git-scm.com/download for your platform.

Enjoy!



published by noreply@blogger.com (Brian J. Miller) on 2011-03-29 13:54:00 in the "git" category

As a software engineer I'm naturally inclined to be at least somewhat introverted :-). Combine that with the fact that End Point is PhysicalWaterCooler-challenged, and you have a recipe for two things to occur naturally: 1) talking to oneself (but then who doesn't do that really? no, really.), and 2) finding friends in unusual places. Feeling a bit socially lacking after a personal residence move, I set out to find new friends, and I found one. His name is "--interactive", or Mr. git add --interactive.

"How did we meet?" you ask. While working on a rather "long-winded" project, I started to notice myself sprinkling TODOs throughout the source code. That's not a bad habit really (presuming they do eventually get fixed), but the end result is a lot of changed files in git that you don't really need to commit, yet don't need to see every time you want to review code either. I'm fairly anal about reviewing code, so I was generally in the habit of running `git status` followed by `git diff` on every file that status mentioned. These are two great friends, but of late they just don't seem to be providing the inspiration they once did.

Enter my new friend `git add --interactive`. Basically, he combines the two steps for me in a nice, neat, controlled way while adding a bit of spice to my life, in particular per-change inclusion. When you run `git add` with the interactive flag, you are shown an overall index status, immediately followed by a prompt. At that prompt you have the option "5) patch"; enter "5", then return, and you are shown (basically) the index again. From there you can select the files whose patches you would like to review. For each reviewed patch you can then specify whether to include it in the commit, skip it, divide (split) it into smaller patches for further review, or even edit it. When selecting files, it is simple to choose ranges by entering a specifically formatted string, e.g. "1-12,15-18,19". With --interactive, the time it takes to review the code pending commit and skip past the TODOs is greatly reduced, something the client definitely appreciates.

"But what about your other old friends?" you then ask. Well, as it turns out, spending so much time with interactive add made `git stash` feel a bit lonely, and it dawned on me that keeping those TODOs in the working tree at all may be a bit silly. What's a guy to do? Perhaps these two friends might actually like to party together. As it turns out, they had already been introduced and do like to party together (not sure why they couldn't have invited me before, though it might have something to do with my past friendship with SVN and RCS). Either way, to get those unsightly TODOs out from under my immediate purview once and for all, while keeping the other changes I still needed in the index, I found `git stash save --patch --no-keep-index "TODO Tracking"`. "save" instructs git stash to save a new stash, "--patch" switches it into an interactive mode similar to the one described above for add, "--no-keep-index" instructs stash not to keep the changes in the working tree that are added to the created stash, and "TODO Tracking" is just a message to make it easy for a human to understand what the stash contains (I made this one up for my specific immediate purpose). This leaves my working tree and index clean for more pressing work, and I know that when I have the time or need to restore those TODOs, I can, so that they may be worked on as well. Note that I've not really used this technique much (read: I've just done it now for the first time), so we'll see if it really is that useful, but I have used interactive patching and it is definitely worth it.

As a further sidebar, I was discussing multiple commit indexes in a Git repo with someone in the #yui channel, and as soon as I found the above it occurred to me that using multiple stashes, popping them as needed, could in effect work the same way, though I don't know if there is a way to add patches to an already created stash. That might make a neat feature to investigate and/or request from the Git core.
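In newer versions of git, `git stash save` is deprecated in favour of `git stash push`, which takes -m for the message and -p/--patch for the same interactive mode. push also accepts pathspecs, so if the TODO edits happen to be confined to particular files, they can be stashed without the interactive step. A non-interactive sketch in a scratch repository; todo_notes.txt is a hypothetical file name:

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "code" > app.txt
echo "notes" > todo_notes.txt
git add app.txt todo_notes.txt
git commit -q -m "Initial commit"

echo "real change" >> app.txt                 # work that should stay in the tree
echo "TODO: refactor later" >> todo_notes.txt  # noise to park for later
# Stash only the TODO file, with a human-readable message.
git stash push -m "TODO Tracking" -- todo_notes.txt >/dev/null
git stash list
```

When it's time to work on the TODOs, git stash pop brings them back.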

Just so you aren't too concerned, there is still a place in my heart for `git add` and `git status` even if I don't see them as frequently as I once did.



published by noreply@blogger.com (Jeff Boes) on 2010-10-19 22:00:00 in the "git version-control workflow" category

Around here I have a reputation for finding the tiniest pothole on the path to git happiness, and falling headlong into it while strapped to a bomb ...

But at least I'm dedicated to learning something each time. This time it involved branches, and how git knows whether you have merged that branch into your current HEAD.

My initial workflow looked like this:

 $ git checkout -b MY_BRANCH
   (some editing)
 $ git commit
 $ git push origin MY_BRANCH
   (later)
 $ git checkout origin/master
 $ git merge --no-commit origin/MY_BRANCH
   (some testing and inspection)
 $ git commit
 $ git rebase -i origin/master

This last step was the trip-and-fall, although it didn't hurt me so much as launch me off my path into the weeds for a while. Once I did the "git rebase", git no longer knows that MY_BRANCH has been successfully merged into HEAD. So later, when I did this:

 $ git branch -d MY_BRANCH
 error: the branch 'MY_BRANCH' is not fully merged.

As I now understand it, the commits on MY_BRANCH are no longer part of HEAD's history (the rebase rewrote them as new commits), so git can't tell that the branch was merged and refuses to delete it unless you supply -D. A relatively harmless situation, but it set off all sorts of alarms for me, as I thought I had messed up the merge somehow.
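One way to reassure yourself before reaching for -D: git branch --merged and git branch -d judge by ancestry, but git cherry compares the actual patches and prints "-" for each commit that already has an equivalent upstream. A sketch in a scratch repository, where a cherry-pick stands in for the rebase that rewrote the commit:

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"

git checkout -q -b MY_BRANCH
echo "change" > change.txt
git add change.txt
git commit -q -m "My change"

# master moves on, then receives the same change as a new, rewritten commit.
git checkout -q master
echo "other" > other.txt
git add other.txt
git commit -q -m "Someone else's change"
git cherry-pick MY_BRANCH >/dev/null

# Not listed: MY_BRANCH's original commit is not an ancestor of master.
git branch --merged master
# "-" means an equivalent patch is already on master, "+" means it is missing.
git cherry master MY_BRANCH
```

If every line from git cherry starts with "-", deleting the branch with -D loses nothing.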



published by noreply@blogger.com (Ron Phipps) on 2010-09-03 19:19:00 in the "git" category

At End Point I'm often involved in Interchange site migrations. These migrations can be due to a new client coming to us and needing hosting, or to moving a site from one server to another within our own infrastructure.

There are many different ways to do a migration, but in the end we need to hit certain points to make sure that the migration goes smoothly. Below you will find steps that you can adapt for your specific migration.

The start of the migration can be a good time to introduce git for source control. You can do this by creating the repository and cloning it to /home/account/live, and setting up .gitignore files for logs, counter files, and gdbm files. Then commit the changes back to the repo, and you've introduced source control without much effort, making it easier to change the site in the future. This also helps document the changes you make to the code base during the migration, in case you need to merge changes from the current production site before completing the migration.
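A simplified sketch of that first import: it runs git init directly in a stand-in for /home/account/live instead of cloning from a separate repository, and the ignore patterns (logs/, *.gdbm, *.counter, *.autonumber) are illustrative guesses at an Interchange catalog layout, so adjust them to what your catalog actually contains:

```shell
demo=$(mktemp -d)
live="$demo/live"                        # stand-in for /home/account/live
mkdir -p "$live/logs" "$live/products"
cd "$live" || exit 1
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# A few sample files: one source file and some runtime data.
touch products/variable.txt products/products.gdbm logs/error.log order.counter

# Runtime data that should not be versioned (illustrative patterns).
cat > .gitignore <<'EOF'
logs/
*.gdbm
*.counter
*.autonumber
EOF

git add .
git commit -q -m "Initial import of the catalog"
git ls-files
```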

  • Export all of the gdbm databases to their text file equivalents on the production server
  • Take a backup from production of the database, catalog, interchange server, htdocs
  • Set up an account
  • Create the database and user
  • Restore the database, catalog, interchange server and htdocs
  • Update the paths in interchange/bin for each script to point at the new location
  • Grep the restored code for hard-coded paths and update those paths to the new locations. Better yet, move these paths out to a catalog_local.cfg where environment-specific information can go.
  • Grep the restored code for hard-coded URLs and use the [area] tag to generate the URLs
  • Update the URLs in products/variable.txt to point at the test domain
  • Update the SQL settings in products/variable.txt to point at the new database using the new user
  • Remove the gdbm databases so they will be recreated on startup from the source text files
  • Install a local Perl if it's not already installed (run ./Configure -des with -Dprefix pointing at a local directory, then make, make test, and make install)
  • Install Bundle::InterchangeKitchenSink
  • Install the DBD module for MySQL or PostgreSQL
  • Review the code base looking for use statements in custom code and Require module settings in interchange.cfg. Install the Perl modules found into the local Perl.
  • Set up a non-SSL and an SSL virtual host using a temporary domain. Configure the temporary domain to use the SSL certificate from the production domain.
  • Firewall or password protect the virtual host so it is not accessible to the public
  • Generate a vlink using interchange/bin/compile, copy it into the cgi-bin directory, and name it properly
  • Start up the new Interchange
  • Review error messages and resolve them until Interchange starts properly
  • Test the site thoroughly, resolving issues as they appear. Make sure that checkout, charging credit cards, sending of emails, using the admin, etc. all function.
  • Migrate any cron jobs running on the current production site, such as session expiration scripts
  • Set up log rotation for the new logs that will be created
  • Verify that you have access to make DNS changes
  • Set the TTL for the domain to a low value such as 5 minutes
  • Modify the new production site to respond to the production URL; test by updating your hosts file to manually set the IP address of the domain
  • Shut down the new Interchange
  • Restore a copy of the original backup for Interchange, the catalog and htdocs to /tmp on the production server
  • Shut down the production Interchange and put up a maintenance notice on the production site.
  • Take a backup of the production database and restore on the new server
  • Diff the Interchange, catalog and htdocs directories between /tmp and the current production locations, making note of the files that have changed since we took the original copy.
  • Copy the files that have changed, making sure to merge them with any changes we have made on the new production site, and be sure to copy over all .counter and .autonumber files to the new production site.
  • Start Interchange on the new production server
  • Test the site thoroughly on the new production server, using the production URL. Make sure that checkout with charging the credit card functions properly.
  • Resolve any remaining issues found during the testing
  • Set up the Interchange daemon to start at boot for this site in /etc/rc.d/rc.local or in cron using @reboot
  • Update DNS to point at the new production IP address
  • Update the TTL of the domain to a longer value
  • Open the site to the public by opening the firewall or removing the password protection
  • Keep an eye on the error logs for any issues that might crop up
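For the step that diffs the /tmp copy against the live directories, a recursive, brief diff is one way to get the list of files that changed since the original backup. A sketch with stand-in directories (the real paths depend on your layout):

```shell
demo=$(mktemp -d)
snapshot="$demo/tmp/catalog"          # stands in for the copy restored to /tmp
live="$demo/home/account/catalog"     # stands in for the running production catalog
mkdir -p "$snapshot/products" "$live/products"

# State at the time of the original backup.
echo "original" > "$snapshot/products/variable.txt"
# Simulated production activity after the backup was taken.
echo "edited in production" > "$live/products/variable.txt"
echo "1042" > "$live/order.counter"

# -r recurses; -q only reports which files differ or exist on one side.
changed=$(diff -rq "$snapshot" "$live")
printf '%s\n' "$changed"
```

Lines starting with "Only in" followed by the production path are new files, such as counters that were created after the backup.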

This will hopefully give you a solid guide for performing an Interchange site migration from one server to another and some of the things to watch out for that might cause issues during the migrations.



published by noreply@blogger.com (Ethan Rowe) on 2010-08-20 13:38:00 in the "git" category

I recently had to spend a few hours merging Git branches to get a development branch in line with the master branch. While it would have been a lot better to do this more frequently along the way (which I'll do going forward), I suspect that plenty of people find themselves in this position occasionally.

The work done in the development branch represents significant new design/functionality that refactors a variety of older components. My preference was to use a rebase rather than a merge, to keep the commit history clean and linear and, more critically, because the work we're doing really can be thought of as being "applied to" the master branch.

No doubt there are a variety of strategies to apply here. This worked for me and perhaps it'll help someone else.

Some Key Concerns for a Big Rebase

Beyond the obvious concern of having sufficient knowledge of the application itself, so that you can make intelligent choices with respect to the code, there are a number of key operational concerns specific to rebase itself. This list is not exhaustive, but it is not an unreasonable set of key considerations to keep in mind.

  1. Rebase is destructive

    Remember what you're doing! While a merge literally combines two or more revision histories, a rebase takes a chunk of revision history and applies it on top of another related history. It's like a cherry-pick on steroids (really nice, friendly steroids that provoke neither rage nor senate hearings): each commit gets logically applied on top of the specified head, and as such gets rewritten. The commits are not the same afterwards. The history of your working tree's branch is rewritten.

    So, before you rebase, protect yourself: Make sure you have more than one reference (either a branch or a tag) pointing to your current work.

  2. Conflict resolution can bring about bugs

    When resolving merge conflicts along the way, you'll need to manually inspect things to try to figure out the right path forward. If it's been a while since you merged/rebased, you may find that merge conflict resolution is not so simple: rather than picking one version or the other, you're literally merging them in some logical manner. You may end up writing new code, in other words.

    Because you are involved and you are a mammal, there is a decent possibility that you will screw this up.

    So, again, protect yourself: Look at what's coming before you rebase and take note of likely conflict resolution points.

  3. Things go wrong and an abort can be necessary

    Sometimes it becomes quite clear that a mistake has been made along the way, and you need to bail out and regroup. If you're doing a gigantic rebase in one big shot, this can happen 15, 45, 90, or 120+ minutes into the task. Do you really want to go all the way back to the beginning of your rebase excursion and start fresh?

    Don't let this happen. When approaching the rebase, show humility, expect things to go wrong, and embrace a strategy that lets you recover from mistakes:

    Break the rebase into smaller chunks and proceed through them incrementally

  4. You may not immediately know that something went wrong

    Unless the code base is pretty trivial or you are 100% committed to that code base all the time, it is unlikely that you'll be completely on top of everything that's happened in both revision histories. You can test the stuff you know, you can run test suites, etc., but it's critical to work defensively.

    Prepare for the possibility of delayed mistake revelation: Keep track of what you do as you go
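A minimal way to act on concerns 1 and 3 with the shiny/master names used below, in a scratch repository: put a second reference on the old tip before rebasing (shiny_before_rebase is a hypothetical tag name), and remember that git rebase --abort backs out of a rebase that is still in progress:

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"

git checkout -q -b shiny
echo "awesome" > shiny.txt
git add shiny.txt
git commit -q -m "Shiny work"
git checkout -q master
echo "new" > master.txt
git add master.txt
git commit -q -m "Master moves on"

git checkout -q shiny
# Concern 1: a second reference to the pre-rebase history.
git tag shiny_before_rebase
git rebase -q master
# Concern 3: during a rebase, `git rebase --abort` restores the starting state.
# After it finishes, the tag still points at the old commits, so
# `git reset --hard shiny_before_rebase` would put shiny back where it was.
git log --oneline shiny_before_rebase
```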

Addressing the Concerns

The technique I've come to use to address the stated concerns is fairly simple to learn, understand, and apply in practice. It's iterative in nature and is therefore Agile and therefore grants me a sense of personal validation, which is very, very important.

For a real-world use case, you'll probably want to use more helpful, specific branch and tag names than this. The names in this discussion are deliberately simple for illustrative purposes.

Say you have a master branch which represents the canonical state of the code base. You've been working on the shiny branch, where everything is more awesome. But shiny really needs to keep up with master; it's been a while, so you want to rebase shiny onto master.

We're going to have the following things:

  • Multiple stages of rebasing, leading incrementally from shiny to the full rebase of shiny on master.
  • A "target" for each stage: the commit from master onto which you're rebasing the work from shiny
  • A tag providing an intuitive name for each target
  • A branch providing the revision history for each stage
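To make the naming concrete, here is a minimal sketch of a helper that prints the branch name and the target tag name for a given stage. The function name stage_names is hypothetical, and it assumes the two-digit shiny_rebase_NN / shiny_rebase_target_NN scheme used in the examples below.

```shell
# Hypothetical helper: print the stage branch name and target tag name for
# stage N, following the shiny_rebase_NN / shiny_rebase_target_NN scheme.
# stage_names N [TOPIC]   (stages start at 1)
stage_names() {
  n=$(expr "$1" + 0) || return 1   # normalize "3" or "03" to a plain integer
  topic=${2:-shiny}
  printf '%s_rebase_%02d %s_rebase_target_%02d\n' "$topic" "$n" "$topic" "$n"
}
```

For example, stage_names 3 prints shiny_rebase_03 shiny_rebase_target_03.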

Given those things, we can follow a simple process:

  1. Make a branch from the latest stage of shiny, named for the next stage (e.g. from shiny we make shiny_rebase_01, from shiny_rebase_01 we make shiny_rebase_02, and so on).

    When you're just starting the rebase, this might mean:

    [you@yours repo] git checkout -b shiny_rebase_01 shiny
    
    But for the next iteration, you would have shiny_rebase_01 checked out, and use it as your starting place:
    # The use of "shiny_rebase_01" is implied assuming our previous checkout above
    [you@yours repo] git checkout -b shiny_rebase_02
    
    # A subsequent one, again assuming we're on our most recent stage's branch already
    [you@yours repo] git checkout -b shiny_rebase_03
    
    And so on.

    This addresses concerns 1, 3, and 4: you protect yourself against rebase's inherent destructiveness by always working on new branches, you facilitate staging the work in smaller chunks, and you keep track of your work by having a separate branch representing the state of each stage.
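    If typing the next branch name by hand gets error-prone, a minimal sketch like the following can derive it from the branch you currently have checked out. The function name next_stage is hypothetical; it assumes the shiny, shiny_rebase_01, shiny_rebase_02, ... naming with two-digit stage numbers, and it refuses to run from any other branch.

```shell
# Hypothetical helper: from the currently checked-out stage branch, create
# and check out the branch for the next stage. Assumes the naming scheme
# shiny -> shiny_rebase_01 -> shiny_rebase_02 -> ... (two-digit stages).
# next_stage [TOPIC]
next_stage() {
  base=${1:-shiny}
  current=$(git rev-parse --abbrev-ref HEAD) || return 1
  case $current in
    "$base")
      n=0 ;;
    "$base"_rebase_[0-9][0-9])
      # strip the prefix, then let expr turn "08" into 8 (avoids octal parsing)
      n=$(expr "${current#"$base"_rebase_}" + 0) ;;
    *)
      echo "next_stage: expected $base or ${base}_rebase_NN, found $current" >&2
      return 1 ;;
  esac
  git checkout -b "$(printf '%s_rebase_%02d' "$base" $((n + 1)))"
}
```

    Run from shiny it creates shiny_rebase_01; run again from there it creates shiny_rebase_02, mirroring the manual git checkout -b calls above.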

  2. Review the revision history of master, look for commits likely to contain significant conflicts or representing significant inflection points, and pick your next target commit around them; if you have a pile of simple commits, you might want the target to be the last such simple commit prior to a big one, for instance. If you have a bunch of big hairy commits you may want each to be its own target/stage, etc. Use your knowledge of the app.

    The git whatchanged command is very useful for this, as by default it lists the files changed in a commit, which is the right granularity for this kind of work. You want to quickly scan the history for commits that affect files you know to be affected by your work in shiny, because they will be a source of conflict resolution points. You don't want to look at the full diff output of git log -p for this purpose; you simply want to identify likely conflict points where manual intervention will be required, where things may go wrong. After having identified such points, you can of course dig into the full diffs if that's helpful.

    Make your life easy by using the last target tag as the starting place for this review, so you only wade through the commits on master that are relevant to the current rebase stage (since the last target tag is where your branches diverge, it's where the rebase will start from).

    At this point you may say "but I don't have a last target tag!" The first time through, you won't have one because you haven't done an iteration yet. So for the first time, you can start from where git rebase itself would start:

    [you@yours repo] git whatchanged `git merge-base master shiny`..master
    

    But subsequent iterations will have a tag to reference (see the next step), so the next couple times through might look like:

    
    [you@yours repo] git whatchanged shiny_rebase_target_01..master
    
    [you@yours repo] git whatchanged shiny_rebase_target_02..master
    

    Etc.

    This is addressing items 2 and 3: we're looking at what's coming before we leap, and structuring our work around the points where things are likely to be inconvenient, difficult, etc.
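    The scan for "commits that affect files you know to be affected by your work in shiny" can also be narrowed mechanically. The sketch below swaps in git log with pathspecs in place of reading git whatchanged output by eye: it collects the files changed on the topic branch since the merge base and lists only the upstream commits that touch them. The function name likely_conflicts is hypothetical, and this is a filter to guide your review, not a substitute for it.

```shell
# Hypothetical helper: list upstream commits that touch files the topic
# branch also changed, i.e. the likely conflict points.
# likely_conflicts [UPSTREAM] [TOPIC] [SINCE]
#   SINCE defaults to the merge base of UPSTREAM and TOPIC.
likely_conflicts() {
  upstream=${1:-master}
  topic=${2:-shiny}
  base=$(git merge-base "$upstream" "$topic") || return 1
  since=${3:-$base}
  # Nothing changed on the topic branch: nothing can conflict.
  git diff --quiet "$base" "$topic" && return 0
  # Files changed on the topic branch since it diverged (NUL-separated so odd
  # filenames survive), used as pathspecs to filter upstream's history.
  # A very long file list may make xargs split this into several git log runs.
  git diff -z --name-only "$base" "$topic" |
    xargs -0 git log --oneline --reverse "$since..$upstream" --
}
```

    On the first pass, likely_conflicts master shiny starts from the merge base, just like the git whatchanged example. On later passes, giving the current stage branch as the topic (e.g. likely_conflicts master shiny_rebase_01) makes the merge base fall on the last target tag automatically.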

  3. Having identified the commit you want to use as your next rebasing point, make a tag for it. Name the tags consistently, so they reflect the stage to which they apply. So, if this is our first pass through and we've determined that we want to use commit a723ff127 for our first rebase point, we say:

    [you@yours repo] git tag shiny_rebase_target_01 a723ff127
    

    This gives us a list of tags representing the different points on master onto which we rebased shiny in our staged process. It therefore addresses item 4, keeping track as you go.
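    A cheap safety net when tagging is to check that the chosen commit really is on master and comes after the previous stage's target. Here is a minimal sketch using git merge-base --is-ancestor; the function name tag_target is hypothetical and it assumes the shiny_rebase_target_NN naming.

```shell
# Hypothetical helper: tag the chosen master commit as the target for a stage,
# refusing commits that are not on master or that precede the previous target.
# tag_target STAGE COMMIT [UPSTREAM]   (stages start at 1)
tag_target() {
  stage=$(expr "$1" + 0) || return 1
  commit=$2
  upstream=${3:-master}
  tag=$(printf 'shiny_rebase_target_%02d' "$stage")
  if ! git merge-base --is-ancestor "$commit" "$upstream"; then
    echo "tag_target: $commit is not on $upstream" >&2
    return 1
  fi
  if [ "$stage" -gt 1 ]; then
    prev=$(printf 'shiny_rebase_target_%02d' $((stage - 1)))
    if ! git merge-base --is-ancestor "$prev" "$commit"; then
      echo "tag_target: $commit does not come after $prev" >&2
      return 1
    fi
  fi
  git tag "$tag" "$commit"
}
```

    With this in place, tag_target 1 a723ff127 would create the same tag as the command above, but only after the sanity checks pass.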

  4. You're now on a branch for the current stage, and you have a tag representing the point on master onto which you want to rebase. So do it, but capture all of the output. Remember: mistakes along the way may not be immediately apparent. You will be a happier person if you've preserved all the operational output, so you can review it to track down where things potentially went wrong.

    So, for example:

    [you@yours repo] mkdir -p ~/shiny_rebase_work
    [you@yours repo] git rebase shiny_rebase_target_01 >> ~/shiny_rebase_work/target_01.log 2>&1
    
    You would naturally update the tag and logfile per stage.

    Review the logfile in your pager of choice. Is there a merge conflict reported at the bottom? Well, capture that information before you dive in and resolve it:

    # Log the basic info about the current state
    [you@yours repo] git status >> ~/shiny_rebase_work/target_01.log 2>&1
    # Log specifically what the conflicts are
    [you@yours repo] git diff >> ~/shiny_rebase_work/target_01.log 2>&1
    

    Now go and resolve your conflicts as usual (fix the files, then git add them), but remember to preserve your output when you resume:

    [you@yours repo] git rebase --continue >> ~/shiny_rebase_work/target_01.log 2>&1
    

    This addresses point 4: keeping track of what happened as you go.
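    Since every command in this step gets the same >> logfile 2>&1 treatment, a small wrapper can cut down on typos and also stamp each command into the log so you can tell where its output begins. This is a minimal sketch; the function name logged and the REBASE_LOG_DIR variable are hypothetical, with the directory defaulting to the ~/shiny_rebase_work used above.

```shell
# Hypothetical wrapper: run a command, appending a timestamped header plus the
# command's stdout and stderr to a per-stage logfile. Returns the command's
# exit status, so a stopped rebase is still reported as a failure.
# logged LOGFILE COMMAND [ARGS...]
REBASE_LOG_DIR=${REBASE_LOG_DIR:-$HOME/shiny_rebase_work}
logged() {
  log=$1
  shift
  mkdir -p "$REBASE_LOG_DIR" || return 1
  printf '\n==> %s: %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$REBASE_LOG_DIR/$log"
  "$@" >> "$REBASE_LOG_DIR/$log" 2>&1
}
```

    Usage then looks like logged target_01.log git rebase shiny_rebase_target_01, followed by logged target_01.log git status and logged target_01.log git diff when a conflict stops the rebase.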

  5. Now you've finished that stage of the rebase: you resolved any conflicts along the way, and you've preserved a history of what happened and what was done. So the final step is: test.

    Run the test suite. You did implement one, right?

    Test the app manually, as appropriate.

    Don't put it off until the end. Test as you go. Seriously. If something is broken, use git blame, git bisect, and your logs and knowledge of the system to figure out where the problem originates. Consider blowing away the branch you just made, going back to the previous stage's branch, selecting a new target, and moving forward with a smaller set of commits. Etc. But make sure it works as you go.

    This step doesn't map to any specific concern; it is more about ensuring the soundness of the overall staged rebase process. The point of iterative work is that each iteration delivers a small bit of working stuff, rather than a big pile of broken stuff.
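    Because each stage is small, git bisect has little ground to cover. The sketch below bisects between the stage's target tag (the master commit you rebased onto, assumed to pass the tests) and the head of the stage branch (assumed to fail), so it can only blame one of your rebased commits. The function name find_breaking_commit is hypothetical, and the test command is whatever your project uses.

```shell
# Hypothetical helper: bisect the current stage's rebased commits.
# find_breaking_commit TARGET_TAG STAGE_BRANCH TEST_COMMAND [ARGS...]
# Assumes TEST_COMMAND passes on TARGET_TAG and fails on STAGE_BRANCH.
# Per git bisect run: exit 0 means good, 125 means skip, 1-127 otherwise bad.
find_breaking_commit() {
  target=$1 stage=$2
  shift 2
  # the first revision given is "bad", the rest are "good"
  git bisect start "$stage" "$target" >/dev/null || return 1
  git bisect run "$@"
  rc=$?
  # return to whatever was checked out before bisecting
  git bisect reset >/dev/null 2>&1
  return $rc
}
```

    For example, find_breaking_commit shiny_rebase_target_02 shiny_rebase_02 make test (with make test standing in for your suite) reports the first rebased commit on which the suite fails.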

  6. Repeat this process until you've successfully finished a rebase stage for which the target is in fact the head of master. Done.
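    One way to check the stopping condition is to ask whether the current stage branch already contains the head of master. This is a minimal sketch; the function name rebase_done is hypothetical. Note that master may have moved on while you were working, in which case it will correctly report that another stage is needed.

```shell
# Hypothetical check for step 6: the staged rebase is finished once the
# currently checked-out stage branch contains the head of master.
# rebase_done [UPSTREAM]
rebase_done() {
  upstream=${1:-master}
  git merge-base --is-ancestor "$upstream" HEAD
}
```

    For example: if rebase_done; then echo "all stages complete"; else echo "another stage to go"; fi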

So, that's the process I've used in the past. It's been good for me, maybe it can be good for you. If anybody has criticisms or suggestions I'd love to hear about them in comments.

