All opinions expressed are those of the authors and not necessarily those of OSNews.com, our sponsors, or our affiliates.

published by noreply@blogger.com (Jon Jensen) on 2008-12-22 02:09:00 in the "git" category

It's awesome to see that the Perl 5 source code repository has been migrated from Perforce to Git, and is now active at http://perl5.git.perl.org/. Congratulations to all those who worked hard to migrate the entire version control history, all the way back to the beginning with Perl 1.0!

Skimming through the history turns up some fun things:

  • The last Perforce commit appears to have been on 16 December 2008.
  • Perl 5 is still under very active development! (It seems a lot of people are missing this simple fact, so I don't feel bad stating it.)
  • Perl 5.8.0 was released on 18 July 2002, and 5.6.0 on 23 March 2000. Those both seem so recent ...
  • Perl 5.000 was released on 17 October 1994.
  • Perl 4.0.00 was released 21 March 1991, and the last Perl 4 release, 4.0.36, was released on 4 February 1993. Considering it had an active lifespan of only four or so years before Perl 5 became popular, Perl 4 code sure kicked around on servers a lot longer than that.
  • Perl 1.0 was announced by Larry Wall on 18 December 1987. He called Perl a "replacement" for awk and sed. That first release included 49 regression tests.
  • Some of the patches are from people whose contact information is long gone, rendered in Git commits as e.g. Dan Faigin, Doug Landauer <unknown@longtimeago>.
  • The modern Internet hadn't yet completely taken over, as evidenced by email addresses such as isis!aburt and arnold@emoryu2.arpa.
  • The first Larry Wall entry with email address larry@wall.org was 28 June 1988, though he sometimes continued to use his jpl.nasa.gov address after that too.
  • There are some weird things in the commit notices. For example, it's hard to believe the snippet of Perl code in the following change notice wasn't somehow mangled in the conversion process:
commit d23b30860e3e4c1bd7e12ed5a35d1b90e7fa214c
Author: Larry Wall <lwall@scalpel.netlabs.com>
Date:   Wed Jan 11 11:01:09 1995 -0800

   duplicate DESTROY
  
   In order to fix the duplicate DESTROY bug, I need to remove [the
   modified] lines from sv_setsv.
  
   Basically, copying an object shouldn't produce another object without an
   explicit blessing.  I'm not sure if this will break anything.  If Ilya
   and anyone else so inclined would apply this patch and see if it breaks
   anything related to overloading (or anything else object-oriented), I'd
   be much obliged.
  
   By the way, here's a test script for the duplicate DESTROY.  You'll note
   that it prints DESTROYED twice, once for , and once for .  I don't
   think an object should be considered an object unless viewed through
   a reference.  When accessed directly it should behave as a builtin type.
  
   #!./perl
  
    = new main;
    = '';
  
   sub new {
       my ;
       local /tmp/ssh-vaEzm16429/agent.16429 = bless $a;
       local  = ;      # Bogusly makes  an object.
       /tmp/ssh-vaEzm16429/agent.16429;
   }
  
   sub DESTROY {
       print "DESTROYEDn";
   }
  
   Larry

sv.c |    4 ----
1 files changed, 0 insertions(+), 4 deletions(-)

Yes, it really is that weird. Check it out for yourself.

The Easy Git summary information from eg info has some interesting trivia:

Total commits: 36647
Number of contributors: 926
Number of files: 4439
Number of directories: 657
Biggest file size, in bytes: 4176496 (Changes5.8)
Commits: 31178
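Roughly similar figures can be pulled out of plain Git for anyone without Easy Git installed. The sketch below is an approximation, not a reimplementation of eg info: its definitions (commits reachable from HEAD versus from any ref, distinct author names as grouped by git shortlog, files and blob sizes in the HEAD tree) are assumptions and may not match how eg counts things, and the function name repo_stats is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: approximate some `eg info` figures with plain git.
# Run it from inside a clone; these definitions are assumptions, not eg's.
repo_stats() {
    local count biggest
    count=$(git rev-list --count HEAD)
    echo "Commits reachable from HEAD: $count"
    count=$(git rev-list --count --all)
    echo "Commits reachable from any ref: $count"
    # shortlog needs an explicit revision; otherwise it reads a log from stdin
    count=$(git shortlog -s HEAD | wc -l | tr -d ' ')
    echo "Distinct author names: $count"
    count=$(git ls-tree -r --name-only HEAD | wc -l | tr -d ' ')
    echo "Files in HEAD: $count"
    # ls-tree -l prints "<mode> <type> <object> <size><TAB><path>";
    # sort numerically on the size column and keep the path of the largest
    biggest=$(git ls-tree -r -l HEAD | sort -n -k4 | tail -n 1 | cut -f2)
    echo "Biggest file: $biggest"
}
```

Called from inside a clone, repo_stats prints one labeled line per figure.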

And there's a nice new POD document explaining how to work with the Perl repository using Git: perlrepository.

In other news, maintenance release Perl 5.8.9 is out, expected to be the last 5.8.x release. The change log shows most bundled modules have been updated.

Finally, use Perl also notes that Booking.com is donating $50,000 to further Perl development, specifically Perl 5.10 development and maintenance. They're also hosting the new Git master repository. Thanks!



published by david@endpoint.com (David Christensen) on 2008-09-05 18:52:00 in the "git" category

It's no secret that we here at End Point love and encourage the use of version control systems to make life easier for both ourselves and our clients.  While a full-fledged development environment is ideal for maintaining and developing client code, not every client has the time to set one up quickly.

One situation we sometimes encounter is clients editing or updating production data directly.  This can happen through a variety of means: direct server access, scp/sftp, or web-based editing tools that save directly to the file system.

I recently implemented a script to provide transparent version control for a client who manages their content with a web-based tool.  While they still make changes to their site directly, we now have the ability to roll back any change on a file-by-file basis as files are created, modified, or deleted.

I wanted something that was: 1) fast, 2) useful, and 3) stayed out of the user's way.  I turned naturally to git.

In the user's account, I executed git init to create a new git repository in their home directory.  I then git added the relevant parts that we definitely wanted under version control.  This included all of the relevant static content, the app server files, and associated configuration: basically anything we might want to track changes to.
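For anyone wanting to try this, here is a minimal sketch of that one-time setup. It runs in a throwaway directory created by mktemp rather than a real account; the two directory names are borrowed from the AUTO_ADD_DIRS setting in the git_heartbeat script, while the sample files and the committer identity are hypothetical placeholders.

```shell
#!/bin/bash
# One-time setup, sketched in a scratch directory instead of /home/acme.
# File names and the identity below are hypothetical placeholders.
set -e
export GIT_AUTHOR_NAME="Heartbeat Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

site=$(mktemp -d)                  # stands in for the user's home directory
cd "$site"
mkdir -p catalogs/acme/pages htdocs
echo '<h1>Acme</h1>' > htdocs/index.html
echo 'page one' > catalogs/acme/pages/one.html

git init -q                        # a single .git directory at the top
git add htdocs catalogs            # track only what we care about
git commit -q -m "Initial import of site content"
git ls-files                       # the two files now under version control
```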

Finally, I determined the list of directories in which we wanted to automatically detect newly created files.  These corresponded to the usual places where new content was apt to show up.  I codified the automatic update of the git repo in a script called git_heartbeat, which is called periodically from cron.

The basic listing for git_heartbeat:

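#!/bin/bash
# automatically add any new files in these space-separated directories
AUTO_ADD_DIRS="catalogs/acme/pages htdocs"

# make sure we're in the proper git root directory; bail out rather than
# run git commands somewhere else if the cd fails
cd /home/acme || exit 1

# actually add any newly created files in $AUTO_ADD_DIRS
# ($AUTO_ADD_DIRS is deliberately unquoted so it splits into separate paths)
find $AUTO_ADD_DIRS -print0 | xargs -0 git add

DATE=$(date)

# commit modifications/deletions too; if nothing changed, git refuses to
# create an empty commit and its "nothing to commit" message goes to /dev/null
git commit -q -a -m "Acme Co git heartbeat - $DATE" > /dev/null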

A few notes:

  1. git commit -a takes care of the modification/deletion of any already tracked files.  The git add ensures that any newly created files are currently in the index and will be included with the commit.
  2. If no files have been added, modified, or deleted, no checkpoint is created.  This ensures that every commit in the log is meaningful and corresponds to an actual change to the site itself.
  3. Unlike VCSs that keep metadata in each versioned subdirectory (such as Subversion), Git keeps its metadata in a single .git directory at the top of the working tree, so this approach stays out of the user's way; we don't have to worry about the user accidentally overwriting or deleting data in their upload directories and thus corrupting the repository.
  4. This approach is fast; it runs nearly instantaneously for thousands of files, so we could even shorten the cron interval to every minute if desired.  For our purposes, this system works great as is.
  5. Once the git tools are installed, there is no need to set up a central repository; git repos are very cheap to create/use and for a use case such as this, require little to no maintenance beyond the initial setup.
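Notes 1 and 2 are easy to check in a scratch repository. This sketch wraps the script's two git steps in a hypothetical heartbeat function (leaving out the cd and the cron plumbing), then shows that one run picks up a modified, a deleted, and a brand-new file, while a second run with nothing changed creates no commit. File names and the identity are placeholders.

```shell
#!/bin/bash
# Scratch-repository check of notes 1 and 2; names and identity are placeholders.
set -e
export GIT_AUTHOR_NAME="Heartbeat Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

cd "$(mktemp -d)"
git init -q
mkdir htdocs
echo v1 > htdocs/a.html
echo v1 > htdocs/b.html
git add htdocs
git commit -q -m "initial"

# Hypothetical helper: the script's two git steps, without cd or cron.
heartbeat() {
    find htdocs -print0 | xargs -0 git add
    # git commit exits non-zero when nothing changed; || true keeps set -e happy
    git commit -q -a -m "heartbeat - $(date)" > /dev/null || true
}

echo v2 > htdocs/a.html      # modify a tracked file
rm htdocs/b.html             # delete a tracked file
echo new > htdocs/c.html     # create a new, untracked file
heartbeat
git log -1 --name-status --format=%s   # a.html modified, b.html deleted, c.html added

before=$(git rev-parse HEAD)
heartbeat                    # nothing changed since the last run
after=$(git rev-parse HEAD)
if [ "$before" = "$after" ]; then echo "no empty heartbeat commit was created"; fi
```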

Areas of improvement/known issues:

  1. This script could definitely be improved to give more detail about which files were added, modified, or deleted.  However, git's own tools come in quite useful here; for instance, git log --stat will show the files each heartbeat commit affected.
  2. Since this is set up as a general cron job running every hour (the period is configurable, obviously), it precludes extended staging for non-heartbeat commits: any change left uncommitted for longer than the heartbeat interval will be inadvertently committed by the next heartbeat.
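Along the lines of item 1, here is a sketch of inspecting the heartbeat history and rolling back a single file, again in a scratch repository with a hypothetical file name and placeholder identity. git checkout <commit> -- <path> restores just that path from the given commit, leaving every other file as it is.

```shell
#!/bin/bash
# Scratch-repository sketch; file name and identity are placeholders.
set -e
export GIT_AUTHOR_NAME="Heartbeat Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

cd "$(mktemp -d)"
git init -q
mkdir htdocs
echo "good copy" > htdocs/index.html
git add htdocs
git commit -q -m "heartbeat 1"
echo "bad edit" > htdocs/index.html
git commit -q -a -m "heartbeat 2"

# Which files did each heartbeat touch?
git log --stat --format='%h %s'

# Restore one file from the previous commit, leaving everything else alone,
# and record the rollback as an ordinary commit.
git checkout HEAD~1 -- htdocs/index.html
git commit -q -m "Roll back htdocs/index.html"
cat htdocs/index.html        # prints: good copy
```

In a real repository you would pick the commit to restore from with git log -- <path> rather than assuming HEAD~1.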


published by noreply@blogger.com (Ethan Rowe) on 2008-07-30 05:09:00 in the "git-push" category

The ability to push and pull commits to/from remote repositories is obviously one of the great aspects of Git. However, if you're not careful with how you use git-push, you may find yourself in an embarrassing situation.

When you have multiple remote tracking branches within a Git repository, any bare git push invocation will attempt to push all of those branches out. If you have commits stacked up that you weren't quite ready to push out, this can be somewhat unfortunate.
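Which branches a bare git push touches is governed by the push.default setting. The behavior described here is essentially the matching mode, which pushes every local branch that has a branch of the same name on the remote; that was Git's default at the time of writing, while Git 2.0 and later default to simple, which pushes only the current branch. This sketch sets matching explicitly in scratch repositories; the branch names trunk and feature and the identity are hypothetical.

```shell
#!/bin/bash
# Scratch repositories only; branch names and identity are placeholders.
set -e
export GIT_AUTHOR_NAME="Push Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
git init -q --bare "$work/shared.git"
git init -q "$work/mine"
cd "$work/mine"
echo 1 > f && git add f && git commit -q -m "first"
git branch -m trunk                  # fixed name regardless of init.defaultBranch
git branch feature
git remote add origin "$work/shared.git"
git push -q origin trunk feature     # both branches now exist on the remote

echo 2 >> f && git commit -q -a -m "ready to share"    # new commit on trunk
git checkout -q feature
echo 3 >> f && git commit -q -a -m "not ready yet"     # new commit on feature
git checkout -q trunk

git config push.default matching     # the pre-2.0 default
git push -q origin                   # updates trunk AND feature on the remote
git ls-remote --heads origin
```

Setting push.default to current or simple instead limits a bare push to the branch you are on.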

There are a variety of ways to accommodate this:

  • use local branches for your commits, only merging those commits into your remote tracking branches when you're ready to push them out;
  • push remote tracking branches out whenever you have something worth committing.

However, even with sensible branch management practices, it's worthwhile to know exactly what you're pushing. Therefore, if you want a sense of what a bare git push would do, always run it with the --dry-run option first. This will show you what the push would send out, which updates would be rejected, and so on, all without actually performing the push.
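A quick sketch of the --dry-run check in scratch repositories, with a hypothetical branch name trunk and a placeholder identity: the dry run reports the update it would make, and the remote's branch is unchanged afterwards.

```shell
#!/bin/bash
# Scratch repositories only; branch name and identity are placeholders.
set -e
export GIT_AUTHOR_NAME="Push Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
git init -q --bare "$work/shared.git"
git init -q "$work/mine"
cd "$work/mine"
echo 1 > f && git add f && git commit -q -m "first"
git branch -m trunk
git remote add origin "$work/shared.git"
git push -q -u origin trunk          # publish trunk and record it as upstream

echo 2 >> f && git commit -q -a -m "second"
remote_before=$(git --git-dir="$work/shared.git" rev-parse trunk)
git push --dry-run                   # lists trunk -> trunk as an update it would make
remote_after=$(git --git-dir="$work/shared.git" rev-parse trunk)
if [ "$remote_before" = "$remote_after" ]; then echo "remote unchanged"; fi
```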

It is ultimately best, though, to understand the different ways of invoking git push so you can control things precisely and only change exactly what you want to change.

 git push some_repo some_branch

This will identify the ref named some_branch within your repository and push it out to the some_repo repository. If you are good about having your remote tracking branches use the same name as the source branch in the relevant remote ref, this is a simple, effective way of ensuring that you're pushing out one branch and only one branch. However, it does require that you know the purpose of some_repo; it doesn't do any magic for deciding what the "right" repository to push to is based on some_branch.

To be extremely precise, you can use a full refspec in your push call:

 git push some_repo local_branch:refs/heads/new_branch

This takes the local branch local_branch and pushes it to the repository identified by some_repo, under the branch name new_branch. This is a very useful invocation to understand when creating new branches in bare repositories shared between developers/repositories. While both examples shown here will create the branch in some_repo if it does not already exist, the second example gives the programmer full control over the branch names.
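Both invocations can be tried against a scratch bare repository. In this sketch the names some_repo, some_branch, local_branch, and new_branch are taken from the examples above, while the paths and identity are placeholders; git ls-remote --heads lists the branches that actually exist on the remote afterwards.

```shell
#!/bin/bash
# Scratch repositories only; paths and identity are placeholders.
set -e
export GIT_AUTHOR_NAME="Push Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
git init -q --bare "$work/some_repo.git"
git init -q "$work/mine"
cd "$work/mine"
echo 1 > f && git add f && git commit -q -m "first"
git branch -m some_branch
git branch local_branch
git remote add some_repo "$work/some_repo.git"

# Form 1: push one named branch; the remote branch gets the same name.
git push -q some_repo some_branch

# Form 2: full refspec; local_branch is published as new_branch.
git push -q some_repo local_branch:refs/heads/new_branch

git ls-remote --heads some_repo      # some_branch and new_branch, no local_branch
```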

If you're sharing your work with multiple developers/repositories, it can become unwieldy if not impossible to keep your tracking branch names consistent with source branch names in your remote refs, in which case knowing these invocations of git push is an absolute necessity.

Check out the documentation on git push for a full explanation, and for an example of how to delete a branch in a remote ref. There are considerably more options for the command than what is explained here, but the refspec documentation can be a bit confusing to newcomers, in which case hopefully this discussion provides a bit more clarity. (Then again, perhaps it doesn't.)
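For reference, deleting a remote branch uses a refspec with an empty source side, as the git push documentation describes. A sketch against a scratch bare repository, with the hypothetical branch name old_branch and a placeholder identity:

```shell
#!/bin/bash
# Scratch repositories only; branch name and identity are placeholders.
set -e
export GIT_AUTHOR_NAME="Push Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
git init -q --bare "$work/some_repo.git"
git init -q "$work/mine"
cd "$work/mine"
echo 1 > f && git add f && git commit -q -m "first"
git remote add some_repo "$work/some_repo.git"
git push -q some_repo HEAD:refs/heads/old_branch   # create the branch remotely

# An empty source side ("nothing" pushed to old_branch) deletes it.
git push -q some_repo :refs/heads/old_branch
# Newer Git versions also accept: git push some_repo --delete old_branch
git ls-remote --heads some_repo      # prints nothing: no branches remain
```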

