Git Version Control System – Managing Your Coursework

In this class we will use git for

  • homework submission,
  • code project submission,
  • final coding project submission,
  • electronic file transfers needed for the course work between you and the instructor.

See below for more information on using git and the repositories required for this class.

Version control systems were originally developed to aid in the development of large software projects with many authors working on inter-related pieces. The basic idea is that when you want to work on a file (one piece of the code), you check it out of a repository, make changes, and then check it back in when you’re satisfied. The repository keeps track of all changes (and who made them) and can restore any previous version of a single file or of the state of the whole project. It typically does not keep a full copy of every file ever checked in; it keeps track of the differences (diffs) between versions, so if you check in a version that has only one line changed from the previous version, only that change needs to be recorded.

It sounds like a hassle to be checking files in and out, but there are a number of advantages to this system that make version control an extremely useful tool even for your own projects when you are the only one working on them. Once you get comfortable with it you may wonder how you ever lived without it.

Advantages

  • You can revert to a previous version of a file if you decide the changes you made are incorrect. You can also easily compare different versions to see what changes you made, e.g. where a bug was introduced.
  • If you use a computer program and some set of data to produce results for a publication, you can check in exactly the code and data used. If you later want to modify the code or data to produce new results, as generally happens with computer programs, you still have access to the first version without having to archive a full copy of all files for every experiment you do. Working in this manner is crucial if you want to be able to reproduce earlier results later, as is often necessary if you need to tweak the plots to some journal’s specifications or if a reader of your paper wants to know exactly what parameter choices you made to get a certain set of results. This is an important aspect of doing ‘reproducible research’, as should be required in science. If nothing else, you can save yourself hours of headaches down the road trying to figure out how you got your own results.
  • If you work on more than one machine, e.g. a desktop and laptop, version control systems are one way to keep your projects synched up between machines.

Distributed systems (e.g., Git)

Git, and other systems such as Mercurial and Bazaar, use a distributed system in which there is not necessarily a “master repository”. Any working copy contains the full history of changes made to this copy.
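To make the "full history in every copy" idea concrete, here is a minimal sketch that builds a throwaway repository with two commits, clones it locally, and counts the commits in each. The directory names (original, copy), the file name notes.txt, and the identity passed to git config are placeholders chosen for illustration only.

```shell
set -e
work=$(mktemp -d)            # throwaway directory; nothing here touches your real repos
cd "$work"

# Build a small repository with two commits.
git init -q original
cd original
git config user.name "Your Name"          # placeholder identity, local to this repo
git config user.email you@example.com
echo "first line" > notes.txt
git add notes.txt
git commit -q -m "first commit"
echo "second line" >> notes.txt
git commit -q -am "second commit"
original_count=$(git rev-list --count HEAD)

# Clone it: the copy carries the whole history, not just the latest files.
cd "$work"
git clone -q original copy
copy_count=$(git -C copy rev-list --count HEAD)

echo "commits in original: $original_count"
echo "commits in copy:     $copy_count"
```

Both counts should agree, which is what lets any clone act as a complete backup of the project history.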

The best way to get a feel for how git works is to use it, for example by following the instructions in the next section.

Remark Please also go watch the following Youtube video tutorials on git:

Git for the Class using Bitbucket

Instructions for cloning the class git repository

All of the materials for this class, including homework assignments, sample programs, and lecture notes (html and pdf), are controlled in a Git repository hosted at Bitbucket, located at ams 213B git.

In addition to viewing the class materials and associated files via the link above, you can also view changesets, issues, update histories, etc. To obtain a copy of the class git repo, create a directory where you want your copy to reside, say, ams213B in your home directory, move to the directory, and then clone the repository as follows:

$ mkdir ams213B
$ cd ams213B
$ git clone https://bitbucket.org/dongwook159/ams213b_spring2016.git ./

If you fail to clone the repo with the following message:

fatal: Authentication failed

then this means that you haven’t been invited to join as a team member with access to the course repo. In this case, please send me your email address (preferably your ucsc email, rather than your personal email) so that I can send you an invitation. You will use the same email address when you create your own Bitbucket account later (see Creating your own Bitbucket repository).

The repository URL in the git clone command above contains no white space. At this point, it is assumed you have git installed on your OS. Otherwise, go visit download:git. Because of the trailing ./, the clone statement downloads the entire contents of the class repository into the current directory, i.e. the ams213B directory you just created.
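The effect of the trailing ./ can be tried without network access. The sketch below uses a local repository named course_repo as a hypothetical stand-in for the Bitbucket course repo (the name and the file notes.txt are made up for this example) and repeats the mkdir, cd, clone pattern shown above.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Stand-in for the course repository (a local repo, not Bitbucket).
git init -q course_repo
cd course_repo
git config user.name "Your Name"          # placeholder identity
git config user.email you@example.com
echo "lecture notes" > notes.txt
git add notes.txt
git commit -q -m "add notes"
cd "$work"

# Same pattern as in the text: make a directory, enter it, clone into it with ./
mkdir ams213B
cd ams213B
git clone -q "$work/course_repo" ./

# The files and .git land directly in ams213B, not in a nested subdirectory.
ls -a
```

Without the ./ argument, git would instead create a subdirectory named after the repository inside ams213B.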

Keep your cloned git repo updated/synced with the course repo

The files in the class repository hosted on Bitbucket will be changed and updated as the quarter progresses, with new notes, sample programs, homework sets, etc. To bring these changes over to your cloned copy, all you need to do is

$ cd ams213B
$ git fetch origin
$ git merge origin/master

The git fetch command instructs git to fetch any changes from origin, which points to the remote Bitbucket repository that you originally cloned from. In the merge command, origin/master refers to the master branch in this repository (which is the only branch that exists for this particular repository). This merges any changes retrieved into the files in your current working directory.

The last two commands can be combined as:

$ git pull origin master

or simply:

$ git pull

because origin and master are the defaults.
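The fetch-then-merge cycle can be rehearsed locally. In this sketch a local repository called course_repo plays the role of the remote (a hypothetical stand-in, as are the file names hw1.txt and hw2.txt and the identity used for commits); a clone of it picks up a later commit with the two commands described above.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Hypothetical stand-in for the remote course repository.
git init -q course_repo
cd course_repo
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
git config user.name "Instructor"         # placeholder identity
git config user.email instructor@example.com
echo "homework 1" > hw1.txt
git add hw1.txt
git commit -q -m "add homework 1"

# A student clones it.
cd "$work"
git clone -q course_repo student_copy

# Later the instructor adds new material.
cd "$work/course_repo"
echo "homework 2" > hw2.txt
git add hw2.txt
git commit -q -m "add homework 2"

# The student brings the change over in two steps: fetch, then merge.
cd "$work/student_copy"
git fetch -q origin
git merge -q origin/master
ls
```

Replacing the last two git commands with a single git pull origin master should leave the clone in the same state.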

Creating your own Bitbucket repository

In addition to using the class repository, you are also required to create your own repository on Bitbucket. It is possible to use git for your own work without creating a repository on a hosted site such as Bitbucket, but there are several reasons for this requirement:

  • You are going to learn how to use Bitbucket for more than just pulling changes.
  • You will use this repository to “submit” your solutions to homework sets. You will give the instructor read/write (or admin) permission on your repository so that the instructor can clone it and grade the homework (others will not be able to clone or view it unless you also give them permission).
  • It is recommended that after the class ends you continue to use your repository as a way to back up your important work on another computer (with all the benefits of version control too!). At that point, of course, you can change the permissions so the instructor no longer has access to your repository.

Below are the instructions for creating your own repository. Note that this should be a private repository so nobody can view or clone it unless you grant permission.

Anyone can create a free private repository on Bitbucket. Note that you can also create an unlimited number of public repositories free at Bitbucket, which you might want to do for open source software projects, or for classes like this one.

Remark To make free open access repositories that can be viewed by anyone, GitHub is recommended; it allows an unlimited number of open repositories and is widely used for open source projects.

Remark Please take a look at an article comparing Bitbucket and GitHub

Getting used to your own local git repo

We will clone your repository and check that testfile.txt has been created and modified as directed below.

  1. On the machine you’re working on:

    $ git config --global user.name "Your Name"
    $ git config --global user.email you@example.com
    

    These will be used when you commit changes. If you don’t do this, you might get a warning message the first time you try to commit.

  2. Go to http://bitbucket.org/ and click on “Sign up now” if you don’t already have an account.

  3. Fill in the form, and make sure you remember your username and password.

  4. You should then be taken to your account. Click on “Create” under the “Repositories” tab.

  5. You should now see a form where you can specify the name of a repository and a description. The repository name need not be the same as your user name (a single user might have several repositories). For example, the class repository is named ams213b_spring2016, owned by user dongwook159. To avoid confusion, you should probably not name your repository ams213b_spring2016.

    You may want to stick to lower case letters and numbers in your repository name, e.g. ams213b-yourname might be a good choice. Upper case and special symbols such as underscore sometimes get modified by bitbucket and the repository name you try to paste into the homework submission form might not agree with what bitbucket expects.

    Don’t name your repository homework1 because you will be using the same repository for other homeworks later in the quarter.

  6. Go to “Settings” and click on “This is a private repository” in Access level. Also choose “Allow only private forks” in Forking. You can turn on “Issue tracker settings” and “Wiki settings” as well if you wish to use these features. Click “Save repository details” once you’re done.

  7. Voila, you have now set up your own git repo over the network.

  8. You should now see a page with instructions on how to clone your (currently empty) repository. By cloning your remote git repo to your local machine, you can add/edit files locally, and push them to your remote git repo. To do this, click “Clone” on the top left column and copy HTTPS and repeat the cloning process:

    $ mkdir ams213B_yourDirName
    $ cd ams213B_yourDirName
    $ git clone https://youraccount@bitbucket.org/youraccount/ams213b-yourname ./
    

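    In a Unix window, cd to the directory where you want your cloned copy to reside, and perform the clone by typing in the clone command shown. Because of the trailing ./, git places the repository contents directly in the current directory; without it, git would create a new directory with the same name as the repository.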

  9. You should now be inside your cloned repository. (If you left off the trailing ./ when cloning, cd into the directory that git created.)

  10. The directory you are now in will appear empty if you simply do:

    $ ls
    

    But it will look slightly different if you try:

    $ ls -a
    ./  ../  .git/
    

    The -a option causes ls to list files starting with a dot, which are normally suppressed. The directory .git stores all the information about the contents of this directory and a complete history of every file and every change ever committed. You shouldn’t touch or modify the files in this directory; they are used by git.

  11. Add a new file to your directory:

    $ cat > testfile.txt
    This is a new file
    with only two lines so far.
    ^D
    

    The Unix cat command simply redirects everything you type on the following lines into a file called testfile.txt. This goes on until you type a <ctrl>-d (the 4th line in the example above). After typing <ctrl>-d you should get the Unix prompt back. Alternatively, you could create the file testfile.txt using your favorite text editor.

  12. To see status of your folder, type:

    $ git status -s
    

    The response should be:

    ?? testfile.txt
    

    The ?? means that this file is not under revision control. The -s flag results in this short status list. Leave it off for more information.

    To put the file under revision control, type:

    $ git add testfile.txt
    $ git status -s
    A  testfile.txt
    

    The A means it has been added. However, at this point git has not yet taken a snapshot of this version of the file. To do so, type:

    $ git commit -m "My first commit of a test file."
    

    The string following the -m is a comment about this commit that may help you in general remember why you committed new or changed files.

    You should get a response like:

    [master 31cb6ed] My first commit of a test file.
    1 file changed, 2 insertions(+)
    create mode 100644 testfile.txt
    

    We can now see the status of our directory via:

    $ git status
    # On branch master
    nothing to commit (working directory clean)
    

    Alternatively, you can check the status of a single file with:

    $ git status testfile.txt
    

    You can get a list of all the commits you have made (only one so far) using:

    $ git log
    
    commit 31cb6ed38310eed36f47d3d3aed769e03da540c9
    Author: dongwook159 <dlee79@ucsc.edu>
    Date:   Fri Sep 25 00:04:14 2015 -0700
    
    My first commit of a test file.
    

    The number 31cb6ed38310eed36f47d3d3aed769e03da540c9 above is the “name” of this commit, and you can always get back to the state of your files as of this commit by using this number. You don’t have to remember it; you can use commands like git log to find it later.

    Yes, this is a number: a 40-digit hexadecimal number, meaning it is in base 16, so in addition to 0, 1, 2, ..., 9 there are 6 more digits a, b, c, d, e, f representing 10 through 15. This number is, for all practical purposes, unique among all commits you will ever make (or anyone has ever made, for that matter). It is computed from the contents of the snapshot together with the commit metadata (author, date, message, and parent commit) using the SHA-1 cryptographic hash function, and is called a SHA-1 hash for short.
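The add and commit steps above can be replayed in a scratch directory. The following is a minimal sketch, assuming git is installed; it uses a fresh repository made with git init rather than a Bitbucket clone, sets a placeholder identity, and records what git status -s reports at each stage along with the length of the commit hash.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q .
git config user.name "Your Name"          # placeholder identity
git config user.email you@example.com

printf 'This is a new file\nwith only two lines so far.\n' > testfile.txt

before_add=$(git status -s)      # "?? testfile.txt": not under revision control
git add testfile.txt
after_add=$(git status -s)       # "A  testfile.txt": added, not yet committed
git commit -q -m "My first commit of a test file."
after_commit=$(git status -s)    # empty: nothing left to commit

full_hash=$(git rev-parse HEAD)           # the full name of the commit
short_hash=$(git rev-parse --short HEAD)  # an abbreviated form of the same name

echo "before add:   $before_add"
echo "after add:    $after_add"
echo "after commit: [$after_commit]"
echo "full hash has ${#full_hash} hex digits; short form is $short_hash"
```

The hash you see will differ from the one in the text, since it depends on the author, date, and content of your own commit.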

Modifying a file

Now let’s modify this file:

$ cat >> testfile.txt
Adding a third line
^D

Here the >> tells cat that we want to add on to the end of an existing file rather than creating a new one. (Or you can edit the file with your favorite editor and add this third line.)

Now try the following:

$ git status -s
 M testfile.txt

The M indicates this file has been modified relative to the most recent version that was committed.

To see what changes have been made, try:

$ git diff testfile.txt

This will produce something like:

diff --git a/testfile.txt b/testfile.txt
index d80ef00..fe42584 100644
--- a/testfile.txt
+++ b/testfile.txt
@@ -1,2 +1,3 @@
 This is a new file
 with only two lines so far.
+Adding a third line

The + in front of the last line shows that it was added. The two lines before it are printed to show the context. If the file were longer, git diff would only print a few lines around any change to indicate the context.

Now let’s try to commit this changed file:

$ git commit -m "added a third line to the test file"

This will fail! You should get a response like this:

# On branch master
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working
#   directory)
#
#   modified:   testfile.txt
#
no changes added to commit (use "git add" and/or "git commit -a")

git is saying that the file testfile.txt is modified but that no files have been staged for this commit.

If you are used to Mercurial, git has an extra level of complexity (but also flexibility): you can choose which modified files will be included in the next commit. Since we only have one file, there will not be a commit unless we add this to the index of files staged for the next commit:

$ git add testfile.txt

Note that the status is now:

$ git status -s
M  testfile.txt

This is different in a subtle way from what we saw before: The M is in the first column rather than the second, meaning it has been both modified and staged.
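The difference between the two status columns can be checked in a scratch repository. This sketch (placeholder identity, fresh git init repository) captures the short status before and after git add, and confirms that a commit attempted with nothing staged is refused.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q .
git config user.name "Your Name"          # placeholder identity
git config user.email you@example.com
printf 'This is a new file\nwith only two lines so far.\n' > testfile.txt
git add testfile.txt
git commit -q -m "My first commit of a test file."

echo "Adding a third line" >> testfile.txt

unstaged=$(git status -s)        # " M testfile.txt": modified, not staged

# A commit attempt at this point is refused because nothing is staged.
if git commit -q -m "added a third line to the test file" >/dev/null 2>&1; then
    commit_without_add="succeeded"
else
    commit_without_add="refused"
fi

git add testfile.txt
staged=$(git status -s)          # "M  testfile.txt": modified and staged
git commit -q -m "added a third line to the test file"
commits=$(git rev-list --count HEAD)

echo "before add: [$unstaged]"
echo "commit without add was $commit_without_add"
echo "after add:  [$staged]"
echo "total commits: $commits"
```

The brackets in the echo lines are there only to make the leading space in the first status visible.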

We can get more information if we leave off the -s flag:

$ git status

# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#   modified:   testfile.txt
#

Now testfile.txt is on the index of files staged for the next commit.

Now we can do the commit:

$ git commit -m "added a third line to the test file"

[master 51918d7] added a third line to the test file
 1 file changed, 1 insertion(+)

Try doing git log now and you should see something like:

commit 271bd14e5b8d68840e7e6481ad7e99e5708e50e7
Author: dongwook159 <dlee79@ucsc.edu>
Date:   Fri Sep 25 00:02:34 2015 -0700

    added a third line to the test file

commit 0c20925f98b5d76d0b973d25fdc78fd43941792e
Author: dongwook159 <dlee79@ucsc.edu>
Date:   Fri Sep 25 00:01:25 2015 -0700

    My first commit of a test file.

If you want to revert your working directory back to the first snapshot you could do:

$ git checkout  31cb6ed383
Note: checking out '31cb6ed383'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

HEAD is now at 31cb6ed383... My first commit of a test file.

Take a look at the file; it should be back to the state with only two lines.

Note that you don’t need the full SHA-1 hash; the first few digits are enough to identify the commit uniquely.

You can go back to the most recent version with:

$ git checkout master
Switched to branch 'master'

We won’t discuss branches, but unless you create a new branch, the default name for your main branch is master and this checkout command just goes back to the most recent commit.
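As a quick check of the checkout behaviour described above, the sketch below recreates the two commits in a scratch repository, checks out the first one by its abbreviated hash, and then returns to master, counting the lines of testfile.txt at each point. The identity is a placeholder, and the branch is explicitly named master because newer git installations may use a different default name.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
git config user.name "Your Name"          # placeholder identity
git config user.email you@example.com

printf 'This is a new file\nwith only two lines so far.\n' > testfile.txt
git add testfile.txt
git commit -q -m "My first commit of a test file."
first=$(git rev-parse --short HEAD)       # abbreviated hash of the first commit

echo "Adding a third line" >> testfile.txt
git commit -q -am "added a third line to the test file"

# Go back to the first snapshot (this puts you in 'detached HEAD' state).
git checkout -q "$first"
lines_at_first=$(wc -l < testfile.txt | tr -d ' ')

# Return to the most recent commit on master.
git checkout -q master
lines_at_master=$(wc -l < testfile.txt | tr -d ' ')

echo "lines at first commit: $lines_at_first"
echo "lines on master:       $lines_at_master"
```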

  1. So far you have been using git to keep track of changes in your own directory, on your computer. None of these changes have been seen by Bitbucket, so if someone else cloned your repository from there, they would not see testfile.txt.

    Now let’s push these changes back to the Bitbucket repository:

    First do:

    $ git status
    

    to make sure there are no changes that have not been committed. This should report that there is nothing to commit.

    Now do:

    $ git push -u origin master
    

    This will prompt for your Bitbucket password and should then print something indicating that it has uploaded these two commits to your bitbucket repository.

    Not only has it copied the one file over, it has added both changesets, so the entire history of your commits is now stored in the repository. If someone else clones the repository, they get the entire commit history and could revert to any previous version, for example.

    To push future commits to bitbucket, you should only need to do:

    $ git push
    

    and by default it will push your master branch (the only branch you have, probably) to origin, which is the shorthand name for the place you originally cloned the repository from. To see where this actually points to:

    $ git remote -v
    

    This lists all remotes. By default there is only one, the place you cloned the repository from. (Or none if you had created a new repository using git init rather than cloning an existing one.)

  2. Check that the file is in your Bitbucket repository: Go back to that web page for your repository and click on the “Source” tab at the top. It should display the files in your repository and show testfile.txt.

    Now click on the “Commits” tab at the top. It should show that you made two commits and display the comments you added with the -m flag with each commit.

    If you click on the hex-string for a commit, it will show the change set for this commit. What you should see is the file in its final state, with three lines. The third line should be highlighted in green, indicating that this line was added in this changeset. A line highlighted in red would indicate a line deleted in this changeset.
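The push step can be practised offline before trying it against Bitbucket. In this sketch a local bare repository named hosted.git is a hypothetical stand-in for your empty Bitbucket repository, so no password prompt appears; the clone directory name and the identity are likewise placeholders.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Stand-in for the empty hosted repository: a local bare repo.
git init -q --bare hosted.git

# Clone it (git warns that the repository is empty; the warning is hidden here).
git clone -q hosted.git local 2>/dev/null
cd local
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
git config user.name "Your Name"          # placeholder identity
git config user.email you@example.com

printf 'This is a new file\nwith only two lines so far.\n' > testfile.txt
git add testfile.txt
git commit -q -m "My first commit of a test file."
echo "Adding a third line" >> testfile.txt
git commit -q -am "added a third line to the test file"

# Upload both commits and remember origin/master as the default push target.
git push -q -u origin master

git remote -v                             # shows where origin points
local_head=$(git rev-parse HEAD)
hosted_head=$(git -C "$work/hosted.git" rev-parse master)
hosted_commits=$(git -C "$work/hosted.git" rev-list --count master)
echo "hosted repo now has $hosted_commits commits"
```

After the push, the hosted copy and the local copy should point at the same commit, which is the offline equivalent of seeing both commits under the “Commits” tab.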

Rolling back to a previous state

Let’s take a look at the case where you do not like the last change you made to your repo, and you want to roll your repo back to a previous state, say,

  • commit 1b82c21688befa80560807247594d73768d64f0a (the current unsatisfactory revision) -> commit c27d1bdf0098efe59aa25f809a719ce4fa910fef (the previous revision you wish to roll back to)

In this case, there are two ways to roll back your repo to the previous state.

Firstly, if you do:

$ git reset --hard c27d1bdf0098

it will revert both the local code and the local history back to the previous state, discarding any uncommitted changes along the way. This might look OK, but a later push of the rewound branch to the remote public repo would be rejected, and it causes trouble especially when someone else on your team already has the newer history that includes commit 1b82c21688befa80560807247594d73768d64f0a.

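Instead, if you do:

$ git revert 1b82c21688be

git will create a new commit that undoes the changes introduced by 1b82c21688be, so your files return to the state of c27d1bdf0098 while the existing history is left intact. In this case, you can push to the public repo without causing any conflicts in histories among your project team members.

Note that git reset --soft c27d1bdf0098 does not do this: it moves the branch back to c27d1bdf0098 but leaves your files and the index unchanged, so it rewrites the local history just as --hard does.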
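For comparison, the sketch below tries three commands on a throwaway repository: git revert, git reset --soft, and git reset --hard. After each one it records how many commits the branch has and whether the unwanted line is still present in the file. The file name file.txt, its contents, and the identity are placeholders for this example only.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q .
git config user.name "Your Name"          # placeholder identity
git config user.email you@example.com

echo "good" > file.txt
git add file.txt
git commit -q -m "good revision"
good=$(git rev-parse HEAD)
echo "bad" >> file.txt
git commit -q -am "unsatisfactory revision"
bad=$(git rev-parse HEAD)

# git revert: adds a new commit that undoes $bad; the old history stays.
git revert --no-edit "$bad" >/dev/null
revert_commits=$(git rev-list --count HEAD)
revert_has_bad=$(grep -c bad file.txt || true)

# Return to the two-commit state so the next command starts from the same place.
git reset -q --hard "$bad"

# git reset --soft: moves the branch back, but the file keeps the unwanted line.
git reset -q --soft "$good"
soft_commits=$(git rev-list --count HEAD)
soft_has_bad=$(grep -c bad file.txt || true)

git reset -q --hard "$bad"

# git reset --hard: moves the branch back and rewrites the file as well.
git reset -q --hard "$good"
hard_commits=$(git rev-list --count HEAD)
hard_has_bad=$(grep -c bad file.txt || true)

echo "revert:       $revert_commits commits, 'bad' lines in file: $revert_has_bad"
echo "reset --soft: $soft_commits commits, 'bad' lines in file: $soft_has_bad"
echo "reset --hard: $hard_commits commits, 'bad' lines in file: $hard_has_bad"
```

Only git revert leaves the earlier commits in place, which is why it is the variant that can be pushed to a shared repository without rewriting what others have already pulled.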

In case you want to recover files that are deleted locally, you can do:

$ git ls-files -d | xargs git checkout --

Similarly, to restore modified files to their last committed state:

$ git ls-files -m | xargs git checkout --

See more examples at https://git-scm.com/docs/git-ls-files.
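The two recovery pipelines can be tested safely in a scratch repository. This is a minimal sketch with made-up file names (a.txt, b.txt) and a placeholder identity: one file is deleted, the other is edited, and both are then restored from the last commit.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q .
git config user.name "Your Name"          # placeholder identity
git config user.email you@example.com
echo "keep me" > a.txt
echo "me too" > b.txt
git add a.txt b.txt
git commit -q -m "add two files"

rm a.txt                          # deleted locally, but still in the last commit
echo "scribble" >> b.txt          # modified locally

deleted=$(git ls-files -d)        # names of tracked files missing from the directory
git ls-files -d | xargs git checkout --   # bring the deleted file back

git ls-files -m | xargs git checkout --   # discard the local edit to b.txt
b_content=$(cat b.txt)

echo "was deleted: $deleted"
echo "b.txt now contains: $b_content"
```

Note that xargs splits its input on white space, so these pipelines as written assume file names without spaces. Also keep in mind that the second pipeline throws away uncommitted edits for good.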

Summary

The commands we have discussed so far will give you a good start with git. As you get used to git, you will find that only a handful of git commands are needed in many cases. This is particularly true unless you work on a project with many other members over the network. In our class it will primarily be you alone who checks changes in and out of your central repo hosted on Bitbucket. Another frequent task will be to sync your local repo with the course repo on a regular basis.

In this simple project environment, you will most likely need to use the following commands:

$ git status
$ git add
$ git commit
$ git push
$ git pull