Version Control System – Managing Your Projects¶
Note This part of the lecture note has been partially extracted and modified from Prof. Randy LeVeque’s class website on HPC.
In this class we will use git for
- homework submission,
- code project submission,
- final coding project submission,
- electronic file transfers needed for the course work between you and the instructor.
See below for more information on using git and the repositories required for this
class. There are many other version control systems,
such as CVS, Subversion, Mercurial, and Bazaar.
Version control systems were originally developed to aid in the development
of large software projects with many authors working on inter-related
pieces. The basic idea is that when you want to work on a file (one piece of the
code), you check it out of a repository, make changes, and then check it
back in when you’re satisfied. The repository keeps track of all changes
(and who made them) and can restore any previous version of a single file or
of the state of the whole project. Conceptually, it does not need to keep a full
copy of every file ever checked in: it can keep track of the differences (diffs)
between versions, so if you check in a version that has only one line changed from
the previous version, only that change needs to be recorded.
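To make the idea of a difference between versions concrete, here is a minimal sketch that uses the standard Unix diff tool on two versions of a small file. The file names v1.txt and v2.txt are made up for illustration, and the sketch only shows what a line-based difference looks like; it is not a description of how any particular version control system stores its data internally.

```shell
# Work in a throwaway directory so nothing else is touched.
cd "$(mktemp -d)"

# Two versions of the same file; only the second line differs.
printf 'alpha\nbeta\ngamma\n' > v1.txt
printf 'alpha\nBETA\ngamma\n' > v2.txt

# Show the difference in unified format.
# (diff exits with status 1 when the files differ; that is not an error.)
changes="$(diff -u v1.txt v2.txt)"
echo "$changes"
```

Lines starting with - were removed and lines starting with + were added; unchanged lines are shown only as context.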
It sounds like a hassle to be checking files in and out, but there are a number of advantages to this system that make version control an extremely useful tool even for your own projects when you are the only one working on something. Once you get comfortable with it you may wonder how you ever lived without it.
Advantages¶
- You can revert to a previous version of a file if you decide the changes you made are incorrect. You can also easily compare different versions to see what changes you made, e.g. where a bug was introduced.
- If you use a computer program and some set of data to produce some results for a publication, you can check in exactly the code and data used. If you later want to modify the code or data to produce new results, as generally happens with computer programs, you still have access to the first version without having to archive a full copy of all files for every experiment you do. Working in this manner is crucial if you want to be able to reproduce earlier results later, as is often necessary if you need to tweak the plots to some journal’s specifications or if a reader of your paper wants to know exactly what parameter choices you made to get a certain set of results. This is an important aspect of doing ‘reproducible research’, as should be required in science. If nothing else you can save yourself hours of headaches down the road trying to figure out how you got your own results.
- If you work on more than one machine, e.g. a desktop and laptop, version control systems are one way to keep your projects synched up between machines.
Two Types of Version Control Systems: SVN vs. Git¶
Client-server systems (e.g., CVS, SVN)¶
The original version control systems all used a client-server model, in which there is one computer that contains “the repository” and everyone else checks code into and out of that repository.
Systems such as CVS and Subversion (svn) have this form. An important feature of these systems is that only the repository has the full history of all changes made.
Please see the articles comparing svn and git, both of which give brief overviews of the two systems.
Distributed systems (e.g., Git)¶
Git, and other systems such as Mercurial and Bazaar, use a distributed system in which there is not necessarily a “master repository”. Any working copy contains the full history of changes made to this copy.
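The claim that every working copy carries the full history can be checked with a small experiment. The sketch below is only an illustration under stated assumptions: the directory names project and copy, the file notes.txt, and the "Demo User" identity are all made up, and a local directory stands in for a repository that would normally live on a server.

```shell
# Placeholder identity so the sketch runs even without a global git config.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
cd "$(mktemp -d)"

# A small repository with two commits, standing in for a shared project.
git init -q project
cd project
echo "first" > notes.txt
git add notes.txt
git commit -q -m "first commit"
echo "second" >> notes.txt
git commit -q -am "second commit"
cd ..

# Clone it; the clone carries the complete history, not just the latest files.
git clone -q project copy
original_count="$(git -C project rev-list --count HEAD)"
copy_count="$(git -C copy rev-list --count HEAD)"
echo "commits in original: $original_count, commits in clone: $copy_count"
```

Both counts agree, and commands such as git log work inside copy without any further contact with project.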
The best way to get a feel for how git works is to use it, for example
by following the instructions in the next section.
Remark Please also go watch the following Youtube video tutorials and a cheat sheet on git:
Git for the Class using the Git server on SOE servers¶
Instructions for cloning the class git repository¶
Note This part of the lecture note has been partially extracted from Prof. Randy LeVeque’s class website on Git and has been modified slightly.
All of the materials for this class, including homework assignments, sample programs, and lecture notes (html and pdf), are controlled in a Git repository hosted on one of the SOE servers, located at riverdance.soe.ucsc.edu. See a short instruction on how to set up your own git repository on one of the SOE servers.
In addition to viewing the class materials and associated files via the link above, you can also view
changesets, issues, and update histories.
To obtain a copy of the class git repo, create a directory where you want your copy to
reside, say, ams209 in your home directory, move into that directory,
and then clone the repository as follows:
$ mkdir ams209
$ cd ams209
$ git clone yourSOEaccount@riverdance.soe.ucsc.edu:/soe/dongwook/GitRepos/teaching/2017-2018/ams209 ./
If you fail to clone the repo with the following message:
fatal: Authentication failed
then this means that you haven’t been invited to join as an AMS209 group member with access to the course repo. In this case, please send me your email address (preferably your ucsc email, rather than your personal email) so that I can send you an invitation. You should use the same email address when you create your own account on either SOE (see instruction) or Bitbucket (see Creating your own Bitbucket repository) for your own repo. You will need an SOE account too. If you don’t have one, please fill out the account request form, listing me as a sponsoring faculty.
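There is no white space inside the repository address in the above git command line; the only spaces are those separating git, clone, the address, and the trailing ./.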
At this point, it is assumed that you have git installed on your OS.
If not, visit the git download page.
Because of the trailing ./, the clone statement will download the entire contents of the class repository
into the current directory, i.e., the ams209 directory you just created.
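Where the cloned files end up depends on whether a target directory is given. The following sketch illustrates this with a local stand-in for the class repository; the names seed, ams209.git, inplace, and parent, as well as the "Demo User" identity, are assumptions made up for this example and are not part of the course setup.

```shell
# Placeholder identity so the sketch runs even without a global git config.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
cd "$(mktemp -d)"

# Stand-in for the class repository: a bare repo named ams209.git with one commit.
git init -q seed
cd seed
echo "syllabus" > README.txt
git add README.txt
git commit -q -m "initial commit"
cd ..
git clone -q --bare seed ams209.git

# Case 1: with "./" the files land directly in the (empty) current directory.
mkdir inplace
cd inplace
git clone -q ../ams209.git ./
cd ..

# Case 2: without a target, git creates a subdirectory named after the repository.
mkdir parent
cd parent
git clone -q ../ams209.git
cd ..

ls inplace    # README.txt
ls parent     # ams209
```

Note that cloning into ./ only works when the current directory is empty.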
Keep your cloned git repo updated/synced with the course repo¶
The files in the class repository hosted remotely on the SOE git server will keep changing as the quarter progresses, with new notes, sample programs, homework sets, etc. To bring these changes over to your cloned copy, all you need to do is
$ cd ams209
$ git fetch origin
$ git merge origin/master
The git fetch command instructs git to fetch any changes from origin,
which points to the remote repository (e.g., SOE servers, Bitbucket,
or Github; the riverdance SOE server in the current example) that you
originally cloned from. In the merge command, origin/master refers
to the master branch in this repository (which is the only branch that
exists for this particular repository). This merges any changes
retrieved into the files in your current working directory.
Remark You need to be online to run the git fetch origin command,
which downloads all the up-to-date changes from the remote repository
origin and records them locally under origin/master; it does not yet
change your local working branch master. Once you have done git fetch,
your computer can be offline when you run git merge origin/master
to integrate those downloaded changes into your local master branch.
The last two commands can be combined as:
$ git pull origin master
or simply:
$ git pull
because, in a fresh clone, the local master branch is set up to follow
origin/master, so origin and master are the defaults.
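The two-step update can be tried out without a network connection. The sketch below is a minimal illustration under stated assumptions: a local bare repository named course.git plays the role of the remote server, and the directories instructor and student, the file notes.txt, and the "Demo User" identity are all made up for the example.

```shell
# Placeholder identity so the sketch runs even without a global git config.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
cd "$(mktemp -d)"

# Local stand-in for the course repository (a bare repo plays the role of origin).
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
echo "lecture 1" > seed/notes.txt
git -C seed add notes.txt
git -C seed commit -q -m "lecture 1"
git clone -q --bare seed course.git

# Two clones: one for the instructor, one for the student.
git clone -q course.git instructor
git clone -q course.git student

# The instructor publishes a new commit.
echo "lecture 2" >> instructor/notes.txt
git -C instructor commit -q -am "lecture 2"
git -C instructor push -q origin master

# The student downloads it without touching any local files...
cd student
git fetch -q origin
pending="$(git rev-list --count master..origin/master)"
echo "commits waiting to be merged: $pending"

# ...and then integrates it into the local master branch.
git merge -q origin/master
last="$(tail -n 1 notes.txt)"
echo "last line of notes.txt after the merge: $last"
```

Between the fetch and the merge, the student's files are unchanged; the new commit is only visible through origin/master.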
There are three terms above: origin, master, and origin/master.
Let’s now give clear definitions of them:
- origin: a remote repository that exists over the network (e.g., SOE servers, Bitbucket, Github)
- master: a local branch (e.g., your local working branch after cloning from origin)
- origin/master (or equivalently, remotes/origin/master): a remote-tracking branch, that is, a local copy of the branch named master on the remote named origin.
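These three names can be inspected directly in any clone. The following sketch is an illustration only: the bare repository remote.git, the clone work, and the "Demo User" identity are hypothetical stand-ins for a real remote and a real user.

```shell
# Placeholder identity so the sketch runs even without a global git config.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
cd "$(mktemp -d)"

# Build a tiny remote and clone it.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo "hello" > seed/file.txt
git -C seed add file.txt
git -C seed commit -q -m "initial commit"
git clone -q --bare seed remote.git
git clone -q remote.git work
cd work

# origin: the name of the remote, together with the location it points to.
git remote -v

# master and origin/master: a local branch and a remote-tracking branch.
git branch -a

# Which remote branch does the local master follow by default?
upstream="$(git rev-parse --abbrev-ref 'master@{upstream}')"
echo "master tracks: $upstream"

# All of these names are labels for commits; right after cloning they agree.
local_commit="$(git rev-parse master)"
tracking_commit="$(git rev-parse origin/master)"
long_name_commit="$(git rev-parse remotes/origin/master)"
```

After new commits are made locally, master moves ahead while origin/master stays where it was until the next fetch or push.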
A couple of frequently used examples are below:
- The syntax to push (or pull) commits made on your local branch to (or from) a remote repo:
    $ git push <REMOTENAME> <BRANCHNAME>
    $ git pull <REMOTENAME> <BRANCHNAME>
  For example, to push (or pull) your local changes in your local master branch to (or from) the remote master branch in the origin repo:
    $ git push origin master (or simply, git push)
    $ git pull origin master (or simply, git pull)
- The syntax to retrieve all the updates made to a remote repository (e.g., origin) without merging those changes into your own branch:
    $ git fetch <REMOTENAME>
  A default example is:
    $ git fetch origin (or simply, git fetch)
- The syntax to merge your local changes with changes made by others:
    $ git merge <REMOTENAME>/<BRANCHNAME>
  A default example is:
    $ git merge origin/master
- As we’ve seen already, the last two git commands combined together:
    $ git fetch origin
    $ git merge origin/master
  are equivalent to:
    $ git pull origin master (or simply git pull)
Remark To read more about origin, master, and origin/master, please read the following articles: article 1, article 2.
Creating your own Bitbucket repository¶
In addition to using the class repository, you can create your own repository either on one of the SOE servers or on Bitbucket. As the first option, if you wish to set up your own repo on the SOE servers, please follow the instructions here.
Let’s take a look at the second option and see how you can set up your git repo on an off-campus remote host such as Github or Bitbucket. In the rest of this section, we are going to use Bitbucket as our off-campus host. It is possible to use git for your own work without creating a repository on a hosted site (such as Github, Bitbucket, or SOE servers), but there are several reasons to create a remote repo:
- You should learn how to use Bitbucket for more than just pulling changes.
- You will use this repository to “submit” your solutions to homeworks. You will give the instructor and TA permission to clone your repository so that we can grade the homework (others will not be able to clone or view it unless you also give them permission).
- It is recommended that after the class ends you continue to use your repository as a way to back up your important work on another computer (with all the benefits of version control too!). At that point, of course, you can change the permissions so the instructor and TA no longer have access.
Below are the instructions for creating your own repository. Note that this should be a private repository so nobody can view or clone it unless you grant permission.
Anyone can create a free private repository on Bitbucket. Note that you can also create an unlimited number of public repositories free at Bitbucket, which you might want to do for open source software projects, or for classes like this one.
Remark To make free open access repositories that can be viewed by anyone, Github is recommended, which allows an unlimited number of open repositories and is widely used for open source projects.
Remark Please take a look at an article comparing Bitbucket and GitHub
Remark A good graphical tutorial is available at tutorial 1, and tutorial 2.
Getting used to your own local git repo¶
We will clone your Bitbucket repository and check that testfile.txt has been created and modified as directed below. If you use one of the SOE servers to host your remote repository, please follow the instructions and jump to Step 9 below.
On the machine you’re working on:
$ git config --global user.name "Your Name"
$ git config --global user.email you@example.com
These will be used when you commit changes. If you don’t do this, you might get a warning message the first time you try to commit.
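The same settings can also be stored for a single repository by leaving off --global, and they can be read back at any time. The sketch below uses a throwaway repository named demo (a made-up name) so that it does not modify your global configuration; the name and email are the same placeholders used above.

```shell
cd "$(mktemp -d)"
git init -q demo
cd demo

# Without --global, the setting is stored in this repository's .git/config only.
git config user.name "Your Name"
git config user.email you@example.com

# Read the values back to confirm what git will record in commits made here.
name="$(git config user.name)"
email="$(git config user.email)"
echo "commits here will be attributed to: $name <$email>"
```

A repository-level value takes precedence over the global one, which can be handy if you use different email addresses for different projects.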
Go to http://bitbucket.org/ and click on “Sign up now” if you don’t already have an account.
Fill in the form, and make sure you remember your username and password.
You should then be taken to your account. Click on “Create” next to “Repositories”.
You should now see a form where you can specify the name of a repository and a description. The repository name need not be the same as your user name (a single user might have several repositories). For example, the class repository is named ams209-fall-2016, owned by user dongwook159. To avoid confusion, you should probably not name your repository ams209-fall-2016.
You should stick to lower case letters and numbers in your repository name, e.g. ams209-ucsc or ams209-scicomp might be good choices. Upper case and special symbols such as underscore sometimes get modified by bitbucket and the repository name you try to paste into the homework submission form might not agree with what bitbucket expects.
Don’t name your repository homework1 because you will be using the same repository for other homeworks later in the quarter.
Make sure you click on “Private” at the bottom. Also turn “Issue tracking” and “Wiki” on if you wish to use these features.
Click on “Create repository”.
You should now see a page with instructions on how to clone your (currently empty) repository. In a Unix window, cd to the directory where you want your cloned copy to reside, and perform the clone by typing in the clone command shown. This will create a new directory with the same name as the repository.
You should now be able to cd into the directory this created.
The directory you are now in will appear empty if you simply do:
$ ls
But it will look slightly different if you try:
$ ls -a
./  ../  .git/
The -a option causes ls to list files starting with a dot, which are normally suppressed. See Basic Unix/Linux Commands for a discussion of ./ and ../. The directory .git is the directory that stores all the information about the contents of this directory and a complete history of every file and every change ever committed. You shouldn’t touch or modify the files in this directory because they are used by git to control versions, commit changes and their history, etc.
Add a new file to your directory:
$ cat > testfile.txt
This is a new file
with only two lines so far.
^D
The Unix cat command simply redirects everything you type on the following lines into a file called testfile.txt. This goes on until you type a <ctrl>-d (the 4th line in the example above). After typing <ctrl>-d you should get the Unix prompt back. Alternatively, you could create the file testfile.txt using your favorite text editor (see Items for the Class).
To see status of your folder, type:
$ git status -s
The response should be:
?? testfile.txt
The ?? means that this file is not under revision control. The -s flag results in this short status list. Leave it off for more information.
To put the file under revision control, type:
$ git add testfile.txt
$ git status -s
A  testfile.txt
The A means it has been added. However, at this point we have not yet taken a snapshot of this version of the file. To do so, type:
$ git commit -m "My first commit of a test file."
The string following the -m is a comment about this commit that may help you remember later why you committed new or changed files.
You should get a response like:
[master 31cb6ed] My first commit of a test file.
 1 file changed, 2 insertions(+)
 create mode 100644 testfile.txt
We can now see the status of our directory via:
$ git status
# On branch master
nothing to commit (working directory clean)
Alternatively, you can check the status of a single file with:
$ git status testfile.txt
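The status codes seen so far can be collected in one short experiment. This is only a sketch in a throwaway repository; the directory name demo and the "Demo User" identity are assumptions made for the example, while the file name and commit message follow the text above.

```shell
# Placeholder identity so the sketch runs even without a global git config.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
cd "$(mktemp -d)"
git init -q demo
cd demo
printf 'This is a new file\nwith only two lines so far.\n' > testfile.txt

untracked="$(git status -s)"      # ?? testfile.txt
git add testfile.txt
staged="$(git status -s)"         # A  testfile.txt
git commit -q -m "My first commit of a test file."
clean="$(git status -s)"          # (empty: nothing to report)

printf 'before add:   [%s]\nafter add:    [%s]\nafter commit: [%s]\n' \
    "$untracked" "$staged" "$clean"
```

An empty short status is the compact equivalent of the "nothing to commit" message printed by the long form.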
You can get a list of all the commits you have made (only one so far) using:
$ git log
commit 31cb6ed38310eed36f47d3d3aed769e03da540c9
Author: dongwook159 <dlee79@ucsc.edu>
Date:   Fri Sep 25 00:04:14 2016 -0700

    My first commit of a test file.
The number 31cb6ed38310eed36f47d3d3aed769e03da540c9 above is the “name” of this commit and you can always get back to the state of your files as of this commit by using this number. You don’t have to remember it; you can use commands like git log to find it later.
Yes, this is a number... it is a 40 digit hexadecimal number, meaning it is in base 16 so in addition to 0, 1, 2, ..., 9, there are 6 more digits a, b, c, d, e, f representing 10 through 15. This number is, for all practical purposes, unique among all commits you will ever do (or anyone has ever done, for that matter). It is computed from the state of all the files in this snapshot, together with information about the commit itself such as the author, date, and message, using the SHA-1 cryptographic hash function, and is called a SHA-1 hash for short.
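The length of the commit name and its abbreviated form can be checked with git rev-parse. The sketch below assumes a repository that uses git's default SHA-1 object format; the directory demo, the file contents, and the "Demo User" identity are made up, so the hash you see will differ from the one shown in the text.

```shell
# Placeholder identity so the sketch runs even without a global git config.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
cd "$(mktemp -d)"
git init -q demo
cd demo
echo "some content" > testfile.txt
git add testfile.txt
git commit -q -m "My first commit of a test file."

# Full name of the latest commit and an abbreviated form of it.
full="$(git rev-parse HEAD)"
short="$(git rev-parse --short HEAD)"
echo "full:  $full (${#full} hex digits)"
echo "short: $short"

# Either form can be used wherever git expects a commit.
git log --oneline -1 "$short"
```

The abbreviated form is simply a prefix of the full hash that is long enough to be unambiguous within the repository.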
Modifying a file¶
Now let’s modify this file:
$ cat >> testfile.txt
Adding a third line
^D
Here the >> tells cat that we want to add on to the end of an existing file rather than creating a new one. (Or you can edit the file with your favorite editor and add this third line.)
Now try the following:
$ git status -s
 M testfile.txt
The M indicates this file has been modified relative to the most recent version that was committed.
To see what changes have been made, try:
$ git diff testfile.txt
This will produce something like:
diff --git a/testfile.txt b/testfile.txt
index d80ef00..fe42584 100644
--- a/testfile.txt
+++ b/testfile.txt
@@ -1,2 +1,3 @@
 This is a new file
 with only two lines so far
+Adding a third line
The + in front of the last line shows that it was added. The two lines before it are printed to show the context. If the file were longer, git diff would only print a few lines around any change to indicate the context.
Now let’s try to commit this changed file:
$ git commit -m "added a third line to the test file"
This will fail! You should get a response like this:
# On branch master
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#       modified:   testfile.txt
#
no changes added to commit (use "git add" and/or "git commit -a")
git is saying that the file testfile.txt is modified but that no files have been staged for this commit.
If you are used to Mercurial, git has an extra level of complexity (but also flexibility): you can choose which modified files will be included in the next commit. Since we only have one file, there will not be a commit unless we add this to the index of files staged for the next commit:
$ git add testfile.txt
Note that the status is now:
$ git status -s
M  testfile.txt
This is different in a subtle way from what we saw before: The M is in the first column rather than the second, meaning it has been both modified and staged.
We can get more information if we leave off the -s flag:
$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#       modified:   testfile.txt
#
Now testfile.txt is on the index of files staged for the next commit.
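The difference between a modified file and a staged file can also be seen with git diff, which by default compares the working directory against the index, and git diff --cached, which compares the index against the last commit. The following is a minimal sketch in a throwaway repository; the directory demo, the file contents, and the "Demo User" identity are assumptions made for illustration.

```shell
# Placeholder identity so the sketch runs even without a global git config.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
cd "$(mktemp -d)"
git init -q demo
cd demo
printf 'line 1\nline 2\n' > testfile.txt
git add testfile.txt
git commit -q -m "first commit"

# Modify the file but do not stage it yet.
echo "line 3" >> testfile.txt
before="$(git status -s)"                         # " M testfile.txt": modified, not staged
unstaged_before="$(git diff --name-only)"         # testfile.txt

# Stage the change.
git add testfile.txt
after="$(git status -s)"                          # "M  testfile.txt": modified and staged
unstaged_after="$(git diff --name-only)"          # empty: nothing left unstaged
staged_after="$(git diff --cached --name-only)"   # testfile.txt

printf 'before add: [%s]\nafter add:  [%s]\n' "$before" "$after"
```

If you edit the file again after staging it, both columns show an M, because the index and the working directory then each differ from their reference point.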
Now we can do the commit:
$ git commit -m "added a third line to the test file"
[master 51918d7] added a third line to the test file
 1 file changed, 1 insertion(+)
Try doing:
$ git log
or:
$ git log --graph
now and you should see something like:
commit 271bd14e5b8d68840e7e6481ad7e99e5708e50e7
Author: dongwook159 <dlee79@ucsc.edu>
Date:   Sun Sep 25 00:02:34 2016 -0700

    added a third line to the test file

commit 0c20925f98b5d76d0b973d25fdc78fd43941792e
Author: dongwook159 <dlee79@ucsc.edu>
Date:   Sun Sep 25 00:01:25 2016 -0700

    My first commit of a test file.
If you want to revert your working directory back to the first snapshot you could do:
$ git checkout 31cb6ed383
Note: checking out '31cb6ed383'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

HEAD is now at 31cb6ed383... My first commit of a test file.
Take a look at the file; it should be back to the state with only two lines. You are now in a situation called a detached HEAD state. To learn more about it and how to fix the situation, take a look at the following articles:
Note that you don’t need the full SHA-1 hash code; the first few digits are enough to uniquely identify it.
You can go back to the most recent version with:
$ git checkout master
Switched to branch 'master'
We won’t discuss branches, but unless you create a new branch, the default name for your main branch is master and this checkout command just goes back to the most recent commit.
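Whether HEAD is currently attached to a branch can be tested with git symbolic-ref, and an old version of a file can be viewed with git show without checking anything out. The sketch below is an illustration in a throwaway repository; the directory demo, the file contents, and the "Demo User" identity are made up for the example.

```shell
# Placeholder identity so the sketch runs even without a global git config.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
cd "$(mktemp -d)"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
printf 'line 1\nline 2\n' > testfile.txt
git add testfile.txt
git commit -q -m "first commit"
first="$(git rev-parse --short HEAD)"
echo "line 3" >> testfile.txt
git commit -q -am "second commit"

# Peek at the old version of the file without moving HEAD at all.
old_version="$(git show "$first":testfile.txt)"

# Check out the old commit: HEAD no longer points at a branch.
git checkout -q "$first"
if git symbolic-ref -q HEAD > /dev/null; then state="on a branch"; else state="detached"; fi
echo "after checkout $first: $state"

# Return to the tip of master.
git checkout -q master
branch="$(git symbolic-ref --short HEAD)"
echo "after checkout master: on $branch"
```

If you only want to look at an earlier version, git show avoids the detached HEAD state altogether.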
So far you have been using git to keep track of changes in your own directory, on your computer. None of these changes have been seen by Bitbucket, so if someone else cloned your repository from there, they would not see testfile.txt.
Now let’s push these changes back to the Bitbucket repository. First do:
$ git status
to make sure there are no changes that have not been committed. This should report that there is nothing to commit.
Now do:
$ git push -u origin master
This will prompt for your Bitbucket password and should then print something indicating that it has uploaded these two commits to your bitbucket repository.
Not only has it copied the 1 file over, it has added both changesets, so the entire history of your commits is now stored in the repository. If someone else clones the repository, they get the entire commit history and could revert to any previous version, for example.
To push future commits to bitbucket, you should only need to do:
$ git push
and by default it will push your master branch (the only branch you have, probably) to origin, which is the shorthand name for the place you originally cloned the repository from. To see where this actually points to:
$ git remote -v
This lists all remotes. By default there is only one, the place you cloned the repository from. (Or none if you had created a new repository using git init rather than cloning an existing one.)
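The push steps can be rehearsed locally before trying them against Bitbucket. In the sketch below, an empty bare repository named hosted.git stands in for the freshly created hosted repository; that name, the clone directory work, and the "Demo User" identity are assumptions for illustration, and no password prompt appears because nothing leaves your machine.

```shell
# Placeholder identity so the sketch runs even without a global git config.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
cd "$(mktemp -d)"

# An empty bare repository stands in for the newly created hosted repo.
git init -q --bare hosted.git

git clone -q hosted.git work 2> /dev/null    # git warns that the clone is empty
cd work
git symbolic-ref HEAD refs/heads/master      # make sure the branch is called master
echo "hello" > testfile.txt
git add testfile.txt
git commit -q -m "My first commit of a test file."

# First push: -u records origin/master as the upstream of the local master branch.
git push -q -u origin master

# Later pushes can then omit the remote and branch names.
echo "more" >> testfile.txt
git commit -q -am "added a line to the test file"
git push -q

# Show where origin points and how many commits the hosted repo now holds.
git remote -v
hosted_count="$(git --git-dir=../hosted.git rev-list --count master)"
echo "commits stored in the hosted repository: $hosted_count"
```

Both commits arrive in the hosted repository, which is what the “Commits” tab on the web page reflects in the real setting.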
Check that the file is in your Bitbucket repository: Go back to that web page for your repository and click on the “Source” tab at the top. It should display the files in your repository and show testfile.txt.
Now click on the “Commits” tab at the top. It should show that you made two commits and display the comments you added with the -m flag with each commit.
If you click on the hex-string for a commit, it will show the change set for this commit. What you should see is the file in its final state, with three lines. The third line should be highlighted in green, indicating that this line was added in this changeset. A line highlighted in red would indicate a line deleted in this changeset.
Rolling back to a previous state¶
Let’s take a look at the case where you do not like the last change you made to your repo, and you want to revert your repo back to a previous state, say, from
commit 1b82c21688befa80560807247594d73768d64f0a (the current, unsatisfactory revision)
to
commit c27d1bdf0098efe59aa25f809a719ce4fa910fef (the previous revision you wish to roll back to).
In this case, there are two ways to roll back your repo to the previous state.
Firstly, if you do:
$ git reset --hard c27d1bdf0098
it will move your local branch back to the previous state and reset your local
files to match, dropping the unwanted commit from your local history. This might
look OK, but if the unwanted commit 1b82c21688befa80560807247594d73768d64f0a
has already been pushed, a later git push of the rolled-back branch will be rejected,
and forcing it would cause trouble for anyone on your team who already has the new history.
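Alternatively, if you do:
$ git reset --soft c27d1bdf0098
it will also move your local branch back to the previous state, but it leaves your local files untouched and keeps the undone changes staged, ready to be committed again. Because this still rewrites your local history, pushing afterwards runs into the same problem as above if the unwanted commit has already been pushed. To undo a commit that your team members may already have, use git revert 1b82c21688be instead: it adds a new commit that reverses the unwanted changes and leaves the existing history intact, so you can push it without causing any conflicts in histories among your project team members.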
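What each way of rolling back does to your files and to your history can be verified in a throwaway repository. The sketch below compares git reset --soft, git reset --hard, and git revert; the directory demo, the file file.txt, the commit messages, and the "Demo User" identity are made up for the example.

```shell
# Placeholder identity so the sketch runs even without a global git config.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
cd "$(mktemp -d)"
git init -q demo
cd demo
echo "good" > file.txt
git add file.txt
git commit -q -m "good state"
good="$(git rev-parse HEAD)"
echo "bad" >> file.txt
git commit -q -am "unwanted change"
bad="$(git rev-parse HEAD)"

# --soft: the branch moves back, but the file and the staged change are kept.
git reset -q --soft "$good"
soft_status="$(git status -s)"                # "M  file.txt"
soft_tail="$(tail -n 1 file.txt)"             # bad

# Put the branch back on the unwanted commit, then try --hard:
# the branch moves back and the file is restored as well.
git reset -q --hard "$bad"
git reset -q --hard "$good"
hard_tail="$(tail -n 1 file.txt)"             # good
hard_count="$(git rev-list --count HEAD)"     # 1 commit left on the branch

# revert: history is kept and a new commit undoes the unwanted one.
git reset -q --hard "$bad"
git revert --no-edit "$bad" > /dev/null
revert_tail="$(tail -n 1 file.txt)"           # good
revert_count="$(git rev-list --count HEAD)"   # 3 commits: good, unwanted, revert

echo "soft: $soft_tail | hard: $hard_tail | revert: $revert_tail ($revert_count commits)"
```

Only the revert variant adds to the history instead of removing from it, which is why it is the safe choice once a commit has been shared.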
In case you want to recover files that are deleted locally, you can do:
$ git ls-files -d | xargs git checkout --
Similarly, to recover modified files back to the previous states:
$ git ls-files -m | xargs git checkout --
See more examples at https://git-scm.com/docs/git-ls-files.
Remark Wait a minute... what is the command xargs above??? It is one of the most powerful linux commands, especially when combined with other commands. Please take a look at an article for more.
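One caveat with the pipelines above is that xargs splits its input on white space, so a file name containing spaces would be broken into several arguments. A hedged variant, assuming an xargs that supports the -0 option (both the GNU and BSD versions do), passes the names separated by NUL bytes instead. The directory demo, the file names, and the "Demo User" identity below are made up for illustration.

```shell
# Placeholder identity so the sketch runs even without a global git config.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
cd "$(mktemp -d)"
git init -q demo
cd demo
echo "a" > "plain.txt"
echo "b" > "name with spaces.txt"
git add .
git commit -q -m "two files"

# Delete both tracked files by accident.
rm "plain.txt" "name with spaces.txt"
deleted="$(git ls-files -d)"
echo "deleted files:"
echo "$deleted"

# -z and -0 separate the names with NUL bytes, so spaces in file names are safe.
git ls-files -d -z | xargs -0 git checkout --

ls
```

The same -z and -0 pair can be used with git ls-files -m to restore modified files.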
In some cases, you may wish to discard all your local changes and have git overwrite your local files. In general, if you have changes in your local files that git sees as potential conflicts, git pull will not allow you to bring in the most recent updates committed to the repository by others. Git will give you errors such as:
error: Your local changes to the following files would be overwritten by merge:
or:
error: The following untracked working tree files would be overwritten by merge:
In this case, if you don’t mind overwriting your local changes with whatever is available in the remote repository, you can do the following:
$ git fetch --all
$ git reset --hard origin/master
or you can combine the two in a single line command using &&:
$ git fetch --all && git reset --hard origin/master
Again, because of the --hard option, all of your local changes to tracked files will be lost, and any local commits that haven’t been pushed will be removed from your branch. So do this only if you know what you’re doing and trust the recent updates in the remote git repo.
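If you are not completely sure that the local work is disposable, one cautious option is to leave a branch pointing at the current state before resetting. The following is only a sketch of that idea, not part of the course instructions: the branch name backup-before-reset, the bare repository course.git, the clone work, and the "Demo User" identity are all hypothetical.

```shell
# Placeholder identity so the sketch runs even without a global git config.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
cd "$(mktemp -d)"

# Local stand-in for the remote repository and a clone of it.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo "v1" > seed/notes.txt
git -C seed add notes.txt
git -C seed commit -q -m "v1"
git clone -q --bare seed course.git
git clone -q course.git work
cd work

# A local commit that was never pushed.
echo "my local edit" >> notes.txt
git commit -q -am "local work"
local_commit="$(git rev-parse HEAD)"

# Keep a label on the current state before discarding it.
git branch backup-before-reset

git fetch -q --all
git reset -q --hard origin/master

# master now matches the remote, but the local work is still reachable.
master_tail="$(tail -n 1 notes.txt)"                    # v1
backup_commit="$(git rev-parse backup-before-reset)"
echo "master is at: $master_tail, backup branch keeps commit $backup_commit"
```

The backup branch can be deleted later with git branch -D once you are sure nothing on it is needed. Note that a branch only protects committed work; uncommitted edits are still lost by the hard reset.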
Summary¶
The commands we discussed so far will give you a good start with git. As you get used to git, you will learn that only a handful of git commands are needed in many cases. This is particularly true unless you work on a project with many other project members over the network. In our class it will primarily be you alone who checks changes in to and out of your central repo hosted on Bitbucket. Another frequent use will be to sync your local repo with the course repo on a regular basis.
In this simple project environment, you will most likely need to use the following commands:
$ git status
$ git add
$ git commit
$ git push
$ git pull
Quick Exercise¶
Suppose you are working in your git repo and you have just
created a new file, roster.txt:
$ touch roster.txt
After editing the file, you check the status of roster.txt:
$ git status -s ./
and you see:
?? roster.txt
As you keep working in this way, you find that there are a bunch of
such newly created files with ?? marks, for instance:
?? roster.txt
?? roster1.txt
?? roster2.txt
?? roster3.txt
?? roster4.txt
At this point you can either add them to git by doing:
$ git add roster.txt roster1.txt roster2.txt roster3.txt roster4.txt
or delete them if you don’t need them anymore:
$ rm roster.txt roster1.txt roster2.txt roster3.txt roster4.txt
When doing this, you realize that you need to type (or copy) each and every file name of them one by one. Clearly this will be a very tedious task if there are millions of such files.
- Can you come up with a quick way of doing this by using Linux commands?
- Can you make your own alias command by adding it to your .bash_profile or .bashrc?