Have you ever:
You aren't the first person to have these problems. In fact, there is a big set of tools to handle these issues. Welcome: version control.
This is a 10 minute introduction to git, one specific version control system. The tutorial has a very specific goal: to teach you the general concepts of version control, and enough git to use it on your own personal projects. It doesn't go into the full power of git or version control systems (that's the next tutorial).
Keep in mind: I can't teach you git, but I can give you ideas and your curiosity can teach you git.
After completing this tutorial, you should be able to:
Let's look at science without using VCS.
How often have you changed something and drastically changed results, and then you have to spend hours figuring out what you just did?
With the copying system, at least you have some backups. But you have to copy it yourself, and you end up with code.v2.py, code.v3.py, code.final.py, code.submitted.py, code.submitted.final.py, and so on. And then, once you have all these files, you have to keep them organized, and getting any information out of them is a lot of work, too. You probably won't make backups often enough, either.
People often send files back and forth for papers (it could be used for code, but in that case it's probably easier to just work by yourself). You end up with the filename game again, and only one person can edit at once.
Pros use version control for everything: code, papers (LaTeX), websites, notes, etc. All my papers are in version control, and I can even make PDFs showing what changed between revisions. My website is in git: I record changes and "push" to the server to automatically update it. People have written git add-ons for distributed storage of large files (git-annex). These tutorials are stored in a repository.
This tutorial doesn't talk about how to install git! However, this is a very well documented thing, so you should have no problem doing it yourself. If you have a shared computer, it probably already has git installed. You can download it for almost any operating system here:
git is not just one program: there are also graphical user interface (GUI) git clients, which can provide a nicer interface for certain tasks. In this tutorial, I focus on the concepts of git and the command line. At the end I will demonstrate some other programs.
Let's say you want to make a new git repository for your project. The git init command does this.
$ cd /path/to/your/project/
$ git init
Everything is stored in the .git directory within your project.
Files are only updated when you run a git command.
The git repository format is simple in concept but complicated in its details, and each VCS works differently. We don't need to worry about it now.
Once you run git init, you won't notice any changes. The only thing that will happen is the creation of a .git directory.
No versions are saved, and your files are not touched, unless you run a git command yourself. This makes git relatively safe. Nothing happens in the background without you knowing. If you delete the .git directory, it's as if it was never made.
Notice how easy this is. You should be doing it for every project.
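As a rough illustration of the points above, the following sketch runs git init in a throwaway directory and then deletes the .git directory again. The temporary directory and the file code1.py are stand-ins chosen for this example, not part of any real project.

```shell
# Minimal sketch: git init only adds a .git directory, and removing it
# undoes everything. Runs in a throwaway directory (an assumption made
# for this example) so no real project is touched.
set -e
project=$(mktemp -d)
cd "$project"
echo "print('hello')" > code1.py

git init --quiet
ls -A                 # lists code1.py plus the new .git directory

rm -rf .git           # remove the repository; the project files stay
ls -A                 # only code1.py is left
```

Your own files are the same before and after; only the .git directory comes and goes.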
Git doesn't automatically track anything. You have to tell it which files are important (to track them).
Use git add to make git see and track files.
$ git add code1.py mod2.py README.txt
You have to use git add here, but git add has another use that I am not going to discuss in this tutorial. This is known as "staging" things to the "index". It can be useful, but for now it's an unnecessary complication that you'll learn about when reading other things.
You will usually run git status to check if you forgot anything (next section).
Check what is going on with git status
Provides a summary of modified files
$ git status
# On branch master
# ...
# Changes to be committed:
#
# new file: README.txt
# new file: code1.py
# new file: mod2.py
git status shows what the current state is. You will see a section for "files staged for commit", "modified files", and "untracked files". "Untracked" files are ones you have not yet added with git add. "Modified" files are tracked files which you have edited since the last commit. "Staged" files are ones you have run git add on but not yet committed; for these, you can use git diff --cached to see the diff.
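To see all three states at once, here is a small sketch that sets up one file in each state. The repository lives in a temporary directory, and the file names and the "Example User" identity are made up for the demonstration.

```shell
# Sketch: one untracked, one modified and one staged file in a scratch repo.
set -e
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "v1" > code1.py
git add code1.py
git commit --quiet -m "Initial commit"

echo "v2" > code1.py          # tracked and edited since the commit: modified
echo "notes" > README.txt
git add README.txt            # added but not yet committed: staged
echo "scratch" > tmp.dat      # never added: untracked

git status                    # shows all three sections
git diff --cached             # shows only the staged README.txt
```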
$ git commit
This is what you do on normal working days:
Make changes to your project
Use git status to see what is changed / what is added and waiting to be committed.
$ git status
Check git diff to see what is changed (new) since the last commit.
Use git commit to make commits.
Why should you look at diffs? First, and most importantly, it lets you check yourself. You can see all changes you have made since your last checkpoint (commit), to see if it makes sense when put together. This may be a bit of extra work, but it is very important for good development practices.
Commit specific files
$ git commit -a                       # commit all changes
$ git commit file1.txt calculate.py   # commit specific files
$ git commit -p                       # commit specific changes (it will ask you)
$ git commit -p file1.txt             # commit specific changes in specific file
You can commit in different ways
You will be asked for a commit message. (Advice later)
This is the last step. Before doing this, check status and diffs. After doing this, check status and make sure everything is clean.
We'll talk about how to structure and group changes into commits later.
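The daily cycle described above (change, status, diff, commit, status again) might look like the sketch below. It uses a scratch repository and a made-up file calculate.py, and passes the message with -m so that no editor opens; in normal use you would let the editor come up.

```shell
# Sketch of one round of the daily workflow in a scratch repository.
set -e
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "result = 1 + 1" > calculate.py
git add calculate.py
git commit --quiet -m "Initial commit"

echo "result2 = 2 + 2" >> calculate.py     # make a change
git status --short                         # what is changed?
git diff                                   # what exactly is new?
git commit --quiet -a -m "Add second calculation"
git status --short                         # prints nothing when clean
git log --oneline
```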
To view history in git, run:
$ git log
$ git log --oneline                            # abbreviated format
$ git log --patch                              # also show patches
$ git log --stat                               # also show stats
$ git log --oneline --graph --decorate --all   # for later use
You will have to try each of these yourself to see what they do
COMMIT_HASH is a hexadecimal string like 86d026287189acd341e7fb2ee88063375e2e1e73 or 86d026 (short). It's a unique identifier, and everything git knows about has one.
Show what changed since last commit
$ git diff
Show what changed in any one commit
$ git show COMMIT_HASH
Show what changed between any two commits
$ git diff HASH1..HASH2
Show old version of a file:
$ git show COMMIT_HASH:file1.txt
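The history commands above can be tried safely on a tiny repository. This sketch makes two commits to a hypothetical file1.txt and stores their hashes in the shell variables first and second (names chosen for this example), so that there are real values to use for COMMIT_HASH, HASH1 and HASH2.

```shell
# Sketch: two commits, then inspect them with git show and git diff.
set -e
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "alpha" > file1.txt
git add file1.txt
git commit --quiet -m "Add file1"
first=$(git rev-parse HEAD)                # hash of the first commit

echo "beta" >> file1.txt
git commit --quiet -a -m "Extend file1"
second=$(git rev-parse HEAD)               # hash of the second commit

git show "$second"                 # what changed in that one commit
git diff "$first..$second"         # what changed between the two commits
git show "$first:file1.txt"        # the old version of the file
```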
Everything today will be done via ssh on triton
To connect to triton, run:
$ ssh USERNAME@triton.aalto.fi
$ cd $WRKDIR
If you are not using your own account, make a subdirectory and change to it.
Git has a configuration file stored in your home directory at ~/.gitconfig. This has options that are shared among all of your repositories. This can make your life easier.
You should at least set your name and email address wherever you work.
On triton, copy and paste the following commands into a shell (don't paste these into the file yourself - git will do that itself). Don't forget to change the name/email to your own.
$ git config --global user.name "Your Name"
$ git config --global user.email your.name@domain.fi
$ git config --global color.ui auto
$ git config --global alias.log1a "log --oneline --graph --decorate --all"
$ git config --global alias.st "status"
$ git config --global alias.cm "commit"
You can also set your preferred editor if you don't want to use vim
$ git config --global core.editor "emacs"
Bonus: look at the git manual page for the config file and see the types of things that are available:
$ man git-config
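If you would like to see what git config --global actually writes before touching your real settings, one option is to point HOME at a throwaway directory first. This is only a sketch for experimenting: the temporary HOME is an assumption of this example, and on triton you would run the commands above as given.

```shell
# Sketch: run the config commands against a throwaway HOME so the real
# ~/.gitconfig is left alone (experiment only).
set -e
export HOME=$(mktemp -d)
unset XDG_CONFIG_HOME          # make sure git does not pick another config location

git config --global user.name "Your Name"
git config --global user.email your.name@domain.fi
git config --global alias.st "status"

git config --global --get alias.st     # look up a single setting
cat "$HOME/.gitconfig"                 # the file git wrote for us
```

The file is plain text, which is why the options in it are shared by every repository you use from that account.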
In this exercise, we will go to a directory with a simple project, make a new git repository, and go through the steps needed to make a commit. Copy (cp -r) the prototype to your working directory. The base is in /triton/scip/git/git-1/.
Change to the directory
$ cd ~/scip/git/git-1/
Run git init to create a new repository in a directory.
$ git init
Initialized empty Git repository in /home/darstr1/scip/git-1/.git/
Everything is stored in the .git directory within your project. Your files are never modified unless you run a git command that is supposed to modify them.
You need to add all the files you are working on. git doesn't make any guesses: you could have temporary files, backups, and so on that you don't want tracked.
$ git add code1.py mod2.py README.txt
Make your initial commit using git commit. This records all files that have previously been added. An editor will come up. Add the commit message of "Initial commit" at the top of the file and save. (Hint: to save in vim, the default editor, use ESC : w q ENTER)
$ git commit
Check if your commit appears in the log
$ git log
Get the OpenMP Examples repository. We will cover the clone command later, but for now just run this command in your working directory
$ git clone https://github.com/OpenMP/Examples.git
You should now see a new Examples folder. Change into it.
Run git log to see recent changes. You should be able to see the description, author, and date. Try adding the -p or --stat option to get more details.
Run git log README to see recent changes to only the README file. You can limit to certain files this way, and even track them if they have been renamed.
What if you want to see an old version of a file? You can see it using git show commit_id:filename:
$ git show 542c10d:README
Often, you want to know more than just the changes. What happens when you want to know who created a particular line, and when? Well, there's a command for that (obviously). git annotate takes a file, and for every line, shows you who committed it, when it was committed, and the commit hash. You can use this to track down exactly when a bug was introduced, for example.
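To get a feel for the output before trying it on a real project, the sketch below builds a two-commit history with two made-up authors (Alice Example and Bob Example) and a hypothetical file notes.txt, then annotates it.

```shell
# Sketch: two authors each touch one line, then git annotate shows who did what.
set -e
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Alice Example"       # placeholder identity
git config user.email "alice@example.com"

printf 'line one\nline two\n' > notes.txt
git add notes.txt
git commit --quiet -m "Add notes"

# The second commit is attributed to a different (made-up) author.
printf 'line one\nline two, revised\n' > notes.txt
git commit --quiet -a --author="Bob Example <bob@example.com>" -m "Revise line two"

git annotate notes.txt         # one row per line: hash, author, date, content
```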
You should still be in the Examples directory from the previous exercise.
Run git annotate Title_Page.tex to see who has last changed each line. Who is the main author of this file? When was it last modified?
The long hexadecimal numbers are the version numbers. Try to figure out what these git diff commands do:
$ git diff be603ae            # same as git diff be603ae..HEAD
$ git diff a17ad37..be603ae
Make a file called .gitignore and put patterns of things you want to ignore.
*.o
*.pyc
*~
This makes the "git status" output more useful and you generally want to keep your ignore file up to date.
I should really emphasize how important the .gitignore file is! It seems minor, but clean "status" output will really make git much more usable. .gitignore can be checked into version control itself.
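The effect on the status output is easy to check for yourself. This sketch creates a few junk files in a scratch repository (all file names are invented for the example) and compares git status before and after writing the three patterns above to .gitignore.

```shell
# Sketch: status output before and after adding a .gitignore.
set -e
repo=$(mktemp -d)
cd "$repo"
git init --quiet

touch code1.py code1.o module.pyc "code1.py~"
git status --short             # all four files show up as untracked

printf '%s\n' '*.o' '*.pyc' '*~' > .gitignore
git status --short             # only code1.py and .gitignore remain
```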
Due to time constraints and practicality, we will not go into branches and remotes in great detail.
Get a new repository
$ git clone [URL]
Send your changes to server
$ git push
Get changes from server
$ git pull
Commit everything before trying a merge!
You have two things shown: Your version and "their" version.
Read the instructions, git will tell you what to do.
Auto-merging file.txt
CONFLICT (content): Merge conflict in file.txt
Automatic merge failed; fix conflicts and then commit the result.
git diff and git status are your friends - still.
If you forget to finish the resolve, you will have problems later.
git puts markers in the code on the exact lines of conflict:
<<<<<<<
<lines you have written>
=======
<lines they have written>
>>>>>>>
git diff shows the conflicting lines
$ git status   # show the files that are unresolved and resolved.
$ git diff     # show what is unresolved
You need to combine the two versions into one. Look and edit it.
Run the command it says to continue.
$ git add FILE
$ git commit   # remembers where you left off
Finish with git status and git log1a and git diff to make sure everything is there.
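The whole resolve cycle can be rehearsed without a server. The sketch below fakes "their" change with a local side branch named theirs (branches are not covered in this tutorial, so treat that part as scaffolding), merges it to provoke a conflict, and then resolves it with the steps above. The file contents are invented, and --no-edit is used only so that the script does not stop in an editor.

```shell
# Sketch: provoke a merge conflict locally and resolve it.
set -e
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "from scipy.stats import norm" > code1.py
git add code1.py
git commit --quiet -m "Initial commit"

git checkout --quiet -b theirs             # stand-in for the server's history
echo "from scipy.stats import binom" > code1.py
git commit --quiet -a -m "Use binom"

git checkout --quiet -                     # back to the branch we started on
echo "from scipy.stats import gamma" > code1.py
git commit --quiet -a -m "Use gamma"

git merge theirs || true       # reports CONFLICT; a failed merge exits non-zero
git status --short             # code1.py is listed as unmerged (UU)
cat code1.py                   # shows the conflict markers

echo "from scipy.stats import binom, gamma" > code1.py   # combine both sides
git add code1.py               # mark the conflict as resolved
git commit --quiet --no-edit   # finish the merge with the pre-filled message
git log --oneline --graph
```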
In this set of exercises, we will explore git pushing, pulling, and conflict resolution at a very high level. We aren't going to try to cover everything here, but we will see some of the major points. It is better to become familiar with the basics before going too deep into branches, remotes, and conflicts.
Go to http://github.com. Use the search at the top to find a project related to your field.
Go to the project page. Find the "HTTPS Clone URL" on the right side.
Clone the repository
1 | $ git clone https://github.com/igraph/igraph.git
|
Check out the log. How many total commits are there in this repository? (Hint: git log | grep ^commit | wc)
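The hint counts the lines of git log output that start with "commit". As a cross-check, git can also count commits directly with git rev-list --count. The sketch below compares the two on a scratch repository with three commits; the file data.txt is made up for the example.

```shell
# Sketch: two ways of counting commits give the same answer.
set -e
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

for i in 1 2 3; do
    echo "$i" >> data.txt
    git add data.txt
    git commit --quiet -m "Change $i"
done

git log | grep ^commit | wc -l     # count the "commit ..." header lines
git rev-list --count HEAD          # ask git for the number directly
```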
In this exercise, I have set up a simple git repository, all ready to do a pull and make a conflict.
Change to the directory ~/scip/git/git-conflict/.
Run git log, git diff and git status just to make sure that everything is clean and you know what's going on (no untracked changes, no surprises).
Pull changes from the default remote:
$ git pull
You will see a big note about a conflict:
Auto-merging code1.py
CONFLICT (content): Merge conflict in code1.py
Automatic merge failed; fix conflicts and then commit the result.
We will now resolve the conflict. Run git status to see the situation. It should (again) say that code1.py is the file with conflicts:
# Unmerged paths:
# (use "git add/rm <file>..." as appropriate to mark resolution)
#
# both modified: code1.py
Look at git diff. This is an advanced diff with two columns with + signs indicating what comes from each side.
Open code1.py in an editor. You will see conflict marks:
<<<<<<< HEAD
from scipy.stats import gamma
=======
from scipy.stats import binom
>>>>>>> 5de531032424ab6afe5576ee817e0ace9e9937d7
Between <<<<<<< and ======= is what you have done (in HEAD). Between ======= and >>>>>>> is what is changed on the server (in commit 5de5310).
You see that one side imported gamma, and the other imported binom. There's no problem with doing both of these, but since the changes happened on the same line, git doesn't try to guess how to put them together. A more complicated case would be two incompatible edits to the same line.
To resolve this conflict, we need to import both gamma and binom from scipy.stats. Remove the two parts, and the conflict markers, and make one line having all changes together. The top of the file should look like this after you do the resolution:
...
import scipy
from scipy.stats import binom, gamma
import scipy.linalg
import numpy
We will check status to make sure things are OK. Run git diff and see the added and changed lines. This form of diff is particularly useful:
- from scipy.stats import gamma
 -from scipy.stats import binom
++from scipy.stats import binom, gamma
Run git add code1.py to tell git that we are done resolving this conflict and prepare it for committing. Run git status before and after this to see what changes. (Hint: it should change from Unmerged paths: to Changes to be committed:.)
Run git commit. An editor will open with a pre-filled commit message (it remembers that you were doing a merge). You can adjust this if you want, for example if you need to explain how you reconciled two opposing features. Since there is nothing to add, just save and close.
Run git log and you should see that all changes are recorded, as well as the merge commit.
In this exercise, you will clone a repository from github, add and edit some files, and send the change back. This is a full cycle of what you would do if you are contributing to a real project.
First, clone the repository. The repository you will be cloning is that of this lecture itself. Clone using the git clone command. This makes a local copy of a repository on some server.
$ git clone https://github.com/rkdarst/scicomp.git
You will now find a new directory scicomp in your current directory. Change into it.
$ cd scicomp/
Now, you need to find some change to make. There are several options here. You can make a serious change that you would like to contribute to this talk, and I will probably actually use it. Or, you can just make some random test edits for your own practice. Go edit the files. This talk is at tut/scip/git.rst.
Commit the changes. Use a good commit message, since someone else will be reading it to judge your commit!
Now, you have to get your commits from your computer to me. Since you don't have rights to push directly to the repository, you will need to send me a patch. (You could open a pull request on github instead, but that is beyond the scope of this tutorial.) To make the patch, we will use git format-patch. We run it with one argument, "the last upstream commit". We can use the keyword origin/master for this.
$ git format-patch origin/master
0001-COMMIT_TITLE.patch
You can look at the .patch file to see the format. It is formatted like a raw email.
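If you want to try git format-patch without touching the real repository first, the sketch below sets up a local stand-in for the server, clones it, commits one change and creates the patch. The directories, file contents and identities are all made up; the variable branch is used in place of a hard-coded master because the default branch name depends on your git settings.

```shell
# Sketch: clone a local stand-in "server", commit, and create a patch file.
set -e
upstream=$(mktemp -d)          # plays the role of the repository on the server
work=$(mktemp -d)              # where the contributor works

cd "$upstream"
git init --quiet
git config user.name "Upstream Author"          # placeholder identity
git config user.email "upstream@example.com"
echo "Version control tutorial" > git.rst
git add git.rst
git commit --quiet -m "Initial commit"

git clone --quiet "$upstream" "$work/scicomp"
cd "$work/scicomp"
git config user.name "Example Contributor"      # placeholder identity
git config user.email "contributor@example.com"
echo "Remember to commit often." >> git.rst
git commit --quiet -a -m "Add reminder to commit often"

branch=$(git rev-parse --abbrev-ref HEAD)       # e.g. master
patch=$(git format-patch "origin/$branch")      # prints the file name it wrote
echo "$patch"
head -n 5 "$patch"                              # the email-like header
```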
Now, you need to get this file (the new .patch) to me. Command line email isn't set up on triton, so you should copy and attach this file to an email to me (rkd@zgib.net). You could copy and paste it directly into an email, but certain mail programs can mess up whitespace and line wrapping, which will cause the patch to not apply cleanly and make it hard to use.
Double-bonus: Research the "pull request" model of contributions. Github has good documentation on this. Emailing patches is a little bit old-fashioned, but still always works. Using the power of project hosting sites, you can more easily send changes, discuss them, and get them merged.
How often should you commit? Early and often!
Daily model:
Patch model
Commit messages: Try to make something useful but don't think too much.
"Add support for filtering by degrees"
"Daily work"
"Daily work, compare with power law model"
General format is: one line summary, blank line, then the notes (example from networkx)
add dynamic Graph surport to gexf (1.2draft)

1. can save dynamic Graph as gexf (1.2draft) format
2. add timeformat(date/double/integer) attribute to graph
3. add 'start' and 'end' attribute to edge
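One way to produce this format without opening an editor is to pass -m twice: git joins the two messages with a blank line between them. The sketch below does this in a scratch repository and then reads the summary and the notes back separately. The file filter.py and the text of the notes are invented for the example.

```shell
# Sketch: a commit message with a one-line summary, a blank line, then notes.
set -e
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "placeholder" > filter.py
git add filter.py
git commit --quiet \
    -m "Add support for filtering by degrees" \
    -m "Nodes can be filtered by minimum and maximum degree."

git log -1 --format=%s         # the one-line summary
git log -1 --format=%b         # the notes after the blank line
git log -1                     # the full message as it is stored
```

The one-line summary is what git log --oneline shows, which is a good reason to keep it short and meaningful.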
The commands needed, as we know them now.
These are the extra commands we have learned today.
Here are some ideas for independent study that you need to try yourself:
If you need to revert to a former version of the file:
$ git checkout VERSION -- FILENAME(s)
$ git checkout -p VERSION -- FILENAME(s)   # revert only certain parts
$ git reset FILENAME(s)                    # run this afterwards to reset the index - eliminate a complexity we haven't discussed
If you want to go back to an old version and lose recent commits:
$ git reset COMMIT_HASH          # doesn't lose file changes
$ git reset COMMIT_HASH --hard   # obliterates changes in working directory - dangerous!
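Since these commands change your files, it is worth rehearsing them somewhere harmless. This sketch covers only the first case, reverting a single file to a former version, in a scratch repository with a made-up file1.txt. The variable old holds the hash used in place of VERSION.

```shell
# Sketch: bring back an old version of one file, then reset the index.
set -e
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "version 1" > file1.txt
git add file1.txt
git commit --quiet -m "First version"
old=$(git rev-parse HEAD)                  # plays the role of VERSION

echo "version 2" > file1.txt
git commit --quiet -a -m "Second version"

git checkout "$old" -- file1.txt   # working copy holds the old content again
cat file1.txt
git reset file1.txt                # reset the index, as suggested above
git status --short                 # file1.txt now shows as a plain modification
```

No commit is lost here: the second version is still in the history, and the reverted file is simply an uncommitted change that you can inspect with git diff.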
There are many git GUIs, including
$ gitk
$ git-cola