Have you ever:
You aren't the first person to have these problems. In fact, there is a whole set of tools to handle them. Welcome to version control.
This is a 10 minute introduction to git, one specific version control system. The tutorial has a very specific goal: to teach you the general concepts of version control, and enough git to use it on your own personal projects. It doesn't go into the full power of git or version control systems (that's the next tutorial).
Keep in mind: I can't teach you git, but I can give you ideas and your curiosity can teach you git.
After completing this tutorial, you should be able to:
Let's look at science without using VCS.
This is the most basic way of working. Once you change something, the old version is gone and you can't get it back.
The strategy:
Advantages:
Disadvantages:
With this system, at least you have some backups. But you have to copy it yourself, and you end up with code.v2.py, code.v3.py, code.final.py, code.submitted.py, code.submitted.final.py, and so on. And then, once you have all these files, you have to keep them organized, and getting any information out of them is a lot of work, too. You probably won't make backups often enough, either.
I've seen this used for papers often (it could be used for code, but in that case it's probably easier to just work by yourself). You end up with the filename game again, and only one person can edit at once.
This part shows some git commands that give you very useful information. In the next part, we'll talk about how to actually put this information into git.
diff --git a/support/algorithms.py b/support/algorithms.py
index d96131b..6114c3b 100644
--- a/support/algorithms.py
+++ b/support/algorithms.py
@@ -131,7 +131,7 @@
 weighted = False
- def __init__(self, g, dir=None, basename=None, **kwargs):
+ def __init__(self, g, dir=None, basename=None, cache=None, **kwargs):
 """
 Arguments:
What is the point of diffs? Let's say you have tens of thousands of lines of code, and you make a few changes. In order to comprehend what has changed, looking at the files themselves is too much. Instead, we have a tool, the diff, that can direct our attention only to the important parts.
The terms diff and patch are mostly interchangeable. (Incidentally, diff is a program that makes diffs out of two files, and patch is a program that takes a file and a diff and produces the other file.) Diffs are one of the fundamental building blocks of programming, so you will see them often.
Running git diff tells you the changes made since the last commit (save point), but you can get other diffs too.
Here's how to read it:
First two lines provide some general metadata - exactly what this part is about. The details aren't important now.
Next, we see --- FILENAME and +++ FILENAME, saying what file this diff is of.
Then, we see @@ -131,7 +131,7 @@, which says what lines this diff relates to.
Then, we see the diff itself. Each line beginning with - is a line removal, and each line beginning with + is a line addition. For a line that is changed (like this example), you see both - and + together.
Before and after the - and +, you have context, which are unchanged lines. You need a few lines before and after in order to properly understand what is changed.
There are other diff formats. There is a word diff that is based on words instead of lines. It can be very useful sometimes (and it is what I look at more often than regular diffs).
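To try reading a diff yourself, here is a minimal sketch. It assumes git is installed and works in a throwaway directory. The file name notes.txt, its contents, and the "Example User" identity are made up for illustration.

```shell
set -e

# Hypothetical scratch repository, created in a temporary directory.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Save a first version of a three-line file.
printf 'alpha\nbeta\ngamma\n' > notes.txt
git add notes.txt
git commit -q -m "Initial commit"

# Change the middle line, then ask git what differs from the last commit.
printf 'alpha\nBETA\ngamma\n' > notes.txt
line_diff=$(git --no-pager diff)
echo "$line_diff"

# The same change, shown word by word instead of line by line.
git --no-pager diff --word-diff
```

In the first output you should be able to find the metadata lines, the --- and +++ file names, the @@ line, and one - and one + line surrounded by unchanged context.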
The log also includes a commit message, which can explain to others (or yourself) what was going on at that time. This is especially useful for multi-person projects. There are many variations on these commands, including git log -p to show the diffs also, and git log --stat to show what files are changing.
114175ac (Richard Darst 2014-01-08 15:04:10 +0200 804) args = (_get_file(self._binary),
114175ac (Richard Darst 2014-01-08 15:04:10 +0200 805) "-seed", str(self._randseed),
e9a83ab3 (Richard Darst 2013-11-02 16:52:16 +0200 806) "-w" if self.weighted else '-uw', #unweighted or weighted
e9a83ab3 (Richard Darst 2013-11-02 16:52:16 +0200 807) "-f", self.graphfile,
8085f076 (Richard Darst 2014-01-23 19:07:45 +0200 808) )
This command is used less often, but when you need it, it's very helpful.
Let's say that you just found a bug, a bad one. You need to know immediately how many results are wrong: Are the plots you showed your boss one week ago wrong? What about those from one month ago? If you are making lots of changes, or working with several people, this may not be obvious.
If you can track down the bug to a few lines, the annotate command will tell you the change ID (more on this later), who made the change, when the change was made, the line number, and the actual code. You can use the change ID to get further information on the change.
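Here is a small sketch of that workflow. I am assuming the command meant is git blame, which prints output in the format shown above. The repository, the file calc.py, and the identity are hypothetical.

```shell
set -e

# Hypothetical scratch repository.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# First commit: two lines of code.
printf 'x = 1\ny = 2\n' > calc.py
git add calc.py
git commit -q -m "Add calc.py"
first=$(git rev-parse --short HEAD)

# Second commit: only line 2 changes.
printf 'x = 1\ny = 3\n' > calc.py
git commit -q -a -m "Change y"
second=$(git rev-parse --short HEAD)

# One output line per line of code: change ID, author, date, line number, code.
blame=$(git --no-pager blame calc.py)
echo "$blame"

# Use the change ID from line 2 to get more information on that change.
git --no-pager show --stat "$second"
```

Line 1 should point at the first commit and line 2 at the second, so you can tell at a glance which lines are old and which are recent.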
This looks a bit ugly, but graphical user interfaces make it much more convenient (and there are many).
Of course, you can view these older versions, too: git show COMMIT-ID:filename.py
Pros use version control for everything: code, papers (LaTeX), websites, notes, etc. All my papers are in version control, and I can even make PDFs showing what changed between revisions. My website is in git: I record changes and "push" to the server to automatically update it. People have written git add-ons for distributed storage of large files (git-annex). These tutorials are stored in a repository.
This tutorial doesn't talk about how to install git! However, this is a very well documented thing, so you should have no problem doing it yourself. If you have a shared computer, it probably already has git installed. You can download it for almost any operating system here:
git is not just one program. There are also graphical user interface (GUI) git clients, which can provide a nicer interface for certain tasks. In this tutorial, I focus on the concepts of git and the command line. At the end I will demonstrate some other programs.
$ git config --global user.name "Your Name"
$ git config --global user.email your.name@domain.fi
$ git config --global color.ui auto
These store some information in the file $HOME/.gitconfig. Your name and email are used in the commit logs. We'll be using the git config file more, later.
Let's say you want to make a new git repository for your project. The git init command does this.
$ cd /path/to/your/project/
$ git init
Everything is stored in the .git directory within your project.
Files are only updated when you run a git command.
The specific git repository format is simple in concept but complicated in its details, and each VCS works differently. We don't need to worry about it now.
Once you run git init, you won't notice any changes. The only thing that will happen is the creation of a .git directory.
No versions are saved, and your files are not touched, unless you run a git command yourself. This makes git relatively safe. Nothing happens in the background without you knowing. If you delete the .git directory, it's as if it was never made.
Notice how easy this is. You should be doing it for every project.
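You can check this for yourself with a quick sketch like the one below. The temporary directory stands in for /path/to/your/project/, and code.py is a made-up file.

```shell
set -e

# Hypothetical project directory with one existing file.
project=$(mktemp -d)
cd "$project"
echo "print('hello')" > code.py

# Create the repository: the only visible change is a new .git directory.
git init -q
ls -A

# Deleting .git removes the repository and leaves the project files alone.
rm -rf .git
ls -A
```

The first listing shows code.py next to .git. The second shows code.py alone, exactly as it was before.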
Git doesn't automatically track anything. You have to tell it which files are important (to track them).
Use git add to make git see and track files.
$ git add *.py
$ git add file1.txt dir/file2.txt
You have to use git add here, but git add has another use that I am not going to discuss in this tutorial. This is known as "staging" things to the "index". It can be useful, but for now it's an unnecessary complication that you'll learn about when reading other things.
You will usually run git status to check if you forgot anything (next section).
Check what is going on by typing
$ git status
After you see everything, run
$ git commit
You will be prompted for a message. "Initial commit" is traditional.
git status shows what the current state is. You will see a section for "files staged for commit", "modified files", and "untracked files". "Untracked" is files you have not yet added with git add. "Modified" is tracked files which you have edited since the last commit. "Staged" is files you have run git add on but not yet committed. For staged files, you can use git diff --cached to see the diff.
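A minimal sketch that puts one file into each of the three states follows. All file names and the identity are hypothetical. I use git status --short in addition to the normal output because its one-line-per-file format is easy to compare.

```shell
set -e

# Hypothetical scratch repository with one committed file.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "one" > tracked.txt
git add tracked.txt
git commit -q -m "Initial commit"

echo "two" >> tracked.txt      # modified: tracked, edited since the last commit
echo "new" > staged.txt
git add staged.txt             # staged: added, but not yet committed
echo "tmp" > untracked.txt     # untracked: never added

# Full report with one section per state.
git status

# Compact form of the same information.
status=$(git status --short)
echo "$status"

# Diff of what is staged.
git --no-pager diff --cached
```

In the short format, the two-character code at the start of each line tells you the state: A in the first column means staged, M in the second column means modified but not staged, and ?? means untracked.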
This is what you do on normal working days:
Make changes to your project
Use git status to see what is changed / what is added and waiting to be committed.
$ git status
Make a file called .gitignore and put patterns of things you want to ignore.
*.o
*.pyc
*~
This makes the "git status" output more useful and you generally want to keep your ignore file up to date.
Status tells you what you have edited since the last commit. If it shows nothing, then you can be happy: everything is committed.
I should really emphasize how important the .gitignore file is! It seems minor, but clean "status" output makes git much more usable. .gitignore can be checked into version control itself. You can also keep a personal ignore file in your home directory (for example ~/.gitignore), as long as you point git's core.excludesFile setting at it.
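To see the effect, here is a rough sketch with some made-up clutter files. It compares the status output before and after writing a .gitignore with the three patterns above, one per line.

```shell
set -e

# Hypothetical scratch repository with one real file and three junk files.
repo=$(mktemp -d)
cd "$repo"
git init -q
touch analysis.py analysis.pyc module.o notes.txt~

# Before: everything shows up as untracked.
before=$(git status --short)
echo "$before"

# Write the ignore file, one pattern per line.
printf '%s\n' '*.o' '*.pyc' '*~' > .gitignore

# After: only the files you actually care about remain.
after=$(git status --short)
echo "$after"

# Ask git directly whether a given file is ignored (exit status 0 means yes).
git check-ignore -q analysis.pyc && echo "analysis.pyc is ignored"
```

After the ignore file is in place, status should list only .gitignore and analysis.py.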
Check diffs to see the exact changes.
$ git diff
$ git diff --word-diff=color
This shows you the exact edits you have made since the last commit.
Gives you another chance to check yourself.
Why should you look at diffs? First, and most importantly, it lets you check yourself. You can see all the changes you have made since your last checkpoint (commit), and check that they make sense when put together. This may be a bit of extra work, but it is very important for good development practices.
Commit specific files
$ git commit -a                      # commit all changes
$ git commit file1.txt calculate.py  # commit specific files
$ git commit -p                      # commit specific changes (it will ask you)
$ git commit -p file1.txt            # commit specific changes in specific file
You can commit in different ways
You will be asked for a commit message. (Advice later)
This is the last step. Before doing this, check status and diffs. After doing this, check status and make sure everything is clean.
We'll talk about how to structure and group changes into commits later.
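Put together, a normal working day can be sketched like this. The repository, the file calculate.py, and the identity are placeholders. I pass the commit message with -m so that the example runs without opening an editor.

```shell
set -e

# Hypothetical scratch repository with one committed file.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "result = 1" > calculate.py
git add calculate.py
git commit -q -m "Initial commit"

# 1. Make changes to your project.
echo "result = 2" > calculate.py

# 2. Check what is changed, then look at the exact edits.
git status --short
git --no-pager diff

# 3. Commit all changes to tracked files.
git commit -q -a -m "Update result"

# 4. Check status again: empty output means everything is committed.
clean=$(git status --short)
commits=$(git rev-list --count HEAD)
echo "commits so far: $commits"
```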
To view history in git, run:
$ git log
$ git log --oneline                           # abbreviated format
$ git log --patch                             # also show patches
$ git log --stat                              # also show stats
$ git log --oneline --graph --decorate --all  # for later use
You will have to try each of these yourself to see what they do.
COMMIT_HASH is a hexadecimal string like 86d026287189acd341e7fb2ee88063375e2e1e73, or 86d026 in short form. Everything git knows about has a unique identifier like this.
Show what changed since last commit
$ git diff
Show what changed in any one commit
$ git show COMMIT_HASH
Show what changed between any two commits
$ git diff HASH1..HASH2
Show old version of a file:
$ git show COMMIT_HASH:file1.txt
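Here is a short sketch that runs these history commands on a repository with two commits. The file contents and the identity are made up. The variables old and new hold the hashes that you would normally copy from git log.

```shell
set -e

# Hypothetical scratch repository with two versions of one file.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "version 1" > file1.txt
git add file1.txt
git commit -q -m "First version"
old=$(git rev-parse --short HEAD)

echo "version 2" > file1.txt
git commit -q -a -m "Second version"
new=$(git rev-parse --short HEAD)

# History, newest first.
git --no-pager log --oneline

# What changed in one commit, and between two commits.
git --no-pager show "$new"
git --no-pager diff "$old..$new"

# The old version of the file, without touching the working copy.
old_content=$(git show "$old:file1.txt")
echo "$old_content"
```

Note that git show with a HASH:filename argument only prints the old contents. The file in your working directory still has the newest version.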
How often should you commit? Early and often!
Daily model:
Patch model
Commit messages: Try to make something useful but don't think too much.
"Add support for filtering by degrees"
"Daily work"
"Daily work, compare with power law model"
General format is: one line summary, blank line, then the notes (example from networkx)
add dynamic Graph surport to gexf (1.2draft)

1. can save dynamic Graph as gexf (1.2draft) format
2. add timeformat(date/double/integer) attribute to graph
3. add 'start' and 'end' attribute to edge
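One way to produce a message in this format from the command line is to give -m twice: git joins the values as separate paragraphs, so the first becomes the summary and the second the notes. This is only a sketch. The file, the notes text, and the identity are invented, and in practice you may prefer to write the message in your editor.

```shell
set -e

# Hypothetical scratch repository.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "data" > graph.txt
git add graph.txt

# First -m is the one line summary, second -m is the notes paragraph.
git commit -q \
    -m "Add support for filtering by degrees" \
    -m "Longer notes go here, after a blank line."

# Read the two parts back separately, then show the full log entry.
subject=$(git log -1 --format=%s)
body=$(git log -1 --format=%b)
git --no-pager log -1
```

The one line summary is what git log --oneline displays, which is a good reason to keep it short and meaningful.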
The commands needed, as we know them now.
Here are some ideas for independent study that you need to try yourself:
If you need to revert to a former version of the file:
$ git checkout VERSION -- FILENAME(s)
$ git checkout -p VERSION -- FILENAME(s)  # revert only certain parts
$ git reset FILENAME(s)                   # run this afterwards to reset the index - eliminate a complexity we haven't discussed
If you want to go back to an old version and lose recent commits:
$ git reset COMMIT_HASH         # doesn't lose file changes
$ git reset COMMIT_HASH --hard  # obliterates changes in working directory - dangerous!
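Since these commands can destroy work, it is worth trying them somewhere harmless first. The sketch below does that in a throwaway repository. The file model.py, its contents, and the identity are hypothetical.

```shell
set -e

# Hypothetical scratch repository with a good commit followed by a bad one.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "good" > model.py
git add model.py
git commit -q -m "Good version"
good=$(git rev-parse HEAD)

echo "broken" > model.py
git commit -q -a -m "Broken change"

# Bring back the old version of one file. History is not changed.
git checkout "$good" -- model.py
restored=$(cat model.py)
git reset -q model.py          # reset the index afterwards, as advised above

# Go back to the old commit and throw away everything after it. Dangerous!
git reset -q --hard "$good"
head_now=$(git rev-parse HEAD)
git --no-pager log --oneline
```

After the hard reset, the log shows only the good commit, and the broken one is no longer part of the history.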
There are many git GUIs, including
$ gitk
$ git-cola