(from phdcomics.com)
You know that great paper you just wrote? You were done with it, so you called it “final”. Then you sent it to your advisor for her last comments, and you renamed it “final_thisone”. A few days passed, you realized there was a typo in your figure, and you renamed it “final_thisone_really”. A couple of weeks passed, you got comments back from the reviewer, you started changing it, and you renamed your file “final_submitted”. And then… your advisor thought you should rewrite the conclusion, and the file became “final_submitted_really”.
A year passes, you are mentoring a new undergrad in the lab, and she asks you for your final document so she can look at your methods… You open your folder and, mesmerized by the number of files labelled “final”, you forget which one is the correct one. Sounds familiar?
Version control is a category of software that keeps track of changes to your files for you. It allows you to:
Version control is designed to manage source code. It works best when tracking files stored as plain text and not encoded as binary (e.g., Word documents). It is the “lab notebook” of the digital world. Because of its powerful and useful features, it is used for many different kinds of projects and can track small data sets, text (if you haven’t written too much of your dissertation yet, you should use version control to keep track of it), images, and much more.
Git is a particular implementation of version control. It’s really powerful and has many features but we will only scratch the surface.
When you start a Git project on your computer, you store the entire history of the project locally. The stored project together with its history is called a repository. Working only locally is fine. However, the great advantage of using version control such as Git is being able to collaborate with other people (and also to keep a copy of your repository somewhere else).
GitHub is a commercial website that lets you store your repositories publicly for free (you need to pay if you want to keep them private; you can also get an educational account with an .edu email address, which will give you some free private repositories). There are other websites that offer similar services, including BitBucket. Storing your repositories on these websites has many advantages. They offer a friendly interface to many common operations, so that you don’t have to remember how to do them at the command line. They also provide other useful features, including an “Issue tracker” and wikis.
We have already “committed”, “pulled” and “pushed”, but we haven’t been through the details of what these operations mean and how the pieces fit together.
The typical workflow goes like this:
- you create/edit/modify a file inside your repository
- you stage the changes to the staging area
- you commit these changes, which creates a permanent snapshot of the file in the Git directory along with a message that indicates what you did to the file.
When you start a new project, the files in your working directory are untracked; you will first need to add them to your repository before Git can keep track of them and their history.
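The edit, stage, commit cycle can be sketched at the command line as follows. This is a minimal illustration that runs in a throwaway directory; the file name `notes.txt`, the commit message, and the user name and email are placeholders chosen for the example, not part of the lesson.

```shell
# Work in a temporary directory so nothing else on your machine is touched.
workdir="$(mktemp -d)"
cd "$workdir"

# Create an empty repository; its history lives in the hidden .git directory.
git init --quiet

# Identity recorded in commits, set for this demo repository only (placeholders).
git config user.name "Demo User"
git config user.email "demo@example.com"

# A newly created file starts out untracked.
echo "Methods draft" > notes.txt
git status --short          # prints "?? notes.txt"

# Stage the file, then commit it with a message describing the change.
git add notes.txt
git commit --quiet -m "Add first draft of methods notes"

# The snapshot now appears in the project history.
git log --oneline
```

After the commit, `git status --short` prints nothing, which means the working directory matches the last snapshot.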
At this stage, everything is still on your hard drive. To upload your modifications (i.e., your commits) to GitHub you need to push to it.
If you are working with other people who are also committing to your shared repository on GitHub, you will need to pull to bring their modifications into your local copy of the repository.
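To see push and pull without needing a GitHub account, the sketch below uses a local bare repository as a stand-in for the shared repository on GitHub, and two clones as stand-ins for you and a collaborator. The directory names (`shared.git`, `mine`, `theirs`), the file name, and the identity are assumptions made for this illustration.

```shell
workdir="$(mktemp -d)"
cd "$workdir"

# A bare repository plays the role of the shared repository on GitHub.
git init --quiet --bare shared.git

# Two local copies: yours and your collaborator's.
git clone --quiet shared.git mine
git clone --quiet shared.git theirs

# In your copy: commit a file, then push the commit to the shared repository.
cd mine
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "shared analysis notes" > analysis.txt
git add analysis.txt
git commit --quiet -m "Add analysis notes"
# The default branch name depends on the Git version, so ask Git for it.
branch="$(git symbolic-ref --short HEAD)"
git push --quiet origin "$branch"

# In the collaborator's copy: pull brings your commit into their repository.
cd ../theirs
git pull --quiet origin "$branch"
cat analysis.txt
```

With a real GitHub repository, `origin` would point to the GitHub address instead of a local directory; the push and pull commands stay the same.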
Commits are cheap. Commit often and provide useful messages so you can keep track of what you are doing. Don’t do this:
(From xkcd.com)
Branching is a great feature of version control. It allows you to duplicate your existing repository, develop or experiment with a new feature independently, and if you like what you are doing you can merge these modifications back into your project.
Branching is particularly important with Git, as it is the mechanism used when you contribute to external projects (projects you are not directly involved with).
RStudio can’t create branches directly, so you need to either:
`git checkout -b new-branch`
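As a rough sketch of the whole branching cycle (create a branch, experiment on it, merge it back), the commands below run in a throwaway repository. The file name `paper.txt`, the commit messages, and the identity are placeholders invented for the example.

```shell
workdir="$(mktemp -d)"
cd "$workdir"
git init --quiet
git config user.name "Demo User"
git config user.email "demo@example.com"

# Start with one commit on the main line of work.
echo "Introduction" > paper.txt
git add paper.txt
git commit --quiet -m "Add introduction"

# The default branch name varies between Git versions, so record it.
main_branch="$(git symbolic-ref --short HEAD)"

# Create a branch and switch to it, then experiment there.
git checkout --quiet -b new-branch
echo "Experimental conclusion" >> paper.txt
git commit --quiet -am "Draft an experimental conclusion"

# Back on the main line, the experiment is not visible yet...
git checkout --quiet "$main_branch"
cat paper.txt

# ...until the branch is merged.
git merge --quiet new-branch
cat paper.txt
```

If you do not like the experiment, you can simply skip the merge; the main line of work is left untouched.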
With a pull request you are asking someone who maintains a repository to pull your changes into their repository.
To issue a pull request:
How to do this using RStudio and GitHub?
Fork the repository https://github.com/r-bio/challenges-francois; this will create a copy at https://github.com/fmichonneau/challenges-francois.
Create a new project in RStudio from your fork, https://github.com/fmichonneau/challenges-francois, and choose an appropriate location on your hard drive to store the project.
At this point, we need to tell Git that this project has an upstream version. There is no way of doing this within RStudio, so you need to enter some commands in the shell. Go to Tools > Shell, and enter the address of the upstream repository (in our example https://github.com/r-bio/challenges-francois):
`git remote add upstream https://github.com/r-bio/challenges-francois`
Make sure that it worked by typing `git remote -v`; it should display 4 lines: 2 that start with `origin` and the address of your fork, and 2 that start with `upstream` and the address of the upstream repository. Note that here we used `upstream` to name the upstream repository, but we could have given it another name. In this case, `upstream` is just easy to remember and accurate. Keep your shell window open.
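If you want to try this without touching GitHub, here is a small sketch that uses two local bare repositories as stand-ins: `upstream.git` for the original repository and `fork.git` for your fork. These directory names are assumptions made for the illustration; in the real workflow they would be the two GitHub addresses.

```shell
workdir="$(mktemp -d)"
cd "$workdir"

# Local stand-ins for the original repository and for your fork of it.
git init --quiet --bare upstream.git
git init --quiet --bare fork.git

# Cloning the fork automatically registers it under the name "origin".
git clone --quiet fork.git project
cd project

# Register the original repository under the name "upstream".
git remote add upstream "$workdir/upstream.git"

# Lists each remote twice: once for fetch and once for push.
git remote -v
```

The output has 2 lines starting with `origin` and 2 starting with `upstream`, each pair marked `(fetch)` and `(push)`.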
We are now going to create a branch for our changes so they are self-contained. There is also no way to do this within RStudio so we are going to enter additional commands in the shell to create it:
`git checkout -b proposed-fixes master`
Here, `proposed-fixes` is an arbitrary name we give to our branch. Ideally, you want to choose a name that summarizes what your proposed changes are about. After seeing the message `Switched to a new branch...`, you can close the shell window.
When you are done, commit your changes:
Click the staged checkbox for the files that are affected by the modifications you want to commit to your repository:
Make sure that you are in the correct branch (`proposed-fixes` appears next to the History button), and write a commit message. The first line should be short (< 50 characters); additional details can be given after skipping a line:
Close the window, and go back to the shell. There, type:
`git push origin proposed-fixes`
This command sends your modifications to your fork on GitHub.
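The branch-then-push part of this workflow can be rehearsed locally. In the sketch below, a bare repository named `fork.git` stands in for your fork on GitHub, and the file name, commit messages, and identity are placeholders. Because the default branch may not be called `master` in recent versions of Git, the example asks Git for its name instead of hard-coding it.

```shell
workdir="$(mktemp -d)"
cd "$workdir"

# A local bare repository stands in for your fork on GitHub.
git init --quiet --bare fork.git
git clone --quiet fork.git project
cd project
git config user.name "Demo User"
git config user.email "demo@example.com"

# One commit on the default branch, so there is something to branch from.
echo "Challenge answers" > answers.txt
git add answers.txt
git commit --quiet -m "Add answers file"
base="$(git symbolic-ref --short HEAD)"   # "master" in the lesson

# Create the branch from the default branch and commit the fix there.
git checkout --quiet -b proposed-fixes "$base"
echo "Corrected answer 2" >> answers.txt
git commit --quiet -am "Correct answer 2"

# Send the branch to the fork, then check that the fork now has it.
git push --quiet origin proposed-fixes
git ls-remote --heads origin proposed-fixes
```

`git ls-remote` prints one line ending in `refs/heads/proposed-fixes` when the branch has reached the remote.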
Go to your fork on GitHub, https://github.com/fmichonneau/challenges-francois, and click on the green icon:
Select your branch from the compare drop-down menu in the top right, and click on the green button: “Create pull request”.
If you want to accept the pull request, you can click on the “Merge pull request” button in your repository and, after adding an optional comment, click again on the “Accept pull request” button.
After that, you need to import (pull) these changes into your local copy of the repository. You should be able to pull the changes by clicking on “Pull branches” in the “Git” menu of your R project in RStudio.
There are many resources on the web to learn about Git and GitHub: