Git Basics Notes
What's Git?
What is Git in a nutshell? This is an important section to absorb, because if you understand what Git is and the fundamentals of how it works, then using Git effectively will probably be much easier for you. As you learn Git, try to clear your mind of the things you may know about other VCSs, such as CVS, Subversion, or Perforce; doing so will help you avoid subtle confusion when using the tool. Even though Git's user interface is fairly similar to these other VCSs, Git stores and thinks about information in a very different way, and understanding these differences will help you avoid becoming confused while using it.
Snapshots, Not Differences
The major difference between Git and any other VCS (Subversion and friends included) is the way Git thinks about its data. Conceptually, most other systems store information as a list of file-based changes. These other systems (CVS, Subversion, Perforce, Bazaar, and so on) think of the information they store as a set of files and the changes made to each file over time (this is commonly described as delta-based version control).
Git does not think of or store its data this way. Instead, Git thinks of its data like a series of snapshots of a miniature filesystem. With Git, every time you commit, or save the state of your project, Git basically takes a picture of what all your files look like at that moment and stores a reference to that snapshot. To be efficient, if files have not changed, Git doesn't store the file again, just a link to the previous identical file it has already stored. Git thinks about its data more like a stream of snapshots.
This is an important distinction between Git and nearly all other VCSs. It makes Git reconsider almost every aspect of version control that most other systems copied from the previous generation. This makes Git more like a mini filesystem with some incredibly powerful tools built on top of it, rather than simply a VCS.
Nearly Every Operation Is Local
Most operations in Git need only local files and resources to operate; generally no information is needed from another computer on your network. If you're used to a CVCS where most operations have that network latency overhead, this aspect of Git will make you think that the gods of speed have blessed Git with unworldly powers. Because you have the entire history of the project right there on your local disk, most operations seem almost instantaneous.
For example, to browse the history of the project, Git doesn't need to go out to the server to get the history and display it for you; it simply reads it directly from your local database. This means you see the project history almost instantly. If you want to see the changes introduced between the current version of a file and the file a month ago, Git can look up the file a month ago and do a local difference calculation, instead of having to either ask a remote server to do it or pull an older version of the file from the remote server to do it locally.
This also means that there is very little you can't do if you're offline or off VPN. If you get on an airplane or a train and want to do a little work, you can commit happily (to your local copy, remember?) until you get to a network connection to upload. If you go home and can't get your VPN client working properly, you can still work. In many other systems, doing so is either impossible or painful. In Perforce, for example, you can't do much when you aren't connected to the server; in Subversion and CVS, you can edit files, but you can't commit changes to your database (because your database is offline). This may not seem like a huge deal, but you may be surprised what a big difference it can make.
Git Has Integrity
Everything in Git is checksummed before it is stored and is then referred to by that checksum. This means it's impossible to change the contents of any file or directory without Git knowing about it. This functionality is built into Git at the lowest levels and is integral to its philosophy. You can't lose information in transit or get file corruption without Git being able to detect it.
The mechanism that Git uses for this checksumming is called a SHA-1 hash. This is a 40-character string composed of hexadecimal characters (0-9 and a-f) and calculated based on the contents of a file or directory structure in Git. A SHA-1 hash looks something like this:
24b9da6552252987aa493b52f8696cd6d3b00373
You will see these hash values all over the place in Git because it uses them so much. In fact, Git stores everything in its database not by file name but by the hash value of its contents.
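To make the idea of addressing content by its hash concrete, here is a minimal sketch using git hash-object, which computes the hash Git would assign to some content without storing anything. The sample strings and variable names are made up for illustration.

```shell
# Hash some content the way Git does, without writing it to any repository.
# --stdin makes git hash-object read the content from standard input.
hash_a=$(printf 'hello\n' | git hash-object --stdin)
hash_b=$(printf 'hello\n' | git hash-object --stdin)
hash_c=$(printf 'hello!\n' | git hash-object --stdin)

echo "$hash_a"   # identical content ...
echo "$hash_b"   # ... always produces the identical hash
echo "$hash_c"   # a one-character change produces a different hash
```

Because the hash depends only on the content, two files with the same content map to the same stored object, whatever they are called.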
Git Generally Only Adds Data
When you do actions in Git, nearly all of them only add data to the Git database. It is hard to get the system to do anything that is not undoable or to make it erase data in any way. As with any VCS, you can lose or mess up changes you haven't committed yet, but after you commit a snapshot into Git, it is very difficult to lose, especially if you regularly push your database to another repository.
This makes using Git a joy because we know we can experiment without the danger of severely screwing things up.
The Three States
Pay attention now: here is the main thing to remember about Git if you want the rest of your learning process to go smoothly. Git has three main states that your files can reside in: committed, modified, and staged:
- Committed means that the data is safely stored in your database.
- Modified means that you have changed the file but have not committed it to your database yet.
- Staged means that you have marked a modified file in its current version to go into your next commit snapshot.
This leads us to the three main sections of a Git project: the Git directory, the working tree, and the staging area.
The Git directory is where Git stores the metadata and object database for your project. This is the most important part of Git, and it is what is copied when you clone a repository from another computer.
The working tree is a single checkout of one version of the project. These files are pulled out of the compressed database in the Git directory and placed on disk for you to use or modify.
The staging area is a file, generally contained in your Git directory, that stores information about what will go into your next commit. Its technical name in Git parlance is the "index", but the phrase "staging area" works just as well.
The basic Git workflow goes something like this:
1. You modify files in your working tree.
2. You selectively stage just those changes you want to be part of your next commit, which adds only those changes to the staging area.
3. You do a commit, which takes the files as they are in the staging area and stores that snapshot permanently to your Git directory.
If a particular version of a file is in the Git directory, it's considered committed. If it has been modified and was added to the staging area, it is staged. And if it was changed since it was checked out but has not been staged, it is modified.
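The three states can be watched in a throwaway repository. The sketch below is only an illustration: the file name notes.txt, the placeholder identity, and the variable names are assumptions, and the compact git status --short format is used so that each state fits on one line.

```shell
# Work in a temporary repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example User"        # placeholder identity for the sketch
git config user.email "user@example.com"

echo 'first draft' > notes.txt
git add notes.txt
git commit -q -m 'add notes'
state_committed=$(git status --short)      # empty: nothing modified or staged

echo 'second draft' > notes.txt
state_modified=$(git status --short)       # " M notes.txt": modified, not staged

git add notes.txt
state_staged=$(git status --short)         # "M  notes.txt": staged for the next commit

git commit -q -m 'update notes'
state_after=$(git status --short)          # empty again: the change is committed

printf 'modified: [%s]\nstaged:   [%s]\n' "$state_modified" "$state_staged"
```

The position of the M matters: the right-hand column describes the working tree and the left-hand column describes the staging area.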
The Command Line
There are a lot of different ways to use Git. There are the original command-line tools, and there are many graphical user interfaces of varying capabilities. We will be using Git on the command line. For one, the command line is the only place you can run all Git commands; most of the GUIs implement only a partial subset of Git functionality for simplicity. If you know how to run the command-line version, you can probably also figure out how to run the GUI version, while the opposite is not necessarily true. Also, while your choice of graphical client is a matter of personal taste, all users will have the command-line tools installed and available.
Installing Git
Before you start using Git, you have to make it available on your computer. Even if it's already installed, it's probably a good idea to update to the latest version. You can either install it as a package or via another installer, or download the source code and compile it yourself.
Installing on Linux
If you want to install the basic Git tools on Linux via a binary installer, you can generally do so through the package management tool that comes with your distribution. If you're on Fedora (or any closely related RPM-based distribution, such as RHEL or CentOS), you can use dnf:
$ sudo dnf install git-all
If you're on a Debian-based distribution, such as Ubuntu, try apt:
$ sudo apt install git-all
Installing on macOS
There are several ways to install Git on a Mac. The easiest is probably to install the Xcode Command Line Tools. On Mavericks (10.9) and above you can do this simply by trying to run git from the Terminal the very first time.
$ git --version
If you don't have it installed already, it will prompt you to install it.
(Other installation options omitted.)
First-Time Git Setup
$ git config --global user.name "renbo"
$ git config --global user.email renbo@163.com
$ git config --global core.editor emacs
Checking Your Settings
$ git config --list
Getting Help
$ git help <verb>
$ man git-<verb>
For example, to get help for the add command, run $ git help add, $ git add -h, or $ man git-add.
Getting a Git Repository
You typically obtain a Git repository in one of two ways:
1. You can take a local directory that is currently not under version control and turn it into a Git repository, or
2. You can clone an existing Git repository from elsewhere.
Initializing a Repository in an Existing Directory
If you have a project directory that is currently not under version control and you want to start controlling it with Git, you first need to go to that project's directory. If you've never done this, it looks a little different depending on which system you're running:
for Linux: $ cd /home/user/my_project
for Mac: $ cd /Users/user/my_project
for Windows: $ cd /c/user/my_project
and type: $ git init
This creates a new subdirectory named .git that contains all of your necessary repository files: a Git repository skeleton. At this point, nothing in your project is tracked yet.
If you want to start version-controlling existing files, you should probably begin tracking those files and do an initial commit. You can accomplish that with a few git add commands that specify the files you want to track, followed by a git commit:
$ git add *.c
$ git commit -m 'initial project version'
We'll go over what these commands do in just a minute. At this point, you have a Git repository with tracked files and an initial commit.
Cloning an Existing Repository
If you want to get a copy of an existing Git repository (for example, a project you'd like to contribute to), the command you need is git clone. If you're familiar with other VCS systems such as Subversion, you'll notice that the command is "clone" and not "checkout". This is an important distinction: instead of getting just a working copy, Git receives a full copy of nearly all data that the server has. Every version of every file for the history of the project is pulled down by default when you run git clone. In fact, if your server disk gets corrupted, you can often use nearly any of the clones on any client to set the server back to the state it was in when it was cloned (you may lose some server-side hooks and such, but the version data would be there).
You clone a repository with git clone <url>. For example, if you want to clone the Git linkable library called libgit2, you can do so like this:
$ git clone https://github.com/libgit2/libgit2
That creates a directory named libgit2, initializes a .git directory inside it, pulls down all the data for that repository, and checks out a working copy of the latest version. If you go into the new libgit2 directory that was just created, you'll see the project files in there, ready to be worked on or used.
If you want to clone the repository into a directory named something other than libgit2, you can specify the new directory name as an additional argument:
$ git clone https://github.com/libgit2/libgit2 mylibgit
That command does the same thing as the previous one, but the target directory is called mylibgit.
Recording Changes to the Repository
At this point, you should have a bona fide Git repository on your local machine, and a checkout or working copy of all of its files in front of you. Typically, you'll want to start making changes and committing snapshots of those changes into your repository each time the project reaches a state you want to record.
Remember that each file in your working directory can be in one of two states: tracked or untracked. Tracked files are files that were in the last snapshot; they can be unmodified, modified, or staged. In short, tracked files are files that Git knows about.
Untracked files are everything else: any files in your working directory that were not in your last snapshot and are not in your staging area. When you first clone a repository, all of your files will be tracked and unmodified because Git just checked them out and you haven't edited anything.
As you edit files, Git sees them as modified, because you've changed them since your last commit. As you work, you selectively stage these modified files and then commit all those staged changes, and the cycle repeats.
Checking the Status of Your Files
The main tool you use to determine which files are in which state is the git status command. If you run this command directly after a clone, you should see something like this:
$ git status
On branch master
Your branch is up-to-date with 'origin/master'
nothing to commit, working directory clean
This means you have a clean working directory; in other words, none of your tracked files are modified. Git also doesn't see any untracked files, or they would be listed here. Finally, the command tells you which branch you're on and informs you that it has not diverged from the same branch on the server. For now, that branch is always "master", which is the default; you won't worry about it here.
Let's say you add a new file to your project, a simple README file. If the file didn't exist before, and you run git status, you see your untracked file like this:
$ echo 'my project' > README
$ git status
On branch master
Your branch is up-to-date with 'origin/master'
Untracked files:
  (use "git add <file>..." to include in what will be committed)
        README
nothing added to commit but untracked files present (use "git add" to track)
You can see that your new README file is untracked, because it's under the "Untracked files" heading in your status output. Untracked basically means that Git sees a file you didn't have in the previous snapshot (commit); Git won't start including it in your commit snapshots until you explicitly tell it to do so. It does this so you don't accidentally begin including generated binary files or other files that you did not mean to include. You do want to start including README, so let's start tracking the file.
Tracking New Files
In order to begin tracking a new file, you use the command git add. To begin tracking the README file, you can run this:
$ git add README
If you run your status command again, you can see that your README file is now tracked and staged to be committed:
$ git status
On branch master
Your branch is up-to-date with 'origin/master'
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)
        new file:   README
You can tell that it's staged because it's under the "Changes to be committed" heading. If you commit at this point, the version of the file at the time you ran git add is what will be in the subsequent historical snapshot. You may recall that when you ran git init earlier, you then ran git add <files>; that was to begin tracking files in your directory. The git add command takes a path name for either a file or a directory; if it's a directory, the command adds all the files in that directory recursively.
Staging Modified Files
Let's change a file that was already tracked. If you change a previously tracked file called CONTRIBUTING.md and then run your git status command again, you get something that looks like this:
$ git status
On branch master
Your branch is up-to-date with 'origin/master'
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)
        new file:   README
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)
        modified:   CONTRIBUTING.md
The CONTRIBUTING.md file appears under a section named "Changes not staged for commit", which means that a file that is tracked has been modified in the working directory but not yet staged. To stage it, you run the git add command. git add is a multipurpose command: you use it to begin tracking new files, to stage files, and to do other things like marking merge-conflicted files as resolved. It may be helpful to think of it more as "add precisely this content to the next commit" rather than "add this file to the project". Let's run git add now to stage the CONTRIBUTING.md file, and then run git status again:
$ git add CONTRIBUTING.md
$ git status
On branch master
Your branch is up-to-date with 'origin/master'
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)
        new file:   README
        modified:   CONTRIBUTING.md
Both files are staged and will go into your next commit. At this point, suppose you remember one little change that you want to make in CONTRIBUTING.md before you commit it. You open it again and make that change, and you're ready to commit. However, let's run git status one more time:
$ vim CONTRIBUTING.md
$ git status
On branch master
Your branch is up-to-date with 'origin/master'
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)
        new file:   README
        modified:   CONTRIBUTING.md
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)
        modified:   CONTRIBUTING.md
What the heck? Now CONTRIBUTING.md is listed as both staged and unstaged. How is that possible? It turns out that Git stages a file exactly as it is when you run the git add command. If you commit now, the version of CONTRIBUTING.md as it was when you last ran the git add command is how it will go into the commit, not the version of the file as it looks in your working directory when you run git commit. If you modify a file after you run git add, you have to run git add again to stage the latest version of the file:
$ git add CONTRIBUTING.md
$ git status
On branch master
Your branch is up-to-date with 'origin/master'
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)
        new file:   README
        modified:   CONTRIBUTING.md
Short Status
While the git status output is pretty comprehensive, it's also quite wordy. Git also has a short status flag so you can see your changes in a more compact way: run git status -s or git status --short.
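As a rough illustration of the compact format, the sketch below builds a throwaway repository containing one modified file, one newly staged file, and one untracked file, then prints the short status. All file names and the placeholder identity are made up for the example.

```shell
# Temporary repository with three files in three different states.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example User"      # placeholder identity for the sketch
git config user.email "user@example.com"

echo 'v1' > tracked.txt
git add tracked.txt
git commit -q -m 'initial'

echo 'v2 with edits' > tracked.txt       # modified, not staged -> " M"
echo 'new' > staged.txt
git add staged.txt                       # new file, staged     -> "A "
echo 'scratch' > untracked.txt           # never added          -> "??"

short_status=$(git status -s)
echo "$short_status"
```

Each line starts with a two-character code: the left column reports the staging area, the right column reports the working tree, and ?? marks untracked files.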
Ignoring Files
Often, you'll have a class of files that you don't want Git to automatically add or even show you as being untracked. These are generally automatically generated files such as log files or files produced by your build system. In such cases, you can create a file named .gitignore listing patterns to match them. Here is an example .gitignore file:
$ cat .gitignore
*.[oa]
*~
The first line tells Git to ignore any files ending in ".o" or ".a": object and archive files that may be the product of building your code. The second line tells Git to ignore all files whose names end with a tilde (~), which is used by many text editors such as Emacs to mark temporary files. You may also include a log, tmp, or pid directory; automatically generated documentation; and so on. Setting up a .gitignore file for your new repository before you get going is generally a good idea so you don't accidentally commit files that you really don't want in your Git repository.
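One way to check which paths those two patterns catch is git check-ignore, which prints every path it is given that matches an ignore rule. The sketch below uses a throwaway repository; the file names are invented for the example.

```shell
# Temporary repository containing the two patterns from the example .gitignore.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
printf '%s\n' '*.[oa]' '*~' > .gitignore

# A source file, an object file, an archive, and an editor backup file.
touch main.c main.o libfoo.a notes.txt~

# git check-ignore echoes back only the paths that are ignored.
ignored=$(git check-ignore main.c main.o libfoo.a notes.txt~)
echo "$ignored"

# The ignored files are also missing from the untracked list.
untracked=$(git status -s)
echo "$untracked"
```

Here main.c is the only sample file that still shows up as untracked (alongside .gitignore itself), because it matches neither pattern.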
Viewing Your Staged and Unstaged Changes
If the git status command is too vague for you (you want to know exactly what you changed, not just which files were changed), you can use the git diff command. We'll cover git diff in more detail later, but you'll probably use it most often to answer two questions: What have you changed but not yet staged? And what have you staged that you are about to commit? Although git status answers those questions very generally by listing the file names, git diff shows you the exact lines added and removed: the patch, as it were.
Let's say you edit and stage the README file again and then edit the CONTRIBUTING.md file without staging it. If you run your git status command, you once again see something like this:
$ git status
On branch master
Your branch is up-to-date with 'origin/master'
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)
        modified:   README
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)
        modified:   CONTRIBUTING.md
To see what you've changed but not yet staged, type git diff with no other arguments:
$ git diff
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 83bb991..643e224f 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -65,7 +65,8 @@ branch directly, things can get messy.
 Please include a nice description of your changes when you submit your PR;
 if we have to read the whole diff to figure out why you're contributing
 in the first place, you're less likely to get feedback and have your change
-merged in.
+merged in. Also, split your changes into comprehensive chunks if your patch is
+longer than a dozen lines.

 If you are starting to work on a particular area, feel free to submit a PR
 that highlights your work in progress (and note in the PR title that it's
That command compares what is in your working directory with what is in your staging area. The result tells you the changes you've made that you haven't yet staged.
If you want to see what you've staged that will go into your next commit, you can use git diff --staged. This command compares your staged changes to your last commit.
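The difference between the two comparisons is easier to see side by side. The sketch below recreates the situation described above (README staged, CONTRIBUTING.md edited but not staged) in a throwaway repository with invented file contents, and uses --name-only so that only the affected file names are printed.

```shell
# Temporary repository with two committed files.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example User"      # placeholder identity for the sketch
git config user.email "user@example.com"

printf 'line one\n' > README
printf 'intro\n' > CONTRIBUTING.md
git add README CONTRIBUTING.md
git commit -q -m 'initial'

printf 'line one\nline two\n' > README
git add README                                    # edit and stage README
printf 'intro\nmore detail\n' > CONTRIBUTING.md   # edit without staging

unstaged=$(git diff --name-only)            # working directory vs. staging area
staged=$(git diff --staged --name-only)     # staging area vs. last commit
echo "unstaged: $unstaged"
echo "staged:   $staged"
```

Dropping --name-only from either command shows the full patch for the same set of files.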
Committing Your Changes
$ git commit
Skipping the Staging Area
Although it can be amazingly useful for crafting commits exactly how you want them, the staging area is sometimes a bit more complex than you need in your workflow. If you want to skip the staging area, Git provides a simple shortcut. Adding the -a option to the git commit command makes Git automatically stage every file that is already tracked before doing the commit, letting you skip the git add part.
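A small sketch of the -a shortcut, in a throwaway repository with made-up file names, shows both what it does and what it leaves alone: the tracked file is committed without an explicit git add, while the untracked file stays untracked.

```shell
# Temporary repository with one tracked file.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example User"      # placeholder identity for the sketch
git config user.email "user@example.com"

echo 'v1' > tracked.txt
git add tracked.txt
git commit -q -m 'initial'

echo 'v2 edited' > tracked.txt           # tracked and modified
echo 'scratch' > untracked.txt           # never added to Git

# -a stages every already-tracked file before committing.
git commit -q -a -m 'update tracked file without git add'

after_commit=$(git status -s)
echo "$after_commit"
```

Only the untracked file is left in the status output, so new files still need an explicit git add.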
Removing Files
To remove a file from Git, you have to remove it from your tracked files (more accurately, remove it from your staging area) and then commit. The git rm command does that, and also removes the file from your working directory so you don't see it as an untracked file the next time around.
If you simply remove the file from your working directory, it shows up under the "Changes not staged for commit" (that is, unstaged) area of your git status output:
$ rm PROJECTS.md
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)
        deleted:    PROJECTS.md
no changes added to commit (use "git add" and/or "git commit -a")
Then, if you run git rm, it stages the file's removal:
$ git rm PROJECTS.md
rm 'PROJECTS.md'
$ git status
On branch master
Your branch is up-to-date with 'origin/master'
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)
        deleted:    PROJECTS.md
The next time you commit, the file will be gone and no longer tracked. If you modified the file or had already added it to the staging area, you must force the removal with the -f option. This is a safety feature to prevent accidental removal of data that hasn't yet been recorded in a snapshot and that can't be recovered from Git.
Another useful thing you may want to do is to keep the file in your working tree but remove it from your staging area. In other words, you may want to keep the file on your hard drive but not have Git track it anymore. This is particularly useful if you forgot to add something to your .gitignore file and accidentally staged it, like a large log file or a bunch of .a compiled files. To do this, use the --cached option:
$ git rm --cached README
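The effect of --cached can be checked in a throwaway repository. In this sketch the log file name build.log is hypothetical; the point is that the file stays on disk while Git stages its removal from the tracked set.

```shell
# Temporary repository in which a log file was committed by mistake.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example User"      # placeholder identity for the sketch
git config user.email "user@example.com"

echo 'debug output' > build.log
git add build.log
git commit -q -m 'accidentally track a log file'

git rm -q --cached build.log             # stop tracking, keep the file on disk

still_on_disk=no
[ -f build.log ] && still_on_disk=yes
status_after=$(git status -s)
echo "on disk: $still_on_disk"
echo "$status_after"
```

The status shows the path twice: once as a staged deletion and once as an untracked file, which is the cue to add a matching pattern to .gitignore.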
Moving Files
Unlike many other VCS systems, Git doesn't explicitly track file movement. If you rename a file in Git, no metadata is stored in Git that tells it you renamed the file. However, Git is pretty smart about figuring that out after the fact; we'll deal with detecting file movement a bit later.
Thus it's a bit confusing that Git has a mv command. If you want to rename a file in Git, you can run something like:
$ git mv file_from file_to
and it works fine. In fact, if you run something like this and look at the status, you'll see that Git considers it a renamed file:
$ git mv README.md README
$ git status
On branch master
Your branch is up-to-date with 'origin/master'
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)
        renamed:    README.md -> README
However, this is equivalent to running something like this:
$ mv README.md README
$ git rm README.md
$ git add README
Git figures out that it's a rename implicitly, so it doesn't matter if you rename a file that way or with the mv command. The only real difference is that git mv is one command instead of three; it's a convenience function. More importantly, you can use any tool you like to rename a file, and address the add/rm later, before you commit.
Viewing the Commit History
After you have created several commits, or if you have cloned a repository with an existing commit history, you'll probably want to look back to see what has happened. The most basic and powerful tool to do this is the git log command.
These examples use a very simple project called "simplegit". To get the project, run:
$ git clone https://github.com/schacon/simplegit-progit
When you run git log in this project, you should get output that looks something like this:
$ git log
commit ...................
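The sketch below does not reproduce the simplegit history. It creates two commits in a throwaway repository, with invented file contents and commit messages, to show the ordering git log uses and one way to trim its output.

```shell
# Temporary repository with a two-commit history.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example User"      # placeholder identity for the sketch
git config user.email "user@example.com"

echo 'one' > file.txt
git add file.txt
git commit -q -m 'first commit'
echo 'two' >> file.txt
git commit -q -a -m 'second commit'

# Full entries: commit hash, author, date, and message, newest first.
git --no-pager log

# --format=%s prints only the subject line of each commit.
subjects=$(git log --format=%s)
echo "$subjects"
```

With no arguments, git log lists commits in reverse chronological order, so the most recent commit appears first.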
Undoing Things
At any stage, you may want to undo something. Here, we'll review a few basic tools for undoing changes that you've made. Be careful, because you can't always undo some of these undos. This is one of the few areas in Git where you may lose some work if you do it wrong.
One of the common undos takes place when you commit too early and possibly forget to add some files, or you mess up your commit message. If you want to redo that commit, make the additional changes you forgot, stage them, and commit again using the --amend option:
$ git commit --amend
This command takes your staging area and uses it for the commit. If you've made no changes since your last commit (for instance, you run this command immediately after your previous commit), then your snapshot will look exactly the same, and all you'll change is your commit message.
The same commit-message editor fires up, but it already contains the message of your previous commit. You can edit the message the same as always, but it overwrites your previous commit.
As an example, if you commit and then realize you forgot to stage the changes in a file you wanted to add to this commit, you can do something like this:
$ git commit -m 'initial commit'
$ git add forgotten_file
$ git commit --amend
You end up with a single commit: the second commit replaces the results of the first.
Unstaging a Staged File
The next two sections demonstrate how to work with your staging area and working directory changes. The nice part is that the command you use to determine the state of those two areas also reminds you how to undo changes to them. For example, let's say you've changed two files and want to commit them as two separate changes, but you accidentally type git add * and stage them both. How can you unstage one of the two? The git status command reminds you:
$ git add *
$ git status
On branch master
Your branch is up-to-date with 'origin/master'
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)
        renamed:    README.md -> README
        modified:   CONTRIBUTING.md
Right below the "Changes to be committed" text, it says to use git reset HEAD <file>... to unstage. So, let's use that advice to unstage the CONTRIBUTING.md file:
$ git reset HEAD CONTRIBUTING.md
Unstaged changes after reset:
M CONTRIBUTING.md
$ git status
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)
        renamed:    README.md -> README
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)
        modified:   CONTRIBUTING.md
The command is a bit strange, but it works. The CONTRIBUTING.md file is modified but once again unstaged.
Unmodifying a Modified File
What if you realize that you don't want to keep your changes to the CONTRIBUTING.md file? How can you easily unmodify it, reverting it back to what it looked like when you last committed (or initially cloned, or however you got it into your working directory)? Luckily, git status tells you how to do that, too. In the last example output, the unstaged area looks like this:
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)
        modified:   CONTRIBUTING.md
It tells you pretty explicitly how to discard the changes you've made. Let's do what it says:
$ git checkout -- CONTRIBUTING.md
$ git status
On branch master
Your branch is up-to-date with 'origin/master'
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)
        renamed:    README.md -> README
You can see that the changes have been reverted.
It's important to understand that git checkout -- <file> is a dangerous command. Any local changes you made to that file are gone: Git just copied the most recently committed version of that file over the top of it. Don't ever use this command unless you absolutely know that you don't want those unsaved local changes.
If you would like to keep the changes you've made to that file but still need to get it out of the way for now, we'll go over stashing and branching in Git Branching; these are generally better ways to go.
Remember, anything that is committed in Git can almost always be recovered. Even commits that were on branches that were deleted or commits that were overwritten with an --amend commit can be recovered. However, anything you lose that was never committed is likely never to be seen again.
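These notes do not say how such a recovery is done. One common route, shown here as an assumption rather than as the method the notes have in mind, is the reflog, Git's local record of where HEAD has pointed. The sketch below amends a commit in a throwaway repository and then finds the overwritten commit again through the HEAD@{1} reflog entry; all names and messages are invented.

```shell
# Temporary repository with one commit that is then amended.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example User"      # placeholder identity for the sketch
git config user.email "user@example.com"

echo 'one' > file.txt
git add file.txt
git commit -q -m 'original message'
original=$(git rev-parse HEAD)

git commit -q --amend -m 'amended message'
amended=$(git rev-parse HEAD)

# HEAD@{1} is where HEAD pointed before the most recent change to it.
previous=$(git rev-parse 'HEAD@{1}')
recovered_message=$(git log -1 --format=%s "$previous")
echo "$recovered_message"
```

The overwritten commit is still in the object database under its original hash, so it could be given a branch name or checked out again. Reflog entries are local to one repository and are eventually pruned, so this is a safety net rather than a permanent archive.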