Before anyone gets confused, yes, the Aging Well Lab still conducts psychology research, and no, we aren’t moving our lab space to Silicon Valley. As it turns out, coding is a very relevant skill for most research disciplines. Many neuroscientists and psychologists use coding software to create tasks or “games” to collect data, and statistical languages like R are used to analyze participants’ behavior.
Some labs use menu-driven software like the Statistical Package for the Social Sciences (SPSS), which is proprietary, or Jeffreys’s Amazing Statistics Program (JASP), which is free and open source. Like Excel, these programs don’t require prior coding experience. Users analyze their data by clicking buttons and menus, but by default this leaves little record of exactly what was done to the data. Other labs, like ours, use scripts written in languages like R or Python to analyze their data. A script records every step applied to the data, which allows the researcher to easily re-run analyses, detect and fix errors, and share their methods with other researchers.
Have you ever had multiple versions of a paper for a class and had trouble keeping track of which version you were working on? Platforms like Box or Dropbox provide an efficient way to track changes made to a document. This is known as version control. However, these platforms are mostly geared toward documents made in applications like Word, PowerPoint, and Excel. Git provides similar functionality, but is designed with code in mind and works best with plain-text files like scripts.
Terms to Know
All bolded words in this article are defined here.
- Command Line – (aka terminal, console, CLI) used to type commands for your computer
- Directory – another word for a folder on a computer
- Repository or Repo – a directory that Git keeps track of, storing all the files related to the code you’re working on. Any directory can be turned into a repository.
  - You can have directories inside repos to organize your files
  - Your repo will likely be in a directory on your computer
- Local vs Remote – files stored on your computer are local. Files stored elsewhere (think cloud storage) are remote.
- Commit – a snapshot of the changes you’ve made to files in your repository
- Stage – packaging files that you want to commit together. You can stage and commit files one by one, or you can stage files with similar changes together for one commit.
Git and the Command Line
Git is free version control software that you run through an application already on your computer. This application, called the command line (a.k.a. terminal, console, CLI), is essentially a small black window that you can use to type commands for your computer. Typing “git” before a command tells the computer to use Git for the action you want to execute.
Each new line in the terminal will start with a prompt: either $ (Mac/Linux) or > (Windows). Command line tutorials usually include prompts in their examples, but keep in mind that you only have to type everything that follows the prompt, not the prompt itself. From this point forward, I’ll be formatting my commands with the prompt as well.
Check out this command line cheat sheet if you need some help!
The command line is like a horse with blinders; it can only focus on what’s right in front of it and needs step-by-step guidance to know what you want it to do. When working in the command line, you have to make sure that it’s able to see the file or folder you’re working on. Either the top of the command line window or the prompt indicates the folder it’s currently “seeing”. Always make sure the command line is seeing the correct folder, especially when using Git!
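If you want to practice pointing the command line at the right folder, here is a minimal sketch for a Mac/Linux terminal (or Git Bash on Windows). The folder name my_project is made up for illustration, and the sketch works in a temporary location so it won’t touch your real files.

```shell
# Create a hypothetical practice folder in a temporary location.
workdir="$(mktemp -d)"
mkdir "$workdir/my_project"

# cd ("change directory") moves the command line into that folder.
cd "$workdir/my_project"

# pwd ("print working directory") shows the folder the command line is currently seeing.
pwd

# ls lists what is inside that folder (nothing prints yet, because it is empty).
ls
```

Running pwd before any Git command is a quick way to confirm the command line is looking where you think it is.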
Basic Git Workflow
“Saving” in Git, known as committing, is different than saving something like a Word document. Saving your script using ctrl + S/cmd + S keeps the most up to date version of your file on your computer and discards previous versions. Committing your changes to Git requires using the command line, and you can commit multiple files in one go. Each time you commit a file (or files), Git takes a snapshot of the changes you’ve made from previous versions rather than saving the entire document again.
Git is incredibly useful, but it doesn’t automatically work with every folder or file on your computer. You first have to create a folder, or directory, to store any files related to your code. In the command line, change directories to that folder and type “git init” to connect it to Git. This turns the directory into a repository, or repo. Now you’ll be able to use Git to commit the changes you make to any of the files in your repository.
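As a rough sketch, setting up a brand new repo might look like the following. The folder name my_analysis is hypothetical, and I’m creating it in a temporary location purely for practice.

```shell
# Make a hypothetical project folder and move into it.
repo="$(mktemp -d)/my_analysis"
mkdir "$repo"
cd "$repo"

# Connect the folder to Git. This only needs to be done once per repo.
git init

# Git keeps its records in a hidden folder called .git; ls -a shows hidden items.
ls -a
```

If you see .git in the listing, the folder is now a repository.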
It’s good practice to check on the status of your repo, especially before committing. Type
$ git status
in the command line, and it’ll tell you which files in your repo have been modified, deleted, and added since your last commit. Then, you have to prep, or stage, the files you want to commit.
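To see what a status report looks like, here is a small practice sketch. The file names analysis.R and notes.txt, and the name and email given to Git, are placeholders I made up for the example.

```shell
# Set up a hypothetical practice repo in a temporary location.
repo="$(mktemp -d)/status_demo"
mkdir "$repo"
cd "$repo"
git init --quiet

# Git needs a name and email to label commits; these are placeholders.
git config user.name "Example Researcher"
git config user.email "researcher@example.com"

# Commit one starting file so Git has something to compare against.
echo 'x <- c(1, 2, 3)' > analysis.R
git add analysis.R
git commit --quiet -m "add analysis script"

# Change the committed file and create a brand new one.
echo 'total <- sum(x)' >> analysis.R
echo 'remember to plot the data' > notes.txt

# Git should list analysis.R as modified and notes.txt as untracked.
git status
```

Neither file is staged at this point, so a commit right now would not include either change.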
Organize your commits!
I’m using R to work on some code and have 4 files in my repo: fileA.R, fileB.R, fileC.R, and fileD.R. For fileA, fileB, and fileC, I added one line of code to calculate a sum. In fileD, I removed a couple of lines that kept returning an error message. When I finish work for the day, I check the status of my repo, and it tells me that all 4 of my files are not staged for commit. I could type
$ git add fileA.R
$ git add fileB.R
$ git add fileC.R
as separate lines, but that can be tedious. Instead, I type
$ git add *
which stages everything in the current folder. Now I also have fileD staged, but I made different changes to fileD and don’t want to commit it with my other 3 files. To unstage fileD, I type
$ git restore --staged fileD.R
This command needs Git 2.23 or newer; “git reset HEAD fileD.R” does the same thing and also works in older versions. Don’t use “git rm” here, because it is meant for deleting files from the repo, not for unstaging them.
Once again, I’ll type
$ git status
to double check that the only files I want to commit are staged, and then type
$ git commit -m "added line to calculate sum"
After fileA, fileB, and fileC are committed, I can stage fileD and type
$ git commit -m "removed lines that didn't work"
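If you’d like to try this scenario yourself, here is a sketch that recreates it in a throwaway repo. The file contents and the name and email given to Git are placeholders I invented, and the sketch unstages fileD with “git restore --staged”, which assumes Git 2.23 or newer.

```shell
# Set up a hypothetical practice repo in a temporary location.
repo="$(mktemp -d)/sum_project"
mkdir "$repo"
cd "$repo"
git init --quiet
git config user.name "Example Researcher"        # placeholder identity
git config user.email "researcher@example.com"   # placeholder identity

# Start with four small scripts and commit them as a baseline.
for f in fileA.R fileB.R fileC.R fileD.R; do
  printf 'x <- c(1, 2, 3)\nprint(x)\n' > "$f"
done
git add fileA.R fileB.R fileC.R fileD.R
git commit --quiet -m "add starting scripts"

# The day's work: add a sum line to A, B, and C; remove a line from D.
for f in fileA.R fileB.R fileC.R; do
  echo 'total <- sum(x)' >> "$f"
done
printf 'x <- c(1, 2, 3)\n' > fileD.R

# Stage everything, then take fileD back out of the staging area.
git add *
git restore --staged fileD.R   # unstages only; the edit to fileD.R is kept
git status --short

# First commit: the three files with the shared change.
git commit --quiet -m "added line to calculate sum"

# Second commit: fileD on its own.
git add fileD.R
git commit --quiet -m "removed lines that didn't work"

# Show the history, one line per commit.
git log --oneline
```

The log at the end should list three commits: the baseline plus the two from the story, each with its own message.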
Staging helps Git understand which files you want to group together for a single commit. Files will remain unstaged until you stage them yourself. You can do this for each individual file by typing
$ git add [filename.extension]
or stage all unstaged files by typing
$ git add *
When you’re ready to commit, type
$ git commit -m "comment"
The “-m” tells Git that the following sentence in quotations is your commit message. A good commit message explains what you did and why you did it, and can easily be understood by others viewing your script. Try to keep commit messages short, about 50 characters or fewer.
Now you know how to connect your repository to Git, stage your files, and commit your changes! The steps outlined above highlight some of the common Git commands, but they are by no means exhaustive. Git is a powerful tool that every coder should have, regardless of whether you’re a professional software engineer or someone like me, who uses it for research. Coding comes with a steep learning curve, but once you get the hang of it, it can be one of the most fun parts of your job! Keep an eye out for a future post on how to collaborate with others using GitHub, an online platform built around Git.