
The goal of gitcellar is to help you download archives of all repositories in an organization. For context see the blog post Safeguards and Backups for GitHub Organizations.

Installation & setup

You can install the development version of gitcellar from GitHub with:

# install.packages("remotes")
remotes::install_github("ropensci-org/gitcellar")

You will need a GitHub Personal Access Token (PAT). See gh docs on the topic. As long as you’re an owner of the organization you’re trying to back up, your PAT does not need any scopes. You only need to add the repo scope if you want private repositories to be included automatically in the list of repos to back up. Alternatively, you can list such repositories yourself via the extra_repos argument of download_organization_repos().

Example

This is a basic example which shows you how to download archives of all repositories in an organization (of which you are an owner):

library(gitcellar)
download_organization_repos(organizations = "maelle-test")

The archives (<org-name>_<repo-name>_migration_archive.tar.gz) will be saved in distinct folders (archive-<org-name>_<repo-name>) under the current directory (or the directory you pass via the dest_folder argument). It might seem wasteful to create one archive per repository rather than a single archive of all repositories, but in our experience it prevents failures. We store each archive in its own folder because that worked better with the tool we used for uploading the archives to a cloud service.
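The naming scheme above can be turned into a small helper for checking whether a given archive was written. This is only a sketch: expected_archive_path() is a hypothetical function, not part of gitcellar, and "my-repo" and "backups" are placeholder names.

```r
# Hypothetical helper (not part of gitcellar): build the path where the
# archive for one repository is expected, following the naming scheme
# archive-<org-name>_<repo-name>/<org-name>_<repo-name>_migration_archive.tar.gz
expected_archive_path <- function(org, repo, dest_folder = getwd()) {
  folder <- sprintf("archive-%s_%s", org, repo)
  file <- sprintf("%s_%s_migration_archive.tar.gz", org, repo)
  file.path(dest_folder, folder, file)
}

# "my-repo" and "backups" are placeholders for illustration.
path <- expected_archive_path("maelle-test", "my-repo", dest_folder = "backups")
print(path)

# TRUE only if the download for that repository has already been saved there.
print(file.exists(path))
```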

After this step, you can use the tool of your choice to upload the backups to a cloud service such as DigitalOcean or AWS. You could, for example, run the code once a week and keep 8 weeks of backups on a rolling basis.
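A rolling retention policy could look like the sketch below. It assumes a layout that gitcellar does not impose: each weekly run is saved in a folder named after its date (YYYY-MM-DD) under a common root, passed as dest_folder. prune_backups() is a hypothetical helper that only handles local copies; retention on the cloud side would be handled with your provider's own tools.

```r
# Hypothetical helper (not part of gitcellar): keep the `keep` most recent
# dated run folders under `root` and delete the older ones.
# Assumption: run folders are named YYYY-MM-DD; anything else is left alone.
prune_backups <- function(root, keep = 8) {
  runs <- list.dirs(root, full.names = TRUE, recursive = FALSE)
  runs <- runs[grepl("^\\d{4}-\\d{2}-\\d{2}$", basename(runs))]
  dates <- as.Date(basename(runs), format = "%Y-%m-%d")

  # Drop names that look like dates but are not valid ones (e.g. 2024-13-45).
  valid <- !is.na(dates)
  runs <- runs[valid]
  dates <- dates[valid]

  # Newest first; everything past position `keep` is stale.
  ordered <- runs[order(dates, decreasing = TRUE)]
  stale <- ordered[seq_along(ordered) > keep]

  unlink(stale, recursive = TRUE)
  invisible(stale)
}

# Possible weekly run (commented out because it needs a PAT and network access):
# run_dir <- file.path("backups", format(Sys.Date(), "%Y-%m-%d"))
# dir.create(run_dir, recursive = TRUE)
# gitcellar::download_organization_repos(
#   organizations = "maelle-test",
#   dest_folder = run_dir
# )
# prune_backups("backups", keep = 8)
```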

Where is the code?

In each .tar.gz archive, you will find JSON files of metadata about the organization (members, for instance) and the repository (issues, pull requests), as well as bare git repositories. A bare git repository is a git repository as it exists on a remote. All the code is in there, but you cannot see the files directly. To see them, clone the bare git repository to another folder: by default the clone is not bare, so the files will be checked out there. Alternatively, use gert::git_ls(<path-to-bare-repo>, ref = "<default-branch>") to list the files tracked in the bare git repository.

You could think of the bare git repositories as a compressed version of the code, in the sense that they contain all the information and you need only a few steps (cloning to another folder) to get to the actual repository content, including all its git history.
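The steps above could be sketched as follows. The three helpers are hypothetical, not part of gitcellar. The sketch assumes that the bare repositories inside an unpacked archive are the directories whose name ends in .git, and that gert::git_clone() accepts a local path as its url; both are worth checking against your own archives.

```r
# Hypothetical helper: unpack one archive into a scratch directory.
unpack_archive <- function(archive, exdir = tempfile("gitcellar-")) {
  utils::untar(archive, exdir = exdir)
  exdir
}

# Hypothetical helper: list the bare repositories in an unpacked archive.
# Assumption: they are the directories whose name ends in ".git".
find_bare_repos <- function(dir) {
  dirs <- list.dirs(dir, full.names = TRUE, recursive = TRUE)
  dirs[grepl("\\.git$", dirs)]
}

# Hypothetical helper: clone a bare repository into a regular (non-bare)
# working copy, where the files are checked out and visible.
checkout_bare_repo <- function(bare_repo, dest = tempfile("checkout-")) {
  gert::git_clone(url = bare_repo, path = dest)
  dest
}

# Possible usage (commented out: paths and branch name are placeholders):
# extracted <- unpack_archive(
#   "archive-maelle-test_my-repo/maelle-test_my-repo_migration_archive.tar.gz"
# )
# bare <- find_bare_repos(extracted)[1]
# gert::git_ls(bare, ref = "main")      # list tracked files without cloning
# list.files(checkout_bare_repo(bare))  # or clone and look at the files
```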