z - Advanced topic: Understanding the rPackages set release cycle and using bleeding edge packages
Source:vignettes/z-bleeding_edge.Rmd
Introduction
It is important to understand the release cycle of the rPackages set
and what steps you should take if you need bleeding edge packages. R
packages on `nixpkgs` tend to get updated alongside a new release of R,
which ensures a certain level of quality. The vast majority of CRAN (and
Bioconductor) packages are made available through `nixpkgs` in a fully
automated way. But some packages do require manual intervention to work
on Nix, and we only find this out by trying to build them, which
requires quite a lot of resources. We can’t build CRAN packages every
single day to see if everything works well on Nix, so we only rebuild
the whole tree whenever there’s a new release of R. Packages get built
on a CI infrastructure called Hydra, and the resulting binaries get
cached on cache.nixos.org, so whenever someone wants to install a
package, a pre-built binary gets downloaded from the cache. This avoids
having to build software from source locally. For packages that don’t
need compiling this is not that big of a time saver, but for packages
that do need to get compiled it is huge. Depending on which packages you
want to install, building everything from source could take hours,
whereas installing pre-built binaries is just a matter of how quick your
Internet connection is.
R packages available through Nix
As explained in the introduction, the rPackages set on `nixpkgs` gets
updated shortly after a new release of R. The process involves first
updating the package definitions, and then building the whole tree on
Hydra. Build failures then get fixed by volunteers (to learn how you can
contribute, read `vignette("z-contributing_to_nixpkgs")`). After the
most important packages have been fixed, the whole rPackages set gets
updated and made available through the `nixpkgs` master branch.
Essentially this means that if you start a project with rix using
`"latest-upstream"` as the `r_ver` just after the rPackages set got
updated, this project will use very fresh packages. But if instead you
start a project just before an R release, then the environment will be
using older packages. In practice this rarely matters, unless you
absolutely need a very recent version of a package because of a specific
feature, or you need an environment with bleeding edge packages for
development. For cases like this, we provide the
`r_ver = "bleeding-edge"` and `r_ver = "frozen-edge"` options. If you
need to test the current development version of R, use
`r_ver = "r-devel"`; for the current development version of
Bioconductor, use `r_ver = "bioc-devel"`; and for the development
versions of both R and Bioconductor, use `r_ver = "r-devel-bioc-devel"`.
The table below illustrates this more clearly:
r_ver or date | Intended use | State of R version | State of CRAN packages | State of Bioconductor packages | State of other packages in Nixpkgs |
---|---|---|---|---|---|
r_ver = "latest-upstream" | Start of new project where versions don’t matter | Current or previous | Outdated (up to 6 months) | Outdated (up to 6 months) | Current at time of generation |
r_ver = "4.4.2" (or other) | Reproducing old project or starting a new project where versions don’t matter | Same as in `r_ver`, check `available_r()` | Outdated (up to 2 months if using latest release) | Outdated (up to 2 months if using latest release) | Potentially outdated (up to 12 months) |
date = "2024-12-14" | Reproducing old project or starting a new project using the most recent date | Current at that date, check `available_dates()` | Current at that date, check `available_dates()` | Current at that date, check `available_dates()` | Potentially outdated (up to 12 months) |
r_ver = "bleeding-edge" | To develop against the latest release of CRAN | Always current | Always current | Always current | Always current |
r_ver = "frozen-edge" | To develop against the latest release of CRAN, but manually manage updates | Current at time of generation | Current at time of generation | Current at time of generation | Current at time of generation |
r_ver = "r-devel" | To develop/test against the development version of R | Development version | Always current | Always current | Always current |
r_ver = "r-devel-bioc-devel" | To develop/test against the development version of R and Bioconductor | Development version | Always current | Development version | Always current |
r_ver = "bioc-devel" | To develop/test against the development version of Bioconductor | Always current | Always current | Development version | Always current |
If you want to benefit from relatively fresh packages and have a
stable environment for production purposes, providing a date (the
`date` argument shown in the table, instead of `r_ver`) is your best
option.
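As a rough illustration of the date-based approach, the sketch below generates an expression pinned to the date used in the table. The package list and the temporary project directory are placeholders chosen for this example, not recommendations, and the date is assumed to be one of those returned by `available_dates()`.

```r
library(rix)

# List the dates that can be passed to `date` (most recent ones last)
print(tail(available_dates()))

# Hypothetical project location; a temporary directory keeps the sketch
# from touching any real project
project_dir <- file.path(tempdir(), "dated_env")
dir.create(project_dir, showWarnings = FALSE)

rix(
  date = "2024-12-14",             # date from the table above
  r_pkgs = c("dplyr", "ggplot2"),  # placeholder packages
  project_path = project_dir,
  overwrite = TRUE
)
```

Running this should write a `default.nix` file to `project_dir`; building it at a later time is expected to give the same package versions, since the date fixes the state of the package set.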
Using bleeding and frozen edge packages for your environments
CRAN is continuously getting new or updated packages. When you use R
outside of Nix, running `install.packages(pkg)` will install the latest
version of the {pkg} package available from CRAN (unless you changed the
default repository URL). With Nix, packages do not get downloaded from
CRAN but from the `nixpkgs` repository, and they may be outdated as
explained above. If you require bleeding edge packages, use:
- `"bleeding-edge"` for the very latest packages currently available on CRAN and Bioconductor (there could be a slight delay of some hours though): be aware that building this expression will always result in a different, updated environment. As such, `"bleeding-edge"` is ideal for running tests against the current state of CRAN or Bioconductor, on CI for example;
- `"frozen-edge"`, which is the same as `"bleeding-edge"`, but with a pinned revision: in other words, building this expression will always result in the same environment.
- `"r-devel"`: the same as `"bleeding-edge"` but with the development version of R. Building this expression will also always result in a different, updated environment.
- `"bioc-devel"`: the same as `"bleeding-edge"` but with the development versions of Bioconductor packages instead of the latest release. Building this expression will also always result in a different, updated environment.
- `"r-devel-bioc-devel"`: the same as `"bioc-devel"` but also with the development version of R. Building this expression will also always result in a different, updated environment.
Just as when you provide a date or an R version to `rix()`, using these
options will point to our rstats-on-nix fork of Nixpkgs. This fork gets
updated every 12 hours with the latest commits from both the `nixpkgs`
repository and CRAN. This means that environments generated using this
fork will contain bleeding edge versions of both the CRAN (and
Bioconductor) packages and the system-level dependencies.
In any case, whatever option you choose, using them comes at a cost of which you must be aware.
Because these packages are bleeding edge, they have not had the chance to be built by Hydra yet. Hydra periodically builds packages and these get cached, so if you’re using the official `nixpkgs` repository, pre-compiled binaries get used instead of being built from source. This is not the case if you use our fork, unless you also use the cache that we provide, courtesy of cachix.org. However, we are limited in space and cannot possibly cache all the CRAN and Bioconductor packages, so only the most popular packages get built and cached; hopefully the packages you need are part of this limited set. To use the cache, run the following commands on your computer. First, install the Cachix client (if you already followed these instructions when you set up rix, you can skip them):
nix-env -iA cachix -f https://cachix.org/api/v1/install
then use the cache:
cachix use rstats-on-nix
That’s it! Packages will now get pulled not only from the official
cache, but also from the dedicated rstats-on-nix cache. Our cache also
contains the latest version of R, which can sometimes lag behind in the
official `nixpkgs` repository.
When building an environment, if you see the following message:
warning: ignoring the client-specified setting 'trusted-substituters', because it is a restricted setting and you are not a trusted user
then this means that the cache was not configured properly. Try
`cachix use rstats-on-nix` again, and check that the `/etc/nix/nix.conf`
file has a line that looks like `trusted-users = root YOUR_USERNAME`. If
not, add it using admin/root privileges.
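If you prefer to check this from R, the following sketch looks for your username on a `trusted-users` line. The helper `is_trusted_user()` is hypothetical and not part of rix; it only handles plain usernames on a single-file configuration, so group entries (such as `@wheel`), wildcards, and included files are deliberately ignored.

```r
# Hypothetical helper (not part of rix): TRUE if `user` is listed on a
# `trusted-users` line in the given nix.conf contents
is_trusted_user <- function(conf_lines, user) {
  # Keep only lines that start with `trusted-users =`
  entries <- grep(
    "^[[:space:]]*trusted-users[[:space:]]*=",
    conf_lines,
    value = TRUE
  )
  if (length(entries) == 0) {
    return(FALSE)
  }
  # Drop everything up to the `=` sign, then split on whitespace
  values <- trimws(sub("^[^=]*=", "", entries))
  users <- unlist(strsplit(values, "[[:space:]]+"))
  user %in% users
}

conf_path <- "/etc/nix/nix.conf"
if (file.exists(conf_path)) {
  current_user <- Sys.info()[["user"]]
  print(is_trusted_user(readLines(conf_path, warn = FALSE), current_user))
}
```

A result of `FALSE` suggests the line is missing or does not name you, in which case it needs to be added as described above.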
Making your own cache
If you need packages that are not included in the rstats-on-nix cache,
you can very easily roll out your own binary cache. Simply build the
environment once on one machine, and then push the binaries to a cache
for re-use. Refer to `vignette("z-binary_cache")` to learn how to set up
your own cache.