Hubverse Release Process#

Parts of this document are adapted for The Hubverse from The Carpentries Developer’s Handbook (c) The Carpentries under the CC-BY 4.0 license.

Looking for a checklist?

Executive Summary#

The release process is now a general workflow that can help to reduce the size of pull requests:

  1. Iterate on small bug fixes and PRs on branches, merging into main as a stable development branch

  2. When ready, bump the version, add an annotated git tag, and release

  3. Bump the version in main back to a development version

This workflow applies to both R and Python packages, although the implementation specifics will vary between languages.

Background#

While the release process described here applies to both R and Python, the impetus for adopting it was making the Hubverse R packages available on the Hubverse R-Universe. R-Universe checks for new versions hourly, which allows users to get up-to-date packages without running out of GitHub API query attempts.

The Hubverse does not have a Python equivalent to the R-Universe. Instead, Python packages are distributed via PyPI, Python’s package index.

In order to maintain quality, packages are only sent to the R-Universe or PyPi if they have been formally released on GitHub. This allows us to add new experimental, non-breaking features incrementally, without changing the stable deployments.

Previous state#

Prior to updating the Hubverse release process in August 2024, each pull request merged to the main branch of a repository effectively meant the release of a new version. Ideally, this meant that our Git history would look like a series of adjacent belt loops:

        gitGraph
    commit id: "abcd"
    commit id: "efgh (0.14.0)"
    branch feature1
    checkout main
    checkout feature1
    commit id: "ijkl"
    commit id: "mnop"
    checkout main
    merge feature1 id: "gpne (0.14.1)"
    branch feature2
    checkout feature2
    commit id: "qrst"
    commit id: "uvwx"
    checkout main
    merge feature2 id: "punx (0.14.2)"
    

However, a pattern often emerges where more than one feature in a particular release is desired and, because we release directly to main, the graph looks like a thundercloud:

        gitGraph
    commit id: "abcd"
    commit id: "efgh (0.14.0)"
    branch feature1
    checkout main
    checkout feature1
    commit id: "ijkl"
    commit id: "mnop"
    branch feature2
    checkout feature2
    commit id: "qrst"
    checkout feature1
    commit id: "uvwx"
    checkout feature2
    commit id: "yz12"
    checkout feature1
    merge feature2 id: "xu12"
    checkout main
    merge feature1 id: "gpxe (0.14.1)"
    

The problem with the second graph is that, as the number of extra features grows, the size of the PR that will be the release becomes larger and larger. Moreover, the exact changes that were needed in the original PR are mixed in with all the changes from the child branches, meaning that it is more difficult to retrospectively review that PR.

New release process#

The release process adopted in August 2024 alleviates the “many features = large PR” problem and makes it easier to multiple people to work on package features simultaneously.

Now that we distribute packages outside of GitHub itself, builds can be pinned to Releases on specific tags, resulting in a git graph that looks similar to the first graph, but it can also enable developers to work in parallel on independent features, allowing users to intermittently test these features by installing from the default branch in GitHub.

        gitGraph
    commit id: "abcd"
    commit id: "efgh" tag: "0.14.0"
    branch feature1
    branch feature2
    checkout main
    checkout feature1
    commit id: "ijkl"
    commit id: "mnop"
    checkout main
    checkout feature2
    commit id: "qrst"
    commit id: "uvwx"
    checkout main
    merge feature1 id: "gpne"
    merge feature2 id: "punx"
    branch release
    checkout release
    commit id: "2zy1"
    checkout main
    merge release id: "1u2n" tag: "0.14.1"
    

Versioning#

The hubverse is built using very basic semantic versioning using the X.Y.Z pattern. Development versions are indicated as

  • X.Y.Z[.9000] for R

  • X.Y.Z[.dev<n>] for Python.

Here’s an example for an R package. If we take the previous git graph and labelled the version number, you can see how the first release in our graph was a minor release, followed by a patch release. Everything that has a .9000 attached is considered in-development. Read below for details and examples of each of these semantic versions:

        gitGraph
    commit id: "abcd (0.13.2.9000)"
    commit id: "efgh (0.14.0)" tag: "0.14.0"
    branch feature1
    branch feature2
    checkout main
    checkout feature1
    commit id: "ijkl (0.14.0.9000)"
    commit id: "mnop (0.14.0.9000)"
    checkout main
    checkout feature2
    commit id: "qrst (0.14.0.9000)"
    commit id: "uvwx (0.14.0.9000)"
    checkout main
    merge feature1 id: "gpne (0.14.0.9000)"
    merge feature2 id: "punx (0.14.0.9000)"
    branch release
    checkout release
    commit id: "2zy1 (0.14.1)"
    checkout main
    merge release id: "1u2n (1.14.1)" tag: "0.14.1"
    
X

Major version number: this version number will change if there are significant breaking changes to any of the user-facing workflows. That is, if a change requires users to modify their scripts, then it is a breaking change. EXAMPLE: There are no examples of hubverse packages undergoing a major version change, but schemas v3.0.0 included the breaking change of switching sample/output_type_id (an array) to sample/output_type_id_params (an object). The breaking change meant that it was a non-trivial task to switch from a v2.0.1 schema to v3.0.0 and the users. This was reflected in the month-long timeline between the announcement of the change on 2024-05-13 to the actual release on 2024-06-18.

Y

Minor version number: this version number will change if there are new features or enhanced behaviors available to the users in a way that does not affect how users who do not need the new features use the package. This number grows the fastes in early stages of development. EXAMPLE: The {hubValidations} package version 0.5.0 gives users the ability to reduce compute time by allowing them to sub-set the configuration by the new output_type argument to expand_model_out_grid(). Users who change nothing in their workflows or scripts will not see any change for the better or worse.

Patch

Patch version number: this version number will change if something that was previously broken was fixed, but no new features have been added. EXAMPLE: the {hubAdmin} package needed an enhancement for schema version 3.0.1, which removed the requirements for CDF outputs to have a specific pattern. The actual fix was not something a user would interact with, so version 1.0.1 was a patch release.

dev versioning

Development version indicator: this version number indicates that the package is in a development state and has the potential to change. When it’s on the main branch, it indicates that the features or patches introduced have been reviewed and tested. This version is appended after every successful release. When this development version indicator exists, the documentation site will have an extra dev/ directory that contains the upcoming changes so that we can continue to develop the hubverse without disrupting the regular documentation flow.

  • For R packages, advice on incrementing the version number from The R Packages Book (Wickham and Bryan, 2023):

    Increment the development version, e.g. from 9000 to 9001, if you’ve added an important feature and you (or others) need to be able to detect or require the presence of this feature. For example, this can happen when two packages are developing in tandem. This is generally the only reason that we bother to increment the development version. This makes in-development versions special and, in some sense, degenerate. Since we don’t increment the development component with each Git commit, the same package version number is associated with many different states of the package source, in between releases.

  • For Python packages, we use setuptools-scm to calculate and apply versions automatically, including the version numbers for incremental development releases. When it’s added to a Python package (via the pyproject.toml file), setuptools-scm inspects git tags and dynamically sets the version number as described in its documentation.

Release Process#

The sections below outline a general release process for both R and Python packages. For language-specific checklists, refer to:

Non-urgent releases only#

This release process assumes that we have accumulated bugfixes and/or features in the main branch, which we are ready to release. If you have a bug that needs to be patched immediately and you have new features in the main branch that are not yet ready to be released, then you should create a hotfix instead.

Releases#

Releases are a concept that is specific for GitHub. Under the hood, releases are created from commits or tags. When you use the GitHub web interface, you can choose to have a tag automatically created from a release (though the tag will not be annotated or signed, see below for creating stronger, signed tags). All releases should contain a summary of the changes that happened between this version and the previous version. A good example of this are the hubverse schema release notes, and GitHub offers to automatically fill the release notes with titles and links to the pull requests that populated the release, which is a good summary (assuming people create good PR titles).

Once the release is created on GitHub, R packages will be available on the R-Universe in about an hour or less. Python packages are released to PyPI via a GitHub workflow that runs when a release tag is added to the repo.

Signed Tags#

It is good practice to create a tag to release from so that we have a way to track the releases even if we eventually move away from GitHub. Better practice is to create annotated tags with the -a flag, which adds metadata to the tag, meaning that it cannot be moved. Even better practice is to create signed tags with the -s flag, which adds a cryptographic signature to the annotated tag metadata that allows anyone to verify that it came from your computer and not someone pretending to be you.

Zhian likes to create tags via the command line because he has set up his git configuration to use a gpg signature so the tags and the releases are both verified. Recently, Git and GitHub added support for creating signatures via an SSH key, which is a lot more approchable than GPG.

Unfortunately, you cannot create a signed tag on the GitHub web interface and it must be done locally.

git tag -s X.Y.Z -m 'summary of changes'
git push --tags

Once you have a signed tag created, you can then go to the Releases tab on GitHub and create the release (or you can use the gh CLI utility and run gh release create X.Y.Z to interactively create the release).

R: Releasing to CRAN#

R-Universe is not the official R package index—CRAN is. If you want to release a package to CRAN, it’s a relatively straightforward process that takes five minutes on your part as the maintainer.

The time between sending your package to CRAN and the time it becomes available to the public usually ranges anywhere from 30 minutes to two or three days, depending on volume and feedback from CRAN maintainers. Because CRAN submissions are often a back and forth process, you should not tag your release until you have confirmed that CRAN has accepted the submission.

Some tips for CRAN submissions:

(you will restore this later with git restore DESCRIPTION). 1You will get a warning from devtools saying that all files should be tracked and committed before release. Double check that all that changed was removing the Remotes: field from the DESCRIPTION and proceed.

  • Commit the CRAN_SUBMISSION file when it changes

Tip

If you get an unhelpful or nasty review from a CRAN administrator, contact one of the other Hubverse maintainers who has experience submitting to CRAN (i.e. Zhian) for backup/strategy/comiseration. You can also see if you ran into a common issue by checking The CRAN Cookbook

Testing#

Packages will exist in two states: released and stable development. Both of these states need to pass checks (for example, R CMD check, tests, and coverage), but the challenge with two versions becomes: how do you ensure both versions work with both versions of the other packages?

The strategy is three-fold:

  • All packages in development are tested against development packages. It is important to remember that development versions must also not introduce breaking changes. Testing breaking changes should be performed in a separate branch.

  • On a release/hotfix, pull request, all packages are tested against the released packages to ensure no regressions.

  • All packages (released and devel) are tested weekly. This helps to avoid surprise reverse-dependency problems.

R Testing#

For R packages, testing against the development versions is achieved by setting the Remotes: section in your DESCRIPTION file; this ensures that when dependencies are installed for the GitHub workflow run, the hubverse dependencies are installed from GitHub and not the R-universe.

During releases and hotfixes (detected by checking that the branch name contains either /release/ or /hotfix/, the remote declarations of hubverse packages are removed (before the dependencies are installed) with the command sed -i "" -E -e '/[ ][ ]hubverse-org/d' DESCRIPTION.

Python Testing#

At the time of this writing, the Hubverse does not have a documented process for testing Python packages against development versions.