AWS Onboarding (and Offboarding)#
The Hubverse team currently provides cloud hosting for hubs. A “cloud-enabled” hub is one that mirrors its data and configuration to an Amazon Web Services (AWS) S3 bucket. By default, the following hub directories are synced to AWS in near real time:
auxiliary-data
hub-config
model-abstracts
model-metadata
model-output
target-data
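As a rough illustration of the default sync scope, the sketch below filters a list of repo-relative file paths down to those under the directories listed above. The function names are hypothetical, and this is not the logic of the Hubverse sync workflow itself, only a way to reason about which files would be mirrored.

```python
from pathlib import PurePosixPath

# Top-level hub directories mirrored to S3 by default (taken from the list above).
SYNCED_DIRECTORIES = frozenset({
    "auxiliary-data",
    "hub-config",
    "model-abstracts",
    "model-metadata",
    "model-output",
    "target-data",
})


def is_synced(path: str) -> bool:
    """Return True if a repo-relative path sits inside a synced directory."""
    parts = PurePosixPath(path).parts
    # Require a file beneath the directory, so a bare directory name does not count.
    return len(parts) >= 2 and parts[0] in SYNCED_DIRECTORIES


def synced_paths(paths: list[str]) -> list[str]:
    """Keep only the paths that fall under a synced directory, preserving order."""
    return [p for p in paths if is_synced(p)]
```

For example, `synced_paths(["hub-config/admin.json", "README.md"])` keeps only the first path, because the README lives at the repository root rather than in one of the synced directories.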
Cloud Onboarding Setup#
Because each hub has its own S3 bucket and other dedicated AWS resources, a member of the Hubverse team needs to be involved in cloud onboarding.
If a hub admin wants to enable cloud hosting, these are the steps to follow:
Decide on a name for the hub’s S3 bucket. S3 bucket names must follow Amazon’s bucket naming rules and be globally unique.
Create the hub’s AWS resources by following the instructions in the hubverse-infrastructure README.
Note: Don’t be intimidated by “creating AWS resources.” The process is automated and requires a three-line config change.
Once the AWS resources are in place, submit a PR to the hub:
Add a cloud section to the admin.json file. See the Hubverse schema documentation for more details.
Add the hubverse-aws-upload.yaml GitHub workflow file. This is a Hubverse-maintained workflow that runs after a PR is merged to the hub’s main branch. You do not need to make changes to this file.
Update the hub’s README to include information about accessing data from S3. The hubTemplate repo has some boilerplate to use as a starting point.
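Before settling on a bucket name in the first step, it can help to check a candidate locally. The sketch below covers a subset of Amazon’s naming rules for general purpose buckets (length, allowed characters, adjacent periods, IP-address format, and two reserved affixes). The function name is hypothetical, the rule list is not exhaustive, and global uniqueness cannot be checked offline, so treat Amazon’s documentation as the authority.

```python
import re

# 3-63 characters; lowercase letters, digits, dots, and hyphens only;
# must begin and end with a letter or digit.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
# Four dot-separated groups of digits, e.g. 192.168.5.4.
_IP_ADDRESS = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def bucket_name_problems(name: str) -> list[str]:
    """Return a list of rule violations; an empty list means no problems found."""
    problems = []
    if not _BUCKET_NAME.fullmatch(name):
        problems.append(
            "must be 3-63 characters of lowercase letters, digits, dots, or "
            "hyphens, and begin and end with a letter or digit"
        )
    if ".." in name:
        problems.append("must not contain two adjacent periods")
    if _IP_ADDRESS.fullmatch(name):
        problems.append("must not be formatted as an IP address")
    if name.startswith("xn--"):
        problems.append("must not start with the prefix xn--")
    if name.endswith("-s3alias"):
        problems.append("must not end with the suffix -s3alias")
    return problems
```

A name that passes these checks may still be rejected by AWS if another account already owns it.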
Tip
As an example of this process, here are the pull requests used to onboard the variant-nowcast-hub to AWS:
Other notes:
A hub can be onboarded at any time.
The S3 data sync occurs after pull requests are merged to the hub. The sync process does not interfere with hub operations (for example, if AWS is down, hub validations and other tasks will still work).
Mirroring a hub’s data to the Hubverse-hosted AWS account does not require AWS tokens or other secrets to be stored in its repository.
How it works#
At a high level, this diagram describes the interactions between hub users, the hub hosted on GitHub, and the hub’s data mirrored to AWS:
---
config:
  theme: base
  themeVariables:
    primaryBorderColor: '#3c88be'
    primaryColor: '#dbeefb'
---
sequenceDiagram
    create actor A as hub admins and modelers
    create participant h as hub
    A->>h: PR: update hub config
    A->>h: PR: submit model-output
    h-->h: run validations
    h-->h: generate target data
    create participant hc as Hubverse cloud
    h->>hc: sync config, target, and model output data
    actor B as hub data user
    B->>hc: query hub data
    hc->>B: return data
Cloud Offboarding#
Removing a hub from Hubverse AWS hosting is essentially the reverse of the onboarding process, with a few caveats.
Update the hub’s admin.json file, setting the cloud.enabled value to false.
Note: You can also remove the entire cloud section if you prefer.
Optional: Remove the hubverse-aws-upload.yaml workflow file from the hub. Leaving this workflow intact won’t harm anything because it checks admin.json for cloud.enabled = true before syncing data to AWS.
To completely remove the hub’s AWS resources:
Manually delete the contents of the hub’s S3 bucket (AWS does not permit deleting S3 buckets that contain objects).
Submit a PR to hubverse-infrastructure that removes the hub from the hubs.yaml file.
Once the PR is merged, the hub’s AWS resources will be deleted.
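The cloud.enabled switch described in the first offboarding step can be sketched as follows. The helper names and the example fragment are hypothetical; only the cloud section and its enabled key come from the text above, and the real admin.json contains other fields defined by the Hubverse schema. The check treats a missing cloud section the same as enabled set to false, which matches the note that the section may be removed entirely.

```python
import copy
import json


def is_cloud_enabled(admin: dict) -> bool:
    """True only when the config has a cloud section with enabled set to true."""
    cloud = admin.get("cloud")
    return isinstance(cloud, dict) and cloud.get("enabled") is True


def disable_cloud(admin: dict) -> dict:
    """Return a copy of the config with cloud.enabled set to false."""
    updated = copy.deepcopy(admin)  # leave the caller's dict untouched
    if isinstance(updated.get("cloud"), dict):
        updated["cloud"]["enabled"] = False
    return updated


# Hypothetical, heavily trimmed admin.json fragment used for illustration only.
example = json.loads('{"cloud": {"enabled": true}}')
offboarded = disable_cloud(example)
```

Here `is_cloud_enabled(example)` is true and `is_cloud_enabled(offboarded)` is false. In practice the change is made by editing admin.json in a pull request to the hub.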