GHub data storage practicalities
Here we provide some information about quotas, storage, and data for the GHub project. The idea is to orient you to how storage is used for the project. Please contact us if you have questions.
The machine that you are interacting with now, as you browse the GHub site, is a web server housed at the San Diego Supercomputer Center. We fondly refer to it as "the GHub instance". This beefy machine manages user sessions, serves web content, coordinates GHub's tools, and communicates with the high performance computing cluster at UB's Center for Computational Research (CCR) to run larger jobs. As a registered GHub user, you use this machine each time you log in to the site!
Most GHub tools run locally on this GHub instance. Each registered user has a quota on the GHub instance for temporarily storing data files and results from tool runs. Since GHub is a shared resource, it's not a place for long-term storage, so quotas are small, starting at 1 GB. Users manage their own files, downloading and deleting as needed, to remain below their local quota. If need be, users can request a modest quota extension.
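To see how close a directory is to the starting quota, a short script can add up file sizes. The following is a minimal sketch, not an official GHub utility: the function names are hypothetical, the quota is assumed to be counted in binary units (1 GB = 1024^3 bytes), and the way the GHub instance actually accounts for usage may differ.

```python
import os

# 1 GB starting quota from the text; treating it as 1024**3 bytes is an assumption.
QUOTA_BYTES = 1 * 1024**3


def directory_usage_bytes(root):
    """Return the total size in bytes of regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symbolic links so linked data is not counted.
            if os.path.islink(path):
                continue
            try:
                total += os.path.getsize(path)
            except OSError:
                # The file vanished or is unreadable; ignore it.
                continue
    return total


def quota_fraction(used_bytes, quota_bytes=QUOTA_BYTES):
    """Return the fraction of the quota in use (1.0 means full)."""
    return used_bytes / quota_bytes
```

For example, quota_fraction(directory_usage_bytes(os.path.expanduser("~"))) gives a rough estimate of the share of a 1 GB quota taken up by your home directory.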
GHub makes use of project storage and scratch space housed at UB CCR's high performance computing center. Both CCR's scratch and project storage are high-end, parallel filesystems tuned to operate with the compute cluster, and are very fast. Both can also be accessed via Globus utilities, ensuring fast, secure file transfer. These storage offerings are not just a hard drive to store data; they provide high-speed integration with the CCR compute cluster, access via Globus, and, in the case of project storage, reliable backup of our files.
GHub tools that use CCR compute resources can use CCR's scratch space for writing intermediate results.
Scratch space is specialized, high-speed storage that's accessible from the computing cluster. It is a bit like the "scratch paper" you might use while computing something by hand; it's a temporary place for writing files, sometimes big ones, during calculations. The GHub group has a 10 TB (terabyte), 4 million file quota on CCR scratch.
Files on scratch are never backed up, and they are deleted 60 days after they were last accessed.
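Because deletion is tied to last access, it can help to list files that are approaching the 60-day limit. The sketch below is illustrative only: the function name and the 50-day warning threshold are hypothetical, and it assumes that the access time reported by the filesystem matches what CCR's purge process uses, which may not hold on every mount.

```python
import os
import time

PURGE_AFTER_DAYS = 60  # from the text: deleted 60 days after last access
SECONDS_PER_DAY = 86400


def files_near_purge(root, warn_days=PURGE_AFTER_DAYS - 10, now=None):
    """Return (path, days_since_access) for files not accessed in warn_days or more.

    Results are sorted with the longest-unaccessed file first.
    """
    now = time.time() if now is None else now
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                atime = os.stat(path).st_atime
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
            age_days = (now - atime) / SECONDS_PER_DAY
            if age_days >= warn_days:
                stale.append((path, age_days))
    return sorted(stale, key=lambda item: item[1], reverse=True)
```

Calling files_near_purge on your own scratch subdirectory returns the files to copy elsewhere first if you still need them.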
GHub's scratch directory is located at /panasas/scratch/grp-ghub at CCR. This location should be used, rather than group or user space, during calculations on the cluster. To do so, create your own subdirectory of scratch to write to. Remember that its contents will be deleted regularly!
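Creating your own subdirectory can be done by hand or from a script. Here is a minimal sketch, assuming a layout of one directory per user with one subdirectory per run; that layout and the function name are hypothetical, and only the base path comes from the text above.

```python
import getpass
import os

GHUB_SCRATCH = "/panasas/scratch/grp-ghub"  # location given in the text


def make_scratch_subdir(run_name, base=GHUB_SCRATCH, user=None):
    """Create and return base/<user>/<run_name>.

    The <user>/<run_name> layout is an assumption for illustration.
    Nothing happens if the directory already exists.
    """
    user = user or getpass.getuser()
    path = os.path.join(base, user, run_name)
    os.makedirs(path, exist_ok=True)
    return path
```

A job script might call make_scratch_subdir("regrid-test") once at startup and write all intermediate files under the returned path.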
Examples of GHub tools that use scratch are the crevasse detection tool and the netCDF file regridding tool. These are more computing-intensive than typical GHub tools, since they need to access numerous files and utilize substantial memory all at one time, so they use the CCR cluster to perform their calculations. These tools write results to a scratch subdirectory called ghubjobs. Users running these tools can retrieve the results of their runs from that location via the GHub scratch Globus endpoint.
Like GHub's project space, scratch space is housed in CCR's machine room on the UB Medical Campus, for proximity to the high performance computing cluster's nodes.
In addition to global scratch, compute nodes on the CCR cluster have local scratch space built into them for saving intermediate results during a data-intensive computation. Local scratch cannot store data beyond the length of time the computation lasts, and one node's local scratch cannot be accessed by a different node. Thus, if you need your intermediate results to persist beyond the length of the calculation, or if you need them to be accessible by multiple nodes, please use the /panasas/scratch location reserved for GHub.
Read about CCR scratch space:
The bulk of GHub's project storage is at /projects/grid/ghub, housed in CCR's machine room on UB's Medical Campus. GHub presently has access to 20 TB of storage there. Our project space can be accessed by the CCR computing nodes, and is an enterprise-level, highly redundant storage solution designed for 99% uptime. Project space is safe for data storage, since files there are backed up. Backups are performed by UB CIT and stored at a separate site.
GHub project storage is being used to store numerous terabytes of model output for the ISMIP6 project, and smaller amounts of data for the CmCt tool and for model outputs created by NCAR. Several different Globus endpoints serve the project storage for GHub, to provide rapid access to the data. Like CCR's scratch space, project space is housed in CCR's machine room on the UB Medical Campus, for proximity to the high performance computing cluster's nodes.
For further information about project storage at CCR, consult this CCR documentation: