GHub can be used as a repository for data products generated during your research. To host your data products on GHub as an open repository for the scientific and education communities, please read on.
- Small Datasets
- Large Datasets
- Contributing Data
- Metadata for Datasets
- More Information
Datasets of modest size (under 1 GB) can be stored on GHub in two ways: as a GHub Project, or as a GHub Resource of type Data sets and collections.
Much larger files and datasets can also be stored with GHub. We will use the Resources method outlined below for collecting and documenting the dataset's metadata (abstract, credits, citations, etc.). Please create a ticket to let us know about your dataset's space needs and background information, and begin the Resource metadata creation as described.
Where are my data stored?
For large datasets, the data themselves are stored at UB CCR's data center. We will create a Globus endpoint for your data and give you read and write access to it.
The GHub platform offers several different ways to publish, document, cite, and upload your data to make it available to others. These options are summarized for you here, along with links to complete documentation.
Projects provide a way to associate data files with to-do lists, notes, citations, and documents, and allow other members of GHub to contribute to the collaboration, without creating a new release of the dataset. The size of all Project files in total must be under 1 GB, and the maximum individual file size is 100 MB.
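Before uploading, it can help to check a local folder against these two limits. The following is a minimal sketch, not a GHub tool: the function names are hypothetical, and it assumes that GB and MB are decimal units (10^9 and 10^6 bytes), which the limits above do not specify.

```python
from pathlib import Path

# Limits quoted in the paragraph above. Treating GB/MB as decimal units
# is an assumption made for this sketch.
MAX_TOTAL_BYTES = 1_000_000_000  # all Project files together must stay under 1 GB
MAX_FILE_BYTES = 100_000_000     # a single file may be at most 100 MB


def check_project_limits(sizes):
    """Given a mapping of file name -> size in bytes, return (total, oversized, fits).

    oversized lists the files above the per-file limit, sorted by name;
    fits is True only when both limits are respected.
    """
    total = sum(sizes.values())
    oversized = sorted(name for name, size in sizes.items() if size > MAX_FILE_BYTES)
    fits = total < MAX_TOTAL_BYTES and not oversized
    return total, oversized, fits


def scan_directory(root):
    """Collect the size of every regular file under root (a local folder)."""
    root = Path(root)
    return {
        str(path.relative_to(root)): path.stat().st_size
        for path in root.rglob("*")
        if path.is_file()
    }
```

Calling `check_project_limits(scan_directory("my_dataset"))`, where `my_dataset` is a placeholder folder name, reports the total size, any files over the per-file limit, and whether the folder fits within a Project. A folder that does not fit is a candidate for the large-dataset route described above.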
Who has access to my Project?
A GHub Project is unique in that it is a) accessible only to registered GHub members, and b) editable by all registered GHub members. Projects can be a good way to collect and organize materials prior to releasing them as a Resource.
Any registered GHub user can create, view, and contribute to GHub Projects.
Where are my data stored?
Files you upload for your Project are stored on and accessible from GHub's dedicated Google Drive space.
How do I create a Project?
To explore and get started with Projects, select Projects from the GHub Collaborate menu. The system will display the GHub Projects home page as pictured in the screenshot at left. From there, you can explore existing GHub Projects and begin one of your own by clicking Add Project.
Full documentation, helpful how-to videos, and further information on using Projects are available at the HUBzero platform's Project help pages.
If you create a Project, please contact us if you want to add it to GHub's dataset listing. This process does not occur automatically.
Resources enable you to associate background information, such as citations and documentation, with your dataset and release the whole package in a citable way. The Dataset resource is suitable for either small, self-contained datasets or large ones.
What features are included for Resources?
The Hub supports several different types of resources; Datasets are only one of them. All resource types have elements in common.
Resources on the Hub can include an abstract, citations, supporting documentation, and topic tagging. As the creator of a Resource, you can assign a development team of other GHub users, who can work on the supporting material and the descriptive metadata for the Resource.
New versions of the Resource can be released as needed. The Hub provides a fully guided release flow that enables you to collect the needed information to document your dataset. Once released, the Resource supports an area for user questions and answers, citations, a usage report, and user-contributed wishlists.
Who has access to my Resource?
You can control who has access to a Resource you create. We recommend you set the Resource's Group to GHub and the Access Level to Public, as shown in the screenshot below.
Any registered GHub user can create a GHub Resource. Only users named to the Resource development team may edit it.
Where are my data stored?
For smaller datasets, Resources are stored directly on the GHub webserver. Large datasets are stored at UB CCR's data center, where they are accessible to users via the Globus app and command line interface.
How do I create a Resource?
To explore and get started with Dataset Resources, navigate to https://vhub.org/groups/ghub/resources. The system will display the GHub Resources home page as pictured in the screenshot at left. From there, you can explore existing GHub Resources and begin one of your own by clicking Start a Contribution.
Full documentation and further information on Resources are available at the HUBzero platform's Resource help pages.
If you create a Resource, please contact us if you want to add it to GHub's dataset listing. This process does not occur automatically.
Publications provide another way to publish your own datasets on GHub. In addition to the features described above for Resources, they offer the ability to assign curators who review your contribution prior to publication.
For further information about Publications, please refer to the HUBzero user documentation.
When you publish a dataset Resource on GHub, you have the opportunity to comprehensively specify the dataset, its use, its origin, attributions, and provenance, along with other information. Together these are called metadata: data about the dataset. We ask that dataset contributors honor the requirements below, so that dataset users understand the uses and limitations of your data product.
When publishing a dataset, please include at a minimum a summary of the following information in its landing page:
- Abstract/General description
- Datatype specification
- References and citations
- Sources of error, caveats
- Credits, including responsible team and funding source
- How to use the data
As needed, more comprehensive descriptions of the data can be uploaded and associated with the dataset as Supporting Documentation.
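The checklist above can be tracked while you draft a landing page. The following is a minimal sketch under stated assumptions: the dictionary keys and the `missing_items` function are hypothetical names chosen for illustration, not GHub form fields or part of any GHub or HUBzero API. Only the item labels are taken from the list above.

```python
# Labels copied from the checklist above. The short keys are hypothetical
# names invented for this sketch, not GHub form field names.
REQUIRED_ITEMS = {
    "abstract": "Abstract/General description",
    "datatype": "Datatype specification",
    "references": "References and citations",
    "caveats": "Sources of error, caveats",
    "credits": "Credits, including responsible team and funding source",
    "usage": "How to use the data",
}


def missing_items(draft):
    """Return the checklist labels whose entry in draft is absent or blank."""
    missing = []
    for key, label in REQUIRED_ITEMS.items():
        value = draft.get(key)
        # Only a non-empty string counts as a filled-in item.
        if not (isinstance(value, str) and value.strip()):
            missing.append(label)
    return missing
```

Passing a draft that contains only an abstract, for example, returns the other five labels, which gives a quick list of what still needs to be written before release.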
You can consult the following official HUBzero documentation for guidance: