HOW TO SUBMIT DATA AND METADATA TO THE PORTAL?

This page documents the submission system of the Data Resource Portal. We are collecting file and code assets from Common Fund programs to make them Findable, Accessible, Interoperable, and Reusable (FAIR) within the Data Resource Portal. To submit assets, you must be logged in and registered. Registration involves being assigned a role by an administrator. To register, please send us an email at help@cfde.cloud.

XMT

XMT files are text-based files that contain a collection of sets of a given entity type. The 'X' in XMT stands for the entity that the sets contain. For example, .gmt files are XMT files that contain a collection of gene sets, while .dmt files are XMT files that contain a collection of drug sets. On each row of the XMT file, the first column contains the term associated with the set, and all other columns contain the set entities.
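
As a rough illustration of the row layout described above, the sketch below parses XMT content into a mapping from term to entity set. It assumes the columns are tab-separated, which is not stated on this page; the function name `parse_xmt` and the sample rows are hypothetical.

```python
import io


def parse_xmt(handle):
    """Read XMT rows into {term: set of entities}.

    Assumes tab-separated columns: the first column is the term and
    every remaining non-empty column is a member of the set.
    """
    sets = {}
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        term = fields[0].strip()
        if not term:
            continue  # skip blank lines
        sets[term] = {f.strip() for f in fields[1:] if f.strip()}
    return sets


# Hypothetical two-row .gmt content, for illustration only.
sample = io.StringIO("term A\tMAPK3\tTP53\nterm B\tEGFR\n")
gene_sets = parse_xmt(sample)
print(gene_sets["term B"])  # prints {'EGFR'}
```

The same function would apply to a .dmt file, since only the entity type differs between XMT variants.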

C2M2

The Crosscut Metadata Model (C2M2) is a collection of files coded in the frictionless data package format. The collection is a zipped set of TSV files containing metadata standardized to a set of known ontologies. Please explore the C2M2 technical wiki for more information about how to prepare your metadata as C2M2-compatible files. Please also see the C2M2 section on the Documentation page of the CFDE Workbench portal for how to create C2M2 files.
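
Before submitting, it can help to confirm which TSV tables a zipped package actually contains. The sketch below is one possible sanity check, not part of the C2M2 tooling: `list_tsv_tables` is a hypothetical helper, and the archive built here holds placeholder members rather than a complete C2M2 submission.

```python
import io
import zipfile


def list_tsv_tables(archive):
    """Return the sorted names of the .tsv members in a zip archive."""
    with zipfile.ZipFile(archive) as zf:
        return sorted(n for n in zf.namelist() if n.lower().endswith(".tsv"))


# Build a tiny in-memory archive so the sketch runs without a real package.
# Member names and contents are placeholders for illustration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("file.tsv", "placeholder\n")
    zf.writestr("biosample.tsv", "placeholder\n")
    zf.writestr("notes.txt", "not a table\n")

buffer.seek(0)
tables = list_tsv_tables(buffer)
print(tables)  # prints ['biosample.tsv', 'file.tsv']
```

For a package on disk, the same function accepts a file path in place of the in-memory buffer.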

KG Assertions

A knowledge graph is a network that illustrates the relationships between different entities, which may come from different datasets. A knowledge graph consists of three main components: nodes, edges and labels. Nodes are the entities represented in the knowledge graph, e.g. Gene Ontology (GO) terms. Edges characterize the relationship between nodes, e.g. co-expressed with. Knowledge graph assertions are files that contain information about the nodes and edges that could be used to create a knowledge graph. For example, a KG Assertions file for nodes would contain columns that define information about each node: id, label, ontology_label. A KG Assertions file for edges would contain columns that comprise the necessary information about each edge: its source and target nodes, the labels for these nodes, and their relationship.
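
To make the two file shapes concrete, here is a minimal sketch that serializes one node table and one edge table. The node columns are the ones named above. The edge column names, the comma-separated format, and all identifiers and values are assumptions made for illustration; consult the portal documentation for the exact layout it expects.

```python
import csv
import io

# Node columns come from the description above.
NODE_COLUMNS = ["id", "label", "ontology_label"]
# Edge column names are assumed: the text only says an edge records its
# source node, target node, the labels of both nodes, and the relationship.
EDGE_COLUMNS = ["source", "source_label", "relation", "target", "target_label"]


def write_table(columns, rows):
    """Serialize rows (a list of dicts) as comma-separated text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical identifiers and labels.
nodes = [
    {"id": "n1", "label": "Gene", "ontology_label": "example gene"},
    {"id": "n2", "label": "Gene", "ontology_label": "another gene"},
]
edges = [
    {"source": "n1", "source_label": "Gene", "relation": "co-expressed with",
     "target": "n2", "target_label": "Gene"},
]

nodes_csv = write_table(NODE_COLUMNS, nodes)
edges_csv = write_table(EDGE_COLUMNS, edges)
print(edges_csv)
```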

Attribute Table

Attribute tables are files containing tables that describe the relationship between two entity types, with one entity type on the rows (e.g. genes) and another on the columns (e.g. tissue types). The intersection of a given row and column is a value defining the nature of the relationship between the row entity and the column entity, e.g. the qualitative score of similarity between a given gene and a given tissue type.

The recommended extensions for each file asset type are:
    C2M2: .zip
    KG Assertions: .zip
    Attribute Table: .h5 or .hdf5
    XMT: .(x)mt, e.g. .gmt or .dmt

ETL

Extract, transform, load (ETL) is the process of converting the DCC's raw data into various processed data formats such as C2M2, XMT, KG assertions, attribute tables, and database tables. The ETL URL should point to the DCC GitHub repo containing the scripts that the DCC uses to process the data and generate these processed datasets.
Example: LINCS ETL script

API

Each DCC is expected to have a URL to a page that documents how to access the DCC's data and tools via APIs. Moreover, APIs should be documented in a standard format, and the recommended standard is OpenAPI. In addition, it is recommended to deposit these APIs in the SmartAPI repository.
OpenAPI: The OpenAPI specification provides a formal standard for describing REST APIs. OpenAPI specifications are typically written in YAML or JSON.
SmartAPI: This is a community-based repository for depositing APIs documented in the OpenAPI specification. It features additional metadata elements and value sets to promote the interoperability of RESTful APIs.
Learn more about generating an OpenAPI or SmartAPI specification on the Documentation page.
Example: exRNA openAPI link
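
For readers who have not written an OpenAPI document before, the sketch below builds a minimal OpenAPI 3.0 description and prints it as JSON. The API title, the `/genes/{symbol}` path, and its parameter are hypothetical; a real DCC specification would describe its own endpoints and would usually include response schemas as well.

```python
import json

# Minimal OpenAPI 3.0 description of one hypothetical endpoint.
spec = {
    "openapi": "3.0.3",
    "info": {"title": "Example DCC API", "version": "1.0.0"},
    "paths": {
        "/genes/{symbol}": {
            "get": {
                "summary": "Look up a gene by symbol",
                "parameters": [
                    {
                        "name": "symbol",
                        "in": "path",
                        "required": True,
                        "schema": {"type": "string"},
                    }
                ],
                "responses": {"200": {"description": "Gene record"}},
            }
        }
    },
}

openapi_json = json.dumps(spec, indent=2)
print(openapi_json)
```

The three top-level keys shown (`openapi`, `info`, `paths`) are the core of an OpenAPI 3.0 document; the same structure can be written in YAML instead of JSON.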

Playbook Workflow Builder (PWB) Metanodes

A PWB metanode is a workflow engine component implemented by defining the semantic description, TypeScript-constrained type, and functionality of a node in the network of PWB workflows. See the Playbook Partnership documentation and the Documentation page for more information about developing and publishing metanodes. The form requires a GitHub link to a script describing a Playbook metanode.
Example: PWB Metanode created by the Metabolomics DCC

Entity Page Template and Example

The Entity Page Template and Example are links to:

  1. A template used to create the landing page displaying the datasheet about a gene, a metabolite, a protein, a cell type, or another entity from a DCC;
  2. A valid URL to an existing entity page that presents a single view of a given entity.

Example of a template from GTEx: https://www.gtexportal.org/home/gene/<GENE_NAME>.

Example live entity page from GTEx: https://www.gtexportal.org/home/gene/MAPK3
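
The relationship between the template and the live example can be sketched as a simple placeholder substitution. The helper name `entity_url` is hypothetical, and URL-encoding the substituted value is a precaution added here rather than something this page specifies.

```python
from urllib.parse import quote

TEMPLATE = "https://www.gtexportal.org/home/gene/<GENE_NAME>"


def entity_url(template, placeholder, value):
    """Substitute one placeholder in an entity page template."""
    if placeholder not in template:
        raise ValueError(f"template has no {placeholder} placeholder")
    # Percent-encode the value so it is safe inside a URL path segment.
    return template.replace(placeholder, quote(value, safe=""))


print(entity_url(TEMPLATE, "<GENE_NAME>", "MAPK3"))
# prints https://www.gtexportal.org/home/gene/MAPK3
```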

Chatbot Specifications

The Chatbot Specifications URL is a link to a manifest file containing metadata and OpenAPI specifications that can be used to develop a chat plugin for large language models. These plugins allow large language models to function as specialized chatbots that have access to the exposed API endpoints described in the manifest files and can call these APIs based on user input. See the ChatGPT plugins documentation for more information on how to develop chatbot specifications.
Example: ai-plugin specs template

Apps URL

An Apps URL is a link to one or more pages that list the bioinformatics tools, workflows, and databases produced by the DCC.
Example: LINCS Apps URL

Not Approved

This is the first stage of approval. All assets that have just been uploaded or submitted by a DCC uploader are first placed in this category. On the Uploaded Assets page, the asset is tagged with an icon indicating that the file has not yet been reviewed by the DCC approver or evaluated by the DRC.

DCC Approved

When an asset has been approved by a DCC approver (appointed by each DCC), the status of the asset is updated to 'DCC Approved', which is indicated by an icon under the 'DCC Status' column on the Uploaded Assets page.

DRC Approved

When an asset has been approved by an appointed DRC approver, the status of the asset is updated to 'DRC Approved', which is indicated by an icon under the 'DRC Status' column on the Uploaded Assets page. Please note that the DCC and DRC approval statuses are independent of each other.

Current

An asset tagged with the current icon under the 'Current' column on the Uploaded Assets page is considered the current version of that file type for a given DCC.

Archived

An asset tagged with the archived icon under the 'Current' column on the Uploaded Assets page is considered an archived version of that asset type. Please note that both DCC and DRC approvers can change the current status of an asset.

As a Common Fund Data Coordinating Center (DCC), you have three role options for users of the submission system:

User

This is a general user of the platform who cannot upload, approve, or view non-public files. You can have as many users in this role as you want.

Uploader

Can submit data packages but cannot approve data packages or files. Uploaders can see the files that they submitted for their DCC. You can have as many users in this role as you want.

Approver

Can submit new packages and approve submitted packages. You can have as many users in this role as you want.
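
The three roles above can be summarized as a small permission table. This is only a paraphrase of the role descriptions on this page, not the portal's actual access-control code; the `Role` enum, the action names, and the `can` helper are all hypothetical.

```python
from enum import Enum


class Role(Enum):
    USER = "User"
    UPLOADER = "Uploader"
    APPROVER = "Approver"


# Permissions paraphrased from the role descriptions above.
PERMISSIONS = {
    Role.USER: frozenset(),
    Role.UPLOADER: frozenset({"submit"}),
    Role.APPROVER: frozenset({"submit", "approve"}),
}


def can(role, action):
    """Return True if the role is described as allowed to perform the action."""
    return action in PERMISSIONS[role]


print(can(Role.UPLOADER, "approve"))  # prints False
print(can(Role.APPROVER, "approve"))  # prints True
```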

Any given person in your DCC can have only one role. To give a member of your DCC Approver or Uploader privileges, contact the DRC via email with the following information about the member:

    Name
    Email
    Role
    DCC

Please also indicate if the user has previously logged into the portal (has a user account) or has never accessed the portal (is a new user).

File Upload Steps

  • Go to the Data and Metadata Upload Form, or click on the "Submit" tab in the navigation bar or in the footer and then click on the "Submit and Manage File/Code Assets" button.
  • On the Upload Form page, upload your processed data by either dragging and dropping it in the upload box, or clicking in the box or on the "Choose File" button.
  • The file you have selected should appear under “File to Upload”. If you select a wrong file, you can delete it by clicking on the delete icon next to the file name or by re-uploading the correct file.
  • Select the DCC that the files to upload were generated from. Only DCCs that you are affiliated with will be provided as an option in the dropdown menu. If you are affiliated with a DCC and the option is not provided, please contact the DRC to update this information.
  • Select the file asset type that you wish to upload the file as and click on the "Submit Form" button.
  • Unexpected File type: There are file extensions that are expected for each file asset type. If the extension of the selected file does not match one of the expected extensions based on the entered File Asset Type, a dialog box will appear requesting you to confirm your upload of this unexpected file type. If the unexpected file type is intentional, click on the 'Yes Continue' button to proceed with the upload, otherwise click 'No' to cancel the upload.
    The recommended extensions for each file asset type are:
      C2M2: .zip
      KG Assertions: .zip
      Attribute Table: .h5 or .hdf5
      XMT: .(x)mt, e.g. .gmt or .dmt
  • If an upload is successful, a green banner with “Success! File Uploaded” should appear. If an upload is unsuccessful, a red banner will appear with an error message giving the reason for the upload error. Ensure that the file you have selected for upload has a .csv, .txt, .zip or .(x)mt file extension and is not larger than 5GB.
  • Details of your uploaded file should appear on the Uploaded Assets page.
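
The extension and size checks described in the steps above can be approximated locally before uploading. The sketch below is not the portal's validation code. It reads ".(x)mt" as any single letter followed by "mt" and treats "5GB" as 5 GiB; both are assumptions, as are the helper names.

```python
import re

# Recommended extensions per file asset type, from the list above.
# ".(x)mt" is read here as any single letter followed by "mt" (an assumption).
RECOMMENDED = {
    "C2M2": r"\.zip",
    "KG Assertions": r"\.zip",
    "Attribute Table": r"\.(h5|hdf5)",
    "XMT": r"\.[a-z]mt",
}
MAX_BYTES = 5 * 1024**3  # assumes "5GB" means 5 GiB; the exact cutoff is not stated


def extension_is_recommended(filename, asset_type):
    """Check the file name's ending against the asset type's pattern."""
    pattern = r".*" + RECOMMENDED[asset_type]
    return re.fullmatch(pattern, filename, flags=re.IGNORECASE) is not None


def precheck(filename, asset_type, size_bytes):
    """Return a list of warnings to resolve before submitting."""
    warnings = []
    if not extension_is_recommended(filename, asset_type):
        warnings.append(f"{filename}: unexpected extension for {asset_type}")
    if size_bytes > MAX_BYTES:
        warnings.append(f"{filename}: larger than 5GB")
    return warnings


print(precheck("sets.gmt", "XMT", 1_000))        # prints []
print(precheck("sets.csv", "XMT", 6 * 1024**3))  # two warnings
```

An extension warning corresponds to the confirmation dialog described above and can be intentional; a size warning indicates the upload would be rejected.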

File Integrity Validation

A checksum is a digital fingerprint computed from a sequence of bytes, otherwise known as a bitstream, e.g. the contents of a file. Like a fingerprint, a checksum is effectively unique to the bitstream: any change to the bitstream, however big or small, causes the value of its checksum to change completely. Checksums can therefore be used to detect changes to the contents of a file that occur during upload or download. During file submission on the site, file integrity is verified using the SHA256 checksum algorithm. A checksum is calculated browser-side from the file the user uploads and compared to the checksum calculated from the file received by the AWS S3 bucket. If the two values are the same, the file was not changed or corrupted during upload and the upload succeeds. If the values differ, the system throws an error.

  • The checksum of a successfully uploaded file is displayed on the Uploaded Assets page under the File Info dropdown of each file.
  • To verify file integrity after downloading a file from the portal:
    • Download the intended file
    • Calculate the checksum in your terminal:
      For Windows:
      certutil -hashfile [file location] SHA256

      For Linux:
      sha256sum [file location]

      For macOS:
      shasum -a 256 [file location]
    • If the returned string is the same as the one displayed for the file on the portal, the file contents were not changed during download.
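
As a cross-platform alternative to the terminal commands above, the same SHA256 value can be computed with Python's standard library. This is a minimal sketch: the helper names are hypothetical, and it assumes the portal displays the checksum as a hexadecimal string, as the terminal commands above produce.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_portal_checksum(path, expected_hex):
    """Compare case-insensitively, since tools differ in hex letter case."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Self-contained demonstration using a temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
    demo_path = tmp.name

demo_digest = sha256_of_file(demo_path)
demo_ok = matches_portal_checksum(demo_path, demo_digest.upper())
os.remove(demo_path)
print(demo_digest, demo_ok)
```

Reading in chunks keeps memory use flat, which matters for files approaching the upload size limit. In practice, pass the downloaded file's path and the checksum copied from the File Info dropdown to `matches_portal_checksum`.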

Code Asset Submission Steps

  1. Go to the Code Assets Upload Form page, or click on the "Submit" tab in the navigation bar or in the footer and then click on the "Submit and Manage File/Code Assets" button on the Submit page.
  2. On the Code Assets Upload Form, fill out all the fields:
    • Select the DCC with which the asset is affiliated.
    • Select the code asset type you wish to submit from the available options: ETL, API, PWB Metanode, Entity Page Template, Chatbot Specifications and Apps URL. If submitting an API asset, please see the API Code Asset Submission Steps section.
    • Enter the URL for the code asset in the URL field. Only valid HTTPS URLs are accepted.
  3. After clicking on the “Submit Form” button:
    • If an upload is successful, a green banner with “Success! Code Asset Uploaded” should appear.
    • If an upload is unsuccessful, a red banner with an error message will appear.
  4. Details of your uploaded code asset should appear on the Uploaded Assets page.
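
Since only valid HTTPS URLs are accepted, a quick local check can catch a rejected URL before the form is submitted. The sketch below is one plausible reading of "valid HTTPS URL" (an `https` scheme plus a host) and may be looser or stricter than the portal's own validation; `is_https_url` and the sample URLs are hypothetical.

```python
from urllib.parse import urlparse


def is_https_url(url):
    """Return True if the URL uses the https scheme and names a host."""
    parts = urlparse(url.strip())
    return parts.scheme == "https" and bool(parts.netloc)


print(is_https_url("https://example.org/dcc/etl"))  # prints True
print(is_https_url("http://example.org/dcc/etl"))   # prints False
```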

API Code Asset Submission Steps

  1. Follow Steps 1-3 of the Code Asset Submission Steps section.
  2. Enter the URL of the page that documents the DCC APIs.
    • If the API documentation meets OpenAPI specifications, check the OpenAPI Specifications box.
    • If the API documentation is deposited in the SmartAPI registry, check the Deposited in SmartAPI box and enter the SmartAPI URL (the link to the API's page on the SmartAPI website) in the provided textbox.

Asset Upload Submission Troubleshooting/FAQ:

  1. Before uploading, ensure that all your account information has been entered and is accurate on the My Account page.
    • If your email is missing, please fill it in and click 'Save Changes' or press the Enter key.
    • If you do not have any DCCs associated with your account, please contact the DRC to update your information.
    • If a DCC that you are affiliated with (and are an uploader for) is not listed as one of your DCC options, please contact the DRC by email to update your information.
    • If your role is inaccurate, please contact the DRC to update your information.
  2. If you are meant to be an Uploader or Approver for your DCC and see “Access Denied” on the Code Assets Upload Form and Uploaded Assets pages, please contact the DRC by email to be granted access.
  3. If a mistake has been made in a submission, go to the Uploaded Assets page, delete the incorrectly submitted asset by clicking on the delete icon on the row of the given file and reupload the corrected file.

This section is for DCC and DRC Approvers Only

  1. Go to the Uploaded Assets page, or click on the “Submit” tab in the navigation bar or in the footer, click on the "Submit and Manage File/Code Assets" button on the Submit page, and then select the Uploaded Assets tab.
  2. Here you will find all uploaded assets that fall under your jurisdiction.
    • For DCC Approvers, these are all assets that have been uploaded or submitted for your DCC.
    • For DRC Approvers, these are all assets that have been uploaded/submitted by uploaders across all DCCs.
  3. All unapproved assets that you are authorized to approve will be marked by the “Approve Upload” button under the DCC status or DRC status columns for DCC and DRC Approvers respectively.
  4. To approve an asset, click on its “Approve Upload” button.
  5. To remove the approved status of an asset, click on the button under the DCC/DRC status column. This reverses the approval action.
  6. Similar steps are done to set an asset as the most current version.
    • To toggle an asset between Current and Archived, click on the button under the Current column. Please note that:
      • Multiple assets of the same asset type can be set as current for a DCC.
      • DCC and DRC approvers are authorized to change the current status of assets for affiliated DCCs/all DCCs respectively.

Troubleshooting/FAQ:

  1. If you are to be a DCC or DRC Approver and have “Access Denied” on the Uploaded Assets page, please contact the DRC through email to change your role and grant you access.
  2. If a DCC that you are affiliated with is not listed as one of your DCC options on the My Account page, please contact the DRC through email to update your information. You will not be allowed to approve uploaded files for this DCC otherwise.

Both Uploaders and Approvers can delete uploaded assets.

  1. On the Uploaded Assets page, click on the delete icon next to the asset you wish to delete.
  2. A pop up will appear verifying your decision to delete the given asset.
  3. Click on "Yes, Delete" to confirm the deletion of the asset. Please note that the delete operation is permanent.
  4. For DCC and DRC approvers: If a current asset is deleted, please set the next most up-to-date asset of that type for the DCC as current.

This section is for Admin Users Only

Create a User:

  1. Go to the Admin page and click on the "Create New User" button.
  2. Fill out the new user's information and click the “Create User” button. If successful, a banner with “User Creation Successful” should appear.

Update User Information

  1. Go to the Admin page and select the users whose information is to be updated.
  2. In the dialog box that appears, for each user, select their new role and DCC information and click “Update”. An alert with “User Information Updated” will appear if the update operation is successful.
  3. When the information for all selected users has been updated, click on “Done” or outside the dialog box.

Delete Users

  1. Go to the Admin page and select the users to delete.
  2. Click on the “Delete Users” button to delete selected users. Please note that the delete operation is irreversible.