File Types
XMT
XMT files are text-based files that contain a collection of sets of a given entity type. The 'X' in XMT stands for the entity type that the sets contain. For example, .gmt files are XMT files that contain a collection of gene sets, while .dmt files are XMT files that contain a collection of drug sets. On each row of an XMT file, the first column contains the term associated with the set, and all remaining columns contain the entities in that set.
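The row layout described above can be sketched as a small parser. This is a minimal illustration only: the function name parse_xmt is hypothetical, and the tab delimiter is an assumption (the text above only says "columns"). Empty cells are dropped so that files with blank columns still parse.

```python
from typing import Dict, List


def parse_xmt(text: str, delimiter: str = "\t") -> Dict[str, List[str]]:
    """Parse XMT-formatted text into a mapping of term -> set entities.

    Assumes one set per row, with the term in the first column and the
    set entities in the remaining columns (tab-delimited by assumption).
    """
    sets: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        columns = line.split(delimiter)
        term = columns[0].strip()
        # Drop empty cells, e.g. blank columns or trailing delimiters.
        entities = [cell.strip() for cell in columns[1:] if cell.strip()]
        sets[term] = entities
    return sets


# Made-up example content, not taken from any real .gmt file.
example = "Term A\tGENE1\tGENE2\nTerm B\tGENE3\n"
print(parse_xmt(example))
```

Returning a dictionary keyed by term makes it easy to check for duplicate terms or empty sets before submitting a file.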
C2M2
The Crosscut Metadata Model (C2M2) is a collection of files encoded in the Frictionless Data Package format. The collection is a zipped set of TSV files containing metadata standardized to a set of known ontologies. Please explore the C2M2 technical wiki for more information about how to prepare your metadata as C2M2-compatible files. Please also see the C2M2 section of the Documentation page of the CFDE Workbench portal for how to create C2M2 files.
KG Assertions
A knowledge graph is a network that illustrates the relationships between different entities, which may come from different datasets. A knowledge graph consists of three main components: nodes, edges, and labels. Nodes are the entities represented in the knowledge graph, e.g. Gene Ontology (GO) terms. Edges characterize the relationship between nodes, e.g. co-expressed with. Knowledge graph assertions are files that contain information about the nodes and edges that could be used to create a knowledge graph. For example, a KG Assertions file for nodes would contain columns that define information about each node: id, label, ontology_label. A KG Assertions file for edges would contain columns that comprise the necessary information about each edge: its source and target nodes, the labels for these nodes, and their relationship.
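As a rough sketch of the node and edge files described above, the snippet below writes one table of each kind and checks that every edge refers to a known node. The node columns (id, label, ontology_label) come from the text; the edge column names, the CSV layout, and the helper names are assumptions made for illustration, and the rows are made-up placeholders.

```python
import csv
import io

# Node columns as listed in the text above.
NODE_COLUMNS = ["id", "label", "ontology_label"]
# Hypothetical edge column names; the text only says an edge file holds the
# source and target nodes, their labels, and the relationship.
EDGE_COLUMNS = ["source", "target", "source_label", "target_label", "relation"]


def to_csv(rows, columns):
    """Serialize a list of dicts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def dangling_edges(nodes, edges):
    """Return the edges whose source or target id is not among the nodes."""
    known_ids = {node["id"] for node in nodes}
    return [
        edge for edge in edges
        if edge["source"] not in known_ids or edge["target"] not in known_ids
    ]


# Placeholder data for illustration only.
nodes = [
    {"id": "n1", "label": "example gene A", "ontology_label": "Gene"},
    {"id": "n2", "label": "example gene B", "ontology_label": "Gene"},
]
edges = [
    {
        "source": "n1",
        "target": "n2",
        "source_label": "example gene A",
        "target_label": "example gene B",
        "relation": "co-expressed with",
    }
]

print(to_csv(nodes, NODE_COLUMNS))
print(to_csv(edges, EDGE_COLUMNS))
print(dangling_edges(nodes, edges))
```

Checking for dangling edges before upload is a cheap way to catch node and edge files that have drifted out of sync.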
Attribute Table
Attribute tables are files containing tables that describe the relationship between two entity types, with one entity type on the rows (e.g. genes) and another on the columns (e.g. tissue types). The value at the intersection of a given row and column defines the nature of the relationship between the row entity and the column entity, e.g. the qualitative score of similarity between a given gene and a given tissue type.
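The row-by-column layout described above can be illustrated with a short reader. This is a sketch under assumptions: the tab-separated layout, the numeric cell values, the function name read_attribute_table, and the example entities are all hypothetical.

```python
import csv
import io


def read_attribute_table(text, delimiter="\t"):
    """Read a matrix-style table into {row_entity: {column_entity: value}}.

    Assumes the first row holds the column entities and the first column
    holds the row entities; cell values are parsed as floats.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    column_entities = header[1:]  # header[0] is the corner cell
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        row_entity, values = row[0], row[1:]
        table[row_entity] = {
            column: float(value)
            for column, value in zip(column_entities, values)
        }
    return table


# Made-up example: two genes (rows) by two tissue types (columns).
example = "gene\ttissue_1\ttissue_2\nGENE_A\t0.5\t0.0\nGENE_B\t1.0\t0.25\n"
table = read_attribute_table(example)
print(table["GENE_B"]["tissue_2"])  # prints 0.25
```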
Code Asset Types
ETL
Extract, transform, load (ETL) is the process of converting the DCC raw data into various processed data formats such as C2M2, XMT, KG assertions, attribute tables, and database tables. The ETL URL should point to the DCC GitHub repository containing the scripts that the DCC uses to process its data and generate these processed datasets.
Example: LINCS ETL script
API
Each DCC is expected to provide a URL to a page that documents how to access the DCC's data and tools via APIs. APIs should be documented in a standard format; the recommended standard is OpenAPI. It is also recommended to deposit these APIs in the SmartAPI repository.
OpenAPI: The OpenAPI specification provides a formal standard for describing REST APIs. OpenAPI specifications are typically written in YAML or JSON.
SmartAPI: This is a community-based repository for depositing APIs documented in the OpenAPI specification. It features additional metadata elements and value sets to promote the interoperability of RESTful APIs.
Learn more about generating an OpenAPI or SmartAPI specification on the Documentation page.
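To make the shape of an OpenAPI description concrete, the sketch below builds a minimal OpenAPI 3.0 document as a Python dictionary and prints it as JSON. The helper name, the API title, and the /genes path are hypothetical placeholders, not endpoints of any real DCC.

```python
import json


def minimal_openapi_spec(title, version, path, summary):
    """Build a minimal OpenAPI 3.0 document with a single GET operation."""
    return {
        "openapi": "3.0.3",
        # 'info' must carry at least a title and a version.
        "info": {"title": title, "version": version},
        "paths": {
            path: {
                "get": {
                    "summary": summary,
                    # Each response entry needs a description.
                    "responses": {
                        "200": {"description": "Successful response"}
                    },
                }
            }
        },
    }


# Hypothetical API used only for illustration.
spec = minimal_openapi_spec(
    title="Example DCC API",
    version="1.0.0",
    path="/genes",
    summary="List genes",
)
print(json.dumps(spec, indent=2))
```

The same structure can be written by hand in YAML; generating it from code is mainly useful when the specification has to stay in sync with an existing service.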
Example: exRNA OpenAPI link
Playbook Workflow Builder (PWB) Metanodes
A PWB metanode is a workflow engine component implemented by defining the semantic description, TypeScript-constrained type, and functionality of a node in the network of PWB workflows. See the Playbook Partnership documentation and the Documentation page for more information about developing and publishing metanodes. The form requires a GitHub link to a script describing a Playbook metanode.
Example: PWB Metanode created by the Metabolomics DCC
Entity Page Template and Example
The Entity Page Template and Example are links to a URL template for a DCC's entity pages and to a live entity page that follows that template:
Example of a template from GTEx: https://www.gtexportal.org/home/gene/<GENE_NAME>
Example live entity page from GTEx: https://www.gtexportal.org/home/gene/MAPK3
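The relationship between the template and the live page can be sketched as a simple placeholder substitution. The function name fill_entity_template is hypothetical, and treating angle-bracketed names as placeholders is an assumption based on the GTEx example above.

```python
from urllib.parse import quote


def fill_entity_template(template, values):
    """Replace <PLACEHOLDER> markers in a URL template with encoded values."""
    url = template
    for placeholder, value in values.items():
        # Percent-encode the value so it is safe inside a URL path segment.
        url = url.replace(f"<{placeholder}>", quote(value, safe=""))
    return url


template = "https://www.gtexportal.org/home/gene/<GENE_NAME>"
print(fill_entity_template(template, {"GENE_NAME": "MAPK3"}))
# prints https://www.gtexportal.org/home/gene/MAPK3
```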
Chatbot Specifications
The chatbot specifications URL is a link to a manifest file containing metadata and OpenAPI specifications that can be used to develop a chat plugin for large language models. These plugins allow large language models to function as specialized chatbots that have access to the API endpoints described in the manifest file and can call these APIs based on user input. See the ChatGPT plugins documentation for more information on how to develop chatbot specifications.
Example: ai-plugin specs template
Apps URL
An Apps URL is a link to one or more pages that list the bioinformatics tools, workflows, and databases produced by the DCC.
Example: LINCS Apps URL
Asset Approval Status
Not Approved
This is the first stage of approval. All assets that have just been uploaded or submitted by a DCC uploader are first placed in this category. The asset is tagged by the icon on the Uploaded Assets page, which indicates that the file has not yet been reviewed by the DCC approver or evaluated by the DRC.
DCC Approved
When an asset has been approved by a DCC approver (appointed by each DCC), the status of the asset will be updated to 'DCC Approved'. This status is tagged by the icon under the 'DCC Status' column on the Uploaded Assets page.
DRC Approved
When an asset has been approved by an appointed DRC approver, the status of the asset will be updated to 'DRC Approved'. This status is tagged by the icon under the 'DRC Status' column on the Uploaded Assets page. Please note that the DCC and DRC approval statuses are independent of each other.
Current vs Archived Status
Current
An asset tagged by the icon under the 'Current' column on the Uploaded Assets page is considered the current version of that file type for a given DCC.
Archived
An asset tagged by the icon under the 'Current' column on the Uploaded Assets page is considered an archived version of that asset type. Please note that both DCC and DRC approvers can change the current status of an asset.
User Roles
As a Common Fund Data Coordinating Center (DCC), you have three role options for users of the submission system:
User
This is a general user of the platform who cannot upload, approve, or view non-public files. You can have as many users in this role as you want.
Uploader
Can submit data packages and see the files they submitted for their DCC, but cannot approve data packages or files. You can have as many users in this role as you want.
Approver
Can submit new packages and approve a submitted package. You can have as many users in this role as you want.
Any given person in your DCC can only have 1 role. To give a member of your DCC Approver or Uploader privileges, contact the DRC via email with the following information about the member:
Please also indicate if the user has previously logged into the portal (has a user account) or has never accessed the portal (is a new user).
Data and Metadata Upload Form
File Upload Steps
File Integrity Validation
A checksum is a digital fingerprint computed from a sequence of bytes, otherwise known as a bitstream, e.g. the contents of a file. Like a fingerprint, a checksum is effectively unique to its bitstream: any change to the bitstream, however big or small, causes the checksum value to change completely. Checksums can therefore be used to detect changes to the contents of a file that occur during upload or download. During file submission on the site, file integrity is verified using the SHA256 checksum algorithm. A checksum is calculated browser-side from the file a user uploads and compared to the checksum calculated from the file received by the AWS S3 bucket. If the two values are the same, the file was not changed or corrupted during upload and the upload succeeds. If the values differ, the system reports an error.
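To compute the SHA256 checksum of a file on your own machine, use one of the following commands:
Windows: certutil -hashfile [file location] SHA256
Linux: sha256sum [file location]
MacOS: shasum -a 256 [file location]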
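The same SHA256 value that the commands above print can also be computed in Python with the standard hashlib module. This is a minimal sketch of the comparison idea only, not the portal's actual implementation; the function names are hypothetical.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size=1024 * 1024):
    """Return the hex SHA256 digest of a binary stream, read in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of_file(path):
    """Return the hex SHA256 digest of the file at 'path'."""
    with open(path, "rb") as handle:
        return sha256_of_stream(handle)


def checksums_match(local_hex, remote_hex):
    """Compare two hex digests, ignoring case and surrounding whitespace."""
    return local_hex.strip().lower() == remote_hex.strip().lower()


# In-memory demonstration: identical bytes give identical checksums,
# while a one-byte change gives a different checksum.
original = sha256_of_stream(io.BytesIO(b"example file contents"))
received = sha256_of_stream(io.BytesIO(b"example file contents"))
corrupted = sha256_of_stream(io.BytesIO(b"example file content5"))
print(checksums_match(original, received))   # prints True
print(checksums_match(original, corrupted))  # prints False
```

Reading in chunks keeps memory use flat for large files, which matters for multi-gigabyte uploads.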
Code Assets Upload Form
Code Asset Submission Steps
Asset Approval Steps
This section is for DCC and DRC Approvers Only
Delete File or Code Assets
Both Uploaders and Approvers can delete uploaded assets.
Admin User Documentation
This section is for Admin Users Only
Instructional Video