Gene Product Association Data (GPAD) file description

The Gene Product Association Data (GPAD) file contains annotation data provided by the Gene Ontology Consortium in a standardized tab-delimited text format. Each line in the file represents an association between a gene product and a GO term, with an evidence code, a reference to support the association, and other data associated with the gene product or the annotation. This page is a summary of the GPAD 2.0 format; for full technical details and a summary of changes from previous GPAD formats, see the GitHub specification page. Note that the GPAD file must be submitted together with the corresponding GPI file of the same format version.

GO also provides annotations as GAF files and recommends use of the GAF format for most use cases.

For general information on GO annotations, please see the introduction to GO annotation page.

GPAD 2.0 format description

File Header

Mandatory elements of the GPAD 2.0 file header

!gpad-version: 2.0
!generated-by: database (must be listed in dbxrefs.yaml)
!date-generated: YYYY-MM-DD or YYYY-MM-DDTHH:MM

Other header elements may be included, such as links to the submitter's project page, funding sources, ontology versions, etc.

! URL: e.g. http://www.yeastgenome.org/
! Project-release: e.g. WS275
! Funding: e.g. NHGRI
! Columns: file format written out
! go-version: PURL
! ro-version: PURL
! gorel-version: PURL
! eco-version: PURL
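
As an illustration of the header layout above, the following minimal Python sketch collects the leading `!` lines into a dictionary and reports which mandatory elements are absent. The function names (`parse_header`, `missing_mandatory`) are hypothetical, and the sketch assumes every header line has the form `!key: value`, with optional whitespace after the `!`; it does not check `generated-by` against dbxrefs.yaml.

```python
# Mandatory GPAD 2.0 header keys, as listed above.
MANDATORY_HEADERS = ("gpad-version", "generated-by", "date-generated")


def parse_header(lines):
    """Collect leading '!key: value' lines into a dict with lower-cased keys."""
    header = {}
    for line in lines:
        if not line.startswith("!"):
            break  # the first non-header line ends the header
        # Split on the first colon only, so URLs in the value stay intact.
        key, sep, value = line[1:].strip().partition(":")
        if sep:
            header[key.strip().lower()] = value.strip()
    return header


def missing_mandatory(header):
    """Return the mandatory keys that are not present in the parsed header."""
    return [key for key in MANDATORY_HEADERS if key not in header]


example = [
    "!gpad-version: 2.0",
    "!generated-by: SGD",
    "!date-generated: 2018-01-19",
    "! URL: http://www.yeastgenome.org/",
]
header = parse_header(example)
print(header["gpad-version"], missing_mandatory(header))
```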

GPAD file fields

The GPAD format comprises 12 tab-delimited fields. Some fields are mandatory and others are optional; cardinality varies by field and, for some fields, by other conditions. For fields that permit multiple values, values should be separated by pipes (|) for OR statements and commas (,) for AND statements.

Column  Content                Required?  Cardinality   Example
1       DB:DB_Object_ID        required   1             SGD:S000002164
2       Negation               optional   0 or 1        NOT
3       Relation               required   1             RO:0002331
4       GO ID                  required   1             GO:0043409
5       Reference              required   1 or greater  PMID:26546002
6       Evidence Code          required   1             ECO:0000316
7       With [or] From         optional   0 or greater  SGD:S000003631
8       Interacting taxon ID   optional   0 or greater  NCBITaxon:5476
9       Date                   required   1             2018-01-19
10      Assigned by            required   1             SGD
11      Annotation Extension   optional   0 or greater  RO:0002233(UniProtKB:Q00772),BFO:0000050(GO:0071852)
12      Annotation Properties  optional   0 or greater  noctua-model-id=gomodel:6086f4f200000223|model-state=production|contributor=orcid:0000-0003-3212-6364
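
To make the 12-column layout concrete, here is a minimal Python sketch that splits one annotation line into named fields. The class name `GpadRow`, its attribute names, and the function `parse_row` are hypothetical; the sketch assumes that empty optional columns are still present as empty strings between tabs.

```python
from typing import NamedTuple


class GpadRow(NamedTuple):
    """One GPAD 2.0 annotation line; attribute order follows columns 1-12."""
    db_object_id: str
    negation: str
    relation: str
    go_id: str
    reference: str
    evidence_code: str
    with_from: str
    interacting_taxon: str
    date: str
    assigned_by: str
    annotation_extension: str
    annotation_properties: str


def parse_row(line):
    """Split a tab-delimited line into a GpadRow, keeping empty fields."""
    # Strip only the line terminator so trailing empty columns survive.
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 12:
        raise ValueError(f"expected 12 tab-delimited fields, got {len(fields)}")
    return GpadRow(*fields)


# Example line assembled from the example values in the table above.
line = "\t".join([
    "SGD:S000002164", "", "RO:0002331", "GO:0043409", "PMID:26546002",
    "ECO:0000316", "SGD:S000003631", "", "2018-01-19", "SGD", "", "",
]) + "\n"
row = parse_row(line)
print(row.db_object_id, row.go_id, row.evidence_code)
```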

Definitions and requirements for field contents

1. DB Object ID

A unique identifier for the item being annotated. The DB prefix is the database from which the DB Object ID is drawn and must be one of the values from the set of GO database cross-references. The DB:DB Object ID is the combined identifier for the database object. The DB is not necessarily the same as the group submitting the file, which is named in column 10 Assigned by. Examples:

  • UniProtKB:P99999
  • SGD:S000002164
  • MGI:MGI:1919306

The identifier usually references the canonical form of a gene or gene product, including functional RNAs. Identifiers may also describe gene variants or distinct proteins produced by differential splicing, alternative translational starts, post-translational cleavage or post-translational modification. If the identifier does not refer to a canonical gene or gene product, the Gene Product Information (GPI) file should contain information about the canonical form of the gene or gene product.

  • Cardinality = 1
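
Because the local identifier can itself contain a colon (as in MGI:MGI:1919306), the DB prefix should be separated at the first colon only. A minimal Python sketch, with the hypothetical function name `split_object_id`; it does not check the prefix against the GO database cross-references:

```python
def split_object_id(value):
    """Split 'DB:DB_Object_ID' at the first colon into (db, object_id)."""
    db, sep, object_id = value.partition(":")
    if not sep or not db or not object_id:
        raise ValueError(f"not a DB:DB_Object_ID identifier: {value!r}")
    return db, object_id


for example in ("UniProtKB:P99999", "SGD:S000002164", "MGI:MGI:1919306"):
    print(split_object_id(example))
```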

2. Negation

Negation is indicated by the ‘NOT’ value.

  • Cardinality = 0 or 1

3. Relation

This column is populated with relations from the Relation Ontology that describe how the annotated biological entity relates to the GO term with which it is associated. See also the documentation on qualifiers in the GO annotation guide.

  • Cardinality = 1

4. GO ID

The GO identifier for the term attributed to the DB object ID. Must be in the format GO:GOID.

  • Cardinality = 1
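
A simple syntactic check for this column can be sketched in Python as follows. The function name `is_go_id` is hypothetical, and the check relies on GO identifiers consisting of the GO: prefix followed by seven digits; it does not verify that the term exists in the ontology.

```python
import re

# 'GO:' followed by exactly seven ASCII digits, e.g. GO:0043409.
_GO_ID_RE = re.compile(r"GO:[0-9]{7}")


def is_go_id(value):
    """Return True if value is syntactically a GO identifier."""
    return _GO_ID_RE.fullmatch(value) is not None


print(is_go_id("GO:0043409"))
```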

5. Reference

One or more unique identifiers for a single source cited as an authority for the attribution of the GO ID to the DB object ID. This may be a literature reference or a database record. Valid references are one of: PubMed, DOI, GO_REF, MOD reference. The syntax is DB:accession, e.g. PMID:2676709, SGD_REF:S000047763.

Only one reference can be cited on a single line in the gene association file. If a reference has identifiers in more than one database, multiple identifiers for that reference can be included on a single line. For example, if the reference is a published paper that has a PubMed ID, the PubMed ID must be included; if the model organism database has its own identifier for the reference, that can also be included (e.g. PMID:2676709|SGD_REF:S000047763).

  • Cardinality = 1 or >1
  • For cardinality >1, values must be pipe-separated.
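
The DB:accession syntax and the pipe separator can be handled with a sketch like the one below. The function name `parse_references` is hypothetical; the code checks only the syntax, not whether the prefix is a valid reference database or whether a PubMed ID that should be present is missing.

```python
def parse_references(field):
    """Split a Reference field into (db, accession) pairs; at least one is required."""
    pairs = []
    for identifier in field.split("|"):
        db, sep, accession = identifier.partition(":")
        if not sep or not db or not accession:
            raise ValueError(f"not a DB:accession reference: {identifier!r}")
        pairs.append((db, accession))
    return pairs


print(parse_references("PMID:2676709|SGD_REF:S000047763"))
```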

6. Evidence code

One of the codes from the Evidence & Conclusion Ontology, ECO. See the wiki linked from our evidence code documentation for more information.

  • Cardinality = 1

7. With [or] From

This field is used with specific ECO codes to capture an additional identifier supporting the evidence for the annotation. For example, it can identify another gene product to which the annotated gene product is similar (ISS) or with which it interacts (IPI). Populating the With/From field is mandatory for certain evidence codes; see the documentation for the individual evidence codes for more information.

  • Cardinality 0, 1, >1 with the following rules:
    • Cardinality must be 0 for evidence codes IDA, TAS, NAS, or ND.
    • Cardinality must be 1 or >1 for IEA, IC, IGI, IPI, ISS & child terms of ISS.
    • For cardinality >1, pipes or commas may be used: a pipe separates independent evidence (e.g. FB:FBgn1111111|FB:FBgn2222222), whereas a comma indicates grouped evidence, e.g. two of three genes in a triply mutant organism.
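
The pipe and comma rules can be sketched in Python as a two-level split: the outer list holds independent pieces of evidence, and each inner list holds identifiers that are grouped together. The function name `parse_with_from` is hypothetical, the second example value is invented for illustration, and the sketch does not enforce the evidence-code-specific cardinality rules.

```python
def parse_with_from(field):
    """Return a list of evidence groups; each group is a list of identifiers."""
    if not field:
        return []  # cardinality 0: the column is empty
    # Pipes separate independent evidence; commas join identifiers in one group.
    return [group.split(",") for group in field.split("|")]


print(parse_with_from("FB:FBgn1111111|FB:FBgn2222222"))
# Hypothetical value mixing both separators:
print(parse_with_from("FB:FBgn1111111,FB:FBgn2222222|FB:FBgn3333333"))
```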

8. Interacting taxon ID

Taxonomic identifier for the interacting organism, to be used only in conjunction with terms that have the biological process term ‘GO:0044419 biological process involved in interspecies interaction between organisms’ or the cellular component term ‘GO:0018995 host cellular component’ as an ancestor. Identifiers must come from the NCBI Taxonomy database and have the NCBITaxon: prefix.

  • Cardinality = 0 or 1
  • For cardinality >1, values must be pipe-separated

9. Date

Date on which the annotation was made; format is YYYY-MM-DD. Conforms to the date portion of ISO 8601.

  • Cardinality = 1
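
A strict check of the YYYY-MM-DD format might look like the following Python sketch. The function name `parse_annotation_date` is hypothetical. A regular expression is used in front of the `datetime.date` constructor because the aim here is to accept only the zero-padded, hyphenated form.

```python
import re
from datetime import date

# Four-digit year, two-digit month, two-digit day, hyphen-separated.
_DATE_RE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def parse_annotation_date(value):
    """Parse a YYYY-MM-DD string into a datetime.date, or raise ValueError."""
    match = _DATE_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"date is not in YYYY-MM-DD format: {value!r}")
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)  # raises ValueError for impossible dates


print(parse_annotation_date("2018-01-19"))
```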

10. Assigned By

The database that made the annotation; must be one of the values from the set of GOC groups. This field is used for tracking the source of an individual annotation. The value may differ from the DB in the DB:DB Object ID column: any annotation that is made by one database and incorporated into another retains the original value.

  • Cardinality = 1

11. Annotation Extension

Annotation extensions allow GO terms in standard annotations to be further specified, using gene products, chemicals, cell types, anatomical structures, to provide additional biological context. The cross-reference is prefaced by an appropriate relationship from the Relation Ontology. Multiple extensions may be entered.

  • Cardinality = 0, 1, >1
  • For cardinality > 1, a pipe (|) specifies an independent statement (OR) and is equivalent to making separate annotations, i.e. not all conditions are required to infer the annotated GO term. A comma (,) specifies a connected statement (AND) and indicates that all conditions are required to infer the annotated GO term. Because ‘OR’ is a weaker statement than ‘AND’, it will be correct in all cases. Pipe and comma separators may be used together in the same annotation extension field.
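
The OR/AND structure of this field can be sketched in Python as a list of alternatives, each of which is a list of (relation, target) pairs that must all hold. The function name `parse_extensions` is hypothetical, and the sketch assumes that each extension has the form relation(target) and that targets contain no commas, pipes or parentheses.

```python
import re

# relation(target), e.g. RO:0002233(UniProtKB:Q00772)
_EXTENSION_RE = re.compile(r"([^()|,]+)\(([^()|,]+)\)")


def parse_extensions(field):
    """Return OR-alternatives, each a list of AND-connected (relation, target) pairs."""
    if not field:
        return []
    alternatives = []
    for or_part in field.split("|"):        # pipe: independent statements (OR)
        connected = []
        for and_part in or_part.split(","):  # comma: connected statements (AND)
            match = _EXTENSION_RE.fullmatch(and_part)
            if match is None:
                raise ValueError(f"malformed annotation extension: {and_part!r}")
            connected.append((match.group(1), match.group(2)))
        alternatives.append(connected)
    return alternatives


print(parse_extensions("RO:0002233(UniProtKB:Q00772),BFO:0000050(GO:0071852)"))
```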

12. Annotation Properties

The Annotation Properties column contains a list of “property_name=property_value” pairs. If a property is present, it is single-valued. Annotation properties include GO-CAM information and comments on annotations.

Example: id=GOA:2113861687|noctua-model-id=gomodel:6086f4f200000223|model-state=production|creation-date=2019-07-20T12:04:08
  • Cardinality = 0 or 1
  • For cardinality > 1, values must be pipe-separated
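
A minimal Python sketch for reading this column into a dictionary is shown below. The function name `parse_properties` is hypothetical. It assumes pipe-separated name=value pairs and, following the single-valued rule above, treats a repeated property name as an error; a more lenient reader could collect repeated values into a list instead.

```python
def parse_properties(field):
    """Parse 'name=value|name=value' into a dict; repeated names are rejected."""
    properties = {}
    if not field:
        return properties
    for pair in field.split("|"):
        # Split on the first '=' only, so values may themselves contain '='.
        name, sep, value = pair.partition("=")
        if not sep or not name:
            raise ValueError(f"malformed annotation property: {pair!r}")
        if name in properties:
            raise ValueError(f"property given more than once: {name!r}")
        properties[name] = value
    return properties


example = (
    "noctua-model-id=gomodel:6086f4f200000223"
    "|model-state=production"
    "|contributor=orcid:0000-0003-3212-6364"
)
print(parse_properties(example)["model-state"])
```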