Page contents
Introduction to GO annotations
GO annotations come in two flavors: standard GO annotations and GO-CAM Models.
Standard GO annotations
- A standard GO annotation is a statement that links a gene product and a GO term via a relation from the Relations Ontology (RO). In standard GO annotations, each statement is independent; this is a key difference between standard annotations and GO-CAMs.
- A standard GO annotation minimally contains:
- a gene product; may be a protein, an miRNA, a tRNA, etc.
- a GO term
- a reference, usually a PMID, but DOIs and GO Reference (GO_REF) are also used.
- an evidence code, using a GO Evidence Code, which describes the type of evidence: experimental evidence, sequence similarity or phylogenetic relation, as well as whether the evidence was reviewed by an expert biocurator. If not manually reviewed, the annotation is described as ‘automated’.
- Any number of annotations can be made to a gene, in order to fully describe its function and the location(s) in which it acts.
- Annotation extensions are used to provide additional biological context to a GO term, using a relation from the Relations Ontology (RO) and a term from GO or an external ontology, e.g. UBERON. Further description of annotation extensions can be found in tha paper of Huntley & Lovering 2017 and Huntley et al. 2014.
Semantics of a standard GO annotation
- Associations of gene products to GO terms are statements that describe:
- Molecular Function: the normal molecular activity of a gene product; mutants and roles in disease are outside the scope for GO
- Biological Process: the pathways and larger processes to which the gene product’s activity contributes
- Cellular Component: where the gene product is located when the activity occurs
- By the transitivity principle, an annotation to a GO term implies annotation to all its is a and part of parents (also known as superclasses). Details about the ontology structure can be found on the Gene Ontology overview page.
- GO annotations are meant to reflect the most up-to-date understanding of a gene product’s role. As biological knowledge advances, annotations for a particular gene product may be updated to align with new insights or adjustments in the ontology.
- GO adopts an open-world model, meaning that the absence of an annotation for a specific class does not imply that the gene product lacks that function, is not localized to that cellular component, or is uninvolved in that biological process. Moreover, if a gene product is unannotated, it does not mean that its role is unknown. Genes for which no role has been demonstrated are annotated to the root term (molecular_function, biological_process, cellular_component) with the evidence code ND (No Biological Data available).
Gene product to term relations relations
- Gene product to term (‘gp2term’) relations link gene products to GO terms in standard annotations via a relation from the Relations Ontology (RO)..
- Any of the relations can be associated with the modifer NOT. For the full list of permitted gp2term relations, see the GO wiki. The most common relations are:
Relations between a gene product and a Molecular Function:
- enables links a gene product to a Molecular Function it executes.
- contributes to links a gene product to a Molecular Function executed by a macromolecular complex, in which the Molecular Function cannot be ascribed to an individual subunit of that complex. Only the complex subunits required for the Molecular Function are annotated to the Molecular Function term with ‘contributes to’.
Relations between a gene product and a Biological Process:
- involved in links a gene product and a Biological Process in which the gene product’s Molecular Function plays an integral role.
- acts upstream of or within links a gene product and a Biological Process when the mechanism relating the gene product’s activity to the Biological Process is not known.
Relations between a gene product and a Cellular Component:
- is active in links a gene product to the cellular location in which it enables its Molecular Function.
- located in links a gene product and the Cellular Component, specifically a cellular anatomical anatomy or virion component, in which a gene product has been detected.
- part of links a gene product and a protein-containing complex.
The NOT modifier
The NOT statement indicates that the gene product does NOT enable a Molecular Function, is not part of a Biological Process or is not located in or active in a specific Cellular Component. NOT statements are only used when a user might expect that the gene product would have a specific biological property (MF, BP or CC). NOT makes an explicit statement that a gene product has been experimentally demonstrated not to be able to carry out a particular activity or sequence analysis shows loss of an essential active site or rapid divergence after a duplication event over the course of evolution. The NOT modifier is not used for negative or inconclusive experimental results.
Both positive and NOT statements can be used between a single gene product and a GO term when there is unresolved conflicting experimental findings in the literature. If an isoform has a different function from the main isoform represented by the gene-centric entity, a NOT annotation can be captured together with the isoform identifier.
Contrary to positive annotations, NOT statements propagate down the ontology to more specific terms, such that the annotation gene product *NOT | enables* protein kinase activity means that the gene product does not enable protein kinase activity` and neither does it enable more specific functions, such as protein serine/threonine kinase activity and protein tyrosine kinase activity. |
GO-Causal Activity Models
- GO-Causal Activity Models (GO-CAMs) provide a system to extend GO annotations with biological context as well as causal connections between activities.
- The network representation of GO-CAMs enables pathway visualization and analysis.
- The biological context captured for each GO aspects corresponds to:
- Molecular Function: substrates (“input”), products (“output”), activators, inhibitors
- Biological Process: the broader process that the molecular function helps accomplish: cell cycle transition, transcription, signaling pathways, etc. Processes can be nested, i. e., a biological process can be part of another biological process. For example, a signaling pathway can be part of a developmental process, like the Wnt signaling pathway is part of dorsal/ventral pattern formation.
- Cellular Component: the cellular component, cell, and/or tissue the function/process take place.
- See the [GO-CAM example](http://model.geneontology.org/ for an illustration.
GO-CAM data model
The primary unit of biological modeling in GO-CAM is a molecular activity, e.g. protein kinase activity, of a specific gene product or complex. A molecular activity is an activity carried out at the molecular level by a gene product; this is specified by a term from the GO MF ontology. GO-CAM models are thus connections of GO MF annotations enriched by providing the appropriate context in which that function occurs. All connections in a GO-CAM model, e.g. between a gene product and activity, two activities, or an activity and additional contextual information, are made using clearly defined semantic relations from the Relations Ontology. A more complete dscription of GO-CAM can be found in Thomas & al 2019.
Browsing and visualizing GO-CAMs
GO-CAMs can be browsed and visualized at http://geneontology.org/go-cam.
Annotation Quality Control
The GOC runs a number of GO rules to ensure data integrity (for example, identifiers must be valid) that the annotations meet the GO curation guidelines.
GO as a dynamic source of biological knowledge
GO aims to represent the current state of knowledge in biology, hence it is constantly revised and expanded as biological knowledge accumulates. With the ever-increasing number of published articles, experiments and methods, covering all biology with the latest annotations is always challenging. We therefore invite researchers and computational scientists to submit requests for missing, erroneous or out-of-date annotations to improve the GO database.
Statistics
Overall GO statistics and detailed statistics are available. The statistics are also achived.
GO annotation files
- Gene association file (GAF) 2.2
-
Gene Product Association Data (GPAD) 2.0 files + Gene Product Information (GPI) 2.0 files: companion files
- (Deprecated) Gene association file (GAF) 2.1
- (Deprecated) Gene association file (GAF) 2.0
- (Currently being deprecated) Gene Product Association Data (GPAD) 1.1/1.2 files + Gene Product Information (GPI) 1.2 files: companion files
Downloads
- Download GO annotations by species
- Download GO-CAM models