Introduction to GO annotations
A GO annotation is a statement about the function of a particular gene. GO annotations are created by associating a gene or gene product with a GO term. Together, these statements comprise a “snapshot” of current biological knowledge. Hence, GO annotations capture statements about how a gene functions at the molecular level, where in the cell it functions, and what biological processes (pathways, programs) it helps to carry out.
There are four pieces of information that uniquely identify a GO annotation. Although there are additional components a curator can use to indicate more information, including relations and annotation extensions, at the very minimum an annotation consists of:
- Gene product (may be a protein, RNA, etc.)
- GO term
- Reference
- Evidence
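The four minimal components can be pictured as a small record type. The following is only an illustrative sketch: the class name `Annotation`, its field names, and the identifier values (`UniProtKB:P12345`, `PMID:0000000`) are placeholders chosen for this example, not part of any GO software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical minimal record: the four fields that identify an annotation."""
    gene_product: str  # e.g. a protein or RNA identifier
    go_term: str       # GO term identifier
    reference: str     # PMID or GO_REF
    evidence: str      # GO evidence code


# Placeholder values for illustration only.
a = Annotation("UniProtKB:P12345", "GO:0004672", "PMID:0000000", "IDA")
b = Annotation("UniProtKB:P12345", "GO:0004672", "PMID:0000000", "IDA")
c = Annotation("UniProtKB:P12345", "GO:0004672", "PMID:0000000", "ISS")

# Identical in all four fields: treated as the same annotation.
same = a == b
# Differs only in evidence: a distinct annotation.
distinct = a != c
unique_annotations = {a, b, c}
```

Because the dataclass is frozen, instances are hashable and compare by value, so a set of annotations keeps one entry per distinct combination of the four fields.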
Different pieces of knowledge regarding gene function may be established to different degrees, which is why each GO annotation always refers to the evidence upon which it is based. All GO annotations are ultimately supported by the scientific literature, either directly or indirectly. The Reference is almost always a PMID or GO Reference (GO_REF). In GO, the supporting evidence is presented in the form of a GO Evidence Code and either a published reference or a description of the methodology used to create the annotation. The GO evidence codes describe the type of evidence and reflect how far removed the annotated assertion is from direct experimental evidence, and whether this evidence was reviewed by an expert biocurator.
Semantics of a GO annotation
Associations of gene products to GO terms are statements that describe
- Molecular Function: the molecular activities of individual gene products
- Cellular Component: where the gene products are active
- Biological Process: the pathways and larger processes to which that gene product’s activity contributes
General principles of GO annotations
- Annotations represent the normal functions of gene products.
- A gene product can be annotated to zero or more terms from each ontology.
- Each annotation is supported by a GO Evidence Code from the Evidence and Conclusions Ontology and a reference.
- Gene products are annotated to the most granular term in the ontology that is supported by the available evidence.
- By the transitivity principle, an annotation to a GO term implies annotation to all its parents (except for NOT annotations, which propagate down the ontology).
- GO annotations are meant to reflect the most up-to-date view of a gene product’s role in biology.
- Because biological knowledge changes, annotations for a given gene product may change to reflect changes in knowledge and/or changes in the ontology.
- GO annotation follows an open-world assumption: if a gene product is unannotated, its role is still unknown, not known to be absent.
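The transitivity principle for positive annotations can be sketched as a walk up the ontology graph. The `PARENTS` table below is a deliberately simplified toy fragment (the real GO graph has more parents and relation types), and the function names are hypothetical.

```python
# Simplified toy fragment of the Molecular Function hierarchy (child -> parents).
PARENTS = {
    "GO:0004674": ["GO:0004672"],  # protein serine/threonine kinase activity
    "GO:0004713": ["GO:0004672"],  # protein tyrosine kinase activity
    "GO:0004672": ["GO:0016301"],  # protein kinase activity
    "GO:0016301": ["GO:0003824"],  # kinase activity
    "GO:0003824": [],              # catalytic activity
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links from `term`."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def implied_terms(term):
    """A positive annotation to `term` implies annotation to all its ancestors."""
    return {term} | ancestors(term)


implied = implied_terms("GO:0004674")
```

In this toy graph, an annotation to protein serine/threonine kinase activity also implies protein kinase activity, kinase activity, and catalytic activity.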
Annotation relations
A specific set of terms from the Relations Ontology (RO), sometimes referred to as ‘gp2term’ (gene product to term) relations, are used to link gene products to GO terms in standard annotations. The modifier NOT, as well as qualifiers like enables and acts upstream of or within, are used in the GAF format. For the full list of permitted gp2term relations, see the GO wiki. Some of the most common relations are:
The NOT modifier
NOT is used when a GO term is expected to apply to a gene product, but an experiment, sequence analysis, etc. proves otherwise. NOT makes an explicit statement that a gene product has been experimentally demonstrated not to be able to carry out a particular activity or it has been shown to have lost that function (e.g. sequence analysis showing a loss of an active site or rapid divergence after a duplication event) over the course of evolution. The NOT modifier is not to be used for negative or inconclusive experimental results.
Contrary to positive annotations, NOT statements propagate down the ontology, such that the annotation gene product NOT|enables protein kinase activity means that the gene product does not enable protein serine/threonine kinase activity or protein tyrosine kinase activity either. Full guidelines for NOT are on the GO wiki.
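Downward propagation of NOT can be sketched the same way as upward propagation, but following child links instead. The `CHILDREN` table is a toy fragment covering only the terms named in the example, and the function names are hypothetical.

```python
# Toy fragment (parent -> children), limited to the terms in the example.
CHILDREN = {
    "GO:0004672": ["GO:0004674", "GO:0004713"],  # protein kinase activity
    "GO:0004674": [],  # protein serine/threonine kinase activity
    "GO:0004713": [],  # protein tyrosine kinase activity
}


def descendants(term, children=CHILDREN):
    """Return every term reachable by following child links from `term`."""
    seen = set()
    stack = list(children.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children.get(current, []))
    return seen


def excluded_terms(not_term):
    """A NOT annotation to `not_term` also excludes all of its descendants."""
    return {not_term} | descendants(not_term)


excluded = excluded_terms("GO:0004672")
```

A NOT annotation to a leaf term such as GO:0004674 excludes only that term, since nothing lies below it in this fragment.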
The enables relation
enables links a gene product to a Molecular Function it executes.
The contributes to relation
contributes to links a gene product to a Molecular Function executed by a macromolecular complex, in which the Molecular Function cannot be ascribed to an individual subunit of that complex. Only the complex subunits required for the Molecular Function are annotated to the Molecular Function term with ‘contributes to’.
The involved in relation
involved in links a gene product and a Biological Process in which the gene product’s Molecular Function plays an integral role.
The acts upstream of or within relation
acts upstream of or within links a gene product and a Biological Process when the mechanism relating the gene product’s activity to the Biological Process is not known.
The located in relation
located in links a gene product and the Cellular Component, specifically a cellular anatomical entity or virion component, in which a gene product has been detected.
The part of relation
part of links a gene product and a protein-containing complex.
The colocalizes_with relation
colocalizes_with indicates a peripheral association of the protein with an organelle or complex. For example, human microtubule depolymerase KIF2A is dynamically localized to spindle poles, regulating the degradation of microtubules during mitotic progression.
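The relations described above each pair with one aspect of the ontology, which suggests a simple consistency check. This sketch assumes the underscored spellings used in GAF files and the single-letter aspect codes F (Molecular Function), P (Biological Process), and C (Cellular Component). The table covers only the relations listed here, not the full permitted set, and the function names are hypothetical.

```python
# Relation -> aspect, limited to the common relations described in this section.
RELATION_ASPECT = {
    "enables": "F",
    "contributes_to": "F",
    "involved_in": "P",
    "acts_upstream_of_or_within": "P",
    "located_in": "C",
    "part_of": "C",
    "colocalizes_with": "C",
}


def split_qualifier(qualifier):
    """Split a value such as 'NOT|enables' into (negated, relation)."""
    parts = qualifier.split("|")
    negated = parts[0] == "NOT"
    relation = parts[-1]
    return negated, relation


def relation_matches_aspect(qualifier, aspect):
    """True if the relation is one of the listed relations for that aspect."""
    _, relation = split_qualifier(qualifier)
    return RELATION_ASPECT.get(relation) == aspect


negated, relation = split_qualifier("NOT|enables")
ok = relation_matches_aspect("located_in", "C")
mismatch = relation_matches_aspect("enables", "P")
```

A relation missing from the table is reported as a mismatch here. A real validator would instead consult the full list of permitted gp2term relations.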
Annotation extensions
Annotation extensions provide additional information about a GO annotation that cannot be captured in a single GO term. Please see publications describing annotation extensions: Huntley & Lovering 2017 and Huntley et al. 2014. Annotation extensions are available in both the GAF File Format and the GPAD File Format.
Annotation quality control
The GO Consortium implements a number of automated queries to check the quality of the annotations submitted to the GO database.
GO-Causal Activity Models
GO-Causal Activity Models (GO-CAMs) use a defined “grammar” for linking multiple standard GO annotations into larger models of biological function (such as “pathways”) in a semantically structured manner. Minimally, a GO-CAM model must connect at least two standard GO annotations (GO-CAM example).
The primary unit of biological modeling in GO-CAM is a molecular activity, e.g. protein kinase activity, of a specific gene product or complex. A molecular activity is an activity carried out at the molecular level by a gene product; this is specified by a term from the GO MF ontology. GO-CAM models are thus connections of GO MF annotations enriched by providing the appropriate context in which that function occurs. All connections in a GO-CAM model, e.g. between a gene product and activity, two activities, or an activity and additional contextual information, are made using clearly defined semantic relations from the Relations Ontology.
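The idea of activities connected by relations can be pictured as a small graph. This is a rough sketch, not the GO-CAM data model: the `Activity` class, its field names, the identifiers, and the relation label on the edge are illustrative assumptions. The check at the end is one possible reading of the requirement that a model connect at least two annotations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Activity:
    """Hypothetical node: a molecular activity plus optional context."""
    id: str
    molecular_function: str            # GO MF term
    enabled_by: str                    # gene product or complex
    part_of_process: Optional[str] = None  # GO BP term, if known
    occurs_in: Optional[str] = None        # GO CC term, if known


def connects_two_activities(activities, edges):
    """True if some edge links two distinct activities of the model."""
    ids = {activity.id for activity in activities}
    return any(
        source in ids and target in ids and source != target
        for source, _relation, target in edges
    )


# Placeholder gene product identifiers and an illustrative relation label.
activities = [
    Activity("a1", "GO:0004672", "UniProtKB:P00001"),
    Activity("a2", "GO:0004672", "UniProtKB:P00002"),
]
edges = [("a1", "provides input for", "a2")]

connected = connects_two_activities(activities, edges)
unconnected = connects_two_activities(activities, [])
```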
GO-CAMs can be browsed and visualized at http://geneontology.org/go-cam
Types of GO annotation files
- Gene association file (GAF) 2.2
- Gene Product Association Data (GPAD) 2.0 files + Gene Product Information (GPI) 2.0 files: companion files
- (Deprecated) Gene association file (GAF) 2.1
- (Deprecated) Gene association file (GAF) 2.0
- (Currently being deprecated) Gene Product Association Data (GPAD) 1.1/1.2 files + Gene Product Information (GPI) 1.2 files: companion files
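As a rough illustration of working with one of these formats, the sketch below splits a GAF 2.2 line into its 17 tab-separated columns. The column keys are informal names chosen for this example, the sample record uses placeholder values, and the format specification remains the authoritative description of each column.

```python
# Informal keys for the 17 tab-separated GAF 2.2 columns, in order.
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "relation", "go_id",
    "reference", "evidence_code", "with_from", "aspect", "db_object_name",
    "db_object_synonym", "db_object_type", "taxon", "date", "assigned_by",
    "annotation_extension", "gene_product_form_id",
]


def parse_gaf_line(line):
    """Parse one GAF line into a dict; return None for headers and blank lines."""
    if line.startswith("!") or not line.strip():
        return None
    # Strip only the newline so trailing empty columns are preserved.
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GAF_COLUMNS):
        raise ValueError(f"expected {len(GAF_COLUMNS)} columns, got {len(fields)}")
    return dict(zip(GAF_COLUMNS, fields))


# Placeholder record for illustration only.
sample = "\t".join([
    "UniProtKB", "P12345", "GENE1", "enables", "GO:0004672",
    "PMID:0000000", "IDA", "", "F", "", "", "protein",
    "taxon:9606", "20240101", "ExampleDB", "", "",
]) + "\n"

record = parse_gaf_line(sample)
header = parse_gaf_line("!gaf-version: 2.2\n")
```

Lines beginning with `!` are header or comment lines and are skipped. Optional columns come back as empty strings.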
Downloads
- Download GO annotations by species
- Download GO-CAM models
GO as a dynamic source of biological annotations
GO aims to represent the current state of knowledge in biology, so it is constantly revised and expanded as biological knowledge accumulates.
With the ever-increasing number of published articles, experiments and methods, covering all biology with the latest annotations is always challenging. We therefore invite researchers and computational scientists to submit requests for missing, erroneous or out-of-date annotations to improve the GO database.
Statistics
GO statistics are available both for the current release and over time.