Download annotations

Getting annotations for a selected organism

This page explains how to obtain GO annotations for almost any organism. If your organism is not covered by the official GO products, the UniProt GAFs by proteome, or NCBI RefSeq, we recommend annotating it with the latest version of InterProScan.

Required files

Most tools that use GO annotations take two input files:

  1. a file with the annotations (in Gene Annotation Format, or GAF)
  2. a file with the GO ontology structure (in Open Biomedical Ontology Format, or OBO)
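A GAF is a tab-separated text file in which lines starting with "!" are header or comment lines and every other line is one annotation. The following is a minimal Python sketch of reading the first input file, assuming the GAF 2.x column layout (17 columns, of which the last two may be empty or absent). The `GafRecord` class, the `parse_gaf` and `open_gaf` helpers, and the sample row are hypothetical names and data for illustration, not part of any GO tool.

```python
import gzip
import io
from dataclasses import dataclass
from typing import Iterator, TextIO


@dataclass
class GafRecord:
    """A subset of the GAF 2.x columns, named after the format's column headings."""
    db: str              # column 1: DB
    db_object_id: str    # column 2: DB Object ID
    symbol: str          # column 3: DB Object Symbol
    qualifier: str       # column 4: Qualifier
    go_id: str           # column 5: GO ID
    evidence_code: str   # column 7: Evidence Code
    aspect: str          # column 9: Aspect (P, F or C)
    taxon: str           # column 13: Taxon, e.g. "taxon:9606"


def parse_gaf(handle: TextIO) -> Iterator[GafRecord]:
    """Yield one record per annotation row, skipping '!' header and comment lines."""
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("!"):
            continue
        cols = line.split("\t")
        if len(cols) < 15:  # columns 16 and 17 may be missing
            raise ValueError(f"expected at least 15 tab-separated columns, got {len(cols)}")
        yield GafRecord(cols[0], cols[1], cols[2], cols[3], cols[4],
                        cols[6], cols[8], cols[12])


def open_gaf(path: str) -> TextIO:
    """Open a plain or gzip-compressed (.gz) GAF file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, encoding="utf-8")


# Usage with a made-up file: one header line and one annotation row.
SAMPLE = (
    "!gaf-version: 2.2\n"
    "UniProtKB\tP12345\tABC1\tenables\tGO:0003674\tPMID:1\tIDA\t\tF\t"
    "ABC protein\t\tprotein\ttaxon:9606\t20240101\tUniProt\t\t\n"
)
records = list(parse_gaf(io.StringIO(SAMPLE)))
```

For a downloaded file, `parse_gaf(open_gaf("your_file.gaf.gz"))` streams the rows without decompressing the whole file first.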

Because the ontology and the annotations are constantly being improved, we recommend downloading the latest version of the annotations for your organism together with the ontology file for the matching GO version. That version is specified in the header of the annotation file.
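To find the matching ontology version, you can read the "!key: value" lines at the top of the annotation file. The following is a minimal Python sketch; which keys a given file actually contains (the sample below assumes `gaf-version`, `date-generated` and `go-version`) is an assumption, so check the header of your own file. The `read_gaf_header` helper and the sample header values are hypothetical.

```python
import io
from typing import Dict, TextIO


def read_gaf_header(handle: TextIO) -> Dict[str, str]:
    """Collect '!key: value' lines from the top of a GAF, stopping at the first annotation row."""
    header: Dict[str, str] = {}
    for line in handle:
        if not line.startswith("!"):
            break  # the header block ends where annotation rows begin
        key, sep, value = line[1:].strip().partition(":")
        # Keep only single-token keys such as "gaf-version"; skip free-text comments.
        if sep and key and " " not in key:
            header[key] = value.strip()
    return header


# Hypothetical header; real files may use different keys or add more lines.
SAMPLE = (
    "!gaf-version: 2.2\n"
    "!date-generated: 2024-01-17\n"
    "!go-version: http://purl.obolibrary.org/obo/go/releases/2024-01-17/extensions/go-plus.owl\n"
    "UniProtKB\tP12345\tABC1\n"
)
header = read_gaf_header(io.StringIO(SAMPLE))
```

Splitting on the first colon only keeps values that themselves contain colons, such as URLs, intact.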

Citing GO

To ensure reproducibility for any publication where GO was used at any point in the research, please include:

1. Commonly studied organisms

The GAF download page provides annotations for selected commonly studied species.

For organisms with many expert-curated GO annotations, such as those served by a model organism database (MOD) or another dedicated database, we recommend downloading annotations from the table on that page. These organisms often have a large number of manual annotations supported by direct experimental evidence, as well as annotations based on other evidence types.
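If you only want the annotations backed by direct experimental evidence, you can filter on the evidence code in column 7 of the GAF. The following Python sketch assumes GAF 2.x rows; the code sets are GO's experimental evidence codes and their high-throughput counterparts, while the `experimental_rows` and `_row` helpers and the sample rows are hypothetical.

```python
from typing import Iterable, Iterator, List

# GO experimental evidence codes, and their high-throughput counterparts.
EXPERIMENTAL = frozenset({"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"})
HIGH_THROUGHPUT = frozenset({"HTP", "HDA", "HMP", "HGI", "HEP"})


def experimental_rows(lines: Iterable[str], include_htp: bool = False) -> Iterator[List[str]]:
    """Yield the columns of annotation rows whose evidence code (column 7) is experimental."""
    wanted = (EXPERIMENTAL | HIGH_THROUGHPUT) if include_htp else EXPERIMENTAL
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # blank line or header/comment line
        cols = line.rstrip("\n").split("\t")
        if len(cols) > 6 and cols[6] in wanted:
            yield cols


def _row(gene: str, code: str) -> str:
    """Build a made-up 17-column GAF row for demonstration only."""
    return "\t".join(["UniProtKB", gene, gene, "enables", "GO:0003674", "PMID:1", code,
                      "", "F", "", "", "protein", "taxon:9606", "20240101", "UniProt", "", ""])


SAMPLE = ["!gaf-version: 2.2", _row("P1", "IDA"), _row("P2", "IEA"), _row("P3", "HDA")]
```

Here only P1 passes the default filter, and P1 and P3 pass with `include_htp=True`; the IEA (electronic) row is always excluded. Whether to count high-throughput evidence as "direct" is a judgment call for your analysis, which is why it is a parameter.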

2. All other organisms

For all other organisms, we recommend downloading annotations from one of the following sources: UniProt or NCBI RefSeq. Both provide annotations generated by highly accurate computational methods. The header of the annotation file specifies the version of the ontology you should use to accompany the annotation file. Older versions of the GO ontology can be downloaded from the GO download archives.

  • UniProt GAFs by proteome: Annotation files are available for about 20,000 complete proteomes (one protein sequence per protein-coding gene). Use these files if you want to use UniProtKB identifiers.
  • NCBI RefSeq: If your organism has a reference sequence in NCBI, GO annotations are available through NCBI’s FTP server. Use these files if you want to use Entrez Gene identifiers. Annotation files are provided for all eukaryotic genomes at NCBI. Note that GO annotations are not currently available for archaea, bacteria, or viruses.
    • Go to https://ftp.ncbi.nlm.nih.gov/genomes/refseq/
    • Navigate to your organism’s directory; for example, Anopheles_gambiae/ is in the invertebrate/ directory
    • Open the representative/ directory, then open the assembly directory inside it
    • Download the file with the suffix gene_ontology.gaf.gz, e.g. GCF_943734735.2-RS_2023_12_gene_ontology.gaf.gz
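The NCBI navigation steps can be scripted. The following Python sketch assumes the FTP site is browsed over HTTPS as HTML index pages with relative `href="..."` links, and that the first assembly directory under representative/ is the one you want; both are assumptions, so check the listing by hand if the lookup fails. The helper names and the hypothetical assembly directory in the usage strings are for illustration only.

```python
import re
import urllib.request
from typing import List

REFSEQ_ROOT = "https://ftp.ncbi.nlm.nih.gov/genomes/refseq/"


def representative_url(group: str, organism: str) -> str:
    """URL of an organism's representative/ directory, e.g. group="invertebrate"."""
    return f"{REFSEQ_ROOT}{group}/{organism}/representative/"


def linked_subdirs(listing_html: str) -> List[str]:
    """Relative directory links (ending in '/') on an HTML index page, excluding '../'."""
    return sorted({name for name in re.findall(r'href="([^"/?]+/)"', listing_html)
                   if not name.startswith("..")})


def go_gaf_names(listing_html: str) -> List[str]:
    """File names ending in 'gene_ontology.gaf.gz' linked from an HTML index page."""
    return sorted(set(re.findall(r'href="([^"/]*gene_ontology\.gaf\.gz)"', listing_html)))


def fetch_listing(url: str) -> str:
    """Download an HTML directory index (requires network access)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def find_go_gaf_url(group: str, organism: str) -> str:
    """representative/ -> first assembly directory -> *gene_ontology.gaf.gz file."""
    rep = representative_url(group, organism)
    subdirs = linked_subdirs(fetch_listing(rep))
    if not subdirs:
        raise LookupError(f"no assembly directory found under {rep}")
    assembly = rep + subdirs[0]
    names = go_gaf_names(fetch_listing(assembly))
    if not names:
        raise LookupError(f"no gene_ontology.gaf.gz file under {assembly}")
    return assembly + names[0]
```

For example, `find_go_gaf_url("invertebrate", "Anopheles_gambiae")` follows the steps for the mosquito example above; the listing-parsing helpers can be checked offline on saved index pages.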

More information on GO annotation formats

  • GO has monthly releases
  • Annotation files are taxon-specific, with a few exceptions including the Reactome and Candida Genome Database files
  • Current format guides:

Programmatic access to GO annotations

As with any GO resource, GO annotations are available in the DOI-versioned releases archived in Zenodo and can be retrieved using BDBag. Read more about programmatic access.
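BDBag is the recommended route. As a lighter alternative, and a different technique from BDBag, the following Python sketch lists the files of a release directly through Zenodo's REST records endpoint. It assumes that the numeric suffix of a `10.5281/zenodo.<id>` DOI is the record id, and that the record JSON has a `files` list whose entries carry `key` (file name) and `links.self` (download link); verify both against the Zenodo API documentation for the record you use. The helper names and the sample DOI and record are hypothetical.

```python
import json
import re
import urllib.request
from typing import Any, Dict


def zenodo_record_id(doi: str) -> str:
    """Extract the numeric record id from a Zenodo DOI such as 10.5281/zenodo.<id>."""
    m = re.fullmatch(r"(?:https?://(?:dx\.)?doi\.org/)?10\.5281/zenodo\.(\d+)", doi.strip())
    if m is None:
        raise ValueError(f"not a Zenodo DOI: {doi!r}")
    return m.group(1)


def record_api_url(doi: str) -> str:
    """Zenodo REST endpoint for the record a DOI points to (assumed URL pattern)."""
    return f"https://zenodo.org/api/records/{zenodo_record_id(doi)}"


def file_links(record: Dict[str, Any]) -> Dict[str, str]:
    """Map file name -> download link, assuming the record JSON has a 'files' list."""
    return {f["key"]: f["links"]["self"] for f in record.get("files", [])}


def fetch_record(doi: str) -> Dict[str, Any]:
    """Download the record metadata as JSON (requires network access)."""
    with urllib.request.urlopen(record_api_url(doi), timeout=30) as resp:
        return json.load(resp)
```

A typical call is `file_links(fetch_record(doi))`, after which you download only the annotation archive you need. Unlike BDBag, this does not verify checksums, so compare them yourself if reproducibility matters.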

Error or omission?

Any errors or omissions in annotations should be reported by writing to the GO helpdesk.