GO subsets
What are GO subsets?
GO subsets (also known as GO slims) are simplified versions of the GO containing a reduced number of terms. There are several types of subsets:
- Binning: Binning subsets are intended to be used to summarize the molecular function/biological processes/cellular components for sets of genes, including entire genomes/proteomes.
- Ribbon: Ribbons are used to give a quick overview of the broad classes of terms annotated for a gene. Ribbon subsets are usually smaller than binning subsets.
- Exclusion List: Exclusion lists are terms that should not be used for annotations, either because the term is too broad to be informative (for example, “GO:0008152 metabolic process”), and/or because a more specific term should be used (for example, more specific children of “GO:0048856 anatomical structure development” should be used that describe the actual structure formed).
GO subsets are part of the ontology, under the tag subset. For example:
    [Term]
    id: GO:0048856
    name: anatomical structure development
    namespace: biological_process
    subset: goslim_generic
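As a rough illustration of how the subset tag can be read, the following minimal sketch groups term IDs by subset from OBO-formatted text. The function name `terms_by_subset` is hypothetical, the second stanza (GO:9999999) is an invented placeholder and not a real GO term, and the parser handles only the handful of tags shown here; a full OBO parser would be needed for real files.

```python
from collections import defaultdict

# First stanza is the example from the text; the second is a made-up
# placeholder (GO:9999999 is NOT a real GO term) to show multiple tags.
OBO_TEXT = """\
[Term]
id: GO:0048856
name: anatomical structure development
namespace: biological_process
subset: goslim_generic

[Term]
id: GO:9999999
name: hypothetical placeholder term
namespace: biological_process
subset: goslim_generic
subset: goslim_hypothetical
"""


def terms_by_subset(obo_text):
    """Map each subset name to the list of term IDs tagged with it."""
    subsets = defaultdict(list)
    current_id = None
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are of interest.
            in_term = line == "[Term]"
            current_id = None
        elif in_term and line.startswith("id: "):
            current_id = line[len("id: "):]
        elif in_term and line.startswith("subset: ") and current_id:
            # Drop any trailing "! comment" after the subset name.
            name = line[len("subset: "):].split("!")[0].strip()
            subsets[name].append(current_id)
    return dict(subsets)


if __name__ == "__main__":
    for subset_name, term_ids in terms_by_subset(OBO_TEXT).items():
        print(subset_name, term_ids)
```

A term can carry several subset tags, which is why the result maps each subset name to a list rather than assigning one subset per term.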
How are GO subsets used?
- GO subsets can be used to provide an overview of the range of functions and processes found in an organism’s genome or a group of organisms.
- GO subsets are also useful for addressing specific research needs in particular areas of biology. For instance, researchers specifically interested in the process of development or in the nuclear proteome can use dedicated subsets to focus solely on the terms under that section of the ontology.
- GO subsets can also be useful for simplifying searches or annotation operations, by reducing the number of choices a user is presented with.
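The overview use case above can be sketched as a binning step: each annotated term is mapped to the subset terms found among its ancestors, and genes are then counted per subset term. Everything in this sketch is an assumption for illustration: the toy ontology, the `T:` identifiers, the gene names, and the function names are all hypothetical, and all parent links are treated alike, whereas real tools decide which relationship types to follow.

```python
from collections import Counter

# Hypothetical toy ontology: each term maps to its direct parents.
PARENTS = {
    "T:leaf_a": ["T:mid_1"],
    "T:leaf_b": ["T:mid_1", "T:mid_2"],
    "T:mid_1": ["T:slim_x"],
    "T:mid_2": ["T:slim_y"],
    "T:slim_x": ["T:root"],
    "T:slim_y": ["T:root"],
    "T:root": [],
}

# Hypothetical subset (slim): the broad terms used as bins.
SLIM = {"T:slim_x", "T:slim_y"}

# Hypothetical annotations: gene -> directly annotated terms.
ANNOTATIONS = {
    "geneA": ["T:leaf_a"],
    "geneB": ["T:leaf_b"],
    "geneC": ["T:leaf_a", "T:leaf_b"],
}


def ancestors(term, parents):
    """Return the term itself plus every term reachable via parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def map_to_slim(term, parents, slim):
    """Return the subset terms that are the term itself or one of its ancestors."""
    return ancestors(term, parents) & slim


def bin_genes(annotations, parents, slim):
    """Count genes per subset term; a gene counts once per subset term."""
    counts = Counter()
    for terms in annotations.values():
        bins = set()
        for term in terms:
            bins |= map_to_slim(term, parents, slim)
        counts.update(bins)
    return counts


if __name__ == "__main__":
    for slim_term, n_genes in sorted(bin_genes(ANNOTATIONS, PARENTS, SLIM).items()):
        print(slim_term, n_genes)
```

Because a term can have more than one parent, a single annotation may fall into several bins, so the per-bin counts are not expected to sum to the number of genes.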
Who creates and maintains GO subsets?
- The GO consortium provides a generic subset which, like the GO itself, is species-agnostic, and which should be suitable for most purposes. In addition, many model organism-specific subsets have been created by GO consortium members and are available for download as listed below.
- Users and user communities can create their own GO subsets. Please contact the GO helpdesk for more information about creating and submitting your GO subsets.
- Groups who have created a GO subset are responsible for keeping it up to date as the ontology changes.
Download GO subsets
- GO subsets are maintained as part of the ontology. The files available below for download are generated by script from the ontology file.
- GO subsets are available in OBO, OWL, JSON, and TSV formats.
| Subset name | Maintainer | File name | OBO format | OWL format | JSON format | TSV format |
|---|---|---|---|---|---|---|
| Alliance of Genome Resources subset | Developed by the GO Consortium for the Alliance of Genome Resources | goslim_agr | obo | owl | json | tsv |
| Generic GO subset | GO Consortium | goslim_generic | obo | owl | json | tsv |
| Aspergillus subset | Aspergillus Genome Database | goslim_aspergillus | obo | owl | json | tsv |
| Candida albicans subset | Candida Genome Database | goslim_candida | obo | owl | json | tsv |
| Drosophila subset | FlyBase | goslim_drosophila | obo | owl | json | tsv |
| ChEMBL Drug Target subset | ChEMBL | goslim_chembl | obo | owl | json | tsv |
| Metagenomics subset | InterPro group | goslim_metagenomic | obo | owl | json | tsv |
| Mouse GO subset | Mouse Genome Informatics | goslim_mouse | obo | owl | json | tsv |
| Plant subset | The Arabidopsis Information Resource | goslim_plant | obo | owl | json | tsv |
| Prokaryote subset | GO Consortium | goslim_prokaryote | obo | owl | json | tsv |
| Protein Information Resource subset | PIR | goslim_pir | obo | owl | json | tsv |
| Schizosaccharomyces pombe subset | PomBase | goslim_pombe | obo | owl | json | tsv |
| Yeast subset | Saccharomyces Genome Database | goslim_yeast | obo | owl | json | tsv |
GO Exclusion Lists
For internal checking purposes, GO maintains two “anti-slims”: sets of terms to which annotations should not be made. “Anti-slim” terms should never be used when creating a subset, and terms that are obsoleted are removed from subsets.
| Subset name | Usage | File name | OBO format | OWL format | JSON format | TSV format |
|---|---|---|---|---|---|---|
| Do not annotate | The set of high level terms that are useful for grouping, but should have no direct annotations | gocheck_do_not_annotate | obo | owl | json | tsv |
| Obsoletion candidate | GO terms planned for obsoletion. This subset serves as an early warning system for both users and curators | gocheck_obsoletion_candidate | obo | owl | json | tsv |
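A check against an exclusion list can be sketched as a simple set-membership test over annotations. This is only an illustration under stated assumptions: the function name `find_excluded_annotations` and the gene names are hypothetical, GO:9999999 is an invented placeholder rather than a real term, and the exclusion set is hard-coded with the one example term named earlier on this page instead of being read from the gocheck_do_not_annotate file.

```python
def find_excluded_annotations(annotations, excluded_terms):
    """Return the (gene, term) pairs whose term is in the exclusion set."""
    return [(gene, term) for gene, term in annotations if term in excluded_terms]


# Illustrative exclusion set: "GO:0008152 metabolic process" is the example
# of a too-broad term given in the text. A real check would load the full
# list from the exclusion-list file instead of hard-coding it.
DO_NOT_ANNOTATE = {"GO:0008152"}

# Hypothetical annotations as (gene, term) pairs; GO:9999999 is a
# placeholder ID standing in for an acceptable, specific term.
ANNOTATIONS = [
    ("geneA", "GO:0008152"),
    ("geneB", "GO:9999999"),
]

if __name__ == "__main__":
    for gene, term in find_excluded_annotations(ANNOTATIONS, DO_NOT_ANNOTATE):
        print(f"{gene} is annotated directly to excluded term {term}")
```

The same function could be run against the obsoletion-candidate list to flag annotations that may need revisiting before a term is retired.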
Related tools
- GO tools at the Lewis-Sigler Institute:
The Bioinformatics Group at Princeton’s Lewis-Sigler Institute provides several web-based GO tools:
- The Generic GO Term Mapper takes a list of genes with their detailed GO term annotations and maps them to broader GO slim terms, allowing users to bin their genes into broad functional categories.
- The Generic GO Term Finder identifies significant GO terms shared among a list of genes, helping discover what genes may have in common.
- LAGO (A Logically Accelerated GO Term Finder) is an implementation of the GO Term Finder algorithms with significant speed improvements.
These web-based tools support batch processing of gene lists and provide easily interpretable results. Additional information is available from the Lewis-Sigler Institute GO tools website. Users should note that web-based tools may have processing limitations and time constraints.