OBSOLETE Gene Ontology annotation by SEA-PHAGE biocurators
Ivan Erill, SEA-PHAGE biocurators; 2014
This GO reference describes the criteria used by biocurators of the SEA-PHAGE consortium to annotate predicted gene products from newly sequenced bacteriophage genomes, both in the SEA-PHAGE phagesdb.org and other databases and in the GenBank records periodically released to NCBI for these genomes. In particular, it describes the criteria used to assign the evidence codes ISS, ISA, ISO, ISM, IGC and ND. To assign ISS, ISA, ISO and ISM evidence codes, SEA-PHAGE biocurators use a varied array of bioinformatics tools to establish homology, and conservation of the sequence and structural determinants of function, with proteins from multiple organisms that have published, experimentally supported GO term associations lacking NOT qualifiers. These proteins are referenced in the WITH field of the annotation by their xref database accession. The primary tools for homology search in ISS, ISA, ISO and ISM assignments are BLASTP and HHpred, with a maximum e-value of 10^-7 for BLASTP and a minimum probability of 0.9 for HHpred, and with manual inspection of alignments in both cases. For ISS and ISA assignments, BLASTP alignments are required to have at least 75% coverage and 30% identity. For ISO assignments, orthology is further validated by reciprocal BLASTP with the identified hit. For HHpred results, ISS or ISM annotations are made only if the source of the original GO annotation explicitly defines the function of a matched domain, or if more than half of the domains of the query protein are identified in the matching protein. All ISS, ISA, ISO and ISM assignments entail manual verification of the source of the GO term in the matching protein sequence and critical curator assessment of the likelihood that function, process or component is preserved in the context of bacteriophage biology.
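The numeric thresholds stated above can be sketched as a simple pre-filter. This is only an illustrative sketch, not the consortium's actual tooling: the names BlastpHit, blastp_candidate_codes and hhpred_passes are hypothetical, coverage and identity are assumed to be expressed as fractions, and the sketch assumes that the coverage and identity minimums apply only to ISS and ISA, as the text states them for those two codes. Passing the filter only marks a hit as a candidate; the manual inspection and curator assessment described above are still required.

```python
from dataclasses import dataclass

# Thresholds taken from the text above.
BLASTP_MAX_EVALUE = 1e-7
HHPRED_MIN_PROBABILITY = 0.9
MIN_COVERAGE = 0.75  # at least 75% coverage (ISS, ISA)
MIN_IDENTITY = 0.30  # at least 30% identity (ISS, ISA)


@dataclass
class BlastpHit:
    """Hypothetical container for one BLASTP hit."""
    evalue: float
    coverage: float  # fraction in [0, 1]
    identity: float  # fraction in [0, 1]
    reciprocal_hit: bool = False  # True if reciprocal BLASTP recovers the query


def blastp_candidate_codes(hit: BlastpHit) -> list[str]:
    """Return the evidence codes whose numeric criteria this hit meets."""
    if hit.evalue > BLASTP_MAX_EVALUE:
        return []
    codes = []
    if hit.coverage >= MIN_COVERAGE and hit.identity >= MIN_IDENTITY:
        codes += ["ISS", "ISA"]
    # Assumption: ISO needs the e-value cutoff plus reciprocal validation.
    if hit.reciprocal_hit:
        codes.append("ISO")
    return codes


def hhpred_passes(probability: float,
                  source_defines_domain_function: bool,
                  query_domains: int,
                  matched_domains: int) -> bool:
    """Check the HHpred criteria for a candidate ISS or ISM annotation."""
    if probability < HHPRED_MIN_PROBABILITY:
        return False
    if source_defines_domain_function:
        return True
    # Otherwise more than half of the query's domains must be matched.
    return query_domains > 0 and matched_domains > query_domains / 2
```

For example, a hit with e-value 1e-10, 80% coverage and 35% identity would be a candidate for ISS and ISA, while the same hit with 60% coverage would not.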
IGC codes are assigned when synteny provides suggestive evidence of function, as inferred from whole-genome comparative analyses of multiple bacteriophage genomes using primarily the Phamerator software platform, with special emphasis on the bacteriophage virion structure and assembly genes. When extensive review of the published literature on putative homologs reveals no experimental evidence of function, component or process for a particular gene product, the gene product is assigned an ND evidence code and annotated to the root terms for Cellular Component, Molecular Function and Biological Process. As part of the review process for assignment of ISS, ISA, ISO, IGC and ISM evidence codes, SEA-PHAGE curators are required to analyze the reference literature for identified matches and to make GO annotations with appropriate evidence codes where these are not already available.
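The ND case above can be illustrated with a short sketch that builds the three root-term annotations for one gene product. The GO identifiers are the standard root terms of the three ontologies; the function name nd_annotations, the record layout and the example identifier are hypothetical and do not reflect any particular annotation file format.

```python
# Root terms of the three GO aspects (C = component, F = function, P = process).
ROOT_TERMS = {
    "C": ("GO:0005575", "cellular_component"),
    "F": ("GO:0003674", "molecular_function"),
    "P": ("GO:0008150", "biological_process"),
}


def nd_annotations(gene_product_id: str) -> list[dict]:
    """Build one ND annotation per GO aspect for a gene product.

    The dictionary keys are illustrative, not a defined file format.
    """
    return [
        {
            "object_id": gene_product_id,
            "aspect": aspect,
            "go_id": go_id,
            "go_name": go_name,
            "evidence_code": "ND",
        }
        for aspect, (go_id, go_name) in ROOT_TERMS.items()
    ]


# Usage with a made-up gene product identifier.
for record in nd_annotations("example_gp12"):
    print(record["object_id"], record["go_id"], record["evidence_code"])
```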