Protein Annotation in ResNet
ResNet protein annotations are created using only publicly available data sources. Information from these sources is compiled into a single map file in Ariadne's proprietary RNEF XML format, which is then imported into the ResNet database using the Pathway Studio importer.
The principal source of protein annotation in ResNet is EntrezGene. The EntrezGene dump is parsed to convert the irregular EntrezGene format into a plain text format suitable for import into ResNet. The following EntrezGene fields are imported as ResNet attributes:
LOCUSID as LocusLink ID
ORGANISM as organism
OFFICIAL_SYMBOL as Name
OFFICIAL_GENE_NAME or PREFERRED_GENE_NAME as Description
PRODUCT, ALIAS_SYMBOL, ALIAS_PROT as Alias
NM, XM, NC, NP, XP, ASSEMBLY, CONTIG, ACCNUM as GenBank ID
HGNC ID as Hugo ID
UNIGENE as Unigene ID
SUMMARY and SUMFUNC as Note
OMIM and PHENOTYPE_ID as OMIM ID
GO: molecular function as GO Molecular Function
GO: biological process as GO Biological process
GO: cellular component as GO Cellular Component
RGD ID and MGI ID are parsed from the corresponding values of the field LINK.
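The field mapping listed above can be written down as a lookup table. The following is only a minimal sketch: it assumes an EntrezGene record has already been parsed into a dictionary of field name to list of values, which is a hypothetical representation and not the actual parser output or the RNEF format. The function name to_resnet_attributes is also hypothetical, and reading "OFFICIAL_GENE_NAME or PREFERRED_GENE_NAME" as a fallback is an assumption.

```python
# Field-to-attribute pairs copied from the list above.
FIELD_TO_ATTRIBUTE = {
    "LOCUSID": "LocusLink ID",
    "ORGANISM": "organism",
    "OFFICIAL_SYMBOL": "Name",
    "PRODUCT": "Alias",
    "ALIAS_SYMBOL": "Alias",
    "ALIAS_PROT": "Alias",
    "HGNC ID": "Hugo ID",
    "UNIGENE": "Unigene ID",
    "SUMMARY": "Note",
    "SUMFUNC": "Note",
    "OMIM": "OMIM ID",
    "PHENOTYPE_ID": "OMIM ID",
    "GO: molecular function": "GO Molecular Function",
    "GO: biological process": "GO Biological process",
    "GO: cellular component": "GO Cellular Component",
}
# All sequence accession fields share one target attribute.
for _field in ("NM", "XM", "NC", "NP", "XP", "ASSEMBLY", "CONTIG", "ACCNUM"):
    FIELD_TO_ATTRIBUTE[_field] = "GenBank ID"


def to_resnet_attributes(record):
    """Convert {field: [values]} into {attribute: [values]} without duplicates.

    The input layout is a hypothetical representation of a parsed record.
    """
    attributes = {}
    for field, values in record.items():
        attribute = FIELD_TO_ATTRIBUTE.get(field)
        if attribute is None or not values:
            continue  # field is not in the mapping list, or has no values
        target = attributes.setdefault(attribute, [])
        for value in values:
            if value not in target:
                target.append(value)
    # Assumption: "OFFICIAL_GENE_NAME or PREFERRED_GENE_NAME" means the
    # preferred name is used only when no official name is present.
    description = record.get("OFFICIAL_GENE_NAME") or record.get("PREFERRED_GENE_NAME")
    if description:
        attributes["Description"] = list(description)
    return attributes
```

Several source fields feed the same attribute (for example PRODUCT, ALIAS_SYMBOL and ALIAS_PROT all become Alias), so the sketch collects values into lists and drops repeats rather than overwriting.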
- Discontinued EntrezGene records are not imported.
- Attributes from additional public sources are mapped onto an EntrezGene record using the following rules:
  - Microarray IDs are mapped using the EntrezGene IDs listed in Affymetrix and Agilent chip annotation files.
  - EntrezGene IDs that have been moved by NCBI staff are also imported as LocusLink ID attributes of the existing protein record.
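The rules for discontinued and moved EntrezGene IDs can be sketched as a small filtering step. This is an illustration only: the function name apply_id_rules and its three inputs (a dictionary of records keyed by current ID, a set of discontinued IDs, and a dictionary of old ID to current ID) are hypothetical, and how the discontinued and moved IDs are actually obtained is not covered here.

```python
def apply_id_rules(records, discontinued, moved):
    """Drop discontinued records and attach moved IDs to surviving records.

    records:      {current_id: {"LocusLink ID": [...], ...}}  (hypothetical layout)
    discontinued: set of IDs whose records must not be imported
    moved:        {old_id: current_id} for IDs that have been moved
    """
    kept = {}
    for gene_id, attrs in records.items():
        if gene_id in discontinued:
            continue  # discontinued records are not imported
        copy = dict(attrs)
        # Copy the ID list so the caller's data is left unchanged.
        copy["LocusLink ID"] = list(attrs.get("LocusLink ID", [gene_id]))
        kept[gene_id] = copy
    for old_id, current_id in moved.items():
        # A moved ID becomes an extra LocusLink ID on the existing record.
        if current_id in kept and old_id not in kept[current_id]["LocusLink ID"]:
            kept[current_id]["LocusLink ID"].append(old_id)
    return kept
```

Keeping the old ID as an additional LocusLink ID value means a lookup by either the old or the current ID resolves to the same protein record.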
Starting with ResNet database version 4.0, proteins will be annotated with 26 protein classes from Ariadne’s classification and with a cell localization attribute. Both annotations are manually curated. Proteins will also be annotated with Medline references imported from EntrezGene.
Merging protein records from different organisms
Merging is performed using HomoloGene, an NCBI-supported database that lists homologous genes in different organisms.
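As a rough illustration of the grouping step behind such a merge, the sketch below collects gene IDs that share a homology group. It assumes the HomoloGene data has already been reduced to (group ID, taxonomy ID, gene ID) tuples; that input layout and the function name group_by_homology are hypothetical, and the way attributes of merged records are combined is not specified here.

```python
from collections import defaultdict


def group_by_homology(rows):
    """Group (taxonomy ID, gene ID) pairs by their homology group ID.

    rows: iterable of (group_id, tax_id, gene_id) tuples (assumed layout).
    Groups with a single member are dropped because there is nothing to merge.
    """
    groups = defaultdict(list)
    for group_id, tax_id, gene_id in rows:
        groups[group_id].append((tax_id, gene_id))
    return {gid: members for gid, members in groups.items() if len(members) > 1}
```

Each returned group lists the records from different organisms that are candidates for merging into one protein record.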
Orthologous mapping for other organisms
Orthologs for all organisms are determined by running BLAST on the proteins of the model organism against human proteins. Protein sequences are obtained from the RefSeq database at NCBI. The best reciprocal orthologs are found using the global alignment similarity calculated from local BLAST alignments, as described in (Ispolatov et al.).
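The reciprocal best hit selection can be sketched as follows. The sketch takes the similarity scores as given: it does not run BLAST or compute the global alignment similarity, and the input layout (a dictionary keyed by (query, subject) pairs for each search direction) and the function names are hypothetical. Ties are resolved in favor of the first pair encountered, which is also an assumption.

```python
def best_hits(scores):
    """Return {query: subject} keeping the highest-scoring subject per query.

    scores: {(query, subject): similarity} (assumed layout).
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(model_vs_human, human_vs_model):
    """Keep model/human pairs where each protein is the other's best hit."""
    forward = best_hits(model_vs_human)
    backward = best_hits(human_vs_model)
    return {
        model: human
        for model, human in forward.items()
        if backward.get(human) == model
    }
```

A pair is reported only when the best human hit of a model organism protein has that same protein as its own best hit in the reverse direction, which filters out one-sided matches.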

