discoveryHub Wrappers

The discoveryHub typically accesses life sciences data through the wrappers supplied with the product and, where appropriate, writes data out to target data stores using similar wrappers. Users frequently wish to store intermediate results in a local staging database or a warehouse/application database; for this purpose an optional set of high-throughput wrappers and data loaders is available to meet demanding performance requirements.

Wrappers can be classified by the type of data source they access, e.g. algorithms, web sites, flat files, free-text documents, RDBMSs and XML, and we typically support most of the types of data access that a client requires. Where a new set of wrappers is required, we can supply consultancy and/or our wrapper development kit to help users understand the protocols and requirements involved in building a wrapper.
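To make the idea of a wrapper more concrete, the following is a minimal Python sketch of the general pattern only: one common query interface, implemented once per type of data source, with each result tagged by the source it came from. The names `Wrapper`, `FlatFileWrapper` and `run_all`, and the tab-separated record format, are hypothetical illustrations; they are not the discoveryHub API or the wrapper development kit.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List

Record = Dict[str, str]


class Wrapper(ABC):
    """Hypothetical wrapper contract: one subclass per kind of data source."""

    source_type: str = "generic"

    @abstractmethod
    def query(self, request: str) -> Iterator[Record]:
        """Translate a request into the source's native protocol and yield records."""


class FlatFileWrapper(Wrapper):
    """Hypothetical flat-file wrapper over tab-separated lines with a header row."""

    source_type = "flat_file"

    def __init__(self, lines: List[str]) -> None:
        self.header = lines[0].rstrip("\n").split("\t")
        self.rows = [line.rstrip("\n").split("\t") for line in lines[1:]]

    def query(self, request: str) -> Iterator[Record]:
        # In this sketch a request is just a substring matched against every field.
        for row in self.rows:
            record = dict(zip(self.header, row))
            if any(request in value for value in record.values()):
                yield record


def run_all(wrappers: List[Wrapper], request: str) -> List[Record]:
    """Fan one request out to every wrapper and tag each result with its source type."""
    results: List[Record] = []
    for wrapper in wrappers:
        for record in wrapper.query(request):
            results.append({**record, "_source": wrapper.source_type})
    return results
```

For example, `run_all([FlatFileWrapper(lines)], "kinase")` returns every row of a tab-separated file that mentions "kinase", each tagged with `"_source": "flat_file"`. A web-site or RDBMS wrapper would implement the same `query` method over its own protocol.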

This is a subset of the wrappers that are currently supported and tested on a daily basis.

BLAST Algorithms
FASTA3: The FASTA3 program compares a protein or DNA sequence against a sequence database using an improved version of the rapid sequence comparison algorithm described by Lipman and Pearson.
WU-BLAST2: The Washington University BLAST 2.0 program performs rapid similarity searches of protein and nucleotide sequence databases.
Web PSI-BLAST: This is the (unfinished) driver for the PSI-BLAST service on the NCBI web site.
Database Search
ACeDB: The ACeDB software performs general-purpose queries against an underlying ACeDB database, a system popular amongst biologists for storing genomic data (e.g. the C. elegans, rice and grass sequence databases).
FASTA3: The FASTA3 program compares a protein or DNA sequence against a sequence database using an improved version of the rapid sequence comparison algorithm described by Lipman and Pearson.
NCBI Taxonomy: The NCBI Taxonomy database is a hierarchical classification of organisms whose nucleotide sequences can be found in the GenBank/EMBL/DDBJ databases.
Oracle: Oracle is a relational database management system (RDBMS) that is popular amongst biological scientists for storing genomic, clinical trial and other data (e.g. the Incyte and HGS databases).
SCOP: The SCOP (Structural Classification of Proteins) database provides a comprehensive description of the structural and evolutionary relationships of proteins of known structure, including all protein entries in the Protein Data Bank (PDB).
SeqIndex: SeqIndex is a suite of programs for indexing and matching long strings such as amino acid and DNA sequences.
SeqIndex for Data: SeqIndex for Data can improve search access speeds to very large data files using pre-indexed key fields.
Sybase: Sybase is a relational database management system that is popular amongst biological scientists for storing genomic and other data (e.g. GDB and GSDB).
Web DBEST: Web dbEST queries dbEST, the NCBI (National Center for Biotechnology Information) database of Expressed Sequence Tags, which contains nucleic acid sequence data and other information on single-pass cDNA sequences, or Expressed Sequence Tags (ESTs), from different organisms.
Web Entrez: Web Entrez, located on the NCBI (National Center for Biotechnology Information) web site, allows broad querying of several integrated protein, nucleic acid and published literature databases, including GenBank/EMBL/DDBJ, SwissProt, PDB, PIR, PRF, PubMed and OMIM.
Web PDB: Web PDB queries the web site of the Protein Data Bank (PDB) at Brookhaven National Laboratory, a database of protein and nucleic acid structures determined using X-ray or NMR techniques. The PDB contains the coordinates of the structures along with information about the molecules and about the techniques used to determine them.
Web Patent: Web Patent, which uses two web sites hosted by the US Patent Office and IBM Corporation, allows searching of the US patent database for both claims and figures.
Web TIGR: Web TIGR, a web site located at The Institute for Genomic Research (TIGR), provides access to several databases, including the Human Gene Index (HGI) and the Expressed Gene Anatomy Database (EGAD).
Web UniGene: Web UniGene, a web site located at the National Center for Biotechnology Information (NCBI), is an experimental system for automatically partitioning GenBank sequences into a non-redundant set of gene-oriented clusters.
WebEnzyme: WebEnzyme, hosted on the ExPASy web site of the Swiss Institute of Bioinformatics (SIB), is a repository of enzyme nomenclature that allows enzymes to be searched by EC (Enzyme Commission) number, enzyme class and other relevant descriptions.
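How SeqIndex for Data builds its indexes is not described here. As a generic illustration of how pre-indexed key fields speed access to a very large flat file, the Python sketch below scans a FASTA-style file once, records the byte offset of each entry keyed by its identifier, and afterwards seeks directly to any entry instead of rescanning the file. The function names, and the choice of the first word of each `>` header line as the key field, are assumptions made for the example.

```python
import io
from typing import BinaryIO, Dict


def build_offset_index(handle: BinaryIO) -> Dict[str, int]:
    """Scan once, mapping each entry ID (first word after '>') to its header's byte offset."""
    index: Dict[str, int] = {}
    handle.seek(0)
    offset = 0
    for line in iter(handle.readline, b""):
        if line.startswith(b">"):
            fields = line[1:].split()
            if fields:
                index[fields[0].decode("ascii")] = offset
        offset = handle.tell()  # byte position where the next line starts
    return index


def fetch_entry(handle: BinaryIO, index: Dict[str, int], key: str) -> str:
    """Seek straight to an indexed entry and read it up to the next header line."""
    handle.seek(index[key])
    lines = [handle.readline()]  # the header line itself
    for line in iter(handle.readline, b""):
        if line.startswith(b">"):
            break
        lines.append(line)
    return b"".join(lines).decode("ascii")


# Usage: an in-memory file stands in for a very large flat file on disk.
demo = io.BytesIO(b">P1 kinase\nMKV\nLLA\n>P2 phosphatase\nMST\n")
demo_index = build_offset_index(demo)
```

The index costs one sequential pass; every later lookup is a single seek plus a short read, which is where the speed-up on very large files comes from. A production index would normally be persisted to disk rather than rebuilt each time.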
Web Search
HTTP: The HTTP wrapper uses the standard HTTP protocol to resolve Uniform Resource Locators (URLs) and execute native HTTP requests.
Text Analysis Algorithms
KEX: KEX (Knowledge EXtraction tool) is a program that locates protein names in MEDLINE abstracts or other English biomedical text.
XML Reader: The XML Reader wrapper reads documents in the Extensible Markup Language (XML) without consulting a Document Type Definition (DTD) file.
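As an illustration of reading XML without a DTD, the sketch below uses Python's standard `xml.etree.ElementTree`, which checks that a document is well-formed but does not validate it against a DTD. The `read_records` function, the record layout it assumes (one child element per field) and the sample tags are hypothetical; this is not the XML Reader wrapper's own implementation.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def read_records(xml_text: str, record_tag: str) -> List[Dict[str, str]]:
    """Turn each <record_tag> element into a dict of its attributes plus child tag -> text."""
    root = ET.fromstring(xml_text)  # well-formedness check only; no DTD validation
    records = []
    for elem in root.iter(record_tag):
        record = dict(elem.attrib)
        for child in elem:
            record[child.tag] = (child.text or "").strip()
        records.append(record)
    return records
```

For instance, a document of the form `<hits><hit id='1'><acc>P12345</acc><score>87</score></hit></hits>` yields `[{"id": "1", "acc": "P12345", "score": "87"}]` for the record tag `hit`. Because no schema is consulted, every value comes back as a string and any type conversion is left to the caller.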
Pattern Recognition Algorithms
FASTA3: The FASTA3 program compares a protein or DNA sequence against a sequence database using an improved version of the rapid sequence comparison algorithm described by Lipman and Pearson.
HMMer: Sean Eddy's HMMER program suite (version 1.0) can be used to perform multiple sequence alignment or sensitive database searching using statistical hidden Markov model descriptions of a consensus sequence derived from a family of sequences.
HMMer 2: The HMMER2 program uses profile hidden Markov models (profile HMMs) to perform nucleotide or amino acid database searches and alignments.
PSORT: Kenta Nakai's PSORT II program predicts the subcellular localization sites of proteins (e.g. transmembrane, mitochondria, chloroplast) from an analysis of their amino acid sequences.
Prosite Scan: ISREC's (Swiss Institute for Experimental Cancer Research) PrositeScan program compares a query protein sequence against a locally installed protein profile database to detect common sequence motifs.
Web BLOCKS: The Henikoffs' Web BLOCKS web site performs sequence similarity searches against the BLOCKS database, a collection of the most conserved regions in groups of related protein and nucleotide sequences.
Web HMM Search: Web HMM Search, a web site hosted at Washington University in St. Louis, searches a Pfam (protein families grouped by sequence similarity) database held there for sequences similar to the input query sequence.
Web pfscan: Web pfscan runs Philipp Bucher's pfscan 2.1 protein family recognition program remotely, searching a query protein sequence for common motifs or domains against a pre-curated database of protein family profiles on the ISREC (Swiss Institute for Experimental Cancer Research) web site.
hmmPfam: hmmPfam uses the HMMER software package to search a query sequence against a locally installed Pfam (protein families grouped by sequence similarity) database of hidden Markov models, reporting the protein families that match it.
pfscan 2.1: Philipp Bucher's pfscan 2.1 protein family recognition program allows remote searching of a query protein sequence for common motifs or domains against a pre-curated database of protein family profiles at the ISREC (Swiss Institute for Experimental Cancer Research) web site.
Sequence Analysis Algorithms
BLAST 2.0: The BLAST 2.0 program performs rapid similarity searches of protein and nucleotide sequence databases.
ClustalW: ClustalW is a general-purpose multiple sequence alignment program for nucleic acid or protein sequences.
Codon Usage Frequency: Codon Usage Frequency calculates simple codon, amino acid and nucleotide frequencies. It can also translate nucleic acid sequences into protein sequences and vice versa using user-specified codon mapping tables.
FASTA3: The FASTA3 program compares a protein or DNA sequence against a sequence database using an improved version of the rapid sequence comparison algorithm described by Lipman and Pearson.
HLA: Human Leucocyte Antigen.
HMMer: Sean Eddy's HMMER program suite (version 1.0) can be used to perform multiple sequence alignment or sensitive database searching using statistical hidden Markov model descriptions of a consensus sequence derived from a family of sequences.
HMMer 2: The HMMER2 program uses profile hidden Markov models (profile HMMs) to perform nucleotide or amino acid database searches and alignments.
MinAlig: Zhang Louxin's MinAlig (Minimum Alignment) is a simple pairwise sequence alignment program for nucleic acid or protein sequences.
ORF Extraction: Jean-Michel Claverie's JMC ORF Extraction program predicts candidate open reading frames (ORFs) in a supplied DNA sequence and then translates them into protein sequences.
Prosite Scan: ISREC's (Swiss Institute for Experimental Cancer Research) PrositeScan program compares a query protein sequence against a locally installed protein profile database to detect common sequence motifs.
SEG filter: Wootton and Federhen's SEG program identifies and masks segments of low compositional complexity in amino acid sequences.
Six Frame Translation: Six Frame Translation translates a DNA sequence into protein sequences using all three possible reading frames on both strands of the nucleic acid sequence.
Ssearch3: W. Pearson's Ssearch3 program performs similarity searches against databases of protein and nucleotide sequences using the Smith-Waterman algorithm and generates optimal local pairwise sequence alignments.
WU-BLAST2: The Washington University BLAST 2.0 program performs rapid similarity searches of protein and nucleotide sequence databases.
Web BLAST: Web BLAST performs similarity searches against databases of protein and nucleotide sequences using the BLAST web site at NCBI (The National Center for Biotechnology Information).
Web BLAST 2: Web BLAST 2 performs similarity searches against databases of protein and nucleotide sequences using the BLAST 2.0 web site located at NCBI (The National Center for Biotechnology Information).
Web BLOCKS: The Henikoffs' Web BLOCKS web site performs sequence similarity searches against the BLOCKS database, a collection of the most conserved regions in groups of related protein and nucleotide sequences.
Web COG: Web COG, a web site located at NCBI (The National Center for Biotechnology Information), can be used to determine whether a given protein or nucleic acid sequence belongs to a known COG (Clusters of Orthologous Groups). COGs group functionally related genes from different species that share the same ancient, conserved domains.
Web DBEST: Web dbEST queries dbEST, the NCBI (National Center for Biotechnology Information) database of Expressed Sequence Tags, which contains nucleic acid sequence data and other information on single-pass cDNA sequences, or Expressed Sequence Tags (ESTs), from different organisms.
Web HMM Search: Web HMM Search, a web site hosted at Washington University in St. Louis, searches a Pfam (protein families grouped by sequence similarity) database held there for sequences similar to the input query sequence.
Web NetStart: NetStart uses neural networks to predict translation initiation sites in vertebrate and Arabidopsis sequences.
Web PSI-BLAST: This is the (unfinished) driver for the PSI-BLAST service on the NCBI web site.
Web Patent: Web Patent, which uses two web sites hosted by the US Patent Office and IBM Corporation, allows searching of the US patent database for both claims and figures.
Web VGC ORF Translation: Web VGC ORF, located at the Virtual Genome Center web site, predicts candidate open reading frames (ORFs) in very long, multi-gene genomic DNA sequences and translates them into candidate protein sequences.
Web nnPredict: Web nnPredict, a web site located at UCSF (University of California, San Francisco), predicts secondary structure from a protein sequence.
Web pfscan: Web pfscan runs Philipp Bucher's pfscan 2.1 protein family recognition program remotely, searching a query protein sequence for common motifs or domains against a pre-curated database of protein family profiles on the ISREC (Swiss Institute for Experimental Cancer Research) web site.
hmmPfam: hmmPfam uses the HMMER software package to search a query sequence against a locally installed Pfam (protein families grouped by sequence similarity) database of hidden Markov models, reporting the protein families that match it.
pfscan 2.1: Philipp Bucher's pfscan 2.1 protein family recognition program allows remote searching of a query protein sequence for common motifs or domains against a pre-curated database of protein family profiles at the ISREC (Swiss Institute for Experimental Cancer Research) web site.
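The calculations behind the Six Frame Translation and Codon Usage Frequency entries can be sketched in a few lines of Python. This sketch assumes the standard genetic code (NCBI translation table 1) rather than a user-specified codon table, and plain DNA input; the function names and the +1..+3 / -1..-3 frame labels are illustrative conventions, not the wrappers' interfaces.

```python
from collections import Counter
from itertools import product
from typing import Dict

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order, first base slowest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Complement each base, then reverse, giving the opposite strand read 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate codon by codon from the first base; stops become '*', unknown codons 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna: str) -> Dict[str, str]:
    """Translate the three reading frames of each strand (+1..+3 forward, -1..-3 reverse)."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames: Dict[str, str] = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def codon_frequencies(dna: str) -> Dict[str, float]:
    """Relative frequency of each codon in frame +1; an incomplete trailing codon is ignored."""
    dna = dna.upper()
    codons = [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]
    total = len(codons)
    return {codon: n / total for codon, n in Counter(codons).items()} if total else {}
```

For example, `six_frame_translation("ATGGCCTAA")["+1"]` is `"MA*"` (Met, Ala, stop). Swapping in a different codon table only means replacing `CODON_TABLE`, which is the kind of user-specified mapping the Codon Usage Frequency entry describes.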