Finding a sequence

From Bioinformatics.Org Wiki


The most common task in bioinformatics must be the acquisition of some data on which to operate. Usually this is in the form of a nucleic acid or protein sequence, stored as characters in the appropriate alphabet together with a header of related information: for example, some kind of unique identifying number, the species from which the original biological substrate was obtained, the names of any authors who published the sequence, and so on.
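To make the "header plus sequence" idea concrete, here is a minimal sketch that splits one record into its parts. It assumes the widely used FASTA layout (a header line starting with `>` followed by lines of residues), which this page does not itself specify. The `SequenceRecord` class, the `parse_fasta_record` function and the example record are all hypothetical and made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class SequenceRecord:
    identifier: str   # unique identifier taken from the header
    description: str  # rest of the header: species, authors, and so on
    sequence: str     # residues in the nucleic acid or protein alphabet


def parse_fasta_record(text: str) -> SequenceRecord:
    """Parse a single FASTA-style record (assumed format) into its parts."""
    # Drop blank lines and surrounding whitespace.
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a header line starting with '>'")
    header = lines[0][1:]
    # The first whitespace-delimited token is treated as the identifier.
    identifier, _, description = header.partition(" ")
    # Sequence lines are joined into one continuous string.
    return SequenceRecord(identifier, description, "".join(lines[1:]))


# Hypothetical record: the identifier, species and residues are invented.
example = """>EX000001 example protein [Examplus fictus]
MKTAYIAKQR
QISFVKSHFS
"""
record = parse_fasta_record(example)
print(record.identifier, record.description, len(record.sequence))
```

Real database entries often carry richer headers than this, so treat the split into identifier and description as a simplification.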

You may have already generated your own sequence data experimentally. In this case you are likely to want to find sequences which are identical or similar (and therefore possibly related) to yours. The task is then one of similarity search.


...I have a description.

A paradoxical problem generated by the success of the bioinformatics revolution is the increasing difficulty of navigating the huge amount of data available. Once you could print out most of the existing sequence databases onto paper and cram them into a single binder. Now a search for "actin" alone will pull out hundreds and hundreds of sequences. The key to finding what you want is to develop your own discriminatory skills rather than to rely on computers to figure out what it is you're really after.

Use Entrez-PubMed

Make sure you are clear about your aim first. If you are looking for a sequence for a specific scientific purpose then you might do best to start with a relevant human-generated publication. For example, if you have cloned a gene which is part of a well-characterised biochemical pathway and you want to find sequences of the same functional gene product in other species (orthologues), Entrez PubMed is your friend.

PubMed is a huge and very comprehensive database of the biomedical scientific literature, created by the U.S. National Library of Medicine (NLM). Entrez PubMed is another indispensable resource of the U.S. National Center for Biotechnology Information (NCBI). Both NLM and NCBI are part of the National Institutes of Health, within the U.S. Department of Health and Human Services.
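As an illustration of narrowing a literature search, the sketch below builds a PubMed search URL for NCBI's E-utilities ESearch service, combining terms with the Boolean operators AND and NOT. It only constructs the URL and does not contact the server. The helper names (`build_query`, `build_pubmed_search_url`) and the search terms are assumptions chosen for this example, not a recommended query.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def _quote(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def build_query(required, excluded=()):
    """Join required terms with AND, then append each excluded term with NOT."""
    if not required:
        raise ValueError("at least one required term is needed")
    query = " AND ".join(_quote(term) for term in required)
    for term in excluded:
        query += f" NOT {_quote(term)}"
    return query


def build_pubmed_search_url(query: str, max_results: int = 20) -> str:
    """Return an ESearch URL for the PubMed database (no request is sent)."""
    params = {"db": "pubmed", "term": query, "retmax": max_results}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Illustrative terms only: papers mentioning actin and orthologue, minus reviews.
query = build_query(["actin", "orthologue"], excluded=["review"])
url = build_pubmed_search_url(query)
print(query)
print(url)
```

Pasting the printed URL into a browser should return an XML list of matching PubMed identifiers. Adding or removing terms is a quick way to see how each Boolean operator changes the size of the result set.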

Use Swiss-Prot

Swiss-Prot is curated by human beings.

Use SRS at the RFCGR


Use Boolean logic


Use cunning


...I have an accession number.


...I have another sequence.

This section will be expanded, and there will be a more basic and detailed explanation for novice searchers. In the meantime, here are the top tips cribbed from the excellent paper by Hugh B. Nicholas Jr., David W. Deerfield II and Alexander J. Ropelewski in BioTechniques.

...I'm not sure whether or not to use the defaults.

Hugh, David and Alexander again, this time on when not to use the default search parameters provided by a server.
