
Sequence Retrieval Tools

Let's say you want to analyze co-evolution within or across one or more molecules. To do this properly, you will need to collect a large set of ortholog sequences of each molecule across many species. Sure, you can simply search through NCBI and select and copy sequences one by one. But a faster approach is to access the database programmatically.

An easy way to do this is with the E-utilities made available by NCBI. With these, you can input a search term and receive an XML file that contains the IDs of the matching records you want to add to the collection. Then, using these IDs, you can fetch the sequences.

* Esearch
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=protein&term=REPLACE_with_SEARCH_TERMS&retmax=REPLACE_with_MAX_NUMBER_of_RESULTS
Replace the first placeholder with your search term and the second with the maximum number of sequences to retrieve.

* Strip the XML tags from the Esearch result so that only the IDs remain, separated by commas. Use this comma-separated list as the input to Efetch (below).

* Efetch
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=REPLACE_with_IDS&rettype=fasta
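
The three steps above can be sketched in Python using only the standard library. This is a minimal illustration, not an official NCBI client: the function names (esearch_url, parse_ids, efetch_url, fetch_fasta) are hypothetical, and the sketch assumes the Esearch response lists its IDs as Id elements inside an IdList element.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address shared by the Esearch and Efetch URLs shown above.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, retmax=20, db="protein"):
    """Build the Esearch URL for a search term and a maximum result count."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS}/esearch.fcgi?" + urlencode(params)


def parse_ids(esearch_xml):
    """Extract the record IDs from an Esearch XML response (str or bytes)."""
    root = ET.fromstring(esearch_xml)
    return [el.text for el in root.findall(".//IdList/Id")]


def efetch_url(ids, db="protein"):
    """Build the Efetch URL for a list of IDs, joined with commas."""
    params = {"db": db, "id": ",".join(ids), "rettype": "fasta"}
    # safe="," keeps the commas literal, matching the URL template above.
    return f"{EUTILS}/efetch.fcgi?" + urlencode(params, safe=",")


def fetch_fasta(term, retmax=20, db="protein"):
    """Run Esearch, then Efetch, and return the FASTA text (needs network)."""
    with urlopen(esearch_url(term, retmax, db)) as resp:
        ids = parse_ids(resp.read())
    if not ids:
        return ""
    with urlopen(efetch_url(ids, db)) as resp:
        return resp.read().decode("utf-8")
```

Calling fetch_fasta with a search term and a small retmax should return the matching sequences as one FASTA string, which you can then write to a file. The search term you pass in is up to you; none is prescribed here.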


If the above approach seems too laborious, you can try some of the programs below, including an online form that helps expedite the above steps. Find_Seqs automates the first step for any type of molecule, and NucSeqFetch is the proper follow-up for retrieving nucleotide sequences (important for looking at RNA co-evolution).

E-utilities Assistant (Online Version)

Type of molecule to retrieve:

Number of Results to Retrieve: , starting at result index:

Use the above two inputs to retrieve sequences in batches, advancing the starting index each time: for example, set them to 100 and 0 and collect a batch, then 100 and 100, then 100 and 200, and so on.
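
The same batching scheme can be reproduced against Esearch directly. The sketch below assumes that the form's "starting result index" corresponds to the E-utilities retstart parameter and that the batch size corresponds to retmax; the helper names (batch_starts, batched_esearch_urls) are hypothetical.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def batch_starts(total, batch_size=100):
    """Return the starting index of each batch: 0, 100, 200, ..."""
    return list(range(0, total, batch_size))


def batched_esearch_urls(term, total, batch_size=100, db="protein"):
    """Build one Esearch URL per batch, covering `total` results in all."""
    urls = []
    for start in batch_starts(total, batch_size):
        params = {
            "db": db,
            "term": term,
            "retstart": start,  # index of the first result in this batch
            "retmax": min(batch_size, total - start),  # last batch may be short
        }
        urls.append(ESEARCH + "?" + urlencode(params))
    return urls
```

For 250 results in batches of 100, this yields three URLs with starting indices 0, 100, and 200, the last one requesting only the remaining 50 results.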

Search Term (currently only works well for protein searches!):



Find_Seqs

Please select the files you would like to download.

NucSeqFetch

Please select the files you would like to download.