[Bioclusters] weird errors from formatdb

Joseph Landman bioclusters@bioinformatics.org
05 Dec 2002 16:16:57 +0500


On Thu, 2002-12-05 at 21:02, Jeremy Mann wrote:
> After rsyncing the BLAST databases from bio-mirror.net, I am now in the
> process of formatting. I am encountering stdout errors when formatting
> the nr database. Here is a snip:
> 
> <snip>
> 
> [formatdb] ERROR: nr.phrOutput
> Blast-def-line-set.E.<title>
> Invalid value(s) [1] in VisibleString
> [AIG1#gi|12324508|gb|AAG52213.1|AC022288_12 AIG1; 4264-2635 [Arabidopsis
       ^
       |
Whoops... it looks like the ">" character and the newline before it got
munged.  Did you perchance uncompress it on a PC running Windows?

Or did you grab the .zip file?
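One quick way to test the Windows theory, once the file has been
uncompressed to disk, is to look for DOS-style line endings. The sketch
below is only an illustration: the function name count_cr_lines and the
limit parameter are made up here, and it assumes the uncompressed file
is simply called nr.

```python
def count_cr_lines(path, limit=100000):
    """Count lines ending in CR LF among the first `limit` lines of a file."""
    hits = 0
    # Open in binary mode so Python does not translate line endings for us.
    with open(path, "rb") as handle:
        for number, line in enumerate(handle):
            if number >= limit:
                break
            if line.endswith(b"\r\n"):
                hits += 1
    return hits

# Hypothetical usage: a non-zero count suggests the file passed
# through a DOS/Windows tool at some point.
# print(count_cr_lines("nr"))
```

A count of zero does not prove the file is clean; it only rules out
this one kind of damage.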

> I am using:
> 
> uncompress -c nr.Z | formatdb -i stdin -o T -n nr 

Nope, not the .zip.

Try a simple 

	uncompress nr.Z

then open nr in an editor (vi) and have a look at it

	vi nr

After it loads, you should see a whole lotta FASTA-formatted records:

        >gi|2495000|sp|Q63931|CCKR_CAVPO CHOLECYSTOKININ TYPE A RECEPTOR
        (CCK-A RECEPTOR) (CCK-AR)
        MDVVDSLFVNGSNITSACELGFENETLFCLDRPRPSKEWQPAVQILLYSLIFLLSVLGNTLVITVLIRNKRMRTVTNIFL
        LSLAVSDLMLCLFCMPFNLIPSLLKDFIFGSAVCKTTTYFMGTSVSVSTFNLVAISLERYGAICKPLQSRVWQTKSHALK
        VIAATWCLSFTIMTPYPIYSNLVPFTKNNNQTGNMCRFLLPNDVMQQTWHTFLLLILFLIPGIVMMVAYGLISLELYQGI
        KFDAIQKKSAKERKTSTGSSGPMEDSDGCYLQKSRHPRKLELRQLSPSSSGSNRINRIRSSSSTANLMAKKRVIRMLIVI
        VVLFFLCWMPIFSANAWRAYDTVSAERHLSGTPISFILLLSYTSSCVNPIIYCFMNKRFRLGFMATFPCCPNPGTPGVRG
        EMGEEEEGRTTGASLSRYSYSHMSTSAPPP
        >gi|1708198|sp|P80487|HHP_THICU HETEROTROPH-SPECIFIC PROTEIN
        AADDVTVVIGSAAPMSGPQ
        >gi|13878750|sp|Q9CDN0|RS18_LACLA 30S ribosomal protein S18
        MAQQRRGGFKRRKKVDFIAANKIEVVDYKDTELLKRFISERGKILPRRVTGTSAKNQRKVVNAIKRARVMALLPFVAEDQ
        N
        >gi|13878816|sp|Q9CI15|TIG_LACLA Trigger factor (TF)
        MTVSFEKTSDTKGTLSFSIDQETIKTGLDKAFNKVKANISVPGFRKGKISRQMFNKMYGEEALFEEALNAVLPTAYDAAV
        KEAGIEPVAQPKIDVAKMEKGSDWELTAEVVVKPTVSLGDYKDLTVEVEATKEVSDEEVETRLTNSQNNLAELVVKETAA

Then run the uncompressed nr file through formatdb.  Do you still get
errors?
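Since nr is far too big to eyeball in full, the same check can be
roughed out in a script. This is a minimal sketch, not anything
formatdb itself does: scan_fasta is a hypothetical helper that flags
the kind of damage described above, namely a ">" that ended up in the
middle of a sequence line because the newline before it was lost, or
sequence data appearing before any header line.

```python
def scan_fasta(lines):
    """Return (line_number, problem) pairs for suspicious FASTA lines."""
    problems = []
    seen_header = False
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            # A header line; its free-text description is not inspected.
            seen_header = True
            continue
        if not line:
            continue
        if not seen_header:
            problems.append((number, "sequence data before first header"))
        if ">" in line:
            # A header was probably fused onto the end of a sequence line.
            problems.append((number, "'>' inside a sequence line"))
    return problems

# Hypothetical usage on the uncompressed file (assumed to be named nr):
# with open("nr", errors="replace") as handle:
#     for number, problem in scan_fasta(handle):
#         print(number, problem)
```

If this reports nothing, the record boundaries are at least where
formatdb expects them, and the problem is more likely inside the
header text itself.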

-- 
Joseph Landman, Ph.D.
Michigan Center for Biological Information
University of Michigan
email:   scalable@umich.edu  or landman@ctaalliance.org
  web:   http://ctaalliance.org/MCBI/
voice:  +1 734 612 4615