On Thu, 2002-12-05 at 21:02, Jeremy Mann wrote:
> After rsyncing the BLAST databases from bio-mirror.net, I am now in the
> process of formatting. I am encountering stdout errors when formatting
> the nr database. Here is snip:
>
> <snip>
>
> formatdb] ERROR: nr.phrOutput
> Blast-def-line-set.E.<title>
> Invalid value(s) [1] in VisibleString
> [AIG1#gi|12324508|gb|AAG52213.1|AC022288_12 AIG1; 4264-2635 [Arabidopsis
       ^
       |

Whoops... looks like the ">" character and the newline before it got
munged. Did you perchance uncompress it on a PC running Windows? Or did
you grab the .zip file?

> I am using:
>
> uncompress -c nr.Z | formatdb -i stdin -o T -n nr

Nope, not the .zip. Try a simple

    uncompress nr.Z

then hand edit (vi) nr

    vi nr

After it loads, you should get a whole lotta FASTA formatted records:

>gi|2495000|sp|Q63931|CCKR_CAVPO CHOLECYSTOKININ TYPE A RECEPTOR (CCK-A RECEPTOR) (CCK-AR)
MDVVDSLFVNGSNITSACELGFENETLFCLDRPRPSKEWQPAVQILLYSLIFLLSVLGNTLVITVLIRNKRMRTVTNIFL
LSLAVSDLMLCLFCMPFNLIPSLLKDFIFGSAVCKTTTYFMGTSVSVSTFNLVAISLERYGAICKPLQSRVWQTKSHALK
VIAATWCLSFTIMTPYPIYSNLVPFTKNNNQTGNMCRFLLPNDVMQQTWHTFLLLILFLIPGIVMMVAYGLISLELYQGI
KFDAIQKKSAKERKTSTGSSGPMEDSDGCYLQKSRHPRKLELRQLSPSSSGSNRINRIRSSSSTANLMAKKRVIRMLIVI
VVLFFLCWMPIFSANAWRAYDTVSAERHLSGTPISFILLLSYTSSCVNPIIYCFMNKRFRLGFMATFPCCPNPGTPGVRG
EMGEEEEGRTTGASLSRYSYSHMSTSAPPP
>gi|1708198|sp|P80487|HHP_THICU HETEROTROPH-SPECIFIC PROTEIN
AADDVTVVIGSAAPMSGPQ
>gi|13878750|sp|Q9CDN0|RS18_LACLA 30S ribosomal protein S18
MAQQRRGGFKRRKKVDFIAANKIEVVDYKDTELLKRFISERGKILPRRVTGTSAKNQRKVVNAIKRARVMALLPFVAEDQ
N
>gi|13878816|sp|Q9CI15|TIG_LACLA Trigger factor (TF)
MTVSFEKTSDTKGTLSFSIDQETIKTGLDKAFNKVKANISVPGFRKGKISRQMFNKMYGEEALFEEALNAVLPTAYDAAV
KEAGIEPVAQPKIDVAKMEKGSDWELTAEVVVKPTVSLGDYKDLTVEVEATKEVSDEEVETRLTNSQNNLAELVVKETAA

Then run this through formatdb. Do you still get errors?

--
Joseph Landman, Ph.D.
Michigan Center for Biological Information
University of Michigan
email: scalable@umich.edu or landman@ctaalliance.org
web: http://ctaalliance.org/MCBI/
voice: +1 734 612 4615
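
P.S. nr is a big file to page through in vi, so here is a rough Python sketch of the same eyeball check. It is only an illustration under a few assumptions: the function name find_suspect_lines and the script name check_fasta.py are made up here and are not part of the NCBI toolkit, and the three things it flags (a carriage return at end of line, a control character inside a line, a ">" that is not in the first column) are my guesses at what munged data would look like. A description line can legitimately contain ">", so treat hits as places to look, not as proof of damage.

```python
import sys


def find_suspect_lines(lines):
    """Return (line_number, reason) pairs for lines that look damaged.

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        # A trailing carriage return suggests DOS/Windows line endings.
        if line.endswith("\r"):
            problems.append((number, "carriage return (DOS line ending)"))
            line = line.rstrip("\r")
        # Control characters (other than tab) do not belong in FASTA text.
        if any(ord(ch) < 32 and ch != "\t" for ch in line):
            problems.append((number, "control character"))
        # A '>' past the first column may mean a newline was lost.
        if ">" in line[1:]:
            problems.append((number, "'>' inside a line, possible lost newline"))
    return problems


if __name__ == "__main__" and len(sys.argv) == 2:
    # newline="" keeps the original line endings so "\r" stays visible;
    # latin-1 decodes any byte value without raising an error.
    with open(sys.argv[1], encoding="latin-1", newline="") as handle:
        for number, reason in find_suspect_lines(handle):
            print(f"line {number}: {reason}")
```

Run it on the uncompressed file as "python check_fasta.py nr". If it prints nothing, none of these three patterns is present in the file.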