
[ Ticket #1100 ] trouble with large fasta files
Date: 09/02/09 10:52
Submitted by: confused
Assigned to: unset
Category: Unset
Priority: 5
Ticket group: Unset
Resolution: Resolved
Summary: trouble with large fasta files
Original submission:
hi folks,

I'm running cd-hit on a fasta file containing 8.7 million sequences. I've taken chunks from this file (up to 400,000 sequences so far) and run them through cd-hit with no problems. However, when I run the full file (nohup ./cd-hit1 -i test.fasta -o test-out-cluster -c 1.0 -n 5 -M 2000 &), it takes a day or so, then the job ends and the output cluster file remains empty. The nohup.out file ends with:

..........3712000 finished 3231208 clusters
.SEG 2 3712119 7064775
Reading swap
Comparing with SEG 1

Now, I wonder if there might be another error message that nohup isn't picking up.
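
One way to check whether the job died without printing anything is to record how the process ended, not just what it wrote. The following is a minimal Python sketch, not part of cd-hit: the helper names `describe_exit` and `run_and_report` and the log file name are made up for illustration. It relies on the documented behaviour of `subprocess` on POSIX, where a negative return code means the child was killed by that signal. A SIGKILL is what the Linux out-of-memory killer sends, although it is not the only possible source of one.

```python
import signal
import subprocess
from typing import Sequence


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a short human-readable message."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as a negative number.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = str(-returncode)  # signal number without a known name
        return f"killed by signal {name}"
    return f"exited with status {returncode}"


def run_and_report(argv: Sequence[str], log_path: str) -> str:
    """Run argv with stdout and stderr appended to log_path; describe the exit."""
    with open(log_path, "ab") as log:
        result = subprocess.run(list(argv), stdout=log, stderr=subprocess.STDOUT)
    return describe_exit(result.returncode)
```

For the run in this ticket, the call would look something like `run_and_report(["./cd-hit1", "-i", "test.fasta", "-o", "test-out-cluster", "-c", "1.0", "-n", "5", "-M", "2000"], "cd-hit.log")`. A result of "killed by signal SIGKILL" with nothing useful in the log would point towards the kernel rather than cd-hit itself.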

I'm working on a server with this memory.

                     total       used       free     shared    buffers     cached
Mem:              16380068   10612808    5767260          0     107004    8076380
-/+ buffers/cache:            2429424   13950644
Swap:             10241364        140   10241224
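
To put those numbers in more familiar units, here is a small Python sketch that parses the `free` output above and converts the relevant columns to GiB. It assumes the values are in KiB, which is the default unit for `free`. The function name `parse_free` is made up for illustration.

```python
def parse_free(text: str) -> dict:
    """Parse old-style `free` output into {row label: [integer columns]}."""
    rows = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # the header line has no colon, so skip it
        rows[label.strip()] = [int(value) for value in rest.split()]
    return rows


FREE_OUTPUT = """\
total used free shared buffers cached
Mem: 16380068 10612808 5767260 0 107004 8076380
-/+ buffers/cache: 2429424 13950644
Swap: 10241364 140 10241224
"""

KIB_PER_GIB = 1024 ** 2

rows = parse_free(FREE_OUTPUT)
total_gib = rows["Mem"][0] / KIB_PER_GIB
# Second column of the "-/+ buffers/cache" row: free memory once
# buffers and page cache are counted as reclaimable.
reclaimable_free_gib = rows["-/+ buffers/cache"][1] / KIB_PER_GIB
swap_free_gib = rows["Swap"][2] / KIB_PER_GIB

print(f"total RAM:        {total_gib:.2f} GiB")
print(f"free (-/+ cache): {reclaimable_free_gib:.2f} GiB")
print(f"free swap:        {swap_free_gib:.2f} GiB")
```

For comparison, the command line above passes -M 2000. As far as I understand the cd-hit usage text, that option is a memory limit in megabytes, so it is worth checking against the installed version's help output.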

There are two possibilities I can see:
1. My server is running out of memory in some way.
2. cd-hit doesn't like dealing with such large input files (8.7 million seqs).

Could anyone confirm that cd-hit has been used with a file of similar size before? This would allow me to discount the second possibility.

best regards, and thanks for any help.

Paul
Followups
Date: 09/02/09 11:49
By: confused
Comment: Apologies. Checked around, and it's clearly a problem with memory. Would it be possible to add an error message to report that the program is out of memory? Other than that issue, this bug can be closed.
Ticket change history
Field          Old value       Date            By
status_id      Pending         05/16/11 00:41  liwz
resolution_id  Unset           05/16/11 00:41  liwz
close_date     12/31/69 19:00  05/16/11 00:41  liwz
