Date: 09/02/09 10:52
Submitted by: confused
Assigned to: unset
Category: Unset
Priority: 5
Ticket group: Unset
Resolution: Resolved
Summary: trouble with large fasta files
Original submission:
hi folks,
I'm running cd-hit on a fasta file containing 8.7 million sequences. I've taken chunks from this file (up to 400,000 sequences so far) and run them through cd-hit with no problems. However, when I run the full file (nohup ./cd-hit1 -i test.fasta -o test-out-cluster -c 1.0 -n 5 -M 2000 &), it takes a day or so, then the job ends and the output cluster file remains empty. The nohup.out file ends with
..........3712000 finished 3231208 clusters
.SEG 2 3712119 7064775
Reading swap
Comparing with SEG 1
Now, I wonder if there might be another error message that nohup isn't picking up on.
I'm working on a server with this memory (output of free):
             total       used       free     shared    buffers     cached
Mem:      16380068   10612808    5767260          0     107004    8076380
-/+ buffers/cache:    2429424   13950644
Swap:     10241364        140   10241224
There are two possibilities I can see:
1. my server is running out of memory in some way (a simple way to check this is sketched below)
2. cd-hit doesn't like dealing with such large input files (8.7 million sequences).
Could anyone confirm that cd-hit has been run on a file of similar size before? That would let me rule out the second possibility.
best regards, and thanks for any help.
Paul
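One rough way to test the first possibility is to log free's output alongside the run, so that a memory spike is recorded even if the job itself dies silently. This is only a sketch assuming a standard Linux shell; the file name mem.log and the one-minute interval are arbitrary choices for illustration, not anything cd-hit provides.

# Sketch: run cd-hit in the background and record memory usage until it exits.
nohup ./cd-hit1 -i test.fasta -o test-out-cluster -c 1.0 -n 5 -M 2000 &
CDHIT_PID=$!
while kill -0 "$CDHIT_PID" 2>/dev/null; do
    { date; free -m; echo; } >> mem.log   # timestamped snapshot of memory use
    sleep 60
done

If the free value on the "-/+ buffers/cache" line drops toward zero shortly before the job ends, that would point to the out-of-memory explanation.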
Followups
Comment: Apologies. Checked around, and it's clearly a problem with memory. Would it be possible to add an error message to report that the program is out of memory? Other than that issue, this bug can be closed.
Date: 09/02/09 11:49
By: confused
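Since the failing run capped cd-hit at roughly 2 GB via -M 2000 while the server reports about 16 GB of RAM, one plausible workaround (not stated in the thread itself) is to raise that cap: cd-hit's -M option takes a limit in megabytes, and 0 means no limit. The value 12000 below is an arbitrary illustration chosen to stay under the machine's physical memory, not a recommendation from the developers.

# Sketch: rerun with a higher memory cap; 12000 MB is an arbitrary example value.
nohup ./cd-hit1 -i test.fasta -o test-out-cluster -c 1.0 -n 5 -M 12000 &
# Or, if this build accepts it, remove the cap entirely with -M 0.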
Ticket change history
Field          | Old value       | Date            | By
status_id      | Pending         | 05/16/11 00:41  | liwz
resolution_id  | Unset           | 05/16/11 00:41  | liwz
close_date     | 12/31/69 19:00  | 05/16/11 00:41  | liwz