Date: 09/02/09 10:52
Submitted by: confused
Assigned to: unset
Category: Unset
Priority: 5
Ticket group: Unset
Resolution: Resolved
Summary: trouble with large fasta files
Original submission:
hi folks,
I'm running cd-hit on a fasta file containing 8.7 million sequences. I've taken chunks from this file (up to 400,000 sequences so far) and run them through cd-hit with no problems. However, when I run the full file (nohup ./cd-hit1 -i test.fasta -o test-out-cluster -c 1.0 -n 5 -M 2000 &), it takes a day or so, then the job ends and the output cluster file remains empty. The nohup.out file ends with
..........3712000 finished 3231208 clusters
.SEG 2 3712119 7064775
Reading swap
Comparing with SEG 1
Now, I wonder if there might be another error message that nohup isn't picking up on.
I'm working on a server with this memory (output of free):
             total       used       free     shared    buffers     cached
Mem:      16380068   10612808    5767260          0     107004    8076380
-/+ buffers/cache:    2429424   13950644
Swap:     10241364        140   10241224
There are two possibilities I can see:
1. my server is running out of memory in some way (a simple way to check this is sketched below)
2. cd-hit doesn't like dealing with such large input files (8.7 million sequences).
Could anyone confirm that cd-hit has been run on a file of similar size before? That would let me rule out the second possibility.
best regards, and thanks for any help.
Paul
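One rough way to test the first possibility is to log free's output alongside the run, so that a memory spike is recorded even if the job itself dies silently. This is only a sketch assuming a standard Linux shell; the file name mem.log and the one-minute interval are arbitrary choices for illustration, not anything cd-hit provides.

# Sketch: run cd-hit in the background and record memory usage until it exits.
nohup ./cd-hit1 -i test.fasta -o test-out-cluster -c 1.0 -n 5 -M 2000 &
CDHIT_PID=$!
while kill -0 "$CDHIT_PID" 2>/dev/null; do
    { date; free -m; echo; } >> mem.log   # timestamped snapshot of memory use
    sleep 60
done

If the free value on the "-/+ buffers/cache" line drops toward zero shortly before the job ends, that would point to the out-of-memory explanation.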
Followups
Comment: Apologies. Checked around, and it's clearly a problem with memory. Would it be possible to add an error message to report that the program is out of memory? Other than that issue, this bug can be closed.
Date: 09/02/09 11:49
By: confused
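Since the failing run capped cd-hit at roughly 2 GB via -M 2000 while the server reports about 16 GB of RAM, one plausible workaround (not stated in the thread itself) is to raise that cap: cd-hit's -M option takes a limit in megabytes, and 0 means no limit. The value 12000 below is an arbitrary illustration chosen to stay under the machine's physical memory, not a recommendation from the developers.

# Sketch: rerun with a higher memory cap; 12000 MB is an arbitrary example value.
nohup ./cd-hit1 -i test.fasta -o test-out-cluster -c 1.0 -n 5 -M 12000 &
# Or, if this build accepts it, remove the cap entirely with -M 0.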
Ticket change history
Field          | Old value       | Date            | By
status_id      | Pending         | 05/16/11 00:41  | liwz
resolution_id  | Unset           | 05/16/11 00:41  | liwz
close_date     | 12/31/69 19:00  | 05/16/11 00:41  | liwz