Date:
05/06/08 10:48
|
Submitted by:
unset
|
Assigned to:
unset
|
Category:
Clustering
|
Priority:
5
|
Ticket group:
Critical
|
Resolution:
Unset
|
Summary:
failing when long gene description
|
Original submission:
I have been using cd-hit weekly to cluster nr.fa. Lately the program has been failing: it just hangs, reporting neither progress nor any error. Thinking that nr.fa was too big, I split it and submitted it in parts; all parts but one were clustered. I subdivided the failing part and resubmitted it; again all parts but one were clustered. I repeated the process several times until I got to the offending sequence.
The annotation alone is over 300K, and it is still a problem.
Do you think you can solve this?
Thanks
Raquel Norel
rn98@columbia.edu
|
Followups
Comment
|
Date
|
By
|
I had the same problem and contacted Weizhong Li. He hinted that the long description is the problem. I wrote the following small Perl script, using BioPerl, to remove the descriptions from the FASTA sequences.
CD-HIT works now.
#!/usr/local/bin/perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqIO;

# Read nr.fasta and write a copy with every description removed.
my $seqin  = Bio::SeqIO->new( -format => 'Fasta', -file => 'nr.fasta' );
my $seqout = Bio::SeqIO->new( -format => 'Fasta', -file => '>nr_no_desc.fasta' );
my $seq_count = 0;
while ( my $NextSeq = $seqin->next_seq() )
{
    $NextSeq->desc("");    # keep the sequence ID, drop the description
    $seqout->write_seq($NextSeq);
    $seq_count++;
}
print "Finished shortening descriptions of $seq_count sequences!\n";
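For anyone without BioPerl installed, the same idea can be sketched with the standard library only. This is a rough equivalent rather than a tested replacement for the script above: the function name strip_descriptions is made up for illustration, and it assumes the sequence ID is everything on the header line up to the first whitespace.

```python
def strip_descriptions(src, dst):
    """Copy FASTA text from src to dst, keeping only the ID on header lines.

    src is any iterable of lines and dst any object with write().
    Assumption: the ID ends at the first whitespace of the header line.
    Returns the number of header lines rewritten.
    """
    count = 0
    for line in src:
        if line.startswith(">"):
            fields = line[1:].split(None, 1)
            seq_id = fields[0] if fields else ""
            dst.write(">" + seq_id + "\n")
            count += 1
        else:
            # Sequence lines are copied through unchanged.
            dst.write(line)
    return count
```

It could be called as strip_descriptions(open("nr.fasta"), open("nr_no_desc.fasta", "w")), using the same file names as the Perl script. It reads one line at a time, so the whole file is never held in memory.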
|
08/12/08 09:24
|
unset
|
1 - In cd-hi.h, change the default value (300000) of the following two defines.
For example, with a new size of 600000:
#define MAX_DES 600000
#define MAX_LINE_SIZE 600000
2 - Rebuild the cd-hit application
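Before choosing a new size for step 1, it may help to measure how long the header lines in the input really are. The sketch below is only an illustration: the function name long_headers is hypothetical, the default limit of 300000 is the value quoted above, and treating a header of exactly that length as too long is a conservative guess, not something checked against the cd-hit source.

```python
def long_headers(src, limit=300000):
    """Scan FASTA lines and report header lengths.

    Returns (longest, offenders), where longest is the length in
    characters of the longest header line (including the leading '>')
    and offenders lists (record_number, id, length) for every header
    whose length is >= limit. Record numbers start at 1.
    """
    longest = 0
    offenders = []
    record = 0
    for line in src:
        if not line.startswith(">"):
            continue
        record += 1
        header = line.rstrip("\r\n")
        length = len(header)
        longest = max(longest, length)
        if length >= limit:
            fields = header[1:].split(None, 1)
            seq_id = fields[0] if fields else ""
            offenders.append((record, seq_id, length))
    return longest, offenders
```

Running long_headers(open("nr.fasta")) would give the longest header and the records at or over the limit, which also locates the offending sequences without splitting the file by hand.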
|
05/30/08 14:56
|
chcaron
|
|
Ticket change history
Field
|
Old value
|
Date
|
By
|
status_id
|
Pending |
07/14/11 01:22
|
liwz
|
close_date
|
12/31/69 19:00 |
07/14/11 01:22
|
liwz
|
|