Bioinformatics.org
    CD-HIT: Sequence clustering software - Support tickets

    [ Ticket #1100 ] trouble with large fasta files
    Date:
    09/02/09 10:52
    Submitted by:
    confused
    Assigned to:
    unset
    Category:
    Unset
    Priority:
    5
    Ticket group:
    Unset
    Resolution:
    Resolved
    Summary:
    trouble with large fasta files
    Original submission:
    hi folks,

    I'm running cd-hit on a FASTA file containing 8.7 million sequences. I've taken chunks from this file (up to 400,000 sequences so far) and run them through cd-hit with no problems. However, when I run the full file (nohup ./cd-hit1 -i test.fasta -o test-out-cluster -c 1.0 -n 5 -M 2000 &), it takes a day or so, then the job ends and the output cluster file remains empty. The nohup.out file ends with:

    ..........3712000 finished 3231208 clusters
    .SEG 2 3712119 7064775
    Reading swap
    Comparing with SEG 1

    Now, I wonder if there might be another error message that nohup isn't picking up.
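
    One way to check this is to look at how the process ended rather than at what it printed. A process killed by a signal (for example SIGKILL, which the Linux out-of-memory killer sends) gets no chance to write an error message, so the log would simply stop. The sketch below, in Python, runs a command with stdout and stderr sent to one log file and describes its exit status. The helper names describe_exit and run_and_report and the log file name are illustrative assumptions, not part of cd-hit.

```python
import signal
import subprocess


def describe_exit(returncode):
    """Turn a subprocess return code into a readable description."""
    if returncode == 0:
        return "exited normally (status 0)"
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as a negative number.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "signal %d" % -returncode
        return "killed by " + name
    return "exited with error status %d" % returncode


def run_and_report(argv, log_path):
    """Run argv, send stdout and stderr to log_path, describe how it ended."""
    with open(log_path, "w") as log:
        proc = subprocess.run(argv, stdout=log, stderr=subprocess.STDOUT)
    return describe_exit(proc.returncode)
```

    A call such as run_and_report(["./cd-hit1", "-i", "test.fasta", "-o", "test-out-cluster", "-c", "1.0", "-n", "5", "-M", "2000"], "cd-hit.log") would then return "killed by SIGKILL" if the job was killed that way, even though the log itself contains no error.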

    The server I'm working on has this much memory:

                 total       used       free     shared    buffers     cached
    Mem:      16380068   10612808    5767260          0     107004    8076380
    -/+ buffers/cache:    2429424   13950644
    Swap:     10241364        140   10241224
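
    For reference, the "-/+ buffers/cache" row follows from the "Mem:" row: memory used by applications is used minus buffers and cached, and memory effectively available is free plus buffers and cached. The sketch below reproduces that arithmetic in Python for the figures above, assuming they are in kB (the default unit of the older free layout). The function name buffers_cache_row is a hypothetical helper.

```python
def buffers_cache_row(mem_row):
    """Derive the '-/+ buffers/cache' figures from a 'Mem:' line of `free`.

    Expects the older `free` layout with the columns
    total, used, free, shared, buffers, cached.
    """
    fields = mem_row.split()
    if len(fields) != 7 or fields[0] != "Mem:":
        raise ValueError("unexpected 'Mem:' line: %r" % mem_row)
    total, used, free, shared, buffers, cached = (int(x) for x in fields[1:])
    used_by_apps = used - buffers - cached   # memory held by processes
    available = free + buffers + cached      # memory the kernel can hand out
    return used_by_apps, available


mem_line = "Mem: 16380068 10612808 5767260 0 107004 8076380"
used_by_apps, available = buffers_cache_row(mem_line)
print(used_by_apps, available)  # 2429424 13950644
```

    Both results match the second row of the table, so the table is internally consistent.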

    There are two possibilities I can see:
    1. My server is running out of memory in some way.
    2. cd-hit doesn't like dealing with such large input files (8.7 million sequences).

    Could anyone confirm that cd-hit has been used with a file of similar size before? This would allow me to discount the second possibility.

    best regards, and thanks for any help.

    Paul
    Followups
    Comment:
    Apologies. Checked around, and it's clearly a problem with memory. Would it be possible to add an error message to report that the program is out of memory? Other than that issue, this bug can be closed.
    Date:
    09/02/09 11:49
    By:
    confused
    Ticket change history
    Field           Old value         Date             By
    status_id       Pending           05/16/11 00:41   liwz
    resolution_id   Unset             05/16/11 00:41   liwz
    close_date      12/31/69 19:00    05/16/11 00:41   liwz

     
