Bioinformatics.org

    Software: SNPdryad beats PolyPhen-2 on gold-standard datasets.
    Submitted by Brenden Beckham; posted on Monday, April 14, 2014

    MOTIVATION

    Recent advances in genome sequencing have revealed an abundance of non-synonymous polymorphisms among human individuals; consequently, it is of immense interest and importance to predict whether such substitutions are functionally neutral or have deleterious effects. The accuracy of such prediction algorithms depends on the quality of the multiple sequence alignment, which is used to infer how well an amino acid substitution is tolerated at a given position. Because orthologous protein sequences were scarce in the past, existing prediction algorithms all include paralogous protein sequences in the alignment, which can dilute the conservation signal and reduce prediction accuracy. However, we believe that, with the sequencing of a large number of mammalian genomes, it is now feasible to include only protein orthologs in the alignment and thereby improve prediction performance.
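    To make the dilution argument concrete, here is a minimal sketch that assumes a simple Shannon-entropy conservation score. The abstract does not say which conservation scores SNPdryad actually uses, and the residues below are hypothetical. The sketch compares one alignment column built from orthologs alone with the same column after diverged paralogs are added.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (in bits) of the residues in one alignment column.

    Gap characters ('-') are ignored; lower entropy means a more conserved position.
    This is an illustrative score, not necessarily the one SNPdryad uses.
    """
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        raise ValueError("column contains only gaps")
    total = len(residues)
    return sum((n / total) * math.log2(total / n) for n in Counter(residues).values())

# Hypothetical column: the residue at one position in eight mammalian orthologs.
orthologs = ["R"] * 8
# Hypothetical paralogs that have diverged at the same position.
paralogs = ["K", "Q", "R", "H"]

ortholog_only = column_entropy(orthologs)
with_paralogs = column_entropy(orthologs + paralogs)
print(f"orthologs only: {ortholog_only:.3f} bits")  # a perfectly conserved column
print(f"with paralogs:  {with_paralogs:.3f} bits")
```

    The ortholog-only column scores 0 bits (fully conserved), while adding the four paralogs raises the entropy to about 1.21 bits. A substitution at this position would therefore look more tolerable than the orthologs alone suggest.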

    RESULTS

    We have developed a novel prediction algorithm, named SNPdryad, which includes only protein orthologs when building a multiple sequence alignment. Among many other innovations, SNPdryad uses several different conservation scoring schemes and a Random Forest classifier. We tested SNPdryad on several datasets and found that it consistently outperformed other methods on several performance metrics, an improvement we attribute to the exclusion of paralogous sequences. We have also run SNPdryad on the complete human proteome, generating prediction scores for all possible amino acid substitutions.
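    The abstract does not describe SNPdryad's internals, so what follows is only a minimal sketch of the pipeline shape it outlines, under stated assumptions. Paralog rows are dropped before scoring, and every one of the 19 possible substitutions at each query position receives a feature vector that a classifier could then score. The input format, the function names and the two features (column entropy and the frequency of the alternative residue among the retained rows) are all hypothetical, and the Random Forest training step is omitted.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def ortholog_only_columns(alignment):
    """Drop paralog rows and return the remaining alignment as columns.

    Hypothetical input: a list of (kind, sequence) pairs where kind is
    "query", "ortholog" or "paralog"; sequences are pre-aligned, equal in
    length, and the query row is assumed to contain no gaps.
    """
    kept = [seq for kind, seq in alignment if kind != "paralog"]
    return list(zip(*kept))

def substitution_features(column, alt):
    """Two illustrative (hypothetical) features for placing `alt` at one column."""
    residues = [aa for aa in column if aa != "-"]
    counts = Counter(residues)
    total = len(residues)  # >= 1 because the gap-free query row is included
    entropy = sum((n / total) * math.log2(total / n) for n in counts.values())
    return {
        "entropy": entropy,               # low value = conserved position
        "alt_freq": counts[alt] / total,  # share of kept rows already carrying alt
    }

def enumerate_substitutions(alignment):
    """Yield (position, ref, alt, features) for every possible missense change."""
    query = next(seq for kind, seq in alignment if kind == "query")
    for pos, column in enumerate(ortholog_only_columns(alignment)):
        ref = query[pos]
        for alt in AMINO_ACIDS:
            if alt != ref:  # 19 alternatives per position
                yield pos + 1, ref, alt, substitution_features(column, alt)

# Hypothetical toy alignment: one human query, three orthologs, one paralog.
alignment = [
    ("query", "MRL"),
    ("ortholog", "MRL"),
    ("ortholog", "MRI"),
    ("ortholog", "MRL"),
    ("paralog", "MKV"),
]

rows = list(enumerate_substitutions(alignment))
print(len(rows))  # 3 positions x 19 alternatives
for pos, ref, alt, feats in rows:
    if (ref, alt) in {("R", "K"), ("L", "I")}:
        print(pos, ref, alt, feats)
```

    Because the paralog's K is dropped, R2K is seen only against a fully conserved arginine column, whereas L3I is partly supported by one ortholog that already carries isoleucine. In a real pipeline, feature vectors like these would form the input matrix for a Random Forest trained on labelled neutral and deleterious variants (for example with scikit-learn's `RandomForestClassifier`), and running the enumeration over every human protein would produce proteome-wide scores.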

    AVAILABILITY & IMPLEMENTATION

    snps.ccbr.utoronto.ca:8080/SNPdryad/
    datadryad.org/hand[...]59124

    Copyright © 2016 · Scilico, LLC