
Main.HomePage History


May 23, 2007, at 09:35 AM by 207.161.208.178 -
Added lines 20-26:
  • National Microbiology Laboratory / University of Manitoba
    • Morag Graham
    • Ma Luo
    • Ben Liang
    • Gary Van Domselaar
    • Michael Domaratzki
    • Shuan Tyler
Deleted lines 35-41:
  • National Microbiology Laboratory / University of Manitoba
    • Morag Graham
    • Ma Luo
    • Ben Liang
    • Gary Van Domselaar
    • Michael Domaratzki
    • Shuan Tyler
May 11, 2007, at 04:36 PM by 207.161.208.178 -
Added line 34:
  • Michael Domaratzki
February 12, 2007, at 12:53 PM by 64.4.90.98 -
Deleted lines 0-1:

HI Garrett

February 12, 2007, at 12:52 PM by 64.4.90.98 -
Added lines 1-2:

HI Garrett

July 19, 2006, at 04:52 PM by 207.161.208.185 -
Added line 34:
  • Shuan Tyler
July 11, 2006, at 08:56 AM by Gary Van Domselaar -
Changed lines 61-62 from:
  1. Margulies M. et al. (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature. 437:326-7.
to:
  1. Margulies M. et al. (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature. 437:326-7.
July 11, 2006, at 06:02 AM by 206.45.181.197 -
Deleted line 28:
July 11, 2006, at 06:01 AM by 206.45.181.197 -
Added lines 28-29:
  • Rene Warren
July 10, 2006, at 07:13 PM by 206.45.181.197 -
Changed lines 2-3 from:

The goal of the Q Assembler project is to develop software for assembling multiple viral quasispecies genomes sequenced in parallel using recently developed sequencing-by-synthesis technology [1]. Q Assembler is a collaboration between the Public Health Agency of Canada's National Microbiology Laboratory, Canada's Michael Smith Genome Sciences Centre, the University of Manitoba, the University of Baltimore, the University of Pittsburgh and 454 Life Sciences.

to:

The goal of the Q Assembler project is to develop software for assembling multiple viral quasispecies genomes sequenced in parallel using recently developed sequencing-by-synthesis technology [1]. Q Assembler is a collaboration between the Public Health Agency of Canada's National Microbiology Laboratory, Canada's Michael Smith Genome Sciences Centre, the University of Manitoba, the University of Baltimore, the University of Pittsburgh and 454 Life Sciences.

July 10, 2006, at 11:33 AM by Gary Van Domselaar -
Changed line 1 from:

Description

to:

Description

Changed line 4 from:

Background

to:

Background

Changed line 7 from:

The Quasispecies Assembly Problem

to:

The Quasispecies Assembly Problem

Changed line 10 from:

Assembly Strategy

to:

Assembly Strategy

Changed line 15 from:

Reference and Test Data

to:

Reference and Test Data

Changed line 19 from:

Development Team

to:

Development Team

Changed line 40 from:

Status

to:

Status

Changed line 43 from:

License

to:

License

Changed line 46 from:

Contact

to:

Contact

Changed line 59 from:

References

to:

References

July 10, 2006, at 11:29 AM by Gary Van Domselaar -
Changed line 40 from:

Status

to:

Status

Changed line 43 from:

License

to:

License

July 10, 2006, at 11:16 AM by Gary Van Domselaar -
Changed line 72 from:
  1. Lancia G. et al. (2001) SNPs, problems, complexity and algorithms , in: 9th Annual European Symposium on Algorithms (BRICS), University of Aarhus, Denmark.
to:
  1. Lancia G. et al. (2001) SNPs, problems, complexity and algorithms. in: 9th Annual European Symposium on Algorithms (BRICS), University of Aarhus, Denmark.
July 10, 2006, at 11:13 AM by Gary Van Domselaar -
Changed lines 11-12 from:

The problem of simultaneously assembling multiple highly similar, yet distinct genome sequences is not novel. Indeed, this situation is encountered routinely in determining the haplotype of diploid eukaryotic DNA (i.e., the mapping of polymorphisms to the correct chromosome). In regions where sufficient sequence variation exists between reads, a technique known as correlated differences can be applied to segregate the two distinct sequences [1],[2]?. This technique uses repeatedly occurring high quality base call mismatches to segregate and connect sequencing reads. The same strategy can be applied to the separation of quasispecies sequences, although in general the sequences can only be effectively separated to a degree owing to existence of intervening stretches of highly similar sequence that break the connection between variable regions. In addition, the greater number of quasispecies assembly relative to haplotyping, and the lack of foreknowlege about the total number of members present in the quasispecies population will compound the difficulty of applying this technique to resolving viral quasispecies sequences.

to:

The problem of simultaneously assembling multiple highly similar, yet distinct, genome sequences is not novel. Indeed, this situation is encountered routinely in determining the haplotype of diploid eukaryotic DNA (i.e., the mapping of polymorphisms to the correct chromosome). In regions where sufficient sequence variation exists between reads, a technique known as correlated differences can be applied to segregate the two distinct sequences [1],[2]. This technique uses repeatedly occurring high-quality base-call mismatches to segregate and connect sequencing reads. The same strategy can be applied to the separation of quasispecies sequences, although in general the sequences can be separated only partially, owing to the existence of intervening stretches of highly similar sequence that break the connection between variable regions. In addition, the greater number of sequences in quasispecies assembly relative to haplotyping, and the lack of foreknowledge about the total number of members present in the quasispecies population, will compound the difficulty of applying this technique to resolving viral quasispecies sequences.

July 10, 2006, at 11:12 AM by Gary Van Domselaar -
Changed lines 2-3 from:

The goal of the Q Assembler project is to develop software for assembling multiple viral quasispecies genomes sequenced in parallel using recently developed sequencing-by-synthesis technology [1]. Q Assembler is a collaboration between the Public Health Agency of Canada's National Microbiology Laboratory, Canada's Michael Smith Genome Sciences Centre, the University of Manitoba, the University of Baltimore, the University of Pittsburgh and 454 Life Sciences™.

to:

The goal of the Q Assembler project is to develop software for assembling multiple viral quasispecies genomes sequenced in parallel using recently developed sequencing-by-synthesis technology [1]. Q Assembler is a collaboration between the Public Health Agency of Canada's National Microbiology Laboratory, Canada's Michael Smith Genome Sciences Centre, the University of Manitoba, the University of Baltimore, the University of Pittsburgh and 454 Life Sciences.

Changed lines 5-6 from:

Many important viruses, such as HIV, the SARS Coronavirus, Hepatitis C, and the Influenza virus, possess high mutation, recombination, and replication rates. These viruses generate "clouds" of sequence variants called viral quasispecies within infected hosts. Diversity and evolution of viral quasispecies are influenced by host-viral interactions [1]. Characterization of quasispecies genome populations from infected individuals is a first step to study such interactions, with the ultimate goad of identifying strategies for disease treatment and prevention. A recent proof-of-concept study by researchers at 454™, CuraGen™, and Yale, suggests that parallel sequencing and identification of the sequence variation present within a population of viral quasispecies is feasible [2] using the sequencing-by-synthesis technology recently developed by 454 Life Sciences™ and incorporated in the GS20™ sequencer. In order to realize the potential for sequencing and assembly of quasispecies populations using this technology , it is necessary to develop and validate a robust methodology for genome-scale quasispecies assembly. We expect the bulk of the challenge will lie in the design and construction of the quasispecies assembler.

to:

Many important viruses, such as HIV, the SARS Coronavirus, Hepatitis C, and the Influenza virus, possess high mutation, recombination, and replication rates. These viruses generate "clouds" of sequence variants called viral quasispecies within infected hosts. Diversity and evolution of viral quasispecies are influenced by host-viral interactions [1]. Characterization of quasispecies genome populations from infected individuals is a first step toward studying such interactions. A recent proof-of-concept study by researchers at 454, CuraGen, and Yale suggests that parallel sequencing and identification of the sequence variation present within a population of viral quasispecies is feasible [2] using the sequencing-by-synthesis technology recently developed by 454 Life Sciences and incorporated in the GS20 sequencer. In order to realize the potential for sequencing and assembly of quasispecies populations using this technology, it is necessary to develop and validate a robust methodology for genome-scale quasispecies assembly. We expect the bulk of the challenge will lie in the design and construction of the quasispecies assembler.

Changed lines 8-9 from:

Assembling and characterizing any quasispecies genome population poses a substantial computational challenge [1],[2]. Current assembly programs such as Phred/Phrap, TIGR Assembler, and 454™'s Newbler Assembler™ are designed to connect reads into a single consensus sequence. As such, they are not appropriate for simultaneously assembling multiple genome sequences. These programs assume, for example, that base mismatches represent base-calling errors or internal repeats rather than legitimate sequence variation from a population of input sequences. In addition, assembly is complicated by rearrangements and the existence of true internal repeats, making the problem of connecting fragments into correct genomic sequences a highly challenging one. The deep coverage capability of the GS20™ can aid greatly in addressing the former problem; however, the GS20™'s limited unidirectional read length (at ~100 bp per read), and lack of mate-pair information presents a serious challenge in dealing with the latter problem. Any quasispecies sequence assembler for application with the GS20™ sequencing technology must take these factors into account.

to:

Assembling and characterizing any quasispecies genome population poses a substantial computational challenge [1],[2]. Current assembly programs such as Phred/Phrap, TIGR Assembler, and 454's Newbler Assembler are designed to connect reads into a single consensus sequence. As such, they are not appropriate for simultaneously assembling multiple genome sequences. These programs assume, for example, that base mismatches represent base-calling errors or internal repeats rather than legitimate sequence variation from a population of input sequences. In addition, assembly is complicated by rearrangements and the existence of true internal repeats, making the problem of connecting fragments into correct genomic sequences a highly challenging one. The deep coverage capability of the GS20™ can aid greatly in addressing the former problem; however, the GS20's limited unidirectional read length (~100 bp per read) and lack of mate-pair information present a serious challenge in dealing with the latter problem. Any quasispecies sequence assembler for application with the GS20 sequencing technology must take these factors into account.

Changed lines 16-18 from:

Currently we have obtained GS20™ sequence reads from overlapping PCR products spanning the entire HIV genomes of two individuals. We will develop the assembly strategy and methodology using the GS20 sequence reads from the overlapping PCR products spanning the entire HIV genomes of these two individuals. To aid the testing and validation of our methodology, the NML HIV and Human Genetics Laboratory has PCR products of the HIV gag region (including part of 5'-LTR and part of protease, ~2kb in length) and fully sequenced clones (30 to 90 clones per sample) of the same PCR products from more than 200 patient samples. These samples represent diverse HIV subtypes, from mostly clade A, D, C and recombinant subtypes and were sequenced using standard Sanger sequencing methodology. The methods and strategy we develop will be tested by sequencing the same PCR products from HIV gag region using the GS20™ sequencer. The quasispecies genomes assembled using the developed GS20™ methods will be compared with and validated against the cloned sequences.

to:

Currently we have obtained GS20 sequence reads from overlapping PCR products spanning the entire HIV genomes of two individuals. We will develop the assembly strategy and methodology using the GS20 sequence reads from the overlapping PCR products spanning the entire HIV genomes of these two individuals. To aid the testing and validation of our methodology, the NML HIV and Human Genetics Laboratory has PCR products of the HIV gag region (including part of 5'-LTR and part of protease, ~2kb in length) and fully sequenced clones (30 to 90 clones per sample) of the same PCR products from more than 200 patient samples. These samples represent diverse HIV subtypes, from mostly clade A, D, C and recombinant subtypes and were sequenced using standard Sanger sequencing methodology. The methods and strategy we develop will be tested by sequencing the same PCR products from HIV gag region using the GS20 sequencer. The quasispecies genomes assembled using the developed GS20 methods will be compared with and validated against the cloned sequences.

July 10, 2006, at 12:20 AM by Gary Van Domselaar -
Added line 18:
Added lines 39-45:

Status

Q Assembler is pre-alpha. There are currently no releases.

License

Q Assembler is being developed under the GNU General Public License

July 09, 2006, at 10:29 PM by Gary Van Domselaar -
Changed lines 36-37 from:
to:

For developer-only content, click here

July 09, 2006, at 08:42 PM by Gary Van Domselaar -
Changed line 18 from:

Development Team

to:

Development Team

Changed line 38 from:

Contact

to:

Contact

Changed line 51 from:

References

to:

References

July 09, 2006, at 08:21 PM by Gary Van Domselaar -
Changed line 39 from:

Questions, comments, and requests to participate should be directed to:\\

to:

Questions, comments, and requests to participate should be directed to:\\\

Changed lines 44-52 from:

1015 Arlington St., Winnipeg, MB, Canada R3E 3R2
gary.vandomselaar [at] gmai.com

Suite H-3570 Phone: +1 204 784 5994 Fax: +1 204 789 2018 gary_van_domselaar@phac-aspc.gc.ca gary.vandomselaar@gmail.com

to:

1015 Arlington St., Winnipeg, MB, Canada R3E 3R2

Suite H-3570
Phone: +1 204 784 5994
Fax: +1 204 789 2018
gary_van_domselaar [at] phac-aspc.gc.ca
gary.vandomselaar [at] gmail.com

July 09, 2006, at 08:15 PM by Gary Van Domselaar -
Changed lines 2-3 from:

The goal of the Q Assembler project is to develop software for assembling multiple viral quasispecies genomes sequenced in parallel using recently developed sequencing-by-synthesis technology [1]. Q Assembler is a collaboration between the Public Health Agency of Canada's National Microbiology Laboratory, Canada's Michael Smith Genome Sciences Centre, the University of Manitoba, the University of Baltimore, the University of Pittsburgh and 454 Life Sciences™.

to:

The goal of the Q Assembler project is to develop software for assembling multiple viral quasispecies genomes sequenced in parallel using recently developed sequencing-by-synthesis technology [1]. Q Assembler is a collaboration between the Public Health Agency of Canada's National Microbiology Laboratory, Canada's Michael Smith Genome Sciences Centre, the University of Manitoba, the University of Baltimore, the University of Pittsburgh and 454 Life Sciences™.

Changed lines 5-6 from:

Many important viruses, such as HIV, the SARS Coronavirus, Hepatitis C, and the Influenza virus, possess high mutation, recombination, and replication rates. These viruses generate "clouds" of sequence variants called viral quasispecies within infected hosts. Diversity and evolution of viral quasispecies are influenced by host-viral interactions [1]. Characterization of quasispecies genome populations from infected individuals is a first step to study such interactions, with the ultimate goad of identifying strategies for disease treatment and prevention. A recent proof-of-concept study by researchers at 454™, CuraGen™, and Yale, suggests that parallel sequencing and identification of the sequence variation present within a population of viral quasispecies is feasible [2] using the sequencing-by-synthesis technology recently developed by 454 Life Sciences™ and incorporated in the GS20™ sequencer. In order to realize the potential for sequencing and assembly of quasispecies populations using this technology , it is necessary to develop and validate a robust methodology for genome-scale quasispecies assembly. We expect the bulk of the challenge will lie in the design and construction of the quasispecies assembler.

to:

Many important viruses, such as HIV, the SARS Coronavirus, Hepatitis C, and the Influenza virus, possess high mutation, recombination, and replication rates. These viruses generate "clouds" of sequence variants called viral quasispecies within infected hosts. Diversity and evolution of viral quasispecies are influenced by host-viral interactions [1]. Characterization of quasispecies genome populations from infected individuals is a first step to study such interactions, with the ultimate goad of identifying strategies for disease treatment and prevention. A recent proof-of-concept study by researchers at 454™, CuraGen™, and Yale, suggests that parallel sequencing and identification of the sequence variation present within a population of viral quasispecies is feasible [2] using the sequencing-by-synthesis technology recently developed by 454 Life Sciences™ and incorporated in the GS20™ sequencer. In order to realize the potential for sequencing and assembly of quasispecies populations using this technology , it is necessary to develop and validate a robust methodology for genome-scale quasispecies assembly. We expect the bulk of the challenge will lie in the design and construction of the quasispecies assembler.

Changed lines 39-40 from:
to:

Questions, comments, and requests to participate should be directed to:
Gary Van Domselaar, PhD
Head of Bioinformatics
National Microbiology Laboratory
Public Health Agency of Canada
1015 Arlington St., Winnipeg, MB, Canada R3E 3R2
gary.vandomselaar [at] gmai.com

Suite H-3570 Phone: +1 204 784 5994 Fax: +1 204 789 2018 gary_van_domselaar@phac-aspc.gc.ca gary.vandomselaar@gmail.com

Changed lines 54-59 from:
  1. #454_paper Margulies M. et al. (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature. 437:326-7.

  2. omingo E, Baranowski E, Ruiz-Jarabo CM, Martin-Hernandez AM, Saiz JC, Escarmis C. (1998) Quasispecies Structure and Persistence of RNA Viruses. Emerg Infect Dis. 4:521-7.

  3. #454_hiv_sequencing Simons JF et al. (2005) Ultra-Deep sequencing of HIV from Drug Resistant Patients. XIV International HIV Drug Resistance Workshop. Quebec City, Canada, June 7-11, 2005
to:
  1. Margulies M. et al. (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature. 437:326-7.

  2. Domingo E, Baranowski E, Ruiz-Jarabo CM, Martin-Hernandez AM, Saiz JC, Escarmis C. (1998) Quasispecies Structure and Persistence of RNA Viruses. Emerg Infect Dis. 4:521-7.

  3. Simons JF et al. (2005) Ultra-Deep sequencing of HIV from Drug Resistant Patients. XIV International HIV Drug Resistance Workshop. Quebec City, Canada, June 7-11, 2005
Changed line 66 from:
  1. Lancia G. et al. (2001) SNPs, problems, complexity and algorithms , in: 9th Annual European Symposium on Algorithms (BRICS), University of Aarhus, Denmark.
to:
  1. Lancia G. et al. (2001) SNPs, problems, complexity and algorithms , in: 9th Annual European Symposium on Algorithms (BRICS), University of Aarhus, Denmark.
July 09, 2006, at 07:42 PM by Gary Van Domselaar -
Changed lines 19-37 from:
    * TIGR / University of Maryland / University of Pittsburgh:
         ** Elodie Ghedin
         ** Mihai Pop
         ** Steven Salzberg
    * Michael Smith Genome Sciences Centre:
          ** Steven Jones
          ** Asim Siddiqui
          ** Matthew Bainbridge
    * National Microbiology Laboratory / University of Manitoba
          ** Morag Graham
          ** Ma Luo
          ** Ben Liang
          ** Gary Van Domselaar
    * 454 Life Sciences / Roche
          ** Lei Du
          ** Jolene Osterberger

to:
  • TIGR / University of Maryland / University of Pittsburgh:
    • Elodie Ghedin
    • Mihai Pop
    • Steven Salzberg
  • Michael Smith Genome Sciences Centre:
    • Steven Jones
    • Asim Siddiqui
    • Matthew Bainbridge
  • National Microbiology Laboratory / University of Manitoba
    • Morag Graham
    • Ma Luo
    • Ben Liang
    • Gary Van Domselaar
  • 454 Life Sciences / Roche
    • Lei Du
    • Jolene Osterberger

July 09, 2006, at 07:40 PM by Gary Van Domselaar -
Changed lines 11-12 from:

The problem of simultaneously assembling multiple distinct genome sequences is not novel. Indeed, this situation is encountered routinely in determining the haplotype of diploid eukaryotic DNA (i.e., the mapping of polymorphisms to the correct chromosome). In regions where sufficient sequence variation exists between reads, a technique known as correlated differences can be applied to segregate the two distinct sequences [1],[2]?. This technique uses repeatedly occurring high quality base call mismatches to segregate and connect sequencing reads. The same strategy can be applied to the separation of quasispecies sequences, although in general the sequences can only be effectively separated to a degree owing to existence of intervening stretches of highly similar sequence that break the connection between variable regions. In addition, the greater number of quasispecies assembly relative to haplotyping, and the lack of foreknowlege about the total number of members present in the quasispecies population will compound the difficulty of applying this technique to resolving viral quasispecies sequences.

to:

The problem of simultaneously assembling multiple highly similar, yet distinct genome sequences is not novel. Indeed, this situation is encountered routinely in determining the haplotype of diploid eukaryotic DNA (i.e., the mapping of polymorphisms to the correct chromosome). In regions where sufficient sequence variation exists between reads, a technique known as correlated differences can be applied to segregate the two distinct sequences [1],[2]?. This technique uses repeatedly occurring high quality base call mismatches to segregate and connect sequencing reads. The same strategy can be applied to the separation of quasispecies sequences, although in general the sequences can only be effectively separated to a degree owing to existence of intervening stretches of highly similar sequence that break the connection between variable regions. In addition, the greater number of quasispecies assembly relative to haplotyping, and the lack of foreknowlege about the total number of members present in the quasispecies population will compound the difficulty of applying this technique to resolving viral quasispecies sequences.

Changed lines 16-17 from:

Currently we have obtained GS20™ sequence reads from overlapping PCR products spanning the entire HIV genomes of two individuals. We propose to develop the assembly strategy and methodology using the GS20 sequence reads from the overlapping PCR products spanning the entire HIV genomes of these two individuals. To aid the testing and validation of our methodology, the NML HIV and Human Genetics Laboratory has PCR products of the HIV gag region (including part of 5'-LTR and part of protease, ~2kb in length) and fully sequenced clones (30 to 90 clones per sample) of the same PCR products from more than 200 patient samples. These samples represent diverse HIV subtypes, from mostly clade A, D, C and recombinant subtypes and were sequenced using standard Sanger sequencing methodology. The GS20 methods and strategy we develop will be tested by sequencing the same PCR products from HIV gag region using the GS20 sequencer. The assembled quasispecies using the developed GS20 methods can then be compared and validated with the cloned sequences.

to:

Currently we have obtained GS20™ sequence reads from overlapping PCR products spanning the entire HIV genomes of two individuals. We will develop the assembly strategy and methodology using the GS20 sequence reads from the overlapping PCR products spanning the entire HIV genomes of these two individuals. To aid the testing and validation of our methodology, the NML HIV and Human Genetics Laboratory has PCR products of the HIV gag region (including part of 5'-LTR and part of protease, ~2kb in length) and fully sequenced clones (30 to 90 clones per sample) of the same PCR products from more than 200 patient samples. These samples represent diverse HIV subtypes, from mostly clade A, D, C and recombinant subtypes and were sequenced using standard Sanger sequencing methodology. The methods and strategy we develop will be tested by sequencing the same PCR products from HIV gag region using the GS20™ sequencer. The quasispecies genomes assembled using the developed GS20™ methods will be compared with and validated against the cloned sequences.

Development Team

    * TIGR / University of Maryland / University of Pittsburgh:
         ** Elodie Ghedin
         ** Mihai Pop
         ** Steven Salzberg
    * Michael Smith Genome Sciences Centre:
          ** Steven Jones
          ** Asim Siddiqui
          ** Matthew Bainbridge
    * National Microbiology Laboratory / University of Manitoba
          ** Morag Graham
          ** Ma Luo
          ** Ben Liang
          ** Gary Van Domselaar
    * 454 Life Sciences / Roche
          ** Lei Du
          ** Jolene Osterberger

Contact

July 09, 2006, at 07:14 PM by Gary Van Domselaar -
Changed lines 5-9 from:

Certain viral families, such as the RNA viruses, possess high mutation, recombination, and replication rates. These viruses generate "clouds" of sequence variants called viral quasispecies within infected hosts. Diversity and evolution of viral quasispecies are influenced by host-viral interactions [1]. Characterization of quasispecies genome populations from infected individuals is a first step to study such interactions. A recent proof-of-concept study by researchers at 454™, CuraGen™, and Yale, suggests that parallel sequencing and identification of the sequence variation present within a population of viral quasispecies is feasible [2] using sequencing-by-synthesis technology recently developed and commercialized by 454 Life Sciences™ with their introduction of the GS20™ sequencer . In order to realize the potential for sequencing and assembly of quasispecies populations using this technology , it is necessary to develop and validate a robust methodology for genome-scale quasispecies assembly. We expect the bulk of the challenge will lie in the design and construction of the quasispecies assembler.

The Problem

Assembling and characterizing any quasispecies genome population poses a substantial computational challenge [3],[4]. Current assembly programs such as Phred/Phrap, TIGR Assembler, and 454™'s Newbler Assembler™ are designed to connect reads into a single consensus sequence. As such, they are not appropriate for simultaneously assembling multiple genome sequences. These programs assume, for example, that base mismatches represent base-calling errors or internal repeats rather than legitimate sequence variation from a population of input sequences. In addition, assembly is complicated by rearrangements and the existence of true internal repeats, making the problem of connecting fragments into correct genomic sequences a highly challenging one. The deep coverage capability of the GS20 can aid greatly in addressing the former problem; however, the GS20™'s limited unidirectional read length (at ~100 bp per read), and lack of mate-pair information presents a serious challenge in dealing with the latter problem. Any quasispecies sequence assembler for application with the GS20™ sequencing technology must take these factors into account.

to:

Many important viruses, such as HIV, the SARS Coronavirus, Hepatitis C, and the Influenza virus, possess high mutation, recombination, and replication rates. These viruses generate "clouds" of sequence variants called viral quasispecies within infected hosts. Diversity and evolution of viral quasispecies are influenced by host-viral interactions [1]. Characterization of quasispecies genome populations from infected individuals is a first step toward studying such interactions, with the ultimate goal of identifying strategies for disease treatment and prevention. A recent proof-of-concept study by researchers at 454™, CuraGen™, and Yale suggests that parallel sequencing and identification of the sequence variation present within a population of viral quasispecies is feasible [2] using the sequencing-by-synthesis technology recently developed by 454 Life Sciences™ and incorporated in the GS20™ sequencer. In order to realize the potential for sequencing and assembly of quasispecies populations using this technology, it is necessary to develop and validate a robust methodology for genome-scale quasispecies assembly. We expect the bulk of the challenge will lie in the design and construction of the quasispecies assembler.

The Quasispecies Assembly Problem

Assembling and characterizing any quasispecies genome population poses a substantial computational challenge [3],[4]. Current assembly programs such as Phred/Phrap, TIGR Assembler, and 454™'s Newbler Assembler™ are designed to connect reads into a single consensus sequence. As such, they are not appropriate for simultaneously assembling multiple genome sequences. These programs assume, for example, that base mismatches represent base-calling errors or internal repeats rather than legitimate sequence variation from a population of input sequences. In addition, assembly is complicated by rearrangements and the existence of true internal repeats, making the problem of connecting fragments into correct genomic sequences a highly challenging one. The deep coverage capability of the GS20™ can aid greatly in addressing the former problem; however, the GS20™'s limited unidirectional read length (at ~100 bp per read), and lack of mate-pair information presents a serious challenge in dealing with the latter problem. Any quasispecies sequence assembler for application with the GS20™ sequencing technology must take these factors into account.

Changed lines 11-14 from:

The problem of simultaneously assembling multiple distinct genome sequences is not novel. Indeed, this situation is encountered routinely in determining the haplotype of diploid eukaryotic DNA (i.e., the mapping of polymorphisms to the correct chromosome). In regions where sufficient sequence variation exists between reads, a technique known as correlated differences can be applied to segregate the two distinct sequences [1],[2]?. This technique uses repeatedly occurring high quality base call mismatches to segregate and connect sequencing reads. The same strategy can be applied to the separation of quasispecies sequences, although in general the sequences can only be effectively separated to a degree owing to existence of intervening stretches of highly similar sequence that break the connection between variable regions. In addition, the greater number of quasispecies assembly relative to haplotyping, and the lack of foreknowlege about the total number of members present in the quasispecies population will compound the difficulty of applying this technique to resolving viral quasispecies sequences.

Our current thinking is that comparative assembly offers the most promising approach to tackle this problem. In this procedure, sequence reads are aligned to a reference genome rather than being assembled de novo using the standard overlap-layout-consensus paradigm. Quasispecies sequences obviously are too diverse to align to any single reference genome, so instead we propose to modify the comparative assembly method with a "phylogenetic partitioning" step: input reads would be aligned initially to a representative sequence from each major clade. Each group of reads would then be realigned to subtypes of said clade, etc. Some initial work by believe it might help to segregate the reads into groups that approximately represent their parent genomes, where the final assembly can occur.

to:

The problem of simultaneously assembling multiple distinct genome sequences is not novel. Indeed, this situation is encountered routinely in determining the haplotype of diploid eukaryotic DNA (i.e., the mapping of polymorphisms to the correct chromosome). In regions where sufficient sequence variation exists between reads, a technique known as correlated differences can be applied to segregate the two distinct sequences [1],[2]. This technique uses repeatedly occurring high-quality base-call mismatches to segregate and connect sequencing reads. The same strategy can be applied to the separation of quasispecies sequences, although in general the sequences can only be effectively separated to a degree, owing to the existence of intervening stretches of highly similar sequence that break the connection between variable regions. In addition, the greater number of sequences involved in quasispecies assembly relative to haplotyping, and the lack of foreknowledge about the total number of members present in the quasispecies population, will compound the difficulty of applying this technique to resolving viral quasispecies sequences.
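
The correlated-differences idea described above can be illustrated with a minimal sketch. It is not the project's implementation: it assumes reads are already placed on a shared coordinate system as hypothetical `(start, sequence)` pairs, it uses a `min_support` count as a stand-in for "repeatedly occurring" mismatches (base-call quality scores are left out), and it groups reads greedily, so the result can depend on read order.

```python
from collections import Counter, defaultdict


def variable_columns(reads, min_support=2):
    """Positions where at least two different bases are each seen in
    at least min_support reads (candidate true variation, not noise)."""
    columns = defaultdict(Counter)
    for start, seq in reads:
        for offset, base in enumerate(seq):
            columns[start + offset][base] += 1
    return sorted(
        pos for pos, counts in columns.items()
        if sum(1 for n in counts.values() if n >= min_support) >= 2
    )


def signature(read, positions):
    """Bases that a read carries at the variable positions it covers."""
    start, seq = read
    return {p: seq[p - start] for p in positions if start <= p < start + len(seq)}


def compatible(sig_a, sig_b):
    """True if two signatures share a variable position and agree on all shared ones."""
    shared = sig_a.keys() & sig_b.keys()
    return bool(shared) and all(sig_a[p] == sig_b[p] for p in shared)


def segregate(reads, min_support=2):
    """Greedily group reads whose bases agree at shared variable positions.

    Reads that cover no variable position cannot be linked to any group
    and are returned separately as unplaced.
    """
    positions = variable_columns(reads, min_support)
    groups, unplaced = [], []
    for read in reads:
        sig = signature(read, positions)
        if not sig:
            unplaced.append(read)
            continue
        for group in groups:
            if compatible(group["sig"], sig):
                group["sig"].update(sig)
                group["reads"].append(read)
                break
        else:
            groups.append({"sig": dict(sig), "reads": [read]})
    return groups, unplaced


if __name__ == "__main__":
    reads = [
        (0, "ACGTA"), (0, "ACGTA"),   # variant with G at position 2
        (0, "ACTTA"), (0, "ACTTA"),   # variant with T at position 2
        (3, "TAGG"),                  # covers only conserved sequence
    ]
    groups, unplaced = segregate(reads)
    for i, group in enumerate(groups):
        print(i, group["sig"], group["reads"])
    print("unplaced:", unplaced)
```

The read that covers only conserved sequence stays unplaced, which is the small-scale version of the limitation noted above: stretches of highly similar sequence break the connection between variable regions.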

Our current thinking is that comparative assembly offers the most promising approach to tackle this problem. In this procedure, sequence reads are aligned to a reference genome rather than being assembled de novo using the standard overlap-layout-consensus paradigm. Quasispecies sequences are too diverse to align to any single reference genome, so instead we propose to modify the comparative assembly method with a "phylogenetic partitioning" step: input reads would be aligned initially to a representative sequence from each major clade. Each group of reads would then be realigned to subtypes of that clade, and so on. Our initial studies suggest that this approach can successfully segregate the reads into groups that approximately represent their parent genomes, within which the final assembly can occur.
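
The "phylogenetic partitioning" step can be sketched as a recursive binning of reads down a tree of reference sequences. The sketch below is only an illustration under stated assumptions: the tree layout (`name`, `seq`, `children`), the function names, and the scoring by best ungapped identity are all hypothetical choices made for this example. A real pipeline would use a proper aligner and handle indels, sequencing errors, and reads that match no reference well.

```python
def best_identity(read, reference):
    """Highest fraction of matching bases over all ungapped placements
    of the read on the reference (a toy stand-in for alignment)."""
    if len(read) > len(reference):
        return 0.0
    best = 0
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        best = max(best, sum(a == b for a, b in zip(read, window)))
    return best / len(read)


def partition(reads, node, path=()):
    """Recursively bin reads by best-matching reference at each tree level.

    node is a dict: {"name": str, "seq": str, "children": [node, ...]}.
    The root only needs "name" and "children". Returns a dict mapping the
    path of names down to a leaf onto the reads binned there.
    """
    path = path + (node["name"],)
    children = node.get("children", [])
    if not children:
        return {path: list(reads)}
    bins = {child["name"]: [] for child in children}
    for read in reads:
        # Ties go to the first child listed.
        winner = max(children, key=lambda c: best_identity(read, c["seq"]))
        bins[winner["name"]].append(read)
    result = {}
    for child in children:
        if bins[child["name"]]:
            result.update(partition(bins[child["name"]], child, path))
    return result


if __name__ == "__main__":
    tree = {"name": "root", "children": [
        {"name": "cladeA", "seq": "AAAACCCCGGGG", "children": [
            {"name": "A1", "seq": "AAAACCCCGGGG"},
            {"name": "A2", "seq": "AAAACCTCGGGG"},
        ]},
        {"name": "cladeB", "seq": "TTTTGGGGCCCC"},
    ]}
    for path, binned in partition(["AACCCC", "CCTCGG", "TTGGGG"], tree).items():
        print("/".join(path), binned)
```

Each leaf bin then holds reads that approximately share a parent genome, and the final assembly would run separately within each bin.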

July 09, 2006, at 06:38 PM by Gary Van Domselaar -
Changed lines 5-6 from:

Certain viral families, such as the RNA viruses, possess high mutation, recombination, and replication rates. These viruses generate "clouds" of sequence variants called viral quasispecies within infected hosts. Diversity and evolution of viral quasispecies are influenced by host-viral interactions [1]. Characterization of quasispecies genome populations from infected individuals is a first step to study such interactions. A recent proof-of-concept study by researchers at 454™, CuraGen™, and Yale, suggests that parallel sequencing and identification of the sequence variation present within a population of viral quasispecies is feasible using sequencing-by-synthesis technology recently developed and commercialized by 454 Life Sciences™ with their introduction of the GS20™ sequencer [2]. In order to realize the potential for sequencing and assembly of quasispecies populations using this technology , it is necessary to develop and validate a robust methodology for genome-scale quasispecies assembly. We expect the bulk of the challenge will lie in the design and construction of the quasispecies assembler.

to:

Certain viral families, such as the RNA viruses, possess high mutation, recombination, and replication rates. These viruses generate "clouds" of sequence variants called viral quasispecies within infected hosts. Diversity and evolution of viral quasispecies are influenced by host-viral interactions [1]. Characterization of quasispecies genome populations from infected individuals is a first step to study such interactions. A recent proof-of-concept study by researchers at 454™, CuraGen™, and Yale, suggests that parallel sequencing and identification of the sequence variation present within a population of viral quasispecies is feasible [2] using sequencing-by-synthesis technology recently developed and commercialized by 454 Life Sciences™ with their introduction of the GS20™ sequencer . In order to realize the potential for sequencing and assembly of quasispecies populations using this technology , it is necessary to develop and validate a robust methodology for genome-scale quasispecies assembly. We expect the bulk of the challenge will lie in the design and construction of the quasispecies assembler.

July 09, 2006, at 06:36 PM by Gary Van Domselaar -
Changed lines 8-9 from:

Assembling and characterizing any quasispecies genome population poses a substantial computational challenge [3,4]. Current assembly programs such as Phred/Phrap, TIGR Assembler, and 454™'s Newbler Assembler™ are designed to connect reads into a single consensus sequence. As such, they are not appropriate for simultaneously assembling multiple genome sequences. These programs assume, for example, that base mismatches represent base-calling errors or internal repeats rather than legitimate sequence variation from a population of input sequences. In addition, assembly is complicated by rearrangements and the existence of true internal repeats, making the problem of connecting fragments into correct genomic sequences a highly challenging one. The deep coverage capability of the GS20 can aid greatly in addressing the former problem; however, the GS20™'s limited unidirectional read length (at ~100 bp per read), and lack of mate-pair information presents a serious challenge in dealing with the latter problem. Any quasispecies sequence assembler for application with the GS20™ sequencing technology must take these factors into account.

to:

Assembling and characterizing any quasispecies genome population poses a substantial computational challenge [1],[2]. Current assembly programs such as Phred/Phrap, TIGR Assembler, and 454™'s Newbler Assembler™ are designed to connect reads into a single consensus sequence. As such, they are not appropriate for simultaneously assembling multiple genome sequences. These programs assume, for example, that base mismatches represent base-calling errors or internal repeats rather than legitimate sequence variation from a population of input sequences. In addition, assembly is complicated by rearrangements and the existence of true internal repeats, making the problem of connecting fragments into correct genomic sequences a highly challenging one. The deep coverage capability of the GS20 can aid greatly in addressing the former problem; however, the GS20™'s limited unidirectional read length (at ~100 bp per read), and lack of mate-pair information presents a serious challenge in dealing with the latter problem. Any quasispecies sequence assembler for application with the GS20™ sequencing technology must take these factors into account.

Changed lines 11-12 from:

The problem of simultaneously assembling multiple distinct genome sequences is not novel. Indeed, this situation is encountered routinely in determining the haplotype of diploid eukaryotic DNA (i.e., the mapping of polymorphisms to the correct chromosome). In regions where sufficient sequence variation exists between reads, a technique known as correlated differences can be applied to segregate the two distinct sequences [5,6]. This technique uses repeatedly occurring high quality base call mismatches to segregate and connect sequencing reads. The same strategy can be applied to the separation of quasispecies sequences, although in general the sequences can only be effectively separated to a degree owing to existence of intervening stretches of highly similar sequence that break the connection between variable regions. In addition, the greater number of quasispecies assembly relative to haplotyping, and the lack of foreknowlege about the total number of members present in the quasispecies population will compound the difficulty of applying this technique to resolving quasispecies sequences.

to:

The problem of simultaneously assembling multiple distinct genome sequences is not novel. Indeed, this situation is encountered routinely in determining the haplotype of diploid eukaryotic DNA (i.e., the mapping of polymorphisms to the correct chromosome). In regions where sufficient sequence variation exists between reads, a technique known as correlated differences can be applied to segregate the two distinct sequences [1],[2]?. This technique uses repeatedly occurring high quality base call mismatches to segregate and connect sequencing reads. The same strategy can be applied to the separation of quasispecies sequences, although in general the sequences can only be effectively separated to a degree owing to existence of intervening stretches of highly similar sequence that break the connection between variable regions. In addition, the greater number of quasispecies assembly relative to haplotyping, and the lack of foreknowlege about the total number of members present in the quasispecies population will compound the difficulty of applying this technique to resolving viral quasispecies sequences.

July 09, 2006, at 06:31 PM by Gary Van Domselaar -
Changed lines 8-9 from:

Assembling and characterizing any quasispecies genome population poses a substantial computational challenge [3,4]. Current assembly programs such as Phred/Phrap, TIGR Assembler, and 454™'s Newbler Assembler™ are designed to connect reads into a single consensus sequence. As such, they are not appropriate for simultaneously assembling multiple genome sequences. These programs assume, for example, that base mismatches represent base-calling errors or internal repeats rather than legitimate sequence variation from a population of input sequences. In addition, assembly is complicated by rearrangements and the existence of legitimate interanal repeat sequences, making the problem of connecting fragments into correct genomic sequences a challenging one. The deep coverage capability of the GS20 can aid greatly in addressing the former problem; however, the GS20™'s limited unidirectional read length (at ~100 bp per read), and lack of mate-pair information presents a serious challenge in dealing with the latter problem. Any quasispecies sequence assembler for application with the GS20™ sequencing technology must take these factors into account.

to:

Assembling and characterizing any quasispecies genome population poses a substantial computational challenge [3,4]. Current assembly programs such as Phred/Phrap, TIGR Assembler, and 454™'s Newbler Assembler™ are designed to connect reads into a single consensus sequence. As such, they are not appropriate for simultaneously assembling multiple genome sequences. These programs assume, for example, that base mismatches represent base-calling errors or internal repeats rather than legitimate sequence variation from a population of input sequences. In addition, assembly is complicated by rearrangements and the existence of true internal repeats, making the problem of connecting fragments into correct genomic sequences a highly challenging one. The deep coverage capability of the GS20 can aid greatly in addressing the former problem; however, the GS20™'s limited unidirectional read length (at ~100 bp per read), and lack of mate-pair information presents a serious challenge in dealing with the latter problem. Any quasispecies sequence assembler for application with the GS20™ sequencing technology must take these factors into account.

July 09, 2006, at 06:29 PM by Gary Van Domselaar -
Changed lines 8-9 from:

Assembling and characterizing any quasispecies genome population poses a substantial computational challenge [3,4]. Current assembly programs such as Phred/Phrap, TIGR Assembler, and 454™'s Newbler Assembler are designed to connect reads into a single consensus sequence. As such, they are not appropriate for simultaneously assembling multiple genome sequences. These programs assume, for example, that base mismatches represent base-calling errors rather than a legitimate sequence variation from a population of input sequences. In addition, assembly is complicated by the existence of repeat sequences and recombination that make the problem of connecting fragments into correct genomic sequences a challenging one. The deep coverage capability of the GS20 can aid greatly in addressing the former problem; however, the GS20's limited unidirectional read length (at ~100 bp per read), and lack of mate-pair information presents a serious challenge in dealing with the latter problem. Any quasispecies sequence assembler for application with the GS20™ sequencing technology must take these factors into account.

to:

Assembling and characterizing any quasispecies genome population poses a substantial computational challenge [3,4]. Current assembly programs such as Phred/Phrap, TIGR Assembler, and 454™'s Newbler Assembler™ are designed to connect reads into a single consensus sequence. As such, they are not appropriate for simultaneously assembling multiple genome sequences. These programs assume, for example, that base mismatches represent base-calling errors or internal repeats rather than legitimate sequence variation from a population of input sequences. In addition, assembly is complicated by rearrangements and the existence of legitimate interanal repeat sequences, making the problem of connecting fragments into correct genomic sequences a challenging one. The deep coverage capability of the GS20 can aid greatly in addressing the former problem; however, the GS20™'s limited unidirectional read length (at ~100 bp per read), and lack of mate-pair information presents a serious challenge in dealing with the latter problem. Any quasispecies sequence assembler for application with the GS20™ sequencing technology must take these factors into account.

July 09, 2006, at 06:24 PM by Gary Van Domselaar -
Changed lines 5-6 from:

Certain viral families, such as the RNA viruses, possess high mutation, recombination, and replication rates. These viruses generate "clouds" of sequence variants called viral quasispecies within infected hosts. Diversity and evolution of viral quasispecies are influenced by host-viral interactions [1]. Characterization of quasispecies genome populations from infected individuals is a first step to study such interactions. A recent proof-of-concept study by researchers at 454™, CuraGen™, and Yale, suggests that parallel sequencing and identification of the sequence variation present within a population of viral quasispecies is feasible using the recently developed sequencing-by-synthesis technology [2]. In order to realize the potential for sequencing and assembly of quasispecies populations using the GS20, it is necessary to develop and validate a robust methodology for genome-scale quasispecies assembly. We expect the bulk of the challenge will lie in the design and construction of the quasispecies assembler.

to:

Certain viral families, such as the RNA viruses, possess high mutation, recombination, and replication rates. These viruses generate "clouds" of sequence variants called viral quasispecies within infected hosts. Diversity and evolution of viral quasispecies are influenced by host-viral interactions [1]. Characterization of quasispecies genome populations from infected individuals is a first step to study such interactions. A recent proof-of-concept study by researchers at 454™, CuraGen™, and Yale, suggests that parallel sequencing and identification of the sequence variation present within a population of viral quasispecies is feasible using sequencing-by-synthesis technology recently developed and commercialized by 454 Life Sciences™ with their introduction of the GS20™ sequencer [2]. In order to realize the potential for sequencing and assembly of quasispecies populations using this technology , it is necessary to develop and validate a robust methodology for genome-scale quasispecies assembly. We expect the bulk of the challenge will lie in the design and construction of the quasispecies assembler.

Changed lines 8-9 from:

Assembling and characterizing any quasispecies genome population poses a substantial computational challenge [3,4]. Current assembly programs such as Phred/Phrap, TIGR Assembler, and 454's Newbler Assembler are designed to connect reads into a single consensus sequence. As such, they are not appropriate for simultaneously assembling multiple genome sequences. These programs assume, for example, that base mismatches represent base-calling errors rather than a legitimate sequence variation from a population of input sequences. In addition, assembly is complicated by the existence of repeat sequences and recombination that make the problem of connecting fragments into correct genomic sequences a challenging one. The deep coverage capability of the GS20 can aid greatly in addressing the former problem; however, the GS20's limited unidirectional read length (at ~100 bp per read), and lack of mate-pair information presents a serious challenge in dealing with the latter problem. Any quasispecies sequence assembler for application with the GS20™ sequencing technology must take these factors into account.

to:

Assembling and characterizing any quasispecies genome population poses a substantial computational challenge [3,4]. Current assembly programs such as Phred/Phrap, TIGR Assembler, and 454™'s Newbler Assembler are designed to connect reads into a single consensus sequence. As such, they are not appropriate for simultaneously assembling multiple genome sequences. These programs assume, for example, that base mismatches represent base-calling errors rather than a legitimate sequence variation from a population of input sequences. In addition, assembly is complicated by the existence of repeat sequences and recombination that make the problem of connecting fragments into correct genomic sequences a challenging one. The deep coverage capability of the GS20 can aid greatly in addressing the former problem; however, the GS20's limited unidirectional read length (at ~100 bp per read), and lack of mate-pair information presents a serious challenge in dealing with the latter problem. Any quasispecies sequence assembler for application with the GS20™ sequencing technology must take these factors into account.

July 09, 2006, at 06:18 PM by Gary Van Domselaar -
Changed lines 2-3 from:

The goal of the Q Assembler project is to develop software for assembling multiple viral quasispecies genomes sequenced in parallel using recently developed sequencing-by-synthesis technology [1]. Q Assembler is a collaboration between the Public Health Agency of Canada's National Microbiology Laboratory, Canada's Michael Smith Genome Sciences Centre, the University of Manitoba, the University of Baltimore, the University of Pittsburgh and 454 Life Sciences.

to:

The goal of the Q Assembler project is to develop software for assembling multiple viral quasispecies genomes sequenced in parallel using recently developed sequencing-by-synthesis technology [1]. Q Assembler is a collaboration between the Public Health Agency of Canada's National Microbiology Laboratory, Canada's Michael Smith Genome Sciences Centre, the University of Manitoba, the University of Baltimore, the University of Pittsburgh and 454 Life Sciences™.

Changed lines 5-6 from:

Certain viral families, such as the RNA viruses, possess high mutation, recombination, and replication rates. These viruses generate "clouds" of sequence variants called viral quasispecies within infected hosts. Diversity and evolution of viral quasispecies are influenced by host-viral interactions [1]. Characterization of quasispecies genome populations from infected individuals is a first step to study such interactions. A recent proof-of-concept study by researchers at 454, CuraGen, and Yale, suggests that parallel sequencing and identification of the sequence variation present within a population of viral quasispecies is feasible using the recently developed sequencing-by-synthesis technology [2]. In order to realize the potential for sequencing and assembly of quasispecies populations using the GS20, it is necessary to develop and validate a robust methodology for genome-scale quasispecies assembly. We expect the bulk of the challenge will lie in the design and construction of the quasispecies assembler.

to:

Certain viral families, such as the RNA viruses, possess high mutation, recombination, and replication rates. These viruses generate "clouds" of sequence variants called viral quasispecies within infected hosts. Diversity and evolution of viral quasispecies are influenced by host-viral interactions [1]. Characterization of quasispecies genome populations from infected individuals is a first step to study such interactions. A recent proof-of-concept study by researchers at 454™, CuraGen™, and Yale, suggests that parallel sequencing and identification of the sequence variation present within a population of viral quasispecies is feasible using the recently developed sequencing-by-synthesis technology [2]. In order to realize the potential for sequencing and assembly of quasispecies populations using the GS20, it is necessary to develop and validate a robust methodology for genome-scale quasispecies assembly. We expect the bulk of the challenge will lie in the design and construction of the quasispecies assembler.

Changed lines 8-9 from:

Assembling and characterizing any quasispecies genome population poses a substantial computational challenge [3,4]. Current assembly programs such as Phred/Phrap, TIGR Assembler, and 454's Newbler Assembler are designed to connect reads into a single consensus sequence. As such, they are not appropriate for simultaneously assembling multiple genome sequences. These programs assume, for example, that base mismatches represent base-calling errors rather than a legitimate sequence variation from a population of input sequences. In addition, assembly is complicated by the existence of repeat sequences and recombination that make the problem of connecting fragments into correct genomic sequences a challenging one. The deep coverage capability of the GS20 can aid greatly in addressing the former problem; however, the GS20's limited unidirectional read length (at ~100 bp per read), and lack of mate-pair information presents a serious challenge in dealing with the latter problem. Any quasispecies sequence assembler for application with the GS20 sequencing technology must take these factors into account.

to:

Assembling and characterizing any quasispecies genome population poses a substantial computational challenge [3,4]. Current assembly programs such as Phred/Phrap, TIGR Assembler, and 454's Newbler Assembler are designed to connect reads into a single consensus sequence. As such, they are not appropriate for simultaneously assembling multiple genome sequences. These programs assume, for example, that base mismatches represent base-calling errors rather than a legitimate sequence variation from a population of input sequences. In addition, assembly is complicated by the existence of repeat sequences and recombination that make the problem of connecting fragments into correct genomic sequences a challenging one. The deep coverage capability of the GS20 can aid greatly in addressing the former problem; however, the GS20's limited unidirectional read length (at ~100 bp per read), and lack of mate-pair information presents a serious challenge in dealing with the latter problem. Any quasispecies sequence assembler for application with the GS20™ sequencing technology must take these factors into account.

Changed lines 16-17 from:

Currently we have obtained GS20 sequence reads from overlapping PCR products spanning the entire HIV genomes of two individuals. We propose to develop the assembly strategy and methodology using the GS20 sequence reads from the overlapping PCR products spanning the entire HIV genomes of these two individuals. To aid the testing and validation of our methodology, the NML HIV and Human Genetics Laboratory has PCR products of the HIV gag region (including part of 5'-LTR and part of protease, ~2kb in length) and fully sequenced clones (30 to 90 clones per sample) of the same PCR products from more than 200 patient samples. These samples represent diverse HIV subtypes, from mostly clade A, D, C and recombinant subtypes and were sequenced using standard Sanger sequencing methodology. The GS20 methods and strategy we develop will be tested by sequencing the same PCR products from HIV gag region using the GS20 sequencer. The assembled quasispecies using the developed GS20 methods can then be compared and validated with the cloned sequences.

to:

Currently we have obtained GS20™ sequence reads from overlapping PCR products spanning the entire HIV genomes of two individuals. We propose to develop the assembly strategy and methodology using the GS20 sequence reads from the overlapping PCR products spanning the entire HIV genomes of these two individuals. To aid the testing and validation of our methodology, the NML HIV and Human Genetics Laboratory has PCR products of the HIV gag region (including part of 5'-LTR and part of protease, ~2kb in length) and fully sequenced clones (30 to 90 clones per sample) of the same PCR products from more than 200 patient samples. These samples represent diverse HIV subtypes, from mostly clade A, D, C and recombinant subtypes and were sequenced using standard Sanger sequencing methodology. The GS20 methods and strategy we develop will be tested by sequencing the same PCR products from HIV gag region using the GS20 sequencer. The assembled quasispecies using the developed GS20 methods can then be compared and validated with the cloned sequences.

July 09, 2006, at 06:10 PM by Gary Van Domselaar -
Changed lines 2-3 from:

The goal of the Q Assembler project is to develop software for assembling multiple viral quasispecies genomes sequenced in parallel using recently developed sequencing-by-synthesis technology [1]. Q Assembler is a collaboration between the Public Health Agency of Canada's National Microbiology Laboratory, Canada's Michael Smith Genome Sciences Centre, the University of Manitoba, the University of Baltimore, the University of Pittsburgh and 454 Life Sciences.

to:

The goal of the Q Assembler project is to develop software for assembling multiple viral quasispecies genomes sequenced in parallel using recently developed sequencing-by-synthesis technology [1]. Q Assembler is a collaboration between the Public Health Agency of Canada's National Microbiology Laboratory, Canada's Michael Smith Genome Sciences Centre, the University of Manitoba, the University of Baltimore, the University of Pittsburgh and 454 Life Sciences.

Changed lines 5-6 from:

Certain viral families, such as the RNA viruses, possess high mutation, recombination, and replication rates. These viruses generate "clouds" of sequence variants called viral quasispecies within infected hosts. Diversity and evolution of quasispecies are influenced by host-viral interactions. Characterization of quasispecies genome populations from infected individuals is a first step to study such interactions. A recent proof-of-concept study by researchers at 454, CuraGen, and Yale, suggests that parallel sequencing and identification of the sequence variation present within a population of viral quasispecies is feasible using the recently developed sequencing-by-synthesis technology [2]. In order to realize the potential for sequencing and assembly of quasispecies populations using the GS20, it is necessary to develop and validate a robust methodology for genome-scale quasispecies assembly. We expect the bulk of the challenge will lie in the design and construction of the quasispecies assembler.

to:

Certain viral families, such as the RNA viruses, possess high mutation, recombination, and replication rates. These viruses generate "clouds" of sequence variants called viral quasispecies within infected hosts. Diversity and evolution of viral quasispecies are influenced by host-viral interactions [1]. Characterization of quasispecies genome populations from infected individuals is a first step to study such interactions. A recent proof-of-concept study by researchers at 454, CuraGen, and Yale, suggests that parallel sequencing and identification of the sequence variation present within a population of viral quasispecies is feasible using the recently developed sequencing-by-synthesis technology [2]. In order to realize the potential for sequencing and assembly of quasispecies populations using the GS20, it is necessary to develop and validate a robust methodology for genome-scale quasispecies assembly. We expect the bulk of the challenge will lie in the design and construction of the quasispecies assembler.

July 09, 2006, at 06:07 PM by Gary Van Domselaar -
Changed lines 19-31 from:
  1. 454_paper? Margulies M. et al. (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature. 437:326-7.

  2. # viral_quasispecies Domingo E, Baranowski E, Ruiz-Jarabo CM, Martin-Hernandez AM, Saiz JC, Escarmis C. (1998) Quasispecies Structure and Persistence of RNA Viruses. Emerg Infect Dis. 4:521-7.

  3. Simons JF et al. (2005) Ultra-Deep sequencing of HIV from Drug Resistant Patients. XIV International HIV Drug Resistance Workshop. Quebec City, Canada, June 7-11, 2005

  4. Chen K, Pachter L (2005) Bioinformatics for whole-genome shotgun sequencing of microbial communities. PLoS Comp. Biol. 1: e24.

  5. Edwards RA, Rohwer F. (2005) Viral metagenomics. Nat. Rev. Microbiol. 6:504-10.

  6. M. Pop. Shotgun sequence assembly. Advances in Computers vol. 60, M. Zelkowitz ed. June 2004.

  7. Lancia G. et al. (2001) SNPs, problems, complexity and algorithms , in: 9th Annual European Symposium on Algorithms (BRICS), University of Aarhus, Denmark.
to:
  1. #454_paper Margulies M. et al. (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature. 437:326-7.

  2. Domingo E, Baranowski E, Ruiz-Jarabo CM, Martin-Hernandez AM, Saiz JC, Escarmis C. (1998) Quasispecies Structure and Persistence of RNA Viruses. Emerg Infect Dis. 4:521-7.

  3. #454_hiv_sequencing Simons JF et al. (2005) Ultra-Deep sequencing of HIV from Drug Resistant Patients. XIV International HIV Drug Resistance Workshop. Quebec City, Canada, June 7-11, 2005

  4. Chen K, Pachter L (2005) Bioinformatics for whole-genome shotgun sequencing of microbial communities. PLoS Comp. Biol. 1: e24.

  5. Edwards RA, Rohwer F. (2005) Viral metagenomics. Nat. Rev. Microbiol. 6:504-10.

  6. M. Pop. Shotgun sequence assembly. Advances in Computers vol. 60, M. Zelkowitz ed. June 2004.

  7. Lancia G. et al. (2001) SNPs, problems, complexity and algorithms , in: 9th Annual European Symposium on Algorithms (BRICS), University of Aarhus, Denmark.
July 09, 2006, at 06:05 PM by Gary Van Domselaar -
Changed lines 19-20 from:
  1. Margulies M. et al. (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature. 437:326-7.
to:
  1. 454_paper? Margulies M. et al. (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature. 437:326-7.

  2. # viral_quasispecies Domingo E, Baranowski E, Ruiz-Jarabo CM, Martin-Hernandez AM, Saiz JC, Escarmis C. (1998) Quasispecies Structure and Persistence of RNA Viruses. Emerg Infect Dis. 4:521-7.
Changed lines 25-30 from:
  1. Chen K, Pachter L (2005) Bioinformatics for whole-genome shotgun sequencing of microbial communities. PLoS Comp. Biol. 1: e24.

  2. Edwards RA, Rohwer F. (2005) Viral metagenomics. Nat. Rev. Microbiol. 6:504-10.

  3. M. Pop. Shotgun sequence assembly. Advances in Computers vol. 60, M. Zelkowitz ed. June 2004.
to:
  1. Chen K, Pachter L (2005) Bioinformatics for whole-genome shotgun sequencing of microbial communities. PLoS Comp. Biol. 1: e24.

  2. Edwards RA, Rohwer F. (2005) Viral metagenomics. Nat. Rev. Microbiol. 6:504-10.

  3. M. Pop. Shotgun sequence assembly. Advances in Computers vol. 60, M. Zelkowitz ed. June 2004.
July 09, 2006, at 05:34 PM by Gary Van Domselaar -
Changed lines 2-3 from:

The goal of the Q Assembler project is to develop software for assembling multiple viral quasispecies genomes sequenced in parallel using recently developed sequencing-by-synthesis technology [1]. Q Assembler is a collaboration between the Public Health Agency of Canada's National Microbiology Laboratory, Canada's Michael Smith Genome Sciences Centre, the University of Manitoba, the University of Baltimore, the University of Pittsburgh and 454 Life Sciences.

to:

The goal of the Q Assembler project is to develop software for assembling multiple viral quasispecies genomes sequenced in parallel using recently developed sequencing-by-synthesis technology [1]. Q Assembler is a collaboration between the Public Health Agency of Canada's National Microbiology Laboratory, Canada's Michael Smith Genome Sciences Centre, the University of Manitoba, the University of Baltimore, the University of Pittsburgh and 454 Life Sciences.

Changed lines 19-29 from:
  1. Margulies M. et al. (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature. 437:326-7.

  2. Simons JF et al. (2005) Ultra-Deep sequencing of HIV from Drug Resistant Patients. XIV International HIV Drug Resistance Workshop. Quebec City, Canada, June 7-11, 2005

  3. Chen K, Pachter L (2005) Bioinformatics for whole-genome shotgun sequencing of microbial communities. PLoS Comp. Biol. 1: e24.

  4. Edwards RA, Rohwer F. (2005) Viral metagenomics. Nat. Rev. Microbiol. 6:504-10.

  5. M. Pop. Shotgun sequence assembly. Advances in Computers vol. 60, M. Zelkowitz ed. June 2004.

  6. Lancia G. et al. (2001) SNPs, problems, complexity and algorithms , in: 9th Annual European Symposium on Algorithms (BRICS), University of Aarhus, Denmark.
to:
  1. Margulies M. et al. (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature. 437:326-7.

  2. Simons JF et al. (2005) Ultra-Deep sequencing of HIV from Drug Resistant Patients. XIV International HIV Drug Resistance Workshop. Quebec City, Canada, June 7-11, 2005

  3. Chen K, Pachter L (2005) Bioinformatics for whole-genome shotgun sequencing of microbial communities. PLoS Comp. Biol. 1: e24.

  4. Edwards RA, Rohwer F. (2005) Viral metagenomics. Nat. Rev. Microbiol. 6:504-10.

  5. M. Pop. Shotgun sequence assembly. Advances in Computers vol. 60, M. Zelkowitz ed. June 2004.

  6. Lancia G. et al. (2001) SNPs, problems, complexity and algorithms , in: 9th Annual European Symposium on Algorithms (BRICS), University of Aarhus, Denmark.
July 09, 2006, at 05:20 PM by Gary Van Domselaar -
Changed lines 2-3 from:

The goal of the Q Assembler project is to develop software for assembling multiple viral quasispecies genomes sequenced in parallel using recently developed sequencing-by-synthesis technology [1]. Q Assembler is a collaboration between the Public Health Agency of Canada's National Microbiology Laboratory, Canada's Michael Smith Genome Sciences Centre, the University of Manitoba, the University of Baltimore, and 454 Life Sciences.

to:

The goal of the Q Assembler project is to develop software for assembling multiple viral quasispecies genomes sequenced in parallel using recently developed sequencing-by-synthesis technology [1]. Q Assembler is a collaboration between the Public Health Agency of Canada's National Microbiology Laboratory, Canada's Michael Smith Genome Sciences Centre, the University of Manitoba, the University of Baltimore, the University of Pittsburgh and 454 Life Sciences.

Changed lines 25-29 from:
  1. Edwards RA, Rohwer F. (2005) [[http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=15886693&dopt=Abstract | Viral metagenomics]]. Nat. Rev. Microbiol. 6:504-10.

  1. M. Pop. Shotgun sequence assembly. Advances in Computers vol. 60, M. Zelkowitz ed. June 2004.
to:
  1. Edwards RA, Rohwer F. (2005) Viral metagenomics. Nat. Rev. Microbiol. 6:504-10.

  2. M. Pop. Shotgun sequence assembly. Advances in Computers vol. 60, M. Zelkowitz ed. June 2004.
July 09, 2006, at 05:13 PM by Gary Van Domselaar -
Changed lines 23-26 from:
  1. Chen K, Pachter L (2005) Bioinformatics for whole-genome shotgun sequencing of microbial communities. PLoS Comp Biol 1: e24.

  2. Edwards RA, Rohwer F. (2005) Viral metagenomics . Nat. Rev. Microbiol. 6:504-10.
to:
  1. Chen K, Pachter L (2005) Bioinformatics for whole-genome shotgun sequencing of microbial communities. PLoS Comp. Biol. 1: e24.

  2. Edwards RA, Rohwer F. (2005) [[http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=15886693&dopt=Abstract | Viral metagenomics]]. Nat. Rev. Microbiol. 6:504-10.

July 09, 2006, at 05:10 PM by Gary Van Domselaar -
Changed lines 2-3 from:

The goal of the Q Assembler project is to develop software for assembling multiple viral quasispecies genomes sequenced simultaneously using recently developed sequencing-by-synthesis? technology. Q Assembler is a collaboration between the Public Health Agency of Canada's National Microbiology Laboratory, Canada's Michael Smith Genome Sciences Centre, the University of Manitoba, the University of Baltimore, and 454 Life Sciences.

to:

The goal of the Q Assembler project is to develop software for assembling multiple viral quasispecies genomes sequenced in parallel using recently developed sequencing-by-synthesis technology [1]. Q Assembler is a collaboration between the Public Health Agency of Canada's National Microbiology Laboratory, Canada's Michael Smith Genome Sciences Centre, the University of Manitoba, the University of Baltimore, and 454 Life Sciences.

July 09, 2006, at 05:09 PM by Gary Van Domselaar -
Changed lines 19-30 from:

1. Margulies M. et al. (2005) [[http://www.nature.com/nature/journal/vaop/ncurrent/abs/nature03959.html | Genome sequencing in microfabricated high-density picolitre reactors]]. Nature. 437:326-7.

2. (2005) Simons JF et al. Ultra-Deep sequencing of HIV from Drug Resistant Patients. XIV International HIV Drug Resistance Workshop. Quebec City, Canada, June 7-11, 2005

3. Chen K, Pachter L (2005) Bioinformatics for whole-genome shotgun sequencing of microbial communities. PLoS Comp Biol 1: e24.

4. Edwards RA, Rohwer F. (2005) Viral metagenomics . Nat. Rev. Microbiol. 6:504-10.

5. M. Pop. Shotgun sequence assembly. Advances in Computers vol. 60, M. Zelkowitz ed. June 2004.

6. Lancia G. et al. (2001) SNPs, problems, complexity and algorithms , in: 9th Annual European Symposium on Algorithms (BRICS), University of Aarhus, Denmark.

to:
  1. Margulies M. et al. (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature. 437:326-7.

  2. Simons JF et al. (2005) Ultra-Deep sequencing of HIV from Drug Resistant Patients. XIV International HIV Drug Resistance Workshop. Quebec City, Canada, June 7-11, 2005

  3. Chen K, Pachter L (2005) Bioinformatics for whole-genome shotgun sequencing of microbial communities. PLoS Comp Biol 1: e24.

  4. Edwards RA, Rohwer F. (2005) Viral metagenomics . Nat. Rev. Microbiol. 6:504-10.

  5. M. Pop. Shotgun sequence assembly. Advances in Computers vol. 60, M. Zelkowitz ed. June 2004.

  6. Lancia G. et al. (2001) SNPs, problems, complexity and algorithms , in: 9th Annual European Symposium on Algorithms (BRICS), University of Aarhus, Denmark.
July 09, 2006, at 05:04 PM by Gary Van Domselaar -
Changed lines 1-3 from:

Project Goal

The goal of the Q Assembler project is to develop software for assembling multiple viral quasispecies genomes sequenced simultaneously using recently developed sequencing-by-synthesis technology. Q Assembler is a collaboration between the Public Health Agency of Canada's National Microbiology Laboratory, Canada's Michael Smith Genome Sciences Centre, the University of Manitoba, the University of Baltimore, and 454 Life Sciences.

to:

Description

The goal of the Q Assembler project is to develop software for assembling multiple viral quasispecies genomes sequenced simultaneously using recently developed sequencing-by-synthesis? technology. Q Assembler is a collaboration between the Public Health Agency of Canada's National Microbiology Laboratory, Canada's Michael Smith Genome Sciences Centre, the University of Manitoba, the University of Baltimore, and 454 Life Sciences.

Changed lines 5-6 from:

Certain viral families, such as the RNA viruses, possess high mutation, recombination, and replication rates. These viruses generate "clouds" of sequence variants called quasispecies within infected hosts. Diversity and evolution of quasispecies are influenced by host-viral interactions. Characterization of quasispecies genome populations from infected individuals is a first step to study such interactions. A recent proof-of-concept study by researchers at 454, CuraGen, and Yale, suggests that parallel sequencing and identification of the sequence variation present within a population of viral quasispecies is feasible using the recently developed sequencing-by-synthesis technology [2]. In order to realize the potential for sequencing and assembly of quasispecies populations using the GS20, it is necessary to develop and validate a robust methodology for genome-scale quasispecies assembly. We expect the bulk of the challenge will lie in the design and construction of the quasispecies assembler.

to:

Certain viral families, such as the RNA viruses, possess high mutation, recombination, and replication rates. These viruses generate "clouds" of sequence variants called viral quasispecies within infected hosts. Diversity and evolution of quasispecies are influenced by host-viral interactions. Characterization of quasispecies genome populations from infected individuals is a first step to study such interactions. A recent proof-of-concept study by researchers at 454, CuraGen, and Yale, suggests that parallel sequencing and identification of the sequence variation present within a population of viral quasispecies is feasible using the recently developed sequencing-by-synthesis technology [2]. In order to realize the potential for sequencing and assembly of quasispecies populations using the GS20, it is necessary to develop and validate a robust methodology for genome-scale quasispecies assembly. We expect the bulk of the challenge will lie in the design and construction of the quasispecies assembler.

Changed lines 16-30 from:

Currently we have obtained GS20 sequence reads from overlapping PCR products spanning the entire HIV genomes of two individuals. We propose to develop the assembly strategy and methodology using the GS20 sequence reads from the overlapping PCR products spanning the entire HIV genomes of these two individuals. To aid the testing and validation of our methodology, the NML HIV and Human Genetics Laboratory has PCR products of the HIV gag region (including part of 5'-LTR and part of protease, ~2kb in length) and fully sequenced clones (30 to 90 clones per sample) of the same PCR products from more than 200 patient samples. These samples represent diverse HIV subtypes, from mostly clade A, D, C and recombinant subtypes and were sequenced using standard Sanger sequencing methodology. The GS20 methods and strategy we develop will be tested by sequencing the same PCR products from HIV gag region using the GS20 sequencer. The assembled quasispecies using the developed GS20 methods can then be compared and validated with the cloned sequences.

to:

Currently we have obtained GS20 sequence reads from overlapping PCR products spanning the entire HIV genomes of two individuals. We propose to develop the assembly strategy and methodology using the GS20 sequence reads from the overlapping PCR products spanning the entire HIV genomes of these two individuals. To aid the testing and validation of our methodology, the NML HIV and Human Genetics Laboratory has PCR products of the HIV gag region (including part of 5'-LTR and part of protease, ~2kb in length) and fully sequenced clones (30 to 90 clones per sample) of the same PCR products from more than 200 patient samples. These samples represent diverse HIV subtypes, from mostly clade A, D, C and recombinant subtypes and were sequenced using standard Sanger sequencing methodology. The GS20 methods and strategy we develop will be tested by sequencing the same PCR products from HIV gag region using the GS20 sequencer. The assembled quasispecies using the developed GS20 methods can then be compared and validated with the cloned sequences.

References

1. Margulies M. et al. (2005) [[http://www.nature.com/nature/journal/vaop/ncurrent/abs/nature03959.html | Genome sequencing in microfabricated high-density picolitre reactors]]. Nature. 437:326-7.

2. (2005) Simons JF et al. Ultra-Deep sequencing of HIV from Drug Resistant Patients. XIV International HIV Drug Resistance Workshop. Quebec City, Canada, June 7-11, 2005

3. Chen K, Pachter L (2005) Bioinformatics for whole-genome shotgun sequencing of microbial communities. PLoS Comp Biol 1: e24.

4. Edwards RA, Rohwer F. (2005) Viral metagenomics . Nat. Rev. Microbiol. 6:504-10.

5. M. Pop. Shotgun sequence assembly. Advances in Computers vol. 60, M. Zelkowitz ed. June 2004.

6. Lancia G. et al. (2001) SNPs, problems, complexity and algorithms , in: 9th Annual European Symposium on Algorithms (BRICS), University of Aarhus, Denmark.

July 09, 2006, at 04:51 PM by Gary Van Domselaar -
Changed line 1 from:

Introduction

to:

Project Goal

July 09, 2006, at 04:50 PM by Gary Van Domselaar -
Changed lines 1-2 from:

The Goal of the Q Assembler project is to create an assembler for assembling multiple quasispecies genomes sequenced simulataneously using the 454 Life Sciences' GS20 sequencer. Q Assembler is a collaboration between the Public Health Agency of Canada's National Microbiology Laboratory, Canada's Michael Smith Genome Sciences Centre, the University of Manitoba, the University of Baltimore, and 454 Life Sciences.

to:

Introduction

The goal of the Q Assembler project is to develop software for assembling multiple viral quasispecies genomes sequenced simultaneously using recently developed sequencing-by-synthesis technology. Q Assembler is a collaboration between the Public Health Agency of Canada's National Microbiology Laboratory, Canada's Michael Smith Genome Sciences Centre, the University of Manitoba, the University of Baltimore, and 454 Life Sciences.

Background

Certain viral families, such as the RNA viruses, possess high mutation, recombination, and replication rates. These viruses generate "clouds" of sequence variants called quasispecies within infected hosts. Diversity and evolution of quasispecies are influenced by host-viral interactions. Characterization of quasispecies genome populations from infected individuals is a first step to study such interactions. A recent proof-of-concept study by researchers at 454, CuraGen, and Yale, suggests that parallel sequencing and identification of the sequence variation present within a population of viral quasispecies is feasible using the recently developed sequencing-by-synthesis technology [2]. In order to realize the potential for sequencing and assembly of quasispecies populations using the GS20, it is necessary to develop and validate a robust methodology for genome-scale quasispecies assembly. We expect the bulk of the challenge will lie in the design and construction of the quasispecies assembler.

The Problem

Assembling and characterizing any quasispecies genome population poses a substantial computational challenge [3,4]. Current assembly programs such as Phred/Phrap, TIGR Assembler, and 454's Newbler Assembler are designed to connect reads into a single consensus sequence. As such, they are not appropriate for simultaneously assembling multiple genome sequences. These programs assume, for example, that base mismatches represent base-calling errors rather than legitimate sequence variation within a population of input sequences. In addition, assembly is complicated by the existence of repeat sequences and recombination, which make connecting fragments into correct genomic sequences a challenging problem. The deep coverage capability of the GS20 can aid greatly in addressing the former problem; however, the GS20's limited unidirectional read length (~100 bp per read) and lack of mate-pair information present a serious challenge in dealing with the latter problem. Any quasispecies sequence assembler for application with the GS20 sequencing technology must take these factors into account.

Assembly Strategy

The problem of simultaneously assembling multiple distinct genome sequences is not novel. Indeed, this situation is encountered routinely in determining the haplotype of diploid eukaryotic DNA (i.e., the mapping of polymorphisms to the correct chromosome). In regions where sufficient sequence variation exists between reads, a technique known as correlated differences can be applied to segregate the two distinct sequences [5,6]. This technique uses repeatedly occurring high quality base call mismatches to segregate and connect sequencing reads. The same strategy can be applied to the separation of quasispecies sequences, although in general the sequences can be separated only to a degree, owing to the existence of intervening stretches of highly similar sequence that break the connection between variable regions. In addition, the greater number of distinct sequences in quasispecies assembly relative to haplotyping, and the lack of foreknowledge about the total number of members present in the quasispecies population, will compound the difficulty of applying this technique to resolving quasispecies sequences.
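
As a rough illustration of the correlated differences idea, the following Python sketch groups reads by the bases they carry at columns where a mismatch recurs in several reads. It is a toy under stated assumptions, not the project's method: the reads are assumed to be already aligned and of equal length, base quality scores are ignored, and the names `informative_columns`, `segregate`, and `min_support` are hypothetical.

```python
from collections import Counter, defaultdict

def informative_columns(reads, min_support=2):
    """Return column indices where at least two different bases each occur
    in min_support or more reads (a repeated, hence correlated, difference)."""
    columns = []
    for i in range(len(reads[0])):
        counts = Counter(read[i] for read in reads)
        supported = [base for base, n in counts.items() if n >= min_support]
        if len(supported) >= 2:
            columns.append(i)
    return columns

def segregate(reads, min_support=2):
    """Group reads by the bases they carry at the informative columns."""
    cols = informative_columns(reads, min_support)
    groups = defaultdict(list)
    for read in reads:
        signature = "".join(read[i] for i in cols)
        groups[signature].append(read)
    return cols, dict(groups)

# Toy, pre-aligned reads (assumption: equal length, same start position).
reads = [
    "ACGTACGT",
    "ACGTACGT",
    "ACCTACAT",
    "ACCTACAT",
    "ACGTACGA",  # lone mismatch in the last column, treated as a likely error
]
cols, groups = segregate(reads)
print(cols)
print(groups)
```

A mismatch seen in only one read falls below `min_support` and does not split a group, which mirrors the distinction between base-calling errors and legitimate variation.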

Our current thinking is that comparative assembly offers the most promising approach to tackle this problem. In this procedure, sequence reads are aligned to a reference genome rather than being assembled de novo using the standard overlap-layout-consensus paradigm. Quasispecies sequences obviously are too diverse to align to any single reference genome, so instead we propose to modify the comparative assembly method with a "phylogenetic partitioning" step: input reads would be aligned initially to a representative sequence from each major clade. Each group of reads would then be realigned to subtypes of said clade, etc. Some initial work by believe it might help to segregate the reads into groups that approximately represent their parent genomes, where the final assembly can occur.
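
The "phylogenetic partitioning" step could be sketched roughly as below. This is only an illustration under simplifying assumptions: reads are scored by ungapped mismatch count against each representative sequence (a real implementation would use a proper aligner), every read is assumed to be no longer than any reference, and the function names, the nested `tree` layout, and the toy sequences are all hypothetical.

```python
def best_mismatches(read, reference):
    """Fewest mismatches of read against any ungapped window of reference.
    Assumes len(read) <= len(reference)."""
    windows = range(len(reference) - len(read) + 1)
    return min(
        sum(a != b for a, b in zip(read, reference[start:start + len(read)]))
        for start in windows
    )

def partition(reads, tree):
    """Recursively assign reads to the closest node of a reference tree.

    tree maps a label to (representative_sequence, subtree), where subtree
    has the same shape and is empty at the leaves."""
    bins = {}
    for read in reads:
        label = min(tree, key=lambda name: best_mismatches(read, tree[name][0]))
        bins.setdefault(label, []).append(read)
    result = {}
    for label, members in bins.items():
        subtree = tree[label][1]
        if subtree:
            # Realign this group of reads to the subtypes of its clade.
            for sublabel, subreads in partition(members, subtree).items():
                result[label + "/" + sublabel] = subreads
        else:
            result[label] = members
    return result

# Toy reference tree: two clades, one of which has two subtypes.
tree = {
    "A": ("AAAACCCCGGGG", {
        "A1": ("AAAACCCCGGGG", {}),
        "A2": ("AAAACCTCGGGG", {}),
    }),
    "B": ("TTTTGGGGCCCC", {}),
}
reads = ["AACCCC", "CCTCGG", "TTGGGG"]
assigned = partition(reads, tree)
print(assigned)
```

Each resulting bin would then be handed to a conventional comparative assembly against its own reference, which is where the final assembly described above would take place.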

Reference and Test Data

Currently we have obtained GS20 sequence reads from overlapping PCR products spanning the entire HIV genomes of two individuals. We propose to develop the assembly strategy and methodology using the GS20 sequence reads from the overlapping PCR products spanning the entire HIV genomes of these two individuals. To aid the testing and validation of our methodology, the NML HIV and Human Genetics Laboratory has PCR products of the HIV gag region (including part of 5'-LTR and part of protease, ~2kb in length) and fully sequenced clones (30 to 90 clones per sample) of the same PCR products from more than 200 patient samples. These samples represent diverse HIV subtypes, from mostly clade A, D, C and recombinant subtypes and were sequenced using standard Sanger sequencing methodology. The GS20 methods and strategy we develop will be tested by sequencing the same PCR products from HIV gag region using the GS20 sequencer. The assembled quasispecies using the developed GS20 methods can then be compared and validated with the cloned sequences.

July 09, 2006, at 04:01 PM by Gary Van Domselaar - Q Assembler Home
Changed lines 1-2 from:

The Goal of the Q Assembler project is to create an assembler for assembling multiple quasispecies genomes sequenced simulataneously using the [http://www.454.com [454 Life Sciences]]' GS20 sequencer. Q Assembler is a collaboration between the Public Health Agency of Canada's National Microbiology Laboratory, The Michael Smith Genome Sciences Centre, the University of Manitoba, and the University of Baltimore.

to:

The Goal of the Q Assembler project is to create an assembler for assembling multiple quasispecies genomes sequenced simulataneously using the 454 Life Sciences' GS20 sequencer. Q Assembler is a collaboration between the Public Health Agency of Canada's National Microbiology Laboratory, Canada's Michael Smith Genome Sciences Centre, the University of Manitoba, the University of Baltimore, and 454 Life Sciences.

July 09, 2006, at 03:42 PM by Gary Van Domselaar -
Changed line 1 from:

The Goal of the Q Assembler project is to create an assembler for assembling multiple quasispecies genomes sequenced simulataneously using the 454 Life Sciences?' GS20 sequencer. Q Assembler is a collaboration between the Public Health Agency of Canada's National Microbiology Laboratory, The Michael Smith Genome Sciences Centre, the University of Manitoba, and the University of Baltimore.

to:

The Goal of the Q Assembler project is to create an assembler for assembling multiple quasispecies genomes sequenced simulataneously using the [http://www.454.com [454 Life Sciences]]' GS20 sequencer. Q Assembler is a collaboration between the Public Health Agency of Canada's National Microbiology Laboratory, The Michael Smith Genome Sciences Centre, the University of Manitoba, and the University of Baltimore.

July 09, 2006, at 03:41 PM by Gary Van Domselaar -
Added line 1:

The Goal of the Q Assembler project is to create an assembler for assembling multiple quasispecies genomes sequenced simulataneously using the 454 Life Sciences?' GS20 sequencer. Q Assembler is a collaboration between the Public Health Agency of Canada's National Microbiology Laboratory, The Michael Smith Genome Sciences Centre, the University of Manitoba, and the University of Baltimore.