BIRCH/New Applications under consideration

From Bioinformatics.Org Wiki

General/Collections

Protein

Genome Assembly

Pre-processing

Quality control and assessment

Error correction

Web site with links to error correction tools - https://omictools.com/error-correction-category

Removal of non-paired reads from paired files

Sometimes one read of a pair is lost when trimming or quality correction is done. For example, if one of the two reads is too short after trimming, it might be deleted from one file while its mate is left in the other. Some assembly programs fail if even a single unpaired read is found (e.g. rnaspades).
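
The filtering step itself can be sketched in plain Python. This is only a sketch under several assumptions: both files are uncompressed FASTQ with exactly 4 lines per read, mates share a read name apart from an optional /1 or /2 suffix, and all read names fit in memory. The function names (read_fastq, read_id, keep_paired) are made up for illustration and do not come from any existing package.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples, assuming 4 lines per read."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a blank line, which this sketch treats as the end)
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def read_id(header):
    """Reduce a header line to a name shared by both mates (assumed naming scheme)."""
    name = header[1:].split()[0]  # drop the leading '@' and any comment field
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def keep_paired(left_in, right_in, left_out, right_out):
    """Write only reads whose name occurs in both inputs; return the number of pairs kept."""
    right_ids = {read_id(rec[0]) for rec in read_fastq(right_in)}
    right_in.seek(0)  # second pass over the right file; requires a seekable handle
    shared = set()
    for rec in read_fastq(left_in):
        rid = read_id(rec[0])
        if rid in right_ids:
            shared.add(rid)
            left_out.write("\n".join(rec) + "\n")
    for rec in read_fastq(right_in):
        if read_id(rec[0]) in shared:
            right_out.write("\n".join(rec) + "\n")
    return len(shared)


# Tiny in-memory demonstration: r1 has lost its mate, r2 is still paired.
left = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r2/1\nGGCC\n+\nIIII\n")
right = io.StringIO("@r2/2\nTTAA\n+\nIIII\n")
left_kept, right_kept = io.StringIO(), io.StringIO()
pairs = keep_paired(left, right, left_kept, right_kept)
print(pairs)  # 1
```

Reads are written in their original order, so the two output files stay in step as long as the inputs list the surviving mates in the same relative order.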

Since FASTQ read files normally have 4 lines per read, a crude way to count the reads in a file is 'wc -l': the number of reads is the number of lines divided by 4. The left and right read files of a pair should contain exactly the same number of reads.
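
The same check can be written in Python, which also flags files whose line count is not a multiple of 4. This is a sketch that assumes 4-line FASTQ records; count_reads and same_read_count are hypothetical names.

```python
def count_reads(lines):
    """Count reads in an iterable of FASTQ lines (e.g. an open file), assuming 4 lines per read."""
    n_lines = sum(1 for _ in lines)
    if n_lines % 4 != 0:
        raise ValueError(f"{n_lines} lines is not a multiple of 4")
    return n_lines // 4


def same_read_count(left_path, right_path):
    """Return (left_count, right_count, counts_match) for a pair of read files."""
    with open(left_path) as left, open(right_path) as right:
        n_left, n_right = count_reads(left), count_reads(right)
    return n_left, n_right, n_left == n_right
```

Equal counts are necessary but not sufficient: two files can hold the same number of reads and still each contain orphans.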

I have tried several programs for removing non-paired reads, so far without success:

Assemblers

References

Assembly viewers and Quality Assessment

Post processing

Genome annotation and visualization

Ekblom R, Wolf JBW (2014) A field guide to whole-genome sequencing, assembly and annotation http://onlinelibrary.wiley.com/doi/10.1111/eva.12178/full

Annotation formats and software

Annotation

Pipelines

RNA annotation

Visualization

PathVisioRPC - An XMLRPC interface for PathVisio. In other words, an API for data visualization. Bindings for many languages, including Python, Java and R. http://www.biomedcentral.com/1471-2105/16/267?utm_campaign=BMC24047B&utm_medium=BMCemail&utm_source=Teradata

misFinder - identify mis-assemblies in an unbiased manner using reference and paired-end reads http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-015-0818-3

Comparative Genomics

qod - An alternative approach to multiple genome comparison http://doi.org/10.1093/nar/gkr177

Dotplots

Comparison viewers

Genome Re-sequencing and genotyping

GATK - Genome Analysis Toolkit (Broad Institute of MIT and Harvard)

Gene Expression/Transcriptome Analysis

Pathway analysis


RNA programs

Multiple sequence alignment

Feature annotation

Nice try, but no cigar:

Pattern recognition and detection

Genome Editing/CRISPR

Cloning

Basic Restriction Enzyme Tasks in BioLegato

Can we implement features that persist from step to step? Look at the various file formats eg. SFF. Some of these may be a way to preserve feature annotation without creating a GenBank flat file.

Examples of tasks:

BioPython contains a package called Restriction. It appears to provide classes for restriction enzymes that work with Seq objects and could handle many of these tasks.

If we use the Restriction classes, it might be useful to write new classes as extensions of the existing ones. That way, the new classes could be contributed back to BioPython.

It might be tempting to think that much of the above could be accomplished by running BACHREST and DIGEST from wrappers. However, we have to concede that while these are well-written programs, they are not worth the effort of supporting as Pascal code. A better way is to leverage the BioPython code for what it can do, and adapt the logic from BACHREST and DIGEST to handle the downstream fragment tasks.
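
To give a sense of how little code the fragment logic might need, here is a sketch in plain Python of digesting a linear or circular sequence once the cut positions are known. It is not taken from BACHREST or DIGEST and does not use BioPython. The function names are hypothetical, only the top strand is searched, and sites that span the origin of a circular sequence are not detected. EcoRI (G^AATTC) serves as the example enzyme.

```python
import re


def find_cut_positions(seq, site, cut_offset):
    """Return 0-based indices of the first base after each top-strand cut.

    cut_offset is the number of bases of the site that lie before the cut,
    e.g. 1 for EcoRI (G^AATTC).
    """
    pattern = "(?=" + re.escape(site.upper()) + ")"  # lookahead also finds overlapping sites
    return [m.start() + cut_offset for m in re.finditer(pattern, seq.upper())]


def digest(seq, cuts, circular=False):
    """Split seq at the given cut positions and return the fragments as strings."""
    if not seq or not cuts:
        return [seq]
    if circular:
        # n distinct cuts in a circle give n fragments; the last one wraps past the origin
        cuts = sorted({c % len(seq) for c in cuts})
        frags = [seq[a:b] for a, b in zip(cuts, cuts[1:])]
        frags.append(seq[cuts[-1]:] + seq[:cuts[0]])
        return frags
    bounds = [0] + sorted(set(cuts)) + [len(seq)]
    # drop empty fragments that arise when a cut falls exactly at an end
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]


seq = "AAAGAATTCAAAGAATTCTTT"
cuts = find_cut_positions(seq, "GAATTC", 1)
print(cuts)                              # [4, 13]
print(digest(seq, cuts))                 # ['AAAG', 'AATTCAAAG', 'AATTCTTT']
print(digest(seq, cuts, circular=True))  # ['AATTCAAAG', 'AATTCTTTAAAG']
```

Keeping the site search separate from the fragment step means the cut positions could equally come from BioPython's Restriction classes, with only the downstream fragment handling written locally.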

Packages for cloning tasks

Package: Serial Cloner
URL: http://serialbasics.free.fr/Home/Home.html
Platforms: Mac, Windows, Linux
Comments: Looks like a nice GUI. Not as thoroughly tested on Linux as on Mac and Windows.

Discussions of previous versions suggest that critical cloning functions may not work on Linux. See http://serialbasics.free.fr/forum/

Confirmed: it is impossible to select restriction sites or features in the Construct menu. Without these, no cloning is possible. The current version, 2.6.1, came out in 2013, and there seems to be no commitment to fix these long-standing bugs. Serial Cloner on Linux is therefore considered useless unless the developers decide to fix them.

Installation is also a problem. Serial Cloner is fussy about where it is launched from: the binary needs to be in the same directory as the rest of the package, and launching it through a symbolic link fails because it cannot find its libraries. There is also no mechanism for specifying an input file on the command line.

Package: UGENE
URL: http://ugene.unipro.ru
Platforms: Mac, Windows, Linux
Comments: Uses a lot of existing software (e.g. MUSCLE, BLAST, PRIMER3) with its own interface. Has some NGS tools in it (e.g. Velvet).

Package: ApE - A Plasmid Editor
URL: http://biologylabs.utah.edu/jorgensen/wayned/ape/
Platforms: OSX, Windows. Adaptable to Linux?
Comments: Appears to be compiled, not a Java application. The last Linux version was released in 2009.

Genetic Mapping/Molecular Markers

Microsatellites/SSRs

SNP/GWAS

blmarker

Genetic Mapping

Packages to look at:


The Laboratory of Statistical Genomics at Rockefeller University maintains what seems to be an up-to-date list of genetic analysis software.

Phylogeny

Maybe it's time to phase out Phylip.

Under consideration

Phylogenomics

Probably not
