BM 20160630

Thu Jun 30 10:46:37 CDT 2016 3662453b2f7e372633c5fe9ccd7e13d15611631c

Combo Benchmark

Tree structure: from simple to complex

Simple

Complex

Sample data

We simulate Brownian motion on the tree as \(X \sim N_R(0,\Sigma)\), following Matthew’s lecture notes. There are two ways to simulate it and we implemented both; the benchmark on this page uses the first: direct sampling rather than generating from the covariance. \(\Sigma_{ij} = \mathrm{Cov}(X_i, X_j)\) is fully determined by the tree structure: it is the distance from the root to the MRCA of \(X_i\) and \(X_j\). Unlike other tree simulations, we allow observations at any position on the tree: on leaves, on internal nodes, and between nodes, as shown in the graph above.
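The two simulation flavors can be sketched as follows. This is a minimal illustration, not the benchmark's own code: the 5-node tree, its branch lengths, and all function names are made up for the example, and a unit-rate Brownian motion is assumed (increment variance equals branch length). An observation between two nodes could be handled in the same way by adding an extra node that splits the branch.

```python
import numpy as np

# Hypothetical tree (an assumption for illustration): node 0 is the root,
# parent[i] is the parent of node i, length[i] is the branch length into node i.
parent = [-1, 0, 0, 1, 1]
length = [0.0, 1.0, 2.0, 0.5, 1.5]


def path_to_root(node):
    """Return the nodes from `node` up to and including the root."""
    path = []
    while node != -1:
        path.append(node)
        node = parent[node]
    return path


def depth(node):
    """Distance from the root to `node` (sum of branch lengths on the path)."""
    return sum(length[k] for k in path_to_root(node))


def tree_covariance():
    """Sigma[i, j] = distance from the root to the MRCA of nodes i and j."""
    n = len(parent)
    sigma = np.zeros((n, n))
    for i in range(n):
        ancestors_i = set(path_to_root(i))
        for j in range(n):
            # Walking up from j, the first node that is also an ancestor of i
            # (or i itself) is the MRCA.
            mrca = next(k for k in path_to_root(j) if k in ancestors_i)
            sigma[i, j] = depth(mrca)
    return sigma


def sample_direct(rng, n_features):
    """Flavor 1: walk down the tree, adding N(0, branch length) increments."""
    x = np.zeros((len(parent), n_features))
    for node in range(1, len(parent)):  # parents are listed before children
        step = rng.normal(0.0, np.sqrt(length[node]), n_features)
        x[node] = x[parent[node]] + step
    return x


def sample_from_covariance(rng, n_features):
    """Flavor 2: draw each feature from N(0, Sigma) built from the tree."""
    sigma = tree_covariance()
    mean = np.zeros(len(parent))
    return rng.multivariate_normal(mean, sigma, size=n_features).T
```

Both functions return an array with one row per tree node and one column per feature; with enough features, the empirical covariance across rows from either flavor should approach the matrix returned by `tree_covariance()`.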

To generate Poisson data we apply a \(\log\) link, i.e. the simulated values are used as the log of the Poisson rates.
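A minimal sketch of the log link, assuming the simulated values are stored in a NumPy array (the function name `poisson_counts` is hypothetical):

```python
import numpy as np


def poisson_counts(x, rng):
    """Treat x as log rates: draw counts ~ Poisson(exp(x)) elementwise."""
    rate = np.exp(x)
    return rng.poisson(rate)


rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(5, 3))  # stand-in for Brownian motion values
counts = poisson_counts(x, rng)
```

The result has the same shape as `x` and holds non-negative integer counts.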

Noise

Three types of noise are simulated:

  • Noisy samples, \(\tilde{X}_{ij} = X_{ij} + \epsilon_{ij}\), \(\epsilon_{ij} \sim N(0, \sigma^2)\)
  • Noisy features, \(\tilde{X} = [X, E]\), \(E\sim MVN(0, \sigma^2 I)\)
  • Combination of the two above
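The three noise types above might be generated along these lines. This is only a sketch: the function names and the `n_extra` parameter (the number of pure-noise columns appended) are assumptions, since the notes do not say how many noise features are added.

```python
import numpy as np


def add_sample_noise(x, sigma, rng):
    """Noisy samples: add independent N(0, sigma^2) noise to every entry."""
    return x + rng.normal(0.0, sigma, size=x.shape)


def add_noise_features(x, n_extra, sigma, rng):
    """Noisy features: append n_extra columns of pure N(0, sigma^2) noise."""
    e = rng.normal(0.0, sigma, size=(x.shape[0], n_extra))
    return np.hstack([x, e])


def add_both(x, n_extra, sigma, rng):
    """Combination: perturb the samples, then append the noise features."""
    return add_noise_features(add_sample_noise(x, sigma, rng), n_extra, sigma, rng)
```

Here `x` holds one sample per row and one feature per column.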

Methods

  • Phylogenetic tree methods:
  • Global distances:
    • PCA (covariance)
    • Sparse PCA
    • MDS (dissimilarities)
  • Local distances:
    • t-SNE (t-distributed Stochastic Neighbor Embedding)
    • Spectral embedding
    • Locally linear embedding: including standard, LTSA, Hessian and modified LLE
    • LTSA: Local Tangent Space Alignment
  • Factor analysis:
    • SFA
    • CountCluster
    • Flash
    • PMD
    • Positive PMD
  • State of the art:
    • Minimal spanning tree (MST), with different layout on paper.
  • Combo methods:
    • Factor analysis (denoise) + MST

We have 18 methods and 7 combos, though not all of them are interesting.
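To illustrate the idea behind a combo method, the sketch below denoises the data with a factor model and then builds an MST on the factor scores. It uses scikit-learn's `FactorAnalysis` and SciPy's `minimum_spanning_tree` as stand-ins; the benchmark's own factor analysis methods (SFA, Flash, PMD, etc.) and its MST implementation are different tools, and the function name `denoise_then_mst` is hypothetical.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform
from sklearn.decomposition import FactorAnalysis


def denoise_then_mst(x, n_factors):
    """Fit a factor model, then return MST edges between samples.

    x has one sample per row. Edges are (i, j) index pairs computed from
    Euclidean distances between the samples' factor scores.
    """
    fa = FactorAnalysis(n_components=n_factors, random_state=0)
    scores = fa.fit_transform(x)          # low-dimensional, denoised view
    dist = squareform(pdist(scores))      # dense pairwise distance matrix
    # Note: zero entries are treated as "no edge", so exact duplicate
    # samples would need special handling.
    mst = minimum_spanning_tree(dist)
    rows, cols = mst.nonzero()
    return sorted(zip(rows.tolist(), cols.tolist()))


rng = np.random.default_rng(0)
x = rng.normal(size=(20, 10))             # stand-in data, 20 samples
edges = denoise_then_mst(x, n_factors=3)
```

For `n` distinct samples the returned list has `n - 1` edges, which can then be drawn with any graph layout.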

Implementation

All in one DSC script:

dsc exec benchmark.dsc -j8 -s \
    "(ToyTreeBM, SimTree * PlotTree * SimBM) * \
    (SklearnDR * PlotPY, \
    MST * PlotR[1], \
    (RDR[2], RDR[3]) * PlotR[2], \
    (RDR[1], RDR[4], RDR[5], RDR[6]) * PlotR[3], \
    RDR[7] * PlotR[4], \
    (RDR[8], RDR[9]) * PlotR[5], \
    (RDR[1], RDR[2], RDR[3], RDR[4], RDR[5], RDR[6], RDR[7]) * MST * PlotR[1])"

4285 figures are generated from this benchmark.

Result

The results are written into this one HTML page. Uninteresting benchmarks have not been pruned yet.