BM 20160630
Thu Jun 30 10:46:37 CDT 2016 3662453b2f7e372633c5fe9ccd7e13d15611631c
Combo Benchmark
Sample data
We simulate Brownian motion (BM) data \(X \sim N_R(0,\Sigma)\) following Matthew’s lecture notes. There are two ways to simulate it and we implemented both; the benchmark on this page uses the first, direct sampling, rather than drawing from the covariance matrix. The covariance \(\Sigma_{ij} = \mathrm{Cov}(X_i, X_j)\) is fully determined by the tree structure: it is the distance from the root to the most recent common ancestor (MRCA) of \(X_i\) and \(X_j\). Unlike other tree simulations, we allow observations at any position on the tree: on leaves, on internal nodes and in between nodes, as shown in the graph above.
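Both flavors can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the code behind this benchmark: the toy tree, the helper names (`path_to_root`, `bm_covariance`, `sample_direct`, `sample_from_covariance`) and the reading of "direct sampling" as accumulating independent Gaussian increments edge by edge are all illustrative. An in-between observation is modeled by splitting an edge with an extra node (`m` below). Rows are observations and columns are independent features, so each column is one draw from \(N(0, \Sigma)\) with unit-rate BM.

```python
import numpy as np

# Hypothetical toy tree: node -> (parent, branch length to parent).
# "m" is an in-between point on the edge from the root "r" to the internal
# node "a"; "b", "c", "d" are leaves.
TREE = {
    "r": (None, 0.0),
    "m": ("r", 0.5),
    "a": ("m", 0.5),
    "b": ("a", 1.0),
    "c": ("a", 0.5),
    "d": ("r", 2.0),
}


def path_to_root(node, tree):
    """Nodes from `node` up to and including the root."""
    path = []
    while node is not None:
        path.append(node)
        node = tree[node][0]
    return path


def depth(node, tree):
    """Distance from the root to `node`."""
    return sum(tree[n][1] for n in path_to_root(node, tree))


def bm_covariance(obs, tree):
    """Sigma_ij = depth of the MRCA of obs[i] and obs[j] (unit-rate BM)."""
    sigma = np.zeros((len(obs), len(obs)))
    for i, u in enumerate(obs):
        ancestors_u = set(path_to_root(u, tree))
        for j, v in enumerate(obs):
            # Walking up from v, the first node that is also an ancestor of u is the MRCA.
            mrca = next(n for n in path_to_root(v, tree) if n in ancestors_u)
            sigma[i, j] = depth(mrca, tree)
    return sigma


def sample_direct(obs, tree, n_features, rng):
    """Flavor 1 (assumed): add independent N(0, branch length) increments edge by edge."""
    values = {}

    def value(node):
        if node not in values:
            parent, length = tree[node]
            if parent is None:
                values[node] = np.zeros(n_features)  # BM starts at 0 at the root
            else:
                step = rng.normal(0.0, np.sqrt(length), n_features)
                values[node] = value(parent) + step
        return values[node]

    return np.stack([value(n) for n in obs])  # rows: observations, columns: features


def sample_from_covariance(obs, tree, n_features, rng):
    """Flavor 2: draw every feature column from N(0, Sigma) in one shot."""
    sigma = bm_covariance(obs, tree)
    return rng.multivariate_normal(np.zeros(len(obs)), sigma, size=n_features).T


rng = np.random.default_rng(0)
obs = ["m", "a", "b", "c", "d"]
X1 = sample_direct(obs, TREE, n_features=20000, rng=rng)
X2 = sample_from_covariance(obs, TREE, n_features=20000, rng=rng)
# Across features, both empirical covariances approximate bm_covariance(obs, TREE).
print(np.round(np.cov(X1), 1))
```

The direct flavor only ever needs one random draw per edge, while the covariance flavor has to build and factorize the full \(\Sigma\), which grows quadratically with the number of observations.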
To generate Poisson data, we apply a log link: counts are drawn as \(Y_{ij} \sim \mathrm{Poisson}(\exp(X_{ij}))\).
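A log link means \(\log \lambda_{ij} = X_{ij}\), so the Poisson rate is \(\exp(X_{ij})\). A minimal sketch with NumPy; the function name `to_poisson_counts` and the random stand-in for the BM matrix are illustrative assumptions.

```python
import numpy as np


def to_poisson_counts(X, rng):
    """Log link: log(lambda_ij) = X_ij, so Y_ij ~ Poisson(exp(X_ij))."""
    return rng.poisson(np.exp(X))


rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, size=(5, 1000))  # stand-in for a simulated BM matrix
Y = to_poisson_counts(X, rng)              # non-negative integer counts, same shape as X
```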
Noise
Three types of noise are simulated:
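- Noisy samples: \(X_{ij} \leftarrow X_{ij} + \epsilon_{ij}\), with \(\epsilon_{ij} \sim N(0, \sigma^2)\)
- Noisy features: \(X \leftarrow [X, E]\), i.e. pure-noise columns \(E \sim MVN(0, \sigma^2 I)\) are appended
- Both of the above combined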
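The three noise types can be sketched as follows, assuming rows of \(X\) are samples and columns are features. The function names, the use of a single \(\sigma\) for both kinds of noise, and the order of operations in the combined case (perturb entries first, then append noise columns) are assumptions for illustration.

```python
import numpy as np


def noisy_samples(X, sigma, rng):
    """X_ij + eps_ij with eps_ij ~ N(0, sigma^2), independently per entry."""
    return X + rng.normal(0.0, sigma, size=X.shape)


def noisy_features(X, n_noise, sigma, rng):
    """[X, E]: append n_noise pure-noise columns, E ~ MVN(0, sigma^2 I)."""
    E = rng.normal(0.0, sigma, size=(X.shape[0], n_noise))
    return np.hstack([X, E])


def noisy_both(X, n_noise, sigma, rng):
    """Assumed combination: perturb the entries first, then append noise features."""
    return noisy_features(noisy_samples(X, sigma, rng), n_noise, sigma, rng)


rng = np.random.default_rng(0)
X = np.zeros((4, 3))
print(noisy_both(X, n_noise=2, sigma=0.5, rng=rng).shape)  # (4, 5)
```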
Methods
- Phylogenetic tree methods:
  - Neighbor joining
  - FastME
- Global distances:
  - PCA (covariance)
  - Sparse PCA
  - MDS (dissimilarities)
- Local distances:
  - t-SNE (t-distributed Stochastic Neighbor Embedding)
  - Spectral embedding
  - Locally linear embedding (LLE): standard, LTSA, Hessian and modified variants
  - LTSA: Local Tangent Space Alignment
- Factor analysis:
  - SFA
  - CountCluster
  - Flash
  - PMD
  - Positive PMD
- State of the art:
  - Minimum spanning tree (MST), with different layouts on paper
- Combo methods:
  - Factor analysis (denoise) + MST
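The scikit-learn methods in the list (PCA, sparse PCA, MDS, t-SNE, spectral embedding and the four LLE variants) can be run side by side roughly as below. This is a sketch assuming scikit-learn; the settings (`n_neighbors=10`, `perplexity=5`) and the helper `embed_all` are illustrative and not necessarily what the benchmark's Python module uses. Hessian LLE needs `n_neighbors > n_components * (n_components + 3) / 2`, which 10 satisfies for two components.

```python
import numpy as np
from sklearn.decomposition import PCA, SparsePCA
from sklearn.manifold import MDS, TSNE, LocallyLinearEmbedding, SpectralEmbedding


def embed_all(X, n_components=2, n_neighbors=10, seed=0):
    """Return {method name: (n_samples, n_components) embedding} for each method."""
    methods = {
        "PCA": PCA(n_components=n_components),
        "SparsePCA": SparsePCA(n_components=n_components, random_state=seed),
        "MDS": MDS(n_components=n_components, random_state=seed),
        # perplexity must be smaller than the number of samples
        "t-SNE": TSNE(n_components=n_components, perplexity=5, random_state=seed),
        "Spectral": SpectralEmbedding(
            n_components=n_components, n_neighbors=n_neighbors, random_state=seed
        ),
    }
    # The four LLE variants share one estimator, selected by `method`.
    for variant in ("standard", "ltsa", "hessian", "modified"):
        methods[f"LLE-{variant}"] = LocallyLinearEmbedding(
            n_components=n_components,
            n_neighbors=n_neighbors,
            method=variant,
            random_state=seed,
        )
    return {name: est.fit_transform(X) for name, est in methods.items()}


rng = np.random.default_rng(0)
X = rng.normal(size=(40, 20))  # stand-in data: 40 samples, 20 features
embeddings = embed_all(X)
```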
We have 18 methods and 7 combos, though not all of them are interesting.
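Each combo feeds a denoised matrix from a factor-analysis method into an MST. A minimal sketch, swapping in scikit-learn's `FactorAnalysis` as a stand-in for the R methods (SFA, CountCluster, Flash, PMD) and using SciPy's `minimum_spanning_tree` on Euclidean distances; the helper `fa_then_mst` and the low-rank reconstruction step are assumptions, not the benchmark's own code.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform
from sklearn.decomposition import FactorAnalysis


def fa_then_mst(X, n_factors, seed=0):
    """Denoise X with a rank-n_factors factor model, then return MST edges (i, j)."""
    fa = FactorAnalysis(n_components=n_factors, random_state=seed)
    scores = fa.fit_transform(X)                      # (n_samples, n_factors)
    X_denoised = scores @ fa.components_ + fa.mean_  # low-rank reconstruction
    D = squareform(pdist(X_denoised))                # pairwise Euclidean distances
    # Zero off-diagonal distances (duplicate rows) would be treated as missing edges.
    mst = minimum_spanning_tree(D)
    rows, cols = mst.nonzero()
    return sorted((int(min(i, j)), int(max(i, j))) for i, j in zip(rows, cols))


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 50))
edges = fa_then_mst(X, n_factors=3)
print(len(edges))  # 29 for 30 samples when all distances are nonzero
```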
Implementation
The whole benchmark is defined in a single DSC script and run with:
dsc exec benchmark.dsc -j8 -s \
"(ToyTreeBM, SimTree * PlotTree * SimBM) * \
(SklearnDR * PlotPY, \
MST * PlotR[1], \
(RDR[2], RDR[3]) * PlotR[2], \
(RDR[1], RDR[4], RDR[5], RDR[6]) * PlotR[3], \
RDR[7] * PlotR[4], \
(RDR[8], RDR[9]) * PlotR[5], \
(RDR[1], RDR[2], RDR[3], RDR[4], RDR[5], RDR[6], RDR[7]) * MST * PlotR[1])"
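To see which pipelines the `-s` expression describes, it can be expanded by hand. This sketch assumes DSC semantics where `*` chains steps in sequence and `,` lists alternatives; the `seq`/`alt` encoding is a hypothetical Python transcription of the expression, not a DSC API. It ignores each module's parameter settings, so it does not reproduce the figure count.

```python
from itertools import product


def seq(*parts):
    """Steps run one after another (assumed meaning of `*`)."""
    return ("seq", parts)


def alt(*parts):
    """Interchangeable alternatives (assumed meaning of `,`)."""
    return ("alt", parts)


def expand(expr):
    """Return every pipeline (tuple of step names) that `expr` denotes."""
    if isinstance(expr, str):
        return [(expr,)]
    kind, parts = expr
    if kind == "alt":
        return [p for part in parts for p in expand(part)]
    # "seq": pick one pipeline from each part and concatenate them in order.
    return [sum(choice, ()) for choice in product(*(expand(part) for part in parts))]


BENCHMARK = seq(
    alt("ToyTreeBM", seq("SimTree", "PlotTree", "SimBM")),
    alt(
        seq("SklearnDR", "PlotPY"),
        seq("MST", "PlotR[1]"),
        seq(alt("RDR[2]", "RDR[3]"), "PlotR[2]"),
        seq(alt("RDR[1]", "RDR[4]", "RDR[5]", "RDR[6]"), "PlotR[3]"),
        seq("RDR[7]", "PlotR[4]"),
        seq(alt("RDR[8]", "RDR[9]"), "PlotR[5]"),
        seq(alt(*(f"RDR[{i}]" for i in range(1, 8))), "MST", "PlotR[1]"),
    ),
)

pipelines = expand(BENCHMARK)
print(len(pipelines))  # 36: 2 simulation branches x 18 analysis branches
```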
4285 figures are generated from this benchmark.
Result
All results are collected on this single HTML page. Uninteresting benchmarks have not been pruned yet.