BM 20160513

Tue May 17 10:21:53 CDT 2016 0422326fa91564c26ab7be9495f441d41f5cda44

Data All Over Tree: Simulate directly from \(Y \sim MVN(0,\Sigma)\) s.t. \(COV(X_a, X_b)\) is well defined. It is implemented in VcvTree.R.

Got stuck here: simulation thus performed resulted in a non-PD Sigma that cannot be used to generate MVN samples.

The eigen values of the alleged covariance matrix is:

  > eigen(dat$CX)$values
    [1]  1.789970e+02  6.142406e+01  2.486828e+01  1.264927e+01  9.372538e+00
    [6]  8.039250e+00  4.431035e+00  4.015122e+00  3.175564e+00  2.850917e+00
   [11]  2.526600e+00  2.093915e+00  1.498628e+00  1.381368e+00  1.018327e+00
   [16]  9.620487e-01  8.287019e-01  5.970333e-01  5.530209e-01  5.169851e-01
   [21]  4.684536e-01  4.021058e-01  3.682859e-01  3.280622e-01  3.001715e-01
   [26]  2.980192e-01  2.872444e-01  2.720233e-01  2.384146e-01  2.280178e-01
   [31]  2.087687e-01  1.979714e-01  1.847480e-01  1.733342e-01  1.620062e-01
   [36]  1.558518e-01  1.530042e-01  1.388383e-01  1.333259e-01  1.285169e-01
   [41]  1.195560e-01  1.187663e-01  1.186274e-01  1.154278e-01  1.026621e-01
   [46]  9.993302e-02  9.778238e-02  9.159631e-02  8.352311e-02  8.293866e-02
   [51]  8.275724e-02  7.965949e-02  7.748859e-02  7.649784e-02  7.454046e-02
   [56]  7.245887e-02  7.173403e-02  6.729299e-02  6.158358e-02  5.595878e-02
   [61]  5.435904e-02  5.417436e-02  5.222962e-02  5.178642e-02  5.146530e-02
   [66]  4.729646e-02  4.578939e-02  4.337204e-02  4.040028e-02  4.016111e-02
   [71]  3.990430e-02  3.630022e-02  3.580704e-02  3.553088e-02  3.552392e-02
   [76]  3.374089e-02  3.281195e-02  3.242253e-02  3.166035e-02  3.152171e-02
   [81]  3.128494e-02  2.988596e-02  2.849007e-02  2.790111e-02  2.788642e-02
   [86]  2.630267e-02  2.561997e-02  2.447837e-02  2.303578e-02  2.295230e-02
   [91]  2.276409e-02  2.232508e-02  2.169163e-02  2.072082e-02  1.989781e-02
   [96]  1.984872e-02  1.967206e-02  1.885717e-02  1.849818e-02  1.808655e-02
  [101]  1.684668e-02  1.561658e-02  1.516015e-02  1.468929e-02  1.376343e-02
  [106]  1.320884e-02  1.306958e-02  1.252726e-02  1.236774e-02  1.136243e-02
  [111]  1.062538e-02  1.023106e-02  9.389231e-03  9.266986e-03  8.936654e-03
  [116]  8.867800e-03  8.664530e-03  8.287420e-03  7.759390e-03  7.539940e-03
  [121]  7.206487e-03  5.692325e-03  5.071696e-03  4.841683e-03  4.770397e-03
  [126]  4.182213e-03  3.921874e-03  3.568019e-03  3.512833e-03  3.302054e-03
  [131]  3.234418e-03  2.965606e-03  2.450557e-03  2.420028e-03  2.251450e-03
  [136]  2.094887e-03  4.956313e-04  4.948536e-04 -9.316704e-16 -5.687145e-01

But if I only keep a handful of points, say, at steps of 10, it is PD:

  > eigen((dat$CX)[seq(1, nrow(dat$CX), 10), seq(1, ncol(dat$CX), 10)])$values
   [1] 20.2691400  7.5020270  2.9131083  1.6008677  1.4460118  1.3159806
   [7]  0.9280006  0.8912301  0.4820055  0.3641582  0.3362937  0.2537439
  [13]  0.2518024  0.2146302

Is the current definition of covariance between data points on the tree valid? David questions it. He says it makes more sense to use covariance definition similar to those in special statistics. He suggested I check a simple scenario where every data point comes from a “line”:

  n = 10
  dat = matrix(0,n,n)
  for (i in 1:n) {
      for (j in 1:n) dat[i,j] = min(i,j)
  }

But the result looks fine:

  > dat
        [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
   [1,]    1    1    1    1    1    1    1    1    1     1
   [2,]    1    2    2    2    2    2    2    2    2     2
   [3,]    1    2    3    3    3    3    3    3    3     3
   [4,]    1    2    3    4    4    4    4    4    4     4
   [5,]    1    2    3    4    5    5    5    5    5     5
   [6,]    1    2    3    4    5    6    6    6    6     6
   [7,]    1    2    3    4    5    6    7    7    7     7
   [8,]    1    2    3    4    5    6    7    8    8     8
   [9,]    1    2    3    4    5    6    7    8    9     9
  [10,]    1    2    3    4    5    6    7    8    9    10
  > eigen(dat)$values
   [1] 44.7660687  5.0489173  1.8730231  1.0000000  0.6431041  0.4652331
   [7]  0.3662089  0.3079785  0.2737868  0.2556796

The question is, can we prove the covariance matrix thus obtained is PD? Maybe my simulation code is wrong (which I hope!).