###################################################################################
#
# LASTPIECE_P (Local Alignment-STate Probabilities that Insertion-type and dEletion-type gaps Co-Exist, type P):
#   A package of programs to compute the probabilities of gapped segments in each of which an insertion-type gap and a deletion-type gap co-exist, (which were referred to as case-(iv) gapped segments by (Ezawa 2016a)),
#   under a stochastic model of sequence evolution with biologically realistic insertions/deletions.
#   (Written almost exclusively in Perl.)
#
# Version 0.3: Copyright (C) 2020 Kiyoshi Ezawa
#
# This package is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This package is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License, "GNU_GPL.txt",
# along with this package. If not, see .
#
# The author can be contacted by e-mailing to
# (replace " dot " and " at " with "." and "@", respectively).
#
###################################################################################
#

* [ Major modification from version 0.3 to version 0.3.1 ]
  * Information on the references, Ezawa 2020a,b,c, was updated.

* [ This file was created while finishing version 0.3 of the LASTPIECE_P package. (See the bottom.)
]
---------------------------------------------------------------------------------------------

<< README for the archive, "ExOutputs_LASTPIECE.ver0.3.tgz" >>

This archive (or the directory, "ExOutputs_LASTPIECE.ver0.3/," extracted from the archive), which accompanies the "LASTPIECE_P" package, contains some example outputs obtained using either the main scripts of "LASTPIECE_P" (ver. 0.3) or the files in the "ANALYSES/" directory of the "LASTPIECE_P" package (ver. 0.3) (Ezawa 2020a).
You can either just refer to these files, or compare them with the results of your own runs of either the main scripts or the scripts provided in "ANALYSES/" in "ANEX_P."

This archive (or directory) contains two main sub-directories, "ANALYSES/" and "Outputs_LASTPIECE.ver0.3/."

(1) The "ANALYSES/" sub-directory.

This sub-directory has a directory structure nearly identical to that of "ANALYSES/" in "LASTPIECE_P" (ver. 0.3), except that the former lacks the "Inputs/" sub-sub-directory.
(For explanations on the latter, refer to the "ANALYSES/README.ANALYSES.txt" file in "LASTPIECE_P" (ver. 0.3).)

Under this sub-directory, every text file (except log files) shows the results of an analysis, and every log file provides important information on the parameter settings or other circumstances under which the analysis was performed.
If you are unsure about the contents of these text files, referring to the nearby log files, and then to the scripts that output the files, may answer your questions.

[NOTE] There are no Excel spreadsheets summarizing the results of these analyses; the only summary results are those described in (Ezawa 2020a).
I apologize to the users for the inconvenience this may cause.

(2) The "Outputs_LASTPIECE.ver0.3/" sub-directory.

This sub-directory contains two archives, "Example_Outputs_UBC100_lite.tgz" and "Example_Outputs_UBC150_lite.tgz."
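Before unpacking, the two archives named above can be inspected with Python's standard "tarfile" module. The following is only a minimal sketch, not part of the LASTPIECE_P package: the function name "count_by_subdirectory" is hypothetical, and the default depth=1 assumes that each archive unpacks into a single root directory (pass depth=0 if it does not).

```python
import sys
import tarfile
from collections import Counter


def count_by_subdirectory(archive_path, depth=1):
    """Count regular files per sub-directory inside a gzipped tar archive.

    `depth` is the index of the path component used as the grouping key.
    depth=1 assumes a single root directory in the archive (an assumption,
    not something stated in this README).
    """
    counts = Counter()
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, etc.
            parts = member.name.strip("/").split("/")
            if parts and parts[0] == ".":
                parts = parts[1:]  # tolerate "./"-prefixed member names
            # Files sitting directly at the grouping level have no sub-directory.
            key = parts[depth] if len(parts) > depth + 1 else "(top level)"
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Usage (hypothetical): python list_archive.py Example_Outputs_UBC100_lite.tgz
    for path in sys.argv[1:]:
        print(path)
        for name, n_files in sorted(count_by_subdirectory(path).items()):
            print(f"  {name}: {n_files} file(s)")
```

Grouping by one path component keeps the listing short even when an archive holds many result files.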
Each of the archives contains a large log-file (recording the parameters and some diagnostic outputs) and the following sub-sub-directories.

+ "Basic/," which contains basic "ingredients" of the case-(iv) multiplication factors (including the case-(i), (ii) and (iii) multiplication factors).
+ "Final/," which contains the final results, that is, the tables of multiplication factors of case-(i) through (iv) gap-configurations; they can be fed into other packages, such as ANEX (Ezawa 2020b), LOLIPOG (Ezawa 2013b, Ezawa 2016c), and ComplLimMent (Ezawa 2016a).
+ "PreFinal/," in which the "pre-final" results from all evolution patterns of "A(ncestor)"/"D(escendant)" segments are gathered, along with the corresponding log-files.
+ "Summations/," which contains the results of summing all 2nd-order terms and summing all 3rd-order terms, as well as the results of summing them, along with the corresponding log-files.

In a sense, these files are all "SUMMARIES" of the results.
Among them, the files in "Summations/" can be regarded as the "SUMMARIES OF SUMMARIES"; they were actually used as inputs of some analyses recorded in the "ANALYSES/" sub-directory.

Incidentally, the parameter settings are as follows.

[Shared by both archives]
  {time-lapse} (= $TIME_FIN - $TIME_INIT) = 1 (substitutions/site) .
  #{Sub-time-intervals} = 100 .
  {Upper-bound of #{sites} for output} = 100 .
  ($TOTRATE_INS, $TOTRATE_DEL) = (0.1, 0.1) (indels/substitution) .
  Length distributions for insertions and deletions: power-law (with exponent = -1.6) for both.
  Cut-off lengths for insertions and deletions: 100 (sites) for both.

[Specific to each archive]
  {Upper-bound of #{sites} for computation} = 100 for "Example_Outputs_UBC100_lite.tgz,"
                                            = 150 for "Example_Outputs_UBC150_lite.tgz."

-------------------------------------------------------------------------------------------------------------

[ References ]

* Cartwright RA. 2005. "DNA assembly with gap (Dawg): simulating sequence evolution."
Bioinformatics 21:iii31-iii38.
* Ezawa K. 2013a. "DENSERM: DEtecting Negative SElection on Recurrent Mutations," in Bioinformatics.org [URL: "http colon slash slash www.bioinformatics.org slash ftp slash pub slash DENSERM" (replace ' colon ' and ' slash ' with ':' and '/', respectively)].
* Ezawa K. 2013b. "LOLIPOG: LOg-LIkelihood for the Pattern Of Gaps in MSA," in Bioinformatics.org [URL: "http colon slash slash www.bioinformatics.org slash ftp slash pub slash lolipog" (replace ' colon ' and ' slash ' with ':' and '/', respectively)].
* Ezawa K. 2016a. "Characterizing multiple sequence alignment errors using complete-likelihood score and position-shift map." BMC Bioinformatics 17:133; DOI: 10.1186/s12859-016-0945-5.
* Ezawa K. 2016b. "General continuous-time Markov model of sequence evolution via insertions/deletions: Are alignment probabilities factorable?" BMC Bioinformatics 17:304; DOI: 10.1186/s12859-016-1105-7.
* Ezawa K. 2016c. "General continuous-time Markov model of sequence evolution via insertions/deletions: local alignment probability computation." BMC Bioinformatics 17:397; DOI: 10.1186/s12859-016-1167-6.
* Ezawa K, Landan G, Graur D. 2013. "Detecting negative selection on recurrent mutations using gene genealogy." BMC Genetics 14:37.
* Ezawa K, Graur D, Landan G. 2015a. "Perturbative formulation of general continuous-time Markov model of sequence evolution via insertions/deletions, Part III: Algorithm for first approximation." bioRxiv doi:10.1101/023614.
## * Ezawa K, Graur D, Landan G. 2015b. "Perturbative formulation of general continuous-time Markov model of sequence evolution via insertions/deletions, Part IV: Incorporation of substitutions and other mutations." bioRxiv doi:10.1101/023622.
* Ezawa K. 2020a.
"New perturbation method to compute probabilities of mutually adjoining insertion-type and deletion-type gaps in ancestor-descendant pairwise sequence alignment under genuine sequence evolution model with realistic insertions/deletions: the 'last piece of the puzzle'." (preprint "KEZW_BI_ME00005.lastpiece.pdf" available at: https://www.bioinformatics.org/ftp/pub/anex/Documents/Preprints/.)
* Ezawa K. 2020b. "Alignment Neighborhood EXplorer (ANEX): First attempt to apply genuine sequence evolution model with realistic insertions/deletions to Multiple Sequence Alignment reconstruction problem." (preprint "KEZW_BI_ME00006.anex.pdf" available at: https://www.bioinformatics.org/ftp/pub/anex/Documents/Preprints/.)
* Ezawa K. 2020c. "Substitutional Residue-Difference Map (SRD Map) to help locate mis-alignments in Multiple Sequence Alignment (MSA): toward Artificial-Intelligence-assisted probability distribution of alternative MSAs." (preprint "KEZW_BI_ME00007.srdmap.pdf" available at: https://www.bioinformatics.org/ftp/pub/anex/Documents/Preprints/.)

# First version of this file was created on August 12th (Wed), 2020 by K. Ezawa.
# It was rewritten on August 13th (Thu), 2020, by K. Ezawa, to update information on (Ezawa 2020a,b,c).
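
[ Illustration of the indel parameter settings ]

The indel settings listed under "[Shared by both archives]" above (total rates of 0.1 indels/substitution, power-law length distributions with exponent -1.6, cut-off length 100 sites) can be turned into per-length rates as in the Python sketch below. It is not taken from the LASTPIECE_P scripts, which are written in Perl. It assumes that "power-law" means the probability of an indel of length L is proportional to L^(-1.6) for L = 1, ..., 100 and is normalized to sum to 1 over that range; the function names are hypothetical.

```python
def power_law_length_probs(exponent=-1.6, cutoff=100):
    """Return [P(1), ..., P(cutoff)] with P(L) proportional to L**exponent.

    Assumption: the distribution is truncated at `cutoff` and renormalized.
    """
    weights = [length ** exponent for length in range(1, cutoff + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def per_length_rates(total_rate, probs):
    """Split a total indel rate (indels/substitution) over indel lengths."""
    return [total_rate * p for p in probs]


if __name__ == "__main__":
    probs = power_law_length_probs(exponent=-1.6, cutoff=100)
    ins_rates = per_length_rates(0.1, probs)  # mirrors $TOTRATE_INS = 0.1
    del_rates = per_length_rates(0.1, probs)  # mirrors $TOTRATE_DEL = 0.1
    for length in (1, 2, 10, 100):
        print(f"length {length:3d}: P = {probs[length - 1]:.6f}, "
              f"insertion rate = {ins_rates[length - 1]:.6f}")
```

Under this reading, the per-length rates of each indel type sum to the corresponding total rate, and the ratio P(2)/P(1) equals 2^(-1.6).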