ossytrai8p
samolot
Joined: 14 Sep 2010
Posts: 1149
Read: 0 topics
Warnings: 0/10 From: England
Posted: Mon 3:06, 29 Nov 2010 Post subject: puma ferrari 2010 2PuMA Bayesian analysis of part
As empiricists have faced a rapidly increasing pool of models from which to choose, many studies have explored objective methods for model choice (Minin et al., 2003; Posada and Buckley, 2004; Sullivan and Joyce, 2005). However, far less attention has been paid to whether the best model adequately accounts for the processes important in generating a given dataset. This paucity of interest has persisted even though such approaches were developed over 15 years ago (Goldman, 1993). One hindrance to the widespread use of model adequacy tests is a lack of software able to perform such tests for recently developed models that incorporate heterogeneity in process across sites, although model adequacy tests that include heterogeneity in rates can be performed in MAPPS (Bollback, 2002). Here, we describe PuMA, software that implements tests of model adequacy in a Bayesian framework using posterior predictive simulation (Bollback, 2002). PuMA allows model adequacy tests to be performed for partitioned and mixture models of DNA sequence evolution. PuMA will facilitate much broader application of posterior predictive simulation tests of model adequacy, including much-needed benchmarking.
Table 1.
Probabilistic approaches to phylogenetic inference require the specification of explicit models of sequence evolution. The dependence of resulting phylogenetic estimates on the underlying model of sequence evolution is well established (Lemmon and Moriarty, 2004; Swofford et al., 2001; Yang et al., 1994). Much work has been done to develop models of sequence evolution that incorporate the intricacies of the evolutionary process that are important in empirical datasets [see Swofford et al. (1996) and references therein]. In particular, approaches that incorporate heterogeneity in the evolutionary process across sites have recently received much attention (Nylander et al., 2004; Pagel and Meade, 2004).
© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: [link visible to logged-in users]
Bioinformatics (2009) 25 (4): 537-538. doi: 10.1093/bioinformatics/btn651. First published online: December 19, 2008.
REFERENCES
Bollback JP. Bayesian model adequacy and choice in phylogenetics. Mol. Biol. Evol. 2002;19:1171-1180.
Gamerman D. Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference. New York: Chapman and Hall; 1997.
Gelman A, et al. Bayesian Data Analysis. New York: Chapman and Hall; 1995.
Goldman N. Statistical tests of models of DNA substitution. J. Mol. Evol. 1993;36:182-198.
Huelsenbeck JP, Ronquist F. MrBayes: Bayesian inference of phylogenetic trees. Bioinformatics 2001;17:754-755.
Hugall AF, et al. Calibration choice, rate smoothing, and the pattern of tetrapod diversification according to the long nuclear gene RAG-1. Syst. Biol. 2007;56:543-563.
Lemmon AR, Moriarty EC. The importance of proper model assumption in Bayesian phylogenetics. Syst. Biol. 2004;53:265-277.
Li C, et al. Optimal data partitioning and a test case for ray-finned fishes (Actinopterygii) based on ten nuclear loci. Syst. Biol. 2008;57:519-539.
Minin V, et al. Performance-based selection of likelihood models for phylogeny estimation. Syst. Biol. 2003;52:674-683.
Nylander JAA, et al. Bayesian phylogenetic analysis of combined data. Syst. Biol. 2004;53:47-67.
Pagel M, Meade A. A phylogenetic mixture model for detecting pattern-heterogeneity in gene sequence or character-state data. Syst. Biol. 2004;53:571-581.
Posada D, Buckley TR. Model selection and model averaging in phylogenetics: advantages of Akaike information criterion and Bayesian approaches over likelihood ratio tests. Syst. Biol. 2004;53:793-808.
Rabeling C, et al. Newly discovered sister lineage sheds light on early ant evolution. Proc. Natl Acad. Sci. USA 2008;105:14913-14917.
Rambaut A, Grassly NC. Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Comput. Appl. Biosci. 1997;13:235-238.
Rubin DB. Bayesianly justifiable and relevant frequency calculations for the applied statistician. Ann. Stat. 1984;12:1151-1172.
Sullivan J, Joyce P. Model selection in phylogenetics. Annu. Rev. Ecol. Evol. Syst. 2005;36:445-466.
Swofford DL, et al. Phylogenetic inference. In: Hillis DM, et al., editors. Molecular Systematics. 2nd edn. Sunderland, MA, USA: Sinauer Associates; 1996. p. 407-514.
Swofford DL, et al. Bias in phylogenetic estimation and its relevance to the choice between parsimony and likelihood methods. Syst. Biol. 2001;50:525-539.
Yang Z, et al. Comparison of models for nucleotide substitution used in maximum-likelihood phylogenetic estimation. Mol. Biol. Evol. 1994;11:316-324.
Associate Editor: Martin Bishop
Model adequacy tests using the multinomial likelihood and comparison to Bayes factors (BFs), for example datasets
Contact: jembrown{at}mail.utexas.edu
ACKNOWLEDGEMENTS
Bioinformatics, Volume 25, Issue 4, pp. 537-538.
PuMA: Bayesian analysis of partitioned (and unpartitioned) model adequacy
Jeremy M. Brown1,2,* and Robert ElDabaje1
1Section of Integrative Biology and 2Center for Computational Biology and Bioinformatics, University of Texas – Austin, Austin, TX 78712, USA
*To whom correspondence should be addressed.
Received August 14, 2008. Revision received December 17, 2008. Accepted December 17, 2008.
Abstract
Flowchart for Bayesian phylogenetic analysis, including posterior predictive simulation for the assessment of model adequacy. Shaded analyses are implemented in PuMA.
Summary: The accuracy of Bayesian phylogenetic inference using molecular data depends on the use of proper models of sequence evolution. Although choosing the optimal model available from a pool of alternatives has become standard practice in statistical phylogenetics, assessment of the chosen model's adequacy is rare. Programs for Bayesian phylogenetic inference have recently begun to implement models of sequence evolution that account for heterogeneity across sites beyond variation in rates of evolution, yet no program exists to assess the adequacy of these models. PuMA implements a posterior predictive simulation approach to assessing the adequacy of partitioned, unpartitioned and mixture models of DNA sequence evolution in a Bayesian context. Assessment of model adequacy allows empirical phylogeneticists to have appropriate confidence in their results and guides efforts to improve models of sequence evolution.
Conflict of Interest: none declared.
PuMA is written in Java, extending the JPanel class, and uses Unix commands to manipulate output files. Therefore, it currently requires a Unix-based system (e.g. Mac OS X) that supports a GUI. PuMA calls Seq-Gen (Rambaut and Grassly, 1997) to simulate individual partitions and then combines all partitions into one dataset, if necessary. Analyses can be started using either the GUI or PuMA batch input files. PuMA is distributed both as a Java .jar application and as a native Mac OS X application. PuMA can also call MrConverge (by A. R. Lemmon; available from
Fig. 1.
2 PuMA
2.1 Posterior predictive simulation
1 INTRODUCTION
Funding: National Science Foundation graduate research fellowship (to J.M.B.); Donald D. Harrington fellowship from the University of Texas – Austin (to J.M.B.).
PuMA implements a posterior predictive simulation approach to the testing of model adequacy (Gamerman, 1997; Gelman et al., 1995; Rubin, 1984), first introduced to phylogenetics by Bollback (2002). Posterior predictive simulation begins with a collection of parameter values and trees resulting from Markov chain Monte Carlo (MCMC) sampling of the posterior distribution during Bayesian phylogenetic analysis (Fig. 1). PuMA currently accepts input from unpartitioned and a priori partitioned analyses performed in MrBayes (Huelsenbeck and Ronquist, 2001), as well as mixture model analyses from BayesPhylogenies (Pagel and Meade, 2004). Each set of sampled parameter values and tree topologies is used to simulate a predictive dataset of the same size as the original, employing the same model of sequence evolution assumed during analysis. If the model of sequence evolution adequately captures the salient features of the evolutionary process, the simulated datasets should ‘look’ very similar to the original dataset. The ‘look’ of a dataset is summarized by a test statistic [given by T(X), with X denoting a given dataset]. Well-designed test statistics can probe the adequacy of different assumptions underlying the model. PuMA saves all simulated datasets, allowing users to apply test statistics of their own choosing. PuMA's current implementation uses the unconstrained likelihood as a test statistic, which aims to assess model adequacy very generally (Bollback, 2002; Goldman, 1993). The unconstrained model interprets the data as a series of site patterns, each sampled with some fixed probability. The maximum likelihood estimate of the sampling probability for any given site pattern is simply the frequency with which that pattern is observed in the data (Goldman, 1993).
Therefore, the unconstrained likelihood of an entire dataset is calculated as

$$L(X \mid M) = \prod_{i=1}^{n} \left( \frac{N_{\Theta(i)}}{N} \right)^{N_{\Theta(i)}}$$

where $M$ is the unconstrained model, $X$ is the dataset, $n$ is the number of unique site patterns, $\Theta(i)$ is the $i$-th unique site pattern, $N_{\Theta(i)}$ is the number of instances of $\Theta(i)$ in the dataset and $N$ is the total number of sites. For convenience, the natural log of this likelihood is taken to be the test statistic. The posterior predictive distribution of T(X) consists of the set of T(X) values calculated from the datasets simulated using the posterior distribution of trees and parameter values. The posterior predictive P-value is the percentage of the posterior predictive distribution with T(X) values greater than or equal to the value of T(X) given by the original dataset. Example assessments of model adequacy for empirical data are given in Table 1. Note that model adequacy analyses may produce results that differ from standard model choice tests, due to effects of priors, the chosen test statistic, and the relative strength of the tests.
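The test statistic and P-value described above can be illustrated with a short Python sketch. This is not PuMA's code (PuMA is written in Java). It assumes the standard multinomial form of the unconstrained likelihood, in which each site pattern's probability is its observed frequency, and the function names and the list-of-strings alignment format are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def unconstrained_log_likelihood(alignment):
    """Natural log of the unconstrained (multinomial) likelihood, T(X).

    `alignment` is a list of equal-length sequences, one per taxon
    (a hypothetical input format chosen for this sketch).
    """
    n_sites = len(alignment[0])
    if any(len(seq) != n_sites for seq in alignment):
        raise ValueError("all sequences must have the same length")
    # A site pattern is the column of characters across taxa at one site.
    pattern_counts = Counter(zip(*alignment))
    # ln L = sum over unique patterns of N_i * ln(N_i / N)
    return sum(c * math.log(c / n_sites) for c in pattern_counts.values())


def posterior_predictive_p_value(observed_stat, simulated_stats):
    """Fraction of simulated T(X) values >= the observed T(X)."""
    hits = sum(1 for t in simulated_stats if t >= observed_stat)
    return hits / len(simulated_stats)
```

For a toy alignment `["AAC", "AAG"]` there are two unique site patterns, observed twice and once among three sites, so the statistic is 2 ln(2/3) + ln(1/3), roughly -1.91. The second function returns a fraction; multiply by 100 to express it as a percentage. In practice the simulated statistics would come from applying the first function to each dataset simulated from the posterior sample.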
2.2 Implementation details
Footnotes
The comments of A.R. Lemmon, T.A. Heath and D.M. Hillis greatly improved this article.
Availability: This program is available as source code, a Java .jar application, and a native Mac OS X application. It is distributed under the terms of the GNU General Public License at
This post has been praised 0 times