Sains Malaysiana 44(11)(2015): 1643–1651
Performance Comparison between Bootstrap and Multiscale Bootstrap for
Assessing Phylogenetic Tree for RNA polymerase
(Perbandingan Prestasi antara Butstrap dan Multiskala Butstrap untuk
Menilai Pohon Filogenetik bagi RNA polymerase)
SAFINAH SHARUDDIN* & NORA MUDA
School of Mathematical Sciences, Faculty of Science and Technology, Universiti Kebangsaan
Malaysia, 43600 Bangi, Selangor Darul Ehsan, Malaysia
Received: 19 January 2014/Accepted: 15 June 2015
ABSTRACT
Phylogenetic inference refers to the reconstruction of evolutionary
relationships among species, usually presented in the form of a tree.
In this study, the phylogenetic tree was constructed using a novel
distance-based method known as the modified one-step M-estimator (MOM)
method. The reliability of the branches of the resulting tree was then
evaluated using two measures: the multiscale bootstrap p-value
(approximately unbiased, or AU, value) and the ordinary bootstrap
p-value (bootstrap probability, or BP, value). The aim of this study was
to compare the performance of the AU value and the BP value in assessing
the phylogenetic tree of RNA polymerase. The results showed that
multiscale bootstrap analysis can detect high sampling errors, whereas
ordinary bootstrap analysis cannot; multiscale bootstrap analysis then
reduces the sampling error by increasing the number of replications.
A cluster was considered significant if its AU value or BP value was
95% or higher. The BP and AU values differed at the 11th and 15th
branches of the phylogenetic tree. The BP values at these branches were
72% and 85%, respectively, making the clusters not significant, whereas
the AU values at both branches exceeded 95%, making the clusters
significant. This difference was due to bias in the bootstrap
calculation of the probability value; the multiscale bootstrap analysis
therefore improved the calculation of the probability value relative to
the bootstrap analysis.
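As an illustration of how the two p-values relate, the following is a minimal sketch of obtaining an AU value from BP values computed at several resampling scales. It assumes the commonly described multiscale bootstrap model, in which the scale is r = n'/n (resampled length over original length), z(r) = Φ⁻¹(1 − BP_r) is fitted as v·√r + c/√r, and then AU = 1 − Φ(v − c) while the ordinary BP corresponds to 1 − Φ(v + c). The function names, the unweighted least-squares fit, and the values v = −1.2 and c = 0.6 are assumptions chosen for illustration only; they are not taken from this study.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def bp_at_scale(r, v, c):
    """BP predicted by the assumed model at scale r = n'/n (hypothetical helper)."""
    return 1.0 - STD_NORMAL.cdf(v * sqrt(r) + c / sqrt(r))


def fit_au(scales, bps):
    """Fit z(r) = v*sqrt(r) + c/sqrt(r) by ordinary least squares.

    Returns (au, bp, v, c), where au = 1 - Phi(v - c) and
    bp = 1 - Phi(v + c) is the fitted BP at scale r = 1.
    Assumption: an unweighted fit is used here to keep the sketch short.
    """
    xs = [sqrt(r) for r in scales]
    ys = [1.0 / x for x in xs]
    zs = [-STD_NORMAL.inv_cdf(p) for p in bps]  # z = Phi^{-1}(1 - BP)

    # Normal equations for the two-parameter model z = v*x + c*y.
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxz = sum(x * z for x, z in zip(xs, zs))
    syz = sum(y * z for y, z in zip(ys, zs))
    det = sxx * syy - sxy * sxy
    v = (sxz * syy - sxy * syz) / det
    c = (sxx * syz - sxy * sxz) / det

    au = 1.0 - STD_NORMAL.cdf(v - c)
    bp = 1.0 - STD_NORMAL.cdf(v + c)
    return au, bp, v, c


if __name__ == "__main__":
    # Synthetic BP values generated from the model itself (illustrative only).
    scales = [0.5 + 0.1 * k for k in range(10)]
    bps = [bp_at_scale(r, -1.2, 0.6) for r in scales]
    au, bp, v, c = fit_au(scales, bps)
    print(f"BP = {bp:.3f}, AU = {au:.3f}")
```

With these synthetic inputs the fitted BP is roughly 0.73 while the AU value is roughly 0.96, which shows how a positive curvature term c can pull the ordinary bootstrap probability below a 95% threshold that the AU value still passes.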
Keywords: Distance-based method; median absolute deviation (MADn);
modified one-step M-estimator (MOM); phylogenetic inference
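The keywords name the modified one-step M-estimator (MOM) and the normalized median absolute deviation (MADn). A minimal sketch of the estimator follows, assuming the definition commonly given in the robust-statistics literature: an observation is flagged as an outlier when its absolute deviation from the median exceeds 2.24 × MADn, with MADn = MAD / 0.6745, and MOM is the mean of the unflagged observations. How the estimator enters the distance calculation for the tree is not described in this abstract, so the sketch covers the estimator only. The function name `mom` and the fallback for MADn = 0 are hypothetical choices.

```python
from statistics import fmean, median


def mom(values, threshold=2.24):
    """Modified one-step M-estimator of location (sketch under stated assumptions).

    Observations whose absolute deviation from the median exceeds
    threshold * MADn are discarded; the mean of the rest is returned.
    """
    med = median(values)
    mad = median(abs(x - med) for x in values)
    madn = mad / 0.6745  # rescaled so that MADn estimates sigma under normality
    if madn == 0:
        # Assumption: with zero spread the outlier rule is degenerate,
        # so fall back to the median.
        return med
    kept = [x for x in values if abs(x - med) <= threshold * madn]
    return fmean(kept)


if __name__ == "__main__":
    # The value 100 is flagged as an outlier; the rest are averaged.
    print(mom([2, 3, 4, 5, 6, 100]))
```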
ABSTRACT (TRANSLATED FROM MALAY)
Phylogenetic inference refers to the reconstruction of evolutionary
relationships among various species, usually presented in the form of a
tree. In this study, the phylogenetic tree was constructed using a novel
distance-based method known as the modified one-step M-estimator (MOM)
method. The constructed tree was then evaluated to determine the
reliability of the branches formed. The branches were assessed using the
p-value of the multiscale bootstrap method (AU value) and compared with
the p-value of the bootstrap method (BP value). The main aim of this
study was to compare the performance of the AU and BP values in
assessing the phylogenetic tree of RNA polymerase. The results showed
that multiscale bootstrap analysis can detect high sampling error,
unlike bootstrap analysis. Multiscale bootstrap analysis reduces this
sampling error by increasing the number of replications. A cluster is
considered significant if its confidence level exceeds 95%. The BP and
AU values differed at the 11th and 15th branches: the BP values were 72%
and 85%, respectively, making the clusters not significant, although
they were in fact significant according to the AU values, which exceeded
95% at both branches. This is because the calculation of the probability
value in bootstrap analysis is biased. The multiscale analysis therefore
improved the calculation of the probability value over the bootstrap
analysis.
Keywords: Distance-based method; median absolute deviation (MADn);
modified one-step M-estimator (MOM); phylogenetic inference
*Corresponding author; email: safinahukm@gmail.com