Sains Malaysiana 44(11)(2015): 1643-1651
Performance Comparison between Bootstrap and Multiscale Bootstrap
for Assessing Phylogenetic Tree for RNA polymerase
(Perbandingan Prestasi antara Butstrap dan Multiskala Butstrap untuk Menilai Pohon Filogenetik
bagi RNA polymerase)
SAFINAH SHARUDDIN* & NORA MUDA
School of Mathematical Sciences, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 Bangi, Selangor Darul Ehsan, Malaysia
Received: 19 January 2014/Accepted: 15 June 2015
ABSTRACT
Phylogenetic inference refers to the reconstruction of evolutionary relationships among species, usually presented in the form of a tree. This study constructs a phylogenetic tree using a novel distance-based method known as the modified one-step M-estimator (MOM) method. The branches of the constructed tree were then evaluated for reliability, and two measures of reliability were compared: the multiscale bootstrap p-value (AU value) and the ordinary bootstrap p-value (BP value). The aim of this study was to compare the performance of the AU and BP values for assessing the phylogenetic tree of RNA polymerase. The results show that multiscale bootstrap analysis can detect high sampling error, whereas ordinary bootstrap analysis cannot. Multiscale bootstrap analysis reduces this sampling error by increasing the number of replications. A cluster was considered significant if its AU value or BP value was 95% or higher. The BP and AU values differed at the 11th and 15th branches of the phylogenetic tree. The BP values at these branches were 72% and 85%, respectively, so the clusters were not significant by BP; the AU values at both branches, however, exceeded 95%, so the clusters were significant by AU. This difference is attributed to bias in the probability calculated by bootstrap analysis; multiscale bootstrap analysis therefore improves on the bootstrap calculation of the probability value.
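The relationship between the BP and AU values can be illustrated with a small sketch. It assumes the standard formulation of the multiscale bootstrap, which the abstract itself does not spell out: bootstrap probabilities BP(r) are computed at several relative sample sizes r = n'/n, the curve z(r) = -Φ⁻¹(BP(r)) ≈ v√r + c/√r is fitted, and the fitted coefficients give BP ≈ 1 - Φ(v + c) at r = 1 and AU = 1 - Φ(v - c). The function names, the use of unweighted least squares and the input numbers are illustrative assumptions, not the authors' code or data.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for Phi and its inverse


def bp_at_scale(r, v, c):
    """Model bootstrap probability at relative sample size r (hypothetical helper)."""
    return 1.0 - _STD_NORMAL.cdf(v * r ** 0.5 + c / r ** 0.5)


def au_from_multiscale_bp(scales, bps):
    """Fit z(r) = v*sqrt(r) + c/sqrt(r) and return (au, v, c).

    Assumption: plain least squares is used here for brevity; published
    implementations typically weight each scale by its sampling variance.
    """
    if len(set(scales)) < 2:
        raise ValueError("at least two distinct scales are needed")
    xs = [r ** 0.5 for r in scales]                      # regressor for v
    ys = [1.0 / x for x in xs]                           # regressor for c
    zs = [-_STD_NORMAL.inv_cdf(bp) for bp in bps]        # observed z-values

    # Normal equations for the two-parameter model z = v*x + c*y.
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxz = sum(x * z for x, z in zip(xs, zs))
    syz = sum(y * z for y, z in zip(ys, zs))
    det = sxx * syy - sxy * sxy
    v = (sxz * syy - syz * sxy) / det
    c = (syz * sxx - sxz * sxy) / det

    au = 1.0 - _STD_NORMAL.cdf(v - c)                    # bias-corrected p-value
    return au, v, c


# Synthetic example (made-up coefficients, not values from the study).
scales = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4]
bps = [bp_at_scale(r, v=-1.2, c=0.6) for r in scales]
au, v_hat, c_hat = au_from_multiscale_bp(scales, bps)
print(f"BP at r=1: {bp_at_scale(1.0, -1.2, 0.6):.3f}, AU: {au:.3f}")
```

With these made-up coefficients the ordinary BP at r = 1 is about 0.73 while the AU value is about 0.96, the same kind of disagreement the abstract reports: a cluster that falls below the 95% threshold by BP but passes it by AU.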
Keywords: Distance-based method; median absolute deviation (MADn); modified one-step M-estimator (MOM); phylogenetic inference
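The MADn and MOM named in the keywords can be sketched as follows, assuming the usual robust-statistics definitions, which the abstract does not state: MADn is the median absolute deviation rescaled by 0.6745, and the modified one-step M-estimator averages the observations that remain after discarding any value whose distance from the median exceeds 2.24 MADn. How the authors turn this estimator into a distance between sequences is not described here, so the code covers only the location estimate; the function names and the default cutoff are assumptions.

```python
from statistics import median


def madn(values):
    """Median absolute deviation rescaled by 0.6745 (assumed definition of MADn)."""
    m = median(values)
    return median([abs(x - m) for x in values]) / 0.6745


def mom(values, cutoff=2.24):
    """Modified one-step M-estimator of location (assumed standard form).

    Observations with |x - median| / MADn above `cutoff` are treated as
    outliers and dropped; the mean of the remaining values is returned.
    """
    m = median(values)
    scale = madn(values)
    if scale == 0:
        # More than half the values coincide with the median; fall back to it.
        return m
    kept = [x for x in values if abs(x - m) / scale <= cutoff]
    return sum(kept) / len(kept)


# Toy data with one gross outlier (not data from the study).
sample = [1, 2, 3, 4, 100]
print(mom(sample))
```

In the toy sample the value 100 lies far more than 2.24 MADn from the median and is discarded, so the estimate is the mean of the remaining four values rather than the outlier-inflated ordinary mean.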
ABSTRAK
Phylogenetic inference refers to the reconstruction of evolutionary relationships among various species, usually presented in the form of a tree. In this study, the phylogenetic tree was constructed using a novel distance-based method known as the modified one-step M-estimator (MOM) method. The constructed tree was then evaluated to determine the reliability of the branches formed. The branches were assessed using the p-value of the multiscale bootstrap method (AU value), which was compared with the p-value of the bootstrap method (BP value). The main aim of this study was to compare the performance of the AU and BP values for assessing the phylogenetic tree of RNA polymerase. The results show that multiscale bootstrap analysis can detect high sampling error, unlike bootstrap analysis. Multiscale bootstrap analysis reduces this sampling error by increasing the number of replications. A cluster is considered significant if its confidence level exceeds 95%. The BP and AU values differed at the 11th and 15th branches, where the BP values were 72% and 85%, respectively, making those clusters not significant, although they are in fact significant by the AU values, which exceed 95% at both branches. This is because the probability value calculated by bootstrap analysis is biased. Multiscale bootstrap analysis therefore improves on the bootstrap calculation of the probability value.
Keywords: Distance-based method; median absolute deviation (MADn); modified one-step M-estimator (MOM); phylogenetic inference
*Corresponding author; email: safinahukm@gmail.com