Sains Malaysiana 47(12)(2018): 2951–2960
http://dx.doi.org/10.17576/jsm-2018-4712-03
Next Generation Sequencing-Data Analysis for Cellulose- and Xylan-Degrading Enzymes from POME Metagenome
(Analisis Data-Penjujukan Generasi Seterusnya bagi Enzim Selulosa dan Xilan Mendegradasi daripada Metagenom POME)
FARAH FADWA BENBELGACEM1, MOHD NOOR MAT ISA2, MUHAMMAD ALFATIH MUDDATHIR ABDELRAHIM3, AFIDALINA TUMIAN3, OUALID ABDELKADER BELLAG1, ADIBAH PARMAN1, IBRAHIM ALI NOORBATCHA1 & HAMZAH MOHD SALLEH4*
1Bioprocess & Molecular Engineering Research Unit (BPMERU), Department of Biotechnology Engineering, Kulliyyah of Engineering, International Islamic University Malaysia, Jalan Gombak, 53100 Kuala Lumpur, Federal Territory, Malaysia
2Malaysia Genome Institute, Jalan Bangi, 43000 Kajang, Selangor Darul Ehsan, Malaysia
3Department of Computer Science, Kulliyyah of Information and Communication Technology, International Islamic University Malaysia, Jalan Gombak, 53100 Kuala Lumpur, Federal Territory, Malaysia
4International Institute for Halal Research and Training (INHART), International Islamic University Malaysia, Jalan Gombak, 53100 Kuala Lumpur, Federal Territory, Malaysia
Received: 30 May 2018/Accepted: 18 September 2018
ABSTRACT
A metagenomic DNA library from palm oil mill effluent (POME) was constructed and subjected to high-throughput screening to find genes encoding cellulose- and xylan-degrading enzymes. DNA from 30 positive fosmid clones was sequenced with next generation sequencing technology, and the raw data (short-insert paired reads) were analyzed with bioinformatic tools. First, the quality of the 64,821,599 forward and reverse raw reads, each 101 bp long, was assessed using FastQC and SOLEXA. The raw data were then filtered by trimming low-quality bases, discarding short reads and removing vector sequences; the output was checked again and trimming was repeated until a high-quality read set was obtained. The second step was de novo assembly of the reads into 2900 contigs using a de Bruijn graph algorithm. The pre-assembled contigs were then ordered and oriented, and the distances between them determined, with SSPACE, yielding 2139 scaffolds. Gene prediction with Prodigal, followed by putative ID assignment with BLASTp against the NR protein database, identified 16,386 genes. We present a workable strategy for handling metagenomic NGS data to detect known and potentially unknown genes, and we show the computational efficiency of de Bruijn graph-based de novo assembly for bioprospecting 21 genes encoding cellulose-degrading enzymes and 6 genes encoding xylan-degrading enzymes, with sequence identities ranging from 30.3% to 100%.
Keywords: de Bruijn; de novo assembly; metagenomics; scaffold; SSPACE
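The read-filtering step described in the abstract (trimming low-quality bases and discarding short reads) can be illustrated with a minimal sketch. This is not the authors' actual procedure or the behaviour of any specific tool: it assumes Phred+33 encoded quality strings, and the function names and thresholds (`min_q=20`, `min_len=50`) are hypothetical choices made only for this example.

```python
def phred_scores(quality, offset=33):
    """Decode an ASCII quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def trim_read(seq, quality, min_q=20, min_len=50):
    """Trim low-quality bases from the 3' end; drop reads that end up too short.

    Returns (trimmed_seq, trimmed_quality), or None if the read is discarded.
    The thresholds are illustrative assumptions, not values from the paper.
    """
    scores = phred_scores(quality)
    end = len(seq)
    # Walk back from the 3' end while the base quality is below the threshold.
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    if end < min_len:
        return None
    return seq[:end], quality[:end]


def filter_pairs(pairs, min_q=20, min_len=50):
    """Keep a read pair only if both mates survive trimming."""
    kept = []
    for (seq1, qual1), (seq2, qual2) in pairs:
        r1 = trim_read(seq1, qual1, min_q, min_len)
        r2 = trim_read(seq2, qual2, min_q, min_len)
        if r1 is not None and r2 is not None:
            kept.append((r1, r2))
    return kept
```

Dropping a pair when either mate fails keeps the forward and reverse files in step, which paired-read assemblers and scaffolders generally expect. A real workflow would re-run the quality report after each filtering pass, as the abstract describes.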
ABSTRAK
A library holding metagenomic DNA from palm oil mill effluent was constructed and screened with a high-throughput screening method to find cellulose- and xylan-degrading enzymes. DNA from the positive fosmid clones was sequenced with high-throughput sequencing technology, and the raw data (short paired reads) were analyzed with bioinformatic methods. First, the quality of the 64,821,599 forward and reverse raw reads, each 101 bp long, was tested using Fastqc and SOLEXA. The raw data were then filtered by trimming low-quality and short reads. Vector sequences were also removed, and the output was checked and trimmed repeatedly until a high-quality read set was obtained. The second step was de novo assembly, which reconstructed 2900 contigs following the de Bruijn graph algorithm. The pre-assembled contigs were arranged in order, and the distances between contigs were determined and oriented with SSPACE, producing 2139 scaffolds. A total of 16,386 genes were identified after gene prediction with Prodigal and putative ID assignment with Blastp against NR protein. A sound strategy for handling metagenomic NGS data to detect known genes and potential, as yet unknown, genes is presented. In this study, we show the computational efficiency of the de Bruijn graph algorithm for de novo assembly in bioprospecting 21 genes encoding cellulose-degrading enzymes and 6 genes encoding xylan-degrading enzymes, with identity percentages from 30.3% to 100%.
Keywords: de Bruijn; de novo assembly; metagenomics; scaffold; SSPACE
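The de novo assembly step named in the abstract relies on a de Bruijn graph. The toy sketch below shows the core idea only and is not the assembler used in the study: reads are broken into k-mers, each k-mer becomes an edge between its (k-1)-mer prefix and suffix, and contigs are spelled along maximal non-branching paths. The function names and the choice of `k` are assumptions for illustration; the sketch ignores sequencing errors, reverse complements, coverage and isolated cycles, all of which a production assembler must handle.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are distinct k-mers."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def assemble_contigs(graph):
    """Spell one contig per maximal non-branching path in the graph."""
    indeg = defaultdict(int)
    for succs in graph.values():
        for s in succs:
            indeg[s] += 1
    nodes = set(graph) | set(indeg)

    def is_simple(node):
        # Exactly one edge in and one edge out: the path passes straight through.
        return indeg[node] == 1 and len(graph.get(node, ())) == 1

    contigs = []
    for node in sorted(nodes):
        if is_simple(node):
            continue  # contigs start only at branch points, sources or sinks
        for succ in sorted(graph.get(node, ())):
            contig = node + succ[-1]
            cur = succ
            while is_simple(cur):
                (cur,) = graph[cur]  # the single successor of a simple node
                contig += cur[-1]
            contigs.append(contig)
    return contigs
```

For example, `assemble_contigs(build_de_bruijn(["ACGTC", "GTCAG"], 3))` merges the two overlapping reads into a single contig, while reads that disagree after a shared prefix produce a branch and therefore several shorter contigs.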
*Corresponding author; email: hamzah@iium.edu.my