A journal of IEEE and CAA that publishes high-quality papers in English on original theoretical/experimental research and development in all areas of automation.
Volume 6 Issue 6
Nov. 2019

IEEE/CAA Journal of Automatica Sinica

Weijun Zhu, Xiaokai Liu, Mingliang Xu and Huanmei Wu, "Predicting the Results of RNA Molecular Specific Hybridization Using Machine Learning," IEEE/CAA J. Autom. Sinica, vol. 6, no. 6, pp. 1384-1396, Nov. 2019. doi: 10.1109/JAS.2019.1911756

Predicting the Results of RNA Molecular Specific Hybridization Using Machine Learning

doi: 10.1109/JAS.2019.1911756
Funds: This work was supported by the National Natural Science Foundation of China (U1204608, 61472370, 61672469, 61822701).
  • Ribonucleic acid (RNA) hybridization is widely used in popular RNA simulation software in bioinformatics. However, because the underlying combinatorial problem has exponential computational complexity, it is challenging to decide within an acceptable time whether a specific RNA hybridization is effective. We introduce a machine learning (ML) based technique to address this problem. In the training phase, models are built with four algorithms: boosted tree (BT), random forest (RF), decision tree (DT) and logistic regression (LR). Given training and testing sets of RNA molecular codings, the trained models are applied to predict the classification of RNA hybridization results. The experimental results show that the best predictive accuracies of the RF-, BT-, DT- and LR-based approaches are 96.2%, 96.6%, 96.0% and 69.8%, respectively, when compared with traditional representative methods under the strong constraint condition. Furthermore, the average computational efficiency of the RF-, BT-, DT- and LR-based approaches is 208 679, 269 756, 184 333 and 187 458 times higher than that of the existing approach, respectively. Given an RNA design, the BT-based approach therefore shows both the highest computational efficiency and the highest predictive accuracy of the four in determining the biological effectiveness of molecular hybridization.
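The reduction described in the abstract, from an expensive hybridization computation to a binary classification problem, can be sketched roughly as follows. This is an illustrative toy, not the authors' pipeline. The per-position complementarity features, the `toy_label` rule (a stand-in for the costly simulation that would normally supply the labels), the synthetic data and all function names are assumptions. A plain-Python logistic regression stands in for the BT, RF, DT and LR models trained in the paper.

```python
import math
import random

# Watson-Crick pairs only; wobble pairs (G-U) are ignored in this toy.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
BASES = "ACGU"


def pair_features(strand_a, strand_b):
    """One binary feature per aligned position: 1.0 if the bases are complementary.

    Assumption: strand_b is already written in the orientation that aligns
    with strand_a, so position i pairs with position i.
    """
    return [1.0 if COMPLEMENT[a] == b else 0.0 for a, b in zip(strand_a, strand_b)]


def toy_label(strand_a, strand_b, min_pairs=6):
    """Hypothetical stand-in for the expensive simulation: 1 if enough positions pair."""
    return 1 if sum(pair_features(strand_a, strand_b)) >= min_pairs else 0


def make_dataset(n, length, rng):
    """Synthetic strand pairs whose complementarity rate varies per sample."""
    data = []
    for _ in range(n):
        rate = rng.random()
        a = "".join(rng.choice(BASES) for _ in range(length))
        b = "".join(
            COMPLEMENT[x] if rng.random() < rate
            else rng.choice([y for y in BASES if y != COMPLEMENT[x]])
            for x in a
        )
        data.append((pair_features(a, b), toy_label(a, b)))
    return data


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(data, epochs=300, lr=0.3):
    """Logistic regression fitted by full-batch gradient descent."""
    n_features = len(data[0][0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b


def predict(model, x):
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def accuracy(model, data):
    return sum(predict(model, x) == y for x, y in data) / len(data)


rng = random.Random(0)
train = make_dataset(400, 10, rng)
test = make_dataset(200, 10, rng)
model = train_logistic(train)
print(f"held-out accuracy: {accuracy(model, test):.3f}")
```

Once trained, `predict` runs in time linear in the strand length, which is the kind of saving the abstract points to. The accuracy and speed-up figures quoted there come from the authors' own experiments, not from this toy.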

     



    Figures(7)  / Tables(8)


    Highlights

    • RNA hybridization is one of the most important biochemical reactions.
    • Computing RNA hybridization in simulation software suffers from exponential complexity.
    • This paper reduces that problem to a binary classification problem in machine learning.
    • As a result, the problem admits a polynomial-time approximate solution.
    • The new method is 269 756 times faster than the state-of-the-art RNA hybridization algorithm.
