• Prediction of ionizing radiation resistance in bacteria using a multiple-instance learning model
    • (Accepted for publication in the Journal of Computational Biology)

    • Authors:
      • Sabeur Aridhi

        Tel: +39 04 61 28 39 92
        Email: sabeur.aridhi@isima.fr
        Web page: http://fc.isima.fr/~aridhi/
        DISI, Department of Information Engineering and Computer Science
        University of Trento
        Via Sommarive 9, 38123 Trento, Italy
      • Haïtham Sghaier

        Email: sghaier.haitham@gmail.com

        National Center for Nuclear Sciences and Technology (CNSTN)
        Sidi Thabet Technopark
        Ariana 2020, Tunisia
      • Manel Zoghlami

        Tel: +33 4 73 40 76 29
        Email: manel.zoghlami@gmail.com

        LIMOS, UMR 6158
        Blaise Pascal University - Clermont University
        BP 10125, 63173, Clermont Ferrand, France
      • Mondher Maddouri

        Email: maddourimondher@yahoo.fr

        Faculty of Sciences of Tunis, LIPAH
        University of Tunis El Manar
        1060, Tunis, Tunisia
      • Engelbert Mephu Nguifo

        Tel: +33 4 73 40 76 29
        Email: mephu@isima.fr
        Web page: http://www.isima.fr/~mephu/

        LIMOS, UMR 6158
        Blaise Pascal University - Clermont University
        BP 10125, 63173, Clermont Ferrand, France
    • Datasets:
    • General description
    • For our experiments, we constructed a database of 14 ionizing-radiation-resistant bacteria (IRRB) and 14 ionizing-radiation-sensitive bacteria (IRSB). Table 1 lists the bacteria used.
      Each bacterium contains between 25 and 31 of the proteins implicated in basal DNA repair in IRRB that are listed in Table 2.
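In multiple-instance learning terms, each bacterium can be seen as a labeled bag whose instances are its DNA-repair protein sequences. The Python sketch below illustrates that representation under a hypothetical file layout: each bacterium's proteins sit in a FASTA file whose header lines start with the Table 2 protein ID. `Bag` and `parse_fasta` are illustrative names, not part of MIL-ALIGN, and the sequences are toy strings.

```python
from dataclasses import dataclass, field


@dataclass
class Bag:
    """A bacterium seen as a multiple-instance bag (illustrative sketch)."""
    bacterium_id: str   # e.g. "B7" from Table 1
    label: str          # "IRRB" or "IRSB"
    # Instances: Table 2 protein ID ("P1".."P31") -> amino-acid sequence.
    proteins: dict[str, str] = field(default_factory=dict)


def parse_fasta(text: str) -> dict[str, str]:
    """Map the first token of each FASTA header to its joined sequence."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            tokens = line[1:].split()
            header = tokens[0] if tokens else ""
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence lines may be wrapped over several lines
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Toy input (hypothetical): header IDs follow Table 2; sequences are made up.
fasta = """>P30 RecA protein
MKVLAAGG
TRE
>P5 Single-stranded DNA-binding protein
MARGS
"""
bag = Bag("B7", "IRRB", parse_fasta(fasta))
print(len(bag.proteins), bag.proteins["P30"])  # 2 MKVLAAGGTRE
```

A real bag built this way would hold 25 to 31 instances, one per Table 2 protein found in that bacterium.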
    • Data sources
    • Proteins of the bacterium Deinococcus radiodurans were downloaded from the UniProt website (http://www.uniprot.org/uniprot/).
    • The prfectBLAST tool was used to identify the orthologs of these proteins in the other bacteria (Download prfectBLAST).
    • The proteomes of the other bacteria were downloaded from the NCBI FTP site (http://www.ncbi.nlm.nih.gov/Ftp/).
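Orthologs are commonly called with reciprocal best hits (RBH): a D. radiodurans protein a and a protein b from another proteome are paired when b is a's top-scoring hit and a is b's top-scoring hit. This page does not state which criterion was applied through prfectBLAST, so the Python sketch below only illustrates RBH as an assumption. It takes precomputed alignment scores (for example BLAST bit scores) keyed by (query, subject); the target identifiers `t1`, `t2`, `t3` and all scores are hypothetical.

```python
def best_hits(scores: dict[tuple[str, str], float]) -> dict[str, str]:
    """For each query, keep the subject with the highest score."""
    best: dict[str, tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward: dict[tuple[str, str], float],
                         reverse: dict[tuple[str, str], float]) -> list[tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    fwd = best_hits(forward)   # reference proteome -> other proteome
    rev = best_hits(reverse)   # other proteome -> reference proteome
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)


# Hypothetical scores: D. radiodurans proteins (Table 2 IDs) vs. another proteome.
forward = {("P30", "t1"): 850.0, ("P30", "t2"): 120.0,
           ("P5", "t2"): 300.0, ("P5", "t3"): 310.0}
reverse = {("t1", "P30"): 845.0, ("t2", "P5"): 295.0,
           ("t3", "P30"): 90.0, ("t3", "P5"): 80.0}
print(reciprocal_best_hits(forward, reverse))  # [('P30', 't1')]
```

In this toy case, P5's best hit t3 prefers P30 in the reverse search, so no ortholog is called for P5.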
    • Bacteria and proteins used.
    • Table 1: IRRB and IRSB learning set.
      Phenotype | ID  | Bacterium                                | Phylogenetic group  | D10 (kGy)
      IRRB      | B1  | Chroococcidiopsis thermalis PCC 7203     | Cyanobacteria       | 4*
      IRRB      | B2  | Deinococcus deserti VCD115               | Deinococcus-Thermus | >7.5
      IRRB      | B3  | Deinococcus geothermalis DSM 11300       | Deinococcus-Thermus | 10-16
      IRRB      | B4  | Deinococcus gobiensis I-0                | Deinococcus-Thermus | 12.7
      IRRB      | B5  | Deinococcus maricopensis DSM 21211       | Deinococcus-Thermus | ~11
      IRRB      | B6  | Deinococcus proteolyticus MRP            | Deinococcus-Thermus | >15
      IRRB      | B7  | Deinococcus radiodurans R1               | Deinococcus-Thermus | 10
      IRRB      | B8  | Geodermatophilus obscurus DSM 43160      | Actinobacteria      | 9
      IRRB      | B9  | Kineococcus radiotolerans SRS30216       | Actinobacteria      | 2
      IRRB      | B10 | Kocuria rhizophila DC2201                | Actinobacteria      | 2**
      IRRB      | B11 | Methylobacterium radiotolerans JCM 2831  | Proteobacteria      | 1
      IRRB      | B12 | Modestobacter marinus                    | Actinobacteria      | 6
      IRRB      | B13 | Rubrobacter xylanophilus DSM 9941        | Actinobacteria      | 5.5
      IRRB      | B14 | Truepera radiovictrix DSM 17093          | Deinococcus-Thermus | >5
      IRSB      | B15 | Brucella abortus S19                     | Proteobacteria      | 0.34
      IRSB      | B16 | Escherichia coli B REL606                | Proteobacteria      | 0.7
      IRSB      | B17 | Escherichia coli str. K-12 substr. DH10B | Proteobacteria      | 0.7
      IRSB      | B18 | Neisseria gonorrhoeae FA 1090            | Proteobacteria      | 0.07-0.125
      IRSB      | B19 | Neisseria gonorrhoeae TCDC NG08107       | Proteobacteria      | 0.07-0.125
      IRSB      | B20 | Pseudomonas putida S16                   | Proteobacteria      | 0.25
      IRSB      | B21 | Shewanella oneidensis MR-1               | Proteobacteria      | 0.07
      IRSB      | B22 | Shigella dysenteriae 1617                | Proteobacteria      | 0.22
      IRSB      | B23 | Thermus thermophilus HB27                | Deinococcus-Thermus | 0.8
      IRSB      | B24 | Thermus thermophilus HB8                 | Deinococcus-Thermus | 0.8***
      IRSB      | B25 | Thermus thermophilus JL-18               | Deinococcus-Thermus | 0.8***
      IRSB      | B26 | Thermus thermophilus SG0.5JP17-16        | Deinococcus-Thermus | 0.8***
      IRSB      | B27 | Vibrio parahaemolyticus RIMD 2210633     | Proteobacteria      | 0.03-0.06
      IRSB      | B28 | Yersinia enterocolitica 8081             | Proteobacteria      | 0.1-0.21

      D10: dose of ionizing radiation (kGy) that reduces survival to 10%.
      * D10 value reported for Chroococcidiopsis spp.
      ** D10 value reported for Kocuria rosea.
      *** D10 value reported for T. thermophilus HB27.

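    • Table 2: Replication, repair and recombination proteins.
      ID  | Function                      | Protein
      P1  | DNA polymerase                | Hypothetical DNA polymerase
      P2  | DNA polymerase                | DNA polymerase III, α subunit
      P3  | DNA polymerase                | DNA-directed DNA polymerase
      P4  | DNA polymerase                | DNA polymerase III, τ/γ subunit
      P5  | Replication complex           | Single-stranded DNA-binding protein
      P6  | Replication complex           | Replicative DNA helicase
      P7  | Replication complex           | DNA primase
      P8  | Replication complex           | DNA gyrase, subunit B
      P9  | Replication complex           | DNA topoisomerase I
      P10 | Replication complex           | DNA gyrase subunit A
      P11 | Other DNA-associated proteins | smf protein
      P12 | Other DNA-associated proteins | Endonuclease III
      P13 | Other DNA-associated proteins | Holliday junction resolvase
      P14 | Other DNA-associated proteins | Formamidopyrimidine-DNA glycosylase
      P15 | Other DNA-associated proteins | Holliday junction DNA helicase
      P16 | Other DNA-associated proteins | RecF protein
      P17 | Other DNA-associated proteins | DNA repair protein radA
      P18 | Other DNA-associated proteins | Holliday junction binding protein
      P19 | Other DNA-associated proteins | Excinuclease ABC, subunit C
      P20 | Other DNA-associated proteins | DNA repair protein RecN
      P21 | Other DNA-associated proteins | Transcription-repair coupling factor
      P22 | Other DNA-associated proteins | Excinuclease ABC, subunit A
      P23 | Other DNA-associated proteins | DNA helicase II
      P24 | Other DNA-associated proteins | DNA helicase RecG
      P25 | Other DNA-associated proteins | Exonuclease SbcD, putative
      P26 | Other DNA-associated proteins | Exonuclease SbcC
      P27 | Other DNA-associated proteins | Ribonuclease HII
      P28 | Other DNA-associated proteins | Excinuclease ABC, subunit B
      P29 | Other DNA-associated proteins | A/G-specific adenine glycosylase
      P30 | Other DNA-associated proteins | RecA protein
      P31 | Other DNA-associated proteins | DNA-3-methyladenine glycosidase II, putative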
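Table 3 evaluates MIL-ALIGN on all proteins and on each of the three functional groups of Table 2. The Python sketch below shows how a bag could be restricted to one group, assuming (hypothetically) that a bacterium's proteins are stored as a dictionary from Table 2 protein ID to sequence; `GROUPS` and `restrict` are illustrative names.

```python
# Functional groups of Table 2 (protein ID ranges P1-P4, P5-P10, P11-P31).
GROUPS: dict[str, list[str]] = {
    "DNA polymerase": [f"P{i}" for i in range(1, 5)],
    "Replication complex": [f"P{i}" for i in range(5, 11)],
    "Other DNA-associated proteins": [f"P{i}" for i in range(11, 32)],
}


def restrict(proteins: dict[str, str], group: str) -> dict[str, str]:
    """Keep only the instances (protein ID -> sequence) of one functional group."""
    wanted = set(GROUPS[group])
    return {pid: seq for pid, seq in proteins.items() if pid in wanted}


# A bag may lack some of the 31 proteins (each bacterium has 25 to 31 of them).
toy_bag = {"P2": "MKV", "P7": "GGT", "P30": "LLS"}
print(restrict(toy_bag, "Replication complex"))  # {'P7': 'GGT'}
```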
    • Results:
    • The computations were carried out on a PC with an Intel Core i7 CPU (2.49 GHz) and 6 GB of memory, running Ubuntu Linux. Classification performance was evaluated with the leave-one-out (LOO) technique.
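With 28 bacteria, leave-one-out means 28 rounds: each bacterium is held out once, a model is built from the remaining 27, and the held-out bacterium is classified. The Python sketch below shows only that evaluation loop; `fit_predict` is a placeholder for any classifier (MIL-ALIGN in the paper), and the 1-nearest-neighbour rule on toy numbers in the usage example is purely illustrative.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def leave_one_out(items: Sequence[T], labels: Sequence[str],
                  fit_predict: Callable[[list[T], list[str], T], str]) -> list[str]:
    """Return one prediction per item, each from a model that never saw that item."""
    predictions = []
    for i, held_out in enumerate(items):
        train_items = [x for j, x in enumerate(items) if j != i]
        train_labels = [y for j, y in enumerate(labels) if j != i]
        predictions.append(fit_predict(train_items, train_labels, held_out))
    return predictions


def nearest_neighbour(train: list[float], labels: list[str], x: float) -> str:
    """Toy classifier: label of the closest training value."""
    k = min(range(len(train)), key=lambda j: abs(train[j] - x))
    return labels[k]


values = [1.0, 1.2, 5.0, 5.3, 3.0]
truth = ["IRSB", "IRSB", "IRRB", "IRRB", "IRRB"]
preds = leave_one_out(values, truth, nearest_neighbour)
accuracy = sum(p == t for p, t in zip(preds, truth)) / len(truth)
print(preds, accuracy)  # ['IRSB', 'IRSB', 'IRRB', 'IRRB', 'IRSB'] 0.8
```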


    • Table 3: Experimental results of MIL-ALIGN with the LOO-based evaluation technique.
      Used proteins                 | Aggregation method | Accuracy (%) | Sensitivity (%) | Specificity (%)
      All proteins                  | SMS                | 92.80        | 92.80           | 92.80
      All proteins                  | WAMS               | 89.2         | 92.30           | 86.6
      DNA polymerase proteins       | SMS                | 89.2         | 92.30           | 86.6
      DNA polymerase proteins       | WAMS               | 89.2         | 92.30           | 86.6
      Replication complex proteins  | SMS                | 92.80        | 92.80           | 92.80
      Replication complex proteins  | WAMS               | 92.80        | 92.80           | 92.80
      Other DNA-associated proteins | SMS                | 92.80        | 92.80           | 92.80
      Other DNA-associated proteins | WAMS               | 92.80        | 92.80           | 92.80
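This page names the two aggregation methods, SMS and WAMS, without defining them. The Python sketch below is one plausible reading and should be taken as an assumption, not as the MIL-ALIGN implementation: every protein of the query bacterium is aligned against every protein of a training bacterium, each query protein keeps its best local-alignment score, SMS sums these maxima, and WAMS takes their weighted average (reading the acronyms as "sum of maximum scores" and "weighted average of maximum scores"; the weights are a hypothetical parameter, uniform by default). The query then receives the label of the training bag with the highest aggregate. The Smith-Waterman scoring (match +2, mismatch -1, linear gap -2) is a toy stand-in for a protein substitution matrix, and all sequences are made up.

```python
from typing import Callable, Optional, Sequence


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local-alignment score with a linear gap penalty (toy scoring)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def max_scores(query: Sequence[str], train: Sequence[str]) -> list[int]:
    """For each query protein, its best score against any protein of one training bag."""
    return [max(smith_waterman(q, t) for t in train) for q in query]


def sms(scores: Sequence[float]) -> float:
    """Assumed SMS: sum of the per-protein maximum scores."""
    return float(sum(scores))


def wams(scores: Sequence[float], weights: Optional[Sequence[float]] = None) -> float:
    """Assumed WAMS: weighted average of the maxima (hypothetical weights, uniform by default)."""
    if weights is None:
        weights = [1.0] * len(scores)
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def predict(query: Sequence[str], training: Sequence[tuple[Sequence[str], str]],
            aggregate: Callable[[Sequence[float]], float]) -> str:
    """Assumed decision rule: label of the training bag with the highest aggregate."""
    best_label, best_value = "", float("-inf")
    for proteins, label in training:
        value = aggregate(max_scores(query, proteins))
        if value > best_value:
            best_label, best_value = label, value
    return best_label


training = [(["MKVLA", "GGHTR"], "IRRB"), (["PPQWE", "LLSDN"], "IRSB")]
query = ["MKVLS", "GGHTA"]
print(predict(query, training, sms), predict(query, training, wams))  # IRRB IRRB
```

Under these assumptions, SMS grows with the number of proteins in the query bag while WAMS does not, which is one reason the two can rank training bags differently when bags contain different numbers of proteins (25 to 31 here).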

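    • Table 4: Percentage of successful predictions using MIL.
      Phenotype | Bacterium ID | Successful predictions (%)
      IRRB      | B1           | 100
      IRRB      | B2           | 100
      IRRB      | B3           | 100
      IRRB      | B4           | 100
      IRRB      | B5           | 100
      IRRB      | B6           | 100
      IRRB      | B7           | 100
      IRRB      | B8           | 100
      IRRB      | B9           | 100
      IRRB      | B10          | 100
      IRRB      | B11          | 0
      IRRB      | B12          | 100
      IRRB      | B13          | 100
      IRRB      | B14          | 62.5*
      IRSB      | B15          | 0
      IRSB      | B16          | 100
      IRSB      | B17          | 100
      IRSB      | B18          | 100
      IRSB      | B19          | 100
      IRSB      | B20          | 100
      IRSB      | B21          | 100
      IRSB      | B22          | 100
      IRSB      | B23          | 100
      IRSB      | B24          | 100
      IRSB      | B25          | 100
      IRSB      | B26          | 100
      IRSB      | B27          | 100
      IRSB      | B28          | 100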

      * B14 was classified correctly using (1) all proteins with the SMS aggregation method, (2) replication complex proteins with both the SMS and WAMS aggregation methods, and (3) other DNA-associated proteins with both the SMS and WAMS aggregation methods, i.e., in 5 of the 8 configurations of Table 3 (62.5%).
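Tables 3 and 4 can be cross-checked with a little arithmetic. According to Table 4 and its footnote, B11 and B15 are misclassified in all eight configurations (four protein sets times two aggregation methods), B14 is correct in five of them, and every other bacterium is always correct. The Python sketch below encodes exactly those facts and recomputes the per-bacterium success rates of Table 4 and the per-configuration accuracies of Table 3.

```python
import math

PROTEIN_SETS = ["All proteins", "DNA polymerase proteins",
                "Replication complex proteins", "Other DNA-associated proteins"]
CONFIGS = [(p, a) for p in PROTEIN_SETS for a in ("SMS", "WAMS")]
BACTERIA = [f"B{i}" for i in range(1, 29)]

# Configurations in which B14 was classified correctly (Table 4 footnote).
B14_CORRECT = {("All proteins", "SMS"),
               ("Replication complex proteins", "SMS"),
               ("Replication complex proteins", "WAMS"),
               ("Other DNA-associated proteins", "SMS"),
               ("Other DNA-associated proteins", "WAMS")}


def correct(bacterium: str, config: tuple[str, str]) -> bool:
    if bacterium in ("B11", "B15"):   # 0% in Table 4
        return False
    if bacterium == "B14":            # 62.5% in Table 4
        return config in B14_CORRECT
    return True                       # 100% in Table 4


def success_rate(bacterium: str) -> float:
    """Table 4: % of the eight configurations that classify this bacterium correctly."""
    return 100 * sum(correct(bacterium, c) for c in CONFIGS) / len(CONFIGS)


def accuracy(config: tuple[str, str]) -> float:
    """Table 3: % of the 28 bacteria classified correctly by one configuration."""
    return 100 * sum(correct(b, config) for b in BACTERIA) / len(BACTERIA)


print(success_rate("B14"))                                        # 62.5
print(math.floor(10 * accuracy(("All proteins", "SMS"))) / 10)    # 92.8 (26/28)
print(math.floor(10 * accuracy(("All proteins", "WAMS"))) / 10)   # 89.2 (25/28)
```

Every configuration in which B14 is correct yields 26/28 and every other one yields 25/28, which matches the whole Accuracy column of Table 3 (reported as 92.80 and 89.2, i.e., the ratios truncated to one decimal).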

    • Table 5: Learning results in the traditional machine learning setting, per protein.
      Protein ID | Accuracy (%) | Sensitivity (%) | Specificity (%)
      P1         | 85.7         | 100             | 77.7
      P2         | 89.2         | 92.3            | 86.6
      P3         | 82.1         | 90.9            | 76.4
      P4         | 89.2         | 92.3            | 86.6
      P5         | 89.2         | 92.3            | 86.6
      P6         | 89.2         | 92.3            | 86.6
      P7         | 89.2         | 92.3            | 86.6
      P8         | 78.5         | 83.3            | 75
      P9         | 89.2         | 92.3            | 86.6
      P10        | 89.2         | 92.3            | 86.6
      P11        | 89.2         | 92.3            | 86.6
      P12        | 89.2         | 92.3            | 86.6
      P13        | 78.5         | 90              | 72.2
      P14        | 89.2         | 92.3            | 86.6
      P15        | 85.7         | 91.6            | 81.2
      P16        | 89.2         | 92.3            | 86.6
      P17        | 85.7         | 91.6            | 81.2
      P18        | 85.7         | 91.6            | 81.2
      P19        | 89.2         | 92.3            | 86.6
      P20        | 85.7         | 91.6            | 81.2
      P21        | 85.7         | 91.6            | 81.2
      P22        | 89.2         | 92.3            | 86.6
      P23        | 89.2         | 92.3            | 86.6
      P24        | 89.2         | 92.3            | 86.6
      P25        | 85.7         | 91.6            | 81.2
      P26        | 82.1         | 90.9            | 76.4
      P27        | 82.1         | 100             | 73.6
      P28        | 89.2         | 92.3            | 86.6
      P29        | 78.5         | 90              | 72.2
      P30        | 89.2         | 92.3            | 86.6
      P31        | 78.5         | 78.5            | 78.5
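The comparison between the single-protein setting and MIL-ALIGN can be made explicit from the tables. The Python sketch below copies the Accuracy column of Table 5, finds the best single-protein accuracy and the proteins that reach it, and sets it against the best MIL-ALIGN accuracy reported in Table 3 (92.80%).

```python
# Accuracy (%) column of Table 5, in order P1..P31.
ACCURACY = [85.7, 89.2, 82.1, 89.2, 89.2, 89.2, 89.2, 78.5, 89.2, 89.2,  # P1-P10
            89.2, 89.2, 78.5, 89.2, 85.7, 89.2, 85.7, 85.7, 89.2, 85.7,  # P11-P20
            85.7, 89.2, 89.2, 89.2, 85.7, 82.1, 82.1, 89.2, 78.5, 89.2,  # P21-P30
            78.5]                                                        # P31
per_protein = {f"P{i}": acc for i, acc in enumerate(ACCURACY, start=1)}

best_single = max(per_protein.values())
top_proteins = [p for p, acc in per_protein.items() if acc == best_single]
best_mil = 92.80  # best MIL-ALIGN accuracy in Table 3

print(best_single, len(top_proteins), round(best_mil - best_single, 1))  # 89.2 17 3.6
```

No single protein exceeds 89.2% accuracy, whereas several MIL-ALIGN configurations reach 92.80%.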
    • Download:
    • MIL-ALIGN predicts bacterial ionizing radiation resistance using a multiple-instance learning model. It runs on Windows or UNIX platforms with a Java Runtime Environment (JRE) installed.
      MIL-ALIGN (64-bit) can be downloaded here.
      MIL-ALIGN (32-bit) can be downloaded here.
      The dataset used in the experiments of the paper can be downloaded here.
      Instructions for using MIL-ALIGN are given in the ReadMe file, which can be downloaded here.