# Biodegradability predictor

This bot uses machine learning to analyse the chemical structure of molecules and predict whether they are readily biodegradable. The model was built on a dataset of 1056 entries from the QSAR Research Group and has been cross-validated (fewer than 30% missed positive labels, up to 87% accuracy).
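
As a rough illustration of how the two cross-validation figures relate to a confusion matrix, here is a minimal Python sketch. It assumes that "missed positive labels" means the false negative rate (biodegradable molecules predicted as not biodegradable). The function name `cv_metrics` and the counts in the usage lines are hypothetical and are not the model's actual results.

```python
def cv_metrics(tp, fn, tn, fp):
    """Return (accuracy, miss_rate) from confusion-matrix counts.

    tp: biodegradable molecules predicted as biodegradable
    fn: biodegradable molecules predicted as not biodegradable (missed positives)
    tn: non-biodegradable molecules predicted as not biodegradable
    fp: non-biodegradable molecules predicted as biodegradable
    """
    total = tp + fn + tn + fp
    if total == 0 or (tp + fn) == 0:
        raise ValueError("need at least one sample and one positive sample")
    accuracy = (tp + tn) / total   # share of all predictions that are correct
    miss_rate = fn / (tp + fn)     # share of true positives that were missed
    return accuracy, miss_rate


# Made-up counts, for illustration only.
accuracy, miss_rate = cv_metrics(tp=80, fn=20, tn=170, fp=30)
print(f"accuracy={accuracy:.3f}, missed positives={miss_rate:.3f}")
```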

The bot outputs either of the following labels:

- [yes]: Biodegradable
- [no]: Not biodegradable

The following molecular descriptors are required as inputs when feeding the bot. More information is available in [1]:

- SpMax_L: Leading eigenvalue from Laplace matrix
- J_Dz_e: Balaban-like index from Barysz matrix weighted by Sanderson electronegativity
- nHM: Number of heavy atoms
- F01_N_N: Frequency of N-N at topological distance 1
- F04_C_N: Frequency of C-N at topological distance 4
- NssssC: Number of atoms of type ssssC
- nCb_: Number of substituted benzene C(sp2)
- C: Percentage of C atoms
- nCp: Number of terminal primary C(sp3)
- nO: Number of oxygen atoms
- F03_C_N: Frequency of C-N at topological distance 3
- SdssC: Sum of dssC E-states
- HyWi_B_m: Hyper-Wiener-like index (log function) from Burden matrix weighted by mass
- LOC: Lopping centric index
- SM6_L: Spectral moment of order 6 from Laplace matrix
- F03_C_O: Frequency of C-O at topological distance 3
- Me: Mean atomic Sanderson electronegativity (scaled on Carbon atom)
- Mi: Mean first ionization potential (scaled on Carbon atom)
- nN_N: Number of N hydrazines
- nArNO2: Number of nitro groups (aromatic)
- nCRX3: Number of CRX3
- SpPosA_B_p: Normalized spectral positive sum from Burden matrix weighted by polarizability
- nCIR: Number of circuits
- B01_C_Br: Presence/absence of C-Br at topological distance 1
- B03_C_Cl: Presence/absence of C-Cl at topological distance 3
- N_073: Ar2NH / Ar3N / Ar2N-Al / R..N..R
- SpMax_A: Leading eigenvalue from adjacency matrix (Lovasz-Pelikan index)
- Psi_i_1d: Intrinsic state pseudoconnectivity index – type 1d
- B04_C_Br: Presence/absence of C-Br at topological distance 4
- SdO: Sum of dO E-states
- TI2_L: Second Mohar index from Laplace matrix
- nCrt: Number of ring tertiary C(sp3)
- C_026: R--CX--R
- F02_C-N: Frequency of C-N at topological distance 2
- nHDon: Number of donor atoms for H-bonds (N and O)
- SpMax_B_m: Leading eigenvalue from Burden matrix weighted by mass
- Psi_i_A: Intrinsic state pseudoconnectivity index – type S average
- nN: Number of Nitrogen atoms
- SM6_B_m: Spectral moment of order 6 from Burden matrix weighted by mass
- nArCOOR: Number of esters (aromatic)
- nX: Number of halogen atoms
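
Since all 41 descriptors are required, it can help to check a record before sending it to the bot. The sketch below is one possible way to do this in Python: it assumes the input is a plain dictionary keyed by the descriptor names exactly as listed above, and that every value is numeric. The names `REQUIRED_DESCRIPTORS` and `to_feature_vector` are hypothetical helpers, not part of the bot itself.

```python
# Descriptor names copied from the list above, in the same order.
REQUIRED_DESCRIPTORS = [
    "SpMax_L", "J_Dz_e", "nHM", "F01_N_N", "F04_C_N", "NssssC", "nCb_",
    "C", "nCp", "nO", "F03_C_N", "SdssC", "HyWi_B_m", "LOC", "SM6_L",
    "F03_C_O", "Me", "Mi", "nN_N", "nArNO2", "nCRX3", "SpPosA_B_p",
    "nCIR", "B01_C_Br", "B03_C_Cl", "N_073", "SpMax_A", "Psi_i_1d",
    "B04_C_Br", "SdO", "TI2_L", "nCrt", "C_026", "F02_C-N", "nHDon",
    "SpMax_B_m", "Psi_i_A", "nN", "SM6_B_m", "nArCOOR", "nX",
]


def to_feature_vector(record):
    """Validate a descriptor dict and return its values as an ordered list of floats."""
    known = set(REQUIRED_DESCRIPTORS)
    missing = [name for name in REQUIRED_DESCRIPTORS if name not in record]
    unexpected = [name for name in record if name not in known]
    if missing or unexpected:
        raise ValueError(f"missing: {missing}; unexpected: {unexpected}")
    # float() raises ValueError/TypeError for non-numeric values.
    return [float(record[name]) for name in REQUIRED_DESCRIPTORS]


# Placeholder values only; these do not describe a real molecule.
example = dict.fromkeys(REQUIRED_DESCRIPTORS, 0.0)
vector = to_feature_vector(example)
print(len(vector))
```

Keeping the names in a single ordered list means the same constant can drive both validation and the column order of the values passed on.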

[1] Data source: Mansouri, K., Ringsted, T., Ballabio, D., Todeschini, R., Consonni, V. (2013). Quantitative Structure-Activity Relationship models for ready biodegradability of chemicals. Journal of Chemical Information and Modeling, 53, 867-878.

