Synergizing Off-Target Predictions for In Silico Insights of CENH3 Knockout in Cannabis through CRISPR/Cas

The clustered regularly interspaced short palindromic repeats (CRISPR)/Cas-mediated genome editing system has recently been used for haploid production in plants. Haploid induction using the CRISPR/Cas system is an attractive approach in cannabis, an economically important industrial, recreational, and medicinal plant. However, the CRISPR system requires the design of precise (on-target) single-guide RNAs (sgRNAs), so it is essential to predict the off-target activity of the designed sgRNAs to avoid unexpected outcomes. This study assessed the predictive ability of three machine learning (ML) algorithms (radial basis function (RBF), support vector machine (SVM), and random forest (RF)) alongside the ensemble-bagging (E-B) strategy, combining MIT and cutting frequency determination (CFD) scores to predict sgRNA off-target activity by in silico targeting of a histone H3-like centromeric protein, HTR12, in cannabis. Of the individual algorithms tested, RF had the highest precision, recall, and F-measure (0.61, 0.64, and 0.62, respectively). RF was then used as a meta-classifier for the E-B method, which raised precision and F-measure to 0.62 and 0.66, respectively. The E-B algorithm also had the highest area under the precision-recall curve (AUC-PRC; 0.74) and area under the receiver operating characteristic (ROC) curve (AUC-ROC; 0.71), demonstrating the value of E-B as a common ensemble strategy. This study provides a foundational resource for using ML models to predict gRNA off-target activity in cannabis.
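As a quick consistency check on the figures in this abstract, the sketch below computes the F-measure from precision and recall. It assumes the reported F-measure is the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_measure` is illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # convention when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for the random forest model in the abstract.
rf_f = f_measure(0.61, 0.64)
print(round(rf_f, 2))  # 0.62
```

Under that assumption, the reported precision (0.61) and recall (0.64) reproduce the reported RF F-measure of 0.62.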

ASSESSMENT OF EFFICIENCY AND OFF-TARGET ACTIVITY OF CRISPR/CAS RIBONUCLEOPROTEIN COMPLEXES

Molecular Diagnostics and Biosafety – 2020. Russian national scientific and practical conference with international participation (October, 6–8, 2020): Conference Proceedings ◽

10.36233/978-5-9900432-9-9-98 ◽

2020 ◽

Author(s):

Y.V. Mikhaylova ◽

M.A. Tyumentseva ◽

A.A. Shelenkov ◽

Y.G. Yanushevich ◽

...

Keyword(s):

High Sensitivity ◽

Correct Choice ◽

Target Sequence ◽

DNA Breaks ◽

Gene Encoding ◽

Guide RNA ◽

Guide RNAs ◽

Ribonucleoprotein Complexes ◽

Chemokine Receptor CCR5 ◽

Target Activity

In this study, we assessed the efficiency and off-target activity of the CRISPR/CAS complex with one of the selected guide RNAs using CIRCLE-seq technology. The gene encoding the human chemokine receptor CCR5 was used as the target sequence for genome editing. The results indicate that the guide RNA was chosen correctly and that the CRISPR/CAS ribonucleoprotein complex worked efficiently. CIRCLE-seq showed high sensitivity compared with bioinformatic methods for predicting the off-target activity of CRISPR/CAS complexes. We plan to evaluate the efficiency and off-target activity of CRISPR/CAS ribonucleoprotein complexes with other guide RNAs, slightly adjusting the CIRCLE-seq protocol to reduce nonspecific DNA breaks and increase the number of reliable reads.

MicroRNAs-1299, -126-3p and -30e-3p as Potential Diagnostic Biomarkers for Prediabetes

Diagnostics ◽

10.3390/diagnostics11060949 ◽

2021 ◽

Vol 11 (6) ◽

pp. 949

Author(s):

Cecil J. Weale ◽

Don M. Matshazi ◽

Saarah F. G. Davids ◽

Shanel Raghubeer ◽

Rajiv T. Erasmus ◽

...

Keyword(s):

Receiver Operating Characteristic ◽

Operating Characteristic ◽

Characteristic Curve ◽

Predictive Ability ◽

ROC Curves ◽

Tolerance Test ◽

Cross Sectional Study ◽

Oral Glucose ◽

Cross Sectional ◽

Receiver Operating

This cross-sectional study investigated the association of miR-1299, -126-3p and -30e-3p with dysglycaemia, and their diagnostic capability for it, in 1273 South Africans (men, n = 345) aged >20 years. Glycaemic status was assessed by oral glucose tolerance test (OGTT). Whole-blood microRNA (miRNA) expression was assessed using TaqMan-based reverse transcription quantitative PCR (RT-qPCR). Receiver operating characteristic (ROC) curves assessed the ability of each miRNA to discriminate dysglycaemia, while multivariable logistic regression analyses linked expression with dysglycaemia. In all, 207 (16.2%) and 94 (7.4%) participants had prediabetes and type 2 diabetes mellitus (T2DM), respectively. All three miRNAs were expressed at significantly higher levels in individuals with prediabetes than in normotolerant patients (p < 0.001). miR-30e-3p and miR-126-3p were also expressed at significantly higher levels in T2DM than in normotolerant patients (p < 0.001). In multivariable logistic regressions, the three miRNAs were consistently and continuously associated with prediabetes, while only miR-126-3p was associated with T2DM. The ROC analysis indicated that all three miRNAs had a significant overall predictive ability to diagnose prediabetes, diabetes, and the combination of both (dysglycaemia), with the area under the ROC curve (AUC) being significantly higher for miR-126-3p in prediabetes. For prediabetes diagnosis, miR-126-3p (AUC = 0.760) outperformed HbA1c (AUC = 0.695), p = 0.042. These results suggest that miR-1299, -126-3p and -30e-3p are associated with prediabetes, and that measuring miR-126-3p could contribute to diabetes risk screening strategies.
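The AUC values compared in this abstract can be read as the probability that a randomly chosen affected participant has a higher marker value than a randomly chosen unaffected one. The sketch below computes AUC in that pairwise (Mann-Whitney) form on toy data; the function name and the numbers are illustrative assumptions and are not taken from the study.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical marker values: 1 = prediabetes, 0 = normotolerant.
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
toy_labels = [1, 1, 1, 0, 0, 0]
print(roc_auc(toy_scores, toy_labels))
```

The pairwise form is quadratic in sample size, which is fine for illustration; a rank-based implementation would be preferable for a cohort of this size.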

Comparison of numerical and standard Sarnat grading using the NICHD and SIBEN methods

Journal of Perinatology ◽

10.1038/s41372-021-01180-w ◽

2021 ◽

Author(s):

Brian H. Walsh ◽

Chelsea Munster ◽

Hoda El-Shibiny ◽

Edward Yang ◽

Terrie E. Inder ◽

...

Keyword(s):

Predictive Ability ◽

ROC Curves ◽

Cerebral Injury ◽

Neonatal Encephalopathy ◽

Minimum Threshold ◽

Term Outcome ◽

Long Term Outcome ◽

Grading Systems ◽

Good Agreement

Objective. The NICHD and SIBEN assessments are adapted from the Sarnat grade and are used to determine the severity of neonatal encephalopathy (NE). We compare the NICHD and SIBEN methods and their ability to define a minimum threshold associated with significant cerebral injury. Study design. Between 2016 and 2019, 145 infants with NE (77 mild, 65 moderate, 3 severe) were included. NICHD and SIBEN grades and numerical scores were assigned. Kappa scores described agreement between methods, and ROC curves described their ability to predict MR injury. Results. Good agreement existed between grading systems (K = 0.86). SIBEN defined more infants as moderate, and fewer as mild, than NICHD (p < 0.001). Both numerical scores were superior to standard grades in predicting MR injury. Conclusion. Despite good agreement between methods, SIBEN defines more infants as having moderate NE. Both numerical scores were superior to the standard grade, and comparable to each other, in defining a minimum threshold for cerebral injury. Further assessment contrasting their predictive ability for long-term outcome is required.
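Agreement between two grading systems, as summarized by the kappa score above, corrects the raw agreement rate for the agreement expected by chance. The sketch below implements unweighted Cohen's kappa on toy grades; whether the study used the unweighted or a weighted variant is not stated, so the unweighted form is an assumption, and the grades shown are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must grade the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: share of cases given the same grade by both methods.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each method's marginal frequency per grade.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both methods used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical grades for six infants under two grading methods.
method_1 = ["mild", "mild", "moderate", "moderate", "severe", "mild"]
method_2 = ["mild", "moderate", "moderate", "moderate", "severe", "mild"]
print(cohen_kappa(method_1, method_2))
```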

Early Detection of Red Palm Weevil, Rhynchophorus ferrugineus (Olivier), Infestation Using Data Mining

Plants ◽

10.3390/plants10010095 ◽

2021 ◽

Vol 10 (1) ◽

pp. 95

Author(s):

Heba Kurdi ◽

Amal Al-Aldawsari ◽

Isra Al-Turaiki ◽

Abdulrahman S. Aldawood

Keyword(s):

Data Mining ◽

Plant Size ◽

Support Vector ◽

Classification Algorithms ◽

Palm Tree ◽

Rhynchophorus Ferrugineus ◽

Red Palm Weevil ◽

Palm Weevil ◽

Using Data ◽

F Measure

In the past 30 years, the red palm weevil (RPW), Rhynchophorus ferrugineus (Olivier), a pest that is highly destructive to all types of palms, has spread rapidly worldwide. Detecting RPW infestation is highly challenging because symptoms are not visible until the death of the palm tree is inevitable. In addition, the use of automated RPW identification tools to predict infestation is complicated by a lack of RPW datasets. In this study, we assessed the capability of 10 state-of-the-art data mining classification algorithms, Naive Bayes (NB), KSTAR, AdaBoost, bagging, PART, J48 decision tree, multilayer perceptron (MLP), support vector machine (SVM), random forest, and logistic regression, to use plant-size and temperature measurements collected from individual trees to predict RPW infestation in its early stages, before significant damage is caused to the tree. The performance of the classification algorithms was evaluated in terms of accuracy, precision, recall, and F-measure using a real RPW dataset. The experimental results showed that RPW infestations can be predicted using data mining with an accuracy of up to 93%, precision above 87%, recall of 100%, and F-measure greater than 93%. Additionally, we found that temperature and circumference are the most important features for predicting RPW infestation. However, we strongly call for collecting and aggregating more RPW datasets so that further experiments can validate these results and provide more conclusive findings.

THE EXPLORATION OF CYP17A1 LIGAND SPACE BY THE QSAR MODEL

10.46793/iccbi21.439b ◽

2021 ◽

Author(s):

Natalia Boboriko ◽

He Liying ◽

Yaraslau Dzichenka

Keyword(s):

Hydrophobic Effect ◽

Predictive Ability ◽

Distance Matrix ◽

QSAR Model ◽

Aromatic Rings ◽

QSAR Study ◽

Test Set ◽

Highly Active ◽

High Efficient ◽

F Measure

Cytochrome P450 17A1 (CYP17A1) is a critically important enzyme in humans that catalyzes the formation of all endogenous androgens. It is often considered a molecular target for the development of novel, highly efficient drugs against prostate cancer. In the present work, the random forest algorithm was used to conduct a QSAR study on 370 structurally diverse CYP17A1 ligands collected from the literature and databases, and a QSAR model was built on the five important descriptors selected: 2D adjacency and distance matrix descriptors, 2D atom counts and bond counts, and 3D surface area, volume, and shape descriptors. The model was verified on the test set (accuracy, specificity, sensitivity, F-measure, MCC, and AUC were calculated). The hydrophobic properties of the vdW surface of the ligand were found to contribute significantly to the activity prediction. The hydrophobic effect of the molecules may arise from the presence of hydrophobic groups or aromatic rings. The QSAR model shows that molecules with more aromatic rings have better activity. On the test set, the model achieved an accuracy of 84%, precision of 81%, sensitivity of 93%, specificity of 72%, F-measure of 0.87, MCC of 0.67, and AUC of 0.88. The model has good robustness and predictive ability and can be used to screen for and discover new highly active CYP17A1 inhibitors.

Landslide Susceptibility Assessment Based on Different Machine Learning Methods in Zhaoping County of Eastern Guangxi

Remote Sensing ◽

10.3390/rs13183573 ◽

2021 ◽

Vol 13 (18) ◽

pp. 3573

Author(s):

Chunfang Kong ◽

Yiping Tian ◽

Xiaogang Ma ◽

Zhengping Weng ◽

Zhiting Zhang ◽

...

Keyword(s):

Particle Swarm Optimization ◽

Random Forest ◽

Landslide Susceptibility ◽

ROC Curves ◽

Support Vector ◽

Swarm Optimization ◽

SVM Model ◽

Vector Machines ◽

Susceptibility Evaluation ◽

Landslide Disaster

In response to the increasingly frequent occurrence of serious landslide disasters in eastern Guangxi, this study applied support vector machines (SVM), particle swarm optimization support vector machines (PSO-SVM), random forest (RF), and particle swarm optimization random forest (PSO-RF) methods to assess landslide susceptibility in Zhaoping County. To this end, 10 landslide-related variables were used, including digital elevation model (DEM)-derived, meteorology-derived, Landsat8-derived, geology-derived, and human activity factors. Of the 345 landslide locations found, 70% were used to train the models and the remainder were used for model verification. The four models were run, and landslide susceptibility evaluation maps were produced. Receiver operating characteristic (ROC) curves, statistical analysis, and field investigation were then used to test and verify the efficiency of the models. Comparison of the results showed that all four models performed well for landslide susceptibility evaluation, with area under the curve (AUC) values of the ROC curves ranging from 0.863 to 0.934. The PSO-RF model had the highest accuracy, followed by the PSO-SVM model, the RF model, and the SVM model. The results also showed that the PSO algorithm improves both the SVM and RF models. The landslide models developed in the present study are promising methods that could be transferred to other regions for landslide susceptibility evaluation. In addition, the evaluation results can inform disaster reduction and prevention in Zhaoping County of eastern Guangxi.

Predictive Analytic Techniques to Identify Hidden Relationships between Training Load, Fatigue and Muscle Strains in Young Soccer Players

Sports ◽

10.3390/sports10010003 ◽

2021 ◽

Vol 10 (1) ◽

pp. 3

Author(s):

Mauro Mandorino ◽

António J. Figueiredo ◽

Gianluca Cima ◽

Antonio Tessitore

Keyword(s):

Area Under The Curve ◽

Predictive Ability ◽

Peak Height ◽

Neuromuscular Fatigue ◽

Training Load ◽

Soccer Players ◽

Height Velocity ◽

Support Vector ◽

Peak Height Velocity ◽

Injury Surveillance System

This study aimed to analyze different predictive analytic techniques to forecast the risk of muscle strain injuries (MSI) in youth soccer based on training load data. Twenty-two young soccer players (age: 13.5 ± 0.3 years) were recruited, and an injury surveillance system was applied to record all MSI during the season. Anthropometric data, predicted age at peak height velocity, and skeletal age were collected. The session-RPE method was employed daily to quantify internal training/match load, and monotony, strain, and cumulative load over the weeks were calculated. A countermovement jump (CMJ) test was administered before and after each training session or match to quantify players’ neuromuscular fatigue. All these data were used to predict the risk of MSI through different data mining models: Logistic Regression (LR), Random Forest (RF), and Support Vector Machine (SVM). Among them, SVM showed the best predictive ability (area under the curve = 0.84 ± 0.05). A Decision Tree (DT) algorithm was then employed to understand the interactions identified by the SVM model. The rules extracted by DT revealed how the risk of injury could change according to players’ maturity status, neuromuscular fatigue, anthropometric factors, higher workloads, and low recovery status. This approach made it possible to identify MSI and the underlying risk factors.
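The training-load variables named in this abstract follow simple formulas in the commonly used session-RPE method: session load is the rating of perceived exertion multiplied by session duration in minutes, weekly monotony is the mean daily load divided by its standard deviation, and strain is the weekly load multiplied by monotony. The sketch below assumes those standard definitions and a population standard deviation; the abstract does not say which variant the authors used, and the week of data is hypothetical.

```python
import statistics


def session_load(rpe: float, minutes: float) -> float:
    """Internal load of one session: perceived exertion times duration."""
    return rpe * minutes


def weekly_load(daily_loads) -> float:
    """Total load accumulated over the week."""
    return sum(daily_loads)


def monotony(daily_loads) -> float:
    """Mean daily load divided by its standard deviation (population SD assumed)."""
    sd = statistics.pstdev(daily_loads)
    if sd == 0:
        raise ValueError("monotony is undefined when every day has the same load")
    return statistics.mean(daily_loads) / sd


def strain(daily_loads) -> float:
    """Weekly load scaled by monotony."""
    return weekly_load(daily_loads) * monotony(daily_loads)


# Hypothetical week: rest days are recorded as zero load.
week = [session_load(5, 60), session_load(5, 80), 0,
        session_load(5, 100), session_load(5, 60), session_load(6, 100), 0]
print(weekly_load(week), round(monotony(week), 2), round(strain(week), 1))
```

Weeks with similar loads every day produce high monotony, so the same weekly total yields a higher strain than a week that alternates hard and easy days.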

Integration of synthetic minority oversampling technique for imbalanced class

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v13.i1.pp102-108 ◽

2019 ◽

Vol 13 (1) ◽

pp. 102

Author(s):

Noviyanti Santoso ◽

Wahyu Wibowo ◽

Hilda Hikmawati

Keyword(s):

Machine Learning ◽

Data Mining ◽

Support Vector Machine ◽

Class Imbalance ◽

Original Data ◽

Support Vector ◽

Classification Methods ◽

Problematic Issue ◽

Imbalanced Class ◽

F Measure

In data mining, class imbalance is a problematic issue. This is probably because machine learning algorithms are built on the assumption that the number of instances in each class is balanced, so when the classes are imbalanced the prediction results may be inappropriate. Solutions offered for class imbalance include oversampling, undersampling, and the synthetic minority oversampling technique (SMOTE). Both oversampling and undersampling have disadvantages, so SMOTE is an alternative that overcomes them. Integrating SMOTE into data mining classification methods such as Naive Bayes, Support Vector Machine (SVM), and Random Forest (RF) is expected to improve accuracy. In this research, the SMOTE data gave better accuracy than the original data. Of the three classification methods used, RF gave the highest average AUC, F-measure, and G-means score.
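The core idea of SMOTE is to create each synthetic minority sample by interpolating between an existing minority sample and one of its nearest minority-class neighbours. The sketch below is a minimal pure-Python illustration of that idea, not the implementation used in the paper; the function name, parameters, and data points are assumptions made for the example.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from a list of minority-class tuples.

    Each new point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical two-feature minority class.
minority_points = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0), (3.0, 3.0)]
for point in smote(minority_points, n_new=3):
    print(point)
```

Because new points are interpolated rather than copied, the minority class is enlarged without exact duplicates, which is the main difference from plain random oversampling.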

An Evolutionary-Based Sentiment Analysis Approach for Enhancing Government Decisions during COVID-19 Pandemic: The Case of Jordan

Applied Sciences ◽

10.3390/app11199080 ◽

2021 ◽

Vol 11 (19) ◽

pp. 9080

Author(s):

Ruba Obiedat ◽

Osama Harfoushi ◽

Raneem Qaddoura ◽

Laila Al-Qaisi ◽

Ala’ M. Al-Zoubi

Keyword(s):

Decision Support ◽

Decision Support System ◽

Sentiment Analysis ◽

Support System ◽

Support Vector ◽

Whale Optimization ◽

Vector Machines ◽

Standard Classification ◽

The Government ◽

F Measure

The world has recently witnessed a global outbreak of coronavirus disease (COVID-19). The pandemic has affected many countries and raised worldwide health concerns, so governments are attempting to reduce its spread and its impact on aspects of life such as health, economics, education, and politics by making emergency decisions and policies (e.g., lockdown and social distancing). These new regulations influenced people’s daily lives and imposed significant burdens, concerns, and disparities on various population groups. Wrong actions and bad decisions enforced by some countries increased the contagion rate and led to more catastrophic results. People began to post their opinions and feelings about their government’s decisions on social media networks, and the data received through these platforms are a very useful source of information that affects how governments perceive and cope with the pandemic. Jordan was one of the most affected countries. In this paper, we propose a decision support system based on sentiment analysis that combines support vector machines with a whale optimization algorithm for automatically tuning the hyperparameters and performing feature weighting. The work is based on a hybrid evolutionary approach that performs sentiment analysis, combined with a decision support system, on people’s Facebook posts to investigate their attitudes and feelings toward the government’s decisions during the pandemic. The government regulations were divided into two periods: the first and the later regulations. Studying public sentiment during these periods allows government decision-makers to sense people’s feelings, alerts them to possible threats, and helps them act proactively when needed to better handle the pandemic. Five different versions were generated for each of the two collected datasets. The results demonstrate the superiority of the proposed Whale Optimization Algorithm and Support Vector Machines (WOA-SVM) approach over other metaheuristic algorithms and standard classification models: WOA-SVM achieved 78.78% accuracy and 84.64% F-measure, while standard classification models such as NB, k-NN, J48, and SVM achieved accuracies of 69.25%, 69.78%, 70.17%, and 69.29%, respectively, with F-measures of 64.15%, 62.90%, 60.51%, and 59.09%. WOA-SVM also outperformed the other metaheuristic approaches compared, namely GA-SVM, PSO-SVM, and MVO-SVM. Further, we investigate and analyze the most relevant features and their effect in order to improve the decision support system for government decisions.

Neutrophils, lymphocytes and their ratio as predictors of outcome in patients with COVID-19

ZHurnal «Patologicheskaia fiziologiia i eksperimental`naia terapiia» ◽

10.25557/0031-2991.2021.04.34-41 ◽

2021 ◽

pp. 34-41

Author(s):

Б.И. Кузник ◽

Ю.Н. Смоляков ◽

В.Х. Хавинсон ◽

К.Г. Шаповалов ◽

С.А. Лукьянов ◽

...

Keyword(s):

ROC Analysis ◽

Predictive Ability ◽

High Sensitivity ◽

ROC Curves ◽

Threshold Values ◽

Predictors Of Outcome ◽

Timely Manner ◽

Neutrophil Lymphocyte Ratio ◽

Lymphocyte Ratio ◽

Early Stages

Background. There have been practically no reports that describe simple methods, usable in the early stages of COVID-19, for predicting the outcome of this insidious disease. At the same time, predictors of a favorable or fatal COVID-19 outcome are important, since they allow clinicians to adjust treatment in a timely manner. Aim. To develop simple and affordable predictors that can forecast the outcome with high probability in the early stages of COVID-19. Methods. The study was conducted in 125 patients with COVID-19, in whom the numbers of leukocytes, neutrophils, and lymphocytes and the neutrophil/lymphocyte ratio (NEU/LYM) were determined on days 1, 5, 7, 10, 14, and 21 of hospitalization. ROC analyses were performed to calculate predictive threshold values for survival and mortality. To assess the significance of the increase in the area under the ROC curve (AUC) over the course of the illness, the ROC curves were compared in pairs (days 1-5, 5-7, 7-10, 10-14, and 14-21) using the nonparametric DeLong algorithm. Results. There were significant differences in the numbers of leukocytes, neutrophils, and lymphocytes and in the NEU/LYM ratio between patients with a favorable outcome and those who later died. The most significant outcome predictors were the number of neutrophils and, especially, the NEU/LYM index: as it rose, the likelihood of death increased sharply. The ROC analysis showed that on day 1 the predictive ability (AUC) of the NEU/LYM ratio for disease outcome was 79%; by day 5 it had increased to 84%; and from day 10 to the end of the study it exceeded 90%. When indicators of a possible lethal outcome are high, immunomodulators should be administered. For this purpose, we recommend a complex of thymus gland polypeptides, thymalin, which has proven beneficial in the treatment of patients with moderate to severe COVID-19. Conclusion. The neutrophil/lymphocyte ratio (NEU/LYM index) predicts a severe course and unfavorable outcome of COVID-19 with high sensitivity and specificity.
