Modeling CRISPR gene drives for suppression of invasive rodents using a supervised machine learning framework

2021 ◽  
Vol 17 (12) ◽  
pp. e1009660
Author(s):  
Samuel E. Champer ◽  
Nathan Oakes ◽  
Ronin Sharma ◽  
Pablo García-Díaz ◽  
Jackson Champer ◽  
...  

Invasive rodent populations pose a threat to biodiversity across the globe. When confronted with these invaders, native species that evolved independently are often defenseless. CRISPR gene drive systems could provide a solution to this problem by spreading transgenes among invaders that induce population collapse, and could be deployed even where traditional control methods are impractical or prohibitively expensive. Here, we develop a high-fidelity model of an island population of invasive rodents that includes three types of suppression gene drive systems. The individual-based model is spatially explicit, allows for overlapping generations and a fluctuating population size, and includes variables for drive fitness, efficiency, resistance allele formation rate, as well as a variety of ecological parameters. The computational burden of evaluating a model with such a high number of parameters presents a substantial barrier to a comprehensive understanding of its outcome space. We therefore accompany our population model with a meta-model that utilizes supervised machine learning to approximate the outcome space of the underlying model with a high degree of accuracy. This enables us to conduct an exhaustive inquiry of the population model, including variance-based sensitivity analyses using tens of millions of evaluations. Our results suggest that sufficiently capable gene drive systems have the potential to eliminate island populations of rodents under a wide range of demographic assumptions, though only if resistance can be kept to a minimal level. This study highlights the power of supervised machine learning to identify the key parameters and processes that determine the population dynamics of a complex evolutionary system.
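The variance-based sensitivity analysis mentioned above can be illustrated with a minimal sketch. The abstract does not give the meta-model or the estimator, so everything here is an assumption: `surrogate` is a hypothetical linear stand-in for a trained meta-model, the three inputs are taken as independent and uniform on [0, 1], and first-order Sobol indices are estimated with a standard pick-freeze (Saltelli-style) scheme.

```python
import random
import statistics


def surrogate(x):
    """Hypothetical stand-in for a trained meta-model (cheap to evaluate)."""
    return 4.0 * x[0] + 2.0 * x[1] + 0.0 * x[2]


def first_order_sobol(model, n_params, n_samples, seed=0):
    """Estimate first-order Sobol indices with a pick-freeze scheme.

    Inputs are assumed independent and uniform on [0, 1].
    """
    rng = random.Random(seed)
    # Two independent sample matrices A and B.
    a = [[rng.random() for _ in range(n_params)] for _ in range(n_samples)]
    b = [[rng.random() for _ in range(n_params)] for _ in range(n_samples)]
    f_a = [model(row) for row in a]
    f_b = [model(row) for row in b]
    total_var = statistics.pvariance(f_a + f_b)

    indices = []
    for i in range(n_params):
        acc = 0.0
        for row_a, row_b, y_a, y_b in zip(a, b, f_a, f_b):
            # Row of A with only column i replaced by the value from B.
            mixed = list(row_a)
            mixed[i] = row_b[i]
            acc += y_b * (model(mixed) - y_a)
        indices.append(acc / n_samples / total_var)
    return indices


sobol = first_order_sobol(surrogate, n_params=3, n_samples=100_000)
print([round(s, 3) for s in sobol])
```

For this toy function the analytic first-order indices are 0.8, 0.2 and 0, so the estimates should land close to those values. The reason a cheap surrogate matters is visible in the loop: each parameter needs one extra model evaluation per sample, which is prohibitive for a slow individual-based simulation.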

2020 ◽  
Author(s):  
Samuel E. Champer ◽  
Nathan Oakes ◽  
Ronin Sharma ◽  
Pablo García-Díaz ◽  
Jackson Champer ◽  
...  

ABSTRACT Invasive rodent populations pose a threat to biodiversity across the globe. When confronted with these new invaders, native species that evolved independently are often defenseless. CRISPR gene drive systems could provide a solution to this problem by spreading transgenes among invaders that induce population collapse. Such systems might be deployed even where traditional control methods are impractical or prohibitively expensive. Here, we develop a high-fidelity model of an island population of invasive rodents that includes three types of suppression gene drive systems. The individual-based model is spatially explicit and allows for overlapping generations and a fluctuating population size. Our model includes variables for drive fitness, efficiency, resistance allele formation rate, as well as a variety of ecological parameters. The computational burden of evaluating a model with such a high number of parameters presents a substantial barrier to a comprehensive understanding of its outcome space. We therefore accompany our population model with a meta-model that utilizes supervised machine learning to approximate the outcome space of the underlying model with a high degree of accuracy. This enables us to conduct an exhaustive inquiry of the population model, including variance-based sensitivity analyses using tens of millions of evaluations. Our results suggest that sufficiently capable gene drive systems have the potential to eliminate island populations of rodents under a wide range of demographic assumptions, but only if resistance can be kept to a minimal level. This study highlights the power of supervised machine learning for identifying the key parameters and processes that determine the population dynamics of a complex evolutionary system.


Author(s):  
Sylvia Aponte-Hao ◽  
Sabrina T. Wong ◽  
Manpreet Thandi ◽  
Paul Ronksley ◽  
Kerry McBrien ◽  
...  

Introduction: Frailty is a medical syndrome that commonly affects people aged 65 years and over and is characterized by a greater risk of adverse outcomes following illness or injury. Electronic medical records contain a large amount of longitudinal data that can be used for primary care research. Machine learning can fully utilize this wide breadth of data for the detection of diseases and syndromes. The creation of a frailty case definition using machine learning may facilitate early intervention, inform advanced screening tests, and allow for surveillance. Objectives: The objective of this study was to develop a validated case definition of frailty for the primary care context, using machine learning. Methods: Physicians participating in the Canadian Primary Care Sentinel Surveillance Network across Canada were asked to retrospectively identify the level of frailty present in a sample of their own patients (total n = 5,466), collected from 2015 to 2019. Frailty levels were dichotomized using a cut-off of 5. Extracted features included previously prescribed medications, billing codes, and other routinely collected primary care data. We used eight supervised machine learning algorithms, with performance assessed using a hold-out test set. A balanced training dataset was also created by oversampling. Sensitivity analyses considered two alternative dichotomization cut-offs. Model performance was evaluated using area under the receiver-operating characteristic curve, F1, accuracy, sensitivity, specificity, negative predictive value and positive predictive value. Results: The prevalence of frailty within our sample was 18.4%. Of the eight models developed to identify frail patients, an XGBoost model achieved the highest sensitivity (78.14%) and specificity (74.41%). The balanced training dataset did not improve classification performance. Sensitivity analyses did not show improved performance for cut-offs other than 5.
Conclusion: Supervised machine learning was able to create well-performing classification models for frailty. Future research is needed to assess frailty inter-rater reliability, and link multiple data sources for frailty identification.
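The evaluation metrics listed in the abstract above all derive from the four cells of a binary confusion matrix. A minimal sketch follows, assuming "frail" is the positive class; the function name `classification_metrics` and the counts in the usage example are made up for illustration and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common binary classification metrics from confusion-matrix counts.

    tp/fn: positive cases classified correctly/incorrectly.
    tn/fp: negative cases classified correctly/incorrectly.
    """
    sensitivity = tp / (tp + fn)   # recall among true positives
    specificity = tn / (tn + fp)   # recall among true negatives
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f1": f1,
    }


# Illustrative counts only (not results from the study).
metrics = classification_metrics(tp=80, fp=50, tn=150, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With an imbalanced outcome such as an 18.4% prevalence, accuracy alone can look good for a model that rarely predicts the positive class, which is one reason to report sensitivity, specificity and the predictive values together.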


2020 ◽  
Vol 29 (03n04) ◽  
pp. 2060009
Author(s):  
Tao Ding ◽  
Fatema Hasan ◽  
Warren K. Bickel ◽  
Shimei Pan

Social media contain rich information that can be used to help understand the human mind and behavior. Social media data, however, are mostly unstructured (e.g., text and image) and a large number of features may be needed to represent them (e.g., we may need millions of unigrams to represent social media texts). Moreover, accurately assessing human behavior is often difficult (e.g., assessing addiction may require medical diagnosis). As a result, the ground truth data needed to train a supervised human behavior model are often difficult to obtain at a large scale. To avoid overfitting, many state-of-the-art behavior models employ sophisticated unsupervised or self-supervised machine learning methods to leverage a large amount of unsupervised data for both feature learning and dimension reduction. Unfortunately, despite their high performance, these advanced machine learning models often rely on latent features that are hard to explain. Since understanding the knowledge captured in these models is important to behavior scientists and public health providers, we explore new methods to build machine learning models that are not only accurate but also interpretable. We evaluate the effectiveness of the proposed methods in predicting Substance Use Disorders (SUD). We believe the methods we propose are general and applicable to a wide range of data-driven human trait and behavior analysis applications.


Author(s):  
Yuning Wu ◽  
Xuan Zhu ◽  
Chi-Luen Huang ◽  
Sangmin Lee ◽  
Marcus Dersch ◽  
...  

Abstract Effective Rail Neutral Temperature (RNT) management is needed for continuous welded rail (CWR). RNT is the temperature at which the longitudinal stress of a rail is zero. Due to the lack of expansion joints, CWR develops internal tensile or compressive stresses when the rail temperature is below or above, respectively, the RNT. Mismanagement of RNT can lead to rail fracture or buckling when thermal stresses exceed the limits of rail steel. In this work, we propose an effective RNT estimation method structured around four hypotheses. The work leverages field-collected vibration test data, high-fidelity numerical models, and machine learning techniques. First, a contactless, non-destructive and non-disruptive sensing technology was developed to collect real-world rail vibrational data. Second, the team established an instrumented field test site at a revenue-service line in the state of Illinois and performed multi-day data collection to cover a wide range of temperature and thermal stress levels. Third, numerical models were developed to understand and predict rail vibration behavior under the influence of temperature and longitudinal load. Excellent agreement between model and experimental results was obtained using an optimization approach. Finally, a supervised machine learning algorithm was developed to estimate RNT using the field-collected rail vibration data. Sensitivity studies and error analyses were included in this work. The system performance with field data indicates that the proposed framework can support reasonable RNT estimation accuracy when measurement or model noise is low.


Neurosurgery ◽  
2020 ◽  
Author(s):  
Nicolai Maldaner ◽  
Anna M Zeitlberger ◽  
Marketa Sosnova ◽  
Johannes Goldberg ◽  
Christian Fung ◽  
...  

Abstract BACKGROUND Current prognostic tools in aneurysmal subarachnoid hemorrhage (aSAH) are constrained by being primarily based on patient and disease characteristics on admission. OBJECTIVE To develop and validate a complication- and treatment-aware outcome prediction tool in aSAH. METHODS This cohort study included data from an ongoing prospective nationwide multicenter registry on all aSAH patients in Switzerland (Swiss SOS [Swiss Study on aSAH]; 2009-2015). We trained supervised machine learning algorithms to predict a binary outcome at discharge (modified Rankin scale [mRS] ≤ 3: favorable; mRS 4-6: unfavorable). Clinical and radiological variables on admission (“Early” Model) as well as additional variables regarding secondary complications and disease management (“Late” Model) were used. Performance of both models was assessed by classification performance metrics on an out-of-sample test dataset. RESULTS Favorable functional outcome at discharge was observed in 1156 (62.0%) of 1866 patients. Both models scored a high accuracy of 75% to 76% on the test set. The “Late” outcome model outperformed the “Early” model with an area under the receiver operating characteristic curve (AUC) of 0.85 vs 0.79, corresponding to a specificity of 0.81 vs 0.70 and a sensitivity of 0.71 vs 0.79, respectively. CONCLUSION Both machine learning models show good discrimination and calibration confirmed on application to an internal test dataset of patients with a wide range of disease severity treated in different institutions within a nationwide registry. Our study indicates that the inclusion of variables reflecting the clinical course of the patient may lead to outcome predictions with superior predictive power compared to a model based on admission data only.
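The outcome coding and the AUC reported above can be sketched in a few lines. The mRS threshold (≤ 3 favorable, 4-6 unfavorable) comes from the abstract; the helper names, the patient scores and the predicted probabilities are hypothetical, and the AUC is computed here by direct pairwise comparison, which may differ from the tooling the authors used.

```python
def dichotomize_mrs(mrs):
    """Map a modified Rankin scale score to 1 (favorable, mRS <= 3) or 0 (mRS 4-6)."""
    if not 0 <= mrs <= 6:
        raise ValueError("mRS must be between 0 and 6")
    return 1 if mrs <= 3 else 0


def auc(labels, scores):
    """AUC as the probability that a positive case outranks a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: discharge mRS and a model's predicted probability of a favorable outcome.
mrs_scores = [0, 2, 3, 4, 5, 6]
predicted = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]

labels = [dichotomize_mrs(m) for m in mrs_scores]
example_auc = auc(labels, predicted)
print(labels, round(example_auc, 3))
```

The pairwise form is quadratic in the number of patients, which is fine for a sketch; a rank-based implementation is the usual choice for larger test sets.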


2020 ◽  
Vol 3 (S1) ◽  
Author(s):  
Michael Egger ◽  
Günther Eibl ◽  
Dominik Engel

Abstract Electrical networks of transmission system operators are mostly built up as isolated networks without access to the Internet. With the increasing popularity of smart grids, securing the communication network has become more important to avoid cyber-attacks that could result in power outages. For misuse detection, signature-based approaches are already in use and special rules for a wide range of protocols have been developed. However, a major disadvantage of signature-based intrusion detection is that zero-day exploits cannot be detected. Machine-learning-based anomaly detection methods have the potential to detect them. In this paper, various such methods for intrusion detection in substations, which use the asynchronous communication protocol International Electrotechnical Commission (IEC) 60870-5-104, are tested and compared. The proposed methods are evaluated by applying them to a data set that includes normal operation traffic and four different attacks. While the results of supervised and semi-supervised machine learning approaches are rather encouraging, the unsupervised and signature-based methods performed poorly overall and had difficulty detecting some attacks.


2019 ◽  
Vol 286 (1914) ◽  
pp. 20191606 ◽  
Author(s):  
John Godwin ◽  
Megan Serr ◽  
S. Kathleen Barnhill-Dilling ◽  
Dimitri V. Blondel ◽  
Peter R. Brown ◽  
...  

Invasive rodents impact biodiversity, human health and food security worldwide. The biodiversity impacts are particularly significant on islands, which are the primary sites of vertebrate extinctions and where we are reaching the limits of current control technologies. Gene drives may represent an effective approach to this challenge, but knowledge gaps remain in a number of areas. This paper is focused on what is currently known about natural and developing synthetic gene drive systems in mice, key areas where knowledge gaps exist, findings in a variety of disciplines relevant to those gaps and a brief consideration of how engagement at the regulatory, stakeholder and community levels can accompany and contribute to this effort. Our primary species focus is the house mouse, Mus musculus, as a genetic model system that is also an important invasive pest. Our primary application focus is the development of gene drive systems intended to reduce reproduction and potentially eliminate invasive rodents from islands. Gene drive technologies in rodents have the potential to produce significant benefits for biodiversity conservation, human health and food security. A broad-based, multidisciplinary approach is necessary to assess this potential in a transparent, effective and responsible manner.


2019 ◽  
Author(s):  
Jaye Sudweeks ◽  
Brandon Hollingsworth ◽  
Dimitri V. Blondel ◽  
Karl J. Campbell ◽  
Sumit Dhole ◽  
...  

Abstract Invasive species pose a major threat to biodiversity on islands. While successes have been achieved using traditional removal methods, such as toxicants aimed at rodents, these approaches have limitations and various off-target effects on island ecosystems. Gene drive technologies designed to eliminate a population provide an alternative approach, but the potential for drive-bearing individuals to escape from the target release area and impact populations elsewhere is a major concern. Here we propose the “Locally Fixed Alleles” approach as a novel means for localizing elimination by a drive to an island population that exhibits significant genetic isolation from neighboring populations. Our approach is based on the assumption that in small island populations of rodents, genetic drift will lead to multiple genomic alleles becoming fixed. In contrast, multiple alleles are likely to be maintained in larger populations on mainlands. Utilizing the high degree of genetic specificity achievable using homing drives, for example based on the CRISPR/Cas9 system, our approach aims at employing one or more locally fixed alleles as the target for a gene drive on a particular island. Using mathematical modeling, we explore the feasibility of this approach and the degree of localization that can be achieved. We show that across a wide range of parameter values, escape of the drive to a neighboring population in which the target allele is not fixed will at most lead to modest transient suppression of the non-target population. While the main focus of this paper is on elimination of a rodent pest from an island, we also discuss the utility of the locally fixed allele approach for the goals of population suppression or population replacement. Our analysis also provides a threshold condition for the ability of a gene drive to invade a partially resistant population.


2018 ◽  
Author(s):  
Héctor M. Sánchez C. ◽  
Sean L. Wu ◽  
Jared B. Bennett ◽  
John M. Marshall

Abstract Malaria, dengue, Zika, and other mosquito-borne diseases continue to pose a major global health burden through much of the world, despite the widespread distribution of insecticide-based tools and antimalarial drugs. The advent of CRISPR/Cas9-based gene editing and its demonstrated ability to streamline the development of gene drive systems has reignited interest in the application of this technology to the control of mosquitoes and the diseases they transmit. The versatility of this technology has also enabled a wide range of gene drive architectures to be realized, creating a need for their population-level and spatial dynamics to be explored. To this end, we present MGDrivE (Mosquito Gene Drive Explorer): a simulation framework designed to investigate the population dynamics of a variety of gene drive architectures and their spread through spatially-explicit mosquito populations. A key strength of the MGDrivE framework is its modularity: a) a genetic inheritance module accommodates the dynamics of gene drive systems displaying user-defined inheritance patterns, b) a population dynamic module accommodates the life history of a variety of mosquito disease vectors and insect agricultural pest species, and c) a landscape module accommodates the distribution of insect metapopulations connected by migration in space. Example MGDrivE simulations are presented to demonstrate the application of the framework to CRISPR/Cas9-based homing gene drive for: a) driving a disease-refractory gene into a population (i.e. population replacement), and b) disrupting a gene required for female fertility (i.e. population suppression), incorporating homing-resistant alleles in both cases. We compare MGDrivE with other genetic simulation packages, and conclude with a discussion of future directions in gene drive modeling.


2019 ◽  
Vol 8 (2) ◽  
pp. 5662-5668

Agriculture is the most important sector in the Indian economy and contributes 18% of Gross Domestic Product (GDP). India is the second largest producer of sugarcane and produces about 20% of the world's sugarcane. Sugarcane is cultivated in tropical and subtropical regions, on a wide range of soils from fertile well-drained mollisols through heavy cracking vertisols, infertile acid oxisols and peaty histosols to rocky andisols. It requires a minimum moisture of 60 cm, a rich water supply and plenty of sunshine. In this paper, a novel approach to sugarcane yield forecasting in the Karnataka region of India using Long Term Time Series (LTTS), weather and soil attributes, Normalized Difference Vegetation Index (NDVI) and Supervised Machine Learning (SML) algorithms is proposed. The sugarcane cultivation life cycle (SCLC) in the Karnataka region is about 12 months, with planting beginning in three different seasons according to weather conditions. Our approach has been verified using a historical dataset, and the results show that it successfully models crop yield. The application of the custom kernel gives a considerable boost in accuracy for final yield prediction: SVM-Kernel Multiple gives an accuracy of 86.31%, followed by SVM-RBF kernel with 83.40%, GPR with 81.75%, Lasso with 26.81% and Kernel Ridge-RBF with the lowest accuracy of 21.46%.

