Using Machine Learning to design AAV2 with high likelihood of viral assembly

2021 ◽  
Author(s):  
Cuong T. To ◽  
Christian Wirsching

We study the application of machine learning to designing AAV2 capsid sequences with a high likelihood of viral assembly, i.e., capsid viability. Specifically, we design and implement Origami, a model-based optimization algorithm, to identify highly viable capsid sequences within the vast space of 20^33 possibilities. Our evaluation shows that Origami performs well in terms of the optimality and diversity of model-designed sequences. Moreover, these sequences are ranked according to their viability score, which helps in designing experiments under a budget constraint.

Agronomy ◽  
2020 ◽  
Vol 11 (1) ◽  
pp. 35
Author(s):  
Xiaodong Huang ◽  
Beth Ziniti ◽  
Michael H. Cosh ◽  
Michele Reba ◽  
Jinfei Wang ◽  
...  

Soil moisture is a key indicator for assessing cropland drought and irrigation status as well as for forecasting production. Compared with optical data, which are obscured by the crop canopy cover, Synthetic Aperture Radar (SAR) is an efficient tool for detecting surface soil moisture under vegetation cover due to its strong penetration capability. This paper studies soil moisture retrieval using L-band polarimetric Phased Array-type L-band SAR 2 (PALSAR-2) data acquired over the study region in Arkansas in the United States. Both two-component model-based decomposition (SAR data alone) and machine learning (SAR + optical indices) methods are tested and compared in this paper. Validation using independent ground measurements shows that both methods achieved a Root Mean Square Error (RMSE) of less than 10 (vol.%), while the machine learning methods outperform the model-based decomposition, achieving an RMSE of 7.70 (vol.%) and R² of 0.60.
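The two validation metrics named above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the sample values are hypothetical, and R² is assumed here to be the coefficient of determination (the abstract does not say whether it is that or a squared correlation coefficient).

```python
import math


def rmse(observed, predicted):
    """Root Mean Square Error between ground measurements and retrievals."""
    n = len(observed)
    squared_errors = ((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sum(squared_errors) / n)


def r_squared(observed, predicted):
    """Coefficient of determination (assumed definition of R²)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical soil moisture values in vol.%, for illustration only.
ground = [10.0, 20.0, 30.0]
retrieved = [12.0, 18.0, 33.0]
print(rmse(ground, retrieved), r_squared(ground, retrieved))
```

Both metrics are computed on the same units as the measurements, so an RMSE below 10 corresponds to an average retrieval error of under 10 vol.%.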


2021 ◽  
Author(s):  
Junjie Shi ◽  
Jiang Bian ◽  
Jakob Richter ◽  
Kuan-Hsun Chen ◽  
Jörg Rahnenführer ◽  
...  

The predictive performance of a machine learning model highly depends on the corresponding hyper-parameter setting. Hence, hyper-parameter tuning is often indispensable. Normally such tuning requires the dedicated machine learning model to be trained and evaluated on centralized data to obtain a performance estimate. However, in a distributed machine learning scenario, it is not always possible to collect all the data from all nodes due to privacy concerns or storage limitations. Moreover, if data has to be transferred through low bandwidth connections, it reduces the time available for tuning. Model-Based Optimization (MBO) is one state-of-the-art method for tuning hyper-parameters, but its application to distributed machine learning models or federated learning lacks research. This work proposes a framework, MODES, that allows deploying MBO on resource-constrained distributed embedded systems. Each node trains an individual model based on its local data. The goal is to optimize the combined prediction accuracy. The presented framework offers two optimization modes: (1) MODES-B considers the whole ensemble as a single black box and optimizes the hyper-parameters of each individual model jointly, and (2) MODES-I considers all models as clones of the same black box, which allows it to efficiently parallelize the optimization in a distributed setting. We evaluate MODES by conducting experiments on the optimization of the hyper-parameters of a random forest and a multi-layer perceptron. The experimental results demonstrate that, with an improvement in terms of mean accuracy (MODES-B), run-time efficiency (MODES-I), and statistical stability for both modes, MODES outperforms the baseline, i.e., carrying out tuning with MBO on each node individually with its local sub-data set.


Minerals ◽  
2021 ◽  
Vol 11 (2) ◽  
pp. 159
Author(s):  
Nan Lin ◽  
Yongliang Chen ◽  
Haiqi Liu ◽  
Hanlin Liu

Selecting internal hyperparameters, which can be set by an automatic search algorithm, is important for improving the generalization performance of machine learning models. In this study, the geological, remote sensing and geochemical data of the Lalingzaohuo area in Qinghai province were studied. A multi-source metallogenic information spatial data set was constructed by calculating the Youden index for selecting potential evidence layers. The model for mapping mineral prospectivity of the study area was established by combining two swarm intelligence optimization algorithms, namely the bat algorithm (BA) and the firefly algorithm (FA), with different machine learning models. The receiver operating characteristic (ROC) and prediction-area (P-A) curves were used for performance evaluation and showed that the two algorithms had an obvious optimization effect. The BA and FA differed in how much they improved the multilayer perceptron (MLP), AdaBoost and one-class support vector machine (OCSVM) models; thus, neither optimization algorithm was consistently superior to the other. However, the accuracy of the machine learning models was significantly enhanced after optimizing the hyperparameters. The area under curve (AUC) values of the ROC curve of the optimized machine learning models were all higher than 0.8, indicating that the hyperparameter optimization calculation was effective. In terms of individual model improvement, the accuracy of the FA-AdaBoost model was improved the most significantly, with the AUC value increasing from 0.8173 to 0.9597 and the prediction/area (P/A) value increasing from 3.156 to 10.765, where the mineral targets predicted by the model occupied 8.63% of the study area and contained 92.86% of the known mineral deposits. The targets predicted by the improved machine learning models are consistent with the metallogenic geological characteristics, indicating that the swarm intelligence optimization algorithm combined with the machine learning model is an efficient method for mineral prospectivity mapping.
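The Youden index mentioned above is commonly defined as sensitivity + specificity − 1. The sketch below shows that standard definition only; the function name and the confusion-matrix counts are hypothetical, and the abstract does not describe how the authors thresholded each evidence layer before computing it.

```python
def youden_index(tp, fn, tn, fp):
    """Youden's J statistic: sensitivity + specificity - 1.

    tp/fn: known deposits inside/outside the layer's anomalous zone.
    tn/fp: non-deposit locations outside/inside the anomalous zone.
    (This mapping of counts to an evidence layer is an assumption.)
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0


# Hypothetical counts for one candidate evidence layer.
print(youden_index(tp=8, fn=2, tn=7, fp=3))
```

A value near 1 indicates a layer that separates deposit from non-deposit locations well, while a value near 0 indicates a layer no better than chance, which is one plausible basis for keeping or discarding a layer.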


2020 ◽  
pp. 1-11
Author(s):  
Tang Yan ◽  
Li Pengfei

In marketing, problems such as the growth of customer data, the increasing difficulty of data extraction and access, the lack of reliability and accuracy in data analysis, slow data processing, and the inability to effectively transform massive amounts of data into valuable information have become increasingly prominent. To study customer response, this paper constructs a marketing customer response scoring model based on machine learning data analysis. In the context of supplier customer relationship management, this article analyzes the supplier’s precision marketing status and existing problems and uses its own development and management characteristics to improve marketing strategies. Moreover, this article combines database and statistical modeling and analysis to establish a customer response scoring model suitable for supplier precision marketing. In addition, this article conducts research and analysis with examples. The results indicate that the model constructed in this article performs well.

