57 Precision neoantigen discovery using novel algorithms and expanded HLA-ligandome datasets

2020 ◽  
Vol 8 (Suppl 3) ◽  
pp. A62-A62
Author(s):  
Dattatreya Mellacheruvu ◽  
Rachel Pyke ◽  
Charles Abbott ◽  
Nick Phillips ◽  
Sejal Desai ◽  
...  

Background: Accurately identified neoantigens can be effective therapeutic agents in both adjuvant and neoadjuvant settings. A key challenge for neoantigen discovery has been the availability of accurate prediction models for MHC peptide presentation. We have shown previously that our proprietary model based on (i) large-scale, in-house mono-allelic data, (ii) custom features that model antigen processing, and (iii) advanced machine learning algorithms has strong performance. We have extended this work by systematically integrating large quantities of high-quality, publicly available data, implementing new modelling algorithms, and rigorously testing our models. These extensions lead to substantial improvements in performance and generalizability. Our algorithm, named Systematic HLA Epitope Ranking Pan Algorithm (SHERPA™), is integrated into the ImmunoID NeXT Platform®, our immuno-genomics and transcriptomics platform specifically designed to enable the development of immunotherapies.

Methods: In-house immunopeptidomic data were generated using stably transfected, HLA-null K562 cell lines that each express a single HLA allele of interest, followed by immunoprecipitation with the W6/32 antibody and LC-MS/MS. Public immunopeptidomics data were downloaded from repositories such as MassIVE and processed uniformly with in-house pipelines to generate peptide lists filtered at a 1% false discovery rate. Other metrics (features) were either extracted from the source data or generated internally by re-processing samples on the ImmunoID NeXT Platform.

Results: We generated large-scale, high-quality immunopeptidomics data from approximately 60 mono-allelic cell lines, which unambiguously assign peptides to their presenting alleles, to create our primary models. Briefly, our primary 'binding' algorithm models MHC-peptide binding using peptide and binding-pocket features, while our primary 'presentation' model uses additional features to model antigen processing and presentation. Both primary models achieve significantly higher precision across all recall values in multiple test data sets, including mono-allelic cell lines and multi-allelic tissue samples. To further improve performance, we expanded the diversity of our training set using high-quality, publicly available mono-allelic immunopeptidomics data. Furthermore, multi-allelic data were integrated by resolving peptide-to-allele mappings using our primary models. We then trained a new model using the expanded training data and a new composite machine learning architecture. The resulting secondary model further improves performance and generalizability across several tissue samples.

Conclusions: Improving technologies for neoantigen discovery is critical for many therapeutic applications, including personalized neoantigen vaccines and neoantigen-based biomarkers for immunotherapies. Our new and improved algorithm (SHERPA) has significantly higher performance than a state-of-the-art public algorithm and furthers this objective.
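
As a rough illustration of the multi-allelic integration step described above, the following Python sketch assigns each peptide from a multi-allelic sample to the allele whose primary model scores it highest; the `score_presentation` callable and the confidence threshold are hypothetical stand-ins, not the SHERPA implementation.

```python
# Hypothetical sketch of peptide-to-allele deconvolution: each peptide from a
# multi-allelic tissue sample is assigned to the sample allele whose primary
# presentation model scores it highest. `score_presentation` stands in for a
# trained per-allele model and is an assumption, not part of SHERPA.
from typing import Callable, Dict, List, Tuple

def deconvolve(peptides: List[str],
               sample_alleles: List[str],
               score_presentation: Callable[[str, str], float],
               min_score: float = 0.5) -> Dict[str, Tuple[str, float]]:
    """Map each peptide to its best-scoring allele, dropping low-confidence calls."""
    assignments = {}
    for pep in peptides:
        scored = [(allele, score_presentation(pep, allele)) for allele in sample_alleles]
        best_allele, best_score = max(scored, key=lambda s: s[1])
        if best_score >= min_score:  # keep only confident peptide-to-allele mappings
            assignments[pep] = (best_allele, best_score)
    return assignments

# Toy usage with a dummy scorer (real scores would come from the primary models).
assignments = deconvolve(["SIINFEKL", "GILGFVFTL"],
                         ["HLA-A*02:01", "HLA-B*07:02"],
                         lambda pep, allele: 0.9 if allele.startswith("HLA-A") else 0.3)
```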

2020 ◽  
Author(s):  
Hanna Meyer ◽  
Edzer Pebesma

Spatial mapping is an important task in environmental science to reveal spatial patterns and changes of the environment. In this context, predictive modelling using flexible machine learning algorithms has become very popular. However, looking at the diversity of modelled (global) maps of environmental variables, one might increasingly get the impression that machine learning is a magic tool to map everything. Recently, the reliability of such maps has been increasingly questioned, calling for a reliable quantification of uncertainties.

Though spatial (cross-)validation provides a general error estimate for the predictions, models are usually applied to make predictions for a much larger area, or might even be transferred to make predictions for an area they were not trained on. When making predictions over heterogeneous landscapes, there will be areas that feature environmental properties that have not been observed in the training data and hence not learned by the algorithm. This is problematic, as most machine learning algorithms are weak at extrapolation and can only make reliable predictions for environments whose conditions the model has knowledge of. Hence, predictions for environmental conditions that differ significantly from the training data have to be considered uncertain.

To approach this problem, we suggest a measure of uncertainty that allows identifying locations where predictions should be regarded with care. The proposed uncertainty measure is based on distances to the training data in the multidimensional predictor variable space. However, distances are not equally relevant within the feature space: some variables are more important than others in the machine learning model and hence are mainly responsible for prediction patterns. Therefore, we weight the distances by the model-derived importance of the predictors.

As a case study, we use a simulated area-wide response variable for Europe, bio-climatic variables as predictors, and simulated field samples. Random Forest is applied as the algorithm to predict the simulated response. The model is then used to make predictions for the whole of Europe. We then calculate the corresponding uncertainty and compare it to the area-wide true prediction error. The results show that the uncertainty map reflects the patterns in the true error very well and considerably outperforms ensemble-based standard deviations of predictions as an indicator of uncertainty.

The resulting map of uncertainty gives valuable insights into spatial patterns of prediction uncertainty, which is important when the predictions are used as a baseline for decision making or subsequent environmental modelling. Hence, we suggest that a map of distance-based uncertainty be provided in addition to prediction maps.
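
A minimal sketch of the proposed importance-weighted distance measure, assuming a fitted scikit-learn Random Forest; the helper below is illustrative only and omits details (e.g., predictor standardization) that the authors' implementation may include.

```python
# Illustrative sketch: minimum distance of each prediction location to the
# training data in predictor space, with each predictor axis scaled by the
# model-derived variable importance. Names and data are synthetic stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def weighted_min_distance(X_train, X_new, model):
    """Distance of each new sample to its nearest training sample,
    in predictor space weighted by Random Forest feature importances."""
    w = model.feature_importances_               # model-derived predictor weights
    diff = X_new[:, None, :] * w - X_train[None, :, :] * w
    return np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)

# Toy usage: larger values flag predictions that extrapolate beyond the training data.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5))
y_train = X_train[:, 0] + rng.normal(size=200)
X_new = rng.normal(loc=2.0, size=(50, 5))        # shifted, i.e. partly outside the training range
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
uncertainty = weighted_min_distance(X_train, X_new, rf)
```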


2020 ◽  
Vol 7 ◽  
pp. 1-26 ◽  
Author(s):  
Silas Nyboe Ørting ◽  
Andrew Doyle ◽  
Arno Van Hilten ◽  
Matthias Hirth ◽  
Oana Inel ◽  
...  

Rapid advances in image processing capabilities have been seen across many domains, fostered by the application of machine learning algorithms to "big data". However, within the realm of medical image analysis, advances have been curtailed, in part, by the limited availability of large-scale, well-annotated datasets. One of the main reasons for this is the high cost often associated with producing large amounts of high-quality metadata. Recently, there has been growing interest in the application of crowdsourcing for this purpose, a technique that has proven effective for creating large-scale datasets across a range of disciplines, from computer vision to astrophysics. Despite the growing popularity of this approach, there has not yet been a comprehensive literature review to provide guidance to researchers considering crowdsourcing methodologies for their own medical image analysis. In this survey, we review studies applying crowdsourcing to the analysis of medical images, published prior to July 2018. We identify common approaches, challenges, and considerations, providing guidance of utility to researchers adopting this approach. Finally, we discuss future opportunities for development within this emerging domain.


2021 ◽  
Author(s):  
Bruno Barbosa Miranda de Paiva ◽  
Polianna Delfino Pereira ◽  
Claudio Moises Valiense de Andrade ◽  
Virginia Mara Reis Gomes ◽  
Maria Clara Pontello Barbosa Lima ◽  
...  

Objective: To provide a thorough comparative study of state-of-the-art machine learning methods and statistical methods for determining in-hospital mortality in COVID-19 patients using data available upon hospital admission; to study the reliability of the predictions of the most effective methods by correlating the probability of the outcome with the accuracy of the methods; and to investigate how explainable the predictions produced by the most effective methods are. Materials and Methods: De-identified data were obtained from COVID-19-positive patients in 36 participating hospitals, from March 1 to September 30, 2020. Demographic, comorbidity, clinical presentation, and laboratory data were used as training data to develop COVID-19 mortality prediction models. Multiple machine learning and traditional statistical models were trained on this prediction task using a folded cross-validation procedure, from which we assessed performance and interpretability metrics. Results: Stacking of machine learning models improved over the previous state-of-the-art results by more than 26% in predicting the class of interest (death), achieving an AUROC of 87.1% and a macro-F1 of 73.9%. We also show that some machine learning models can be very interpretable and reliable, yielding more accurate predictions while providing a good explanation of why they were made. Conclusion: The best results were obtained using the meta-learning ensemble model Stacking. State-of-the-art explainability techniques such as SHAP values can be used to draw useful insights into the patterns learned by machine learning algorithms. Machine learning models can be more explainable than traditional statistical models while also yielding highly reliable predictions. Keywords: COVID-19; prognosis; prediction model; machine learning
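
As an illustration of the stacking approach, the sketch below builds a meta-learning ensemble with scikit-learn; the base learners, meta-learner, and synthetic data are assumptions, not the exact models or cohort used in the study.

```python
# Hedged sketch of a stacking ensemble: base learners produce out-of-fold
# predictions that a meta-learner combines. Models and data are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, weights=[0.8], random_state=0)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),  # meta-learner
    cv=5,  # folded cross-validation to build the meta-features
)
auroc = cross_val_score(stack, X, y, cv=5, scoring="roc_auc").mean()
```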


2020 ◽  
Author(s):  
Emad Kasaeyan Naeini ◽  
Ajan Subramanian ◽  
Michael-David Calderon ◽  
Kai Zheng ◽  
Nikil Dutt ◽  
...  

BACKGROUND There is a strong demand for an accurate and objective means of assessing acute pain among hospitalized patients, to help clinicians provide a proper dosage of pain medications in a timely manner. Heart rate variability (HRV) comprises changes in the time intervals between consecutive heartbeats, which can be measured through acquisition and interpretation of an electrocardiogram (ECG) captured from bedside monitors or wearable devices. As increased sympathetic activity affects HRV, an index of autonomic regulation of heart rate, ultra-short-term HRV analysis can provide a reliable source of information for acute pain monitoring. In this study, widely used HRV time- and frequency-domain measurements are used in acute pain assessment among postoperative patients. Existing approaches have focused only on stimulated pain in healthy subjects; to the best of our knowledge, no work in the literature has built models using real pain data from postoperative patients. OBJECTIVE To develop and evaluate an automatic and adaptable pain assessment algorithm based on ECG features for assessing acute pain in postoperative patients likely experiencing mild to moderate pain. METHODS The study used a prospective observational design. The sample consisted of 25 patient participants aged 18 to 65 years. In part 1 of the study, a Transcutaneous Electrical Nerve Stimulation (TENS) unit was employed to obtain a baseline discomfort threshold for each patient. In part 2, a multichannel biosignal acquisition device was used while patients were engaged in non-noxious activities. At all times, pain intensity was measured using patient self-reports based on the Numerical Rating Scale (NRS). A weak supervision framework was adopted for rapid creation of training data. The collected labels were then transformed from 11 intensity levels to 5 intensity levels. Prediction models were developed using 5 different machine learning methods. Mean prediction accuracy was calculated using Leave-One-Subject-Out cross-validation. We compared the performance of these models with the results from a previously published study. RESULTS Five different machine learning algorithms were applied to perform binary classification of no pain (NP) vs. 4 distinct pain levels (PL1 through PL4). Using the 3 time-domain HRV features from the BioVid research paper, the highest validation accuracy for no pain vs. any other pain level was achieved by an SVM, ranging from 62.72% (NP vs. PL4) to 84.14% (NP vs. PL2). Similar results were achieved with the top 8 features selected by the Gini index using the SVM method, with accuracy ranging from 63.86% (NP vs. PL4) to 84.79% (NP vs. PL2). CONCLUSIONS We propose a novel pain assessment method for postoperative patients using the ECG signal. Weak supervision applied to labeling and feature extraction improves the robustness of the approach. Our results show the viability of using machine learning algorithms to accurately and objectively assess acute pain among hospitalized patients. INTERNATIONAL REGISTERED REPORT RR2-10.2196/17783
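
A minimal sketch of the Leave-One-Subject-Out evaluation described above, assuming scikit-learn; the feature matrix, labels, and subject grouping are synthetic stand-ins for the HRV features and the 25 patient participants.

```python
# Illustrative LOSO cross-validation for a binary NP-vs-PLx classifier with an
# SVM; all data here are synthetic placeholders for the HRV feature windows.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(250, 8))            # e.g. 8 HRV features per ECG window
y = rng.integers(0, 2, size=250)         # 0 = no pain (NP), 1 = a given pain level
subjects = np.repeat(np.arange(25), 10)  # 25 patients, 10 windows each

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
acc = cross_val_score(clf, X, y, groups=subjects, cv=LeaveOneGroupOut()).mean()
```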



Electronics ◽  
2021 ◽  
Vol 10 (4) ◽  
pp. 420
Author(s):  
Dongho Choi ◽  
Janghyuk Yim ◽  
Minjin Baek ◽  
Sangsun Lee

Predicting the trajectories of surrounding vehicles is important to avoid or mitigate collisions with traffic participants. However, due to limited past information and the uncertainty in future driving maneuvers, trajectory prediction is a challenging task. Recently, trajectory prediction models using machine learning algorithms have been proposed to solve this problem. In this paper, we present a trajectory prediction method based on the random forest (RF) algorithm and the long short-term memory (LSTM) encoder-decoder architecture. An occupancy grid map is first defined for the region surrounding the target vehicle, and then the row and the column that will be occupied by the target vehicle at future time steps are determined using the RF algorithm and the LSTM encoder-decoder architecture, respectively. For the collection of training data, the test vehicle was equipped with a camera and LIDAR sensors along with vehicular wireless communication devices, and the experiments were conducted under various driving scenarios. The vehicle test results demonstrate that the proposed method provides more robust trajectory prediction than existing trajectory prediction methods.
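
The LSTM encoder-decoder half of the method might look roughly like the following PyTorch sketch, which predicts the occupied grid column over future time steps; the hidden size, decoder feedback, and grid width are assumptions, not the authors' implementation.

```python
# Hedged sketch: an LSTM encoder summarizes the observed occupancy history and
# a decoder unrolls per-step logits over grid columns. Details are illustrative.
import torch
import torch.nn as nn

class Seq2SeqTrajectory(nn.Module):
    def __init__(self, n_cols: int, hidden: int = 64):
        super().__init__()
        self.encoder = nn.LSTM(input_size=n_cols, hidden_size=hidden, batch_first=True)
        self.decoder = nn.LSTM(input_size=n_cols, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_cols)

    def forward(self, history: torch.Tensor, horizon: int) -> torch.Tensor:
        # history: (batch, T_in, n_cols) one-hot indicators of the occupied column
        _, state = self.encoder(history)
        step = history[:, -1:, :]                 # seed the decoder with the last observed step
        outputs = []
        for _ in range(horizon):
            out, state = self.decoder(step, state)
            logits = self.head(out)               # (batch, 1, n_cols)
            outputs.append(logits)
            step = torch.softmax(logits, dim=-1)  # feed the prediction back in
        return torch.cat(outputs, dim=1)          # (batch, horizon, n_cols)

# Toy usage: 10 observed steps on a 9-column grid, predict 5 future steps.
model = Seq2SeqTrajectory(n_cols=9)
hist = torch.zeros(1, 10, 9)
hist[:, :, 4] = 1.0                               # vehicle in the centre column so far
future_logits = model(hist, horizon=5)
```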


Author(s):  
Diwakar Naidu ◽  
Babita Majhi ◽  
Surendra Kumar Chandniha

This study focuses on modelling the changes in rainfall patterns in different agro-climatic zones (ACZs) due to climate change, through statistical downscaling of large-scale climate variables using machine learning approaches. The potential of three machine learning algorithms, multilayer artificial neural network (MLANN), radial basis function neural network (RBFNN), and least square support vector machine (LS-SVM), has been investigated. The large-scale climate variables are obtained from the National Centers for Environmental Prediction (NCEP) reanalysis product and used as predictors for model development. The proposed machine learning models are applied to generate projected time series of rainfall for the period 2021-2050, using the Hadley Centre coupled model (HadCM3) B2 emission scenario data as predictors. An increasing trend in anticipated rainfall is observed during 2021-2050 in all the ACZs of Chhattisgarh State. Among the machine learning models, RBFNN was found to be the most feasible technique for modelling monthly rainfall in this region.
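
As a hedged illustration of the downscaling workflow, the sketch below fits a multilayer ANN to large-scale predictors and applies it to scenario data, assuming scikit-learn; the synthetic arrays are stand-ins for the NCEP predictors and HadCM3 B2 inputs.

```python
# Illustrative downscaling sketch: fit an MLP on reanalysis predictors against
# observed rainfall, then project with scenario predictors. Data are synthetic.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_ncep = rng.normal(size=(360, 12))                        # 30 years x 12 months, 12 predictors
y_rain = np.abs(rng.normal(loc=100, scale=40, size=360))   # observed monthly rainfall (mm)

mlann = make_pipeline(StandardScaler(),
                      MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0))
mlann.fit(X_ncep, y_rain)

X_hadcm3 = rng.normal(size=(360, 12))       # same predictors from the HadCM3 B2 scenario
rain_projection = mlann.predict(X_hadcm3)   # projected monthly rainfall series
```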


2018 ◽  
Vol 2018 ◽  
pp. 1-9 ◽  
Author(s):  
Yan Zhang ◽  
Jinxiao Wen ◽  
Guanshu Yang ◽  
Zunwen He ◽  
Xinran Luo

Recently, unmanned aerial vehicles (UAVs) have come to play an important role in many applications because of their high flexibility and low cost. To realize reliable UAV communications, a fundamental task is to investigate the propagation characteristics of the channels. In this paper, we propose path loss models for the UAV air-to-air (AA) scenario based on machine learning. A ray-tracing software package is employed to generate samples for multiple routes in a typical urban environment, and different altitudes of the Tx and Rx UAVs are taken into consideration. Two machine learning algorithms, Random Forest and KNN, are exploited to build prediction models from the training data. The prediction performance of the trained models is assessed on the test set according to metrics including the mean absolute error (MAE) and root mean square error (RMSE). Meanwhile, two empirical models are presented for comparison. It is shown that the machine-learning-based models are able to provide high prediction accuracy with acceptable computational efficiency in the AA scenario. Moreover, Random Forest outperforms the other models and has the smallest prediction errors. Further investigation is made to evaluate the impact of five different parameters on the path loss. It is demonstrated that path visibility is crucial to the path loss.
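
The model comparison could be sketched as follows with scikit-learn, where synthetic features and path-loss targets stand in for the ray-tracing samples; the feature set shown is an assumption.

```python
# Illustrative comparison of Random Forest and KNN path loss regressors,
# scored with MAE and RMSE on a held-out test set; data are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(2000, 5))   # e.g. distance, Tx altitude, Rx altitude, visibility, frequency
y = 40 + 30 * X[:, 0] + 10 * X[:, 3] + rng.normal(scale=2, size=2000)  # path loss (dB)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

for name, model in [("Random Forest", RandomForestRegressor(random_state=0)),
                    ("KNN", KNeighborsRegressor(n_neighbors=5))]:
    pred = model.fit(X_tr, y_tr).predict(X_te)
    print(f"{name}: MAE={mean_absolute_error(y_te, pred):.2f} dB, "
          f"RMSE={mean_squared_error(y_te, pred) ** 0.5:.2f} dB")
```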


Author(s):  
Erkut Yigit ◽  
Mehmet Zeki Bilgin ◽  
Ahmet Erdem Oner

The main purpose of Industry 4.0 applications is to provide maximum uptime throughout the production chain, to reduce production costs, and to increase productivity. Thanks to Big Data, the Internet of Things (IoT), and Machine Learning (ML), which are among the Industry 4.0 technologies, Predictive Maintenance (PdM) studies have gained speed. Implementing predictive maintenance in industry reduces the number of breakdowns with long maintenance and repair times, and minimizes production losses and costs. With machine learning, equipment malfunctions and maintenance needs can be predicted even when their causes are unknown. A large amount of data is needed to train a machine learning algorithm, along with the selection of an analytical method suitable for the problem. The key is to extract the valuable signal by cleaning the noise from the data through preprocessing. In order to create prediction models with machine learning, it is necessary to collect accurate information and to use data from many different systems. The existence of large amounts of data related to predictive maintenance and the need to monitor these data in real time, together with delays in data collection and network and server problems, are major difficulties in this process. Another important issue concerns the use of artificial intelligence: obtaining training data, dealing with variable environmental conditions, choosing the ML algorithm best suited to a specific scenario, and the need for information sensitive to operational conditions and the production environment are all of great importance for the analysis. In this study, a predictive maintenance approach for the transfer press machine used in the automotive industry is examined, one that can predict when maintenance will be needed and issue warning messages to the relevant people as abnormal situations approach. First, various sensors were placed in the machine to detect past malfunctions, and it was determined which data would be collected from these sensors. Then, machine learning models were created to detect anomalies in the collected data and to model past failures, and an application was carried out in a factory that produces automotive parts.
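
As a hedged sketch of the anomaly-detection step, the example below flags abnormal sensor readings with scikit-learn's IsolationForest; the sensor channels, thresholds, and warning logic are illustrative, not the configuration used on the transfer press machine.

```python
# Illustrative anomaly detection on two synthetic sensor channels; a sustained
# run of anomalies triggers a maintenance warning. All values are placeholders.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(loc=[50.0, 0.2], scale=[2.0, 0.05], size=(1000, 2))  # temperature (C), vibration (g)
recent = rng.normal(loc=[65.0, 0.6], scale=[2.0, 0.05], size=(20, 2))    # drifting, pre-failure behaviour

detector = IsolationForest(contamination=0.02, random_state=0).fit(normal)
flags = detector.predict(recent)          # -1 marks anomalous readings
if (flags == -1).mean() > 0.5:            # sustained anomalies, not one-off noise
    print("Warning: abnormal condition approaching; schedule maintenance.")
```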

