Cancer Type Classification in Liquid Biopsies Based on Sparse Mutational Profiles Enabled through Data Augmentation and Integration

Life ◽  
2021 ◽  
Vol 12 (1) ◽  
pp. 1
Author(s):  
Alexandra Danyi ◽  
Myrthe Jager ◽  
Jeroen de Ridder

Identifying the cell of origin of cancer is important to guide treatment decisions. Machine learning approaches have been proposed to classify the cell of origin based on somatic mutation profiles from solid biopsies. However, solid biopsies can cause complications and certain tumors are not accessible. Liquid biopsies are promising alternatives but their somatic mutation profile is sparse and current machine learning models fail to perform in this setting. We propose an improved method to deal with sparsity in liquid biopsy data. Firstly, data augmentation is performed on sparse data to enhance model robustness. Secondly, we employ data integration to merge information from: (i) SNV density; (ii) SNVs in driver genes and (iii) trinucleotide motifs. Our adapted method achieves an average accuracy of 0.88 and 0.65 on data where only 70% and 2% of SNVs are retained, compared to 0.83 and 0.41 with the original model, respectively. The method and results presented here open the way for application of machine learning in the detection of the cell of origin of cancer from liquid biopsy data.

2021 ◽  
Author(s):  
Alexandra Danyi ◽  
Myrthe Jager ◽  
Jeroen de Ridder

Abstract Identifying the cell of origin of cancer is important to guide treatment decisions. However, in patients with ‘cancer of unknown primary’ (CUP), standard diagnostic tools often fail to identify the primary tumor. As an alternative, machine learning approaches have been proposed to classify the cell of origin based on somatic mutation profiles in the genome of solid tissue biopsies. However, solid biopsies can cause complications and certain tumors are not accessible. A promising alternative would be liquid biopsies, which contain ctDNA originating from the tumor. Problematically, somatic mutation profiles of tumors obtained from liquid biopsies are inherently extremely sparse and current machine learning models fail to perform in this setting. Here we propose an improved machine learning method to deal with the sparse nature of liquid biopsy data. Firstly, we downsample the SNVs in the samples in order to mimic sparse data conditions. Then extensive data augmentation is performed to artificially increase the number of training samples in order to enhance model robustness under sparse data conditions. Finally, we employ data integration to merge information from i) somatic single nucleotide variant (SNV) density across the genome, ii) somatic SNVs in driver genes and iii) trinucleotide motifs. Our adapted method achieves an average accuracy of 0.88 on the data where only 70% of SNVs are retained, which is comparable to an average accuracy of 0.87 with the original model on the full SNV data. Even when only 2% of the data is retained, the average accuracy is 0.65 compared to 0.41 with the original model. The method and results presented here open the way for application of machine learning in the detection of the cell of origin of cancer from sparse liquid biopsy data.

Author Summary The identification of the ‘cell of origin’ of cancer is an important step towards more personalized cancer care, but this remains a challenge for patients with ‘cancer of unknown primary’ (CUP), where the source of the malignancy cannot be identified even after extensive clinical assessment with standard diagnostic methods. Somatic mutation profile-based ‘cell of origin’ classification has emerged in recent years as a promising alternative diagnostic tool that could circumvent the issues of standard CUP diagnostics. In this approach the somatic mutations are obtained from whole genome sequencing (WGS) of solid tissue biopsies from the tumor. However, needle biopsies from tumor tissue can be challenging, as accessibility to the tumor can be limited and taking a biopsy can cause further complications. For these reasons, liquid biopsies have been proposed as a safer alternative to solid tissue biopsies. Problematically, the circulating tumor DNA fragments available in e.g. blood typically represent a much scarcer tumor source than conventional solid tissue biopsies, and therefore liquid biopsies give rise to sparse somatic mutation profiles. It is thus crucial to investigate the applicability of sparse somatic mutation profiles in the identification of ‘cell of origin’ and to explore potential improvements of the data analysis and prediction models to overcome sparsity.
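The downsampling and augmentation steps named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the representation of an SNV as a (chromosome, position, ref, alt) tuple, and the toy data are all assumptions made for illustration. The idea shown is only that each training sample is randomly thinned to a chosen fraction of its SNVs, and that repeating the thinning several times yields several sparse training copies of one sample.

```python
import random

def downsample_snvs(snvs, keep_fraction, rng):
    """Randomly retain a fraction of a sample's SNVs to mimic sparse data."""
    # Keep at least one SNV, but never more than the sample contains.
    n_keep = min(len(snvs), max(1, round(len(snvs) * keep_fraction)))
    return rng.sample(snvs, n_keep)

def augment_sample(snvs, keep_fraction, n_copies, seed=0):
    """Return n_copies independently downsampled versions of one sample."""
    rng = random.Random(seed)
    return [downsample_snvs(snvs, keep_fraction, rng) for _ in range(n_copies)]

# Toy sample (made-up values): 100 SNVs as (chromosome, position, ref, alt).
toy_snvs = [("chr1", 1000 + i, "C", "T") for i in range(100)]

# Five sparse copies of the same sample, each retaining 2% of its SNVs.
sparse_copies = augment_sample(toy_snvs, keep_fraction=0.02, n_copies=5)
```

Features such as SNV density, driver-gene SNVs, or trinucleotide motifs would then be computed from each sparse copy before training; that step is omitted here because the abstract does not specify it.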


An electrocardiogram (ECG) can be dependably used as a measuring device to monitor cardiovascular function. Abnormal heartbeats appear in the ECG pattern, and these abnormal signals are called arrhythmias. Automatic classification of arrhythmia signals can provide faster and more accurate results. Several machine learning approaches have been applied to enhance the accuracy of results and increase the speed and robustness of models. This paper proposes a method based on time-series classification using deep Convolutional-LSTM neural networks and the Discrete Wavelet Transform to classify four different types of arrhythmia in the MIT-BIH Database. According to the results, the suggested method gives predictions with an average accuracy of 97% without needing feature extraction or data augmentation.
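As a rough illustration of the Discrete Wavelet Transform step, the sketch below computes one level of a Haar DWT in plain Python. The choice of the Haar wavelet, the function name, and the toy signal are assumptions; the abstract does not say which wavelet or decomposition depth was used.

```python
import math

def haar_dwt(signal):
    """One level of the Haar DWT: return (approximation, detail) coefficients."""
    samples = list(signal)
    if len(samples) % 2:
        samples.append(samples[-1])  # pad odd-length input by repeating the last sample
    scale = math.sqrt(2.0)
    # Pairwise scaled sums give the low-pass (approximation) band,
    # pairwise scaled differences give the high-pass (detail) band.
    approx = [(samples[i] + samples[i + 1]) / scale for i in range(0, len(samples), 2)]
    detail = [(samples[i] - samples[i + 1]) / scale for i in range(0, len(samples), 2)]
    return approx, detail

# Toy beat-like signal (made-up values, not MIT-BIH data).
toy_beat = [0.0, 0.1, 1.0, 0.2, 0.0, -0.1, 0.0, 0.0]
approx, detail = haar_dwt(toy_beat)
```

The approximation band is a half-length, smoothed version of the signal, which is one common reason to apply a DWT before feeding a time series to a neural network.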


Microscopy ◽  
2020 ◽  
Vol 69 (2) ◽  
pp. 92-109 ◽  
Author(s):  
Teruyasu Mizoguchi ◽  
Shin Kiyohara

Abstract Materials characterization is indispensable for materials development. In particular, spectroscopy provides atomic configuration, chemical bonding and vibrational information, which are crucial for understanding the mechanism underlying the functions of a material. Despite its importance, the interpretation of spectra using human-driven methods, such as manual comparison of experimental spectra with reference/simulated spectra, is becoming difficult owing to the rapid increase in experimental spectral data. To overcome the limitations of such methods, we develop new data-driven approaches based on machine learning. Specifically, we use hierarchical clustering, a decision tree and a feedforward neural network to investigate the electron energy loss near edge structures (ELNES) spectrum, which is identical to the X-ray absorption near edge structure (XANES) spectrum. Hierarchical clustering and the decision tree are used to interpret and predict ELNES/XANES, while the feedforward neural network is used to obtain hidden information about the material structure and properties from the spectra. Further, we construct a prediction model that is robust against noise by data augmentation. Finally, we apply our method to noisy spectra and predict six properties accurately. In summary, the proposed approaches can pave the way for fast and accurate spectrum interpretation/prediction as well as local measurement of material functions.
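One common way to make a model robust against noise through data augmentation is to train on noisy copies of each clean spectrum. The sketch below assumes additive Gaussian noise, which the abstract does not confirm; the function names, the noise level, and the toy spectra are hypothetical.

```python
import random

def add_noise(spectrum, sigma, rng):
    """Return a copy of the spectrum with zero-mean Gaussian noise added to each point."""
    return [x + rng.gauss(0.0, sigma) for x in spectrum]

def augment_spectra(spectra, labels, n_copies, sigma, seed=0):
    """Keep each clean spectrum and append n_copies noisy versions with the same label."""
    rng = random.Random(seed)
    aug_x, aug_y = [], []
    for spectrum, label in zip(spectra, labels):
        aug_x.append(list(spectrum))
        aug_y.append(label)
        for _ in range(n_copies):
            aug_x.append(add_noise(spectrum, sigma, rng))
            aug_y.append(label)
    return aug_x, aug_y

# Toy spectra (made-up intensities) with made-up property labels.
toy_spectra = [[0.0, 0.5, 1.0, 0.4], [0.1, 0.9, 0.3, 0.0]]
toy_labels = [1.2, 3.4]
aug_x, aug_y = augment_spectra(toy_spectra, toy_labels, n_copies=3, sigma=0.05)
```

Each clean spectrum contributes one original and three noisy copies, so the training set grows fourfold while the labels stay unchanged.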


2021 ◽  
Author(s):  
Enrico Randellini ◽  
Leonardo Rigutini ◽  
Claudio Saccà

Facial expression is the first thing we pay attention to when we want to understand a person’s state of mind. Thus, the ability to recognize facial expressions automatically is a very interesting research field. In this paper, because of the small size of the available training datasets, we propose a novel data augmentation technique that improves performance in the recognition task. We apply geometrical transformations and build GAN models from scratch that generate new synthetic images for each emotion type. We then fine-tune pretrained convolutional neural networks with different architectures on the augmented datasets. To measure the generalization ability of the models, we apply an extra-database protocol: we train the models on the augmented versions of the training dataset and test them on two different databases. The combination of these techniques yields average accuracy values of the order of 85% for the InceptionResNetV2 model.


Author(s):  
Ahmad Fathan Hidayatullah ◽  
Siwi Cahyaningtyas ◽  
Rheza Daffa Pamungkas

This study proposes a hybrid deep learning model called attention-based CNN-BiLSTM (ACBiL) for dialect identification on Javanese text. Our ACBiL model comprises an input layer, convolution layer, max pooling layer, batch normalization layer, bidirectional LSTM layer, attention layer, fully connected layer and softmax layer. In the attention layer, we applied a hierarchical attention network using word-level and sentence-level attention to observe the level of importance of the content. As a comparison, we also experimented with several other classical machine learning and deep learning approaches. Among the classical machine learning methods, Linear Regression with unigrams achieved the best performance, with an average accuracy of 0.9647. In addition, the deep learning models significantly outperformed the traditional machine learning models. Our experiments showed that the ACBiL architecture achieved the best performance among the deep learning methods, with an accuracy of 0.9944.


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Madeleine Darbyshire ◽  
Zachary du Toit ◽  
Mark F. Rogers ◽  
Tom R. Gaunt ◽  
Colin Campbell

Abstract For cancers, such as common solid tumours, variants in the genome give a selective growth advantage to certain cells. It has recently been argued that the mean count of coding single nucleotide variants acting as disease-drivers in common solid tumours is frequently small in size, but significantly variable by cancer type (hypermutation is excluded from this study). In this paper we investigate this proposal through the use of integrative machine-learning-based classifiers we have proposed recently for predicting the disease-driver status of single nucleotide variants (SNVs) in the human cancer genome. We find that predicted driver counts are compatible with this proposal, have similar variabilities by cancer type and, to a certain extent, the drivers are identifiable by these machine learning methods. We further discuss predicted driver counts stratified by stage of disease and driver counts in non-coding regions of the cancer genome, in addition to driver-genes.


2021 ◽  
Vol 11 (9) ◽  
pp. 3776
Author(s):  
Luis Enciso-Salas ◽  
Gustavo Pérez-Zuñiga ◽  
Javier Sotomayor-Moriano

Implementation of model-based fault diagnosis systems can be a difficult task due to the complex dynamics of most systems. An appealing alternative that avoids modeling is to use machine learning-based techniques, whose implementation is more affordable nowadays. However, the latter approach often requires extensive data processing. In this paper, a hybrid approach using recent developments in neural ordinary differential equations is proposed. This approach enables us to combine a natural deep learning technique with an estimated model of the system, making the training simpler and more efficient. To evaluate this methodology, a nonlinear benchmark system is used, with simulated faults in actuators, sensors, and process. Simulation results show that the proposed methodology requires less processing for training than conventional machine learning approaches, since the data set is taken directly from the measurements and inputs. Furthermore, since the model used here is only a structural approximation of the plant, no advanced modeling is required. This approach can also alleviate some pitfalls of training on data series, such as complicated data augmentation methodologies and the need for large amounts of data.


2021 ◽  
Vol 23 (Supplement_6) ◽  
pp. vi116-vi117
Author(s):  
Cheng Zhou ◽  
Zhaoming Zhou ◽  
Lei Wen ◽  
Mingyao Lai ◽  
Linbo Cai

Abstract Accurate molecular stratification of glioma patients is key to an optimal design of therapeutic strategy to maximize patient survival. Here we leveraged multi-omics analysis of glioma and detailed clinical follow-up to build a refined classification system for glioma patients using support vector machines. The model input included the number of non-synonymous mutations in cancer driver genes, the number of non-synonymous mutations in cancer-related genes, the transcriptomic grouping information, the immune infiltration predicted from the RNA-seq dataset, the site of tumor occurrence, as well as other well-known markers including IDH mutation status and 1p19q co-deletion status. We validated key model predictions using TCGA and CGGA datasets. Our refined classification system outperforms the current state-of-the-art framework used in the clinic. Taken together, we propose a refined molecular classification for glioma combining multi-omics profiling and machine learning approaches.


2021 ◽  
Vol 23 (07) ◽  
pp. 977-994
Author(s):  
Josmy Mathew ◽  
Dr. N. Srinivasan

Deep learning is an area of machine learning which, because of its capability to handle large quantities of data, has demonstrated remarkable achievements in many fields, notably in biomedicine. Its potential has been evaluated and applied, with effective prognostic results, to the identification of brain tumours in MRI images. Computer-aided diagnosis of brain tumours from MRI images includes tumour identification, segmentation and classification. In recent years, much research has concentrated on conventional or basic machine learning approaches to the detection of brain tumours. In this overview, we offer a comprehensive assessment of the surveys reported so far and of the current approaches for detecting tumours. Our review examines the major processes in deep learning approaches for detecting brain tumours, including preprocessing, feature extraction and classification, together with their performance and limitations. We also explore state-of-the-art neural network models for identifying brain tumours through extensive trials with and without data augmentation. This review also discusses existing data sets for evaluating brain tumour detection.


2021 ◽  
Author(s):  
Aayushi Vishnoi ◽  
Rati Sharma

The chemical basis of smell remains an unsolved problem, with ongoing studies mapping perceptual descriptor data from human participants to chemical structures using computational methods. These approaches are, however, limited by linguistic capabilities and inter-individual differences among participants. We use olfactory behaviour data from the nematode C. elegans, which has isogenic populations in a laboratory setting, and employ machine learning approaches for a binary classification task predicting whether or not the worm will be attracted to a given monomolecular odorant. Among others, we use architectures based on Natural Language Processing methods on the SMILES representation of chemicals for molecular descriptor generation and show that machine learning algorithms trained on the descriptors give robust prediction results. We further show, by data augmentation, that increasing the number of samples increases the accuracy of the models. From this detailed analysis, we are able to achieve accuracies comparable to those in human studies and infer that there exists a non-trivial relationship between the features of chemical structures and the nematode's behaviour.

