Bioinformatics analysis of NSCLC multi-omics data

Author(s):  
Janne Lehtiö ◽  
Taner Arslan ◽  
Ioannis Siavelis ◽  
Yanbo Pan ◽  
Fabio Socciarelli ◽  
...  

Abstract The associated publication reports a proteogenomic analysis of non-small cell lung cancer (NSCLC), where we identified molecular subtypes with distinct immune evasion mechanisms and therapeutic targets, and validated our classification method in separate clinical cohorts. This protocol describes sections of the bioinformatics analysis of the multi-omics data, namely, data analysis and processing for panel sequencing, identification of cancer- and driver-related proteins in proteomics data, proteogenomics search, and machine learning-based classifiers for NSCLC subtyping. Specifically, a cohort classifier was built using the support vector machine-recursive feature elimination (SVM-RFE) algorithm applied to in-depth proteomics data from a cohort of 141 samples. The classifier was then validated in three external datasets. Another classifier, suitable for single-sample subtyping, was built using the k-top scoring pairs (k-TSP) algorithm applied to label-free data from a cohort of 136 samples. The k-TSP-based classifier was validated in two independent cohorts and an additional external dataset.
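To illustrate why a k-TSP classifier suits single-sample subtyping, the following is a minimal sketch of the general k-top scoring pairs idea for two classes. It is not the protocol's implementation: the function names, the toy data, and the choice of k are hypothetical, and the tie-breaking score and cross-validated choice of k used in practice are omitted.

```python
from itertools import combinations


def pair_scores(X, y):
    """Score each feature pair (i, j) by |P(x_i < x_j | y=0) - P(x_i < x_j | y=1)|."""
    class0 = [row for row, label in zip(X, y) if label == 0]
    class1 = [row for row, label in zip(X, y) if label == 1]
    scores = []
    for i, j in combinations(range(len(X[0])), 2):
        p0 = sum(r[i] < r[j] for r in class0) / len(class0)
        p1 = sum(r[i] < r[j] for r in class1) / len(class1)
        # The last field records whether "x_i < x_j" is more typical of class 0.
        scores.append((abs(p0 - p1), i, j, p0 > p1))
    return scores


def fit_ktsp(X, y, k=3):
    """Keep up to k top-scoring pairs that share no feature (simplified selection)."""
    ranked = sorted(pair_scores(X, y), key=lambda s: -s[0])
    used, pairs = set(), []
    for score, i, j, lt_means_class0 in ranked:
        if score == 0 or i in used or j in used:
            continue
        pairs.append((i, j, lt_means_class0))
        used.update((i, j))
        if len(pairs) == k:
            break
    return pairs


def predict_ktsp(pairs, sample):
    """Each pair votes using only the within-sample ordering of two features."""
    votes_for_1 = 0
    for i, j, lt_means_class0 in pairs:
        observed_lt = sample[i] < sample[j]
        votes_for_1 += 0 if observed_lt == lt_means_class0 else 1
    # Ties (possible when an even number of pairs is kept) default to class 0.
    return 1 if 2 * votes_for_1 > len(pairs) else 0


# Hypothetical toy data: feature 0 < feature 1 in class 0, reversed in class 1.
X = [
    [1.0, 5.0, 3.0, 2.0],
    [2.0, 6.0, 1.0, 4.0],
    [0.5, 4.0, 2.0, 2.5],
    [5.0, 1.0, 3.0, 2.0],
    [6.0, 2.0, 1.0, 4.0],
    [4.0, 0.5, 2.0, 2.5],
]
y = [0, 0, 0, 1, 1, 1]

pairs = fit_ktsp(X, y, k=1)
pred_a = predict_ktsp(pairs, [0.2, 3.0, 9.0, 9.0])
pred_b = predict_ktsp(pairs, [7.0, 1.0, 0.0, 0.0])
```

Because prediction depends only on the relative ordering of features within one sample, no cohort-level normalization is needed at prediction time, which is consistent with the single-sample use described above.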

Molecules ◽  
2019 ◽  
Vol 24 (12) ◽  
pp. 2220 ◽  
Author(s):  
Csaba Váradi ◽  
Károly Nehéz ◽  
Olivér Hornyák ◽  
Béla Viskolcz ◽  
Jonathan Bones

In this study, we present the application of a novel capillary electrophoresis (CE) method in combination with label-free quantitation and support vector machine-based feature selection (support vector machine-estimated recursive feature elimination or SVM-RFE) to identify potential glycan alterations in Parkinson’s disease. Specific focus was placed on the use of neutral coated capillaries, produced by a dynamic capillary coating strategy, to ensure stable and repeatable separations without the need for additives in the separation electrolyte that are incompatible with mass spectrometry (MS). The developed online dynamic coating strategy was applied to identify serum N-glycosylation by CE-MS/MS in combination with exoglycosidase sequencing. The annotated structures were quantified in 15 controls and 15 Parkinson’s disease patients by label-free quantitation. Lower sialylation and increased fucosylation were found in Parkinson’s disease patients on tri-antennary glycans with 2 and 3 terminal sialic acids. The set of potential glycan alterations was narrowed by a recursive feature elimination algorithm, resulting in the efficient classification of male patients.


2021 ◽  
Author(s):  
Joseph Bloom ◽  
Aaron Triantafyllidis ◽  
Paula Burton (Ngov) ◽  
Giuseppe Infusini ◽  
Andrew Webb

Abstract Label-free quantification (LFQ) of shotgun proteomics data is a popular and robust method for characterizing relative protein abundance between samples. Many analytical pipelines exist for automating this analysis, and some tools exist for representing and inspecting the results of these pipelines. Mass Dynamics 1.0 (MD 1.0) is a web-based analysis environment that can analyze and visualize LFQ data produced by software such as MaxQuant. Unlike other tools, MD 1.0 uses a cloud-based architecture that lets researchers store their data, so they can not only automatically process and visualize their LFQ data but also annotate and share their findings with collaborators and, if they choose, easily publish results to the community. With a view toward increased reproducibility and standardisation in proteomics data analysis and streamlined collaboration between researchers, MD 1.0 requires minimal parameter choices and automatically generates quality control reports to verify experiment integrity. Here, we demonstrate that MD 1.0 provides reliable results for protein expression quantification, emulating Perseus on benchmark datasets over a wide dynamic range. The MD 1.0 platform is available globally via: https://app.massdynamics.com/


2017 ◽  
Vol 14 (1) ◽  
pp. 58-77
Author(s):  
Sevgi Gezici ◽  
Mehmet Ozaslan ◽  
Gurler Akpinar ◽  
Murat Kasap ◽  
Maruf Sanli ◽  
...  

Diagnostics ◽  
2021 ◽  
Vol 11 (3) ◽  
pp. 574
Author(s):  
Gennaro Tartarisco ◽  
Giovanni Cicceri ◽  
Davide Di Pietro ◽  
Elisa Leonardi ◽  
Stefania Aiello ◽  
...  

In the past two decades, several screening instruments were developed to detect toddlers who may be autistic both in clinical and unselected samples. Among others, the Quantitative CHecklist for Autism in Toddlers (Q-CHAT) is a quantitative and normally distributed measure of autistic traits that demonstrates good psychometric properties in different settings and cultures. Recently, machine learning (ML) has been applied to behavioral science to improve the classification performance of autism screening and diagnostic tools, but mainly in children, adolescents, and adults. In this study, we used ML to investigate the accuracy and reliability of the Q-CHAT in discriminating young autistic children from those without. Five different ML algorithms (random forest (RF), naïve Bayes (NB), support vector machine (SVM), logistic regression (LR), and K-nearest neighbors (KNN)) were applied to investigate the complete set of Q-CHAT items. Our results showed that ML achieved an overall accuracy of 90%, and the SVM was the most effective, being able to classify autism with 95% accuracy. Furthermore, using the SVM–recursive feature elimination (RFE) approach, we selected a subset of 14 items ensuring 91% accuracy, while 83% accuracy was obtained from the 3 best discriminating items in common to ours and the previously reported Q-CHAT-10. This evidence confirms the high performance and cross-cultural validity of the Q-CHAT, and supports the application of ML to create shorter and faster versions of the instrument, maintaining high classification accuracy, to be used as a quick, easy, and high-performance tool in primary-care settings.


Sensors ◽  
2021 ◽  
Vol 21 (6) ◽  
pp. 1994
Author(s):  
Qian Ma ◽  
Wenting Han ◽  
Shenjin Huang ◽  
Shide Dong ◽  
Guang Li ◽  
...  

This study explores the classification potential of a multispectral classification model for farmland with planting structures of different complexity. Unmanned aerial vehicle (UAV) remote sensing technology is used to obtain multispectral images of three study areas with low-, medium-, and high-complexity planting structures, containing three, five, and eight types of crops, respectively. The feature subsets of the three study areas are selected by recursive feature elimination (RFE). Object-oriented random forest (OB-RF) and object-oriented support vector machine (OB-SVM) classification models are established for the three study areas. After training the models with the feature subsets, the classification results are evaluated using a confusion matrix. The OB-RF and OB-SVM models’ classification accuracies are 97.09% and 99.13%, respectively, for the low-complexity planting structure. The equivalent values are 92.61% and 99.08% for the medium-complexity planting structure and 88.99% and 97.21% for the high-complexity planting structure. As the planting structure complexity changed from low to high, both models’ overall accuracy decreased: by 8.1 percentage points for the OB-RF model and by only 1.92 percentage points for the OB-SVM model. For farmland with fragmentary plots and a high-complexity planting structure, OB-SVM achieves an overall classification accuracy of 97.21% and a single-crop extraction accuracy of at least 85.65%. Therefore, UAV multispectral remote sensing can be used for classification applications in highly complex planting structures.
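As a small worked example of the evaluation described above, the sketch below computes overall accuracy from a confusion matrix and reproduces the low-to-high accuracy drops from the percentages reported in the abstract. The 3-class confusion matrix is invented purely for illustration and is not data from the study; the reported drops are differences in percentage points.

```python
def overall_accuracy(confusion):
    """Overall accuracy = correctly classified samples (diagonal) / all samples."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Hypothetical confusion matrix (rows: reference class, columns: predicted class).
toy_confusion = [
    [50, 2, 0],
    [1, 45, 4],
    [0, 3, 45],
]
toy_accuracy = overall_accuracy(toy_confusion)  # 140 correct out of 150

# Overall accuracies (%) as reported in the abstract.
reported = {
    "OB-RF": {"low": 97.09, "medium": 92.61, "high": 88.99},
    "OB-SVM": {"low": 99.13, "medium": 99.08, "high": 97.21},
}

# Drop from low- to high-complexity planting structure, in percentage points.
drops = {
    model: round(acc["low"] - acc["high"], 2) for model, acc in reported.items()
}
```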


2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Nicholas Nuechterlein ◽  
Beibin Li ◽  
Abdullah Feroze ◽  
Eric C Holland ◽  
Linda Shapiro ◽  
...  

Abstract Background Combined whole-exome sequencing (WES) and somatic copy number alteration (SCNA) information can separate isocitrate dehydrogenase (IDH)1/2-wildtype glioblastoma into two prognostic molecular subtypes, which cannot be distinguished by epigenetic or clinical features. The potential for radiographic features to discriminate between these molecular subtypes has yet to be established. Methods Radiologic features (n = 35 340) were extracted from 46 multisequence, pre-operative magnetic resonance imaging (MRI) scans of IDH1/2-wildtype glioblastoma patients from The Cancer Imaging Archive (TCIA), all of whom have corresponding WES/SCNA data. We developed a novel feature selection method that leverages the structure of extracted MRI features to mitigate the dimensionality challenge posed by the disparity between a large number of features and the limited patients in our cohort. Six traditional machine learning classifiers were trained to distinguish molecular subtypes using our feature selection method, which was compared to least absolute shrinkage and selection operator (LASSO) feature selection, recursive feature elimination, and variance thresholding. Results We were able to classify glioblastomas into two prognostic subgroups with a cross-validated area under the curve score of 0.80 (±0.03) using ridge logistic regression on the 15-dimensional principal component analysis (PCA) embedding of the features selected by our novel feature selection method. An interrogation of the selected features suggested that features describing contours in the T2 signal abnormality region on the T2-weighted fluid-attenuated inversion recovery (FLAIR) MRI sequence may best distinguish these two groups from one another. Conclusions We successfully trained a machine learning model that allows for relevant targeted feature extraction from standard MRI to accurately predict molecularly-defined risk-stratifying IDH1/2-wildtype glioblastoma patient groups.


2018 ◽  
Vol 2018 ◽  
pp. 1-13
Author(s):  
Xiaowei Cao ◽  
Zhenyu Wang ◽  
Liyan Bi ◽  
Jie Zheng

Surface-enhanced Raman spectroscopy (SERS) is a good candidate for the development of fast and easy-to-use diagnostic tools that could be applied to serum in screening tests. In this study, a potential label-free serum test based on SERS was developed to analyze human serum for the diagnosis of non-small cell lung cancer (NSCLC). We first synthesized novel highly branched gold nanoparticles (HGNPs) at high yield through a one-step reduction of HAuCl4 with dopamine hydrochloride at 60°C. Then, HGNP substrates with good reproducibility, uniformity, and high SERS effect were fabricated by the electrostatically assisted (3-aminopropyl) triethoxysilane-(APTES-) functionalized silicon wafer surface-sedimentary self-assembly method. Using the as-prepared HGNP substrates as a high-performance sensing platform, SERS spectral data were collected from the serum of healthy subjects, lung adenocarcinoma patients, lung squamous carcinoma patients, and large cell lung cancer patients. The difference spectra among the types of NSCLC were compared, and the analysis revealed intrinsic differences in the types and contents of nucleic acids, proteins, carbohydrates, amino acids, and lipids. SERS spectra were analyzed by principal component analysis (PCA), which was able to distinguish the different types of NSCLC. Given its time efficiency, label-free operation, and sensitivity, SERS based on HGNP substrates is very promising for mass screening of NSCLC and could play an important role in the detection and prevention of other diseases.


2006 ◽  
Vol 04 (06) ◽  
pp. 1159-1179 ◽  
Author(s):  
JUNG HUN OH ◽  
ANIMESH NANDI ◽  
PREM GURNANI ◽  
LYNNE KNOWLES ◽  
JOHN SCHORGE ◽  
...  

Ovarian cancer recurs at a rate of 75% within a few months to several years after therapy. Early recurrence, though it responds better to treatment, is difficult to detect. Surface-enhanced laser desorption/ionization time-of-flight (SELDI-TOF) mass spectrometry has shown the potential to accurately identify disease biomarkers to aid early diagnosis. A major challenge in the interpretation of SELDI-TOF data is the high dimensionality of the feature space. To tackle this problem, we have developed a multi-step data processing method composed of a t-test, binning, and backward feature selection. A new algorithm, support vector machine-Markov blanket/recursive feature elimination (SVM-MB/RFE), is presented for the backward feature selection. This method integrates minimum-weight feature elimination by SVM-RFE with information theory-based removal of redundant and irrelevant features by Markov blanket. Subsequently, SVM was used for classification. We applied the biomarker selection algorithm to 113 serum samples to identify early relapse in ovarian cancer patients after primary therapy. To validate the performance of the proposed algorithm, experiments were carried out in comparison with several other feature selection and classification algorithms.
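The minimum-weight elimination step of SVM-RFE can be sketched as follows. This is a simplified illustration only: it uses a plain linear SVM without a bias term, trained by batch subgradient descent, covers none of the Markov blanket component of SVM-MB/RFE, and all function names, hyperparameters, and data are hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit weights of a linear SVM (no bias) by batch subgradient descent
    on the L2-regularized hinge loss. Labels must be +1 or -1."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [lam * wj for wj in w]
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1:  # only margin violators contribute to the hinge term
                for j in range(d):
                    grad[j] -= yi * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def svm_rfe(X, y, n_keep=1):
    """Repeatedly retrain and drop the feature with the smallest squared weight."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(X_sub, y)
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    return remaining, eliminated


# Hypothetical toy data: feature 0 separates the classes strongly,
# feature 1 weakly, and feature 2 is unrelated to the label.
X = [
    [2.0, 0.5, 1.0],
    [2.0, 0.5, -1.0],
    [-2.0, -0.5, 1.0],
    [-2.0, -0.5, -1.0],
]
y = [1, 1, -1, -1]

selected, eliminated = svm_rfe(X, y, n_keep=1)
```

Retraining after each elimination is what distinguishes RFE from ranking features once by weight: the weights of the remaining features are re-estimated without the removed one.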


RSC Advances ◽  
2016 ◽  
Vol 6 (55) ◽  
pp. 50027-50033 ◽  
Author(s):  
S. Bakhtiaridoost ◽  
H. Habibiyan ◽  
S. Muhammadnejad ◽  
M. Haddadi ◽  
H. Ghafoorifard ◽  
...  

Wavelet transform and SVM applied to Raman spectra make a powerful and accurate tool for the identification of rare cells such as CTCs.


2021 ◽  
Vol 23 (Supplement_6) ◽  
pp. vi139-vi139
Author(s):  
Jan Lost ◽  
Tej Verma ◽  
Niklas Tillmanns ◽  
W R Brim ◽  
Harry Subramanian ◽  
...  

Abstract PURPOSE Identifying molecular subtypes in gliomas has prognostic and therapeutic value, and is traditionally done after invasive neurosurgical tumor resection or biopsy. Recent advances in artificial intelligence (AI) show promise in using pre-therapy imaging to predict molecular subtype. We performed a systematic review of recent literature on AI methods used to predict molecular subtypes of gliomas. METHODS A literature review conforming to PRISMA guidelines was performed for publications prior to February 2021 using 4 databases: Ovid Embase, Ovid MEDLINE, Cochrane trials (CENTRAL), and Web of Science core-collection. Keywords included: artificial intelligence, machine learning, deep learning, radiomics, magnetic resonance imaging, glioma, and glioblastoma. Non-machine learning and non-human studies were excluded. Screening was performed using Covidence software. Bias analysis was done using TRIPOD guidelines. RESULTS 11,727 abstracts were retrieved. After applying initial screening exclusion criteria, 1,135 full text reviews were performed, with 82 papers remaining for data extraction. 57% used retrospective single center hospital data, 31.6% used TCIA and BRATS, and 11.4% analyzed multicenter hospital data. An average of 146 patients (range 34-462 patients) were included. Algorithms predicting IDH status comprised 51.8% of studies, MGMT 18.1%, and 1p19q 6.0%. Machine learning methods were used in 71.4%, deep learning in 27.4%, and 1.2% directly compared both methods. The most common machine learning algorithm was support vector machine (43.3%), and the most common deep learning algorithm was convolutional neural network (68.4%). Mean prediction accuracy was 76.6%. CONCLUSION Machine learning is the predominant method for image-based prediction of glioma molecular subtypes. Major limitations include limited datasets (60.2% with under 150 patients) and thus limited generalizability of findings. We recommend using larger annotated datasets for AI network training and testing in order to create more robust AI algorithms, which will provide better prediction accuracy on real-world clinical datasets and provide tools that can be translated to clinical practice.

