Infrared molecular fingerprinting of blood-based liquid biopsies for the detection of cancer

eLife ◽  
2021 ◽  
Vol 10 ◽  
Author(s):  
Marinus Huber ◽  
Kosmas V Kepesidis ◽  
Liudmila Voronina ◽  
Frank Fleischmann ◽  
Ernst Fill ◽  
...  

Recent omics analyses of human biofluids provide opportunities to probe selected species of biomolecules for disease diagnostics. Fourier-transform infrared (FTIR) spectroscopy investigates the full repertoire of molecular species within a sample at once. Here, we present a multi-institutional study in which we analysed infrared fingerprints of plasma and serum samples from 1639 individuals with different solid tumours and carefully matched symptomatic and non-symptomatic reference individuals. Focusing on breast, bladder, prostate, and lung cancer, we find that infrared molecular fingerprinting is capable of detecting cancer: training a support vector machine algorithm allowed us to obtain binary classification performance in the range of 0.78–0.89 (area under the receiver operating characteristic curve [AUC]), with a clear correlation between AUC and tumour load. Intriguingly, we find that the spectral signatures differ between different cancer types. This study lays the foundation for high-throughput onco-IR-phenotyping of four common cancers, providing a cost-effective, complementary analytical tool for disease recognition.
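The AUC figures quoted above summarise how well a classifier's scores rank cancer cases above reference individuals. As a minimal sketch (not the authors' pipeline; the labels and scores below are hypothetical), the AUC can be computed directly as the fraction of positive/negative pairs in which the positive sample receives the higher score:

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier outputs: 1 = case, 0 = reference individual.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

The pairwise form is quadratic in the number of samples, which is fine for illustration; rank-based formulas give the same value more efficiently on large cohorts.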

Author(s):  
Duan Mei ◽  
Qiang Liu

Based on microRNA (miRNA) expression profiles, this article proposes a new algorithm, SVM-RFE-FKNN, which combines the support vector machine-recursive feature elimination (SVM-RFE) algorithm and the fuzzy K-nearest neighbor (FKNN) algorithm to realize binary classification of tumors. First, the SVM-RFE algorithm was used to select features from the miRNA expression profile dataset to constitute feature subsets and to determine the maximum number of support vectors. Next, this maximum number was taken as the upper limit of the parameter K in the FKNN algorithm, which was then used to classify the samples to be tested. Finally, leave-one-out cross-validation was adopted to assess the classification performance of the proposed algorithm. In experiments, the proposed algorithm was compared with twelve other classification methods, and the results show that it had better classification performance. Specifically, with only a few miRNA biomarkers, the proposed algorithm could reach an accuracy of 99.46% and an area under the receiver operating characteristic curve (AUC) of 0.9874.
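The last two steps, fuzzy K-nearest-neighbor classification and leave-one-out evaluation, can be sketched as follows. This is an illustrative simplification rather than the SVM-RFE-FKNN algorithm itself: it assumes crisp training labels and inverse-distance weighting with fuzzifier `m`, skips the SVM-RFE feature selection, and fixes K by hand instead of deriving its upper limit from the support vector count. The data points are hypothetical.

```python
import math


def fknn_memberships(train_X, train_y, x, k=3, m=2.0):
    """Return {class: membership} for x from its k nearest training points.
    Each neighbour votes with weight 1 / d**(2 / (m - 1)); the memberships
    are normalised so that they sum to 1."""
    nearest = sorted(
        (math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y)
    )[:k]
    eps = 1e-12  # avoids division by zero when x coincides with a neighbour
    weights = [(1.0 / (d ** (2.0 / (m - 1.0)) + eps), y) for d, y in nearest]
    total = sum(w for w, _ in weights)
    memberships = {}
    for w, y in weights:
        memberships[y] = memberships.get(y, 0.0) + w / total
    return memberships


def leave_one_out_accuracy(X, y, k=3):
    """Hold out each sample once, classify it from the rest, and report
    the fraction of held-out samples assigned to their true class."""
    correct = 0
    for i in range(len(X)):
        u = fknn_memberships(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], k=k)
        correct += max(u, key=u.get) == y[i]
    return correct / len(X)


# Hypothetical two-feature expression profiles for two tumor classes.
X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (2.0, 2.1), (2.2, 1.9), (1.9, 2.2)]
y = [0, 0, 0, 1, 1, 1]
acc = leave_one_out_accuracy(X, y, k=3)
print(acc)
```

Unlike a plain majority vote, the fuzzy memberships indicate how confident the assignment is, which is useful when a held-out sample sits between the two classes.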


Cancers ◽  
2021 ◽  
Vol 13 (6) ◽  
pp. 1407
Author(s):  
Matyas Bukva ◽  
Gabriella Dobra ◽  
Juan Gomez-Perez ◽  
Krisztian Koos ◽  
Maria Harmati ◽  
...  

Investigating the molecular composition of small extracellular vesicles (sEVs) for tumor diagnostic purposes is becoming increasingly popular, especially for diseases for which diagnosis is challenging, such as central nervous system (CNS) malignancies. Thorough examination of the molecular content of sEVs by Raman spectroscopy is a promising but hitherto barely explored approach for these tumor types. We attempt to reveal the potential role of serum-derived sEVs in diagnosing CNS tumors through Raman spectroscopic analyses using a relevant number of clinical samples. A total of 138 serum samples were obtained from four patient groups (glioblastoma multiforme, non-small-cell lung cancer brain metastasis, meningioma and lumbar disc herniation as control). After isolation, characterization and Raman spectroscopic assessment of sEVs, the Principal Component Analysis–Support Vector Machine (PCA–SVM) algorithm was applied to the Raman spectra for pairwise classifications. Classification accuracy (CA), sensitivity, specificity and the Area Under the Curve (AUC) value derived from Receiver Operating Characteristic (ROC) analyses were used to evaluate classification performance. The groups compared were distinguishable with 82.9–92.5% CA, 80–95% sensitivity and 80–90% specificity. AUC scores in the range of 0.82–0.90 suggest excellent and outstanding classification performance. Our results indicate that Raman spectroscopic analysis of sEV-enriched isolates from serum is a promising method that could be further developed for application in the diagnosis of CNS tumors.
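Classification accuracy, sensitivity and specificity all derive from the same confusion matrix. The sketch below shows the arithmetic for one pairwise comparison; the labels are hypothetical and are not taken from the study.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (true-positive rate) and specificity
    (true-negative rate) for a two-class problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical pairwise comparison: 1 = tumor group, 0 = control group.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```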


Sensors ◽  
2021 ◽  
Vol 21 (21) ◽  
pp. 7417
Author(s):  
Alex J. Hope ◽  
Utkarsh Vashisth ◽  
Matthew J. Parker ◽  
Andreas B. Ralston ◽  
Joshua M. Roper ◽  
...  

Concussion injuries remain a significant public health challenge. A significant unmet clinical need remains for tools that allow related physiological impairments and longer-term health risks to be identified earlier, better quantified, and more easily monitored over time. We address this challenge by combining a head-mounted wearable inertial motion unit (IMU)-based physiological vibration acceleration (“phybrata”) sensor and several candidate machine learning (ML) models. The performance of this solution is assessed for both binary classification of concussion patients and multiclass predictions of specific concussion-related neurophysiological impairments. Results are compared with previously reported approaches to ML-based concussion diagnostics. Using phybrata data from a previously reported concussion study population, four different machine learning models (Support Vector Machine, Random Forest Classifier, Extreme Gradient Boost, and Convolutional Neural Network) are first investigated for binary classification of the test population as healthy vs. concussion (Use Case 1). Results are compared for two different data preprocessing pipelines, Time-Series Averaging (TSA) and Non-Time-Series Feature Extraction (NTS). Next, the three best-performing NTS models are compared in terms of their multiclass prediction performance for specific concussion-related impairments: vestibular, neurological, both (Use Case 2). For Use Case 1, the NTS model approach outperformed the TSA approach, with the two best algorithms achieving an F1 score of 0.94. For Use Case 2, the NTS Random Forest model achieved the best performance in the testing set, with an F1 score of 0.90, and identified a wider range of relevant phybrata signal features that contributed to impairment classification compared with manual feature inspection and statistical data analysis. 
The overall classification performance achieved in the present work exceeds previously reported approaches to ML-based concussion diagnostics using other data sources and ML models. This study also demonstrates the first combination of a wearable IMU-based sensor and ML model that enables both binary classification of concussion patients and multiclass predictions of specific concussion-related neurophysiological impairments.
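For the multiclass use case, an F1 score can be computed per impairment class by treating each class in turn as the positive label. The sketch below assumes this one-vs-rest reading; the abstract does not state how the reported F1 scores were averaged, and the predictions shown are hypothetical.

```python
def f1_per_class(y_true, y_pred):
    """One-vs-rest F1 for every class: 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


# Hypothetical impairment labels for six patients.
y_true = ["vestibular", "vestibular", "neurological",
          "neurological", "both", "both"]
y_pred = ["vestibular", "vestibular", "neurological",
          "both", "both", "both"]
f1 = f1_per_class(y_true, y_pred)
print(f1)
```

Because F1 combines precision and recall, a class that is over-predicted ("both" here) is penalised even when every true member of it is found.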


2012 ◽  
Vol 9 (3) ◽  
pp. 33-43 ◽  
Author(s):  
Paulo Gaspar ◽  
Jaime Carbonell ◽  
José Luís Oliveira

Summary. Classifying biological data is a common task in the biomedical context. Predicting the class of new, unknown information allows researchers to gain insight and make decisions based on the available data. Using classification methods also often implies choosing the best parameters to obtain optimal class separation, and the number of parameters might be large in biological datasets. Support Vector Machines provide a well-established and powerful classification method to analyse data and find the minimal-risk separation between different classes. Finding that separation strongly depends on the available feature set and the tuning of hyper-parameters. Techniques for feature selection and SVM parameter optimization are known to improve classification accuracy, and the literature on them is extensive. In this paper we review the strategies that are used to improve the classification performance of SVMs and perform our own experiments to study the influence of features and hyper-parameters on the optimization process, using several known kernels.


2018 ◽  
Vol 25 (7) ◽  
pp. 855-861 ◽  
Author(s):  
Halil Kilicoglu ◽  
Graciela Rosemblat ◽  
Mario Malički ◽  
Gerben ter Riet

Objective. To automatically recognize self-acknowledged limitations in clinical research publications to support efforts in improving research transparency. Methods. To develop our recognition methods, we used a set of 8431 sentences from 1197 PubMed Central articles. A subset of these sentences was manually annotated for training/testing, and inter-annotator agreement was calculated. We cast the recognition problem as a binary classification task, in which we determine whether a given sentence from a publication discusses self-acknowledged limitations or not. We experimented with three methods: a rule-based approach based on document structure, supervised machine learning, and a semi-supervised method that uses self-training to expand the training set in order to improve classification performance. The machine learning algorithms used were logistic regression (LR) and support vector machines (SVM). Results. Annotators had good agreement in labeling limitation sentences (Krippendorff’s α = 0.781). Of the three methods used, the rule-based method yielded the best performance with 91.5% accuracy (95% CI [90.1-92.9]), while self-training with SVM led to a small improvement over fully supervised learning (89.9%, 95% CI [88.4-91.4] vs 89.6%, 95% CI [88.1-91.1]). Conclusions. The approach presented can be incorporated into the workflows of stakeholders focusing on research transparency to improve reporting of limitations in clinical studies.


2021 ◽  
Vol 20 (1) ◽  
Author(s):  
Bo Zhang ◽  
Zhenmei Chen ◽  
Baorui Tao ◽  
Chenhe Yi ◽  
Zhifei Lin ◽  
...  

Recent studies have revealed significant dysregulation of m6A levels in peripheral blood in several cancer types and its value in diagnosis. Nonetheless, a biomarker for accurate screening of multiple cancer types has not been established from the perspective of m6A modification. In this study, we aimed to develop a serum diagnostic signature based on m6A target miRNAs for the mass detection of cancer. A total of 14965 serum samples covering 12 cancer types were included. Based on a training cohort (n = 7299), we developed the m6A-miRNAs signature using a support vector machine algorithm for cancer detection. The m6A-miRNAs signature showed high accuracy, and its area under the curve (AUC) in the training, internal validation and external validation cohorts reached 0.979 (95% CI 0.976–0.982), 0.976 (95% CI 0.973–0.979) and 0.936 (95% CI 0.922–0.951), respectively. In distinguishing cancer types, the m6A-miRNAs signature showed superior sensitivity in each cancer type and presented a satisfactory AUC in identifying lung cancer, gastric cancer and hepatocellular carcinoma. Additionally, the diagnostic performance of m6A-miRNAs was not affected by gender, age or benign disease. In short, this study revealed the value of serum circulating m6A miRNAs in cancer detection and provided a new direction and strategy for developing novel biomarkers, such as RNA modifications, with high accuracy, low cost and less invasiveness for mass cancer screening.


2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Bo Ren ◽  
Jian Wang ◽  
Zhoulin Miao ◽  
Yuwei Xia ◽  
Wenya Liu ◽  
...  

Background. To evaluate the role of radiomics based on magnetic resonance imaging (MRI) in the biological activity of hepatic alveolar echinococcosis (HAE). Methods. In this study, 90 active and 46 inactive HAE cases were analyzed retrospectively. All the subjects underwent MRI and positron emission tomography computed tomography (PET-CT) before surgery. A total of 1409 three-dimensional radiomics features were extracted from the T2-weighted MR images (T2WI). The inactive group in the training cohort was balanced via the synthetic minority oversampling technique (SMOTE). The least absolute shrinkage and selection operator (LASSO) regression method was used for feature selection. The machine learning (ML) classifiers were logistic regression (LR), multilayer perceptron (MLP), and support vector machine (SVM). We used a fivefold cross-validation strategy in the training cohorts. The classification performance of the radiomics signature was evaluated using receiver operating characteristic (ROC) curve analysis in the training and test cohorts. Results. The radiomics features were significantly associated with the biological activity, and 10 features were selected to construct the radiomics model. The best performance of the radiomics model for the biological activity prediction was obtained by MLP (AUC = 0.830 ± 0.053; accuracy = 0.817; sensitivity = 0.822; specificity = 0.811). Conclusions. We developed and validated a radiomics model as an adjunct tool to predict the HAE biological activity by combining T2WI images, which achieved results nearly equal to the PET-CT findings.
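The SMOTE balancing step creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The sketch below is a simplified variant that draws base samples at random; the feature values, `k`, and the seed are hypothetical, and the study's actual implementation is not specified in the abstract.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points, each lying on the line segment
    between a randomly chosen minority sample and one of its k nearest
    minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical two-feature radiomics vectors for the minority (inactive) group.
inactive = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1)]
new_points = smote(inactive, n_new=4)
print(new_points)
```

As in the abstract, oversampling of this kind belongs in the training cohort only; synthetic points in the test cohort would inflate the reported performance.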


Author(s):  
Patrick C. M. Wong ◽  
Ching Man Lai ◽  
Peggy H. Y. Chan ◽  
Ting Fan Leung ◽  
Hugh Simon Lam ◽  
...  

Purpose. This study aimed to construct an objective and cost-effective prognostic tool to forecast the future language and communication abilities of individual infants. Method. Speech-evoked electroencephalography (EEG) data were collected from 118 infants in their first year of life during exposure to speech stimuli that differed principally in fundamental frequency. Language and communication outcomes, namely four subtests of the MacArthur–Bates Communicative Development Inventories (MCDI)–Chinese version, were collected between 3 and 16 months after initial EEG testing. In the two-way classification, children were classified into those with future MCDI scores below the 25th percentile for their age group and those above the same percentile, while the three-way classification classified them into <25th, 25th–75th, and >75th percentile groups. Machine learning (support vector machine classification) with cross validation was used for model construction. Statistical significance was assessed. Results. Across the four MCDI measures of early gestures, later gestures, vocabulary comprehension, and vocabulary production, the areas under the receiver-operating characteristic curve of the predictive models were respectively .92 ± .031, .91 ± .028, .90 ± .035, and .89 ± .039 for the two-way classification, and .88 ± .041, .89 ± .033, .85 ± .047, and .85 ± .050 for the three-way classification (p < .01 for all models). Conclusions. Future language and communication variability can be predicted by an objective EEG method that indicates the function of the auditory neural pathway foundational to spoken language development, with precision sufficient for individual predictions. Longer-term research is needed to assess predictability of categorical diagnostic status. Supplemental Material. https://doi.org/10.23641/asha.15138546


2017 ◽  
Vol 2017 ◽  
pp. 1-11 ◽  
Author(s):  
Fang Yang ◽  
Murat Hamit ◽  
Chuan B. Yan ◽  
Juan Yao ◽  
Abdugheni Kutluk ◽  
...  

Esophageal cancer is one of the fastest rising types of cancers in China. The Kazak nationality is the highest-risk group in Xinjiang. In this work, an effective computer-aided diagnostic system is developed to assist physicians in interpreting digital X-ray image features and improving the quality of diagnosis. The modules of the proposed system include image preprocessing, feature extraction, feature selection, image classification, and performance evaluation. A total of 300 original esophageal X-ray images were resized to a region of interest and then enhanced by the median filter and histogram equalization method. In all, 37 features from textural, frequency, and complexity domains were extracted. Both sequential forward selection and principal component analysis methods were employed to select the discriminative features for classification. Then, support vector machine and K-nearest neighbors were applied to classify the esophageal cancer images with respect to their specific types. The classification performance was evaluated in terms of the area under the receiver operating characteristic curve, accuracy, precision, and recall. Experimental results show that the proposed system outperforms the conventional visual inspection approaches in terms of diagnostic quality and processing time. Therefore, the proposed computer-aided diagnostic system is promising for the diagnostics of esophageal cancer.


Sensors ◽  
2019 ◽  
Vol 19 (20) ◽  
pp. 4479 ◽  
Author(s):  
Abu Zar Shafiullah ◽  
Jessica Werner ◽  
Emer Kennedy ◽  
Lorenzo Leso ◽  
Bernadette O’Brien ◽  
...  

Sensor technologies that measure grazing and ruminating behaviour as well as physical activities of individual cows are intended to be included in precision pasture management. One of the advantages of sensor data is that they can be analysed to support farmers in many decision-making processes. This article thus considers the performance of a set of RumiWatchSystem-recorded variables in the prediction of insufficient herbage allowance for spring calving dairy cows. Several commonly used models in machine learning (ML) were applied to the binary classification problem, i.e., sufficient or insufficient herbage allowance, and the predictive performance was compared based on the classification evaluation metrics. Most of the ML models and the generalised linear model (GLM) performed similarly in the leave-out-one-animal (LOOA) approach to validation. However, cross validation (CV) studies, where a portion of features in the test and training data resulted from the same cows, revealed that support vector machine (SVM), random forest (RF) and extreme gradient boosting (XGBoost) performed relatively better than other candidate models. In general, these ML models attained 88% AUC (area under receiver operating characteristic curve) and around 80% sensitivity, specificity, accuracy, precision and F-score. This study further identified that the number of rumination chews per day and grazing bites per minute were the most important predictors and examined the marginal effects of the variables on model prediction towards a decision support system.
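The difference between the LOOA and ordinary CV results comes down to how records are split: LOOA keeps every record of one animal out of training, whereas ordinary CV can place the same cow on both sides. A minimal sketch of the LOOA split, assuming each record carries an animal identifier (the identifiers below are hypothetical):

```python
def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices) so that every
    record of the held-out group lands in the test set and none in training."""
    for g in sorted(set(groups)):
        test = [i for i, gi in enumerate(groups) if gi == g]
        train = [i for i, gi in enumerate(groups) if gi != g]
        yield g, train, test


# Hypothetical animal identifier for each sensor record.
cow_ids = ["cow_a", "cow_a", "cow_b", "cow_b", "cow_b", "cow_c"]
splits = list(leave_one_group_out(cow_ids))
for held_out, train_idx, test_idx in splits:
    print(held_out, train_idx, test_idx)
```

Scores from a split like this estimate performance on unseen animals, which is the harder and usually more relevant case for a decision support system.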

