Machine learning methods for quantitative analysis of Raman spectroscopy data

Author(s):  
Michael G. Madden ◽  
Alan G. Ryder
Geofluids ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Ruijie Huang ◽  
Chenji Wei ◽  
Jian Yang ◽  
Xin Xu ◽  
Baozhu Li ◽  
...  

With the rapid development of artificial intelligence, machine learning methods have become key technologies for intelligent exploration, development, and production in oil and gas fields. This article presents a workflow that uses machine learning algorithms to analyse the main controlling factors of oil saturation variation, based on static and dynamic data from actual reservoirs. The dataset, generated from 468 wells, includes thickness, permeability, porosity, net-to-gross (NTG) ratio, oil production variation (OPV), water production variation (WPV), water cut variation (WCV), neighbouring liquid production variation (NLPV), neighbouring water injection variation (NWIV), and oil saturation variation (OSV). A data processing workflow is implemented to replace outliers and increase model accuracy. A total of 10 machine learning algorithms are tested and compared on the dataset. Random forest (RF) and gradient boosting (GBT) perform best and are selected for quantitative analysis of the main controlling factors. The results show that NWIV has the greatest impact on OSV, with an impact factor of 0.276. Based on this analysis, optimization measures are proposed for the development of this kind of sandstone reservoir. The study provides a reference case for quantitative analysis of oil saturation using machine learning methods that will help reservoir engineers make better decisions.
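The abstract does not say how outliers were detected or what they were replaced with. As a minimal sketch of one common choice, and not the authors' method, the Python below flags values outside the interquartile-range (IQR) fences and replaces them with the median of the remaining values. The function name `replace_outliers_iqr`, the fence multiplier `k=1.5`, and the porosity values are all hypothetical.

```python
import statistics

def replace_outliers_iqr(values, k=1.5):
    """Replace values outside the IQR fences with the median of the inliers.

    Assumption: IQR fencing with median replacement is an illustrative
    choice; the source abstract does not name its outlier method.
    """
    # Quartiles using the default 'exclusive' method of statistics.quantiles
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    inliers = [v for v in values if low <= v <= high]
    fill = statistics.median(inliers)
    return [v if low <= v <= high else fill for v in values]

# Hypothetical porosity fractions for eight wells; 0.95 is an obvious outlier
porosity = [0.21, 0.19, 0.23, 0.20, 0.22, 0.95, 0.18, 0.24]
cleaned = replace_outliers_iqr(porosity)
print(cleaned)
```

Replacing rather than dropping a value keeps every well in the dataset, which matters when each row also carries other features that are not outliers.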


2021 ◽  
Vol 12 ◽  
Author(s):  
Jia-Wei Tang ◽  
Qing-Hua Liu ◽  
Xiao-Cong Yin ◽  
Ya-Cheng Pan ◽  
Peng-Bo Wen ◽  
...  

Raman spectroscopy (RS) is a widely used analytical technique based on the detection of molecular vibrations in a defined system, which generates Raman spectra that contain unique and highly resolved fingerprints of the system. However, the low intensity of the normal Raman scattering effect greatly hinders its application. The recently developed surface-enhanced Raman spectroscopy (SERS) technique overcomes this problem by mixing metal nanoparticles such as gold and silver with samples, which enhances the Raman signal intensity by orders of magnitude compared with regular RS. In clinical and research laboratories, SERS offers great potential for fast, sensitive, label-free, and non-destructive microbial detection and identification with the assistance of appropriate machine learning (ML) algorithms. However, choosing an appropriate algorithm for a specific group of bacterial species remains challenging, because not every algorithm achieves high accuracy on the large volumes of data generated during SERS analysis. In this study, we compared three unsupervised and 10 supervised machine learning methods on 2,752 SERS spectra from 117 Staphylococcus strains belonging to nine clinically important Staphylococcus species, in order to test the capacity of different machine learning methods for rapid differentiation and accurate prediction of bacteria. According to the results, density-based spatial clustering of applications with noise (DBSCAN) showed the best clustering capacity (Rand index 0.9733), while the convolutional neural network (CNN) outperformed all other supervised machine learning methods in predicting Staphylococcus species from SERS spectra (ACC 98.21%, AUC 99.93%). Taken together, this study shows that machine learning methods are capable of distinguishing closely related Staphylococcus species and therefore have great application potential for bacterial pathogen diagnosis in clinical settings.
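The Rand index quoted for the clustering results is the fraction of sample pairs on which two labelings agree, meaning the pair is grouped together in both or separated in both. A minimal Python sketch of that definition follows. The species names and cluster assignments are toy values, not data from the study, and the abstract does not say whether the plain or the adjusted Rand index was reported.

```python
from itertools import combinations

def rand_index(labels_true, labels_pred):
    """Fraction of sample pairs on which two labelings agree."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        raise ValueError("need at least two samples")
    agree = 0
    for i, j in pairs:
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        # A pair agrees if it is together in both labelings or apart in both
        if same_true == same_pred:
            agree += 1
    return agree / len(pairs)

# Toy example (hypothetical labels): six spectra, three species, three clusters
species = ["aureus", "aureus", "aureus", "epidermidis", "epidermidis", "hominis"]
clusters = [0, 0, 1, 2, 2, 2]
ri = rand_index(species, clusters)
print(round(ri, 4))  # 11 of 15 pairs agree -> 0.7333
```

Because the index only compares pair groupings, the cluster IDs do not need to match the species names; any relabeling of the clusters gives the same score.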


2021 ◽  
Vol 11 ◽  
Author(s):  
Mengya Li ◽  
Haiyan He ◽  
Guorong Huang ◽  
Bo Lin ◽  
Huiyan Tian ◽  
...  

Gastric cancer (GC) is the fifth most common cancer in the world and a serious threat to human health. Due to its high morbidity and mortality, a simple, rapid, and accurate early screening method for GC is urgently needed. In this study, the potential of Raman spectroscopy combined with different machine learning methods was explored to distinguish serum samples from GC patients and healthy controls. Serum Raman spectra were collected from 109 patients with GC (including 35 in stage I, 14 in stage II, 35 in stage III, and 25 in stage IV) and 104 age-matched healthy volunteers presenting for a routine physical examination. We analyzed the difference in serum metabolism between GC patients and healthy people by comparing the average Raman spectra of the two groups. Four machine learning methods (one-dimensional convolutional neural network, random forest (RF), support vector machine, and K-nearest neighbor) were used to classify the two sets of Raman spectral data. The classification models were built using 70% of the data as a training set and 30% as a test set. On the unseen test data, the RF model yielded an accuracy of 92.8%, with sensitivity and specificity of 94.7% and 90.8%, respectively. The performance of the RF model was further confirmed by the receiver operating characteristic (ROC) curve, with an area under the curve (AUC) of 0.9199. This exploratory work shows that serum Raman spectroscopy combined with RF has great potential in the machine-assisted classification of GC, and is expected to provide a non-destructive and convenient technology for the screening of GC patients.
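The evaluation metrics named in this abstract (accuracy, sensitivity, specificity, and AUC) can all be computed from held-out test labels and model scores. The Python below is a minimal sketch of those formulas. The labels, the scores, and the 0.5 decision threshold are hypothetical and are not taken from the study.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, and specificity from binary predictions.

    Assumes both classes are present in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }

def roc_auc(y_true, scores, positive=1):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present in y_true")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical test set: 1 = GC patient, 0 = healthy control
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Accuracy, sensitivity, and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two kinds of figure are reported separately.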

