Identification of Celtis species using random forest with infrared spectroscopy and analysis of spectral feature importance

2021 ◽  
Vol 32 (6) ◽  
pp. 1183-1194
Author(s):  
Tae-Im Heo ◽  
Dong-Hyun Kim ◽  
Sung-Wook Hwang
2019 ◽  
Vol 59 (6) ◽  
pp. 1190 ◽  
Author(s):  
A. Bahri ◽  
S. Nawar ◽  
H. Selmi ◽  
M. Amraoui ◽  
H. Rouissi ◽  
...  

Rapid optical measurement techniques have the advantage over traditional methods of being faster and non-destructive. In this work, visible and near-infrared spectroscopy (vis-NIRS) was used to investigate differences between measured values of key milk properties (e.g. fat, protein and lactose) in 30 samples of ewes' milk according to three feed systems: faba beans, field peas and a control diet. A mobile fibre-optic vis-NIR spectrophotometer (350–2500 nm) was used to collect reflectance spectra from the milk samples. Principal component analysis was used to explore differences between milk samples according to the feed supplied, and partial least-squares regression and random forest regression were adopted to develop calibration models for the prediction of milk properties. The principal component analysis showed clear separation between the three groups of milk samples according to the diet of the ewes throughout the lactation period. Milk fat, protein and lactose were predicted with good accuracy by partial least-squares regression (R² = 0.70–0.83; ratio of prediction deviation, i.e. the ratio of the standard deviation to the root mean square error of prediction, = 1.85–2.44). However, the best prediction results were obtained with random forest regression models (R² = 0.86–0.90; ratio of prediction deviation = 2.73–3.26). Vis-NIRS coupled with multivariate modelling tools, particularly random forest regression, can be recommended for exploring differences between milk samples from different feed systems and for predicting key milk properties.
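The ratio of prediction deviation (RPD) quoted in the abstract above is defined there as the standard deviation divided by the root mean square error of prediction (RMSEP). A minimal sketch of that calculation follows. It assumes the sample standard deviation of the measured reference values is meant, which the abstract does not specify, and the function names and numbers are hypothetical illustrations, not data from the study.

```python
import math
import statistics


def rmsep(y_true, y_pred):
    """Root mean square error of prediction over paired values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def rpd(y_true, y_pred):
    """Ratio of prediction deviation: SD of reference values / RMSEP.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    return statistics.stdev(y_true) / rmsep(y_true, y_pred)


# Hypothetical measured and predicted values, for illustration only.
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 2.5, 2.5, 4.5, 4.5]

print("RMSEP:", round(rmsep(measured, predicted), 3))
print("RPD:", round(rpd(measured, predicted), 3))
```

A larger RPD means the prediction error is small relative to the natural spread of the property, which is why the random forest models (2.73–3.26) are reported as better than the partial least-squares models (1.85–2.44).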


2017 ◽  
Vol 54 (10) ◽  
pp. 103001
Author(s):  
刘 明 Liu Ming ◽  
李忠任 Li Zhongren ◽  
张海涛 Zhang Haitao ◽  
于春霞 Yu Chunxia ◽  
唐兴宏 Tang Xinghong ◽  
...  

2011 ◽  
Vol 25 (4) ◽  
pp. 201-207 ◽  
Author(s):  
Dong-Sheng Cao ◽  
Yi-Zeng Liang ◽  
Qing-Song Xu ◽  
Liang-Xiao Zhang ◽  
Qian-Nan Hu ◽  
...  

2007 ◽  
Vol 15 (2) ◽  
pp. 115-121 ◽  
Author(s):  
B. Jagannadha Reddy ◽  
Ray L. Frost

Near-infrared spectroscopy studies show evidence of variable composition in aurichalcite, a zinc copper carbonate hydroxide mineral. The observation of a broad feature in the electronic part of the spectrum around 11,500 cm−1 (870 nm) is a strong indication of Cu2+ substitution for Zn2+ in the mineral. Overtones of OH vibrations in the spectra from 7250 to 5400 cm−1 (1380–1850 nm) show strong hydrogen bonding in these carbonates. A band common to the spectra of all carbonates appears near 5400 cm−1 (1850 nm) due to the combination of OH-stretching and HOH-bending vibrations, which may be attributed to adsorbed water. Aurichalcite minerals display a sequence of five absorption bands that vary in both position and intensity; this is the chief spectral feature observed in the range 5200–4200 cm−1 (1920–2380 nm) and is due to vibrational processes of the carbonate ion. The frequency shift of the carbonate bands suggests the effect of divalent cations and/or variations of the Zn/Cu ratio in aurichalcite minerals.
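The abstract above quotes band positions both as wavenumbers (cm−1) and as wavelengths (nm). The two are related by wavelength in nm = 10^7 / wavenumber in cm−1. The short sketch below shows the conversion; the function names are hypothetical and chosen only for this illustration.

```python
def wavenumber_to_nm(wavenumber_cm1):
    """Convert a wavenumber in cm^-1 to a wavelength in nm (1 cm = 1e7 nm)."""
    return 1e7 / wavenumber_cm1


def nm_to_wavenumber(wavelength_nm):
    """Convert a wavelength in nm to a wavenumber in cm^-1."""
    return 1e7 / wavelength_nm


# Band positions quoted in the abstract.
for wavenumber in (11500, 7250, 5400):
    print(wavenumber, "cm^-1 is about", round(wavenumber_to_nm(wavenumber)), "nm")
```

Rounded to the nearest 10 nm, these reproduce the 870 nm, 1380 nm and 1850 nm values given in the text.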


2021 ◽  
Author(s):  
Alena Orlenko ◽  
Jason H Moore

Abstract
Background: Non-additive interactions among genes are frequently associated with a number of phenotypes, including known complex diseases such as Alzheimer's, diabetes, and cardiovascular disease. Detecting interactions requires careful selection of analytical methods, and some machine learning algorithms are unable or underpowered to detect or model feature interactions that exhibit non-additivity. The Random Forest method is often employed in these efforts due to its ability to detect and model non-additive interactions. In addition, Random Forest has the built-in ability to estimate feature importance scores, a characteristic that allows the model to be interpreted in terms of the order and effect size of each feature's association with the outcome. This characteristic is very important for epidemiological and clinical studies, where results of predictive modeling could be used to define the future direction of research efforts. Alternative ways to interpret the model are the permutation feature importance metric, which uses a permutation approach to calculate a feature contribution coefficient in units of the decrease in the model's performance, and Shapley additive explanations, which use a cooperative game theory approach. Currently, it is unclear which Random Forest feature importance metric provides a superior estimation of the true informative contribution of features in genetic association analysis.
Results: To address this issue, and to improve the interpretability of Random Forest predictions, we compared different methods for feature importance estimation in real and simulated datasets with non-additive interactions. We detected a discrepancy between the metrics for the real-world datasets and further established that the permutation feature importance metric provides more precise feature importance rank estimation for the simulated datasets with non-additive interactions.
Conclusions: By analyzing both real and simulated data, we established that the permutation feature importance metric provides more precise feature importance rank estimation in the presence of non-additive interactions.
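The permutation feature importance described in the abstract above scores a feature by how much the model's performance drops when that feature's values are shuffled. The sketch below illustrates the idea in plain Python. It is not the authors' pipeline: in place of a trained Random Forest it uses a hypothetical hand-written classifier (`xor_model`) on simulated binary data in which the outcome depends on a purely non-additive (XOR) interaction between the first two features, while the third feature is noise.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Simulated data (assumption): outcome is the XOR of features 0 and 1.
data_rng = random.Random(42)
X = [[data_rng.randint(0, 1) for _ in range(3)] for _ in range(500)]
y = [row[0] ^ row[1] for row in X]


def xor_model(row):
    """Hypothetical stand-in for a fitted model that has learned the interaction."""
    return row[0] ^ row[1]


scores = permutation_importance(xor_model, X, y)
for j, score in enumerate(scores):
    print(f"feature {j}: mean accuracy drop = {score:.3f}")
```

In this toy setting neither interacting feature has a marginal effect on the outcome, yet shuffling either one degrades accuracy substantially, while the noise feature scores exactly zero because the model never reads it.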


Author(s):  
Wang Zongbao

Distributed power generation in Gansu Province is dominated by wind power and photovoltaic power. Most of these distributed power plants are located in underdeveloped areas. Because local consumption capacity is weak, the electricity they generate is mainly transmitted to and consumed in other regions. A key indicator that affects ultra-long-distance power transmission is line loss. It is an important indicator of the economic operation of the power system, and it also comprehensively reflects the planning, design, production and operation level of power companies. However, most current research on line loss focuses on ultra-high voltage (≥110 kV), and less work addresses distributed power generation lines below 110 kV. In this study, 35 kV and 110 kV lines are taken as examples. First, combining existing weather, equipment, operation, power outage and other data, we summarize and integrate an analysis table of line loss impact factors. Secondly, from the perspective of feature relevance and feature importance, we analyze the factors that affect line loss and obtain the data with the highest feature relevance and feature importance rankings. In the experiment, the factors ranking highly on both measures are taken as the final line loss influencing factors. Then, based on these influencing factors, an optimized random forest regression algorithm is used to construct the line loss prediction model. The prediction verification results show that the training set error is 0.021 and the test set error is 0.026, a difference of only 0.005. The experimental results show that the optimized random forest algorithm can indeed analyze the line loss of 35 kV and 110 kV lines well, and can also explain the performance of 110-EaR1120 reasonably.

