Collaborative Analysis on the Marked Ages of Rice Wines by Electronic Tongue and Nose based on Different Feature Data Sets

Sensors ◽  
2020 ◽  
Vol 20 (4) ◽  
pp. 1065 ◽  
Author(s):  
Huihui Zhang ◽  
Wenqing Shao ◽  
Shanshan Qiu ◽  
Jun Wang ◽  
Zhenbo Wei

Aroma and taste are the most important attributes of alcoholic beverages. In this study, a self-developed electronic tongue (e-tongue) and electronic nose (e-nose) were used for evaluating the marked ages of rice wines. Six types of feature data sets (e-tongue data set, e-nose data set, direct-fusion data set, weighted-fusion data set, optimized direct-fusion data set, and optimized weighted-fusion data set) were used for identifying rice wines with different wine ages. Pearson coefficient analysis and variance inflation factor (VIF) analysis were used to optimize the fusion matrices by removing multicollinear information. Two discrimination methods (principal component analysis (PCA) and locality preserving projections (LPP)) were used for classifying rice wines, and LPP performed better than PCA in the discrimination work. The best result was obtained by LPP based on the weighted-fusion data set, and all the samples could be classified clearly in the LPP plot. Therefore, the weighted-fusion data were used as independent variables of partial least squares regression, extreme learning machine, and support vector machines (LIBSVM) for evaluating wine ages, respectively. All the methods performed well with good prediction results, and LIBSVM presented the best correlation coefficient (R² ≥ 0.9998).
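As an illustration of the collinearity-removal step described in the abstract, the following minimal sketch drops fused e-tongue/e-nose features whose variance inflation factor is too high before projecting the remainder; the VIF threshold, the use of statsmodels and scikit-learn, and the commented variable names are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of VIF-based collinearity filtering on a fused
# e-tongue/e-nose feature matrix, followed by a 2-D projection for plotting.
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from sklearn.decomposition import PCA

def drop_high_vif(df: pd.DataFrame, threshold: float = 10.0) -> pd.DataFrame:
    """Iteratively remove the feature with the largest VIF until all VIFs <= threshold."""
    cols = list(df.columns)
    while True:
        vifs = [variance_inflation_factor(df[cols].values, i) for i in range(len(cols))]
        worst = int(np.argmax(vifs))
        if vifs[worst] <= threshold or len(cols) <= 2:
            return df[cols]
        cols.pop(worst)

# fused = pd.concat([etongue_features, enose_features], axis=1)  # direct or weighted fusion
# reduced = drop_high_vif(fused)
# scores = PCA(n_components=2).fit_transform(reduced)            # 2-D projection for class plots
```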

2012 ◽  
Vol 12 (5) ◽  
pp. 12357-12389
Author(s):  
F. Hendrick ◽  
E. Mahieu ◽  
G. E. Bodeker ◽  
K. F. Boersma ◽  
M. P. Chipperfield ◽  
...  

Abstract. The trend in stratospheric NO2 column at the NDACC (Network for the Detection of Atmospheric Composition Change) station of Jungfraujoch (46.5° N, 8.0° E) is assessed using ground-based FTIR and zenith-scattered visible sunlight SAOZ measurements over the period 1990 to 2009 as well as a composite satellite nadir data set constructed from ERS-2/GOME, ENVISAT/SCIAMACHY, and METOP-A/GOME-2 observations over the 1996–2009 period. To calculate the trends, a linear least squares regression model including explanatory variables for a linear trend, the mean annual cycle, the quasi-biennial oscillation (QBO), solar activity, and stratospheric aerosol loading is used. For the 1990–2009 period, statistically indistinguishable trends of −3.7 ± 1.1%/decade and −3.6 ± 0.9%/decade are derived for the SAOZ and FTIR NO2 column time series, respectively. SAOZ, FTIR, and satellite nadir data sets show a similar decrease over the 1996–2009 period, with trends of −2.4 ± 1.1%/decade, −4.3 ± 1.4%/decade, and −3.6 ± 2.2%/decade, respectively. The fact that these declines are opposite in sign to the globally observed +2.5%/decade trend in N2O suggests that factors other than N2O are driving the evolution of stratospheric NO2 at northern mid-latitudes. Possible causes of the decrease in stratospheric NO2 columns have been investigated. The most likely cause is a change in the NO2/NO partitioning in favor of NO, due to a possible stratospheric cooling and a decrease in stratospheric chlorine content, the latter being further confirmed by the negative trend in the ClONO2 column derived from FTIR observations at Jungfraujoch. Decreasing ClO concentrations slow the NO + ClO → NO2 + Cl reaction and a stratospheric cooling slows the NO + O3 → NO2 + O2 reaction, leaving more NOx in the form of NO. The slightly positive trends in ozone estimated from ground- and satellite-based data sets are also consistent with the decrease of NO2 through the NO2 + O3 → NO3 + O2 reaction. Finally, we cannot rule out the possibility that a strengthening of the Brewer-Dobson circulation, which reduces the time available for N2O photolysis in the stratosphere, could also contribute to the observed decline in stratospheric NO2 above Jungfraujoch.
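The trend model described above is an ordinary multiple linear regression; the sketch below illustrates the idea with a hypothetical design matrix containing a linear trend, a first-harmonic annual cycle, and QBO/solar/aerosol proxies. The variable names and the single-harmonic annual cycle are assumptions, not the study's actual regressors.

```python
# Illustrative least squares trend fit for a monthly NO2 column time series.
import numpy as np

def fit_trend(no2, t_years, qbo, solar, aerosol):
    """Return regression coefficients; index 1 is the linear trend per year."""
    X = np.column_stack([
        np.ones_like(t_years),            # intercept
        t_years,                          # linear trend
        np.sin(2 * np.pi * t_years),      # annual cycle, first harmonic
        np.cos(2 * np.pi * t_years),
        qbo,                              # quasi-biennial oscillation proxy
        solar,                            # solar activity proxy
        aerosol,                          # stratospheric aerosol loading proxy
    ])
    coef, *_ = np.linalg.lstsq(X, no2, rcond=None)
    return coef

# coef = fit_trend(no2, t_years, qbo, solar, aerosol)
# trend_per_decade = 10 * coef[1] / no2.mean() * 100   # express as % per decade
```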


2021 ◽  
Vol 87 (6) ◽  
pp. 445-455
Author(s):  
Yi Ma ◽  
Zezhong Zheng ◽  
Yutang Ma ◽  
Mingcang Zhu ◽  
Ran Huang ◽  
...  

Many manifold learning algorithms conduct an eigenvector analysis on a data-similarity matrix of size N×N, where N is the number of data points. Thus, the memory complexity of the analysis is no less than O(N²). We present in this article an incremental manifold learning approach to handle large hyperspectral data sets for land use identification. In our method, the number of dimensions for the high-dimensional hyperspectral-image data set is obtained with the training data set. A local curvature variation algorithm is utilized to sample a subset of data points as landmarks. Then a manifold skeleton is identified based on the landmarks. Our method is validated on three AVIRIS hyperspectral data sets, outperforming the comparison algorithms with a k-nearest-neighbor classifier and achieving the second-best performance with a support vector machine.
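To make the landmark idea concrete, here is a hedged sketch of embedding only a subset of points and then placing the rest relative to their nearest landmarks; it substitutes random sampling and Isomap for the article's local-curvature landmark selection and manifold-skeleton construction, purely for illustration.

```python
# Landmark-based embedding sketch: avoids the O(N^2) similarity matrix by
# running the manifold learner on a small landmark set only.
import numpy as np
from sklearn.manifold import Isomap
from sklearn.neighbors import NearestNeighbors

def landmark_embed(X, n_landmarks=500, n_components=10, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(n_landmarks, len(X)), replace=False)
    landmarks = X[idx]
    emb_landmarks = Isomap(n_components=n_components).fit_transform(landmarks)
    # Place every remaining point as the distance-weighted mean of its k nearest landmarks.
    nn = NearestNeighbors(n_neighbors=5).fit(landmarks)
    dist, nbrs = nn.kneighbors(X)
    w = 1.0 / (dist + 1e-12)
    w /= w.sum(axis=1, keepdims=True)
    return np.einsum('ik,ikj->ij', w, emb_landmarks[nbrs])
```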


2020 ◽  
Author(s):  
Tianyu Xu ◽  
Yongchuan Yu ◽  
Jianzhuo Yan ◽  
Hongxia Xu

Abstract. Due to imbalanced data sets and distribution differences in long-term rainfall prediction, current rainfall prediction models have poor generalization performance and cannot achieve good prediction results in real scenarios. This study uses multiple atmospheric parameters (such as temperature, humidity, and atmospheric pressure) to establish a TabNet-LightGBM rainfall probability prediction model. The research uses feature engineering (such as generating descriptive statistical features and feature fusion) to improve model accuracy, the Borderline-SMOTE algorithm to mitigate data set imbalance, and adversarial validation to address distribution differences. The experiment uses five years of precipitation data from 26 stations in the Beijing-Tianjin-Hebei region of China to verify the proposed rainfall prediction model. The test task is to predict one month of rainfall at each station. The experimental results show that the model performs well, with an AUC larger than 92%. The method proposed in this study further improves the accuracy of rainfall prediction and provides a reference for data mining tasks.
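A minimal sketch of the rebalancing-plus-boosting portion of such a pipeline, assuming imbalanced-learn's Borderline-SMOTE and LightGBM; the TabNet branch, adversarial validation, and all feature engineering are omitted, and every parameter shown is a placeholder rather than the authors' configuration.

```python
# Rebalance the rain/no-rain classes with Borderline-SMOTE, then train a
# gradient-boosted model and report AUC on a held-out split.
from imblearn.over_sampling import BorderlineSMOTE
from lightgbm import LGBMClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def train_rain_model(X, y):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
    X_res, y_res = BorderlineSMOTE(random_state=0).fit_resample(X_tr, y_tr)  # oversample rainy minority
    model = LGBMClassifier(n_estimators=500, learning_rate=0.05)
    model.fit(X_res, y_res)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    return model, auc
```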


2019 ◽  
Vol 47 (3) ◽  
pp. 154-170
Author(s):  
Janani Balakumar ◽  
S. Vijayarani Mohan

Purpose – Owing to the huge volume of documents available on the internet, text classification becomes a necessary task to handle them. To achieve optimal text classification results, feature selection, an important stage, is used to curtail the dimensionality of text documents by choosing suitable features. The main purpose of this research work is to classify personal computer documents based on their content.

Design/methodology/approach – This paper proposes a new feature selection algorithm based on the artificial bee colony (ABCFS) to enhance text classification accuracy. The proposed algorithm (ABCFS) is evaluated on real and benchmark data sets and compared against existing feature selection approaches such as information gain and the χ2 statistic. To justify the efficiency of the proposed algorithm, the support vector machine (SVM) and an improved SVM classifier are used in this paper.

Findings – The experiments were conducted on real and benchmark data sets. The real data set was collected in the form of documents stored on a personal computer, and the benchmark data set was collected from the Reuters and 20 Newsgroups corpora. The results demonstrate that the proposed feature selection algorithm enhances text document classification accuracy.

Originality/value – This paper proposes the new ABCFS algorithm for feature selection, evaluates its efficiency, and improves the support vector machine. Here the ABCFS algorithm is used to select features from text (unstructured) documents, whereas in existing work such algorithms have been applied only to structured data and no text feature selection counterpart exists. The proposed algorithm classifies documents automatically based on their content.
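For context, a standard baseline of the kind ABCFS is compared against (chi-square feature selection feeding an SVM) can be sketched as below; this is an ordinary scikit-learn pipeline, not the proposed ABCFS algorithm, and the number of selected terms is an arbitrary placeholder.

```python
# Baseline text classification pipeline: TF-IDF features, chi-square feature
# selection, and a linear SVM classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

baseline = make_pipeline(
    TfidfVectorizer(stop_words="english"),   # turn raw documents into term features
    SelectKBest(chi2, k=2000),               # keep the 2000 highest-scoring terms
    LinearSVC(),                             # SVM classifier on the reduced feature set
)
# baseline.fit(train_docs, train_labels)
# predictions = baseline.predict(test_docs)
```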


Author(s):  
Hongguang Pan ◽  
Tao Su ◽  
Xiangdong Huang ◽  
Zheng Wang

To address the high cost, complicated process, and low accuracy of oxygen content measurement in the flue gas of coal-fired power plants, this paper proposes a method based on a long short-term memory (LSTM) network to replace the oxygen sensor for estimating the oxygen content in boiler flue gas. Specifically, first, the LSTM model was built with the Keras deep learning framework, and the accuracy of the model was further improved by selecting appropriate hyperparameters through experiments. Second, the flue gas oxygen content, as the leading variable, was combined with the primary auxiliary variables of the mechanism and the boiler process. Based on actual production data collected from a coal-fired power plant in Yulin, China, the data sets were preprocessed. Moreover, an auxiliary variable selection model based on grey relational analysis is proposed to construct a new data set and divide it into training and testing sets. Finally, this model is compared with traditional soft-sensing modelling methods (i.e., those based on the support vector machine and the BP neural network). The RMSE of the LSTM model is 4.51% lower than that of the GA-SVM model and 3.55% lower than that of the PSO-BP model. The results show that the LSTM-based oxygen content model generalizes better and has practical industrial value.
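A minimal Keras sketch of an LSTM soft sensor of the kind described, mapping windows of auxiliary boiler variables to flue-gas oxygen content; the window length, layer sizes, and training settings are illustrative assumptions rather than the paper's tuned hyperparameters.

```python
# LSTM soft-sensor sketch: sequences of auxiliary process variables in,
# predicted flue-gas oxygen content out.
import numpy as np
from tensorflow import keras

WINDOW, N_AUX = 20, 8          # time steps per sample, number of auxiliary variables (assumed)

model = keras.Sequential([
    keras.layers.Input(shape=(WINDOW, N_AUX)),
    keras.layers.LSTM(64),                     # recurrent feature extraction over the window
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1),                     # predicted oxygen content (%)
])
model.compile(optimizer="adam", loss="mse")
# model.fit(X_train, y_train, epochs=100, batch_size=64, validation_split=0.1)
# rmse = np.sqrt(model.evaluate(X_test, y_test, verbose=0))
```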


2012 ◽  
Vol 12 (18) ◽  
pp. 8851-8864 ◽  
Author(s):  
F. Hendrick ◽  
E. Mahieu ◽  
G. E. Bodeker ◽  
K. F. Boersma ◽  
M. P. Chipperfield ◽  
...  

Abstract. The trend in stratospheric NO2 column at the NDACC (Network for the Detection of Atmospheric Composition Change) station of Jungfraujoch (46.5° N, 8.0° E) is assessed using ground-based FTIR and zenith-scattered visible sunlight SAOZ measurements over the period 1990 to 2009 as well as a composite satellite nadir data set constructed from ERS-2/GOME, ENVISAT/SCIAMACHY, and METOP-A/GOME-2 observations over the 1996–2009 period. To calculate the trends, a linear least squares regression model including explanatory variables for a linear trend, the mean annual cycle, the quasi-biennial oscillation (QBO), solar activity, and stratospheric aerosol loading is used. For the 1990–2009 period, statistically indistinguishable trends of −3.7 ± 1.1% decade−1 and −3.6 ± 0.9% decade−1 are derived for the SAOZ and FTIR NO2 column time series, respectively. SAOZ, FTIR, and satellite nadir data sets show a similar decrease over the 1996–2009 period, with trends of −2.4 ± 1.1% decade−1, −4.3 ± 1.4% decade−1, and −3.6 ± 2.2% decade−1, respectively. The fact that these declines are opposite in sign to the globally observed +2.5% decade−1 trend in N2O suggests that factors other than N2O are driving the evolution of stratospheric NO2 at northern mid-latitudes. Possible causes of the decrease in stratospheric NO2 columns have been investigated. The most likely cause is a change in the NO2/NO partitioning in favor of NO, due to a possible stratospheric cooling and a decrease in stratospheric chlorine content, the latter being further confirmed by the negative trend in the ClONO2 column derived from FTIR observations at Jungfraujoch. Decreasing ClO concentrations slow the NO + ClO → NO2 + Cl reaction and a stratospheric cooling slows the NO + O3 → NO2 + O2 reaction, leaving more NOx in the form of NO. The slightly positive trends in ozone estimated from ground- and satellite-based data sets are also consistent with the decrease of NO2 through the NO2 + O3 → NO3 + O2 reaction. Finally, we cannot rule out the possibility that a strengthening of the Brewer-Dobson circulation, which reduces the time available for N2O photolysis in the stratosphere, could also contribute to the observed decline in stratospheric NO2 above Jungfraujoch.


2011 ◽  
Vol 219-220 ◽  
pp. 151-155 ◽  
Author(s):  
Hua Ji ◽  
Hua Xiang Zhang

In many real-world domains, learning from imbalanced data sets is frequently encountered. Since a skewed class distribution challenges traditional classifiers, which achieve much lower classification accuracy on rare classes, we propose a novel classification method that uses local clustering based on the data distribution of the imbalanced data set. First, we divide the whole data set into several groups based on the data distribution. Then we perform local clustering within each group, on both the normal class and the disjointed rare class. For the rare class, over-sampling is then applied at different rates. Finally, we apply support vector machines (SVMs) for classification, using the traditional tactic of a cost matrix to enhance classification accuracy. Experimental results on several UCI data sets show that this method produces much higher prediction accuracy on the rare class than state-of-the-art methods.
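The cost-matrix step can be sketched as below with a cost-sensitive SVM, assuming scikit-learn's class_weight as a stand-in for an explicit cost matrix; the grouping, local clustering, and per-group over-sampling stages are not reproduced here.

```python
# Cost-sensitive SVM sketch: errors on the rare class (label 1, assumed) are
# penalised five times as heavily as errors on the normal class.
from sklearn.svm import SVC

clf = SVC(kernel="rbf", class_weight={0: 1, 1: 5})
# clf.fit(X_resampled, y_resampled)   # X_resampled comes from the over-sampling step
# y_pred = clf.predict(X_test)
```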


2017 ◽  
Vol 26 (2) ◽  
pp. 323-334 ◽  
Author(s):  
Piyabute Fuangkhon

Abstract. Multiclass contour-preserving classification (MCOV) has been used to preserve the contour of a data set and improve the classification accuracy of a feed-forward neural network. It synthesizes two types of new instances, called the fundamental multiclass outpost vector (FMCOV) and the additional multiclass outpost vector (AMCOV), in the middle of the decision boundary between consecutive classes of data. This paper presents a comparison of the generalization achieved by including FMCOVs, AMCOVs, and both MCOVs in the final training sets with a support vector machine (SVM). The experiments were carried out using MATLAB R2015a and LIBSVM v3.20 on seven types of final training sets generated from each of the synthetic and real-world data sets from the University of California Irvine machine learning repository and the ELENA project. The experimental results confirm that including FMCOVs in final training sets containing raw data can significantly improve SVM classification accuracy.
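A hedged sketch of how a final training set augmented with outpost vectors would be handed to an SVM (scikit-learn's SVC wraps LIBSVM); the FMCOV/AMCOV synthesis itself is the article's contribution and is only stubbed as placeholder inputs here.

```python
# Assemble a final training set from raw data plus optional synthesized
# outpost vectors, then train an SVM on it.
import numpy as np
from sklearn.svm import SVC

def build_final_training_set(X, y, fmcov=None, amcov=None):
    """Concatenate raw data with any synthesized outpost vectors (placeholders)."""
    parts_X, parts_y = [X], [y]
    for extra in (fmcov, amcov):          # each extra is an (X_extra, y_extra) pair, if provided
        if extra is not None:
            parts_X.append(extra[0])
            parts_y.append(extra[1])
    return np.vstack(parts_X), np.concatenate(parts_y)

# X_final, y_final = build_final_training_set(X_raw, y_raw, fmcov=(X_f, y_f))
# SVC(kernel="rbf").fit(X_final, y_final)
```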


Author(s):  
Michael Marszalek ◽  
Maximilian Lösch ◽  
Marco Körner ◽  
Urs Schmidhalter

Crop type and field boundary mapping enable cost-efficient crop management on the field scale and serve as the basis for yield forecasts. Our study uses a data set with crop types and corresponding field borders from the federal state of Bavaria, Germany, as documented by farmers from 2016 to 2018. The study classified corn, winter wheat, barley, sugar beet, potato, and rapeseed as the main crops grown in Upper Bavaria. Corresponding Sentinel-2 data sets include the normalised difference vegetation index (NDVI) and raw band data from 2016 to 2018 for each selected field. The influences of clouds, raw bands, and NDVI on crop type classification are analysed, and the classification algorithms, i.e., support vector machine (SVM) and random forest (RF), are compared. Field boundary detection and extraction rely on non-iterative clustering and a newly developed procedure based on Canny edge detection. The results emphasise the application of Sentinel's raw bands (B1–B12) and RF, which outperforms SVM with an accuracy of up to 94%. Furthermore, we forecast data for an unknown year, which slightly reduces the classification accuracy. The results demonstrate the usefulness of the proof of concept and its readiness for use in real applications.
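As a sketch of the classification setup described, the snippet below computes NDVI from the usual Sentinel-2 red and near-infrared bands and trains a random forest; the per-field feature layout and forest size are assumptions, not the study's exact configuration.

```python
# NDVI computation from Sentinel-2 reflectance and a random forest crop classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def ndvi(b8, b4):
    """Normalised difference vegetation index from NIR (B8) and red (B4) reflectance."""
    return (b8 - b4) / (b8 + b4 + 1e-12)

# X: one row per field, e.g. a flattened time series of raw bands B1-B12 and/or NDVI
# y: documented crop type (corn, winter wheat, barley, sugar beet, potato, rapeseed)
rf = RandomForestClassifier(n_estimators=500, random_state=0)
# rf.fit(X_train, y_train)
# accuracy = rf.score(X_test, y_test)
```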


2021 ◽  
Vol 11 ◽  
Author(s):  
Jianwen Hu ◽  
Yongchen Ma ◽  
Ju Ma ◽  
Yanpeng Yang ◽  
Yingze Ning ◽  
...  

A good prediction model is useful to accurately predict patient prognosis. Tumor–node–metastasis (TNM) staging often cannot accurately predict prognosis when used alone. Some researchers have shown that the infiltration of M2 macrophages in many tumors indicates poor prognosis. This approach has the potential to predict prognosis more accurately when used in combination with TNM staging, but there has been little research in gastric cancer. A multivariate analysis demonstrated that CD163 expression, TNM staging, age, and gender were independent risk factors for overall survival. Thus, these parameters were assessed to develop the nomogram in the training data set, which was tested in the validation and whole data sets. The model showed a high degree of discrimination, calibration, and good clinical benefit in the training, validation, and whole data sets. In conclusion, we combined CD163 expression in macrophages, TNM staging, age, and gender to develop a nomogram to predict 3- and 5-year overall survival after curative resection for gastric cancer. This model has the potential to provide further diagnostic and prognostic value for patients with gastric cancer.
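A minimal sketch of the multivariate survival model underlying such a nomogram, fitted here with the lifelines package; the column names, CSV layout, and encoding of the categorical variables are hypothetical, and the nomogram construction and calibration steps are not reproduced.

```python
# Multivariate Cox proportional hazards model over the predictors named in the
# abstract (CD163 expression, TNM stage, age, gender), assuming numeric encodings.
import pandas as pd
from lifelines import CoxPHFitter

# Assumed columns: time, event, cd163_expression, tnm_stage, age, gender
df = pd.read_csv("gastric_cohort.csv")

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")
cph.print_summary()                                    # hazard ratios for each predictor
# surv = cph.predict_survival_function(df.iloc[:5])    # 3- and 5-year survival read off these curves
```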

