SHOCK PHYSICS DATA RECONSTRUCTION USING SUPPORT VECTOR REGRESSION

2006 ◽  
Vol 17 (09) ◽  
pp. 1313-1325 ◽  
Author(s):  
NIKITA A. SAKHANENKO ◽  
GEORGE F. LUGER ◽  
HANNA E. MAKARUK ◽  
JOYSREE B. AUBREY ◽  
DAVID B. HOLTKAMP

This paper considers a set of shock physics experiments that investigate how materials respond to the extremes of deformation, pressure, and temperature when exposed to shock waves. Because of the complexity and cost of these experiments, the available experimental data sets are often very sparse. A support vector machine (SVM) technique for regression is used to estimate velocity measurements from the underlying experiments. Owing to its good generalization performance, the SVM method successfully interpolates the experimental data. Analysis of the resulting velocity surface provides further insight into the physical phenomena of the experiment. Additionally, the estimated data can be used to identify outlier data sets, as well as to deepen the understanding of the other data from the experiment.
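
A minimal sketch of the general approach described above, assuming scikit-learn's SVR; the input layout, kernel, and hyper-parameters are illustrative placeholders, not the authors' configuration or data.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# hypothetical sparse measurements: (time, position) -> velocity
X = np.array([[0.1, 0.0], [0.1, 1.0], [0.5, 0.0], [0.5, 1.0], [0.9, 0.5]])
y = np.array([120.0, 95.0, 310.0, 270.0, 480.0])

# SVR with an RBF kernel interpolates the sparse velocity data
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0, epsilon=1.0))
model.fit(X, y)

# evaluate the fitted model on a dense grid to reconstruct a velocity surface
t, s = np.meshgrid(np.linspace(0.1, 0.9, 50), np.linspace(0.0, 1.0, 50))
grid = np.column_stack([t.ravel(), s.ravel()])
surface = model.predict(grid).reshape(t.shape)
```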

2011 ◽  
Vol 219-220 ◽  
pp. 151-155 ◽  
Author(s):  
Hua Ji ◽  
Hua Xiang Zhang

In many real-world domains, learning from imbalanced data sets is a recurring challenge. Because a skewed class distribution leads traditional classifiers to much lower accuracy on rare classes, we propose a novel classification method that applies local clustering guided by the data distribution of the imbalanced data set. First, we divide the whole data set into several groups according to the data distribution. Then we perform local clustering within each group, on both the normal class and the disjoint rare class. For the rare class, subsequent over-sampling is applied at different rates. Finally, we apply support vector machines (SVMs) for classification, using the traditional cost-matrix tactic to further enhance classification accuracy. Experimental results on several UCI data sets show that this method produces much higher prediction accuracy on the rare class than state-of-the-art methods.
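
A minimal sketch of the general recipe outlined above (cluster the rare class locally, over-sample within each cluster, then train a cost-sensitive SVM), assuming scikit-learn; the data, cluster counts, and class weights are illustrative, not the authors' exact algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_major = rng.normal(0.0, 1.0, size=(200, 2))   # normal (majority) class
X_rare = rng.normal(3.0, 0.5, size=(20, 2))     # rare class

# local clustering within the rare class, then over-sampling inside each cluster
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_rare)
synthetic = []
for c in np.unique(labels):
    cluster = X_rare[labels == c]
    for _ in range(3 * len(cluster)):            # over-sampling rate per cluster
        a, b = cluster[rng.integers(len(cluster), size=2)]
        synthetic.append(a + rng.random() * (b - a))   # interpolate within the cluster
X_rare_aug = np.vstack([X_rare, synthetic])

X = np.vstack([X_major, X_rare_aug])
y = np.array([0] * len(X_major) + [1] * len(X_rare_aug))

# cost-sensitive SVM: a heavier penalty on rare-class errors plays the role of a cost matrix
clf = SVC(kernel="rbf", class_weight={0: 1.0, 1: 3.0}).fit(X, y)
```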


Polymers ◽  
2021 ◽  
Vol 13 (21) ◽  
pp. 3811
Author(s):  
Iosif Sorin Fazakas-Anca ◽  
Arina Modrea ◽  
Sorin Vlase

This paper proposes a new method for calculating the monomer reactivity ratios in binary copolymerization based on the terminal model. The original optimization method combines a numerical integration algorithm with an optimization algorithm based on k-nearest-neighbour non-parametric regression. The calculation method has been tested on simulated and experimental data sets at low (<10%), medium (10–35%) and high (>40%) conversions, yielding reactivity ratios in good agreement with the usual methods such as intersection, Fineman–Ross, reverse Fineman–Ross, Kelen–Tüdös, extended Kelen–Tüdös and the error-in-variables method. The experimental data sets used in this comparative analysis are the copolymerization of 2-(N-phthalimido)ethyl acrylate with 1-vinyl-2-pyrrolidone for low conversion, of isoprene with glycidyl methacrylate for medium conversion, and of N-isopropylacrylamide with N,N-dimethylacrylamide for high conversion. The possibility of estimating experimental errors from a single experimental data set of n points is also demonstrated.
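
For context, a minimal sketch of the classical Fineman–Ross estimate, one of the comparison methods named above: with feed ratios x = [M1]/[M2] and copolymer composition ratios y = d[M1]/d[M2], the linearization G = r1·F − r2 (G = x(y−1)/y, F = x²/y) gives r1 as the slope and −r2 as the intercept. The numbers are illustrative, not data from the paper.

```python
import numpy as np

x = np.array([0.25, 0.5, 1.0, 2.0, 4.0])      # monomer feed ratios [M1]/[M2]
y = np.array([0.40, 0.70, 1.10, 1.90, 3.40])  # copolymer composition ratios d[M1]/d[M2]

G = x * (y - 1.0) / y          # Fineman-Ross ordinate
F = x ** 2 / y                 # Fineman-Ross abscissa
slope, intercept = np.polyfit(F, G, 1)
r1, r2 = slope, -intercept     # from G = r1*F - r2
print(f"r1 = {r1:.3f}, r2 = {r2:.3f}")
```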


2017 ◽  
Author(s):  
Alexander P. Browning ◽  
Scott W. McCue ◽  
Rachelle N. Binny ◽  
Michael J. Plank ◽  
Esha T. Shah ◽  
...  

Collective cell spreading takes place in spatially continuous environments, yet it is often modelled using discrete lattice-based approaches. Here, we use data from a series of cell proliferation assays with a prostate cancer cell line to calibrate a spatially continuous individual-based model (IBM) of collective cell migration and proliferation. The IBM explicitly accounts for crowding effects by modifying the rate of movement, the direction of movement, and the rate of proliferation through pair-wise interactions. Taking a Bayesian approach, we estimate the free parameters in the IBM using rejection sampling on three separate, independent experimental data sets. Since the posterior distributions for each experiment are similar, we perform simulations with parameters sampled from a new posterior distribution generated by combining the three data sets. To explore the predictive power of the calibrated IBM, we forecast the evolution of a fourth experimental data set. Overall, we show how to calibrate a lattice-free IBM to experimental data, and our work highlights the importance of interactions between individuals. Despite great care taken to distribute cells as uniformly as possible experimentally, we find evidence of significant spatial clustering over short distances, suggesting that standard mean-field models could be inappropriate.
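
A minimal sketch of rejection-sampling-based calibration in the spirit described above: draw parameters from a prior, run the simulator, and keep draws whose summary statistic falls within a tolerance of the observed data. The simulator, summary statistic, prior ranges, and tolerance here are placeholders, not the authors' IBM.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(params):
    """Placeholder for the individual-based model; returns a summary statistic."""
    motility, proliferation = params
    return motility * 2.0 + proliferation * 10.0 + rng.normal(0.0, 0.1)

observed_summary = 5.0   # summary statistic computed from an experimental data set
tolerance = 0.2

accepted = []
for _ in range(10_000):
    theta = rng.uniform([0.0, 0.0], [5.0, 1.0])   # uniform prior on (motility, proliferation)
    if abs(simulate(theta) - observed_summary) <= tolerance:
        accepted.append(theta)

posterior_samples = np.array(accepted)             # approximate posterior draws
```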


2021 ◽  
Vol 87 (6) ◽  
pp. 445-455
Author(s):  
Yi Ma ◽  
Zezhong Zheng ◽  
Yutang Ma ◽  
Mingcang Zhu ◽  
Ran Huang ◽  
...  

Many manifold learning algorithms conduct an eigenvector analysis on a data-similarity matrix of size N×N, where N is the number of data points, so the memory complexity of the analysis is at least O(N²). We present in this article an incremental manifold learning approach for handling large hyperspectral data sets for land use identification. In our method, the number of dimensions for the high-dimensional hyperspectral-image data set is obtained from the training data set. A local curvature variation algorithm is used to sample a subset of data points as landmarks, and a manifold skeleton is then identified from the landmarks. Our method is validated on three AVIRIS hyperspectral data sets, outperforming the comparison algorithms with a k-nearest-neighbor classifier and achieving the second-best performance with a support vector machine.
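
A minimal sketch of the landmark idea described above: fit a manifold embedding on a small landmark subset and map the remaining points onto it, so the full N×N similarity matrix is never built. Scikit-learn's Isomap stands in for the manifold learner, and the landmarks are chosen at random here, whereas the paper selects them with a local curvature variation criterion.

```python
import numpy as np
from sklearn.manifold import Isomap

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 50))                 # stand-in for hyperspectral pixel vectors

# sample a small landmark subset and learn the manifold skeleton from it
n_landmarks = 500
landmark_idx = rng.choice(len(X), size=n_landmarks, replace=False)
embedder = Isomap(n_neighbors=10, n_components=3).fit(X[landmark_idx])

# out-of-sample extension: embed all pixels using the landmark skeleton
X_embedded = embedder.transform(X)
```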


2019 ◽  
Vol 47 (3) ◽  
pp. 154-170
Author(s):  
Janani Balakumar ◽  
S. Vijayarani Mohan

Purpose: Owing to the huge volume of documents available on the internet, text classification becomes a necessary task for handling them. To achieve optimal text classification results, feature selection, an important stage, is used to curtail the dimensionality of text documents by choosing suitable features. The main purpose of this research work is to classify personal computer documents based on their content.

Design/methodology/approach: This paper proposes a new feature selection algorithm based on the artificial bee colony (ABCFS) to enhance text classification accuracy. The proposed ABCFS algorithm is evaluated on real and benchmark data sets and compared against existing feature selection approaches such as information gain and the χ2 statistic. To demonstrate the efficiency of the proposed algorithm, the support vector machine (SVM) and an improved SVM classifier are used in this paper.

Findings: The experiments were conducted on real and benchmark data sets. The real data set consists of documents stored on a personal computer, and the benchmark data sets were drawn from the Reuters and 20 Newsgroups corpora. The results demonstrate that the proposed feature selection algorithm enhances text document classification accuracy.

Originality/value: This paper proposes the new ABCFS algorithm for feature selection, evaluates its efficiency and improves the support vector machine. Here the ABCFS algorithm selects features from unstructured text documents, whereas in existing work bee colony algorithms have been used to select features from structured data rather than text. The proposed algorithm classifies documents automatically based on their content.
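
A heavily simplified, wrapper-style sketch of bee-colony-flavoured feature selection with an SVM fitness function, assuming scikit-learn; a synthetic matrix stands in for the bag-of-words features, and the real ABCFS algorithm has distinct employed, onlooker, and scout bee phases that are not reproduced here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=500, n_features=200, n_informative=20, random_state=0)
rng = np.random.default_rng(0)

def fitness(mask):
    """Cross-validated SVM accuracy on the selected feature subset."""
    if not mask.any():
        return 0.0
    return cross_val_score(LinearSVC(), X[:, mask], y, cv=3).mean()

# food sources = candidate feature subsets; a neighbour flips a few bits
sources = [rng.random(X.shape[1]) < 0.1 for _ in range(10)]
scores = [fitness(m) for m in sources]
for _ in range(20):
    for i in range(len(sources)):
        neighbour = sources[i].copy()
        flip = rng.choice(X.shape[1], size=5, replace=False)
        neighbour[flip] = ~neighbour[flip]
        s = fitness(neighbour)
        if s > scores[i]:                       # greedy replacement, as in ABC
            sources[i], scores[i] = neighbour, s

best_mask = sources[int(np.argmax(scores))]     # selected features
```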


Author(s):  
Hongguang Pan ◽  
Tao Su ◽  
Xiangdong Huang ◽  
Zheng Wang

To address the high cost, complicated process and low accuracy of oxygen-content measurement in the flue gas of coal-fired power plants, a method based on a long short-term memory (LSTM) network is proposed in this paper to replace the oxygen sensor for estimating the oxygen content in boiler flue gas. Specifically, the LSTM model was first built with the Keras deep learning framework, and its accuracy was further improved by selecting appropriate hyper-parameters through experiments. Secondly, the flue-gas oxygen content was taken as the leading variable and combined with primary auxiliary variables derived from the process mechanism and boiler operation. Based on actual production data collected from a coal-fired power plant in Yulin, China, the data sets were preprocessed. Moreover, an auxiliary-variable selection model based on grey relational analysis is proposed to construct a new data set and divide it into training and testing sets. Finally, this model is compared with traditional soft-sensing modelling methods (i.e. methods based on support vector machines and BP neural networks). The RMSE of the LSTM model is 4.51% lower than that of the GA-SVM model and 3.55% lower than that of the PSO-BP model. The results show that the oxygen-content model based on LSTM generalizes better and has certain industrial value.
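
A minimal sketch of an LSTM soft sensor in Keras, in the spirit of the model described above; the layer sizes, window length, and variable names are illustrative assumptions, not the tuned hyper-parameters or data from the paper.

```python
import numpy as np
from tensorflow import keras

window, n_aux = 20, 8                        # time steps, auxiliary process variables
X = np.random.rand(1000, window, n_aux)      # stand-in for preprocessed plant data
y = np.random.rand(1000, 1)                  # flue-gas oxygen content (target)

model = keras.Sequential([
    keras.layers.Input(shape=(window, n_aux)),
    keras.layers.LSTM(64),                   # recurrent layer over the time window
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1),                   # regression output: oxygen content
])
model.compile(optimizer="adam", loss="mse",
              metrics=[keras.metrics.RootMeanSquaredError()])
model.fit(X, y, epochs=10, batch_size=32, validation_split=0.2, verbose=0)
```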


Sensors ◽  
2020 ◽  
Vol 20 (4) ◽  
pp. 1065 ◽  
Author(s):  
Huihui Zhang ◽  
Wenqing Shao ◽  
Shanshan Qiu ◽  
Jun Wang ◽  
Zhenbo Wei

Aroma and taste are the most important attributes of alcoholic beverages. In this study, a self-developed electronic tongue (e-tongue) and electronic nose (e-nose) were used to evaluate the marked ages of rice wines. Six types of feature data sets (e-tongue, e-nose, direct-fusion, weighted-fusion, optimized direct-fusion, and optimized weighted-fusion) were used to identify rice wines of different ages. Pearson coefficient analysis and variance inflation factor (VIF) analysis were used to optimize the fusion matrices by removing multicollinear information. Two discrimination methods, principal component analysis (PCA) and locality preserving projections (LPP), were used to classify the rice wines, and LPP performed better than PCA in the discrimination task. The best result was obtained by LPP on the weighted-fusion data set, where all samples could be classified clearly in the LPP plot. Therefore, the weighted-fusion data were used as independent variables for partial least squares regression, an extreme learning machine, and support vector machines (LIBSVM) to evaluate wine ages. All the methods performed well with good prediction results, and LIBSVM achieved the best correlation coefficient (R² ≥ 0.9998).
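
A minimal sketch of the multicollinearity screen described above: compute the variance inflation factor for each fused sensor feature and drop columns above a threshold before discrimination or regression. The sensor dimensions, data, and threshold are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
etongue = rng.normal(size=(60, 7))                   # stand-in e-tongue features
enose = rng.normal(size=(60, 10))                    # stand-in e-nose features
fused = pd.DataFrame(np.hstack([etongue, enose]))    # direct-fusion matrix

def drop_high_vif(df, threshold=10.0):
    """Iteratively remove the most collinear column until all VIFs fall below the threshold."""
    cols = list(df.columns)
    while True:
        vifs = [variance_inflation_factor(df[cols].values, i) for i in range(len(cols))]
        worst = int(np.argmax(vifs))
        if vifs[worst] < threshold or len(cols) <= 2:
            return df[cols]
        cols.pop(worst)

optimized_fusion = drop_high_vif(fused)              # optimized-fusion data set
```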


2017 ◽  
Vol 26 (2) ◽  
pp. 323-334 ◽  
Author(s):  
Piyabute Fuangkhon

Multiclass contour-preserving classification (MCOV) has been used to preserve the contour of a data set and improve the classification accuracy of a feed-forward neural network. It synthesizes two types of new instances, called fundamental multiclass outpost vectors (FMCOVs) and additional multiclass outpost vectors (AMCOVs), in the middle of the decision boundary between consecutive classes of data. This paper compares the generalization obtained by including FMCOVs, AMCOVs, or both types of MCOVs in the final training sets used with a support vector machine (SVM). The experiments were carried out using MATLAB R2015a and LIBSVM v3.20 on seven types of final training sets generated from each of the synthetic and real-world data sets from the University of California Irvine machine learning repository and the ELENA project. The experimental results confirm that including FMCOVs in final training sets containing raw data can improve the SVM classification accuracy significantly.
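
A minimal sketch of the comparison harness implied above: train the same SVM (scikit-learn's SVC wraps LIBSVM) on several variants of the training set and compare test accuracy. The outpost-vector synthesis itself (FMCOV/AMCOV) is specific to the paper and is represented here only by a commented placeholder.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=10, n_classes=3,
                           n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

training_sets = {
    "raw": (X_train, y_train),
    # "raw + FMCOV": augment_with_fmcov(X_train, y_train),   # hypothetical, not shown
}
for name, (Xt, yt) in training_sets.items():
    acc = SVC(kernel="rbf").fit(Xt, yt).score(X_test, y_test)
    print(f"{name}: accuracy = {acc:.3f}")
```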


2010 ◽  
Vol 75 (4) ◽  
pp. 483-495 ◽  
Author(s):  
Slavica Eric ◽  
Marko Kalinic ◽  
Aleksandar Popovic ◽  
Halid Makic ◽  
Elvisa Civic ◽  
...  

Aqueous solubility is an important factor influencing several aspects of the pharmacokinetic profile of a drug. Numerous publications present different methodologies for developing reliable computational models that predict solubility from structure. The quality of such models can be significantly affected by the accuracy of the experimental solubility data employed. In this work, the importance of the accuracy of the experimental solubility data used for model training was investigated. Three training sets were used: Data Set 1, containing solubility data collected from various literature sources according to a few criteria (n = 319); Data Set 2, created by substituting 28 values in Data Set 1 with experimental data determined uniformly in one laboratory (n = 319); and Data Set 3, created by adding to Data Set 2 a further 56 compounds whose solubility was also determined under uniform conditions in the same laboratory (n = 375). The most significant descriptors were selected by a heuristic method using one-parameter and multi-parameter analysis. The correlations between the most significant descriptors and solubility were established by multi-linear regression analysis (MLR) for all three investigated data sets. Notable differences were observed between the equations corresponding to the different data sets, suggesting that models updated with new experimental data need to be additionally optimized. It was shown that the inclusion of uniform experimental data consistently leads to an improvement in the correlation coefficients. These findings support an emerging consensus that improving the reliability of solubility prediction requires data sets of many diverse compounds whose solubility was measured under standardized conditions.
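
A minimal sketch of the modelling step described above: rank descriptors by their correlation with measured solubility and fit a multi-linear regression on the strongest ones. The heuristic one-/multi-parameter selection used in the paper is reduced here to a simple correlation ranking, and the descriptor matrix is synthetic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
descriptors = rng.normal(size=(319, 40))                  # stand-in molecular descriptors
logS = descriptors[:, :3] @ np.array([0.8, -0.5, 0.3]) + rng.normal(0, 0.2, 319)

# rank descriptors by absolute correlation with solubility, keep the strongest five
corr = np.array([abs(np.corrcoef(descriptors[:, j], logS)[0, 1]) for j in range(40)])
selected = np.argsort(corr)[-5:]

mlr = LinearRegression().fit(descriptors[:, selected], logS)
r2 = mlr.score(descriptors[:, selected], logS)            # squared correlation coefficient
```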


Author(s):  
Michael Marszalek ◽  
Maximilian Lösch ◽  
Marco Körner ◽  
Urs Schmidhalter

Crop type and field boundary mapping enable cost-efficient crop management at the field scale and serve as the basis for yield forecasts. Our study uses a data set of crop types and corresponding field borders from the federal state of Bavaria, Germany, as documented by farmers from 2016 to 2018. The study classified corn, winter wheat, barley, sugar beet, potato, and rapeseed as the main crops grown in Upper Bavaria. Corresponding Sentinel-2 data sets include the normalised difference vegetation index (NDVI) and raw band data from 2016 to 2018 for each selected field. The influences of clouds, raw bands, and NDVI on crop type classification are analysed, and the classification algorithms, i.e., support vector machine (SVM) and random forest (RF), are compared. Field boundary detection and extraction are based on non-iterative clustering and a newly developed procedure based on Canny edge detection. The results emphasise the value of Sentinel's raw bands (B1–B12) and RF, which outperforms SVM with an accuracy of up to 94%. Furthermore, we forecast data for an unknown year, which slightly reduces the classification accuracy. The results demonstrate the usefulness of the proof of concept and its readiness for use in real applications.
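
A minimal sketch of the classifier comparison described above: compute the NDVI from Sentinel-2 red (B4) and near-infrared (B8) reflectance and compare a random forest against an SVM with scikit-learn. The per-field time series are random placeholders, not the Bavarian data set.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_fields, n_dates = 400, 12
red = rng.uniform(0.02, 0.4, size=(n_fields, n_dates))   # B4 time series per field
nir = rng.uniform(0.2, 0.6, size=(n_fields, n_dates))    # B8 time series per field
ndvi = (nir - red) / (nir + red)                          # NDVI feature matrix
crop = rng.integers(0, 6, size=n_fields)                  # six crop classes

for name, clf in [("RF", RandomForestClassifier(n_estimators=200, random_state=0)),
                  ("SVM", SVC(kernel="rbf"))]:
    acc = cross_val_score(clf, ndvi, crop, cv=5).mean()
    print(f"{name}: mean accuracy = {acc:.3f}")
```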

