Determination of Reactivity Ratios from Binary Copolymerization Using the k-Nearest Neighbor Non-Parametric Regression

This paper proposes a new method for calculating the monomer reactivity ratios for binary copolymerization based on the terminal model. The original optimization method involves a numerical integration algorithm and an optimization algorithm based on k-nearest neighbour non-parametric regression. The calculation method has been tested on simulated and experimental data sets, at low (<10%), medium (10–35%) and high conversions (>40%), yielding reactivity ratios in a good agreement with the usual methods such as intersection, Fineman–Ross, reverse Fineman–Ross, Kelen–Tüdös, extended Kelen–Tüdös and the error in variable method. The experimental data sets used in this comparative analysis are copolymerization of 2-(N-phthalimido) ethyl acrylate with 1-vinyl-2-pyrolidone for low conversion, copolymerization of isoprene with glycidyl methacrylate for medium conversion and copolymerization of N-isopropylacrylamide with N,N-dimethylacrylamide for high conversion. Also, the possibility to estimate experimental errors from a single experimental data set formed by n experimental data is shown.

Download Full-text

Particle Swarm Optimization pada Analisa Review Software Antivirus Menggunakan Metode K-Nearest Neighbors

INFORMATICS FOR EDUCATORS AND PROFESSIONAL : Journal of Informatics ◽

10.51211/itbi.v4i2.1313 ◽

2020 ◽

Vol 4 (2) ◽

pp. 123

Author(s):

Sucitra Sahara ◽

Rizqi Agung Permana ◽

Hariyanto Hariyanto

Keyword(s):

Particle Swarm Optimization ◽

Text Mining ◽

Nearest Neighbor ◽

Particle Swarm ◽

Optimization Method ◽

Data Sets ◽

K Nearest Neighbor ◽

Process Data ◽

Data Set ◽

Swarm Optimization

Abstrak: Virus pada komputer menjadi hal yang membahayakan bagi para pengguna komputer perorangan maupun perusahaan yang telah menerapkan sistem terkomputerisasi. Virus program yang didesain untuk tujuan jahat dapat merusak bagian tertentu dari komputer, bahkan yang paling merugikan adalah dapat merusak data penting pada perusahaan. Dalam hal ini maka diciptakanlah sebuah software anti virus, perkembangan anti virus selalu lebih lambat dari virus itu sendiri, sehingga peneliti akan mengadakan penyeleksian software anti virus pada suatu opini atau berdasarkan komentar masyarakat yang telah menggunakan software anti virus produk tertentu dan dituangkan ke media online seperti komentar pada suatu situs penjualan produk tersebut. Berdasarkan ribuan komentar akan diolah dan dikelompokkan pada jenis kata teks positif dan teks negatif, dan peneliti membuat klasifikasi data dengan menggunakan metode algoritma k-Nearest Neighbor (k-NN), algoritma k-NN adalah salah satu algoritma yang sesuai dalam penelitian kali ini. Peneliti menemukan bahwa algoritma k-NN mampu mengolah data set yang sudah dikelompokan pada teks positif dan negatif khususnya dalam pemilihan teks, dan penerapan metode optimasi Particle Swarm Optimization (PSO) yang dikombinasikan dengan k-NN diharapkan mampu meningkatkan nilai akurasi sehingga datanya lebih kuat dan valid. Sebelum data set diolah menggukanan optimasi PSO hanya menggunakan metode k-NN akurasi data yang diperoleh 70,50% sedangkan hasil akurasi setelah penggunaan metode k-nn dan optimasi PSO didapatkan nilai akurasi sebesar 83,50%. Dapat disimpulkan bahwa penggunaan optimasi PSO dan metode k-NN sangat sesuai pada konsep text mining dan penyeksian pada data set berupa text. Kata kunci: Analisis Review, Optimasi Particle Swarm Optimization, Metode k-Nearest Neighbor. Abstract: Viruses on computers become dangerous for individual computer users and companies that have implemented computerized systems. Virus programs that are designed for malicious purposes can damage certain parts of the computer, even the most detrimental is that it can damage important data on the company. In this case an anti-virus software is created, the development of anti-virus is always slower than the virus itself, so researchers will conduct an anti-virus software selection on an opinion or based on public comments that have used a particular product's anti-virus software and poured it into online media such as comment on a product sales site. Of the thousands of comments will be processed and grouped on the type of positive and negative text words, and researchers make data classification using the k-Nearest Neighbor (k-NN) algorithm method, the k-NN algorithm is one of the appropriate algorithms in this study. The researcher found that the k-NN algorithm is able to process data sets that have been grouped in positive and negative texts, especially in text selection, and the application of the Particle Swarm Optimization (PSO) optimization method combined with k-NN is expected to be able to increase the accuracy value so that the data is stronger and valid. Before the data set is processed using PSO optimization only using the k-NN method the accuracy of the data obtained is 70.50% while the accuracy results after the use of the k-nn method and PSO optimization obtained an accuracy value of 83.50%. It can be concluded that the use of PSO optimization and the k-NN method are very compatible with the concept of text mining and correction of text data sets. Keywords: Analysis Review, k-Nearest Neighbor Method, Particle Swarm Optimization optimization

Download Full-text

An Incremental Isomap Method for Hyperspectral Dimensionality Reduction and Classification

Photogrammetric Engineering & Remote Sensing ◽

10.14358/pers.87.7.445 ◽

2021 ◽

Vol 87 (6) ◽

pp. 445-455

Author(s):

Yi Ma ◽

Zezhong Zheng ◽

Yutang Ma ◽

Mingcang Zhu ◽

Ran Huang ◽

...

Keyword(s):

Manifold Learning ◽

Nearest Neighbor ◽

Hyperspectral Image ◽

Hyperspectral Data ◽

Training Data ◽

Support Vector ◽

Data Sets ◽

K Nearest Neighbor ◽

Data Set ◽

Data Points

Many manifold learning algorithms conduct an eigen vector analysis on a data-similarity matrix with a size of N×N, where N is the number of data points. Thus, the memory complexity of the analysis is no less than O(N2). We pres- ent in this article an incremental manifold learning approach to handle large hyperspectral data sets for land use identification. In our method, the number of dimensions for the high-dimensional hyperspectral-image data set is obtained with the training data set. A local curvature varia- tion algorithm is utilized to sample a subset of data points as landmarks. Then a manifold skeleton is identified based on the landmarks. Our method is validated on three AVIRIS hyperspectral data sets, outperforming the comparison algorithms with a k–nearest-neighbor classifier and achieving the second best performance with support vector machine.

Download Full-text

Scalable Non-Parametric Methods for Large Data Sets

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch260 ◽

2011 ◽

pp. 1708-1713

Author(s):

V. Suresh Babu ◽

P. Viswanath ◽

Narasimha M. Murty

Keyword(s):

Nearest Neighbor ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Parametric Methods ◽

Clustering Method ◽

Data Set ◽

Computational Burden ◽

Set Size ◽

Non Parametric

Non-parametric methods like the nearest neighbor classifier (NNC) and the Parzen-Window based density estimation (Duda, Hart & Stork, 2000) are more general than parametric methods because they do not make any assumptions regarding the probability distribution form. Further, they show good performance in practice with large data sets. These methods, either explicitly or implicitly estimates the probability density at a given point in a feature space by counting the number of points that fall in a small region around the given point. Popular classifiers which use this approach are the NNC and its variants like the k-nearest neighbor classifier (k-NNC) (Duda, Hart & Stock, 2000). Whereas the DBSCAN is a popular density based clustering method (Han & Kamber, 2001) which uses this approach. These methods show good performance, especially with larger data sets. Asymptotic error rate of NNC is less than twice the Bayes error (Cover & Hart, 1967) and DBSCAN can find arbitrary shaped clusters along with noisy outlier detection (Ester, Kriegel & Xu, 1996). The most prominent difficulty in applying the non-parametric methods for large data sets is its computational burden. The space and classification time complexities of NNC and k-NNC are O(n) where n is the training set size. The time complexity of DBSCAN is O(n2). So, these methods are not scalable for large data sets. Some of the remedies to reduce this burden are as follows. (1) Reduce the training set size by some editing techniques in order to eliminate some of the training patterns which are redundant in some sense (Dasarathy, 1991). For example, the condensed NNC (Hart, 1968) is of this type. (2) Use only a few selected prototypes from the data set. For example, Leaders-subleaders method and l-DBSCAN method are of this type (Vijaya, Murthy & Subramanian, 2004 and Viswanath & Rajwala, 2006). These two remedies can reduce the computational burden, but this can also result in a poor performance of the method. Using enriched prototypes can improve the performance as done in (Asharaf & Murthy, 2003) where the prototypes are derived using adaptive rough fuzzy set theory and as in (Suresh Babu & Viswanath, 2007) where the prototypes are used along with their relative weights. Using a few selected prototypes can reduce the computational burden. Prototypes can be derived by employing a clustering method like the leaders method (Spath, 1980), the k-means method (Jain, Dubes, & Chen, 1987), etc., which can find a partition of the data set where each block (cluster) of the partition is represented by a prototype called leader, centroid, etc. But these prototypes can not be used to estimate the probability density, since the density information present in the data set is lost while deriving the prototypes. The chapter proposes to use a modified leader clustering method called the counted-leader method which along with deriving the leaders preserves the crucial density information in the form of a count which can be used in estimating the densities. The chapter presents a fast and efficient nearest prototype based classifier called the counted k-nearest leader classifier (ck-NLC) which is on-par with the conventional k-NNC, but is considerably faster than the k-NNC. The chapter also presents a density based clustering method called l-DBSCAN which is shown to be a faster and scalable version of DBSCAN (Viswanath & Rajwala, 2006). Formally, under some assumptions, it is shown that the number of leaders is upper-bounded by a constant which is independent of the data set size and the distribution from which the data set is drawn.

Download Full-text

A Research Travelogue on Classification Algorithms using R Programming

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d9014.118419 ◽

2019 ◽

Vol 8 (4) ◽

pp. 9155-9158

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Statistical Tests ◽

Learning Task ◽

Data Sets ◽

K Nearest Neighbor ◽

Data Set ◽

Domain Experts ◽

R Programming ◽

Training Examples

Classification is a machine learning task which consists in predicting the set association of unclassified examples, whose label is not known, by the properties of examples in a representation learned earlier as of training examples, that label was known. Classification tasks contain a huge assortment of domains and real world purpose: disciplines such as medical diagnosis, bioinformatics, financial engineering and image recognition between others, where domain experts can use the model erudite to sustain their decisions. All the Classification Approaches proposed in this paper were evaluate in an appropriate experimental framework in R Programming Language and the major emphasis is on k-nearest neighbor method which supports vector machines and decision trees over large number of data sets with varied dimensionality and by comparing their performance against other state-of-the-art methods. In this process the experimental results obtained have been verified by statistical tests which support the better performance of the methods. In this paper we have survey various classification techniques of Data Mining and then compared them by using diverse datasets from “University of California: Irvine (UCI) Machine Learning Repository” for acquiring the accurate calculations on Iris Data set.

Download Full-text

Implementing Machine Learning Algorithms on Finite Element Analyses Data Sets for Selecting Proper Cellular Structure

International Journal of Applied Mechanics ◽

10.1142/s1758825121500721 ◽

2021 ◽

Author(s):

Mahziyar Darvishi ◽

Omid Ziaee ◽

Arash Rahmati ◽

Mohammad Silani

Keyword(s):

Machine Learning ◽

Finite Element ◽

Nearest Neighbor ◽

Machine Learning Algorithms ◽

Cellular Structures ◽

Data Sets ◽

K Nearest Neighbor ◽

Element Analysis ◽

Data Set ◽

Efficient Alternative

Numerous structure geometries are available for cellular structures, and selecting the suitable structure that reflects the intended characteristics is cumbersome. While testing many specimens for determining the mechanical properties of these materials could be time-consuming and expensive, finite element analysis (FEA) is considered an efficient alternative. In this study, we present a method to find the suitable geometry for the intended mechanical characteristics by implementing machine learning (ML) algorithms on FEA results of cellular structures. Different cellular structures of a given material are analyzed by FEA, and the results are validated with their corresponding analytical equations. The validated results are employed to create a data set used in the ML algorithms. Finally, by comparing the results with the correct answers, the most accurate algorithm is identified for the intended application. In our case study, the cellular structures are three widely used cellular structures as bone implants: Cube, Kelvin, and Rhombic dodecahedron, made of Ti–6Al–4V. The ML algorithms are simple Bayesian classification, K-nearest neighbor, XGBoost, random forest, and artificial neural network. By comparing the results of these algorithms, the best-performing algorithm is identified.

Download Full-text

Clustering Based on a Novel Density Estimation Method

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.748.590 ◽

2013 ◽

Vol 748 ◽

pp. 590-594

Author(s):

Li Liao ◽

Yong Gang Lu ◽

Xu Rong Chen

Keyword(s):

Density Estimation ◽

Nearest Neighbor ◽

Mean Shift ◽

Estimation Method ◽

Synthetic Data ◽

Real Data ◽

Data Sets ◽

Clustering Methods ◽

K Nearest Neighbor ◽

Data Set

We propose a novel density estimation method using both the k-nearest neighbor (KNN) graph and the potential field of the data points to capture the local and global data distribution information respectively. The clustering is performed based on the computed density values. A forest of trees is built using each data point as the tree node. And the clusters are formed according to the trees in the forest. The new clustering method is evaluated by comparing with three popular clustering methods, K-means++, Mean Shift and DBSCAN. Experiments on two synthetic data sets and one real data set show that our approach can effectively improve the clustering results.

Download Full-text

A Short-Term Traffic Flow Forecasting Method Based on a Three-Layer K-Nearest Neighbor Non-Parametric Regression Algorithm

Journal of Transportation Technologies ◽

10.4236/jtts.2016.64020 ◽

2016 ◽

Vol 06 (04) ◽

pp. 200-206 ◽

Cited By ~ 6

Author(s):

Xiyu Pang ◽

Cheng Wang ◽

Guolin Huang

Keyword(s):

Traffic Flow ◽

Nearest Neighbor ◽

K Nearest Neighbor ◽

Short Term ◽

Parametric Regression ◽

Traffic Flow Forecasting ◽

Forecasting Method ◽

Non Parametric

Download Full-text

K-NN Method for Review Analys Product Accounting Software

Jurnal Ilmiah Informatika ◽

10.35316/jimi.v5i2.948 ◽

2020 ◽

Vol 5 (2) ◽

pp. 85-92

Author(s):

Sucitra Sahara ◽

Rizqi Agung Permana

Keyword(s):

Financial Management ◽

Nearest Neighbor ◽

Data Sets ◽

K Nearest Neighbor ◽

Data Set ◽

The Public ◽

Software Products ◽

Software Product ◽

Accounting Software ◽

Selection Stage

Many companies have not implemented accounting software in financial management. Even though the current era of technology is increasingly updated and developing, more and more superior products are being issued by software development companies, especially in accounting software. There are not a few software products whose quality is still below standard or incomplete with features and facilities. So that researchers concentrate on companies or individual businesses that still use manual methods in processing their finances by helping and making it easier to choose the software product they will choose. Researchers first carry out the accounting software product selection stage based on an opinion or opinion of the public who have bought and used the software they choose and they pour this opinion into online media such as comments on a product selling site. Thousands of comments will be processed and grouped into data sets and this time the researcher processes the data classification using the k-Nearest Neighbor (K-NN) algorithm. By using the K-NN method, it is expected to be able to produce the expected accuracy value so that the data set processing is stronger and more valid. It turns out that after applying the data accuracy value obtained by 80.50%, it can be concluded that the K-NN method is very suitable for the concept of text mining this time and for selecting the data set in the form of text.

Download Full-text

Autonomous Fingerprinting and Large Experimental Data Set for Visible Light Positioning

Sensors ◽

10.3390/s21093256 ◽

2021 ◽

Vol 21 (9) ◽

pp. 3256

Author(s):

Tyrel Glass ◽

Fakhrul Alam ◽

Mathew Legg ◽

Frazer Noble

Keyword(s):

Experimental Data ◽

Visible Light ◽

Nearest Neighbor ◽

Channel Model ◽

Ground Truth ◽

Small Data ◽

Large Set ◽

K Nearest Neighbor ◽

Data Set ◽

Model Based

This paper presents an autonomous method of collecting data for Visible Light Positioning (VLP) and a comprehensive investigation of VLP using a large set of experimental data. Received Signal Strength (RSS) data are efficiently collected using a novel method that utilizes consumer grade Virtual Reality (VR) tracking for accurate ground truth recording. An investigation into the accuracy of the ground truth system showed median and 90th percentile errors of 4.24 and 7.35 mm, respectively. Co-locating a VR tracker with a photodiode-equipped VLP receiver on a mobile robotic platform allows fingerprinting on a scale and accuracy that has not been possible with traditional manual collection methods. RSS data at 7344 locations within a 6.3 × 6.9 m test space fitted with 11 VLP luminaires is collected and has been made available for researchers. The quality and the volume of the data allow for a robust study of Machine Learning (ML)- and channel model-based positioning utilizing visible light. Among the ML-based techniques, ridge regression is found to be the most accurate, outperforming Weighted k Nearest Neighbor, Multilayer Perceptron, and random forest, among others. Model-based positioning is more accurate than ML techniques when a small data set is available for calibration and training. However, if a large data set is available for training, ML-based positioning outperforms its model-based counterparts in terms of localization accuracy.

Download Full-text

Learning from Imbalanced Multi-label Data Sets by Using Ensemble Strategies

Computer Engineering and Applications Journal ◽

10.18495/comengapp.v4i1.109 ◽

2015 ◽

Vol 4 (1) ◽

pp. 61-81

Author(s):

Mohammad Masoud Javidi

Keyword(s):

Logistic Regression ◽

Ensemble Learning ◽

Nearest Neighbor ◽

Imbalanced Data ◽

Classification Performance ◽

Training Data ◽

Data Sets ◽

K Nearest Neighbor ◽

Data Set ◽

Stable Algorithm

Multi-label classification is an extension of conventional classification in which a single instance can be associated with multiple labels. Problems of this type are ubiquitous in everyday life. Such as, a movie can be categorized as action, crime, and thriller. Most algorithms on multi-label classification learning are designed for balanced data and donâ€™t work well on imbalanced data. On the other hand, in real applications, most datasets are imbalanced. Therefore, we focused to improve multi-label classification performance on imbalanced datasets. In this paper, a state-of-the-art multi-label classification algorithm, which called IBLR_ML, is employed. This algorithm is produced from combination of k-nearest neighbor and logistic regression algorithms. Logistic regression part of this algorithm is combined with two ensemble learning algorithms, Bagging and Boosting. My approach is called IB-ELR. In this paper, for the first time, the ensemble bagging method whit stable learning as the base learner and imbalanced data sets as the training data is examined. Finally, to evaluate the proposed methods; they are implemented in JAVA language. Experimental results show the effectiveness of proposed methods. Keywords: Multi-label classification, Imbalanced data set, Ensemble learning, Stable algorithm, Logistic regression, Bagging, Boosting

Download Full-text