A data-driven methodology for the classification of different liquids in artificial taste recognition applications with a pulse voltammetric electronic tongue

Electronic tongue-type sensor arrays are devices that assess the quality of substances by imitating the main components of the human sense of taste. To do so, an electronic tongue-based system combines sensors, data acquisition systems, and a pattern recognition system. In the latter, machine learning techniques are useful for data analysis and have been used to solve classification and regression problems. However, one challenge with this kind of device is developing reliable pattern recognition algorithms and robust data analysis. This work introduces a taste recognition methodology composed of several steps: data unfolding, data normalization, principal component analysis to compress the data, and classification with different machine learning models. The methodology is tested on data from an electronic tongue, based on multifrequency large amplitude pulse signal voltammetry, covering 13 different liquid substances. Results show that the methodology classifies the liquids accurately and that the K-nearest neighbor classifier gives the best accuracy among the machine learning approaches compared. In addition, the methodology is evaluated with several classification performance measures, each of which summarizes the behavior of the process in a single number.
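
A minimal sketch of the pipeline described above, assuming each measurement arrives as a small sensors-by-time-samples matrix. The function names, liquid labels, and toy values are hypothetical, and the PCA compression step is left out to keep the example dependency-free, so this illustrates only the unfolding, normalization, and K-nearest neighbor stages rather than the authors' implementation.

```python
import math
from collections import Counter

def unfold(measurement):
    """Flatten one (sensors x time samples) measurement into a single feature row."""
    return [value for sensor_signal in measurement for value in sensor_signal]

def zscore_fit(rows):
    """Per-column mean and standard deviation, estimated on the training rows only."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    stds = []
    for col, mu in zip(columns, means):
        var = sum((v - mu) ** 2 for v in col) / n
        stds.append(math.sqrt(var) or 1.0)  # guard against constant columns
    return means, stds

def zscore_apply(row, means, stds):
    """Normalize one row with previously fitted statistics."""
    return [(v - mu) / sd for v, mu, sd in zip(row, means, stds)]

def knn_predict(train_rows, train_labels, query, k=3):
    """Majority vote among the k training rows closest in Euclidean distance."""
    nearest = sorted(
        (math.dist(row, query), label) for row, label in zip(train_rows, train_labels)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Toy data (assumption): 2 sensors x 3 samples per measurement, two liquids.
train = [
    ([[0.1, 0.2, 0.1], [1.0, 1.1, 0.9]], "water"),
    ([[0.2, 0.1, 0.2], [0.9, 1.0, 1.1]], "water"),
    ([[2.0, 2.1, 1.9], [0.1, 0.0, 0.2]], "juice"),
    ([[1.9, 2.2, 2.0], [0.2, 0.1, 0.0]], "juice"),
]
X = [unfold(m) for m, _ in train]
y = [label for _, label in train]
means, stds = zscore_fit(X)
X_norm = [zscore_apply(row, means, stds) for row in X]

query = zscore_apply(unfold([[2.1, 2.0, 2.0], [0.1, 0.1, 0.1]]), means, stds)
prediction = knn_predict(X_norm, y, query, k=3)
```

Fitting the normalization statistics on the training rows and reusing them for the query keeps information from unseen samples out of the model. In a fuller version, a PCA projection would sit between normalization and classification.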

A Survey On Missing Data in Machine Learning

10.21203/rs.3.rs-535520/v1 ◽

2021 ◽

Author(s):

Tlamelo Emmanuel ◽

Thabiso Maupong ◽

Dimane Mpoeleng ◽

Thabo Semong ◽

Mphago Banyatsang ◽

...

Keyword(s):

Machine Learning ◽

Missing Data ◽

Human Error ◽

Missing Values ◽

Nearest Neighbor ◽

Research Direction ◽

Machine Learning Techniques ◽

Future Research ◽

Learning Approaches ◽

K Nearest Neighbor

Abstract: Machine learning has been the cornerstone of analysing and extracting information from data, and the problem of missing values is often encountered. Missing values arise under different mechanisms: missing completely at random, missing at random, or missing not at random. All of these may result from system malfunction during data collection or human error during data pre-processing. Nevertheless, it is important to deal with missing values before analysing data, since ignoring or omitting them may result in biased or misinformed analysis. The literature contains several proposals for handling missing values. In this paper we aggregate some of the literature on missing data, focusing particularly on machine learning techniques. We also give insight into how the machine learning approaches work by highlighting the key features of the proposed techniques, how they perform, their limitations, and the kind of data they are most suitable for. Finally, we experiment with the K nearest neighbor and random forest imputation techniques on novel power plant induced fan data and offer some possible future research directions.

A survey on missing data in machine learning

Journal Of Big Data ◽

10.1186/s40537-021-00516-9 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Tlamelo Emmanuel ◽

Thabiso Maupong ◽

Dimane Mpoeleng ◽

Thabo Semong ◽

Banyatsang Mphago ◽

...

Keyword(s):

Machine Learning ◽

Missing Data ◽

Human Error ◽

Missing Values ◽

Nearest Neighbor ◽

Research Direction ◽

Machine Learning Techniques ◽

Future Research ◽

Learning Approaches ◽

K Nearest Neighbor

Abstract: Machine learning has been the cornerstone of analysing and extracting information from data, and the problem of missing values is often encountered. Missing values arise under different mechanisms: missing completely at random, missing at random, or missing not at random. All of these may result from system malfunction during data collection or human error during data pre-processing. Nevertheless, it is important to deal with missing values before analysing data, since ignoring or omitting them may result in biased or misinformed analysis. The literature contains several proposals for handling missing values. In this paper, we aggregate some of the literature on missing data, focusing particularly on machine learning techniques. We also give insight into how the machine learning approaches work by highlighting the key features of missing value imputation techniques, how they perform, their limitations, and the kind of data they are most suitable for. We propose and evaluate two methods: the k nearest neighbor and an iterative imputation method (missForest) based on the random forest algorithm. Evaluation is performed on the Iris data and on novel power plant fan data, with missing values induced at missingness rates of 5% to 20%. We show that both missForest and the k nearest neighbor can successfully handle missing values, and we offer some possible future research directions.
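
As a rough illustration of the k nearest neighbor imputation idea evaluated above, the sketch below induces missing values completely at random and fills each gap with the mean of that feature over the closest rows. The function names, the distance (root mean squared difference over features observed in both rows), and the toy data are assumptions; missForest is not shown.

```python
import math
import random

def induce_missing(rows, rate, seed=0):
    """Return a copy of rows with roughly `rate` of the cells set to None (MCAR)."""
    rng = random.Random(seed)
    return [[None if rng.random() < rate else v for v in row] for row in rows]

def shared_distance(a, b):
    """Root mean squared difference over features observed in both rows.

    Returns None when the two rows have no observed feature in common.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

def knn_impute(rows, k=2):
    """Fill each missing cell with the mean of that feature over the k nearest donors."""
    filled = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = []
            for other_index, other in enumerate(rows):
                if other_index == i or other[j] is None:
                    continue  # a donor must be a different row that observes feature j
                d = shared_distance(row, other)
                if d is not None:
                    donors.append((d, other[j]))
            donors.sort()
            nearest = [v for _, v in donors[:k]]
            if nearest:  # leave the cell as None if no donor exists
                filled[i][j] = sum(nearest) / len(nearest)
    return filled

# Toy data (assumption): the last row is missing its second feature.
data = [[1.0, 2.0], [1.1, 2.2], [5.0, 10.0], [5.2, None]]
imputed = knn_impute(data, k=1)

# Inducing missingness at one of the rates in the evaluated 5% to 20% range.
masked = induce_missing([[1.0, 2.0], [3.0, 4.0]], rate=0.2, seed=0)
```

To evaluate an imputer in the way the abstract describes, one would mask known values with `induce_missing`, impute them, and compare the filled cells against the originals.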

Application of Machine Learning Approaches for the Design and Study of Anticancer Drugs

Current Drug Targets ◽

10.2174/1389450119666180809122244 ◽

2019 ◽

Vol 20 (5) ◽

pp. 488-500 ◽

Cited By ~ 6

Author(s):

Yan Hu ◽

Yi Lu ◽

Shuo Wang ◽

Mengying Zhang ◽

Xiaosheng Qu ◽

...

Keyword(s):

Machine Learning ◽

Drug Design ◽

Anticancer Drugs ◽

Nearest Neighbor ◽

Cost Effective ◽

Support Vector ◽

Learning Approaches ◽

K Nearest Neighbor ◽

Activity Prediction ◽

Linear Discriminant

Background: Globally, the numbers of cancer patients and cancer deaths continue to increase every year, and cancer has therefore become one of the world's leading causes of morbidity and mortality. In recent years, the study of anticancer drugs has become one of the most popular medical topics. Objective: To study the application of machine learning in predicting anticancer drug activity, this review selects several machine learning approaches, namely Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA), Support Vector Machine (SVM), Random Forest (RF), k-Nearest Neighbor (kNN), and Naïve Bayes (NB), and lists examples of their application in anticancer drug design. Results: Machine learning contributes substantially to anticancer drug design and helps researchers by saving time and cost. However, it can only be an assisting tool for drug design. Conclusion: This paper introduces the application of machine learning approaches in anticancer drug design. Many successful examples of identification and prediction in anticancer drug activity prediction are discussed, and anticancer drug research is still in active progress. Moreover, the merits of some web servers related to anticancer drugs are mentioned.

Efficient detection of hacker community based on twitter data using complex networks and machine learning algorithm

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-210458 ◽

2021 ◽

pp. 1-17

Author(s):

Ahmed Al-Tarawneh ◽

Ja’afer Al-Saraireh

Keyword(s):

Machine Learning ◽

Complex Networks ◽

Nearest Neighbor ◽

Learning Algorithm ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

K Nearest Neighbor ◽

Efficient Detection ◽

Suggested Keywords

Twitter is one of the most popular platforms used to share and post ideas. Hackers and anonymous attackers use such platforms maliciously, and their behavior can be used to predict the risk of future attacks by gathering and classifying hackers’ tweets using machine learning techniques. Previous approaches for detecting infected tweets are based on human effort or text analysis, so they are limited in their ability to capture the hidden text between tweet lines. The main aim of this research paper is to enhance the efficiency of hacker detection on the Twitter platform using the complex networks technique with adapted machine learning algorithms. This work presents a methodology that collects a list of users, together with the followers who share their posts and have similar interests, from a hackers’ community on Twitter. The list is built from a set of suggested keywords that are commonly used by hackers in their tweets. After that, a complex network is generated for all users to find relations among them in terms of network centrality, closeness, and betweenness. After extracting these values, a dataset of the most influential users in the hacker community is assembled. Subsequently, tweets belonging to users in the extracted dataset are gathered and classified into positive and negative classes. The output of this process is fed into a machine learning process that applies different algorithms. This research builds and investigates an accurate dataset containing real users who belong to a hackers’ community. Correctly classified instances were measured for accuracy using the average values of the K-nearest neighbor, Naive Bayes, Random Tree, and support vector machine techniques, demonstrating about 90% and 88% accuracy for cross-validation and percentage split, respectively. Consequently, the proposed network cyber Twitter model is able to detect hackers and to determine whether tweets pose a future risk to institutions and individuals, providing early warning of possible attacks.
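
The network step can be pictured with a small sketch that computes two of the measures mentioned above, degree centrality and closeness centrality, on an undirected graph. The graph, the user names, and the choice to treat follower links as undirected are assumptions made for illustration; betweenness is omitted for brevity, and this is not the authors' implementation.

```python
from collections import deque

def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(graph)
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}

def closeness_centrality(graph):
    """Reachable-node count divided by the sum of shortest-path lengths (via BFS)."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        total = sum(dist.values())
        # Isolated nodes reach nobody, so their closeness is defined as 0 here.
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result

# Hypothetical follower graph: each key lists the users it is linked to.
graph = {
    "user_a": {"user_b", "user_c", "user_d"},
    "user_b": {"user_a"},
    "user_c": {"user_a", "user_d"},
    "user_d": {"user_a", "user_c"},
}
degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
most_influential = max(degree, key=degree.get)
```

Ranking users by such scores is one plausible way to assemble a list of the most influential accounts before their tweets are collected and classified.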

High-Speed and Accurate Meat Composition Imaging by Mechanically-Flexible Electrical Impedance Tomography With k-Nearest Neighbor and Fuzzy k-Means Machine Learning Approaches

IEEE Access ◽

10.1109/access.2021.3064315 ◽

2021 ◽

Vol 9 ◽

pp. 38792-38801

Author(s):

P. N. Darma ◽

M. Takei

Keyword(s):

Machine Learning ◽

Electrical Impedance Tomography ◽

High Speed ◽

Electrical Impedance ◽

Nearest Neighbor ◽

Learning Approaches ◽

K Nearest Neighbor ◽

Impedance Tomography ◽

Meat Composition

Android Malware Detection using Machine Learning

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1011.0982s1219 ◽

2020 ◽

Vol 8 (2S12) ◽

pp. 65-70

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Machine Learning Algorithms ◽

Training Data ◽

Machine Learning Techniques ◽

Support Vector ◽

K Nearest Neighbor ◽

User Interest ◽

Android Malware ◽

Android Malware Detection

Machine learning is empowering many aspects of day-to-day life, from filtering content on social networks to suggesting products that we may be looking for. This technology focuses on taking objects as image input to find new observations or show items based on user interest. The main discussion here concerns supervised machine learning techniques, in which the computer learns from input (training) data and predicts results based on that experience. We also discuss the machine learning algorithms Naïve Bayes Classifier, K-Nearest Neighbor, Random Forest, Decision Trees, Boosted Trees, and Support Vector Machine, and apply these classifiers to the Malgenome and Drebin Android malware datasets. Android is an operating system that is gaining popularity, and with the rising demand for these devices comes a rise in Android malware. The traditional techniques used to detect malware were unable to detect unknown applications. We ran these datasets on different machine learning classifiers and recorded the results. The experimental results provide a comparative analysis based on performance, accuracy, and cost.

Improving k-Nearest Neighbor Pattern Recognition Models for Privacy-Preserving Data Analysis

2019 IEEE International Conference on Big Data (Big Data) ◽

10.1109/bigdata47090.2019.9006281 ◽

2019 ◽

Author(s):

Walisa Romsaiyud ◽

Henning Schnoor ◽

Wilhelm Hasselbring

Keyword(s):

Pattern Recognition ◽

Data Analysis ◽

Nearest Neighbor ◽

Privacy Preserving ◽

K Nearest Neighbor

A Study on the Psychological Analysis System Using Machine Learning

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i3.33.18591 ◽

2018 ◽

Vol 7 (3.33) ◽

pp. 128

Author(s):

Ki Young Lee ◽

Kyu Ho Kim ◽

Jeong Jin Kang ◽

Sung Jai Choi ◽

Yong Soon Im ◽

...

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Smart Phone ◽

Principal Component ◽

Digital Data ◽

Expression Recognition ◽

K Nearest Neighbor ◽

Linear Discriminant ◽

Psychological Analysis ◽

Analysis System

Real-time facial expression recognition and analysis technology has recently been drawing attention in the areas of computer vision, computer graphics, and HCI. Recognition of a user’s emotion on the basis of video and voice is drawing particular interest. The technology may help managers of households or hospitals. In the present study, video and voice were converted into digital data in MATLAB, and the PCA (Principal Component Analysis), LDA (Linear Discriminant Analysis), and KNN (K Nearest Neighbor) algorithms were used to analyze emotions through machine learning. The manager of the psychological analysis counseling system may understand a user’s emotion in a smartphone environment. The system may help the manager to have a smooth conversation or develop a smooth relationship with a user on the basis of the psychological analysis results it provides.

PREDICTION OF CORONARY ARTERY DISEASE BASED ON ENSEMBLE LEARNING APPROACHES AND CO-EXPRESSED OBSERVATIONS

Journal of Mechanics in Medicine and Biology ◽

10.1142/s0219519416400108 ◽

2016 ◽

Vol 16 (01) ◽

pp. 1640010 ◽

Cited By ~ 3

Author(s):

YING-TSANG LO ◽

HAMIDO FUJITA ◽

TUN-WEN PAI

Keyword(s):

Machine Learning ◽

Coronary Artery Disease ◽

Coronary Artery ◽

Nearest Neighbor ◽

Prediction Method ◽

Medical Decision ◽

Learning Approaches ◽

K Nearest Neighbor ◽

Artery Disease ◽

Voting Mechanism

Background: Coronary artery disease (CAD) is one of the most representative cardiovascular diseases. Early and accurate prediction of CAD based on physiological measurements can reduce the risk of heart attack through drug therapy, a healthy diet, and regular physical activity. Methods: Four heart disease datasets from the UC Irvine Machine Learning Repository were combined and re-examined to remove incomplete entries, and a total of 822 cases were used in this study. Seven machine learning methods, namely Naïve Bayes, artificial neural networks (ANNs), sequential minimal optimization (SMO), k-nearest neighbor (KNN), AdaBoost, J48, and random forest, were adopted to analyze the collected datasets for CAD prediction. By combining co-expressed observations and an ensemble voting mechanism, we designed and evaluated a new medical decision classifier for CAD prediction. The TOPSIS (Technique for Order Preference by Similarity to an Ideal Solution) algorithm was applied to determine the best prediction method for CAD diagnosis. Results: Systolic blood pressure, cholesterol, heart rate, and ST depression are considered to show the most significant differences between patients with and without CAD. We show that the prediction capability of the seven machine learning classifiers can be enhanced by integrating combinations of observed co-expressed features. Finally, compared to the use of any single classifier, the proposed voting mechanism achieved optimal performance according to TOPSIS.
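
The ensemble voting idea can be sketched as a plain majority vote over the outputs of several classifiers. Everything specific below is an assumption for illustration: the three rule-based stand-ins replace trained models, their thresholds are arbitrary and not clinical guidance, and the tie-breaking rule is a choice of this sketch. The co-expressed feature combinations and the TOPSIS ranking from the study are not reproduced.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most frequent label; ties go to the label that appears first."""
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label

# Hypothetical stand-ins for trained classifiers (thresholds are arbitrary).
def by_blood_pressure(patient):
    return "CAD" if patient["systolic_bp"] > 140 else "no CAD"

def by_cholesterol(patient):
    return "CAD" if patient["cholesterol"] > 240 else "no CAD"

def by_st_depression(patient):
    return "CAD" if patient["st_depression"] > 1.0 else "no CAD"

CLASSIFIERS = [by_blood_pressure, by_cholesterol, by_st_depression]

def ensemble_predict(patient):
    """Collect one vote per classifier and return the majority label."""
    return majority_vote([classifier(patient) for classifier in CLASSIFIERS])

# Toy record (assumption): two of the three rules vote for CAD.
patient = {"systolic_bp": 150, "cholesterol": 260, "st_depression": 0.5}
decision = ensemble_predict(patient)
```

Using an odd number of voters avoids ties in a two-class problem, which is why three stand-ins are used here.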

Application of Fuzzy K-Nearest Neighbor (FKNN) to Detect the Parkinson’s Disease

InPrime: Indonesian Journal of Pure and Applied Mathematics ◽

10.15408/inprime.v1i1.12827 ◽

2019 ◽

Vol 1 (1) ◽

Author(s):

L.N. Desinaini ◽

Azizatul Mualimah ◽

Dian C. R. Novitasari ◽

Moh. Hafiyusholeh

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Parkinson's Disease ◽

Nearest Neighbor ◽

Principal Component ◽

Component Analysis ◽

Training Data ◽

K Nearest Neighbor ◽

Positive Data

Abstract: Parkinson’s disease is a neurological disorder in which there is a gradual loss of the brain cells that make and store dopamine. Researchers estimate that four to six million people worldwide are living with Parkinson’s. The average age of patients is 60 years, but some are diagnosed at age 40 or even younger, and, worse, some patients find out late that they have the disease. In this paper, we present a diagnosis system based on Fuzzy K-Nearest Neighbor (FKNN) to detect Parkinson’s disease. We use the Parkinson’s disease dataset from the UCI Machine Learning Repository, which has been widely applied to classification problems. The first step is to normalize the dataset and analyze it using Principal Component Analysis (PCA). The result shows that four new factors influence Parkinson’s disease, with a total variance of 85.719% (the Indonesian version of the abstract gives 87.719%). In the classification step, we use several percentages of training data to classify (detect) Parkinson’s disease, namely 50%, 60%, 70%, 75%, 80%, and 90%, with k = 3, 5, 7, and 9. The classification results show that the highest accuracy is obtained with 90% training data and k = 5, where 19 samples are correctly classified (14 positive and 5 negative), while 1 positive sample is classified incorrectly.

Keywords: Parkinson’s disease; Fuzzy K-Nearest Neighbor; Principal Component Analysis.
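
For readers unfamiliar with fuzzy K-nearest neighbor, the sketch below follows the commonly used inverse-distance formulation, in which each of the k nearest neighbours contributes to its class membership with weight 1/d^(2/(m-1)). Crisp training labels, the fuzzifier m = 2, and the toy points are assumptions; the paper's exact variant may differ. The last lines work out the accuracy implied by the reported counts, assuming the 20 samples mentioned make up the whole test set.

```python
import math

def fuzzy_knn(train_rows, train_labels, query, k=3, m=2.0):
    """Return class memberships for `query` from its k nearest training rows.

    Each neighbour's weight is 1 / d**(2 / (m - 1)); memberships sum to 1.
    """
    neighbours = sorted(
        (math.dist(row, query), label) for row, label in zip(train_rows, train_labels)
    )[:k]
    exponent = 2.0 / (m - 1.0)
    # Clamp tiny distances so an exact match does not divide by zero.
    weights = [(1.0 / max(d, 1e-12) ** exponent, label) for d, label in neighbours]
    total = sum(w for w, _ in weights)
    memberships = {}
    for w, label in weights:
        memberships[label] = memberships.get(label, 0.0) + w / total
    return memberships

# Toy data (assumption): two well-separated classes in two dimensions.
train_rows = [[0.0, 0.0], [0.0, 1.0], [3.0, 3.0], [3.0, 4.0]]
train_labels = ["negative", "negative", "positive", "positive"]

memberships = fuzzy_knn(train_rows, train_labels, [0.2, 0.4], k=3)
predicted = max(memberships, key=memberships.get)

# Accuracy implied by the reported counts: 14 + 5 correct, 1 incorrect.
correct = 14 + 5
incorrect = 1
accuracy = correct / (correct + incorrect)
```

Unlike plain K-nearest neighbor, the output is a graded membership per class, so a borderline sample can be flagged as uncertain instead of being forced into one label.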
