K-Means Algorithm for Clustering of Learners Performance Levels Using Machine Learning Techniques

2021, Vol 35 (1), pp. 99-104
Author(s): Revathi Vankayalapati, Kalyani Balaso Ghutugade, Rekha Vannapuram, Bejjanki Pooja Sree Prasanna

Data clustering is the process of grouping objects so that objects in the same group are more similar to one another than to objects in other groups. In this paper, k-means clustering is used to assess student performance. Machine learning is used in many kinds of systems, including education, pattern recognition, sports, and industrial applications. Its significance in the educational system increases because it bears on students' futures. Data collection in education is very useful, as data volumes in the education system are growing each day. Data mining in higher education is relatively new, but its significance grows as databases expand. There are several ways to assess the success of students; k-means is one of the best and most successful methods. Data mining is used to extract hidden information from the database in order to improve student performance. Decision trees are another way to predict student success. In recent years, educational institutions have faced great challenges in handling growing data volumes and using them to increase efficiency so that better decisions can be made. Clustering is one of the most important methods for analyzing data sets. This study uses cluster analysis to segment students into various classes according to their features. The unsupervised k-means algorithm is discussed. Educational data mining studies the knowledge available in the field of education in order to provide hidden, significant, and useful information. The proposed model uses k-means clustering to analyze learners' performance. With this support, students' outcomes and futures can be strengthened. The results show that the k-means clustering algorithm is useful for grouping students based on similar performance features.
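A minimal sketch of how k-means (Lloyd's algorithm) groups students by similar scores, assuming each student is described by two hypothetical features (an assignment score and an exam score). The deterministic farthest-first seeding is an illustrative choice made here, not something the abstract specifies, and the data are made up.

```python
from math import dist

def assign(points, centroids):
    # Assignment step: index of the nearest centroid (Euclidean distance) for each point.
    return [min(range(len(centroids)), key=lambda c: dist(p, centroids[c])) for p in points]

def farthest_first(points, k):
    # Deterministic seeding (an assumption): start at the first point, then repeatedly
    # add the point whose distance to its nearest chosen centroid is largest.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, max_iter=100):
    centroids = farthest_first(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                # Update step: move the centroid to the mean of its members.
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return assign(points, centroids), centroids

# Hypothetical (assignment score, exam score) pairs for nine students.
students = [(35, 40), (38, 42), (40, 38), (62, 65), (60, 58), (65, 63), (88, 90), (92, 85), (90, 94)]
labels, centroids = kmeans(students, k=3)
print(labels)  # → [0, 0, 0, 2, 2, 2, 1, 1, 1]
```

Each resulting cluster can then be read as a performance level (low, medium, high) by inspecting its centroid.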

Author(s): Gediminas Adomavicius, Yaqiong Wang

Numerical predictive modeling is widely used in different application domains. Although many modeling techniques have been proposed, and a number of different aggregate accuracy metrics exist for evaluating the overall performance of predictive models, other important aspects, such as the reliability (or confidence and uncertainty) of individual predictions, have been underexplored. We propose to use estimated absolute prediction error as the indicator of individual prediction reliability, which has the benefits of being intuitive and providing highly interpretable information to decision makers, as well as allowing for more precise evaluation of reliability estimation quality. As importantly, the proposed reliability indicator allows the reframing of reliability estimation itself as a canonical numeric prediction problem, which makes the proposed approach general-purpose (i.e., it can work in conjunction with any outcome prediction model), alleviates the need for distributional assumptions, and enables the use of advanced, state-of-the-art machine learning techniques to learn individual prediction reliability patterns directly from data. Extensive experimental results on multiple real-world data sets show that the proposed machine learning-based approach can significantly improve individual prediction reliability estimation as compared with a number of baselines from prior work, especially in more complex predictive scenarios.
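The idea of reframing reliability estimation as its own numeric prediction problem can be sketched as follows, under stated assumptions: the data are synthetic with noise that grows with `x`, the outcome model is a simple least-squares line, and a k-nearest-neighbour regressor stands in for the "advanced" learners the abstract mentions. The split scheme (one half to fit the outcome model, the other half to collect held-out absolute errors) is also an illustrative choice, not necessarily the authors' procedure.

```python
import random
import statistics

def fit_line(xs, ys):
    # Outcome model: ordinary least squares for y ≈ a + b*x.
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x

def fit_knn_regressor(xs, targets, k=15):
    # Reliability model: average target of the k training points closest to x.
    pairs = list(zip(xs, targets))
    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return statistics.fmean(t for _, t in nearest)
    return predict

rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(400)]
# Hypothetical heteroscedastic data: the noise level grows with x.
ys = [2 * x + rng.gauss(0, 0.2 + 0.5 * x) for x in xs]

# First half trains the outcome model; second half supplies held-out absolute
# errors, so the reliability model does not learn from in-sample residuals.
half = len(xs) // 2
outcome = fit_line(xs[:half], ys[:half])
abs_errors = [abs(y - outcome(x)) for x, y in zip(xs[half:], ys[half:])]
reliability = fit_knn_regressor(xs[half:], abs_errors, k=15)

for x in (1.0, 9.0):
    print(f"x={x}: prediction={outcome(x):.2f}, estimated |error|={reliability(x):.2f}")
```

Because the reliability indicator is simply another regression target, any regressor could replace the k-NN model without changing the outcome model.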


Intrusion, in which data or a legitimate network is accessed without authorization using a legitimate user's identity or through back doors and vulnerabilities in the network, is a major threat. Intrusion detection system (IDS) mechanisms are developed to detect intrusions at various levels. The objective of this research work is to improve IDS performance by applying machine learning techniques based on decision trees for the detection and classification of attacks. The adopted methodology processes the data sets in three stages. Experiments are conducted on the KDDCUP99 data sets with respect to the number of features. The three Bayesian modes are analyzed for data sets of different sizes based on the total number of attacks. The time the classifier takes to build the model is analyzed, and its accuracy is evaluated.
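To illustrate decision-tree attack classification together with the two measurements the abstract reports (model build time and accuracy), here is a small CART-style tree using Gini impurity. It is a sketch only: the connection records below are hypothetical three-feature toys (duration, source bytes, failed logins), not KDDCUP99 data, and the Bayesian variants mentioned above are not shown.

```python
import time
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    # Try every feature/threshold pair and keep the lowest weighted Gini impurity.
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [l for r, l in zip(rows, labels) if r[f] <= t]
            right = [l for r, l in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority  # leaf node: predicted class
    split = best_split(rows, labels)
    if split is None:
        return majority
    _, f, t = split
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build_tree([rows[i] for i in li], [labels[i] for i in li], depth + 1, max_depth),
            build_tree([rows[i] for i in ri], [labels[i] for i in ri], depth + 1, max_depth))

def predict(tree, row):
    # Internal nodes are (feature, threshold, left, right); leaves are class labels.
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

# Hypothetical records: (duration_s, src_bytes, failed_logins) -> label.
train = [((2, 300, 0), "normal"), ((1, 250, 0), "normal"), ((3, 500, 1), "normal"),
         ((5, 420, 0), "normal"), ((0, 0, 5), "attack"), ((0, 0, 4), "attack"),
         ((0, 12000, 0), "attack"), ((1, 15000, 0), "attack")]
test = [((2, 280, 0), "normal"), ((0, 0, 6), "attack"),
        ((1, 14000, 0), "attack"), ((4, 450, 1), "normal")]

start = time.perf_counter()
tree = build_tree([r for r, _ in train], [l for _, l in train])
build_seconds = time.perf_counter() - start
accuracy = sum(predict(tree, r) == l for r, l in test) / len(test)
print(f"build time: {build_seconds:.6f}s, accuracy: {accuracy:.2f}")
```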


Author(s): Bhavani Thuraisingham

Data mining is the process of posing queries to large quantities of data and extracting information, often previously unknown, using mathematical, statistical, and machine-learning techniques. Data mining has many applications in a number of areas, including marketing and sales, medicine, law, manufacturing, and, more recently, homeland security. Using data mining, one can uncover hidden dependencies between terrorist groups as well as possibly predict terrorist events based on past experience. One particular data-mining technique that is being investigated a great deal for homeland security is link analysis, where links are drawn between various nodes, possibly detecting some hidden links.
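One simple way to surface candidate hidden links is the "common neighbours" heuristic: two nodes that are not directly linked but share many neighbours may have an unobserved connection. The sketch below applies it to a hypothetical graph of six entities; it illustrates link analysis in general and is not presented as the specific technique discussed in this work.

```python
from itertools import combinations

def hidden_link_candidates(edges, min_shared=2):
    """Rank unlinked node pairs by how many neighbours they share."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    candidates = []
    for a, b in combinations(sorted(neighbours), 2):
        if b in neighbours[a]:
            continue  # already directly linked; nothing hidden to infer
        shared = neighbours[a] & neighbours[b]
        if len(shared) >= min_shared:
            candidates.append((a, b, sorted(shared)))
    # Pairs with the most shared neighbours first (stable sort keeps ties in name order).
    candidates.sort(key=lambda c: len(c[2]), reverse=True)
    return candidates

# Hypothetical observed links (e.g. calls or transfers) between entities A-F.
edges = [("A", "C"), ("A", "D"), ("A", "E"), ("B", "C"),
         ("B", "D"), ("B", "E"), ("C", "F"), ("D", "F")]
result = hidden_link_candidates(edges)
for a, b, shared in result:
    print(a, b, shared)
```

Here A and B never interact directly yet share three contacts, so the pair ranks first as a possible hidden link.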


Author(s): Jonathan Becker, Aveek Purohit, Zheng Sun

The USARSim group at NIST developed a simulated robot that operated in the Unreal Tournament 3 (UT3) gaming environment. They used a software PID controller to control the robot in UT3 worlds. Unfortunately, the PID controller did not work well, so NIST asked us to develop a better controller using machine learning techniques. In the process, we characterized the software PID controller and the robot's behavior in UT3 worlds. Using data collected from our simulations, we compared different machine learning techniques, including linear regression and reinforcement learning (RL). Finally, we implemented an RL-based controller in Matlab and ran it in the UT3 environment via a TCP/IP link between Matlab and UT3.
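For reference, a discrete PID controller of the kind being characterized here can be sketched as below. The plant (a first-order speed model `dv/dt = u - drag * v`), the gains, and the setpoint are all hypothetical and are not taken from USARSim or UT3; the RL controller that replaced the PID is not reproduced.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PID:
    kp: float
    ki: float
    kd: float
    dt: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, setpoint, measured):
        error = setpoint - measured
        self.integral += error * self.dt
        # No derivative on the first call, because there is no previous error yet.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def simulate(controller, setpoint=2.0, steps=2000, dt=0.01, drag=0.5):
    # Hypothetical plant integrated with forward Euler: dv/dt = u - drag * v.
    v = 0.0
    for _ in range(steps):
        u = controller.update(setpoint, v)
        v += dt * (u - drag * v)
    return v

final_speed = simulate(PID(kp=2.0, ki=1.0, kd=0.05, dt=0.01))
print(f"speed after 20 s: {final_speed:.4f}")
```

With these made-up gains the loop settles near the setpoint; on a real simulated robot the gains would need tuning, which is one reason a learned controller can be attractive.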


2019, Vol 119 (3), pp. 676-696
Author(s): Zhongyi Hu, Raymond Chiong, Ilung Pranata, Yukun Bao, Yuqing Lin

Purpose: Malicious web domain identification is of significant importance to the security protection of internet users. With online credibility and performance data, the purpose of this paper is to investigate the use of machine learning techniques for malicious web domain identification while considering the class imbalance issue (i.e. there are more benign web domains than malicious ones).

Design/methodology/approach: The authors propose an integrated resampling approach to handle class imbalance by combining the synthetic minority oversampling technique (SMOTE) and particle swarm optimisation (PSO), a population-based meta-heuristic algorithm. The authors use SMOTE for oversampling and PSO for undersampling.

Findings: Using eight well-known machine learning classifiers, the proposed integrated resampling approach is comprehensively examined on several imbalanced web domain data sets with different imbalance ratios. Compared with five other well-known resampling approaches, the experimental results confirm that the proposed approach is highly effective.

Practical implications: This study not only encourages the practical use of online credibility and performance data for identifying malicious web domains but also provides an effective resampling approach for handling the class imbalance issue in malicious web domain identification.

Originality/value: Online credibility and performance data are applied to build malicious web domain identification models using machine learning techniques. An integrated resampling approach is proposed to address the class imbalance issue. The performance of the proposed approach is confirmed on real-world data sets with different imbalance ratios.
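The SMOTE half of the integrated approach can be sketched as follows: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbours. The PSO-based undersampling half is not shown, and the two numeric features per domain are made up for illustration.

```python
import random
from math import dist

def smote(minority, n_synthetic, k=3, seed=0):
    """SMOTE-style oversampling of a list of numeric feature tuples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base`, excluding the sample itself.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic

# Hypothetical two-feature vectors for malicious (minority) domains.
malicious = [(0.10, 0.80), (0.15, 0.75), (0.20, 0.90), (0.12, 0.85)]
n_benign = 10
new_points = smote(malicious, n_synthetic=n_benign - len(malicious))
balanced_minority = malicious + new_points
print(len(balanced_minority))  # → 10
```

Interpolating between neighbours, rather than duplicating samples, gives the classifier new minority points inside the region the minority class already occupies.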


2018, Vol 7 (4), pp. 2738
Author(s): P. Srinivas Rao, Jayadev Gyani, G. Narsimha

In online social networks, phony account detection, i.e. distinguishing genuine user accounts from forged ones, is one of the major tasks. The fundamental objective of a phony account detection framework is to detect fake accounts and provide a removal technique for social networking sites. This work concentrates on phony account detection based on a normal basis framework, evolutionary algorithms, and a fuzzy technique. First, the most essential attributes, including personal attributes, similarity measures, and various real user reviews, tweets, or comments, are extracted. A linear combination of these attributes indicates the significance of each review, tweet, comment, and so on. To compute the similarity measure, a combined strategy based on the artificial bee colony algorithm and a fuzzy technique is used. A second approach is proposed to tune the best weights of the normal user attributes using social network activities/transactions and a genetic algorithm. Finally, a normal rank rationale framework is used to calculate the final score of normal user activities. The decisions of the proposed approach for finding phony accounts are compared with existing user behavioral analysis techniques that use machine learning and data sets such as the crowdflower_sample and genuine_accounts_sample data sets of Facebook and Twitter. The outcomes demonstrate that the proposed strategy outperforms the previously mentioned strategies.
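The weight-tuning step (a linear combination of attributes whose weights are tuned by a genetic algorithm) can be sketched as below. Everything here is an assumption for illustration: the three normalised attributes, the eight labelled accounts, the 0.5 decision threshold, and the GA settings (elitism, one-point crossover, Gaussian mutation). The bee colony, fuzzy, and ranking stages are not shown.

```python
import random

def score(weights, features):
    # Linear combination of attributes, normalised by the total weight.
    return sum(w * f for w, f in zip(weights, features)) / sum(weights)

def accuracy(weights, data, threshold=0.5):
    # Fraction of accounts whose "genuine" decision matches the label.
    return sum((score(weights, x) >= threshold) == label for x, label in data) / len(data)

def genetic_weights(data, n_features, pop_size=20, generations=40, seed=1):
    rng = random.Random(seed)
    # Initial population: uniform weights plus random individuals (weights in [0.01, 1]).
    population = [[1.0] * n_features] + [
        [rng.uniform(0.01, 1.0) for _ in range(n_features)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=lambda w: accuracy(w, data), reverse=True)
        elite = population[: pop_size // 4]  # elitism: the best quarter survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(n_features)  # mutate one gene, clamped to [0.01, 1]
            child[i] = min(1.0, max(0.01, child[i] + rng.gauss(0, 0.2)))
            children.append(child)
        population = elite + children
    return max(population, key=lambda w: accuracy(w, data))

# Hypothetical attributes: (profile completeness, friend/follower ratio, posting regularity).
accounts = [
    ((0.6, 0.7, 0.6), True), ((0.5, 0.8, 0.7), True), ((0.4, 0.9, 0.6), True), ((0.3, 0.7, 0.8), True),
    ((0.9, 0.2, 0.4), False), ((1.0, 0.1, 0.5), False), ((0.95, 0.3, 0.3), False), ((0.8, 0.2, 0.6), False),
]
best = genetic_weights(accounts, n_features=3)
print("weights:", [round(w, 2) for w in best], "accuracy:", accuracy(best, accounts))
```

Because the best individual always survives, the tuned weights are never worse on this data than the uniform weights they start from.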


2020
Author(s): Yosoon Choi, Jieun Baek, Jangwon Suh, Sung-Min Kim

In this study, we proposed a method to utilize a multi-sensor Unmanned Aerial System (UAS) for exploration of hydrothermal alteration zones. This study selected an area (10 m × 20 m) located on the coast and composed mainly of andesite, with wide outcrops and well-developed structural and mineralization elements. Multi-sensor (visible, multispectral, thermal, magnetic) data were acquired in the study area using the UAS and analyzed using machine learning techniques. To apply the machine learning techniques, we used the stratified random method to sample 1000 training data points in the hydrothermal zone and 1000 in the non-hydrothermal zone, both identified through a field survey. The 2000 samples created for supervised learning were first divided into 1500 for training and 500 for testing. The 1500 training samples were then divided into 1200 for training and 300 for validation. The training and validation data were generated in five sets to enable cross-validation. Five types of machine learning techniques were applied to the training data sets: k-Nearest Neighbors (k-NN), Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), and Deep Neural Network (DNN). In the integrated analysis of multi-sensor data using these five techniques, RF and SVM showed high classification accuracy of about 90%. Moreover, integrated analysis of multi-sensor data yielded higher classification accuracy for all five machine learning techniques than analyzing magnetic sensing data or single optical sensing data alone.
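The data-splitting scheme described above (2000 samples split into 1500/500, then five train/validation sets of 1200/300) can be sketched with stratified sampling so that both classes stay balanced in every part. The integer sample IDs stand in for real multi-sensor pixels, and the round-robin fold assignment is an assumed implementation of the five cross-validation sets; the classifiers themselves are not shown.

```python
import random

def stratified_split(samples, labels, test_fraction, rng):
    # Split each class separately so both parts keep the class proportions.
    train, test = [], []
    for cls in sorted(set(labels)):
        members = [s for s, l in zip(samples, labels) if l == cls]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test += [(s, cls) for s in members[:n_test]]
        train += [(s, cls) for s in members[n_test:]]
    return train, test

def stratified_folds(pairs, n_folds, rng):
    # Deal each class's shuffled samples round-robin into n_folds folds.
    folds = [[] for _ in range(n_folds)]
    for cls in sorted({c for _, c in pairs}):
        members = [p for p in pairs if p[1] == cls]
        rng.shuffle(members)
        for i, p in enumerate(members):
            folds[i % n_folds].append(p)
    return folds

rng = random.Random(0)
# Hypothetical sample IDs: 1000 hydrothermal (label 1) and 1000 non-hydrothermal (label 0).
samples = list(range(2000))
labels = [1] * 1000 + [0] * 1000

train_pool, test_set = stratified_split(samples, labels, test_fraction=0.25, rng=rng)  # 1500 / 500
folds = stratified_folds(train_pool, n_folds=5, rng=rng)
for k, val in enumerate(folds):
    # Each fold in turn is the validation set; the other four form the training set.
    train = [p for j, f in enumerate(folds) if j != k for p in f]
    print(f"fold {k}: train={len(train)}, validation={len(val)}")
```

Five folds over the 1500-sample pool give exactly the 1200/300 training/validation sizes reported, with 150 samples of each class per validation fold.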

