Socio-Economical Status of India using Machine Learning Algorithms

2020 ◽  
Vol 8 (5) ◽  
pp. 3804-3813

Data is everywhere, and much of it is openly available. We can analyze this data to find hidden and unnoticed information and use it purposefully. One important source of information is census data, which describes the people living in a country. Analyzing such data is useful for understanding the socio-economic status of the country, and data mining and machine learning techniques can be used to analyze such large volumes of data. In this work, the 2011 Indian census is analyzed to identify the socio-economic status of the different states of India. To identify the social status of each state, we studied the literacy rate, the categories of workers in different fields, and the gender-wise working population. To identify economic status, such as the population living below and above the poverty line, we used machine learning clustering techniques. First, the data was pre-processed; then correlation-based feature selection was applied, and k-means and k-medoids clustering were implemented independently on the result. Finally, the clusters were evaluated using a confusion matrix. The results show that k-medoids performs better than k-means.
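The difference between the two clustering methods compared above can be sketched briefly. Unlike k-means, which places each cluster centre at the mean of its members, k-medoids restricts every centre to an actual data point, which makes it less sensitive to outliers. The following is a minimal sketch of the simple alternating (assign, then update) variant of k-medoids, not the full PAM swap search, and not necessarily the implementation used in the study. The function name `k_medoids`, the Euclidean distance, and the toy two-dimensional points (standing in for the selected census features) are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def k_medoids(points: Sequence[Point], init: Sequence[int],
              max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Alternating k-medoids (a simplified variant, not full PAM).

    `init` holds the indices of the starting medoids, one per cluster.
    Returns (medoid indices, cluster label for every point).
    """
    medoids = list(init)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest medoid.
        labels = [
            min(range(len(medoids)), key=lambda c: math.dist(p, points[medoids[c]]))
            for p in points
        ]
        # Update step: within each cluster, the new medoid is the member with
        # the smallest total distance to all other members of that cluster.
        new_medoids = []
        for c, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the previous medoid if a cluster is empty
                new_medoids.append(old)
                continue
            new_medoids.append(min(
                members,
                key=lambda m: sum(math.dist(points[m], points[j]) for j in members),
            ))
        if new_medoids == medoids:  # converged: medoids no longer change
            break
        medoids = new_medoids
    return medoids, labels


# Hypothetical 2-D feature vectors forming two well-separated groups.
pts = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
medoids, labels = k_medoids(pts, init=[1, 4])
print(medoids, labels)  # [0, 3] [0, 0, 0, 1, 1, 1]
```

Starting from medoids at indices 1 and 4, the medoids move to indices 0 and 3. Like k-means, this alternating scheme only reaches a local optimum, so the result depends on the starting medoids.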

Agriculture is the major source of livelihood for the people of India and is considered an important part of the country's economy. India is one of the countries that suffer from natural calamities such as drought and flood, which may destroy crops and lead to heavy losses for farmers. Predicting the crop type can help them cultivate a crop suited to their particular soil type. Soil is a major factor in agriculture, and there are several types of soil in our country; to classify the soil type, we need to understand the characteristics of the soil. Data mining and machine learning are emerging technologies in agriculture and horticulture. Classifying the soil type and suggesting fertilizers that can improve the growth of the crop cultivated in that soil play a major role in agriculture. To that end, several machine learning algorithms, namely Support Vector Machine (SVM), k-Nearest Neighbour (k-NN), and logistic regression, are explored to classify the soil type.
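Of the three classifiers named above, k-NN is the simplest to illustrate: a new sample takes the majority label of its k closest training samples. The following is a minimal sketch, assuming Euclidean distance and two made-up numeric soil features; the function name `knn_predict`, the feature values, and the labels `type_A`/`type_B` are hypothetical and are not taken from the study.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def knn_predict(train_X: Sequence[Sequence[float]], train_y: Sequence[Hashable],
                query: Sequence[float], k: int = 3) -> Hashable:
    """Classify `query` by majority vote among its k nearest training samples."""
    # Rank training samples by Euclidean distance to the query; keep the k closest.
    nearest = sorted(range(len(train_X)),
                     key=lambda i: math.dist(train_X[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    # most_common(1) returns the label with the most votes among the k neighbours.
    return votes.most_common(1)[0][0]


# Hypothetical soil samples: (feature_1, feature_2) with made-up values and labels.
X = [(5.5, 30.0), (5.8, 32.0), (6.0, 28.0), (7.8, 12.0), (8.0, 10.0), (7.6, 14.0)]
y = ["type_A", "type_A", "type_A", "type_B", "type_B", "type_B"]
print(knn_predict(X, y, (5.9, 29.0)))  # type_A
print(knn_predict(X, y, (7.9, 11.5)))  # type_B
```

Because k-NN relies on raw distances, features measured on very different scales would normally be standardized first; an odd k avoids tied votes in a two-class problem.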


2020 ◽  
Vol 13 (37) ◽  
pp. 3820-3842
Author(s):  
V Balasankar ◽  

Background: Developing economic and social systems and assuring the efficiency of economic and social processes is a major task for the government of any country. Predictive machine learning (ML) models are used to analyze data sets in ways that allow more efficient enterprise management. Nowadays, research on Socio-Economic Status (SES) and ML is crucial for finding socio-economic inequalities and taking further action in the form of prevention, protection, and suppression. Objectives: The main objective of this research is to understand socio-economic system issues and to predict SES levels in a particular area, Rajahmundry, AP, India, using statistical analysis and machine learning methodologies. Methods: We analyze data collected from Rajahmundry (Rajamahandravaram), Andhra Pradesh, India, with 48 feature attributes (dimensions) and one four-class target attribute (poor, rich, middle, upper-middle). The SES levels (poor, rich, middle, and upper-middle classes) are predicted by five ML algorithms. Findings: We conduct a statistical analysis of each attribute, and analyze and compare the performance of five efficient ML algorithms, Naïve Bayes, Decision Trees (DTs), k-NN, SVM (RBF kernel), and Random Forest (RF), using the confusion matrix, performance parameters (classification accuracy, precision, recall, and F1), and the area under the receiver operating characteristic curve (ROC AUC). We observed that the RF algorithm gave better results than the other algorithms on the Rajahmundry AP SES dataset, achieving a classification accuracy (CA) of 97.82% with a model construction time of 0.41 seconds. The next best-performing ML model is DTs, with a CA of 96.67% and a model construction time of 0.16 seconds.
Novelty: Comprehensive analysis indicates that the novel AP SES dataset, together with empirical statistical analysis, gives good results, and that the RF model is very effective at predicting SES levels. Keywords: Machine Learning; socio-economic status; Rajahmundry; household; poverty
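The performance parameters listed above (classification accuracy, precision, recall, and F1) can all be read off a multiclass confusion matrix. The sketch below assumes rows are true classes and columns are predicted classes, and reports per-class values plus a macro average (an unweighted mean over classes); the study does not state which averaging it used, so macro averaging is an assumption. The function names and the six example predictions are hypothetical; only the four class names come from the abstract.

```python
from typing import List, Sequence, Tuple


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     labels: Sequence[str]) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes."""
    index = {lab: i for i, lab in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[index[t]][index[p]] += 1
    return m


def scores(m: List[List[int]]):
    """Return (accuracy, per-class (precision, recall, F1) list, macro averages)."""
    n = len(m)
    total = sum(sum(row) for row in m)
    accuracy = sum(m[i][i] for i in range(n)) / total  # diagonal = correct predictions
    per_class = []
    for i in range(n):
        tp = m[i][i]
        predicted = sum(m[r][i] for r in range(n))  # column sum
        actual = sum(m[i])                          # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class.append((precision, recall, f1))
    macro = tuple(sum(c[j] for c in per_class) / n for j in range(3))
    return accuracy, per_class, macro


labels = ["poor", "middle", "upper-middle", "rich"]
y_true = ["poor", "poor", "middle", "rich", "upper-middle", "middle"]
y_pred = ["poor", "middle", "middle", "rich", "upper-middle", "middle"]
acc, per_class, macro = scores(confusion_matrix(y_true, y_pred, labels))
print(round(acc, 3), [round(v, 3) for v in macro])  # 0.833 [0.917, 0.875, 0.867]
```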


2020 ◽  
Vol 8 (6) ◽  
pp. 5482-5485

In most cases, data for an Intrusion Detection System (IDS) can be created only by exploring all real working environments under every possible attack, which is an expensive task. Network intrusion detection software shields a system and computer network from staff and non-authorized users. The detector's ultimate task is to build a predictive classifier (i.e. a model) that can distinguish between friendly and non-friendly connections, the latter known as attacks or intrusions. This problem in network sectors is prevented by predicting, from the dataset, whether or not a connection is an attack. We use the KDDCup99 dataset with bio-inspired machine learning techniques such as artificial neural networks. Bio-inspired algorithms are a game changer in computer science: the problems of computer science are only a subset of the complexity found in nature, and the field is opening a new era in next-generation computing, modelling, and algorithm engineering. The aim is to investigate bio-inspired machine learning techniques for forecasting packet connection transfers, and to propose a machine-learning-based method that accurately predicts DoS, R2L, U2R, Probe, and overall attacks by comparing supervised classification algorithms and selecting the one with the best accuracy. Furthermore, we compare and discuss the performance of various ML algorithms on the provided dataset using classification and evaluation reports and confusion-matrix analysis. The results show that the effectiveness of the proposed bio-inspired machine learning technique can be assessed in terms of accuracy, precision, specificity, sensitivity, F1 score, and recall.
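For a two-class view of the detection task (attack versus normal connection), the metrics named above follow directly from the four confusion-matrix counts. Note that sensitivity and recall of the positive (attack) class are the same quantity. The following is a minimal sketch, assuming string labels with `"attack"` as the positive class; the per-category attacks (DoS, R2L, U2R, Probe) could be scored the same way one-versus-rest. The function names and example predictions are hypothetical, not from the paper.

```python
from typing import Dict, Sequence


def _ratio(num: float, den: float) -> float:
    """Safe division: returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_report(y_true: Sequence[str], y_pred: Sequence[str],
                  positive: str = "attack") -> Dict[str, float]:
    """Two-class metrics, treating `positive` as the attack class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = _ratio(tp, tp + fp)
    sensitivity = _ratio(tp, tp + fn)  # identical to recall of the attack class
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": _ratio(tn, tn + fp),  # share of normal traffic left alone
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


y_true = ["attack", "attack", "normal", "normal", "attack", "normal"]
y_pred = ["attack", "normal", "normal", "attack", "attack", "attack"]
print({k: round(v, 3) for k, v in binary_report(y_true, y_pred).items()})
# {'accuracy': 0.5, 'precision': 0.5, 'sensitivity': 0.667, 'specificity': 0.333, 'f1': 0.571}
```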


2020 ◽  
Vol 12 (2) ◽  
pp. 84-99
Author(s):  
Li-Pang Chen

In this paper, we investigate the analysis and prediction of time-dependent data. We focus on four stocks selected from the Yahoo Finance historical database. To build models and predict future stock prices, we consider three machine learning techniques: Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNN), and Support Vector Regression (SVR). By treating the close price, open price, daily low, daily high, adjusted close price, and trading volume as predictors in these machine learning methods, we show that prediction accuracy improves.
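Using several daily quantities as predictors requires turning the price history into supervised (input, target) pairs. One common approach, shown here as a minimal sketch and not necessarily the paper's, is a sliding window: the six predictors from the previous `window` days form the input, and the next day's close is the target. The dictionary key names, the window length, and the price records are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# The six predictors listed above; the key names are assumptions.
FEATURES = ("open", "high", "low", "close", "adj_close", "volume")


def make_windows(rows: Sequence[Dict[str, float]],
                 window: int) -> Tuple[List[List[float]], List[float]]:
    """Turn chronologically ordered daily rows into (X, y) pairs.

    Each sample flattens the predictors of `window` consecutive days;
    the target is the close price of the following day.
    """
    X: List[List[float]] = []
    y: List[float] = []
    for t in range(window, len(rows)):
        past = rows[t - window:t]
        X.append([day[f] for day in past for f in FEATURES])
        y.append(rows[t]["close"])
    return X, y


# Hypothetical daily records (not real market data).
days = [
    {"open": 10.0, "high": 10.5, "low": 9.8, "close": 10.2, "adj_close": 10.2, "volume": 1000.0},
    {"open": 10.2, "high": 10.8, "low": 10.1, "close": 10.7, "adj_close": 10.7, "volume": 1200.0},
    {"open": 10.7, "high": 11.0, "low": 10.4, "close": 10.5, "adj_close": 10.5, "volume": 900.0},
    {"open": 10.5, "high": 10.9, "low": 10.3, "close": 10.8, "adj_close": 10.8, "volume": 1100.0},
]
X, y = make_windows(days, window=2)
print(len(X), len(X[0]), y)  # 2 12 [10.5, 10.8]
```

The flat vectors suit SVR; for LSTM or CNN models the same windows would instead be kept as a (window, features) grid per sample. Train and test sets should be split chronologically rather than shuffled, so that no future information leaks into training.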


Author(s):  
Anantvir Singh Romana

Accurate diagnostic detection of disease in a patient is critical: it may alter the subsequent treatment and increase the chance of survival. Machine learning techniques have been instrumental in disease detection and are currently used in various classification problems because of their accurate predictive performance. Different techniques may achieve different accuracies, so it is important to use the most suitable method, the one that gives the best results. This research provides a comparative analysis of Support Vector Machine, Naïve Bayes, J48 Decision Tree, and neural network classifiers on breast cancer and diabetes datasets.


ABSTRACT The study analyses the socio-economic status, degree of income inequality, and perceived socio-economic conditions of fish farmers in the four districts of Sikkim. A total sample of 200 fish farmers was selected from the four districts according to the number of farmers present in each district. A purposive random sampling method was used, and the results were analysed using descriptive statistics such as frequency counts and percentages. The degree of income inequality was analysed through Gini coefficients, and the factors that determined the perceived socio-economic living conditions were analysed with a logistic regression model. The socio-economic status of the fish farmers was found to be good, with little variation among farmers of different districts. Most of the respondents had pucca houses, used a combination of firewood and LPG as cooking fuel, and had access to basic amenities such as electricity, drinking water, and sanitation facilities in their households. The study also found that income inequality was not severe among the fish farmers of three of the districts; the exception was the East district, which had the strongest income inequality. Per capita income, housing condition, and the ratio of members with above-primary education to total household members had a significant impact on the perceived living conditions of the fish farmers.
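The Gini coefficient used above summarizes income inequality on a scale from 0 (everyone earns the same) towards 1 (one person earns everything). Below is a minimal sketch of one standard formula, the sorted-rank form, which is equivalent to the mean absolute difference over all pairs divided by twice the mean; the study does not say which estimator it used, so this choice and the example incomes are assumptions.

```python
from typing import Sequence


def gini(incomes: Sequence[float]) -> float:
    """Gini coefficient of non-negative incomes (0 = perfect equality).

    Sorted-rank form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    with x sorted ascending and i = 1..n. This equals the mean absolute
    difference over all ordered pairs divided by twice the mean income.
    """
    x = sorted(incomes)
    n = len(x)
    total = sum(x)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * xi for i, xi in enumerate(x, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


print(gini([5000, 5000, 5000, 5000]))  # 0.0
print(gini([1, 2, 3, 4]))              # 0.25
```

Computing the coefficient separately for each district's per capita incomes would give one inequality figure per district, which is how district-level comparisons like the one above are usually made.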


2019 ◽  
Vol 23 (1) ◽  
pp. 12-21 ◽  
Author(s):  
Shikha N. Khera ◽  
Divya

The information technology (IT) industry in India has been facing a systemic issue of high attrition in the past few years, resulting in monetary and knowledge-based losses for companies. The aim of this research is to develop a model that predicts employee attrition and gives organizations the opportunity to address issues and improve retention. A predictive model was developed based on a supervised machine learning algorithm, the support vector machine (SVM). Archival employee data (consisting of 22 input features) were collected from the Human Resource databases of three IT companies in India, together with each employee's employment status (the response variable) at the time of collection. Accuracy results from the confusion matrix showed that the SVM model has an accuracy of 85 per cent. The results also show that the model performs better at predicting who will leave the firm than at predicting who will not leave.


2021 ◽  
pp. 1-17
Author(s):  
Ahmed Al-Tarawneh ◽  
Ja’afer Al-Saraireh

Twitter is one of the most popular platforms for sharing and posting ideas. Hackers and anonymous attackers use such platforms maliciously, and their behavior can be used to predict the risk of future attacks by gathering and classifying hackers' tweets with machine-learning techniques. Previous approaches for detecting infected tweets are based on human effort or text analysis, so they are limited in capturing the hidden text between tweet lines. The main aim of this research paper is to enhance the efficiency of hacker detection on Twitter using complex-network techniques with adapted machine learning algorithms. This work presents a methodology that collects a list of users, together with their followers, who share posts with similar interests from a hackers' community on Twitter. The list is built from a set of suggested keywords that hackers commonly use in their tweets. A complex network is then generated over all users to find the relations among them in terms of network centrality, closeness, and betweenness. From these values, a dataset of the most influential users in the hacker community is assembled. Tweets belonging to users in this dataset are then gathered and classified into positive and negative classes, and the output of this process is used in a machine learning stage that applies different algorithms. This research builds and investigates an accurate dataset containing real users who belong to a hackers' community. Accuracy, measured as the proportion of correctly classified instances and averaged over the K-nearest neighbor, Naive Bayes, Random Tree, and support vector machine techniques, was about 90% for cross-validation and 88% for percentage split.
Consequently, the proposed network cyber Twitter model is able to detect hackers and determine whether tweets pose a future risk to institutions and individuals, providing early warning of possible attacks.
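Of the network measures mentioned above, closeness is the easiest to show compactly: a user whose shortest-path distances to everyone else are small scores high. The following is a minimal sketch using breadth-first search on an unweighted graph. It assumes follower links are treated as undirected and computes closeness over reachable nodes only, (reachable nodes minus 1) divided by the sum of distances, which equals the usual (n - 1) / sum of distances on a connected graph. All function names and user IDs are hypothetical; betweenness, which needs counts of all shortest paths (for example, Brandes' algorithm), is not shown.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple


def build_graph(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets from (user, user) follower pairs."""
    graph: Dict[str, Set[str]] = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def bfs_distances(graph: Dict[str, Set[str]], source: str) -> Dict[str, int]:
    """Hop counts from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def closeness(graph: Dict[str, Set[str]]) -> Dict[str, float]:
    """Closeness = (reachable nodes - 1) / sum of shortest-path distances."""
    scores = {}
    for v in graph:
        d = bfs_distances(graph, v)
        total = sum(d.values())
        scores[v] = (len(d) - 1) / total if total else 0.0
    return scores


# Hypothetical follower links; "hub" is connected to every other user.
g = build_graph([("hub", "u1"), ("hub", "u2"), ("hub", "u3")])
c = closeness(g)
print(max(c, key=c.get), c["hub"], c["u1"])  # hub 1.0 0.6
```

Ranking users by such scores is one way to shortlist the most influential accounts before collecting their tweets.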


Materials ◽  
2021 ◽  
Vol 14 (5) ◽  
pp. 1089
Author(s):  
Sung-Hee Kim ◽  
Chanyoung Jeong

This study aims to demonstrate the feasibility of applying eight machine learning algorithms to predict the classification of the surface characteristics of titanium oxide (TiO2) nanostructures produced by different anodization processes. We produced a total of 100 samples and assessed changes in the thickness of the TiO2 nanostructures produced by anodization. We successfully grew TiO2 films of different thicknesses by one-step anodization in ethylene glycol containing NH4F and H2O, at applied voltages ranging from 10 V to 100 V and for various anodization durations. We found that the thickness of the TiO2 nanostructures depends on the anodization voltage at different anodization times. We therefore tested the feasibility of applying machine learning algorithms to predict the deformation of TiO2. Because the characteristics of TiO2 changed with the experimental conditions, we classified its surface pore structure into two categories and four groups. For the classification based on granularity, we assessed layer creation, roughness, pore creation, and pore height. We applied eight machine learning techniques for binary and multiclass classification. For binary classification, the random forest and gradient boosting algorithms performed relatively well; however, all eight algorithms scored higher than 0.93, indicating high accuracy in predicting the presence of pores. For multiclass classification, in contrast, decision tree and three ensemble methods performed relatively better, with accuracy greater than 0.79. The weakest algorithm was k-nearest neighbors, for both binary and multiclass classification. We believe these results show that machine learning techniques can be applied to predict surface quality improvement, leading to smart manufacturing technology that better controls color appearance, super-hydrophobicity, super-hydrophilicity, or battery efficiency.


2009 ◽  
Vol 29 (3) ◽  
pp. 397-411 ◽  
Author(s):  
VERENA H. MENEC ◽  
DAWN M. VESELYUK ◽  
AUDREY A. BLANDFORD ◽  
SCOTT NOWICKI

ABSTRACT Research has shown that the level of activity of the residents of a city's neighbourhood is related to the availability of activity-related resources. This study aimed to characterise the housing environment in which many older adults live by exploring what activity-related resources were available in senior apartment buildings in one Canadian city, Winnipeg. Of 195 senior apartment buildings in the city, 190 were surveyed to examine whether variation in the buildings' activity resources was related to neighbourhood characteristics, particularly socio-economic status. Resources were classified as those for physical activities (e.g. exercise classes), social activities (e.g. card games), and services (e.g. a grocery-store shuttle). The neighbourhood characteristics were taken from census data and included socio-economic and socio-demographic measures. The apartment buildings varied considerably in the resources available, and a positive relationship was found between neighbourhood income and physical and social activity programmes and services. Lower residential stability and a higher percentage of residents living alone were also related to the buildings' resource-richness, and senior apartment buildings with limited activity-related resources clustered in disadvantaged neighbourhoods. How senior apartments are resourced should be examined in relation to the neighbourhood in which they are located.

