Performance based Machine Learning Algorithm for Topic Oriented Text Categorization

doi:10.35940/ijrte.b1429.0982s1119

Understanding the expression of grievances in the Arabic Twitter-sphere using machine learning

Journal of Criminological Research Policy and Practice ◽

10.1108/jcrpp-02-2019-0009 ◽

2019 ◽

Vol 5 (2) ◽

pp. 108-119

Author(s):

Yeslam Al-Saggaf ◽

Amanda Davies

Keyword(s):

Machine Learning ◽

Data Mining ◽

Social Network ◽

Network Analysis ◽

Learning Algorithm ◽

Machine Learning Algorithm ◽

Data Set ◽

Content Type ◽

The Social ◽

Twitter Users

Purpose: The purpose of this paper is to discuss the design, application and findings of a case study in which a machine learning algorithm is used to identify grievances expressed on Twitter in an Arabian context. Design/methodology/approach: To understand the characteristics of the Twitter users who expressed the identified grievances, data mining techniques and social network analysis were utilised. The study extracted a total of 23,363 tweets, which were stored as a data set. The machine learning algorithm was applied to this data set, followed by a data mining process to explore the characteristics of the Twitter users. The network of the users was mapped, and the individual level of interactivity and the network density were calculated. Findings: The machine learning algorithm revealed 12 themes, all of which were underpinned by the blockade of Qatar by a coalition of Arab countries. The data mining analysis revealed that the tweets could be grouped into three clusters; the main cluster included users with a large number of followers and friends but who did not mention other users in their tweets. The social network analysis revealed that whilst a large proportion of users engaged in direct messages with others, the network ties between them were not registered as strong. Practical implications: Borum (2011) notes that invoking grievances is the first step in the radicalisation process. It is hoped that by understanding these grievances, the study will shed light on what radical groups could invoke to win the sympathy of aggrieved people. Originality/value: In combination, the machine learning algorithm offered insights into the grievances expressed within the tweets in an Arabian context. The data mining and the social network analyses revealed the characteristics of the Twitter users, highlighting opportunities for identifying and managing early intervention in radicalisation.
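The abstract mentions that individual interactivity and network density were calculated but does not say how. The following is a minimal sketch of one common way to compute them, assuming an undirected mention network; the function name `network_stats` and the sample user names are hypothetical and are not taken from the paper. Density here is the number of observed ties divided by the number of possible ties, n(n-1)/2.

```python
from collections import defaultdict


def network_stats(mentions):
    """Return (density, degree per user) for a list of (source, target) mentions.

    Assumption: ties are undirected; self-mentions and repeated mentions
    between the same pair are counted once or ignored.
    """
    mentions = list(mentions)
    # Collect every user that appears, even if only in a self-mention.
    nodes = {user for pair in mentions for user in pair}
    # An undirected tie is an unordered pair of two distinct users.
    edges = {frozenset((a, b)) for a, b in mentions if a != b}

    degree = defaultdict(int)
    for edge in edges:
        for user in edge:
            degree[user] += 1

    n = len(nodes)
    possible = n * (n - 1) / 2
    density = len(edges) / possible if possible else 0.0
    return density, {user: degree[user] for user in nodes}


# Hypothetical mention data for illustration only.
sample = [("a", "b"), ("b", "a"), ("b", "c"), ("d", "d")]
density, degrees = network_stats(sample)
print(round(density, 3), degrees["b"])
```

A low density together with many users of degree one or two would be consistent with the abstract's observation that ties were not strong, although the paper's own measure of tie strength may differ.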

A Classification Framework on Opinion Mining for Effective Recommendation Systems

Intelligent Systems ◽

10.4018/978-1-5225-5643-5.ch040 ◽

2018 ◽

pp. 980-994

Author(s):

Mahima Goyal ◽

Vishal Bhatnagar

Keyword(s):

Machine Learning ◽

Language Processing ◽

Opinion Mining ◽

Machine Learning Algorithms ◽

Classification Framework ◽

Sentence Level ◽

The Social ◽

Social Media Platforms ◽

Mining Methods ◽

Document Level

With the recent trend of expressing opinions on social media platforms such as Twitter, blogs and review sites, a large amount of data is available for analysis through opinion mining. This analysis plays a pivotal role in providing recommendations for e-commerce products, services and social networks, and in forecasting market movements and competition among businesses. The authors present a literature review of the different techniques and applications in this field. The primary techniques can be classified into data mining methods, natural language processing (NLP) and machine learning algorithms. A classification framework is designed to depict the three levels of opinion mining (document level, sentence level and aspect level) along with the methods involved in each. A system can be recommended on the basis of content-based and collaborative filtering.

A Classification Framework on Opinion Mining for Effective Recommendation Systems

Collaborative Filtering Using Data Mining and Analysis - Advances in Data Mining and Database Management ◽

10.4018/978-1-5225-0489-4.ch010 ◽

2017 ◽

pp. 180-194

Author(s):

Mahima Goyal ◽

Vishal Bhatnagar

Keyword(s):

Machine Learning ◽

Language Processing ◽

Opinion Mining ◽

Machine Learning Algorithms ◽

Classification Framework ◽

Sentence Level ◽

The Social ◽

Social Media Platforms ◽

Mining Methods ◽

Document Level

With the recent trend of expressing opinions on social media platforms such as Twitter, blogs and review sites, a large amount of data is available for analysis through opinion mining. This analysis plays a pivotal role in providing recommendations for e-commerce products, services and social networks, and in forecasting market movements and competition among businesses. The authors present a literature review of the different techniques and applications in this field. The primary techniques can be classified into data mining methods, natural language processing (NLP) and machine learning algorithms. A classification framework is designed to depict the three levels of opinion mining (document level, sentence level and aspect level) along with the methods involved in each. A system can be recommended on the basis of content-based and collaborative filtering.

Opinion Mining from Text Reviews Using Machine Learning Algorithm

International Journal of Innovative Research in Computer and Communication Engineering ◽

10.15680/ijircce.2015.0303024 ◽

2015 ◽

Vol 03 (03) ◽

pp. 1567-1570 ◽

Cited By ~ 3

Author(s):

Poobana S ◽

Sashi Rekha K

Keyword(s):

Machine Learning ◽

Opinion Mining ◽

Learning Algorithm ◽

Machine Learning Algorithm

Big Data for Health Care Analytics using Extreme Machine Learning Based on Map Reduce

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.c5808.029320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 2758-2762

Keyword(s):

Machine Learning ◽

Big Data ◽

Data Storage ◽

Clinical Data ◽

Disease Risk ◽

Learning Algorithm ◽

Information Storage ◽

Support Vector ◽

Machine Learning Algorithm ◽

Data Set

Large volumes of data, collectively called big data, are stored across many fields. In healthcare, big data comprises clinical data sets holding vast numbers of patient records, maintained as Electronic Health Records (EHR). More than 80% of clinical data is unstructured and held in hundreds of forms. The challenge for data storage and analysis is handling large data sets efficiently and scalably. The Hadoop MapReduce framework can store and process any kind of data quickly. It is not solely a storage system but also a platform for processing the stored information, and it is scalable and fault-tolerant. Prediction on the data sets is handled by a machine learning algorithm. This work focuses on the Extreme Machine Learning algorithm (ELM), combining ELM with a Cuckoo Search optimization-based Support Vector Machine (CS-SVM) to predict disease risk. The proposed work also considers the scalability and accuracy of big data models; the authors report that the proposed algorithm performs well in both veracity and efficiency.

DETERMINING LEARNING SUCCESS of kNN ALGORITHM on ZOO DATASET

Euroasia Journal of Mathematics, Engineering, Natural and Medical Sciences ◽

10.38065/euroasiaorg.762 ◽

2021 ◽

Vol 8 (18) ◽

pp. 78-82

Author(s):

Ahmet ÇELİK

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Learning Method ◽

Data Set ◽

Intelligent Machines ◽

Weight Parameter ◽

Living Things ◽

Learning Success ◽

Similarities And Differences ◽

Two Parameters

People learn by examining, observing and researching their environment. They gain experience from what they have learned. By using this experience, they can adapt to new situations and make decisions. People always make decisions by drawing on their previous knowledge when describing and classifying objects. Similarities and differences to previously learned objects are very effective in decision making. Studies have shown that this experiential learning method can also be used on machines. Intelligent machines and devices that use machine learning methods are widely used in many areas. Machine learning can be performed using different algorithms. These algorithms use the attributes of the objects in the data set when making decisions. Similarities and differences in the attributes of objects are obtained by comparing them with previous experiences. As a result of the comparison, a decision is made and the classes of the objects are predicted. In this study, the kNN machine learning algorithm, which is a supervised learning method, was used on the Zoo dataset. This data set contains attributes of common living things. By using these attributes, the classes of living things in the data set are determined. The “k” neighbor value and the weight parameter selected in the kNN algorithm affect the learning success. In this study, the effect of these two parameters on learning success is shown. According to the results obtained, the highest success was achieved with the "k=1" neighbor value and the "Distance Weight" parameter.
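The two parameters named in the abstract, the neighbor count k and the weighting scheme, can be illustrated with a small sketch. This is a minimal pure-Python kNN, assuming Euclidean distance and inverse-distance weights; the function `knn_predict`, the six binary attributes and the three training rows are hypothetical toy values, not rows from the Zoo dataset and not the author's implementation.

```python
import math
from collections import defaultdict


def knn_predict(train, query, k=1, weighted=True):
    """Classify `query` from `train`, a list of (feature_vector, label) pairs."""
    # Keep the k training points closest to the query (Euclidean distance).
    neighbours = sorted(
        (math.dist(features, query), label) for features, label in train
    )[:k]
    votes = defaultdict(float)
    for distance, label in neighbours:
        # Distance weighting: closer neighbours count more.
        # Uniform weighting: every neighbour counts once.
        votes[label] += 1.0 / (distance + 1e-9) if weighted else 1.0
    return max(votes, key=votes.get)


# Toy attributes: hair, feathers, eggs, milk, airborne, aquatic (assumed).
train = [
    ((1, 0, 0, 1, 0, 1), "mammal"),
    ((0, 1, 1, 0, 1, 0), "bird"),
    ((0, 1, 1, 0, 0, 0), "bird"),
]
query = (1, 0, 1, 1, 0, 1)

print(knn_predict(train, query, k=1))                  # mammal
print(knn_predict(train, query, k=3, weighted=False))  # bird
print(knn_predict(train, query, k=3, weighted=True))   # mammal
```

In this toy case a uniform vote over three neighbours is outvoted by two distant birds, while distance weighting lets the single close mammal win, which is the kind of interaction between k and the weight parameter that the study examines.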

Water Quality Prediction Using Statistical Tool and Machine Learning Algorithm

Waste Management ◽

10.4018/978-1-7998-1210-4.ch029 ◽

2020 ◽

pp. 609-623

Author(s):

Arun Kumar Beerala ◽

Gobinath R. ◽

Shyamala G. ◽

Siribommala Manvitha

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Training Data ◽

Machine Learning Techniques ◽

Statistical Tool ◽

Data Set ◽

Water Quality Prediction ◽

Living Things ◽

Sampling Locations ◽

Different Seasons

Water is the most valuable natural resource for all living things and the ecosystem. The quality of groundwater changes due to changes in the ecosystem, industrialisation, urbanisation and other factors. In the study, 60 samples were taken and analysed for various physico-chemical parameters. The sampling locations were recorded using a global positioning system (GPS), and samples were taken in two consecutive years for two different seasons, monsoon (Nov-Dec) and post-monsoon (Jan-Mar). In 2016-2017 and 2017-2018, pH, EC and TDS were measured in the field. Hardness and chloride were determined using the titration method. Nitrate and sulphate were determined using a spectrophotometer. Machine learning techniques were used to train on the data set and to predict the unknown values. The dominant ions in the groundwater are Ca²⁺ and Mg²⁺ among the cations and Cl⁻, SO₄²⁻ and NO₃⁻ among the anions. The regression value for the training data set was found to be 0.90596, and for the entire network, it was found to be 0.81729. The best performance was observed as 0.0022605 at epoch 223.

Early prediction of chronic disease using an efficient machine learning algorithm through adaptive probabilistic divergence based feature selection approach

International Journal of Pervasive Computing and Communications ◽

10.1108/ijpcc-04-2020-0018 ◽

2020 ◽

Vol ahead-of-print (ahead-of-print) ◽

Cited By ~ 2

Author(s):

Sandeepkumar Hegde ◽

Monica R. Mundada

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Chronic Disease ◽

Learning Algorithm ◽

Predictive Ability ◽

Experimental Result ◽

Data Set ◽

Content Type ◽

Learning Classifier ◽

Feature Selection Approach

Purpose: According to the World Health Organization, by 2025 the contribution of chronic disease is expected to rise by 73% compared to all deaths, and it is considered a global burden of disease with a rate of 60%. These diseases persist for a long time, are almost incurable and can only be controlled. Cardiovascular disease, chronic kidney disease (CKD) and diabetes mellitus are considered the three major chronic diseases, and their risk increases among adults as they get older. CKD is considered a major disease among these. Overall, 10% of the world's population is affected by CKD, and this is likely to double by 2030. The paper aims to propose a novel feature selection approach that, in combination with a machine learning algorithm, can predict chronic disease early and with high accuracy. Hence, a novel adaptive probabilistic divergence-based feature selection (APDFS) algorithm is proposed in combination with the hyper-parameterized logistic regression model (HLRM) for the early prediction of chronic disease. Design/methodology/approach: The APDFS algorithm explicitly handles the features associated with the class label through relevance and redundancy analysis. The algorithm applies statistical divergence-based information theory to identify the relationship between the distant features of the chronic disease data set. The data set used in the experiments was obtained from several medical labs and hospitals of Karkala taluk, India. The HLRM is used as the machine learning classifier. The predictive ability of the framework is compared with various algorithms and across various chronic disease data sets. The experimental results illustrate that the proposed framework is efficient and achieves competitive results compared to existing work in most cases. Findings: The performance of the proposed framework is validated using metrics such as recall, precision, F1 measure and ROC. The predictive performance of the proposed framework is analyzed on data sets belonging to various chronic diseases such as CKD, diabetes and heart disease. The diagnostic ability of the proposed approach is demonstrated by comparing its results with existing algorithms. The experimental figures illustrate that the proposed framework performed exceptionally well in the early prediction of CKD, with an accuracy of 91.6. Originality/value: The capability of machine learning algorithms depends on feature selection (FS) algorithms identifying the relevant traits in the data set, which affects the predictive result. FS is the process of choosing the relevant features from the data set by removing redundant and irrelevant features. Although many approaches have already been proposed toward this objective, they are computationally complex because they follow a one-step scheme in selecting the features. The proposed APDFS algorithm instead handles the process of feature selection in two separate indices. Hence, the computational complexity of the algorithm is reduced to O(nk+1).
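The abstract validates the framework with recall, precision and F1 measure. As a reminder of how these metrics relate, here is a minimal sketch for a binary classifier; the function name `precision_recall_f1` and the label vectors are hypothetical and do not reproduce any result from the paper.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical labels: 1 = disease present, 0 = absent.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
print(precision_recall_f1(y_true, y_pred))
```

With two true positives, one false positive and one false negative, all three metrics equal 2/3 in this example. Accuracy alone can hide poor recall on a rare disease class, which is one reason abstracts like this one report several metrics.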

A SYNTHETIC DATA SET OF 3D OOCYTE IMAGES AND MACHINE LEARNING ALGORITHM AS A MODEL TO ASSESS THE REPRODUCTIVE POTENTIAL OF OOCYTES

Fertility and Sterility ◽

10.1016/j.fertnstert.2020.08.424 ◽

2020 ◽

Vol 114 (3) ◽

pp. e145

Author(s):

Gerard Letterie ◽

Nathan Kundtz

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Reproductive Potential ◽

Synthetic Data ◽

Machine Learning Algorithm ◽

Data Set

Plural marking patterns of nouns and their associates in the world’s languages

Studies in Language ◽

10.1075/sl.16001.che ◽

2020 ◽

Vol 44 (1) ◽

pp. 231-269

Author(s):

Rong Chen

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Learning Algorithm ◽

Machine Learning Algorithm ◽

Data Set ◽

Component Structure ◽

Universal Distribution ◽

Two Component ◽

The World ◽

Plural Marking

Plural marking reaches most corners of languages. When a noun occurs with another linguistic element, which is called an associate in this paper, plural marking on the two-component structure has four logically possible patterns: doubly unmarked, noun-marked, associate-marked and doubly marked. These four patterns are not distributed homogeneously in the world’s languages, because they are shaped by two competing motivations: iconicity and economy. Some patterns are preferred over others, and this preference is consistently found in languages across the world. In other words, there exists a universal distribution of the four plural marking patterns. Furthermore, holding the view that plural marking on associates expresses plurality of nouns, I propose a hypothetical universal which uses the number of pluralized associates to predict plural marking on nouns. A data set collected from a sample of 100 languages is used to test the hypothetical universal by employing the machine learning algorithm logistic regression.
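The test described above, predicting noun plural marking from the number of pluralized associates with logistic regression, can be sketched as follows. This is a minimal illustration, assuming a single predictor fitted by plain gradient descent; the function names, learning rate and the ten data points are hypothetical and are not drawn from the paper's 100-language sample.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean log-loss with respect to w and b.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical data: x = number of pluralized associates in a language,
# y = 1 if the noun itself is plural-marked, 0 otherwise.
xs = [0, 0, 0, 0, 1, 1, 1, 2, 2, 3]
ys = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]

w, b = fit_logistic(xs, ys)
for count in range(4):
    print(count, round(sigmoid(w * count + b), 2))
```

A positive fitted weight `w` means the predicted probability of noun marking rises with the number of pluralized associates, which is the direction of effect a universal of this kind would predict.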