A machine learning driven approach to improve efficiency of classification algorithm using prediction of affliction

V Sumalatha; Dr R.Santhi

doi:10.14419/ijet.v7i2.33.13887

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.33.13887 ◽

2018 ◽

Vol 7 (3.3) ◽

pp. 206

Author(s):

V Sumalatha ◽

Dr R.Santhi

Keyword(s):

Machine Learning ◽

Language Processing ◽

Large Data ◽

Classification Algorithm ◽

Mathematical Algorithm ◽

Wide Range ◽

Predictor Model ◽

Using Data ◽

Learning Software

Machine learning plays a key role in a wide range of applications such as data mining, natural language processing, and expert systems. It provides solutions across domains and supports further development when applied to large data. Supervised learning consists of mathematical algorithms that optimize functions for given inputs. Machine learning solves problems that cannot be solved by numerical values. In this research paper, a model is developed to improve a classification algorithm using the prediction of anxiety in juveniles. Prediction and classification are made using data. A machine learning tool is used for pre-processing. The first level of the model is data preparation, with a ranking prototype used to filter the data. Next, a probabilistic estimation hypothesis finds the hypothesis value based on statistical functions, and an anxiety predictor model is used for prediction and classification. The algorithms are compared and the experiments are run using machine learning software. According to the experimental results, the model is more efficient and accurate than other classification algorithms.

Using of data science in healthcare

Problems of Innovation and Investment Development ◽

10.33813/2224-1213.24.2021.15 ◽

2021 ◽

pp. 149-156

Author(s):

Ihor Ponomarenko ◽

Oleksandra Lubkovska

Keyword(s):

Machine Learning ◽

Health Care ◽

Business Intelligence ◽

Data Science ◽

Large Data ◽

Science Methods ◽

Medical Field ◽

Learning Methods ◽

Machine Learning Methods ◽

Using Data

The subject of the research is the use of data science methods in the field of health care for integrated data processing and analysis in order to optimize economic and specialized processes. The purpose of this article is to address issues related to the specifics of using Data Science methods in health care on the basis of comprehensive information obtained from various sources. Methodology. The research methodology comprises system-structural and comparative analysis (to study the application of BI systems when working with large data sets); the monograph method (to study various software solutions in the business intelligence market); and economic analysis (to assess the possibility of using business intelligence systems to strengthen the competitive position of companies). The scientific novelty concerns the main sources of data on key processes in the medical field. Examples of innovative methods of collecting information in health care, which are becoming widespread in the context of digitalization, are presented. The main sources of health care data used in Data Science are revealed. The specifics of applying machine learning methods in health care, under increasing competition between market participants and increasing demand for relevant products from the population, are presented. Conclusions. The integration of Data Science in the medical field is intensifying because of the increase in digitized data (statistics, textual information, visualizations, etc.). Through the use of machine learning methods, doctors and other health professionals have new opportunities to improve the efficiency of the health care system as a whole. Key words: Data science, efficiency, information, machine learning, medicine, Python, healthcare.

Classification of Fake Product Ratings Using a Timeline Based Approach

International Journal of Business Administration and Management Research ◽

10.24178/ijbamr.2017.3.2.12 ◽

2017 ◽

Vol 3 (2) ◽

pp. 12 ◽

Cited By ~ 1

Author(s):

Neha Thomas ◽

Susan Elias

Keyword(s):

Language Processing ◽

Opinion Mining ◽

Optimal Point ◽

Linear Classifiers ◽

Wide Range ◽

Text Content ◽

Classification Tool ◽

Fake Reviews ◽

Product Ratings

Detection of fake reviews and reviewers is currently a challenging problem in cyberspace. It is challenging primarily because the methods used to fake reviews change dynamically. Several aspects must be considered when analyzing reviews in order to classify them effectively as genuine or fake. Sentiment analysis, opinion mining, and intent mining are fields of research that try to accomplish this goal through Natural Language Processing of the text content of the review. In this paper, an approach that uses the review ratings evaluated along a timeline is presented. An Amazon dataset comprising ratings for a wide range of products was used for the analysis presented here. The analysis of the ratings was carried out for an electronic product over a period of six years. The computed average rating helps to identify linear classifiers that define solution boundaries within the data space. This enables a product-specific classification of review ratings, and suitable recommendations can also be generated automatically. The paper explains a methodology to evaluate the average product ratings over time and presents the research outcomes using a novel classification tool. The proposed approach helps to determine the optimal point to distinguish between fake and genuine ratings for each product. Index Terms: Fake reviews, Fake Ratings, Product Ratings, Online Shopping, Amazon Dataset.
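
As a rough illustration of evaluating ratings along a timeline, the sketch below averages star ratings per year and flags ratings that lie far from that year's average. The abstract does not specify the decision rule, so the function names, the `margin` parameter, and the sample data are all assumptions made for illustration, not the authors' method.

```python
from collections import defaultdict


def yearly_average(ratings):
    """Return {year: mean star rating} for an iterable of (year, stars) pairs."""
    totals = defaultdict(lambda: [0.0, 0])  # year -> [sum of stars, count]
    for year, stars in ratings:
        totals[year][0] += stars
        totals[year][1] += 1
    return {year: total / count for year, (total, count) in totals.items()}


def flag_suspicious(ratings, margin=1.5):
    """Hypothetical rule: flag a rating whose distance from its year's
    average exceeds `margin` stars. This stands in for the linear
    boundary mentioned in the abstract; it is an assumption."""
    means = yearly_average(ratings)
    return [(year, stars) for year, stars in ratings
            if abs(stars - means[year]) > margin]


# Toy data, invented for the example.
sample = [(2012, 4), (2012, 5), (2012, 4), (2012, 1), (2013, 3), (2013, 3)]
print(yearly_average(sample))
print(flag_suspicious(sample))
```

A fixed margin is the simplest possible boundary; a per-product threshold tuned on labeled data would be closer to the "optimal point" the abstract describes.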

A Survey on Intelligence Tools for Data Analytics

Advances in Data Mining and Database Management - Handbook of Research on Engineering, Business, and Healthcare Applications of Data Science and Analytics ◽

10.4018/978-1-7998-3053-5.ch005 ◽

2021 ◽

pp. 73-95

Author(s):

Shatakshi Singh ◽

Kanika Gautam ◽

Prachi Singhal ◽

Sunil Kumar Jangir ◽

Manish Kumar

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Language Processing ◽

Real Life ◽

Learning Tools ◽

The Core ◽

Training Mode ◽

Real Life Situation ◽

Selection Of

The development of artificial intelligence in this decade has been astounding, and machine learning (ML) is one of the core subareas of AI. The ML field is growing incessantly, and its demand and importance are rising with it. ML has transformed the way data is extracted, analyzed, and interpreted. Computers are trained to enter a self-training mode so that, when new data is fed in, they can learn, grow, change, and develop without explicit programming. This helps to make useful predictions that can guide better decisions in real-life situations without human interference. Selecting an ML tool is always a challenging task, since choosing an appropriate tool can save time and make it faster and easier to provide a solution. This chapter provides a classification of various machine learning tools by the following aspects: for non-programmers; for model deployment; for computer vision, natural language processing, and audio; for reinforcement learning; and for data mining.

Using Data Mining to Predict Possible Future Depression Cases

International Journal of Public Health Science (IJPHS) ◽

10.11591/.v3i4.4697 ◽

2014 ◽

Vol 3 (4) ◽

pp. 231

Author(s):

Kevin Daimi ◽

Shadi Banitaan

Keyword(s):

Machine Learning ◽

Data Mining ◽

Synthetic Data ◽

Considerable Difficulty ◽

Symptoms Of Depression ◽

Using Data ◽

Learning Software

Depression is a disorder characterized by misery and gloominess felt over a period of time. Some symptoms of depression overlap with somatic illnesses implying considerable difficulty in diagnosing it. This paper contributes to its diagnosis through the application of data mining, namely classification, to predict patients who will most likely develop depression or are currently suffering from depression. Synthetic data is used for this study. To acquire the results, the popular suite of machine learning software, WEKA, is used.

HOMPer: A new hybrid system for opinion mining in the Persian language

Journal of Information Science ◽

10.1177/0165551519827886 ◽

2019 ◽

Vol 46 (1) ◽

pp. 101-117 ◽

Cited By ~ 3

Author(s):

Mohammad Ehsan Basiri ◽

Arman Kabiri

Keyword(s):

Machine Learning ◽

Language Processing ◽

Opinion Mining ◽

Feature Selection Method ◽

Large Data ◽

Data Set ◽

Persian Language ◽

Rating Prediction ◽

Bayes Algorithm ◽

Component Feature

Opinion mining is a subfield of data mining and natural language processing concerned with extracting users’ opinions and attitudes towards products or services from their comments on the Web. Persian opinion mining, in contrast to its counterpart in English, is a totally new field of study and hence has not received the attention it deserves. Existing methods for opinion mining in the Persian language may be classified into machine learning– and lexicon-based approaches. These methods have been proposed and successfully used for the polarity-detection problem. However, when they are used for more complex tasks such as rating prediction, their results are not satisfactory. In this study, first an exhaustive investigation of machine learning– and lexicon-based methods is performed. Then, a new hybrid method is proposed for the rating-prediction problem in the Persian language. Finally, the effects of the machine learning component, feature-selection method, normalisation method and combination level are investigated. The experimental results on a large data set containing 16,000 Persian customer reviews show that the proposed system achieves higher performance than the Naïve Bayes algorithm and a pure lexicon-based method. Moreover, the results demonstrate that the proposed method may also be used successfully for polarity detection.

Natural language processing and machine learning to enable automatic extraction and classification of patients’ smoking status from electronic medical records

Upsala Journal of Medical Sciences ◽

10.1080/03009734.2020.1792010 ◽

2020 ◽

Vol 125 (4) ◽

pp. 316-324

Author(s):

Andrea Caccamisi ◽

Leif Jørgensen ◽

Hercules Dalianis ◽

Mats Rosenlund

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Natural Language ◽

Electronic Medical Records ◽

Language Processing ◽

Medical Records ◽

Smoking Status ◽

Automatic Extraction

Using Data Science Software to Address Health Disparities

International Journal of Big Data and Analytics in Healthcare ◽

10.4018/ijbdah.20210701.oa4 ◽

2021 ◽

Vol 6 (2) ◽

pp. 45-58

Author(s):

Jose O. Huerta ◽

Gayle L. Prybutok ◽

Victor R. Prybutok

Keyword(s):

Machine Learning ◽

Health Disparities ◽

Data Science ◽

Science Technology ◽

Science Operations ◽

Using Data ◽

Learning Software ◽

Computational Processes

The article assesses data science software to evaluate the usefulness of data science technology in addressing concerns such as health disparities. Data science software was analyzed using KDnuggets data related to analytics, data science, and machine learning software. Data science functionalities include computational processes and frameworks that are relevant for healthcare. This study demonstrates the importance of leading applications for conducting data science operations that can improve care in healthcare networks by addressing such factors as health disparities.

Prediction of Misclassification Data using Cognitive Bayes Computation Techniques (COBACO)

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.c7975.019320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 928-932

Keyword(s):

Machine Learning ◽

Missing Data ◽

Large Data ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Accuracy Rate ◽

Data Set ◽

Predictive Values ◽

Time Operation

Missing data raise major issues for quantitative analysis in large databases. Because of these issues, inference from the computational process produces biased results, data are damaged further, the error rate can increase, and imputation becomes more difficult. Predicting disguised missing data in large data sets is another major problem in real-time operation. Machine learning (ML) techniques are connected with the classification of measurements to improve the accuracy rate of predictive values. These techniques overcome various challenges posed by the problem of losing data. Recent work is based on the prediction of misclassification using a supervised ML approach, which predicts an output for an unseen input with limited parameters in a data set. When the number of parameters increases, the accuracy rate of the outcome decreases. This article presents a new approach, COBACO, an effective supervised machine learning technique. Several strategies describe the classification of predictive techniques for missing data analysis in efficient supervised machine learning techniques. The proposed predictive technique, COBACO, generated more precise and accurate results than the other predictive approaches. The experimental results, obtained using both real and synthetic data sets, show that the proposed approach offers valuable and promising insight into the problem of predicting missing information.

Classification of Photogrammetric and Airborne LiDAR Point Clouds Using Machine Learning Algorithms

Drones ◽

10.3390/drones5040104 ◽

2021 ◽

Vol 5 (4) ◽

pp. 104

Author(s):

Zaide Duran ◽

Kubra Ozcan ◽

Muhammed Enes Atik

Keyword(s):

Machine Learning ◽

Point Cloud ◽

Point Clouds ◽

Machine Learning Algorithms ◽

Airborne Lidar ◽

Aerial Photogrammetry ◽

Feature Spaces ◽

Wide Range ◽

Extract Information

With the development of photogrammetry technologies, point clouds have found a wide range of use in academic and commercial areas. This situation has made it essential to extract information from point clouds. In particular, artificial intelligence applications have been used to extract information from point clouds to complex structures. Point cloud classification is also one of the leading areas where these applications are used. In this study, point clouds of the same region obtained by aerial photogrammetry and by Light Detection and Ranging (LiDAR) technology are classified using machine learning. For this purpose, nine popular machine learning methods have been used. Geometric features obtained from the point clouds were used to build the feature spaces for classification. Color information is also added to these features for the photogrammetric point cloud. According to the LiDAR point cloud results, the highest overall accuracy, 0.96, was obtained with the Multilayer Perceptron (MLP) method, and the lowest overall accuracy, 0.50, was obtained with the AdaBoost method. The method with the highest overall accuracy was the MLP method (0.90). The method with the lowest overall accuracy was the GNB method (0.25).
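
The overall accuracy figures quoted above are the fraction of points whose predicted class matches the reference class. A minimal sketch of that calculation follows; the class names and labels are invented for illustration and do not come from the study.

```python
def overall_accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted label equals the reference label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("at least one sample is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical per-point labels for four points.
truth = ["ground", "building", "vegetation", "ground"]
pred = ["ground", "building", "ground", "ground"]
score = overall_accuracy(truth, pred)
print(score)
```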

Concept of TF-IDF, Common Bag of Word and Word Embedding for Effective Sentiment Classification

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.f4582.049620 ◽

2020 ◽

Vol 9 (4) ◽

pp. 2198-2201

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Sentiment Classification ◽

Word Embedding ◽

Text Representation ◽

Human Beings ◽

Text Data

Sentiment classification is one of the best-known and most popular domains of machine learning and natural language processing. An algorithm is developed to understand the opinion of an entity in a way similar to human beings. This research article presents an algorithm similar to the one mentioned above. Natural language processing concepts are used for text representation, and a novel word embedding model is then proposed for effective classification of the data. TF-IDF and Common BoW representation models were considered for representing the text data. The importance of these models is discussed in the respective sections. The proposed model is tested using IMDB datasets. A split of 50% training and 50% testing, with three random shufflings of the datasets, is used to evaluate the model.
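
To make the two text representations named above concrete, here is a minimal sketch of bag-of-words counts and TF-IDF weights using only the Python standard library. It assumes whitespace tokenization, term frequency normalized by document length, and an unsmoothed `log(N / df)` inverse document frequency; the abstract does not state which variants the authors used, and the toy reviews are invented.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, count vectors) for whitespace-tokenized documents."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Return (vocabulary, TF-IDF vectors) with tf = count / doc length
    and idf = log(N / document frequency). One common variant, assumed here."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Number of documents containing each vocabulary term.
    doc_freq = [sum(1 for vec in counts if vec[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / df) for df in doc_freq]
    weighted = []
    for vec in counts:
        length = sum(vec) or 1  # guard against empty documents
        weighted.append([(c / length) * w for c, w in zip(vec, idf)])
    return vocab, weighted


# Toy reviews, invented for the example.
reviews = ["great movie great cast", "boring movie", "great fun"]
vocab, bow = bag_of_words(reviews)
_, weights = tf_idf(reviews)
print(vocab)
print(bow[0])
```

Bag-of-words keeps only raw counts, while TF-IDF down-weights terms such as "movie" that appear in many documents, which is why the two are often compared as baselines for sentiment classification.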
