Predicting Stroke Risk by Migraine Using AI

Author(s):  
Anchal Singh ◽  
Dr. Surabhi Thorat

Stroke is a blood clot or bleed in the brain that can cause permanent damage affecting mobility, cognition, sight, or communication. It is the second leading cause of death worldwide and one of the most life-threatening diseases for persons above 65 years; it damages the brain much as a heart attack damages the heart. Every four minutes someone dies of stroke, yet up to 80% of strokes could be prevented if the occurrence of stroke were identified or predicted at an early stage. In this paper, different types of machine learning algorithms were applied to stroke prediction on the Healthcare Dataset Stroke data. Four machine learning techniques were applied: Linear Regression, Random Forest Classifier, and Logistic Regression were used to build the stroke prediction model, and confusion matrices were used to evaluate it. Support, precision, recall, and F1-score were used as performance measures of the machine learning models. The results showed that the Random Forest Classifier achieved the best accuracy, at 94% [1].
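A minimal sketch of this setup, assuming a CSV export of the Healthcare Dataset Stroke data with a binary "stroke" column; the file path and column handling are illustrative, not taken from the paper:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

df = pd.read_csv("healthcare-dataset-stroke-data.csv")  # hypothetical path
X = pd.get_dummies(df.drop(columns=["stroke"]))  # one-hot encode categoricals
X = X.fillna(X.median())                         # simple imputation for missing values
y = df["stroke"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# classification_report prints precision, recall, F1-score, and support,
# the same measures the paper uses to compare models.
print(classification_report(y_test, clf.predict(X_test)))
```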

2021 ◽  
Vol 99 (Supplement_3) ◽  
pp. 264-265
Author(s):  
Duy Ngoc Do ◽  
Guoyu Hu ◽  
Younes Miar

Abstract American mink (Neovison vison) is the major source of fur for the fur industry worldwide, and Aleutian disease (AD) causes severe financial losses to the mink industry. Different methods have been used to diagnose AD in mink, but a combination of several methods may be the most appropriate approach for selecting AD-resilient mink. The iodine agglutination test (IAT) and counterimmunoelectrophoresis (CIEP) are commonly employed in test-and-remove strategies, while enzyme-linked immunosorbent assay (ELISA) and packed-cell volume (PCV) methods are complementary. However, using multiple methods is expensive, which hinders the correct use of AD tests in selection. This research presents an assessment of AD classification based on machine learning algorithms. AD was tested in 1,830 individuals using these tests on an AD-positive mink farm (Canadian Centre for Fur Animal Research, NS, Canada). The CIEP classification was predicted from sex information and the IAT, ELISA, and PCV test results using seven machine learning classification algorithms (Random Forest, Artificial Neural Networks, C50Tree, Naive Bayes, Generalized Linear Models, Boost, and Linear Discriminant Analysis) implemented through the caret package in R. The accuracy of prediction varied among the methods. Overall, Random Forest was the best-performing algorithm for the current dataset, with an accuracy of 0.89 on the training data and 0.94 on the testing data. Our work demonstrates the utility and relative ease of using machine learning algorithms to assess the CIEP information, and consequently to reduce the cost of AD testing. However, further work requires the inclusion of production and reproduction information in the models and an extension of phenotypic collection to increase the accuracy of the current methods.
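The study ran its classifiers through R's caret package; the sketch below is an analogous workflow in Python for the best-performing model, predicting CIEP status from sex and the IAT, ELISA, and PCV results. The file name and column names are assumptions for illustration only.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

df = pd.read_csv("mink_ad_tests.csv")  # hypothetical file
X = pd.get_dummies(df[["sex", "IAT", "ELISA", "PCV"]])  # encode categorical columns
y = df["CIEP"]  # positive/negative CIEP result to be predicted

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

rf = RandomForestClassifier(random_state=1).fit(X_train, y_train)
# The paper reports accuracy separately on training and testing data.
print("train accuracy:", accuracy_score(y_train, rf.predict(X_train)))
print("test accuracy: ", accuracy_score(y_test, rf.predict(X_test)))
```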


2020 ◽  
Vol 184 ◽  
pp. 01011
Author(s):  
Sreethi Musunuru ◽  
Mahaalakshmi Mukkamala ◽  
Latha Kunaparaju ◽  
N V Ganapathi Raju

Though banks hold an abundance of data on their customers, it is not unusual for them to track the actions of their creditors regularly, both to improve the services they offer and to understand why many customers choose to exit and shift to other banks. Analyzing customer behavior can be highly beneficial to banks: they can reach out to their customers on a personal level and develop a business model that improves the pricing structure, communication, advertising, and benefits for their customers and themselves. Features such as the amount a customer credits every month, annual salary, and gender are used to classify customers with machine learning algorithms such as the K Neighbors Classifier and Random Forest Classifier. By classifying customers, banks can estimate who will stay with them and who will leave in the near future. Our study aims to remove features that are independent but not influential in determining the customers' future status, without loss of accuracy, and to improve the model to see whether this also increases the accuracy of the results.
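A minimal sketch of this workflow, comparing the two classifiers and then inspecting feature importances as one way to find weakly influential columns; the file path and column names are illustrative:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

df = pd.read_csv("bank_churn.csv")  # hypothetical file
X = pd.get_dummies(df.drop(columns=["exited"]))
y = df["exited"]  # 1 = customer left the bank

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

for name, model in [("KNN", KNeighborsClassifier()),
                    ("RandomForest", RandomForestClassifier(random_state=0))]:
    model.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, model.predict(X_te)))

# Feature importances from the forest suggest which columns might be dropped
# without hurting accuracy; low-importance features are candidates for removal.
rf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print(sorted(zip(rf.feature_importances_, X.columns))[:5])  # least influential
```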


Author(s):  
Ahmed T. Shawky ◽  
Ismail M. Hagag

In today’s world, data mining and classification are considered among the most important techniques, as data is generated by a wide variety of sources. Extracting useful knowledge out of this data is the real challenge, and this paper addresses it by using machine learning classifiers to draw meaningful results. The aim of this research is to design a model that detects diabetes in patients with high accuracy. To this end, five different machine learning classification algorithms are used: Decision Tree, Support Vector Machine (SVM), Random Forest, Naive Bayes, and K-Nearest Neighbor (K-NN), with the purpose of predicting diabetes at an early stage. Finally, we compare the performance of these algorithms, concluding that the K-NN algorithm achieves the best accuracy (81.16%), followed by the Naive Bayes algorithm (76.06%).
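A hedged sketch of this five-algorithm comparison, assuming a Pima-style diabetes CSV with a binary "Outcome" column; the file path and column name are assumptions:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("diabetes.csv")  # hypothetical path
X, y = df.drop(columns=["Outcome"]), df["Outcome"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=7)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=7),
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(random_state=7),
    "Naive Bayes": GaussianNB(),
    "K-NN": KNeighborsClassifier(n_neighbors=5),
}
# Fit each classifier and report held-out accuracy, as in the paper's comparison.
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(f"{name}: {accuracy_score(y_te, model.predict(X_te)):.4f}")
```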


Author(s):  
Pedro Sobreiro ◽  
Pedro Guedes-Carvalho ◽  
Abel Santos ◽  
Paulo Pinheiro ◽  
Celina Gonçalves

The phenomenon of dropout is often found among customers of sports services. In this study we evaluate the performance of machine learning algorithms in predicting dropout using available data about members' historical use of facilities. The data, relating to a sample of 5,209 members of a Portuguese fitness centre, included registration data, payments and frequency, age, sex, non-attendance days, amount billed, average weekly visits, total number of visits, visits hired per week, number of registration renewals, number of member referrals, total monthly registrations, and total member enrolment time, which may be indicative of members' commitment. While the Gradient Boosting Classifier had the best performance in predicting dropout (sensitivity = 0.986), the Random Forest Classifier was the best at predicting non-dropout (specificity = 0.790); the overall performance of the Gradient Boosting Classifier was superior to that of the Random Forest Classifier (accuracy 0.955 against 0.920). The most relevant variables for predicting dropout were "non-attendance days", "total length of stay", and "total amount billed". The use of decision trees provides information that can be readily acted upon to identify the profiles of members at risk of dropout, and also gives guidelines for measures and policies to reduce it.
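A minimal sketch of this comparison, assuming a member-level CSV with a binary "dropout" column; it reports the sensitivity and specificity figures the study uses to contrast the two classifiers. Names and paths are illustrative.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

df = pd.read_csv("fitness_members.csv")  # hypothetical file
X = pd.get_dummies(df.drop(columns=["dropout"]))
y = df["dropout"]  # 1 = member dropped out

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

for name, model in [("GradientBoosting", GradientBoostingClassifier()),
                    ("RandomForest", RandomForestClassifier(random_state=0))]:
    model.fit(X_tr, y_tr)
    tn, fp, fn, tp = confusion_matrix(y_te, model.predict(X_te)).ravel()
    print(name,
          "sensitivity:", tp / (tp + fn),   # dropouts correctly identified
          "specificity:", tn / (tn + fp))   # non-dropouts correctly identified
```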


2021 ◽  
Vol 2021 ◽  
pp. 1-16
Author(s):  
Yogesh Kumar ◽  
Apeksha Koul ◽  
Pushpendra Singh Sisodia ◽  
Jana Shafi ◽  
Verma Kavita ◽  
...  

Quantum-enhanced machine learning plays a vital role in healthcare because of its robust applications in current research, the growth of novel medical trials, patient information and record management, chronic disease detection, and more. For this reason, the healthcare industry is applying quantum computing to sustain patient-oriented attention to healthcare patrons. The present work summarizes recent research progress in quantum-enhanced machine learning and its significance for heart failure detection on a dataset of 14 attributes. In this paper, the number of qubits, in terms of the features of the heart failure data, is normalized using min-max scaling, PCA, and a standard scaler, and further optimized using a pipelining technique. The current work verifies that quantum-enhanced machine learning algorithms such as quantum random forest (QRF), quantum K nearest neighbour (QKNN), quantum decision tree (QDT), and quantum Gaussian Naïve Bayes (QGNB) outperform traditional machine learning algorithms in heart failure detection. The best accuracy rate (0.89) was attained by the quantum random forest classifier, which also produced the best F1 score, recall, and precision, at 0.88, 0.93, and 0.89, respectively. The computation time of traditional and quantum-enhanced machine learning algorithms has also been compared, with the quantum random forest having the shortest execution time, of 150 microseconds. Hence, the work provides a way to quantify the differences between standard and quantum-enhanced machine learning algorithms and to select the optimal method for detecting heart failure.
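The quantum classifiers themselves are beyond a short sketch, but the classical preprocessing the paper describes (min-max scaling, PCA, and standard scaling chained into a pipeline so the 14 attributes fit the available qubits) looks roughly like this in scikit-learn. The qubit count, file path, and column name are assumptions.

```python
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.decomposition import PCA

df = pd.read_csv("heart_failure.csv")  # hypothetical 14-attribute dataset
X = df.drop(columns=["target"])        # assumed label column

n_qubits = 4  # assumed number of available qubits
prep = Pipeline([
    ("minmax", MinMaxScaler()),            # squash features into [0, 1]
    ("pca", PCA(n_components=n_qubits)),   # one principal component per qubit
    ("standard", StandardScaler()),        # re-centre for the quantum feature map
])
X_q = prep.fit_transform(X)
print(X_q.shape)  # (n_samples, n_qubits): ready for an n_qubits-wide circuit
```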


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Prabhdeep Singh ◽  
Rajbir Kaur

Purpose The purpose of this paper is to provide a more accurate structure that allows the estimation of coronavirus (COVID-19) at a very early stage with ultra-low latency. Machine learning algorithms are used to evaluate the past medical details of patients and forecast COVID-19-positive cases, which can aid in lowering costs and distinctly enhance the standard of treatment at hospitals. Design/methodology/approach In this paper, artificial intelligence (AI) and cloud/fog computing are integrated to strengthen COVID-19 patient prediction. A delay-sensitive, efficient framework for the prediction of COVID-19 at an early stage is proposed. A novel similarity measure-based random forest classifier is proposed to increase the efficiency of the framework. Findings The performance of the framework is checked against various quality-of-service parameters such as delay, network usage, RAM usage, and energy consumption, while classification accuracy, recall, precision, the kappa statistic, and root mean square error are used for the proposed classifier. Results show the effectiveness of the proposed framework. Originality/value AI and cloud/fog computing are integrated to strengthen COVID-19 patient prediction. A novel similarity measure-based random forest classifier with more than 80% accuracy is proposed to increase the efficiency of the framework.
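The similarity-measure modification is the paper's own contribution and is not reproduced here; a stock random forest stands in for it in this small sketch of the classifier-side evaluation named above (accuracy, recall, precision, kappa, and RMSE), run on synthetic stand-in data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, recall_score, precision_score,
                             cohen_kappa_score, mean_squared_error)
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, random_state=0)  # stand-in data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

pred = RandomForestClassifier(random_state=0).fit(X_tr, y_tr).predict(X_te)
print("accuracy :", accuracy_score(y_te, pred))
print("recall   :", recall_score(y_te, pred))
print("precision:", precision_score(y_te, pred))
print("kappa    :", cohen_kappa_score(y_te, pred))           # kappa statistic
print("rmse     :", np.sqrt(mean_squared_error(y_te, pred))) # root mean square error
```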


Author(s):  
Sheikh Shehzad Ahmed

The Internet is used practically everywhere in today's digital environment, and with its increased use comes an increase in the number of threats. DDoS attacks are among the most common cyber-attacks today, and with the rapid advancement of technology the harm they cause has grown increasingly severe. Because DDoS attacks can readily change the ports/protocols they use or how they operate, the underlying features of these attacks must be examined. Machine learning approaches have been used extensively in intrusion detection research, but it remains unclear which features are applicable and which approach is better suited for detection. With this in mind, this research presents a machine learning-based DDoS attack detection approach. To train the attack detection model, we employ four machine learning algorithms: Decision Tree classifier (ID3), k-Nearest Neighbors (k-NN), Logistic Regression, and Random Forest classifier. The results of our experiments show that the Random Forest classifier is the most accurate at recognizing attacks.
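A hedged sketch of the four-model comparison, using scikit-learn analogues on a generic flow-feature matrix (the real feature set is dataset-specific); ID3 is approximated by a decision tree with the entropy criterion.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import make_classification

# Stand-in flow features; replace with extracted DDoS traffic features.
X, y = make_classification(n_samples=5000, n_features=20, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

models = {
    "ID3-style tree": DecisionTreeClassifier(criterion="entropy"),
    "k-NN": KNeighborsClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=42),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(f"{name}: {accuracy_score(y_te, model.predict(X_te)):.4f}")
```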


2021 ◽  
Vol 49 (1) ◽  
pp. 225-232
Author(s):  
Dušan Radivojević ◽  
Nikola Mirkov ◽  
Slobodan Maletić

This paper presents two machine learning models that classify time series data from the smartwatch accelerometers of observed subjects. For classification we use Deep Neural Network and Random Forest classifier algorithms. A comparison of the two models shows that they perform similarly in recognizing the subject activities used in the test portion of the dataset. Training accuracy reaches approximately 95% and 100% for the Deep Learning and Random Forest models, respectively. Since validation and recognition accuracy reached only about 81% and 75%, respectively, a tendency for accuracy to improve with the number of participants is considered. The influence of data sample precision on the accuracy of the models is also examined, since the input data may come from various wearable devices.
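A minimal sketch of the Random Forest half of the comparison: fixed-length windows of tri-axial accelerometer samples are flattened into feature rows. The window length, array shapes, and class count are assumptions, not taken from the paper; random stand-in data replaces the smartwatch recordings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
# Stand-in data: 600 windows x 50 samples x 3 axes, 6 activity classes.
windows = rng.normal(size=(600, 50, 3))
labels = rng.integers(0, 6, size=600)

X = windows.reshape(len(windows), -1)  # flatten each window to one feature row
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, random_state=0)

rf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print("test accuracy:", accuracy_score(y_te, rf.predict(X_te)))
```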


2020 ◽  
Author(s):  
Hamida Ilyas ◽  
Sajid Ali ◽  
Mahvish Ponum ◽  
Osman Hasan ◽  
Muhammad Tahir Mahmood

Abstract Chronic Kidney Disease (CKD), i.e., a gradual decrease in renal function spanning several months to years without any major symptoms, is a life-threatening disease. It progresses through six stages according to severity, categorized by the Glomerular Filtration Rate (GFR), which in turn is estimated from several attributes such as age, sex, race, and serum creatinine. Among the multiple available models for estimating GFR, the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) model, a closed-form estimating equation, has been found to be quite efficient because it allows detecting all CKD stages, from the early stage to the last stage of kidney failure. Early detection and cure of CKD are extremely desirable, as they can prevent unwanted consequences. Machine learning techniques have recently been extensively advocated for the early detection of symptoms and the diagnosis of several diseases. With the same motivation, the aim of this study is to predict the various stages of CKD using machine learning classification algorithms on a dataset obtained from the medical records of affected people. In particular, we used the Random Forest and J48 algorithms to obtain a sustainable and practicable model for detecting the various stages of CKD with comprehensive medical accuracy. Comparative analysis of the results revealed that J48 predicted CKD across all stages better than Random Forest, with an 85.5% accuracy. Given this improved performance, J48 may be used to build an automated system for detecting the severity of CKD.
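For illustration, the widely published 2009 CKD-EPI creatinine equation is sketched below; the abstract does not state which variant the authors used, so treat these coefficients as the standard published ones rather than the paper's own.

```python
def ckd_epi_2009(scr_mg_dl: float, age: int, female: bool, black: bool) -> float:
    """Estimated GFR in mL/min/1.73 m^2 from serum creatinine (mg/dL),
    per the published 2009 CKD-EPI creatinine equation."""
    kappa = 0.7 if female else 0.9    # sex-specific creatinine threshold
    alpha = -0.329 if female else -0.411
    egfr = (141
            * min(scr_mg_dl / kappa, 1.0) ** alpha
            * max(scr_mg_dl / kappa, 1.0) ** -1.209
            * 0.993 ** age)
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr

# e.g. a 60-year-old non-Black woman with serum creatinine of 1.1 mg/dL
print(round(ckd_epi_2009(1.1, 60, female=True, black=False), 1))
```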


2018 ◽  
Vol 10 (1) ◽  
Author(s):  
Qiaochu Chen ◽  
Lauren E Charles

Objective: The objective is to develop an ensemble of machine learning algorithms to identify multilingual, online articles that are relevant to biosurveillance. Language morphology varies widely across languages and must be accounted for when designing algorithms. Here, we compare the performance of a word embedding-based approach and a topic modeling approach combined with machine learning algorithms to determine the best method for the Chinese, Arabic, and French languages.

Introduction: Global biosurveillance is an extremely important, yet challenging task. One form of global biosurveillance comes from harvesting open source online data (e.g., news, blogs, reports, RSS feeds). The information derived from this data can be used for the timely detection and identification of biological threats all over the world. However, the more inclusive the data harvesting procedure is, to ensure that all potentially relevant articles are collected, the more irrelevant data is also harvested. The issue becomes even more complex when the online data is in a non-native language. Foreign language articles not only create language-specific issues for Natural Language Processing (NLP), but also add significant translation costs. Previous work shows success in the use of combinatory monolingual classifiers in specific applications, e.g., the legal domain [1]. A critical component of a comprehensive, online-harvesting biosurveillance system is the capability to separate relevant foreign language articles from irrelevant ones based on the initial article information collected, without the additional cost of full-text retrieval and translation.

Methods: The analysis text dataset contains the title and brief description of 3,506 online articles in the Chinese, Arabic, and French languages from the date range August 17, 2016 to July 5, 2017. The NLP article pre-processing steps are language-specific tokenization and stop-word removal. We compare two different approaches: word embeddings and topic modeling (Fig. 1). For word embeddings, we first generate word vectors for the data using a pretrained Word2Vec (W2V) model [2]. Subsequently, the word vectors within a document are averaged to produce a single feature vector for the document. Then, we fit a machine learning algorithm (random forest classifier or Support Vector Machine (SVM)) to the training vectors and obtain predictions for the test documents. For topic modeling, we used a Latent Dirichlet Allocation (LDA) model to generate five topics for all relevant documents [3]. For each new document, the output is the probability measure of the document belonging to these five topics. Here, we classify the new document by comparing the probability measure with a relevancy threshold.

Results: The Word2Vec model combined with a random forest classifier outperformed the other approaches across the three languages (Fig. 2); the Chinese model has an 89% F1-score, the Arabic model 86%, and the French model 94%. To decrease the chance of labeling a potentially relevant article irrelevant, high recall was more important than high precision. In the Chinese model, the Word2Vec with random forest approach had the highest recall, at 98% (Table 1).

Conclusions: We present research findings on different approaches to identifying biosurveillance relevance in non-English texts and identify the best-performing methods for implementation in a biosurveillance online article harvesting system. Our initial results suggest that the word embeddings model has an advantage over topic modeling, and that the random forest classifier outperforms the SVM. Future work will aim to further expand the list of languages and methods to be compared, e.g., n-grams and non-negative matrix factorization. In addition, we will fine-tune the Arabic and French models for better accuracy.
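A minimal sketch of the winning pipeline (averaged Word2Vec vectors fed to a random forest), assuming a pretrained vector file and pre-tokenized documents; the file path, toy documents, and variable names are illustrative.

```python
import numpy as np
from gensim.models import KeyedVectors
from sklearn.ensemble import RandomForestClassifier

# Pretrained word vectors for the target language (hypothetical file).
kv = KeyedVectors.load_word2vec_format("pretrained.vec.bin", binary=True)

def doc_vector(tokens):
    """Average the word vectors of in-vocabulary tokens (zeros if none)."""
    vecs = [kv[t] for t in tokens if t in kv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(kv.vector_size)

# train_docs: token lists per article; train_labels: 1 = relevant to biosurveillance
train_docs = [["outbreak", "reported"], ["market", "prices"]]  # toy examples
train_labels = [1, 0]

X = np.vstack([doc_vector(d) for d in train_docs])
clf = RandomForestClassifier(random_state=0).fit(X, train_labels)
print(clf.predict(doc_vector(["disease", "outbreak"]).reshape(1, -1)))
```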

