Child Behavioral Analysis: Machine Learning based Investigation for Autism Screening and Early Diagnosis

Autism is a developmental disorder that affects a person's cognitive, social and behavioural functioning. A person affected by autism spectrum disorder exhibits peculiar behaviours, and these symptoms begin in childhood. Early diagnosis of autism is an important and challenging task. Behavioural analysis, a well-known therapeutic practice, can be adopted for earlier diagnosis of autism. Machine learning is a computational methodology that can be applied to a wide range of applications in order to obtain efficient outputs; at present it is especially applied in medical applications such as disease prediction. In our study we evaluated several machine learning algorithms (Naive Bayes (NB), Support Vector Machines (SVM) and k-Nearest Neighbours (KNN)) with k-fold cross-validation on 3 datasets retrieved from the UCI repository. Additionally, we validated the accuracy of the estimated results using a clustered cross-validation strategy. Employing clustered cross-validation scrutinises which parameters contribute most in the dataset. The strategy involves hyperparameter tuning and yields trustworthy results because it entails double validation. Applying clustered cross-validation to an SVM-based model, we obtained an accuracy of 99.6% on the autism child dataset.
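The k-fold evaluation mentioned above can be sketched as follows. This is a minimal, standard-library illustration that assumes a plain k-NN classifier and two synthetic clusters; it does not reproduce the authors' clustered cross-validation, whose details the abstract does not give, and all function names are hypothetical.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points closest to x (squared Euclidean)."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 once and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cross_val_accuracy(X, y, k=5, n_neighbors=3, seed=0):
    """Mean test accuracy over k folds; every sample is tested exactly once."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in test_set]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(knn_predict(train_X, train_y, X[i], n_neighbors) == y[i]
                      for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Hypothetical toy data: two well-separated clusters, not the UCI autism data.
rng = random.Random(1)
X = [(rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(20)] + \
    [(rng.uniform(4, 5), rng.uniform(4, 5)) for _ in range(20)]
y = [0] * 20 + [1] * 20
print(cross_val_accuracy(X, y, k=5))
```

Dealing shuffled indices with `idx[f::k]` keeps fold sizes within one sample of each other; a stratified variant would deal each class separately.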

Autism Spectrum Disorder Detection with Machine Learning Methods

Current Psychiatry Research and Reviews ◽

10.2174/2666082215666191111121115 ◽

2020 ◽

Vol 15 (4) ◽

pp. 297-308 ◽

Cited By ~ 3

Author(s):

Uğur Erkan ◽

Dang N.H. Thanh

Keyword(s):

Machine Learning ◽

Early Diagnosis ◽

Autism Spectrum ◽

Spectrum Disorder ◽

Support Vector ◽

Classification Methods ◽

Brain Images ◽

Machine Method ◽

Nearest Neighbours ◽

Number Of Patients

Background: Autistic Spectrum Disorder (ASD) is a disorder with genetic and neurological components that leads to difficulties in social interaction and communication. According to WHO statistics, the number of patients diagnosed with ASD is gradually increasing. Most current studies focus on clinical diagnosis, data collection and brain image analysis, but not on machine learning-based diagnosis of ASD. Objective: This study aims to classify ASD data to provide a quick, accessible and easy way to support early diagnosis of ASD. Methods: Three ASD datasets are used, covering children, adolescents and adults. To classify the ASD data, we used the k-Nearest Neighbours method (kNN), the Support Vector Machine method (SVM) and the Random Forests method (RF). In our experiments, the data were randomly split into training and test sets, and this random split was repeated 100 times to test the classification methods. Results: The final results were assessed by their average values. SVM and RF are shown to be effective methods for ASD classification; in particular, RF classified the data with an accuracy of 100% on all three datasets. Conclusion: The early diagnosis of ASD is critical. If the number of data samples is large enough, machine learning-based ASD diagnosis can achieve high accuracy. Among the three classification methods, RF achieves the best performance for ASD data classification.
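The repeated random-split protocol (100 independent train/test splits, results averaged) can be sketched as below. It is a minimal illustration under stated assumptions: a 1-nearest-neighbour classifier stands in for kNN, SVM and RF, the 30% test fraction is hypothetical because the abstract does not state the split ratio, and the data are synthetic.

```python
import random
from statistics import mean, stdev


def one_nn_predict(train_X, train_y, x):
    """Label of the single closest training point (squared Euclidean distance)."""
    best = min(range(len(train_X)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return train_y[best]


def repeated_holdout(X, y, n_repeats=100, test_fraction=0.3, seed=0):
    """Test accuracy of 1-NN on each of n_repeats independent random splits."""
    rng = random.Random(seed)
    n_test = max(1, int(round(test_fraction * len(X))))
    accuracies = []
    for _ in range(n_repeats):
        idx = list(range(len(X)))
        rng.shuffle(idx)                      # a fresh random split every repetition
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(one_nn_predict(train_X, train_y, X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / n_test)
    return accuracies


# Hypothetical data: two Gaussian clusters.
rng = random.Random(2)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(30)] + \
    [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(30)]
y = [0] * 30 + [1] * 30
scores = repeated_holdout(X, y)
print(f"mean accuracy {mean(scores):.3f} +/- {stdev(scores):.3f} over {len(scores)} splits")
```

Reporting the spread alongside the mean shows how much a single lucky or unlucky split could have moved the result.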

Research on Parallel Support Vector Machine Based on Spark Big Data Platform

Scientific Programming ◽

10.1155/2021/7998417 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Yao Huimin

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Big Data ◽

Support Vector Machines ◽

Cross Validation ◽

Machine Learning Algorithms ◽

Support Vector ◽

Lambda Architecture ◽

Vector Machines ◽

Data Platform

With the development of cloud computing and distributed cluster technology, the concept of big data has been expanded and extended in terms of capacity and value, and machine learning technology has also received unprecedented attention in recent years. Traditional machine learning algorithms cannot be parallelized effectively, so a parallelized support vector machine based on the Spark big data platform is proposed. Firstly, the big data platform is designed with a Lambda architecture, which is divided into three layers: Batch Layer, Serving Layer, and Speed Layer. Secondly, to improve the training efficiency of support vector machines on large-scale data, the “special points” other than support vectors are also considered when merging two support vector machines, namely the non-support vectors of one subset that violate the training result of the other subset, and a cross-validation merging algorithm is proposed. Then, a parallelized support vector machine based on cross-validation is proposed, and the parallelization of the support vector machine is implemented on the Spark platform. Finally, experiments on different datasets verify the effectiveness and stability of the proposed method. Experimental results show that the proposed parallelized support vector machine performs well in speed-up ratio, training time, and prediction accuracy.
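The merge step described above, keeping the support vectors of each subset plus its “special points”, can be sketched on a single machine. This is a hedged illustration, not the paper's Spark implementation: it assumes a linear SVM trained with the Pegasos stochastic sub-gradient method, reads “violate the training result” as falling inside the other model's margin (y·f(x) < 1), and uses hypothetical function names and synthetic data.

```python
import random

TOL = 1e-9


def decision(w, x):
    """Linear decision function f(x) = w . x (no bias term in this sketch)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(points, lam=0.1, n_iter=2000, seed=0):
    """Pegasos-style training; points is a list of (x, y) with y in {-1, +1}."""
    rng = random.Random(seed)
    w = [0.0] * len(points[0][0])
    for t in range(1, n_iter + 1):
        x, y = points[rng.randrange(len(points))]
        eta = 1.0 / (lam * t)                      # Pegasos step size
        margin = y * decision(w, x)                # evaluated before the update
        w = [(1.0 - eta * lam) * wi for wi in w]   # regularization (shrink) step
        if margin < 1.0:                           # hinge loss is active
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def support_vectors(points, w):
    """Points on or inside the margin (y * f(x) <= 1), which determine the SVM."""
    return [(x, y) for x, y in points if y * decision(w, x) <= 1.0 + TOL]


def special_points(points, own_w, other_w):
    """Non-support vectors of this subset that violate the other subset's margin."""
    return [(x, y) for x, y in points
            if y * decision(own_w, x) > 1.0 + TOL      # not a support vector here
            and y * decision(other_w, x) < 1.0]        # but inside the other margin


def merge_svms(subset_a, subset_b, **svm_kwargs):
    """Train per subset, then retrain on support vectors plus special points."""
    w_a = train_linear_svm(subset_a, **svm_kwargs)
    w_b = train_linear_svm(subset_b, **svm_kwargs)
    merged = (support_vectors(subset_a, w_a) + support_vectors(subset_b, w_b)
              + special_points(subset_a, w_a, w_b) + special_points(subset_b, w_b, w_a))
    if not merged:                 # degenerate case: fall back to all points
        merged = subset_a + subset_b
    return train_linear_svm(merged, **svm_kwargs), merged


rng = random.Random(3)


def sample(center, label, n):
    """n points drawn uniformly from a 2x2 square around (center, center)."""
    return [((center + rng.uniform(-1, 1), center + rng.uniform(-1, 1)), label)
            for _ in range(n)]


data = sample(2.0, +1, 20) + sample(-2.0, -1, 20)
rng.shuffle(data)
subset_a, subset_b = data[:20], data[20:]   # stand-ins for two data partitions
w, merged = merge_svms(subset_a, subset_b, lam=0.1)
accuracy = sum((decision(w, x) > 0) == (y > 0) for x, y in data) / len(data)
print(f"{len(merged)} points kept for the merged model, training accuracy {accuracy:.2f}")
```

The merged training set is usually much smaller than the union of both partitions, which is where the speed-up of this style of merging comes from; in a distributed setting each partition's model would be trained in parallel.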

Clinical Annotation Research Kit (CLARK): Computable Phenotyping Using Machine Learning

JMIR Medical Informatics ◽

10.2196/16042 ◽

2020 ◽

Vol 8 (1) ◽

pp. e16042

Author(s):

Emily R Pfaff ◽

Miles Crosskey ◽

Kenneth Morton ◽

Ashok Krishnamurthy

Keyword(s):

Machine Learning ◽

Clinical Features ◽

Language Processing ◽

Predictive Value ◽

Cross Validation ◽

Uterine Fibroids ◽

Machine Learning Algorithms ◽

Support Vector ◽

Nonalcoholic Fatty Liver ◽

Clinical Annotation

Computable phenotypes are algorithms that translate clinical features into code that can be run against electronic health record (EHR) data to define patient cohorts. However, computable phenotypes that only make use of structured EHR data do not capture the full richness of a patient’s medical record. While natural language processing (NLP) methods have shown success in extracting clinical features from text, the use of such tools has generally been limited to research groups with substantial NLP expertise. Our goal was to develop an open-source phenotyping software, Clinical Annotation Research Kit (CLARK), that would enable clinical and translational researchers to use machine learning–based NLP for computable phenotyping without requiring deep informatics expertise. CLARK enables nonexpert users to mine text using machine learning classifiers by specifying features for the software to match in clinical notes. Once the features are defined, the user-friendly CLARK interface allows the user to choose from a variety of standard machine learning algorithms (linear support vector machine, Gaussian Naïve Bayes, decision tree, and random forest), cross-validation methods, and the number of folds (cross-validation splits) to be used in evaluation of the classifier. Example phenotypes where CLARK has been applied include pediatric diabetes (sensitivity=0.91; specificity=0.98), symptomatic uterine fibroids (positive predictive value=0.81; negative predictive value=0.54), nonalcoholic fatty liver disease (sensitivity=0.90; specificity=0.94), and primary ciliary dyskinesia (sensitivity=0.88; specificity=1.0). In each of these use cases, CLARK allowed investigators to incorporate variables into their phenotype algorithm that would not be available as structured data. Moreover, the fact that nonexpert users can get started with machine learning–based NLP with limited informatics involvement is a significant improvement over the status quo. 
We hope to disseminate CLARK to other organizations that may not have NLP or machine learning specialists available, enabling wider use of these methods.
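The evaluation figures quoted above (sensitivity, specificity, positive and negative predictive value) follow the standard confusion-matrix definitions. A minimal sketch with hypothetical labels, not CLARK's own code, shows how each is computed:

```python
def phenotype_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV and NPV for binary labels (1 = phenotype present)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

    def ratio(num, den):
        return num / den if den else float("nan")   # undefined when the denominator is 0

    return {
        "sensitivity": ratio(tp, tp + fn),  # share of true cases that are flagged
        "specificity": ratio(tn, tn + fp),  # share of true non-cases that are cleared
        "ppv": ratio(tp, tp + fp),          # share of positive calls that are correct
        "npv": ratio(tn, tn + fn),          # share of negative calls that are correct
    }


# Hypothetical chart-review labels versus classifier output.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(phenotype_metrics(y_true, y_pred))
```

Here 3 true positives, 1 false negative, 5 true negatives and 1 false positive give sensitivity 0.75 and specificity 5/6; PPV and NPV also depend on how common the phenotype is in the cohort, which is why a use case may report them instead.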

Assessment of Machine Learning Algorithms for Prediction of Breast Cancer Malignancy Based on Mammogram Numeric Data

10.1101/2020.01.08.20016949 ◽

2020 ◽

Cited By ~ 1

Author(s):

Peter T. Habib ◽

Alsamman M. Alsamman ◽

Sameh E. Hassnein ◽

Ghada A. Shereif ◽

Aladdin Hamwieh

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Cross Validation ◽

Mean Squared Error ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Adjusted Rand Index ◽

Support Vector ◽

Cancer Information ◽

Term Care

Abstract: In 2019 an estimated 268,600 new cases were reported; breast cancer is one of the most common cancers and one of the world’s leading causes of death for women. Classification and data mining are efficient ways to classify information, particularly in the medical field, where prediction techniques are commonly used for early detection and effective treatment in diagnosis and research. This paper tests 23 of the more widely used machine learning algorithms, such as Decision Tree, Random Forest, K-Nearest Neighbors and Support Vector Machine, on mammogram-derived breast cancer data. The results are obtained from randomly split, repeated 10-fold cross-validation. Performance was measured with regression metrics such as Mean Absolute Error, Mean Squared Error and R² score, and with clustering metrics such as Adjusted Rand Index, homogeneity and V-measure; accuracy was also checked with F-measure, AUC and cross-validation. Proper identification of patients with breast cancer creates care opportunities; for example, supervision and the implementation of intervention plans could benefit the quality of long-term care. Experimental results reveal that the maximum precision of 100%, with the lowest error rate, is obtained with the AdaBoost classifier.
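The regression-style metrics listed above behave differently from accuracy-type metrics when applied to 0/1 class labels. A minimal sketch with hypothetical labels computes MAE, MSE, R² and F-measure by their standard definitions; function names are illustrative only.

```python
def regression_style_metrics(y_true, y_pred):
    """MAE, MSE and R^2 computed directly on numeric (here 0/1) labels."""
    n = len(y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    r2 = 1 - ss_res / ss_tot if ss_tot else float("nan")
    return mae, mse, r2


def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive (malignant) class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


# Hypothetical labels: 1 = malignant, 0 = benign; two of eight predictions are wrong.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
mae, mse, r2 = regression_style_metrics(y_true, y_pred)
print(mae, mse, r2, f_measure(y_true, y_pred))
```

On 0/1 labels MAE and MSE both reduce to the error rate (0.25 here), and R² comes out as 0.0 even though 75% of the predictions are correct, so these numbers are best read alongside F-measure and AUC.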

Efficient Machine Learning Techniques to Diagnose and Predict Alzheimer’s disease

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.c6508.029320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 3953-3960

Keyword(s):

Machine Learning ◽

Alzheimer's Disease ◽

Early Diagnosis ◽

Image Data ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

Learning Mechanisms ◽

Learning Techniques

Recent research in computational engineering has produced numerous intelligent models that analyze medical data and derive inferences for early diagnosis and prediction of disease severity. In this context, predicting and diagnosing fatal neurodegenerative diseases in the dementia class from medical image data is considered a challenging area of research. Alzheimer’s disease is now considered a major category of dementia affecting a large population. Despite the development of numerous machine learning models for early diagnosis of Alzheimer’s disease, considerable scope for research remains. Addressing this, the article presents a systematic literature review of machine learning techniques developed for early diagnosis of Alzheimer’s disease. It also covers major categories of machine learning algorithms, including artificial neural networks, support vector machines and deep learning-based ensemble models, to help budding researchers explore the scope of research in predicting Alzheimer’s disease. Implementation results present a comparative analysis of state-of-the-art machine learning mechanisms.

An efficient hybrid system for anomaly detection in social networks

Cybersecurity ◽

10.1186/s42400-021-00074-w ◽

2021 ◽

Vol 4 (1) ◽

Author(s):

Md. Shafiur Rahman ◽

Sajal Halder ◽

Md. Ashraf Uddin ◽

Uzzal Kumar Acharjee

Keyword(s):

Machine Learning ◽

Social Networks ◽

Social Network ◽

Anomaly Detection ◽

Research Area ◽

Machine Learning Algorithms ◽

Security And Privacy ◽

Support Vector ◽

The Social ◽

Wide Range

Abstract: Anomaly detection has been an essential and dynamic research area in data mining. A wide range of applications, including various social media platforms, have adopted state-of-the-art methods to identify anomalies and ensure users’ security and privacy. A social network is a forum used by different groups of people to express their thoughts, communicate with each other, and share content. Social networks also facilitate abnormal activities such as spreading fake news, rumours, misinformation, unsolicited messages and propaganda, and posting malicious links. Therefore, anomaly detection is an important data analysis activity for identifying normal and abnormal users on social networks. In this paper, we have developed a hybrid anomaly detection method named DT-SVMNB that cascades several machine learning algorithms, including a decision tree (C5.0), a Support Vector Machine (SVM) and a Naïve Bayesian classifier (NBC), to classify users in social networks as normal or abnormal. We extracted a list of unique features derived from users’ profiles and content. The proposed machine learning model, DT-SVMNB, is trained on two kinds of datasets using the selected features. Our model classifies users in the social network as depressed or suicidal. We evaluated the model on synthetic and real social network datasets. The performance analysis shows around 98% accuracy, demonstrating the effectiveness and efficiency of the proposed system.
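The abstract does not say how DT-SVMNB combines its three stages, so the following is only one common way to cascade classifiers: each stage answers only when its confidence reaches a threshold, otherwise it defers to the next. The three stage functions, their features (posts per day, fraction of posts with links), weights and thresholds are all hypothetical stand-ins for the decision tree, SVM and Naïve Bayes components.

```python
from typing import Callable, List, Sequence, Tuple

# A stage maps a feature vector to (label, confidence in [0, 1]).
Stage = Callable[[Sequence[float]], Tuple[str, float]]


def cascade_predict(stages: List[Stage], x: Sequence[float],
                    threshold: float = 0.8) -> Tuple[str, int]:
    """Return (label, index of deciding stage); the last stage always decides."""
    if not stages:
        raise ValueError("cascade needs at least one stage")
    for i, stage in enumerate(stages):
        label, confidence = stage(x)
        if confidence >= threshold or i == len(stages) - 1:
            return label, i
    raise AssertionError("unreachable")


def rule_stage(x):
    # Hypothetical tree-like rule standing in for C5.0.
    posts_per_day, link_ratio = x
    if posts_per_day > 50 and link_ratio > 0.8:
        return "abnormal", 0.95
    return "normal", 0.5            # not confident: defer to the next stage


def margin_stage(x):
    # Hypothetical linear scorer standing in for an SVM; confidence grows with |score|.
    score = 0.05 * x[0] + 2.0 * x[1] - 2.0
    return ("abnormal" if score > 0 else "normal"), min(1.0, abs(score))


def fallback_stage(x):
    # Hypothetical final stage standing in for a Naive Bayes classifier.
    return ("abnormal" if x[1] > 0.5 else "normal"), 0.6


stages = [rule_stage, margin_stage, fallback_stage]
for user in [(80, 0.9), (10, 0.1), (20, 0.6)]:
    print(user, cascade_predict(stages, user))
```

Placing a cheap, interpretable stage first lets clear-cut accounts be settled early, while ambiguous ones reach the later, more expensive models.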

Using Machine Learning to Classify Music Genre

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.38365 ◽

2021 ◽

Vol 9 (10) ◽

pp. 39-44

Author(s):

Rachaell Nihalaani

Keyword(s):

Machine Learning ◽

Machine Learning Algorithms ◽

Support Vector ◽

Art Form ◽

Genre Classification ◽

Machine Learning Model ◽

Nearest Neighbours ◽

Music Genre ◽

The Mind ◽

Music Genre Classification

Abstract: As Plato once rightfully said, ‘Music gives a soul to the universe, wings to the mind, flight to the imagination and life to everything.’ Music has always been an important art form, and more so in today’s science-driven world. Music genre classification paves the way for other applications such as music recommender models. Several approaches could be used to classify music genres. In this work, we aimed to build a machine learning model that classifies the genre of an input audio file using 8 machine learning algorithms, and to determine which algorithm is best suited for genre classification. We obtained an accuracy of 91% using the XGBoost algorithm. Keywords: Machine Learning, Music Genre Classification, Decision Trees, K Nearest Neighbours, Logistic regression, Naïve Bayes, Neural Networks, Random Forest, Support Vector Machine, XGBoost
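Comparing several algorithms and keeping the most accurate one can be sketched as a small model-selection loop. This is a minimal illustration under assumptions: three simple classifiers (majority baseline, nearest centroid, 1-NN) stand in for the eight algorithms, and the two-dimensional “genre” features are synthetic rather than features extracted from audio.

```python
import random
from collections import Counter


def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def majority_classifier(train_X, train_y):
    label = Counter(train_y).most_common(1)[0][0]
    return lambda x: label                     # baseline: always the commonest genre


def nearest_centroid_classifier(train_X, train_y):
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return lambda x: min(centroids, key=lambda c: sq_dist(centroids[c], x))


def one_nn_classifier(train_X, train_y):
    def predict(x):
        i = min(range(len(train_X)), key=lambda j: sq_dist(train_X[j], x))
        return train_y[i]
    return predict


def select_best(candidates, train_X, train_y, test_X, test_y):
    """Fit every candidate, score it on held-out data, return the most accurate."""
    results = {}
    for name, fit in candidates.items():
        predict = fit(train_X, train_y)
        results[name] = sum(predict(x) == y for x, y in zip(test_X, test_y)) / len(test_y)
    return max(results, key=results.get), results


rng = random.Random(4)
centers = {"genre_a": (0, 0), "genre_b": (10, 0), "genre_c": (0, 10)}


def draw(n):
    X, y = [], []
    for label, (cx, cy) in centers.items():
        for _ in range(n):
            X.append((cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1)))
            y.append(label)
    return X, y


train_X, train_y = draw(15)
test_X, test_y = draw(5)
candidates = {"majority": majority_classifier,
              "nearest_centroid": nearest_centroid_classifier,
              "1nn": one_nn_classifier}
best, results = select_best(candidates, train_X, train_y, test_X, test_y)
print(best, results)
```

Including a majority-class baseline makes it obvious how much each real model adds; with real data the selection should be made on a validation split or by cross-validation so that the final test score stays unbiased.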

Detecting Spam Messages in Twitter Data by Machine learning Algorithms using Cross Validation

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.k1913.1081219 ◽

2019 ◽

Vol 8 (12) ◽

pp. 2941-2946

Keyword(s):

Machine Learning ◽

Social Media ◽

Cross Validation ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Human Relations ◽

Detection Model ◽

Social Media Networks ◽

Twitter Data

Nowadays, human relationships are maintained through social media networks, and traditional relationships are now obsolete. We use social networking sites to maintain associations, share ideas and exchange knowledge. Social networking sites such as Twitter, Facebook and LinkedIn are available in the communication environment. On Twitter, users share their opinions, interests and knowledge with others through messages. At the same time, some users mislead genuine users. Genuine users are also called solicited users, and those who mislead them are called spammers. Spammers post unwanted information to non-spam users, who may retweet it to others and follow the spammers. To filter these spam messages, we propose a methodology based on machine learning algorithms that uses a set of content-based features. In the spam detection model we used the Support Vector Machine (SVM) algorithm and the Naive Bayes classification algorithm. To measure the performance of our model, we used precision, recall and F-measure.
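The Naive Bayes half of such a spam detector, together with the precision, recall and F-measure evaluation, can be sketched as below. This is a minimal illustration under assumptions: a multinomial Naive Bayes with add-one smoothing over lower-cased whitespace tokens, six hypothetical training tweets, and no SVM component; the paper's actual content-based features are not given in the abstract.

```python
import math
from collections import Counter


def train_nb(docs, labels, alpha=1.0):
    """Multinomial Naive Bayes with add-alpha smoothing on whitespace tokens."""
    vocab = {w for d in docs for w in d.lower().split()}
    priors, likelihoods = {}, {}
    for c in sorted(set(labels)):
        class_docs = [d for d, l in zip(docs, labels) if l == c]
        priors[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(w for d in class_docs for w in d.lower().split())
        total = sum(counts.values())
        likelihoods[c] = {w: math.log((counts[w] + alpha) / (total + alpha * len(vocab)))
                          for w in vocab}
    return priors, likelihoods


def predict_nb(model, doc):
    priors, likelihoods = model

    def log_score(c):
        # Tokens never seen in training are ignored.
        return priors[c] + sum(likelihoods[c][w] for w in doc.lower().split()
                               if w in likelihoods[c])
    return max(priors, key=log_score)


def precision_recall_f(y_true, y_pred, positive="spam"):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


# Hypothetical tweets.
train_docs = ["win free followers now click link", "free iphone click here now",
              "cheap followers buy now link", "great talk at the conference today",
              "reading a good book this weekend", "meeting friends for coffee today"]
train_labels = ["spam"] * 3 + ["ham"] * 3
test_docs = ["click link for free followers", "coffee with friends this weekend",
             "free coffee today"]
test_labels = ["spam", "ham", "spam"]

model = train_nb(train_docs, train_labels)
preds = [predict_nb(model, d) for d in test_docs]
print(preds, precision_recall_f(test_labels, preds))
```

The third test tweet is labelled spam but its tokens look like ordinary chatter, so it is missed: precision stays at 1.0 while recall drops to 0.5, which is exactly the trade-off the F-measure summarises.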

Interpolation of Instantaneous Air Temperature Using Geographical and MODIS Derived Variables with Machine Learning Techniques

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi8090382 ◽

2019 ◽

Vol 8 (9) ◽

pp. 382 ◽

Cited By ~ 2

Author(s):

Marcos Ruiz-Álvarez ◽

Francisco Alonso-Sarria ◽

Francisco Gomariz-Castillo

Keyword(s):

Machine Learning ◽

Random Forest ◽

Linear Regression ◽

Multiple Linear Regression ◽

Air Temperature ◽

Cross Validation ◽

Daily Basis ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector

Several methods have been tried to estimate air temperature using satellite imagery. In this paper, the results of two machine learning algorithms, Support Vector Machines and Random Forest, are compared with Multiple Linear Regression and Ordinary kriging. Several geographic, remote sensing and time variables are used as predictors. The validation is carried out using two different approaches, a leave-one-out cross-validation in the spatial domain and a spatio-temporal k-block cross-validation, and four different statistics on a daily basis, allowing the use of ANOVA to compare the results. The main conclusion is that Random Forest produces the best results (R² = 0.888 ± 0.026, root mean square error = 3.01 ± 0.325 using k-block cross-validation). The regression methods (Support Vector Machine, Random Forest and Multiple Linear Regression) are calibrated with MODIS data and several predictors easily calculated from a Digital Elevation Model. The most important variables in the Random Forest model were satellite temperature, potential irradiation and cdayt, a cosine transformation of the Julian day.
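The key property of k-block cross-validation is that whole blocks, rather than individual observations, are held out, so spatially or temporally correlated neighbours cannot leak between training and test sets. A minimal sketch follows; the block definition (station × period) and the layout of 4 stations, 3 periods and 2 samples per cell are hypothetical, since the abstract does not describe how the blocks were built.

```python
import random
from collections import defaultdict


def k_block_folds(block_ids, k, seed=0):
    """Yield (train_idx, test_idx) pairs in which every block is held out whole.

    block_ids[i] is the block of sample i (e.g. a station, or a station-period cell).
    Blocks, not samples, are shuffled and dealt into k folds.
    """
    members = defaultdict(list)
    for i, b in enumerate(block_ids):
        members[b].append(i)
    blocks = sorted(members)
    random.Random(seed).shuffle(blocks)
    for f in range(k):
        test_blocks = set(blocks[f::k])
        test_idx = [i for b in test_blocks for i in members[b]]
        train_idx = [i for b in blocks if b not in test_blocks for i in members[b]]
        yield sorted(train_idx), sorted(test_idx)


# Hypothetical layout: 4 stations x 3 periods, 2 observations per cell.
block_ids = [(station, period) for station in "ABCD" for period in range(3) for _ in range(2)]
folds = list(k_block_folds(block_ids, k=4))            # spatio-temporal blocks
station_ids = [station for station, _ in block_ids]
loso = list(k_block_folds(station_ids, k=4))            # leave-one-station-out
for train_idx, test_idx in folds:
    print(len(train_idx), len(test_idx))
```

Using the station alone as the block with k equal to the number of stations turns the same function into the spatial leave-one-out scheme, while a station-by-period block gives a spatio-temporal split.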

A Comparative Analysis and Predicting for Breast Cancer Detection Based on Data Mining Models

Asian Journal of Research in Computer Science ◽

10.9734/ajrcos/2021/v8i430209 ◽

2021 ◽

pp. 45-59

Author(s):

Shler Farhad Khorshid ◽

Adnan Mohsin Abdulazeez ◽

Amira Bibo Sallow

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Data Mining ◽

Nearest Neighbors ◽

Performance Comparison ◽

Machine Learning Algorithms ◽

Support Vector ◽

K Nearest Neighbors ◽

Data Set ◽

Wide Range

Breast cancer is one of the most common diseases among women, accounting for many deaths each year. Even though cancer can be treated and cured in its early stages, many patients are diagnosed at a late stage. Data mining is the process of finding or extracting information from massive databases or datasets, and it is a field of computer science with a lot of potential. It covers a wide range of areas, one of which is classification. Classification can be accomplished using a variety of methods or algorithms. Five classification algorithms were compared with the aid of MATLAB. This paper presents a performance comparison among the classifiers Support Vector Machine (SVM), Logistic Regression (LR), K-Nearest Neighbors (K-NN), Weighted K-Nearest Neighbors (Weighted K-NN), and Gaussian Naïve Bayes (Gaussian NB). The data set was taken from the UCI Machine Learning Repository. The main objective of this study is to classify breast cancer in women by applying machine learning algorithms and to compare them by accuracy. The results revealed that Weighted K-NN (96.7%) has the highest accuracy among all the classifiers.
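Weighted K-NN differs from plain K-NN in that closer neighbours get larger votes. A minimal Python sketch follows (the study itself used MATLAB); the 1/d weighting and the one-dimensional toy data are assumptions for illustration, not the study's configuration.

```python
import math
from collections import defaultdict


def weighted_knn_predict(train_X, train_y, x, k=5, eps=1e-12):
    """Distance-weighted k-NN: each of the k nearest neighbours votes with weight 1/d."""
    nearest = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))[:k]
    votes = defaultdict(float)
    for d, label in nearest:
        votes[label] += 1.0 / (d + eps)    # eps avoids division by zero on exact matches
    return max(votes, key=votes.get)


# Hypothetical one-feature data.
train_X = [(0.4,), (1.5,), (1.6,), (3.0,)]
train_y = ["benign", "malignant", "malignant", "malignant"]
print(weighted_knn_predict(train_X, train_y, (0.5,), k=3))
print(weighted_knn_predict(train_X, train_y, (2.0,), k=3))
```

For the query 0.5 the three nearest neighbours are one benign point at distance 0.1 and two malignant points at distances 1.0 and 1.1; an unweighted vote would say malignant, but the weighted vote (about 10 against 1.9) says benign.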
