Using Machine Learning Approaches to Explore Non-Cognitive Variables Influencing Reading Proficiency in English among Filipino Learners

Allan B. I. Bernardo; Macario O. Cordel; Rochelle Irene G. Lucas; Jude Michael M. Teves; Sashmir A. Yap; Unisse C. Chua

doi:10.3390/educsci11100628

Education Sciences ◽

10.3390/educsci11100628 ◽

2021 ◽

Vol 11 (10) ◽

pp. 628

Author(s):

Allan B. I. Bernardo ◽

Macario O. Cordel ◽

Rochelle Irene G. Lucas ◽

Jude Michael M. Teves ◽

Sashmir A. Yap ◽

...

Keyword(s):

Machine Learning ◽

School Environment ◽

Binary Classification ◽

Reading Proficiency ◽

Classification Model ◽

Support Vector ◽

Test Accuracy ◽

Learning Approaches ◽

Classification Methods ◽

Poor Reading

Filipino students ranked last in reading proficiency among all countries/territories in PISA 2018, with only 19% meeting the minimum (Level 2) standard. It is imperative to understand the range of factors that contribute to low reading proficiency, specifically variables that can be the target of interventions to help students with poor reading proficiency. We used machine learning approaches, specifically binary classification methods, to identify the variables that best predict low (Level 1b and lower) vs. higher (Level 1a or better) reading proficiency using the Philippine PISA data from a nationally representative sample of 15-year-old students. Several binary classification methods were applied, and the best classification model was derived using support vector machines (SVM), with 81.2% average test accuracy. The 20 variables with the highest impact in the model were identified and interpreted using a socioecological perspective of development and learning. These variables included students’ home-related resources and socioeconomic constraints, learning motivation and mindsets, classroom reading experiences with teachers, reading self-beliefs, attitudes, and experiences, and social experiences in the school environment. The results were discussed with reference to the need for a systems perspective to address poor proficiency, requiring interconnected interventions that go beyond students’ classroom reading.
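
The binary target and the "average test accuracy" figure described in this abstract can be illustrated with a small sketch. It is not the authors' pipeline: the ordered list of PISA reading levels, the helper names `is_low_proficiency` and `mean_accuracy`, and the toy fold results are assumptions made only for this example.

```python
from statistics import mean

# PISA 2018 reading levels from lowest to highest (ordering assumed for this sketch).
LEVELS = ["below 1c", "1c", "1b", "1a", "2", "3", "4", "5", "6"]


def is_low_proficiency(level: str) -> bool:
    """True for Level 1b and lower, False for Level 1a or better."""
    return LEVELS.index(level) <= LEVELS.index("1b")


def accuracy(y_true, y_pred) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def mean_accuracy(folds) -> float:
    """Average test accuracy over (y_true, y_pred) pairs, one pair per held-out split."""
    return mean(accuracy(t, p) for t, p in folds)


# Hypothetical students and hypothetical predictions on two held-out splits.
labels = [is_low_proficiency(lv) for lv in ["1c", "1b", "1a", "2"]]
folds = [
    ([True, True, False, False], [True, False, False, False]),  # 3 of 4 correct
    ([True, False, False, True], [True, False, False, True]),   # 4 of 4 correct
]
avg = mean_accuracy(folds)
```

Here `avg` is the mean of the per-split accuracies, which is one common reading of "average test accuracy"; the abstract does not state how the splits were formed.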

Evaluation Of Machine Learning Tools For Distinguishing Fraud From Error

Journal of Business & Economics Research (JBER) ◽

10.19030/jber.v11i9.8067 ◽

2013 ◽

Vol 11 (9) ◽

pp. 393

Author(s):

Mei Zhang

Keyword(s):

Machine Learning ◽

Binary Classification ◽

Classification Problem ◽

General Purpose ◽

Machine Learning Techniques ◽

Support Vector ◽

Learning Approaches ◽

Learning Tools ◽

Pure Chance ◽

Error Classification

Fraud and error are two underlying sources of misstated financial statements. Modern machine learning techniques provide a potential way to distinguish the two in such statements. In this paper, a thorough evaluation is conducted of how off-the-shelf machine learning tools perform for fraud/error classification. In particular, the task is treated as a standard binary classification problem, i.e., mapping from an input vector of financial indices to a class label that is either error or fraud. With a real dataset of financial restatements, this study empirically evaluates and analyzes five state-of-the-art classifiers: logistic regression, artificial neural networks, support vector machines, decision trees, and bagging. There are several important observations from the experimental results. First, bagging performs the best among these commonly used general-purpose machine learning tools. Second, the results show that the underlying relationship between the statement indices and the fraud/error decision is likely to be non-linear. Third, it is very challenging to distinguish error from fraud, and general machine learning approaches, though they perform better than pure chance, leave much room for improvement. The results suggest that more advanced or task-specific solutions are needed for fraud/error classification.
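
For readers unfamiliar with bagging, the idea can be sketched in a few lines: train many base classifiers on bootstrap resamples of the training set and combine them by majority vote. The sketch below is a simplified illustration, not the study's setup. It assumes a single numeric feature instead of a vector of financial indices, a one-threshold "stump" as the base learner, and made-up toy data; `fit_stump` and `fit_bagging` are hypothetical names.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Fit a one-feature threshold rule that minimizes training errors."""
    best = None
    for threshold in sorted(set(xs)):
        for positive_above in (True, False):
            preds = [int((x > threshold) == positive_above) for x in xs]
            errors = sum(p != y for p, y in zip(preds, ys))
            if best is None or errors < best[0]:
                best = (errors, threshold, positive_above)
    _, threshold, positive_above = best
    return lambda x: int((x > threshold) == positive_above)


def fit_bagging(xs, ys, n_estimators=25, seed=0):
    """Train stumps on bootstrap resamples and predict by majority vote."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_estimators):
        # Bootstrap sample: draw n indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict


# Toy data (assumed): label 0 could stand for "error", 1 for "fraud".
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
predict = fit_bagging(xs, ys)
low_pred, high_pred = predict(0.05), predict(0.95)
```

An odd number of estimators is used so that a binary majority vote cannot tie.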

Artificial Intelligence, Big data and Machine Learning approaches in Precision Medicine & Drug Discovery

Current Drug Targets ◽

10.2174/1389450122999210104205732 ◽

2021 ◽

Vol 22 ◽

Author(s):

Anuraj Nayarisseri ◽

Ravina Khandelwal ◽

Poonam Tanwar ◽

Maddala Madhavi ◽

Diksha Sharma ◽

...

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Big Data ◽

Drug Discovery ◽

Virtual Screening ◽

Biologically Active ◽

Classification Model ◽

Support Vector ◽

Learning Approaches ◽

Qsar Modeling

Artificial intelligence is revolutionizing the drug development process by quickly identifying potential biologically active compounds from millions of candidates. This review gives an overview of applications of machine learning based tools such as GOLD, DeepPVP, and LIBSVM, and of the algorithms involved, such as support vector machine (SVM), random forest (RF), decision trees, and artificial neural networks (ANN), in the various stages of drug design and development. These techniques can be employed in SNP discovery, drug repurposing, ligand-based drug design (LBDD), ligand-based virtual screening (LBVS), structure-based virtual screening (SBVS), lead identification, quantitative structure-activity relationship (QSAR) modeling, and ADMET analysis. SVM is reported to have exhibited better performance, indicating that the classification model will have great applications in human intestinal absorption (HIA) prediction. Successful cases have been reported that demonstrate the efficiency of SVM and RF models in identifying JFD00950 as a novel compound targeting a colon cancer cell line, DLD-1, by inhibition of FEN1 cytotoxic and cleavage activity. Furthermore, a QSAR model using ANN was also used to predict flavonoid inhibitory effects on AR activity as a potent treatment for diabetes mellitus (DM). Hence, in the era of big data, ML approaches have evolved as a powerful and efficient way to deal with the huge amounts of data generated by modern drug discovery, in order to model small-molecule drugs and gene biomarkers and to identify novel drug targets for various diseases.

Application of Machine Learning Approaches for the Design and Study of Anticancer Drugs

Current Drug Targets ◽

10.2174/1389450119666180809122244 ◽

2019 ◽

Vol 20 (5) ◽

pp. 488-500 ◽

Cited By ~ 6

Author(s):

Yan Hu ◽

Yi Lu ◽

Shuo Wang ◽

Mengying Zhang ◽

Xiaosheng Qu ◽

...

Keyword(s):

Machine Learning ◽

Drug Design ◽

Anticancer Drugs ◽

Nearest Neighbor ◽

Cost Effective ◽

Support Vector ◽

Learning Approaches ◽

K Nearest Neighbor ◽

Activity Prediction ◽

Linear Discriminant

Background: Globally, the number of cancer patients and cancer deaths continues to increase yearly, and cancer has therefore become one of the world's leading causes of morbidity and mortality. In recent years, the study of anticancer drugs has become one of the most popular medical topics. Objective: In this review, in order to study the application of machine learning in predicting anticancer drug activity, several machine learning approaches, namely Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA), Support Vector Machine (SVM), Random Forest (RF), k-Nearest Neighbor (kNN), and Naïve Bayes (NB), were selected, and examples of their applications in anticancer drug design are listed. Results: Machine learning contributes a great deal to anticancer drug design and helps researchers by saving time and cost. However, it can only be an assisting tool for drug design. Conclusion: This paper introduces the application of machine learning approaches in anticancer drug design. Many examples of successful identification and prediction in the area of anticancer drug activity are discussed, and anticancer drug research is still actively progressing. Moreover, the merits of some web servers related to anticancer drugs are mentioned.

Machine Learning Methods Applied to the Prediction of Pseudo-nitzschia spp. Blooms in the Galician Rias Baixas (NW Spain)

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10040199 ◽

2021 ◽

Vol 10 (4) ◽

pp. 199

Author(s):

Francisco M. Bellas Aláez ◽

Jesus M. Torres Palenzuela ◽

Evangelos Spyrakos ◽

Luis González Vilas

Keyword(s):

Machine Learning ◽

Performance Metrics ◽

Prediction Models ◽

Support Vector ◽

False Alarms ◽

Learning Approaches ◽

Learning Methods ◽

Machine Learning Methods ◽

Rías Baixas ◽

New Algorithms

This work presents new prediction models based on recent developments in machine learning methods, such as Random Forest (RF) and AdaBoost, and compares them with more classical approaches, i.e., support vector machines (SVMs) and neural networks (NNs). The models predict Pseudo-nitzschia spp. blooms in the Galician Rias Baixas. This work builds on a previous study by the authors (doi.org/10.1016/j.pocean.2014.03.003) but uses an extended database (from 2002 to 2012) and new algorithms. Our results show that RF and AdaBoost provide better prediction results compared to SVMs and NNs, as they show improved performance metrics and a better balance between sensitivity and specificity. Classical machine learning approaches show higher sensitivities, but at a cost of lower specificity and higher percentages of false alarms (lower precision). These results seem to indicate a greater adaptation of new algorithms (RF and AdaBoost) to unbalanced datasets. Our models could be operationally implemented to establish a short-term prediction system.
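
The trade-off this abstract describes (higher sensitivity bought with lower specificity and more false alarms, i.e., lower precision) follows directly from the confusion-matrix definitions. The sketch below computes the three metrics for two hypothetical classifiers on a made-up unbalanced sample; the labels, the predictions, and the names `alarm_prone` and `cautious` are assumptions for illustration and do not reproduce the paper's results.

```python
def metrics(y_true, y_pred):
    """Sensitivity, specificity, and precision for binary labels (1 = bloom)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,  # blooms that were caught
        "specificity": tn / (tn + fp) if tn + fp else 0.0,  # non-blooms left alone
        "precision": tp / (tp + fp) if tp + fp else 0.0,    # alarms that were real
    }


# Unbalanced toy sample: 2 bloom events among 10 observations.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
alarm_prone = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # catches every bloom, 3 false alarms
cautious = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]     # misses a bloom, 1 false alarm

alarm_prone_scores = metrics(y_true, alarm_prone)
cautious_scores = metrics(y_true, cautious)
```

With so few positive cases, a handful of false alarms is enough to pull precision down sharply even when sensitivity is perfect, which is why accuracy alone is a poor guide on unbalanced data.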

Practical CO2—WAG Field Operational Designs Using Hybrid Numerical-Machine-Learning Approaches

Energies ◽

10.3390/en14041055 ◽

2021 ◽

Vol 14 (4) ◽

pp. 1055

Author(s):

Qian Sun ◽

William Ampomah ◽

Junyu You ◽

Martha Cather ◽

Robert Balch

Keyword(s):

Machine Learning ◽

Oil Recovery ◽

History Matching ◽

Optimization Problems ◽

Learning Technologies ◽

Petroleum Engineering ◽

Support Vector ◽

Learning Approaches ◽

Field Development ◽

Proxy Models

Machine-learning technologies have proven capable of solving many petroleum engineering problems. Their accurate predictions and fast computation make feasible a large volume of otherwise time-consuming engineering processes, such as history matching and field development optimization. The Southwest Regional Partnership on Carbon Sequestration (SWP) project requires rigorous history-matching and multi-objective optimization processes, which play to the strengths of machine-learning approaches. Although machine-learning proxy models are trained and validated before being applied to practical problems, their error margin inevitably introduces uncertainty into the results. In this paper, a hybrid numerical machine-learning workflow for solving various optimization problems is presented. By coupling expert machine-learning proxies with a global optimizer, the workflow successfully solves the history-matching and CO2 water-alternating-gas (WAG) design problems with low computational overhead. The history-matching work considers the heterogeneities of multiphase relative characteristics, and the CO2-WAG injection design takes multiple techno-economic objective functions into account. This work trained an expert response surface, a support vector machine, and a multi-layer neural network as proxy models to effectively learn the high-dimensional nonlinear data structure. The proposed workflow suggests revisiting the high-fidelity numerical simulator for validation purposes. The experience gained from this work should provide valuable guidance for similar CO2 enhanced oil recovery (EOR) projects.

Binary Classification Model Based on Machine Learning Algorithm for the Short-Circuit Detection in Power System

Proceedings of the 2019 2nd International Conference on Algorithms, Computing and Artificial Intelligence ◽

10.1145/3377713.3377753 ◽

2019 ◽

Author(s):

Qiwei Lu ◽

Jinpei Cheng ◽

Dianlin Guo ◽

Mengmeng Su ◽

Xuewei Wu ◽

...

Keyword(s):

Machine Learning ◽

Power System ◽

Learning Algorithm ◽

Binary Classification ◽

Short Circuit ◽

Classification Model ◽

Machine Learning Algorithm ◽

Model Based

Detection of Malicious Software by Analyzing Distinct Artifacts Using Machine Learning and Deep Learning Algorithms

Electronics ◽

10.3390/electronics10141694 ◽

2021 ◽

Vol 10 (14) ◽

pp. 1694

Author(s):

Mathew Ashik ◽

A. Jyothish ◽

S. Anandaram ◽

P. Vinod ◽

Francesco Mercaldo ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Deep Learning ◽

Support Vector ◽

Malware Analysis ◽

Learning Approaches ◽

Dynamic Features ◽

System Calls ◽

Prevention Methods ◽

Structural Aspects

Malware is one of the most significant threats in today’s computing world, since the number of websites distributing malware is increasing at a rapid rate. Malware analysis and prevention methods are increasingly becoming necessary for computer systems connected to the Internet. Malware exploits the system’s vulnerabilities to steal valuable information without the user’s knowledge and stealthily sends it to remote servers controlled by attackers. Traditionally, anti-malware products use signatures for detecting known malware. However, the signature-based method does not scale in detecting obfuscated and packed malware. Because the cause of a problem is often best understood by studying the structural aspects of a program, in this paper we investigate the relevance of features of unpacked malicious and benign executables, such as mnemonics, instruction opcodes, and API calls, to identify a feature that classifies the executable. Prominent features are extracted using Minimum Redundancy and Maximum Relevance (mRMR) and Analysis of Variance (ANOVA). Experiments were conducted on four datasets using machine learning and deep learning approaches such as Support Vector Machine (SVM), Naïve Bayes, J48, Random Forest (RF), and XGBoost. In addition, we evaluate the performance of a collection of deep neural networks, namely a Deep Dense network, a One-Dimensional Convolutional Neural Network (1D-CNN), and a CNN-LSTM, in classifying unknown samples, and we observed promising results using APIs and system calls. On combining APIs/system calls with static features, a marginal performance improvement was attained compared with models trained only on dynamic features. Moreover, to improve accuracy, we implemented our solution using distinct deep learning methods and demonstrated a fine-tuned deep neural network that resulted in an F1-score of 99.1% and 98.48% on Dataset-2 and Dataset-3, respectively.

Analysis of the Nosema Cells Identification for Microscopic Images

Sensors ◽

10.3390/s21093068 ◽

2021 ◽

Vol 21 (9) ◽

pp. 3068

Author(s):

Soumaya Dghim ◽

Carlos M. Travieso-González ◽

Radim Burget

Keyword(s):

Neural Network ◽

Machine Learning ◽

Image Processing ◽

Deep Learning ◽

The Other ◽

Support Vector ◽

Learning Approaches ◽

Microscopic Images ◽

Trained Neural Network ◽

Nosema Disease

The use of image processing tools, machine learning, and deep learning approaches has become very useful and robust in recent years. This paper introduces the detection of the Nosema disease, which is considered to be one of the most economically significant diseases today. This work shows a solution for recognizing and identifying Nosema cells among the other objects in a microscopic image. Two main strategies are examined. The first strategy uses image processing tools to extract the most valuable information and features from the dataset of microscopic images. Then, machine learning methods, such as an artificial neural network (ANN) and a support vector machine (SVM), are applied to detect and classify the Nosema disease cells. The second strategy explores deep learning and transfer learning. Several approaches were examined, including a convolutional neural network (CNN) classifier and several methods of transfer learning (AlexNet, VGG-16 and VGG-19), which were fine-tuned and applied to the object sub-images in order to distinguish the Nosema images from the other object images. The best accuracy, 96.25%, was reached by the VGG-16 pre-trained neural network.

A comparison study: Support vector machines for binary classification in machine learning

2011 4th International Conference on Biomedical Engineering and Informatics (BMEI) ◽

10.1109/bmei.2011.6098517 ◽

2011 ◽

Cited By ~ 4

Author(s):

Wencai Zeng ◽

Jiong Jia ◽

Zhonglong Zheng ◽

Chenmao Xie ◽

Li Guo

Keyword(s):

Machine Learning ◽

Support Vector Machines ◽

Binary Classification ◽

Support Vector ◽

Comparison Study ◽

Vector Machines ◽

Study Support

A Two-Stage Machine Learning Classification Approach to Identify Extremism in Arabic Opinions

International Journal of Advanced Trends in Computer Science and Engineering ◽

10.30534/ijatcse/2021/391022021 ◽

2021 ◽

Vol 10 (2) ◽

pp. 736-745

Keyword(s):

Machine Learning ◽

Binary Classification ◽

Feature Selection Method ◽

Support Vector ◽

Two Stage ◽

Machine Learning Classification ◽

Second Stage ◽

Testing Data ◽

Stage Classification ◽

Positive Dataset

The increased use of the Internet and social networks has enabled people to express their views, which have attracted increasing attention lately. Sentiment Analysis (SA) techniques are used to determine the polarity of information, either positive or negative, toward a given topic, including opinions. In this research, we introduce a machine learning approach based on Support Vector Machine (SVM), Naïve Bayes (NB) and Random Forest (RF) classifiers to find and classify extreme opinions in Arabic reviews. To achieve this, a dataset of 1500 Arabic reviews was collected from Google Play Store. A two-stage classification process was then applied to classify the reviews. In the first stage, we built a binary classifier to separate positive from negative reviews. In the second stage, we applied a binary classification mechanism based on a set of proposed rules that distinguishes extreme positive from positive reviews, and extreme negative from negative reviews. Four major experiments were conducted, with a total of 10 sub-experiments, to fulfill the two-stage process using different cross-validation schemes and the Term Frequency-Inverse Document Frequency (TF-IDF) feature selection method. The results indicated that SVM performed best in the first-stage classification with 30% testing data, and NB performed best with 20% testing data. The second-stage results indicated that SVM performed better in identifying extreme positive reviews in the positive dataset, with an overall accuracy of 68.7%, and NB performed better in identifying extreme negative reviews in the negative dataset, with an overall accuracy of 72.8%.
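
The control flow of the two-stage process in this abstract can be sketched as follows. Everything concrete here is a stand-in: the abstract does not give the trained classifiers or the proposed rules, so the word lists, the English toy reviews, and the functions `toy_polarity` and `toy_is_extreme` are hypothetical placeholders that only show how a stage-one polarity decision feeds a stage-two "extreme or not" decision.

```python
from typing import Callable

# Hypothetical word lists, used only to make the sketch runnable.
POSITIVE_WORDS = {"good", "great", "love"}
NEGATIVE_WORDS = {"bad", "awful", "hate"}
INTENSIFIERS = {"very", "extremely", "absolutely"}


def toy_polarity(review: str) -> str:
    """Stage-one stand-in: a trained binary classifier would go here."""
    tokens = review.lower().split()
    score = sum(w in POSITIVE_WORDS for w in tokens) - sum(w in NEGATIVE_WORDS for w in tokens)
    return "positive" if score >= 0 else "negative"


def toy_is_extreme(review: str) -> bool:
    """Stage-two stand-in: a rule set or second classifier would go here."""
    return any(w in INTENSIFIERS for w in review.lower().split())


def two_stage_classify(
    review: str,
    polarity_clf: Callable[[str], str] = toy_polarity,
    is_extreme_positive: Callable[[str], bool] = toy_is_extreme,
    is_extreme_negative: Callable[[str], bool] = toy_is_extreme,
) -> str:
    """Stage 1 picks the polarity; stage 2 refines it within that polarity only."""
    if polarity_clf(review) == "positive":
        return "extreme positive" if is_extreme_positive(review) else "positive"
    return "extreme negative" if is_extreme_negative(review) else "negative"


reviews = ["good app", "absolutely love it", "bad update", "extremely awful app"]
results = [two_stage_classify(r) for r in reviews]
```

Keeping separate stage-two callables for the positive and negative branches mirrors the abstract's report that different classifiers did best on the positive and negative datasets.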