Using Machine Learning Approaches to Explore Non-Cognitive Variables Influencing Reading Proficiency in English among Filipino Learners

2021 ◽  
Vol 11 (10) ◽  
pp. 628
Author(s):  
Allan B. I. Bernardo ◽  
Macario O. Cordel ◽  
Rochelle Irene G. Lucas ◽  
Jude Michael M. Teves ◽  
Sashmir A. Yap ◽  
...  

Filipino students ranked last in reading proficiency among all countries/territories in the PISA 2018, with only 19% meeting the minimum (Level 2) standard. It is imperative to understand the range of factors that contribute to low reading proficiency, specifically variables that can be the target of interventions to help students with poor reading proficiency. We used machine learning approaches, specifically binary classification methods, to identify the variables that best predict low (Level 1b and lower) vs. higher (Level 1a or better) reading proficiency using the Philippine PISA data from a nationally representative sample of 15-year-old students. Several binary classification methods were applied, and the best classification model was derived using support vector machines (SVM), with 81.2% average test accuracy. The 20 variables with the highest impact in the model were identified and interpreted using a socioecological perspective of development and learning. These variables included students’ home-related resources and socioeconomic constraints, learning motivation and mindsets, classroom reading experiences with teachers, reading self-beliefs, attitudes, and experiences, and social experiences in the school environment. The results were discussed with reference to the need for a systems perspective to addresses poor proficiency, requiring interconnected interventions that go beyond students’ classroom reading.

2013 ◽  
Vol 11 (9) ◽  
pp. 393
Author(s):  
Mei Zhang

<p>Fraud and error are two underlying sources of misstated financial statements. Modern machine learning techniques provide a potential direction to distinguish the two factors in such statements. In this paper, a thorough evaluation is conducted evaluation on how the off-the-shelf machine learning tools perform for fraud/error classification. In particular, the task is treated as a standard binary classification problem; i.e., mapping from an input vector of financial indices to a class label which is either error or fraud. With a real dataset of financial restatements, this study empirically evaluates and analyzes five state-of-the-art classifiers, including logistic regression, artificial neural network, support vector machines, decision trees, and bagging. There are several important observations from the experimental results. First, it is observed that bagging performs the best among these commonly used general purpose machine learning tools. Second, the results show that the underlying relationship from the statement indices to the fraud/error decision is likely to be non-linear. Third, it is very challenging to distinguish error from fraud, and general machine learning approaches, though perform better than pure chance, leave much room for improvement. The results suggest that more advanced or task-specific solutions are needed for fraud/error classification.</p>


2021 ◽  
Vol 22 ◽  
Author(s):  
Anuraj Nayarisseri ◽  
Ravina Khandelwal ◽  
Poonam Tanwar ◽  
Maddala Madhavi ◽  
Diksha Sharma ◽  
...  

Abstract: Artificial Intelligence revolutionizes the drug development process that can quickly identify potential biologically active compounds from millions of candidate within a short span of time. The present review is an overview based on some applications of Machine Learning based tools such as GOLD, DeepPVP, LIBSVM, etc and the algorithms involved such as support vector machine (SVM), random forest (RF), decision trees and artificial neural networks (ANN) etc in the various stages of drug designing and development. These techniques can be employed in SNP discoveries, drug repurposing, ligand-based drug design (LBDD), Ligand-based Virtual Screening (LBVS) and Structure-based virtual screening (SBVS), Lead identification, quantitative structure-activity relationship (QSAR) modeling, and ADMET analysis. It is demonstrated that SVM exhibited better performance in indicating that the classification model will have great applications on human intesti-nal absorption (HIA) predictions. Successful cases have been reported which demonstrate the efficiency of SVM and RF model in identifying JFD00950 as a novel compound targeting against a colon cancer cell line, DLD-1 by inhibition of FEN1 cytotoxic and cleavage activity. Furthermore, a QSAR model was also used to predicts flavonoid inhibitory effects on AR activity as a potent treatment for diabetes mellitus (DM), using ANN. Hence, in the era of big data, ML approaches evolved as a powerful and efficient way to deal with the huge amounts of generated data from modern drug discovery in order to model small-molecule drugs, Gene Biomarkers, and identifying the novel drug targets for various diseases.


2019 ◽  
Vol 20 (5) ◽  
pp. 488-500 ◽  
Author(s):  
Yan Hu ◽  
Yi Lu ◽  
Shuo Wang ◽  
Mengying Zhang ◽  
Xiaosheng Qu ◽  
...  

Background: Globally the number of cancer patients and deaths are continuing to increase yearly, and cancer has, therefore, become one of the world&#039;s highest causes of morbidity and mortality. In recent years, the study of anticancer drugs has become one of the most popular medical topics. </P><P> Objective: In this review, in order to study the application of machine learning in predicting anticancer drugs activity, some machine learning approaches such as Linear Discriminant Analysis (LDA), Principal components analysis (PCA), Support Vector Machine (SVM), Random forest (RF), k-Nearest Neighbor (kNN), and Naïve Bayes (NB) were selected, and the examples of their applications in anticancer drugs design are listed. </P><P> Results: Machine learning contributes a lot to anticancer drugs design and helps researchers by saving time and is cost effective. However, it can only be an assisting tool for drug design. </P><P> Conclusion: This paper introduces the application of machine learning approaches in anticancer drug design. Many examples of success in identification and prediction in the area of anticancer drugs activity prediction are discussed, and the anticancer drugs research is still in active progress. Moreover, the merits of some web servers related to anticancer drugs are mentioned.


2021 ◽  
Vol 10 (4) ◽  
pp. 199
Author(s):  
Francisco M. Bellas Aláez ◽  
Jesus M. Torres Palenzuela ◽  
Evangelos Spyrakos ◽  
Luis González Vilas

This work presents new prediction models based on recent developments in machine learning methods, such as Random Forest (RF) and AdaBoost, and compares them with more classical approaches, i.e., support vector machines (SVMs) and neural networks (NNs). The models predict Pseudo-nitzschia spp. blooms in the Galician Rias Baixas. This work builds on a previous study by the authors (doi.org/10.1016/j.pocean.2014.03.003) but uses an extended database (from 2002 to 2012) and new algorithms. Our results show that RF and AdaBoost provide better prediction results compared to SVMs and NNs, as they show improved performance metrics and a better balance between sensitivity and specificity. Classical machine learning approaches show higher sensitivities, but at a cost of lower specificity and higher percentages of false alarms (lower precision). These results seem to indicate a greater adaptation of new algorithms (RF and AdaBoost) to unbalanced datasets. Our models could be operationally implemented to establish a short-term prediction system.


Energies ◽  
2021 ◽  
Vol 14 (4) ◽  
pp. 1055
Author(s):  
Qian Sun ◽  
William Ampomah ◽  
Junyu You ◽  
Martha Cather ◽  
Robert Balch

Machine-learning technologies have exhibited robust competences in solving many petroleum engineering problems. The accurate predictivity and fast computational speed enable a large volume of time-consuming engineering processes such as history-matching and field development optimization. The Southwest Regional Partnership on Carbon Sequestration (SWP) project desires rigorous history-matching and multi-objective optimization processes, which fits the superiorities of the machine-learning approaches. Although the machine-learning proxy models are trained and validated before imposing to solve practical problems, the error margin would essentially introduce uncertainties to the results. In this paper, a hybrid numerical machine-learning workflow solving various optimization problems is presented. By coupling the expert machine-learning proxies with a global optimizer, the workflow successfully solves the history-matching and CO2 water alternative gas (WAG) design problem with low computational overheads. The history-matching work considers the heterogeneities of multiphase relative characteristics, and the CO2-WAG injection design takes multiple techno-economic objective functions into accounts. This work trained an expert response surface, a support vector machine, and a multi-layer neural network as proxy models to effectively learn the high-dimensional nonlinear data structure. The proposed workflow suggests revisiting the high-fidelity numerical simulator for validation purposes. The experience gained from this work would provide valuable guiding insights to similar CO2 enhanced oil recovery (EOR) projects.


Electronics ◽  
2021 ◽  
Vol 10 (14) ◽  
pp. 1694
Author(s):  
Mathew Ashik ◽  
A. Jyothish ◽  
S. Anandaram ◽  
P. Vinod ◽  
Francesco Mercaldo ◽  
...  

Malware is one of the most significant threats in today’s computing world since the number of websites distributing malware is increasing at a rapid rate. Malware analysis and prevention methods are increasingly becoming necessary for computer systems connected to the Internet. This software exploits the system’s vulnerabilities to steal valuable information without the user’s knowledge, and stealthily send it to remote servers controlled by attackers. Traditionally, anti-malware products use signatures for detecting known malware. However, the signature-based method does not scale in detecting obfuscated and packed malware. Considering that the cause of a problem is often best understood by studying the structural aspects of a program like the mnemonics, instruction opcode, API Call, etc. In this paper, we investigate the relevance of the features of unpacked malicious and benign executables like mnemonics, instruction opcodes, and API to identify a feature that classifies the executable. Prominent features are extracted using Minimum Redundancy and Maximum Relevance (mRMR) and Analysis of Variance (ANOVA). Experiments were conducted on four datasets using machine learning and deep learning approaches such as Support Vector Machine (SVM), Naïve Bayes, J48, Random Forest (RF), and XGBoost. In addition, we also evaluate the performance of the collection of deep neural networks like Deep Dense network, One-Dimensional Convolutional Neural Network (1D-CNN), and CNN-LSTM in classifying unknown samples, and we observed promising results using APIs and system calls. On combining APIs/system calls with static features, a marginal performance improvement was attained comparing models trained only on dynamic features. Moreover, to improve accuracy, we implemented our solution using distinct deep learning methods and demonstrated a fine-tuned deep neural network that resulted in an F1-score of 99.1% and 98.48% on Dataset-2 and Dataset-3, respectively.


Sensors ◽  
2021 ◽  
Vol 21 (9) ◽  
pp. 3068
Author(s):  
Soumaya Dghim ◽  
Carlos M. Travieso-González ◽  
Radim Burget

The use of image processing tools, machine learning, and deep learning approaches has become very useful and robust in recent years. This paper introduces the detection of the Nosema disease, which is considered to be one of the most economically significant diseases today. This work shows a solution for recognizing and identifying Nosema cells between the other existing objects in the microscopic image. Two main strategies are examined. The first strategy uses image processing tools to extract the most valuable information and features from the dataset of microscopic images. Then, machine learning methods are applied, such as a neural network (ANN) and support vector machine (SVM) for detecting and classifying the Nosema disease cells. The second strategy explores deep learning and transfers learning. Several approaches were examined, including a convolutional neural network (CNN) classifier and several methods of transfer learning (AlexNet, VGG-16 and VGG-19), which were fine-tuned and applied to the object sub-images in order to identify the Nosema images from the other object images. The best accuracy was reached by the VGG-16 pre-trained neural network with 96.25%.


The increased usage of the Internet and social networks allowed and enabled people to express their views, which have generated an increasing attention lately. Sentiment Analysis (SA) techniques are used to determine the polarity of information, either positive or negative, toward a given topic, including opinions. In this research, we have introduced a machine learning approach based on Support Vector Machine (SVM), Naïve Bayes (NB) and Random Forest (RF) classifiers, to find and classify extreme opinions in Arabic reviews. To achieve this, a dataset of 1500 Arabic reviews was collected from Google Play Store. In addition, a two-stage Classification process was applied to classify the reviews. In the first stage, we built a binary classifier to sort out positive from negative reviews. In the second stage, however we applied a binary classification mechanism based on a set of proposed rules that distinguishes extreme positive from positive reviews, and extreme negative from negative reviews. Four major experiments were conducted with a total of 10 different sub experiments to fulfill the two-stage process using different X-validation schemas and Term Frequency-Inverse Document Frequency feature selection method. Obtained results have indicated that SVM was the best during the first stage classification with 30% testing data, and NB was the best with 20% testing data. The results of the second stage classification indicated that SVM has scored better results in identifying extreme positive reviews when dealing with the positive dataset with an overall accuracy of 68.7% and NB showed better accuracy results in identifying extreme negative reviews when dealing with the negative dataset, with an overall accuracy of 72.8%.


Sign in / Sign up

Export Citation Format

Share Document