An Ensemble Model for Predicting Passenger Demand Using Taxi Data Set

Author(s):  
Santosh Rajak ◽  
Ujwala Baruah
Electronics ◽  
2021 ◽  
Vol 10 (11) ◽  
pp. 1285
Author(s):  
Mohammed Al-Sarem ◽  
Faisal Saeed ◽  
Zeyad Ghaleb Al-Mekhlafi ◽  
Badiea Abdulkarem Mohammed ◽  
Tawfik Al-Hadhrami ◽  
...  

Security attacks on legitimate websites to steal users’ information, known as phishing attacks, have been increasing. This kind of attack does not just affect individuals’ or organisations’ websites. Although several detection methods for phishing websites have been proposed using machine learning, deep learning, and other approaches, their detection accuracy still needs to be enhanced. This paper proposes an optimized stacking ensemble method for phishing website detection. The optimisation was carried out using a genetic algorithm (GA) to tune the parameters of several ensemble machine learning methods, including random forests, AdaBoost, XGBoost, Bagging, GradientBoost, and LightGBM. The optimized classifiers were then ranked, and the best three models were chosen as base classifiers of a stacking ensemble method. The experiments were conducted on three phishing website datasets that consisted of both phishing websites and legitimate websites: the Phishing Websites Data Set from UCI (Dataset 1), the Phishing Dataset for Machine Learning from Mendeley (Dataset 2), and the Datasets for Phishing Websites Detection from Mendeley (Dataset 3). The experimental results showed an improvement using the optimized stacking ensemble method, where the detection accuracy reached 97.16%, 98.58%, and 97.39% for Dataset 1, Dataset 2, and Dataset 3, respectively.
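
The GA tuning step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the search space, the `toy_fitness` function (a stand-in for cross-validated accuracy) and all parameter names are assumptions made for illustration, and only the Python standard library is used.

```python
import random

# Hypothetical search space for one ensemble classifier (illustrative names only).
SEARCH_SPACE = {
    "n_estimators": [50, 100, 200, 400],
    "max_depth": [3, 5, 8, 12],
    "learning_rate": [0.01, 0.05, 0.1, 0.3],
}

def toy_fitness(params):
    """Stand-in for cross-validated accuracy; a real run would train a model here."""
    # Assumed score that peaks at n_estimators=200, max_depth=8, learning_rate=0.1.
    return (1.0
            - abs(params["n_estimators"] - 200) / 1000
            - abs(params["max_depth"] - 8) / 100
            - abs(params["learning_rate"] - 0.1))

def random_individual(rng):
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}

def crossover(a, b, rng):
    # Uniform crossover: each gene is taken from one of the two parents.
    return {k: rng.choice([a[k], b[k]]) for k in SEARCH_SPACE}

def mutate(ind, rng, rate=0.2):
    # Resample each gene from the search space with probability `rate`.
    return {k: (rng.choice(SEARCH_SPACE[k]) if rng.random() < rate else v)
            for k, v in ind.items()}

def genetic_search(fitness, pop_size=12, generations=15, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 3]  # elitism: the best third survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            children.append(mutate(crossover(p1, p2, rng), rng))
        pop = elite + children
    return max(pop, key=fitness)

best = genetic_search(toy_fitness)
print(best)
```

In a setting like the one described, each candidate classifier would be tuned this way, the tuned models ranked by validation accuracy, and the top three passed on as base learners of the stacking ensemble.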


2019 ◽  
Vol 17 (4) ◽  
pp. 769-781 ◽  
Author(s):  
Preet Kamal ◽  
Sachin Ahuja

Purpose The purpose of this paper is to develop a prediction model to study the factors affecting the academic performance of students pursuing an undergraduate professional course (BCA). For this purpose, an ensemble model of decision tree, gradient boost algorithm and Naïve Bayes techniques is created to achieve the best and most accurate results. Monitoring the academic performance of students has emerged as an essential field as it plays a vital role in the accurate development and growth of students’ critical and cognitive thinking. If the academic performance of students during the initial years of graduation can be predicted, different stakeholders, i.e. government, policymakers and academicians, can be helped to make significant remedial strategies. This comprehensible practice can go a long way in shaping the ideologies of young minds, enhancing pedagogical practices and reframing the curriculum. This study aims to develop positive steps that can be taken to enhance future endeavours in the field of education. Design/methodology/approach A questionnaire was prepared specifically to find out influential factors affecting the academic performance of the students. Its specific area of investigation was demographic, social, academic and behavioural factors that influence the performance of the students. Then, an ensemble model was built using three techniques based on accuracy rate. A 10-fold cross-validation technique was applied to assess the fitness of the results obtained from the proposed ensemble model. Findings The results obtained from the ensemble model provide efficient and accurate prediction of student performance and help identify the students that are at risk of failing or dropping out. The previous semester’s academic performance has a significant impact on current academic performance, along with other factors (such as number of siblings and distance of university from residence). 
Any major mishap during the past year also affects academic performance, along with habit-based behavioural factors such as consumption of alcohol and tobacco. Research limitations/implications Though the existing model considers aspects related to a student’s family income and academic indicators, it tends to ignore major factors such as the influence of peer pressure, self-study habits and time devoted to study after college hours. An attempt is made in this paper to examine the above-cited factors in predicting the academic performance of the students. The need of the hour is to develop innovative models to assess and make advancements in the present educational set-up. The ensemble model is best suited to study all factors needed to accomplish a robust and reliable model. Originality/value The present model is developed using classification and regression algorithms. The model is able to achieve 99 per cent accuracy with the existing data set and is able to identify the influential factors affecting the academic performance. As early detection of at-risk students is possible with the proposed model, preventive and corrective measures can be proposed for improving the overall academic performance of the students.
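
The 10-fold cross-validation mentioned in this abstract can be sketched as follows. This is a generic illustration, not the study's code: the helper names `k_fold_indices` and `cross_val_accuracy`, the majority-class baseline and the toy labels are all assumptions, and the folds are taken in order without shuffling for simplicity.

```python
from collections import Counter

def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs; fold sizes differ by at most one."""
    if k < 2 or k > n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

def cross_val_accuracy(fit, predict, X, y, k=10):
    """Average test accuracy over k folds for a user-supplied fit/predict pair."""
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)

# Hypothetical baseline: always predict the most common training label.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]

def predict_majority(model, x):
    return model

X = list(range(20))          # toy feature values (unused by the baseline)
y = [1, 1, 1, 0] * 5         # toy labels
acc = cross_val_accuracy(fit_majority, predict_majority, X, y, k=5)
print(acc)
```

Real data would normally be shuffled (or stratified by class) before the folds are cut, so that each fold reflects the overall class balance.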


2021 ◽  
Author(s):  
Rovshan Mollayev ◽  
Aghamehdi Aliyev

Abstract A study was conducted to evaluate the development of gas-bearing formations in the Azerbaijan sector of the Caspian Sea. The study considered subsea wellheads tied into a subsea manifold, with that manifold tied to an offshore facility. Flow assurance required the calculation of subsea Flowing Wellhead Temperature (FWHT) and Pressure (FWHP). 242 subsurface scenarios were run with the reservoir model. To accommodate all subsurface scenarios in the flow assurance assessments, FWHT/P calculations had to be carried out for all of them. The reservoir model was equipped with vertical lift performance curves for pressure loss calculations in the tubing and with logic for pressure loss estimation in the subsea system. If correctly calculated, the condition [FWHP >= dP(subsea) + Pseparator] should have been satisfied. As the reservoir model was not set up for FWHT calculations, an external tool was required for that task. Both nodal analysis software and dynamic flow modelling were considered appropriate tools. However, as nodal modelling allowed much more automation, it was decided to use nodal analysis over dynamic modelling.
To improve FWHP calculations, logic was built into the reservoir model to:
○  estimate dP(subsea) from gas rate vs pressure drop curves
○  confirm the validity of the [minFWHP(wells 1, 2…n) >= dP(subsea) + Pseparator] statement; the step was re-iterated until the statement was satisfied
To improve FWHT calculations, the Enthalpy Balance method was tested for gas wells, with 1-2% error against actual data. Nodal analysis models using the same method were then built for the project wells. Code was developed to calculate FWHT as part of the ensemble model predictions in the following steps:
○  Well properties of each prediction step were transferred to the nodal analysis software.
○  kH was varied until the gas rate calculated by the nodal analysis software matched the ensemble model output within 1 mmscf/d error.
Summary: The described methods significantly increased accuracy in FWHT and FWHP calculations and accommodated all possible subsurface scenarios in the flow assurance evaluation. Integration of subsea and topside hydraulics in subsurface modelling is important for developing a flow-assured design. The Enthalpy Balance temperature prediction method provides a good match to actual data. The use of coding provides huge opportunities to automate data analysis. The paper will present a different approach to the calculation of FWHT and FWHP in subsurface modelling, the integration of subsea and topside hydraulics in subsurface modelling via alternative ways, the use of enthalpy balance temperature modelling, the integration between nodal analysis and subsurface modelling, and how coding can improve the analysis of large subsurface data sets.
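
The two checks described in this abstract (the back-pressure condition and the kH match within 1 mmscf/d) can be sketched as below. The abstract does not say how kH was varied, so bisection is an assumption here, as is the requirement that rate increases with kH; `toy_nodal_rate` is a hypothetical stand-in for a call to nodal analysis software, and all numbers are illustrative.

```python
def meets_backpressure(fwhp_per_well, dp_subsea, p_separator):
    """Check min FWHP over all wells >= dP(subsea) + Pseparator (same pressure units)."""
    return min(fwhp_per_well) >= dp_subsea + p_separator

def toy_nodal_rate(kh):
    """Hypothetical stand-in for a nodal-analysis run: gas rate in mmscf/d, rising with kH."""
    return 0.02 * kh

def match_kh(target_rate, nodal_rate, kh_low, kh_high, tol=1.0, max_iter=60):
    """Bisect on kH until the nodal rate is within `tol` mmscf/d of the target rate.

    Assumes nodal_rate increases monotonically with kH inside [kh_low, kh_high].
    """
    for _ in range(max_iter):
        kh = 0.5 * (kh_low + kh_high)
        rate = nodal_rate(kh)
        if abs(rate - target_rate) <= tol:
            return kh, rate
        if rate < target_rate:
            kh_low = kh      # rate too low: search higher kH
        else:
            kh_high = kh     # rate too high: search lower kH
    raise RuntimeError("kH match did not converge within the given bracket")

kh, rate = match_kh(target_rate=85.0, nodal_rate=toy_nodal_rate,
                    kh_low=0.0, kh_high=10000.0)
print(kh, rate)
print(meets_backpressure([120.0, 95.0, 110.0], dp_subsea=30.0, p_separator=60.0))
```

In the workflow described, the matched kH would then be used in the nodal model to compute FWHT for that prediction step.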


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Syed Nisar Hussain Bukhari ◽  
Amit Jain ◽  
Ehtishamul Haq ◽  
Moaiad Ahmad Khder ◽  
Rahul Neware ◽  
...  

Zika virus (ZIKV), the causative agent of Zika fever in humans, is an RNA virus that belongs to the genus Flavivirus. Currently, there is no approved vaccine for clinical use to combat the ZIKV infection and contain the epidemic. Epitope-based peptide vaccines have a large untapped potential for boosting vaccination safety, cross-reactivity, and immunogenicity. Though many attempts have been made to develop vaccines for ZIKV, none of these have proved to be successful. Epitope-based peptide vaccines can act as powerful alternatives to conventional vaccines due to their low production cost and less reactogenic and allergenic responses. For designing an effective and viable epitope-based peptide vaccine against this deadly virus, it is essential to select the antigenic T-cell epitopes since epitope-based vaccines are considered safe. The in silico machine-learning-based approach for ZIKV T-cell epitope prediction would save a lot of physical experimental time and effort for speedy vaccine development compared to in vivo approaches. We have trained a machine-learning-based computational model to predict novel ZIKV T-cell epitopes by employing physicochemical properties of amino acids. The proposed ensemble model based on a voting mechanism works by blending the predictions for each class (epitope or nonepitope) from each base classifier. Predictions obtained for each class by the individual classifiers are summed up, and the class with the majority vote is the predicted class. An odd number of classifiers has been used to avoid the occurrence of ties in the voting. An experimentally determined ZIKV peptide sequence data set was collected from the Immune Epitope Database and Analysis Resource (IEDB) repository. The data set consists of 3,519 sequences, of which 1,762 are epitopes and 1,757 are nonepitopes. The length of the sequences ranges from 6 to 30-mer. For each sequence, we extracted 13 physicochemical features. 
The proposed ensemble model achieved sensitivity, specificity, Gini coefficient, AUC, precision, F-score, and accuracy of 0.976, 0.959, 0.993, 0.994, 0.989, 0.985, and 97.13%, respectively. To check the consistency of the model, we carried out five-fold cross-validation and an average accuracy of 96.072% is reported. Finally, a comparative analysis of the proposed model with existing methods has been carried out using a separate validation data set, suggesting the proposed ensemble model as a better model. The proposed ensemble model will help predict novel ZIKV vaccine candidates to save lives globally and prevent future epidemic-scale outbreaks.
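
The voting mechanism described in this abstract corresponds to hard majority voting, which can be sketched in a few lines. The three rule-based classifiers and the feature names below are hypothetical placeholders, not the base classifiers or physicochemical features used in the paper.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the class that receives the most votes from the base classifiers."""
    if len(predictions) % 2 == 0:
        raise ValueError("use an odd number of classifiers to avoid ties in binary voting")
    return Counter(predictions).most_common(1)[0][0]

def ensemble_predict(classifiers, sample):
    """Collect one vote per base classifier and return the majority class."""
    return majority_vote([clf(sample) for clf in classifiers])

# Hypothetical base classifiers: simple threshold rules on made-up features.
def clf_a(s):
    return "epitope" if s["hydrophobicity"] > 0.5 else "nonepitope"

def clf_b(s):
    return "epitope" if s["charge"] > 0.0 else "nonepitope"

def clf_c(s):
    return "epitope" if s["polarity"] < 0.3 else "nonepitope"

sample = {"hydrophobicity": 0.7, "charge": -1.0, "polarity": 0.1}
print(ensemble_predict([clf_a, clf_b, clf_c], sample))
```

With two classes and an odd number of voters, one class always holds a strict majority, which is why the tie case cannot arise.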


2018 ◽  
Author(s):  
Henrik Singmann ◽  
David Kellen ◽  
Eda Mizrak ◽  
Ilke Öztekin

Cognitive measurement models decompose observed behavior into latent cognitive processes. For situations with more than one condition, such models allow hypotheses to be tested on the level of the latent processes. We propose a fully Bayesian ensemble model approach to test hypotheses on the level of the latent processes in situations in which multiple measurement models or model classes exist. In the first step, one needs to perform a Bayesian model selection step comparing the hypotheses within each model class. Aggregating the results of the first step yields ensemble posterior model probabilities. We provide an example for a working memory data set using an ensemble of a resource model and a slots model.


Author(s):  
Abdulazeez Yusuf ◽  
Ayuba John

The increasing need for data-driven decision making has recently resulted in the application of data mining in various fields, including the educational sector, where it is referred to as educational data mining. The need for improving the performance of data mining models has also been identified as a gap for future research. In Nigeria, higher educational institutions collect various students’ data, but these data are rarely used in any decision or policy making to improve the academic performance of students. This research work attempts to improve the performance of data mining models for predicting students’ academic performance using a stacking classifiers ensemble and the synthetic minority over-sampling technique. The research was conducted by adopting and evaluating the performance of the J48, IBK and SMO classifiers. The individual classifier models, the standard stacking classifier ensemble model and the modified stacking classifiers ensemble model were trained and tested on a data set of 206 students from the Faculty of Science, Federal University Dutse. Students’ previous academic performance records in the Unified Tertiary Matriculation Examination and the Senior Secondary Certificate Examination, together with their first-year Cumulative Grade Point Average, are used as data inputs in the WEKA 3.9.1 data mining tool to predict students’ graduation classes of degree at undergraduate level. The results show that applying the synthetic minority over-sampling technique for class balancing improves the performance of all the models, with the proposed modified stacking classifiers ensemble model outperforming the other classifier models in both accuracy and RMSE values, making it the best model.
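
The synthetic minority over-sampling technique named in this abstract creates new minority-class points by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below illustrates that idea in plain Python; it is not the WEKA filter used in the study, and the function name `smote`, its parameters and the toy points are assumptions.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])   # one of the k nearest neighbours
        gap = rng.random()                   # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbour)))
    return synthetic

# Toy minority class: four points at the corners of the unit square.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(points, n_new=5)
print(new_points)
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region already occupied by the minority class.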


2020 ◽  
Author(s):  
Vishan Kumar Gupta ◽  
Prashant Singh Rana

Abstract In silico toxicity prediction techniques are useful for reducing rodent testing (in vivo). The authors propose a computational (in silico) method for the toxicity prediction of small drug molecules that can bind to the antioxidant response elements (AREs), using their various physicochemical properties (molecular descriptors). The software PaDEL-Descriptor is used for extracting the different features of the drug molecules. The ARE data set has a total of 7439 drug molecules, of which 1147 are active and 6292 are inactive, and each drug molecule contains 1444 features. We have proposed a novel ensemble-based model that can efficiently classify active (binding) and inactive (non-binding) compounds of the data set. Initially, we performed feature selection using the random forest importance algorithm in R; subsequently, we resolved the class imbalance issue through the ensemble learning method itself, dividing the data set into five data frames that have an almost equal number of active and inactive drug molecules. An ensemble model based upon the votes of four base classifiers is proposed, which gives an accuracy of 97.14%. K-fold cross-validation is conducted to measure the consistency of the proposed ensemble model. Finally, the proposed ensemble model is validated on some new drug molecules and compared with some existing models.
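
One plausible reading of the balancing step in this abstract is that the 6292 inactive molecules are split into five roughly equal parts (about 1258 each) and each part is combined with all 1147 active molecules. The abstract does not spell this out, so the sketch below is an assumption; the function name `balanced_frames` and the placeholder records are hypothetical.

```python
import random

def balanced_frames(active, inactive, n_frames=5, seed=0):
    """Split the majority class into n_frames parts and pair each with all actives."""
    rng = random.Random(seed)
    shuffled = list(inactive)
    rng.shuffle(shuffled)
    base, extra = divmod(len(shuffled), n_frames)
    frames, start = [], 0
    for i in range(n_frames):
        size = base + (1 if i < extra else 0)   # part sizes differ by at most one
        frames.append(list(active) + shuffled[start:start + size])
        start += size
    return frames

# Placeholder records standing in for molecules with their descriptors.
active = [("active", i) for i in range(1147)]
inactive = [("inactive", i) for i in range(6292)]
frames = balanced_frames(active, inactive)
for frame in frames:
    n_active = sum(1 for label, _ in frame if label == "active")
    print(len(frame), n_active, len(frame) - n_active)
```

A model trained on each frame sees a nearly 1:1 class ratio, and the per-frame predictions can then be combined by voting.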


2018 ◽  
Vol 20 (3) ◽  
pp. 321-357 ◽  
Author(s):  
Kalyan Nagaraj ◽  
Biplab Bhattacharjee ◽  
Amulyashree Sridhar ◽  
Sharvani GS

Purpose Phishing is one of the major threats affecting businesses worldwide in current times. Organizations and customers face the hazards arising out of phishing attacks because of anonymous access to vulnerable details. Such attacks often result in substantial financial losses. Thus, there is a need for effective intrusion detection techniques to identify and possibly nullify the effects of phishing. Classifying phishing and non-phishing web content is a critical task in information security protocols, and foolproof mechanisms have yet to be implemented in practice. The purpose of the current study is to present an ensemble machine learning model for classifying phishing websites. Design/methodology/approach A publicly available data set comprising 10,068 instances of phishing and legitimate websites was used to build the classifier model. Feature extraction was performed by deploying a group of methods, and the relevant features extracted were used for building the model. A twofold ensemble learner was developed by feeding the results from a random forest (RF) classifier into a feedforward neural network (NN). Performance of the ensemble classifier was validated using k-fold cross-validation. The twofold ensemble learner was implemented as a user-friendly, interactive decision support system for classifying websites as phishing or legitimate ones. Findings Experimental simulations were performed to assess and compare the performance of the ensemble classifiers. The statistical tests estimated that the RF_NN model gave superior performance, with an accuracy of 93.41 per cent and a minimal mean squared error of 0.000026. Research limitations/implications The research data set used in this study is publicly available and easy to analyze. Comparative analysis with other real-time data sets of recent origin must be performed to ensure generalization of the model against various security breaches. 
Different variants of phishing threats must be detected rather than focusing only on phishing website detection. Originality/value To the best of the authors’ knowledge, the twofold ensemble model has not been applied to the classification of phishing websites in any previous study.


2019 ◽  
Vol 2019 ◽  
pp. 1-12 ◽  
Author(s):  
Wei Xu ◽  
Hongyong Fu ◽  
Yuchen Pan

This work presents a novel soft ensemble model (ANSEM) for financial distress prediction with different sample sizes. It integrates qualitative classifiers (expert system method, ES) and quantitative classifiers (convolutional neural network, CNN) based on the uni-int decision making method of soft set theory (UI). We introduce internet search indices as new variables for financial distress prediction. By constructing a soft set representation of each classifier and then using the optimal decision on soft sets to identify the financial status of firms, ANSEM inherits the advantages of ES, CNN, and UI. Empirical experiments with a real data set of Chinese listed firms demonstrate that the proposed ANSEM has superior predictive performance for financial distress in terms of accuracy and stability with different sample sizes. Further discussions also show that internet search indices can offer additional information to improve predictive performance.


Biology ◽  
2021 ◽  
Vol 11 (1) ◽  
pp. 43
Author(s):  
Mahmoud Ragab ◽  
Khalid Eljaaly ◽  
Nabil A. Alhakamy ◽  
Hani A. Alhadrami ◽  
Adel A. Bahaddad ◽  
...  

Coronavirus disease 2019 (COVID-19) has spread worldwide, and medical resources have become inadequate in several regions. Computed tomography (CT) scans are capable of achieving precise and rapid COVID-19 diagnosis compared to the RT-PCR test. At the same time, artificial intelligence (AI) techniques, including machine learning (ML) and deep learning (DL), have been found useful for designing COVID-19 diagnosis models using chest CT scans. In this context, this study concentrates on the design of an artificial intelligence-based ensemble model for the detection and classification (AIEM-DC) of COVID-19. The AIEM-DC technique aims to accurately detect and classify COVID-19 using an ensemble of DL models. In addition, a Gaussian filtering (GF)-based preprocessing technique is applied to remove noise and improve image quality. Moreover, a shark optimization algorithm (SOA) with an ensemble of DL models, namely recurrent neural networks (RNN), long short-term memory (LSTM), and gated recurrent unit (GRU), is employed for feature extraction. Furthermore, an improved bat algorithm with a multiclass support vector machine (IBA-MSVM) model is applied for the classification of CT scans. The design of the ensemble model with optimal parameter tuning of the MSVM model for COVID-19 classification constitutes the novelty of the work. The effectiveness of the AIEM-DC technique was evaluated on a benchmark CT image data set, and the results showed promising classification performance of the AIEM-DC technique compared with recent state-of-the-art approaches.

