A methodology for the design of experiments in computational intelligence with multiple regression models

PeerJ ◽  
2016 ◽  
Vol 4 ◽  
pp. e2721 ◽  
Author(s):  
Carlos Fernandez-Lozano ◽  
Marcos Gestal ◽  
Cristian R. Munteanu ◽  
Julian Dorado ◽  
Alejandro Pazos

The design of experiments and the validation of the results achieved with them are vital in any research study. This paper focuses on the use of different Machine Learning approaches for regression tasks in the field of Computational Intelligence, and especially on a correct comparison between the results provided by different methods, as those techniques are complex systems that require further study to be fully understood. A methodology commonly accepted in Computational Intelligence is implemented in an R package called RRegrs. This package includes ten simple and complex regression models to carry out predictive modeling using Machine Learning and well-known regression algorithms. The framework for experimental design presented herein is evaluated and validated against RRegrs. Our results are different for three out of five state-of-the-art simple datasets and it can be stated that the selection of the best model according to our proposal is statistically significant and relevant. With this kind of algorithm, it is important to use a statistical approach to indicate whether the differences are statistically significant. Furthermore, our results with three real complex datasets report different best models than the previously published methodology. Our final goal is to provide a complete methodology whose steps can be used to compare the results obtained in Computational Intelligence problems, as well as in other fields such as bioinformatics, cheminformatics, etc., given that our proposal is open and modifiable.

2021 ◽  
Vol 23 (4) ◽  
pp. 2742-2752
Author(s):  
Tamar L. Greaves ◽  
Karin S. Schaffarczyk McHale ◽  
Raphael F. Burkart-Radke ◽  
Jason B. Harper ◽  
Tu C. Le

Machine learning models were developed for an organic reaction in ionic liquids and validated on a selection of ionic liquids.


Author(s):  
Magdalena Kukla-Bartoszek ◽  
Paweł Teisseyre ◽  
Ewelina Pośpiech ◽  
Joanna Karłowska-Pik ◽  
Piotr Zieliński ◽  
...  

Increasing understanding of human genome variability allows for better use of the predictive potential of DNA. An obvious direct application is the prediction of the physical phenotypes. Significant success has been achieved, especially in predicting pigmentation characteristics, but the inference of some phenotypes is still challenging. In search of further improvements in predicting human eye colour, we conducted whole-exome (enriched in regulome) sequencing of 150 Polish samples to discover new markers. For this, we adopted quantitative characterization of eye colour phenotypes using high-resolution photographic images of the iris in combination with DIAT software analysis. An independent set of 849 samples was used for subsequent predictive modelling. Newly identified candidates and 114 additional literature-based selected SNPs, previously associated with pigmentation, and advanced machine learning algorithms were used. Whole-exome sequencing analysis found 27 previously unreported candidate SNP markers for eye colour. The highest overall prediction accuracies were achieved with LASSO-regularized and BIC-based selected regression models. A new candidate variant, rs2253104, located in the ARFIP2 gene and identified with the HyperLasso method, revealed predictive potential and was included in the best-performing regression models. Advanced machine learning approaches showed a significant increase in sensitivity of intermediate eye colour prediction (up to 39%) compared to 0% obtained for the original IrisPlex model. We identified a new potential predictor of eye colour and evaluated several widely used advanced machine learning algorithms in predictive analysis of this trait. Our results provide useful hints for developing future predictive models for eye colour in forensic and anthropological studies.


2015 ◽  
Vol 7 (1) ◽  
Author(s):  
Georgia Tsiliki ◽  
Cristian R. Munteanu ◽  
Jose A. Seoane ◽  
Carlos Fernandez-Lozano ◽  
Haralambos Sarimveis ◽  
...  

2020 ◽  
Vol 9 (2) ◽  
pp. 111-118
Author(s):  
Shindy Arti ◽  
Indriana Hidayah ◽  
Sri Suning Kusumawardhani

Machine learning is commonly used for prediction and for recognizing patterns and relationships between variables. Causal machine learning combines approaches for analyzing the causal impact of an intervention on the result, assuming considerably ambiguous variables. Combining causality and machine learning is suitable for predicting and understanding the cause and effect of the results. The aim of this study is to conduct a systematic review to identify which causal machine learning approaches are generally used. This paper focuses on what data characteristics are applied to causal machine learning research and how to assess the output of algorithms used in the context of causal machine learning research. The review paper analyzes 20 papers with various approaches. This study categorizes data characteristics based on the type of data, attribute value, and the data dimension. The Bayesian Network (BN) is commonly used in the context of causality. Meanwhile, the propensity score is the most extensively used in causality research. The variable value will affect algorithm performance. This review can serve as a guide in the selection of a causal machine learning system.


Author(s):  
Preethi Krishna Rao Mane ◽  
K. Narasimha Rao

The adoption of occupancy sensors has become inevitable in commercial and non-commercial security devices, owing to their proficiency in energy management. Conventional sensors have been found to suffer from operational problems, and Doppler radar offers better mitigation of such problems. However, the use of Doppler radar for occupancy sensing in existing systems is still in its infancy, and its monitoring performance has yet to be improved. Therefore, this paper introduces a simplified framework for enriching event sensing performance through efficient selection of minimal robust attributes using Doppler radar. An analytical methodology was adopted, which found that different machine learning approaches could further improve accuracy for the features extracted in the proposed occupancy sensing system.


Author(s):  
B. C. Naha ◽  
A. K. Chakravarty ◽  
M. A. Mir ◽  
M. Bhakat

The objective of the study was to optimise the age at first use (AAFU) of semen in Sahiwal breeding bulls, which will help in early selection of bulls under a progeny testing programme. The data on AAFU, conception rate based on first A.I. (CRFAI), overall conception rate (OCR) and birth weight (B.WT) of 43 Sahiwal bulls during 1987 to 2013, pertaining to 8 sets of the Sahiwal improvement programme at ICAR-NDRI, Karnal, India, were adjusted for significant environmental influences and subsequently analyzed. Simple and multiple regression models were used for prediction of CRFAI and OCR of Sahiwal bulls. Comparative evaluation of the three developed models (I to III) showed that Model III, which includes AAFU and B.WT, fulfilled the model accuracy criteria, as revealed by a high coefficient of determination, a low mean sum of squares due to error, a low conceptual predictive value and a low Bayesian information criterion. The results showed that average predicted CRFAI was highest (49.34%) at less than 5 years and lowest (44.79%) at more than 6 years of age at first A.I./use. Similarly, average predicted OCR was highest (48.50%) at less than 5 years and lowest (44.56%) at more than 6 years of age at first A.I./use of Sahiwal bulls. In an organized herd under a progeny testing programme, Sahiwal bulls should be used prior to 5 years of age, which is expected to result in 4.45% better CRFAI and 3.94% better OCR in comparison to Sahiwal bulls used after 6 years of age.


2021 ◽  
Vol 6 (4) ◽  
pp. 293-307
Author(s):  
Luc Dewulf ◽  
Mauro Chiacchia ◽  
Aaron S. Yeardley ◽  
Robert A. Milton ◽  
Solomon F. Brown ◽  
...  

This is a first comparison of the sequential design of experiments strategy and global sensitivity analysis for nanomaterials, thus enabling sustainable product and process design in future.


2021 ◽  
Vol 32 (4) ◽  
pp. 362-375
Author(s):  
Ligita Gasparėnienė ◽  
Rita Remeikiene ◽  
Aleksejus Sosidko ◽  
Vigita Vėbraitė

In order to forecast stock prices based on economic indicators, many studies have been conducted using well-known statistical methods. Meanwhile, since around 2010, as computing power improved, new machine learning methods have come into use. It would be interesting to know how well those algorithms, which use a variety of mathematical and statistical methods, are able to predict the stock market. The purpose of this article is to model the monthly price of the S&P 500 index based on U.S. economic indicators using statistical, machine learning and deep learning approaches, and finally to compare the metrics of those models. After selecting indicators by means of data visualization, multicollinearity tests and statistical significance tests, 3 out of 27 indicators remained. The main finding of the research is that the authors improved the baseline statistical linear regression model by 19 percent using an ML Random Forest algorithm. In this way, the model achieved an accuracy of 97.68% in predicting the S&P 500 index.


2018 ◽  
Vol 8 (12) ◽  
pp. 2570 ◽  
Author(s):  
Yves Rybarczyk ◽  
Rasa Zalakeviciute

Current studies show that traditional deterministic models tend to struggle to capture the non-linear relationship between the concentration of air pollutants and their sources of emission and dispersion. To tackle such a limitation, the most promising approach is to use statistical models based on machine learning techniques. Nevertheless, it is puzzling why a certain algorithm is chosen over another for a given task. This systematic review intends to clarify this question by providing the reader with a comprehensive description of the principles underlying these algorithms and how they are applied to enhance prediction accuracy. A rigorous search that conforms to the PRISMA guideline is performed and results in the selection of the 46 most relevant journal papers in the area. Through a factorial analysis method, these studies are synthesized and linked to each other. The main findings of this literature review show that: (i) machine learning is mainly applied in the Eurasian and North American continents and (ii) estimation problems tend to implement Ensemble Learning and Regressions, whereas forecasting makes use of Neural Networks and Support Vector Machines. The next challenges of this approach are to improve the prediction of pollution peaks and of contaminants recently put in the spotlight (e.g., nanoparticles).


2017 ◽  
Author(s):  
Boris Guennewig ◽  
Zachary Davies ◽  
Mark Pinese ◽  
Antony A Cooper

Motivation: Machine learning (ML) is a powerful tool to create supervised models that can distinguish between classes and facilitate biomarker selection in high-dimensional datasets, including RNA Sequencing (RNA-Seq). However, it is variable as to which is the best performing ML algorithm(s) for a specific dataset, and identifying the optimal match is time consuming. blkbox is a software package including a shiny frontend, that integrates nine ML algorithms to select the best performing classifier for a specific dataset. blkbox accepts a simple abundance matrix as input, includes extensive visualization, and also provides an easy to use feature selection step to enable convenient and rapid potential biomarker selection, all without requiring parameter optimization. Results: Feature selection makes blkbox computationally inexpensive while multi-functionality, including nested cross-fold validation (NCV), ensures robust results. blkbox identified algorithms that outperformed prior published ML results. Applying NCV identifies features, which are utilized to gain high accuracy. Availability: The software is available as a CRAN R package and as a developer version with extended functionality on github (https://github.com/gboris/blkbox).

