Computational Statistics
Recently Published Documents

TOTAL DOCUMENTS: 244 (five years: 43)
H-INDEX: 13 (five years: 2)

2022 ◽ Vol 2022 ◽ pp. 1-11
Author(s): Abolfazl Mehbodniya, Ihtiram Raza Khan, Sudeshna Chakraborty, M. Karthik, Kamakshi Mehta, ...

Background. Even in today’s environment, when a plethora of information is accessible, it can be difficult to make appropriate choices for one’s well-being. Data mining, machine learning, and computational statistics are among the most popular areas of study today, and all of them aim to empower people to make good decisions that maximize outcomes in whatever field they work in. Because growth in the number of patients is directly related to population growth and lifestyle changes, the healthcare sector has a significant need for data-processing services. In cancer care, the prognosis refers to the patient's overall chance of survival, but it may also describe how severe the disease is expected to become over the course of the patient's future.

Methodology. The proposed technique consists of three stages: input data acquisition, preprocessing, and classification. Raw input data are acquired, preprocessing removes missing data, and classification is carried out using an ensemble classifier to analyze the stages of cancer. The study also explores the combined influence of the prominent labels in conjunction with one another using a multilabel classification approach. Finally, an ensemble classifier model is constructed and experimentally validated to increase classification accuracy. Overall, the recommended and tested models demonstrate a steady improvement of 2% to 6% over the baseline performance.

Results. The proposed work offers an opportunity to contribute to the general health and welfare of workers in the healthcare sector. Alternative solutions to the remaining constraints, together with automation of the whole process flow across all five phases, are anticipated to be the key focus of future work. These data patterns make it easier to predict the health status of employees in industry and to identify information trends. The proposed classifier achieves an accuracy of 93.265%.
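As a rough illustration of the three-stage pipeline described above (data acquisition, preprocessing of missing values, ensemble classification), the following sketch uses scikit-learn on synthetic data; the dataset, features, and choice of base classifiers are assumptions for illustration, not the authors' exact configuration.

```python
# Minimal sketch: imputation of missing values followed by an ensemble classifier.
# All data and model choices below are illustrative assumptions.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Stage 1: input data acquisition (synthetic placeholder for patient records)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
X[rng.random(X.shape) < 0.05] = np.nan          # simulate missing entries
y = rng.integers(0, 4, size=500)                # four hypothetical cancer stages

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Stages 2 and 3: preprocessing (handle missing data) and ensemble classification
model = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("ensemble", VotingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
            ("lr", LogisticRegression(max_iter=1000)),
            ("svm", SVC(probability=True)),
        ],
        voting="soft",
    )),
])
model.fit(X_train, y_train)
print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))
```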


2022 ◽ pp. 1-24
Author(s): Pengcheng Zhang, David Pitt, Xueyuan Wu

Abstract. The fact that a large proportion of insurance policyholders make no claims during a one-year period highlights the importance of zero-inflated count models when analyzing the frequency of insurance claims. There is a vast literature on the univariate case of zero-inflated count models, while work on multivariate models is considerably less advanced. Given that insurance companies write multiple lines of insurance business, where the claim counts on these lines are often correlated, there is a strong incentive to analyze multivariate claim count models. Motivated by the idea of Liu and Tian (Computational Statistics and Data Analysis, 83, 200–222; 2015), we develop a multivariate zero-inflated hurdle model to describe multivariate count data with extra zeros. This generalization offers more flexibility in modeling the behavior of individual claim counts while also incorporating a correlation structure between claim counts for different lines of insurance business. We develop an application of the expectation–maximization (EM) algorithm to enable the statistical inference necessary to estimate the parameters of our model. The model is then applied to an automobile insurance portfolio from a major insurance company in Spain. We demonstrate that the multivariate zero-inflated hurdle model outperforms several alternatives.
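The multivariate hurdle model itself is involved, but the EM idea the authors build on can be illustrated on the simpler univariate zero-inflated Poisson case. The sketch below, with simulated claim counts and arbitrary starting values (both assumptions), alternates the usual E- and M-steps.

```python
# Minimal sketch of EM estimation for a univariate zero-inflated Poisson model,
# as a simplified stand-in for the multivariate zero-inflated hurdle model and
# EM scheme described in the abstract. Data and starting values are assumptions.
import numpy as np

rng = np.random.default_rng(1)
n, true_pi, true_lam = 5000, 0.3, 2.0
zero_mass = rng.random(n) < true_pi
y = np.where(zero_mass, 0, rng.poisson(true_lam, size=n))   # simulated claim counts

pi, lam = 0.5, y.mean()            # starting values
for _ in range(200):
    # E-step: posterior probability that an observed zero came from the point mass
    tau = np.where(y == 0, pi / (pi + (1 - pi) * np.exp(-lam)), 0.0)
    # M-step: update the mixing weight and the Poisson mean
    pi = tau.mean()
    lam = ((1 - tau) * y).sum() / (1 - tau).sum()

print(f"estimated pi = {pi:.3f}, lambda = {lam:.3f}")
```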


2021 ◽ Vol 12
Author(s): Maria Krantz, David Zimmer, Stephan O. Adler, Anastasia Kitashova, Edda Klipp, ...

The study of plant-environment interactions is a multidisciplinary research field. With the emergence of quantitative large-scale and high-throughput techniques, the amount and dimensionality of experimental data have increased strongly. Appropriate strategies for data storage, management, and evaluation are needed to make efficient use of experimental findings. Computational data mining approaches are essential for deriving statistical trends and signatures contained in data matrices. Although current biology in general is challenged by high data dimensionality, this is particularly true for plant biology. Plants, as sessile organisms, have to cope with environmental fluctuations. This typically results in strong dynamics of metabolite and protein concentrations, which are often challenging to quantify. Summarizing experimental output results in complex data arrays, which require computational statistics and numerical methods for building quantitative models. Experimental findings need to be integrated with computational models to gain a mechanistic understanding of plant metabolism. For this, bioinformatics and mathematics need to be combined with experimental setups in physiology, biochemistry, and molecular biology. This review presents and discusses concepts at the interface of experiment and computation which are likely to shape current and future plant biology. Finally, this interface is discussed with regard to its capabilities and limitations for developing a quantitative model of plant-environment interactions.
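As a toy illustration of turning time-series measurements into a quantitative model, the sketch below fits a one-parameter decay model to simulated metabolite concentrations; the model, data, and parameter values are illustrative assumptions, not material from the review.

```python
# Minimal sketch: fit a simple kinetic model to noisy, simulated concentration data.
import numpy as np
from scipy.optimize import curve_fit

def decay(t, c0, k):
    """Exponential decay of a metabolite concentration."""
    return c0 * np.exp(-k * t)

rng = np.random.default_rng(2)
t = np.linspace(0, 10, 30)
observed = decay(t, 5.0, 0.4) + rng.normal(scale=0.2, size=t.size)  # simulated data

params, cov = curve_fit(decay, t, observed, p0=[1.0, 0.1])
print("fitted c0 = %.2f, k = %.2f" % tuple(params))
```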


2021
Author(s): David Watson

Abstract. High-throughput technologies such as next-generation sequencing allow biologists to observe cell function with unprecedented resolution, but the resulting datasets are too large and complicated for humans to understand without the aid of advanced statistical methods. Machine learning (ML) algorithms, which are designed to automatically find patterns in data, are well suited to this task. Yet these models are often so complex as to be opaque, leaving researchers with few clues about underlying mechanisms. Interpretable machine learning (iML) is a burgeoning subdiscipline of computational statistics devoted to making the predictions of ML models more intelligible to end users. This article is a gentle and critical introduction to iML, with an emphasis on genomic applications. I define relevant concepts, motivate leading methodologies, and provide a simple typology of existing approaches. I survey recent examples of iML in genomics, demonstrating how such techniques are increasingly integrated into research workflows. I argue that iML solutions are required to realize the promise of precision medicine. However, several open challenges remain. I examine the limitations of current state-of-the-art tools and propose a number of directions for future research. While the horizon for iML in genomics is wide and bright, continued progress requires close collaboration across disciplines.
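One widely used iML technique is permutation feature importance; the sketch below applies it to a black-box classifier on synthetic data. The dataset and model are assumptions chosen for brevity, and the article itself surveys a much wider range of methods.

```python
# Minimal sketch: permutation feature importance as a model-agnostic explanation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=20, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature in turn and measure the drop in held-out score:
# a large drop indicates a feature the model genuinely relies on.
result = permutation_importance(model, X_test, y_test, n_repeats=20,
                                random_state=0)
for idx in result.importances_mean.argsort()[::-1][:5]:
    print(f"feature {idx}: importance {result.importances_mean[idx]:.3f}")
```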


Mathematics ◽ 2021 ◽ Vol 9 (11) ◽ pp. 1166
Author(s): Deepak Kumar, Chaman Verma, Pradeep Kumar Singh, Maria Simona Raboaca, Raluca-Andreea Felseghi, ...

The present study presents a hybrid approach to evaluate the impact, association, and discrepancies of demographic characteristics on students’ job placement. It extracts several significant academic features that determine Master of Business Administration (MBA) student placement and identify the gender of placed students, and it recommends a roadmap for the future from which students, parents, guardians, institutions, and companies can benefit. Of seven experiments, the first five were conducted with statistical computations, and the last two with supervised machine learning approaches. The Support Vector Machine (SVM) outperformed the other classifiers with the highest accuracy of 90% in predicting employment status, while the Random Forest (RF) attained a maximum accuracy of 88% in recognizing the gender of placed students. Several significant features are also recommended for identifying the gender of placed students and the placement status. A statistical t-test at the 0.05 significance level showed that students’ gender did not influence the offered salary during job placement, nor did the MBA specializations Marketing and Finance (Mkt&Fin) and Marketing and Human Resource (Mkt&HR) (p > 0.05). The t-test also showed that gender did not affect students’ placement test percentage scores (p > 0.05), and that degree streams such as Science and Technology (Sci&Tech), Commerce and Management (Comm&Mgmt), and others did not affect the offered salary (p > 0.05). Further, the χ2 test revealed a significant association between a student’s course specialization and placement status (p < 0.05), but no significant association between a student’s degree and placement status (p > 0.05). The study recommends automatic placement prediction with identification of demographic impact for higher-education universities and institutions, which will help students, teachers, parents, and institutions prepare for the future accordingly.
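A compact sketch of the two kinds of analysis the study combines, classical hypothesis tests (t-test, χ2) and supervised classifiers (SVM, Random Forest), is given below on synthetic placement data; the data, feature set, and model settings are illustrative assumptions rather than the authors' setup.

```python
# Minimal sketch: hypothesis tests plus classifiers on synthetic placement data.
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(3)
n = 400
df = pd.DataFrame({
    "gender": rng.choice(["M", "F"], size=n),
    "specialization": rng.choice(["Mkt&Fin", "Mkt&HR"], size=n),
    "test_score": rng.normal(65, 10, size=n),
    "salary": rng.normal(300000, 40000, size=n),
    "placed": rng.integers(0, 2, size=n),
})

# t-test: does offered salary differ by gender?
t, p = stats.ttest_ind(df.loc[df.gender == "M", "salary"],
                       df.loc[df.gender == "F", "salary"])
print(f"t-test on salary by gender: p = {p:.3f}")

# chi-squared test: is specialization associated with placement status?
table = pd.crosstab(df["specialization"], df["placed"])
chi2, p, dof, _ = stats.chi2_contingency(table)
print(f"chi-squared test, specialization vs placement: p = {p:.3f}")

# supervised classifiers for placement status
X = pd.get_dummies(df[["gender", "specialization", "test_score"]])
X_train, X_test, y_train, y_test = train_test_split(X, df["placed"],
                                                    random_state=0)
for name, clf in [("SVM", SVC()), ("RF", RandomForestClassifier(random_state=0))]:
    clf.fit(X_train, y_train)
    print(name, "accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```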

