Computational Statistics
Recently Published Documents

TOTAL DOCUMENTS: 244 (five years: 43)
H-INDEX: 13 (five years: 2)

2022 ◽ Vol 2022 ◽ pp. 1-11
Author(s): Abolfazl Mehbodniya, Ihtiram Raza Khan, Sudeshna Chakraborty, M. Karthik, Kamakshi Mehta, ...

Background. Even in today’s environment, when a plethora of information is accessible, it can be difficult to make appropriate choices for one’s well-being. Data mining, machine learning, and computational statistics are among the most popular areas of study today, and all of them aim to empower people to make good decisions that maximize outcomes in whatever field they work in. Because growth in the number of patients is directly related to population growth and lifestyle changes, the healthcare sector has a significant need for data-processing services. In cancer care, the prognosis refers to the patient's overall chance of survival, but it may also describe how severe the disease is expected to become over the course of the patient's future.

Methodology. The proposed technique consists of three stages: input data acquisition, preprocessing, and classification. Raw input data are acquired, preprocessing removes missing data, and classification is carried out using an ensemble classifier to analyze the stages of cancer. The study also explores the combined influence of the prominent labels in conjunction with one another using a multilabel classification approach. Finally, an ensemble classifier model is constructed and experimentally validated to increase classification accuracy. Overall, the recommended and tested models demonstrate a steady improvement of 2% to 6% over the baseline performance.

Results. The proposed work offers an opportunity to contribute to the general health and welfare of workers in the healthcare sector. Alternative solutions to the remaining constraints, together with automation of the whole process flow across all five phases, are anticipated to be the key focus of future work. These data patterns make it easier to predict the health status of employees in industry and to identify information trends. The proposed classifier achieves an accuracy of 93.265%.
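As a rough illustration of the three-stage pipeline described above (data acquisition, preprocessing of missing values, ensemble classification), the following sketch uses scikit-learn on synthetic data; the dataset, features, and choice of base classifiers are assumptions for illustration, not the authors' exact configuration.

```python
# Minimal sketch: imputation of missing values followed by an ensemble classifier.
# All data and model choices below are illustrative assumptions.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Stage 1: input data acquisition (synthetic placeholder for patient records)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
X[rng.random(X.shape) < 0.05] = np.nan          # simulate missing entries
y = rng.integers(0, 4, size=500)                # four hypothetical cancer stages

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Stages 2 and 3: preprocessing (handle missing data) and ensemble classification
model = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("ensemble", VotingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
            ("lr", LogisticRegression(max_iter=1000)),
            ("svm", SVC(probability=True)),
        ],
        voting="soft",
    )),
])
model.fit(X_train, y_train)
print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))
```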


2022 ◽ pp. 1-24
Author(s): Pengcheng Zhang, David Pitt, Xueyuan Wu

Abstract. The fact that a large proportion of insurance policyholders make no claims during a one-year period highlights the importance of zero-inflated count models when analyzing the frequency of insurance claims. There is a vast literature on the univariate case of zero-inflated count models, while work on multivariate models is considerably less advanced. Given that insurance companies write multiple lines of insurance business, where the claim counts on these lines are often correlated, there is a strong incentive to analyze multivariate claim count models. Motivated by the idea of Liu and Tian (Computational Statistics and Data Analysis, 83, 200–222; 2015), we develop a multivariate zero-inflated hurdle model to describe multivariate count data with extra zeros. This generalization offers more flexibility in modeling the behavior of individual claim counts while also incorporating a correlation structure between claim counts for different lines of insurance business. We develop an application of the expectation–maximization (EM) algorithm to enable the statistical inference necessary to estimate the parameters of our model. The model is then applied to an automobile insurance portfolio from a major insurance company in Spain. We demonstrate that the multivariate zero-inflated hurdle model outperforms several alternatives.
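The multivariate hurdle model itself is involved, but the EM idea the authors build on can be illustrated on the simpler univariate zero-inflated Poisson case. The sketch below, with simulated claim counts and arbitrary starting values (both assumptions), alternates the usual E- and M-steps.

```python
# Minimal sketch of EM estimation for a univariate zero-inflated Poisson model,
# as a simplified stand-in for the multivariate zero-inflated hurdle model and
# EM scheme described in the abstract. Data and starting values are assumptions.
import numpy as np

rng = np.random.default_rng(1)
n, true_pi, true_lam = 5000, 0.3, 2.0
zero_mass = rng.random(n) < true_pi
y = np.where(zero_mass, 0, rng.poisson(true_lam, size=n))   # simulated claim counts

pi, lam = 0.5, y.mean()            # starting values
for _ in range(200):
    # E-step: posterior probability that an observed zero came from the point mass
    tau = np.where(y == 0, pi / (pi + (1 - pi) * np.exp(-lam)), 0.0)
    # M-step: update the mixing weight and the Poisson mean
    pi = tau.mean()
    lam = ((1 - tau) * y).sum() / (1 - tau).sum()

print(f"estimated pi = {pi:.3f}, lambda = {lam:.3f}")
```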


2021 ◽ Vol 12
Author(s): Maria Krantz, David Zimmer, Stephan O. Adler, Anastasia Kitashova, Edda Klipp, ...

The study of plant-environment interactions is a multidisciplinary research field. With the emergence of quantitative large-scale and high-throughput techniques, the amount and dimensionality of experimental data have increased strongly. Appropriate strategies for data storage, management, and evaluation are needed to make efficient use of experimental findings. Computational data mining approaches are essential for deriving statistical trends and signatures contained in data matrices. Although current biology in general is challenged by high data dimensionality, this is particularly true for plant biology. Plants, as sessile organisms, have to cope with environmental fluctuations. This typically results in strong dynamics of metabolite and protein concentrations, which are often challenging to quantify. Summarizing experimental output results in complex data arrays, which require computational statistics and numerical methods for building quantitative models. Experimental findings need to be integrated with computational models to gain a mechanistic understanding of plant metabolism. For this, bioinformatics and mathematics need to be combined with experimental setups in physiology, biochemistry, and molecular biology. This review presents and discusses concepts at the interface of experiment and computation which are likely to shape current and future plant biology. Finally, this interface is discussed with regard to its capabilities and limitations for developing a quantitative model of plant-environment interactions.
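As a toy illustration of turning time-series measurements into a quantitative model, the sketch below fits a one-parameter decay model to simulated metabolite concentrations; the model, data, and parameter values are illustrative assumptions, not material from the review.

```python
# Minimal sketch: fit a simple kinetic model to noisy, simulated concentration data.
import numpy as np
from scipy.optimize import curve_fit

def decay(t, c0, k):
    """Exponential decay of a metabolite concentration."""
    return c0 * np.exp(-k * t)

rng = np.random.default_rng(2)
t = np.linspace(0, 10, 30)
observed = decay(t, 5.0, 0.4) + rng.normal(scale=0.2, size=t.size)  # simulated data

params, cov = curve_fit(decay, t, observed, p0=[1.0, 0.1])
print("fitted c0 = %.2f, k = %.2f" % tuple(params))
```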


2021
Author(s): David Watson

Abstract. High-throughput technologies such as next-generation sequencing allow biologists to observe cell function with unprecedented resolution, but the resulting datasets are too large and complicated for humans to understand without the aid of advanced statistical methods. Machine learning (ML) algorithms, which are designed to automatically find patterns in data, are well suited to this task. Yet these models are often so complex as to be opaque, leaving researchers with few clues about underlying mechanisms. Interpretable machine learning (iML) is a burgeoning subdiscipline of computational statistics devoted to making the predictions of ML models more intelligible to end users. This article is a gentle and critical introduction to iML, with an emphasis on genomic applications. I define relevant concepts, motivate leading methodologies, and provide a simple typology of existing approaches. I survey recent examples of iML in genomics, demonstrating how such techniques are increasingly integrated into research workflows. I argue that iML solutions are required to realize the promise of precision medicine. However, several open challenges remain. I examine the limitations of current state-of-the-art tools and propose a number of directions for future research. While the horizon for iML in genomics is wide and bright, continued progress requires close collaboration across disciplines.
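One widely used iML technique is permutation feature importance; the sketch below applies it to a black-box classifier on synthetic data. The dataset and model are assumptions chosen for brevity, and the article itself surveys a much wider range of methods.

```python
# Minimal sketch: permutation feature importance as a model-agnostic explanation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=20, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature in turn and measure the drop in held-out score:
# a large drop indicates a feature the model genuinely relies on.
result = permutation_importance(model, X_test, y_test, n_repeats=20,
                                random_state=0)
for idx in result.importances_mean.argsort()[::-1][:5]:
    print(f"feature {idx}: importance {result.importances_mean[idx]:.3f}")
```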


Mathematics ◽ 2021 ◽ Vol 9 (11) ◽ pp. 1166
Author(s): Deepak Kumar, Chaman Verma, Pradeep Kumar Singh, Maria Simona Raboaca, Raluca-Andreea Felseghi, ...

The present study presents a hybrid approach to evaluate the impact, association, and discrepancies of demographic characteristics on students’ job placement. It extracts several significant academic features that determine Master of Business Administration (MBA) student placement and identify the gender of placed students, and it recommends a roadmap for the future from which students, parents, guardians, institutions, and companies can benefit. Of seven experiments, the first five were conducted with statistical computations, and the last two with supervised machine learning approaches. The Support Vector Machine (SVM) outperformed the other classifiers with the highest accuracy of 90% in predicting employment status, while the Random Forest (RF) attained a maximum accuracy of 88% in recognizing the gender of placed students. Several significant features are also recommended for identifying the gender of placed students and the placement status. A statistical t-test at the 0.05 significance level showed that students’ gender did not influence the offered salary during job placement, nor did the MBA specializations Marketing and Finance (Mkt&Fin) and Marketing and Human Resource (Mkt&HR) (p > 0.05). The t-test also showed that gender did not affect students’ placement test percentage scores (p > 0.05), and that degree streams such as Science and Technology (Sci&Tech), Commerce and Management (Comm&Mgmt), and others did not affect the offered salary (p > 0.05). Further, the χ2 test revealed a significant association between a student’s course specialization and placement status (p < 0.05), but no significant association between a student’s degree and placement status (p > 0.05). The study recommends automatic placement prediction with identification of demographic impact for higher-education universities and institutions, which will help students, teachers, parents, and institutions prepare for the future accordingly.
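A compact sketch of the two kinds of analysis the study combines, classical hypothesis tests (t-test, χ2) and supervised classifiers (SVM, Random Forest), is given below on synthetic placement data; the data, feature set, and model settings are illustrative assumptions rather than the authors' setup.

```python
# Minimal sketch: hypothesis tests plus classifiers on synthetic placement data.
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(3)
n = 400
df = pd.DataFrame({
    "gender": rng.choice(["M", "F"], size=n),
    "specialization": rng.choice(["Mkt&Fin", "Mkt&HR"], size=n),
    "test_score": rng.normal(65, 10, size=n),
    "salary": rng.normal(300000, 40000, size=n),
    "placed": rng.integers(0, 2, size=n),
})

# t-test: does offered salary differ by gender?
t, p = stats.ttest_ind(df.loc[df.gender == "M", "salary"],
                       df.loc[df.gender == "F", "salary"])
print(f"t-test on salary by gender: p = {p:.3f}")

# chi-squared test: is specialization associated with placement status?
table = pd.crosstab(df["specialization"], df["placed"])
chi2, p, dof, _ = stats.chi2_contingency(table)
print(f"chi-squared test, specialization vs placement: p = {p:.3f}")

# supervised classifiers for placement status
X = pd.get_dummies(df[["gender", "specialization", "test_score"]])
X_train, X_test, y_train, y_test = train_test_split(X, df["placed"],
                                                    random_state=0)
for name, clf in [("SVM", SVC()), ("RF", RandomForestClassifier(random_state=0))]:
    clf.fit(X_train, y_train)
    print(name, "accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```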

