Completing missing values using discovered formal concepts

Author(s):  
S. Ben Yahia ◽  
Kh. Arour ◽  
A. Jaoua
Marketing ZFP ◽  
2019 ◽  
Vol 41 (4) ◽  
pp. 21-32
Author(s):  
Dirk Temme ◽  
Sarah Jensen

Missing values are ubiquitous in empirical marketing research. If missing data are not dealt with properly, the result can be a loss of statistical power and distorted parameter estimates. While traditional approaches for handling missing data (e.g., listwise deletion) are still widely used, researchers can nowadays choose among various advanced techniques such as multiple imputation or full-information maximum likelihood estimation. Thanks to readily available software, applying these modern missing-data methods no longer poses a major obstacle. Still, their application requires a sound understanding of their prerequisites and limitations as well as a deeper understanding of the processes that led to missing values in an empirical study. This article, Part 1, first introduces Rubin’s classical definition of missing-data mechanisms and an alternative, variable-based taxonomy that provides a graphical representation. Second, a selection of visualization tools available in different R packages for describing and exploring missing-data structures is presented.
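As a rough illustration of Rubin’s taxonomy (MCAR, MAR, MNAR), the following minimal Python sketch simulates the three mechanisms on two synthetic variables and summarizes the resulting missingness. The variables (age, income), the dropout probabilities, and the use of Python instead of the R packages discussed in the article are illustrative assumptions, not material from the study.

```python
# Minimal sketch (not the article's R workflow): simulate Rubin's three
# missing-data mechanisms on a toy two-variable data set and inspect the
# resulting missingness pattern. Variable names and probabilities are illustrative.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000
age = rng.normal(45, 12, n)
income = 20 + 0.8 * age + rng.normal(0, 5, n)
df = pd.DataFrame({"age": age, "income": income})

# MCAR: each income value is dropped with the same probability,
# independent of any observed or unobserved quantity.
mcar = df.copy()
mcar.loc[rng.random(n) < 0.2, "income"] = np.nan

# MAR: the probability of a missing income depends only on the fully
# observed variable age (older respondents answer less often).
mar = df.copy()
p_mar = 1 / (1 + np.exp(-(age - 55) / 5))
mar.loc[rng.random(n) < p_mar, "income"] = np.nan

# MNAR: missingness depends on the (unobserved) income value itself,
# e.g. high earners refuse to report their income.
mnar = df.copy()
p_mnar = 1 / (1 + np.exp(-(income - income.mean()) / 5))
mnar.loc[rng.random(n) < p_mnar, "income"] = np.nan

# A simple description of the missing-data structure, analogous in spirit
# to the exploratory summaries provided by R packages for missing data.
for name, d in [("MCAR", mcar), ("MAR", mar), ("MNAR", mnar)]:
    observed = d["income"].notna()
    print(name,
          "| share missing:", round(1 - observed.mean(), 2),
          "| mean age where income missing:", round(d.loc[~observed, "age"].mean(), 1))
```

Under MCAR the mean age of respondents with missing income stays close to the overall mean, whereas under MAR and MNAR it shifts, which is the kind of structure the visualization tools discussed in the article are meant to reveal.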


2017 ◽  
Author(s):  
Natalia Sizochenko ◽  
Alicja Mikolajczyk ◽  
Karolina Jagiello ◽  
Tomasz Puzyn ◽  
Jerzy Leszczynski ◽  
...  

Predictive modeling approaches can help solve the problem of missing data. Many studies investigate the effects of missing values on qualitative or quantitative modeling, but only a few publications have discussed them in the context of nanotechnology-related data. The current project aimed at developing a multi-nano-read-across modeling technique that helps predict toxicity to different species: bacteria, algae, protozoa, and mammalian cell lines. In this study, the experimental toxicity of 184 metal and silica oxide nanoparticles (30 unique chemical types) from 15 experimental datasets was analyzed. A hybrid quantitative multi-nano-read-across approach that combines interspecies correlation analysis and self-organizing map analysis was developed. In the first step, hidden patterns of toxicity among the nanoparticles were identified using a combination of methods. The developed model, based on the categorization of metal oxide nanoparticles’ toxicity outcomes, was then evaluated by means of a combination of supervised and unsupervised machine learning techniques to find the underlying factors responsible for toxicity.
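The general two-stage idea, unsupervised pattern discovery followed by a supervised search for the descriptors that drive the discovered categories, can be sketched as follows. This is only a hedged illustration: KMeans stands in for the self-organizing map, a random forest for the supervised step, and the descriptor matrix is synthetic rather than the 15 experimental datasets analyzed in the study.

```python
# Sketch of the read-across idea: discover toxicity patterns with an
# unsupervised step, then learn which descriptors drive them with a
# supervised step. KMeans is a stand-in for the self-organizing map;
# the descriptors below are synthetic, not the authors' data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_particles, n_descriptors = 184, 6   # sizes echo the abstract; values are synthetic
X = rng.normal(size=(n_particles, n_descriptors))

# Unsupervised step: group nanoparticles by similarity of their descriptors
# (a stand-in for mapping them onto a self-organizing map grid).
clusters = KMeans(n_clusters=4, n_init=10, random_state=1).fit_predict(X)

# Supervised step: treat the discovered categories as labels and ask which
# descriptors separate them, as a proxy for the "underlying factors".
forest = RandomForestClassifier(n_estimators=200, random_state=1).fit(X, clusters)
print("cross-validated accuracy:",
      cross_val_score(RandomForestClassifier(random_state=1), X, clusters, cv=5).mean().round(2))
print("descriptor importances:", forest.feature_importances_.round(2))
```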


2020 ◽  
Vol 21 ◽  
Author(s):  
Sukanya Panja ◽  
Sarra Rahem ◽  
Cassandra J. Chu ◽  
Antonina Mitrofanova

Background: In recent years, the availability of high-throughput technologies, the establishment of large molecular patient data repositories, and advances in computing power and storage have allowed the elucidation of complex mechanisms implicated in therapeutic response in cancer patients. The breadth and depth of such data, alongside experimental noise and missing values, require a sophisticated human-machine interaction that allows effective learning from complex data and accurate forecasting of future outcomes, ideally embedded in the core of the machine learning design. Objective: In this review, we will discuss machine learning techniques utilized for modeling treatment response in cancer, including random forests, support vector machines, neural networks, and linear and logistic regression. We will overview their mathematical foundations and discuss their limitations and alternative approaches, all in light of their application to therapeutic response modeling in cancer. Conclusion: We hypothesize that the increase in the number of patient profiles and the potential temporal monitoring of patient data will establish even more complex techniques, such as deep learning and causal analysis, as central players in therapeutic response modeling.
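A purely illustrative comparison of some of the model families reviewed here might look like the sketch below; it runs on synthetic data rather than patient profiles, assumes scikit-learn, and does not reproduce any of the reviewed studies.

```python
# Compare several reviewed model families on the same synthetic binary
# "responder vs. non-responder" task under cross-validation.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic "molecular profiles": 300 patients, 50 features, binary response.
X, y = make_classification(n_samples=300, n_features=50, n_informative=10, random_state=0)

models = {
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "support vector machine": make_pipeline(StandardScaler(), SVC()),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "neural network": make_pipeline(StandardScaler(), MLPClassifier(max_iter=2000, random_state=0)),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name:>24}: AUC = {scores.mean():.2f} +/- {scores.std():.2f}")
```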


Author(s):  
B. Mathura Bai ◽  
N. Mangathayaru ◽  
B. Padmaja Rani ◽  
Shadi Aljawarneh

Missing attribute values are one of the most common problems faced when mining medical datasets. Estimating missing values is a major challenge in dataset pre-processing, and any wrong estimate of a missing attribute value can lead to inefficient and improper classification and thus to lower classifier accuracies. Similarity measures play a key role during the imputation process; an appropriate similarity measure can help achieve better imputation and improved classification accuracies. This paper proposes a novel imputation measure for finding the similarity between missing and non-missing instances in medical datasets. Experiments are carried out by applying both the proposed imputation technique and popular existing benchmark imputation techniques. Classification is carried out using the KNN, J48, SMO, and RBFN classifiers. The experimental analysis shows that, after imputing medical records with the proposed technique, the classification accuracies reported by the KNN, J48, and SMO classifiers improve compared with the existing benchmark imputation techniques.
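The generic imputation-then-classification pipeline evaluated in the paper can be sketched as follows. This is not the proposed similarity measure: scikit-learn's Euclidean-distance KNNImputer stands in for it, and the bundled breast-cancer benchmark stands in for the authors' medical datasets.

```python
# Generic template: impute missing entries from similar records, then classify.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.impute import KNNImputer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)

# Introduce ~10% missing values completely at random to mimic incomplete records.
mask = rng.random(X.shape) < 0.10
X_missing = X.copy()
X_missing[mask] = np.nan

# Impute each missing entry from its most similar complete neighbours,
# then classify the completed records with k-nearest neighbours.
pipeline = make_pipeline(KNNImputer(n_neighbors=5),
                         StandardScaler(),
                         KNeighborsClassifier(n_neighbors=5))
scores = cross_val_score(pipeline, X_missing, y, cv=5)
print("accuracy after imputation:", scores.mean().round(3))
```

Swapping in a different similarity measure at the imputation step, as the paper proposes, leaves the rest of this pipeline unchanged, which is how the benchmark comparisons with KNN, J48, SMO, and RBFN can be set up.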


2020 ◽  
Vol 14 (16) ◽  
pp. 3288-3300
Author(s):  
Hang Liu ◽  
Youyuan Wang ◽  
WeiGen Chen

BMJ Open ◽  
2020 ◽  
Vol 10 (2) ◽  
pp. e032864
Author(s):  
Geraldine Rauch ◽  
Lorena Hafermann ◽  
Ulrich Mansmann ◽  
Iris Pigeot

Objectives: To assess the biostatistical quality of study protocols submitted to German medical ethics committees according to the personal appraisal of their statistical members.
Design: We conducted a web-based survey among biostatisticians who have been active as members in German medical ethics committees during the past 3 years.
Setting: The study population was identified by a comprehensive web search on the websites of German medical ethics committees.
Participants: The final list comprised 86 eligible persons. In total, 57 (66%) completed the survey.
Questionnaire: The first item checked whether the inclusion criterion was met. The last item assessed satisfaction with the survey. Four items aimed to characterise the medical ethics committee in terms of type and location, and one item asked about the urgency of biostatistical training for the medical investigators. The main 2×12 items reported an individual assessment of the quality of biostatistical aspects in the submitted study protocols, distinguishing studies regulated by the German Medicines Act (AMG)/German Act on Medical Devices (MPG) from studies not regulated by these laws.
Primary and secondary outcome measures: The individual assessment of the quality of biostatistical aspects corresponds to the primary objective. Participants were asked to complete the sentence ‘In x% of the submitted study protocols, the following problem occurs’, where 12 different statistical problems were formulated. All other items assess secondary endpoints.
Results: For all biostatistical aspects, 45 of 49 (91.8%) participants judged the quality of AMG/MPG study protocols much better than that of ‘non-regulated’ studies. The latter are in median affected 20%–60% more often by statistical problems. The highest need for training was reported for sample size calculation, missing values and multiple comparison procedures.
Conclusions: Biostatisticians active in German medical ethics committees classify the biostatistical quality of study protocols as low for ‘non-regulated’ studies, whereas quality is much better for AMG/MPG studies.

