A Study on Analyzing Data from Designed Experiments with Missing Values

Missing values are ubiquitous in empirical marketing research. If missing data are not handled properly, the result can be a loss of statistical power and distorted parameter estimates. While traditional approaches to handling missing data (e.g., listwise deletion) are still widely used, researchers can nowadays choose among various advanced techniques, such as multiple imputation analysis or full-information maximum likelihood estimation. Given the available software, applying these modern missing data methods poses no major obstacle. Still, their application requires a sound understanding of the prerequisites and limitations of these methods, as well as a deeper understanding of the processes that led to missing values in an empirical study. This article, Part 1 of the series, first introduces Rubin's classical definition of missing data mechanisms and an alternative, variable-based taxonomy that provides a graphical representation. Second, it presents a selection of visualization tools, available in different R packages, for describing and exploring missing data structures.
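
To make the difference between missing data mechanisms concrete, here is a minimal Python sketch (standard library only) under purely hypothetical assumptions: a fully observed covariate x, an outcome y = 2x + noise, and missingness imposed on y either completely at random (MCAR) or depending only on the observed x (MAR). It compares the mean of y after listwise deletion with the full-data mean and with a simple regression-imputation fill-in. The data, probabilities, and function names are illustrative and are not taken from the article.

```python
import random
import statistics


def simulate(n, seed):
    """Generate hypothetical (x, y) data where y depends linearly on a fully observed x."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys, rng


def mcar_mask(n, p, rng):
    """MCAR: each y is missing with the same probability p, unrelated to any data."""
    return [rng.random() < p for _ in range(n)]


def mar_mask(xs, rng):
    """MAR: the chance that y is missing depends only on the observed x, not on y."""
    return [rng.random() < (0.8 if x > 0 else 0.1) for x in xs]


def listwise_mean(ys, missing):
    """Mean of y after listwise (complete-case) deletion."""
    return statistics.fmean(y for y, m in zip(ys, missing) if not m)


def regression_imputed_mean(xs, ys, missing):
    """Mean of y after replacing missing y with OLS predictions from x (single imputation)."""
    cx = [x for x, m in zip(xs, missing) if not m]
    cy = [y for y, m in zip(ys, missing) if not m]
    mx, my = statistics.fmean(cx), statistics.fmean(cy)
    slope = sum((a - mx) * (b - my) for a, b in zip(cx, cy)) / sum((a - mx) ** 2 for a in cx)
    intercept = my - slope * mx
    filled = [intercept + slope * x if m else y for x, y, m in zip(xs, ys, missing)]
    return statistics.fmean(filled)


if __name__ == "__main__":
    xs, ys, rng = simulate(20000, seed=1)
    print("full data mean:", round(statistics.fmean(ys), 3))
    mcar = mcar_mask(len(ys), 0.3, rng)
    print("MCAR, listwise deletion:", round(listwise_mean(ys, mcar), 3))
    mar = mar_mask(xs, rng)
    print("MAR, listwise deletion:", round(listwise_mean(ys, mar), 3))
    print("MAR, regression imputation:", round(regression_imputed_mean(xs, ys, mar), 3))
```

Under MCAR, listwise deletion here mainly costs sample size; under MAR, the complete-case mean is pulled well below the full-data mean, while using the observed x removes most of that bias. Single regression imputation is used only for brevity: it understates uncertainty, which is why multiple imputation or full-information maximum likelihood is preferred in practice.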

Influence of sample size and the method of handling missing values on the results and goodness of fit of the path relation model

Econometrics

10.15611/ekt.2016.3.04

2016

Author(s):

Łukasz Skowron

Marcin Gąsior

Keyword(s):

Sample Size

Goodness Of Fit

Missing Values

Relation Model

How toxicity of nanomaterials towards different species could be simultaneously evaluated: Novel multi-nano-read-across approach

10.26434/chemrxiv.5327677.v1

2017

Author(s):

Natalia Sizochenko

Alicja Mikolajczyk

Karolina Jagiello

Tomasz Puzyn

Jerzy Leszczynski

...

Keyword(s):

Missing Values

Metal Oxide Nanoparticles

Machine Learning Techniques

Self Organizing Map

Related Data

Mammalian Cell Lines

Learning Techniques

Combination Of Methods

Map Analysis

Unique Chemical

Predictive modeling approaches can help solve the problem of missing data. Many studies have investigated the effects of missing values on qualitative or quantitative modeling, but only a few publications have discussed this issue for nanotechnology-related data. The current project aimed at developing a multi-nano-read-across modeling technique that helps predict the toxicity of nanoparticles toward different species: bacteria, algae, protozoa, and mammalian cell lines. In this study, the experimental toxicity of 184 metal oxide and silica nanoparticles (30 unique chemical types) from 15 experimental datasets was analyzed. A hybrid quantitative multi-nano-read-across approach that combines interspecies correlation analysis and self-organizing map analysis was developed. In the first step, hidden patterns of toxicity among the nanoparticles were identified using a combination of methods. Then the developed model, based on the categorization of metal oxide nanoparticles' toxicity outcomes, was evaluated using a combination of supervised and unsupervised machine learning techniques to find the underlying factors responsible for toxicity.
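
The abstract does not say how the interspecies correlation step treats nanoparticles that were tested on only some species. As a hypothetical illustration only, and not the authors' multi-nano-read-across method, the Python sketch below computes a pairwise-complete Pearson correlation between two species' toxicity scores and fills a gap in one species from a least-squares line fitted on the other. `None` marks an untested nanoparticle; all function names and values are invented for the example.

```python
import math


def paired_observed(source, target):
    """Keep only nanoparticles with a value for both species (None marks a missing value)."""
    return [(x, y) for x, y in zip(source, target) if x is not None and y is not None]


def pearson(pairs):
    """Pearson correlation over the shared (pairwise-complete) observations."""
    if len(pairs) < 2:
        raise ValueError("need at least two shared observations")
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


def read_across_fill(source, target):
    """Fill gaps in `target` from a least-squares line of target on source (shared pairs only)."""
    pairs = paired_observed(source, target)
    mx = sum(x for x, _ in pairs) / len(pairs)
    my = sum(y for _, y in pairs) / len(pairs)
    slope = sum((x - mx) * (y - my) for x, y in pairs) / sum((x - mx) ** 2 for x, _ in pairs)
    intercept = my - slope * mx
    # Only positions where the target is missing but the source is observed are filled.
    return [intercept + slope * x if y is None and x is not None else y
            for x, y in zip(source, target)]


if __name__ == "__main__":
    species_a = [1.0, 2.0, 3.0, 4.0, 5.0]     # hypothetical toxicity scores
    species_b = [2.0, 4.0, None, 8.0, 10.0]   # third nanoparticle untested in species B
    print("r =", pearson(paired_observed(species_a, species_b)))
    print("filled:", read_across_fill(species_a, species_b))
```

A single linear fit between two species is the simplest possible read-across; a strong pairwise correlation is what would justify borrowing information across species in this way.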

Revisiting Glycerol Esterification with Acetic Acid over Amberlyst-35 via Statistically Designed Experiments: Overcoming Transport Limitations

10.26434/chemrxiv.7731305.v1

2019

Author(s):

Víctor Gabriel Baldovino Medrano

Karen V. Caballero

Hernando Guerrero-Amaya

Keyword(s):

Acetic Acid

Active Sites

Batch Reactor

Catalyst Particle

Experimental Designs

Particle Sizes

Turnover Rates

Designed Experiments

Different Temperatures

Glycerol Esterification

Turnover rates for glycerol esterification with acetic acid over Amberlyst-35 were measured at different temperatures, reactant and active-site concentrations, and catalyst particle sizes. Data were collected in a batch reactor. Experiments were conducted following a sequence of factorial experimental designs.
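
As a hedged illustration of what a factorial experimental design involves, the sketch below builds a two-level full factorial design in coded units (-1 = low, +1 = high) for the four factors mentioned in the abstract and estimates each main effect as a difference of mean responses. The two-level coding, the full (rather than fractional or sequential) design, the factor names, and the synthetic response with its coefficients are all assumptions; the study's actual designs and levels are not given here.

```python
from itertools import product

# Hypothetical factor names loosely based on the abstract; levels are coded -1 (low) / +1 (high).
FACTORS = ["temperature", "reactant_conc", "active_site_conc", "particle_size"]


def full_factorial(k):
    """Return all 2**k runs of a two-level full factorial design in coded units."""
    return [list(run) for run in product((-1, 1), repeat=k)]


def main_effects(design, response):
    """Main effect of each factor: mean response at +1 minus mean response at -1."""
    effects = []
    for j in range(len(design[0])):
        high = [y for run, y in zip(design, response) if run[j] == 1]
        low = [y for run, y in zip(design, response) if run[j] == -1]
        effects.append(sum(high) / len(high) - sum(low) / len(low))
    return effects


if __name__ == "__main__":
    design = full_factorial(len(FACTORS))
    # Synthetic response with hypothetical coefficients, used only to exercise the estimator.
    response = [10 + 3 * run[0] - 1 * run[3] for run in design]
    for name, effect in zip(FACTORS, main_effects(design, response)):
        print(f"{name}: {effect:+.1f}")
```

Because every level of each factor appears equally often with every level of the others, each main-effect estimate is unaffected by the other factors' effects. If a run's response is missing, that balance is lost, which is one reason missing values in designed experiments need dedicated treatment.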

Burn-in Failure Analysis of 0.5μm 1MB SRAM: Barrier Glue Layer Cracks and Tungsten Plug “Worm Holes”

ISTFA 1996: Conference Proceedings from the 22nd International Symposium for Testing and Failure Analysis

10.31399/asm.cp.istfa1996p0159

1996

Author(s):

E. Widener

S. Tatti

P. Schani

S. Crown

B. Dunnigan

...

Keyword(s):

Failure Analysis

Product Quality

CMOS Process

Designed Experiments

Root Cause

Poly Silicon

New Process

Supply Current

Electrical Analysis

Current Characteristics

Abstract: A new 0.5 µm 1-megabit SRAM, which employed a double-metal, triple-poly CMOS process with tungsten-plug metal-to-poly/silicon contacts, was introduced. During burn-in of this product, high currents, apparently due to electrical overstress, were observed. Electrical analysis showed abnormal supply current characteristics at high voltages. Failure analysis located the high-current sites in the burn-in rejects and identified cracks in the glue layer prior to tungsten deposition as the root cause of the failure. The glue layer cracks allowed a reaction with the poly/silicon, causing opens at the bottom of the contacts. These floating nodes caused high currents and often latch-up during burn-in. Designed experiments in the wafer fab identified an improved glue layer process, which has been implemented. The new process shows improvement in burn-in performance as well as in outgoing product quality.

Faculty Opinions recommendation of Population time series: process variability, observation errors, missing values, lags, and hidden states.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature

10.3410/f.1022586.274674

2005

Author(s):

Mark Rees

Keyword(s):

Time Series

Missing Values

Process Variability

Hidden States

Missing Values in Vector Time Series

SSRN Electronic Journal

10.2139/ssrn.2097465

2012

Author(s):

Heather Eunice Mitchell

Keyword(s):

Time Series

Missing Values

Vector Time Series

Big Data to Knowledge: Application of Machine Learning to Predictive Modeling of Therapeutic Response in Cancer.

Current Genomics

10.2174/1389202921999201224110101

2020

Vol 21

Author(s):

Sukanya Panja

Sarra Rahem

Cassandra J. Chu

Antonina Mitrofanova

Keyword(s):

Machine Learning

Missing Values

Therapeutic Response

Patient Data

Machine Learning Techniques

Support Vector

Complex Data

Human Machine Interaction

Data Repositories

Response Modeling

Background: In recent years, the availability of high-throughput technologies, the establishment of large molecular patient data repositories, and advances in computing power and storage have allowed the elucidation of complex mechanisms implicated in therapeutic response in cancer patients. The breadth and depth of such data, alongside experimental noise and missing values, require a sophisticated human-machine interaction that allows effective learning from complex data and accurate forecasting of future outcomes, ideally embedded in the core of machine learning design. Objective: In this review, we discuss machine learning techniques used for modeling treatment response in cancer, including random forests, support vector machines, neural networks, and linear and logistic regression. We give an overview of their mathematical foundations and discuss their limitations and alternative approaches, all in light of their application to therapeutic response modeling in cancer. Conclusion: We hypothesize that the growing number of patient profiles and the potential for temporal monitoring of patient data will make even more complex techniques, such as deep learning and causal analysis, central players in therapeutic response modeling.

MATHURA (MBI) - A NOVEL IMPUTATION MEASURE FOR IMPUTATION OF MISSING VALUES IN MEDICAL DATASETS

Recent Advances in Computer Science and Communications

10.2174/2666255813666191216123352

2019

Vol 13

Author(s):

B. Mathura Bai

N. Mangathayaru

B. Padmaja Rani

Shadi Aljawarneh

Keyword(s):

Similarity Measure

Medical Records

Missing Values

Similarity Measures

Common Problems

Experiment Analysis

Missing attribute values in medical datasets are one of the most common problems faced when mining such data. Estimating missing values is a major challenge in dataset pre-processing. Incorrect estimates of missing attribute values can lead to inefficient and improper classification, resulting in lower classifier accuracy. Similarity measures play a key role in the imputation process: an appropriate similarity measure can help achieve better imputation and improved classification accuracy. This paper proposes a novel imputation measure for finding the similarity between missing and non-missing instances in medical datasets. Experiments are carried out by applying both the proposed imputation technique and popular existing benchmark imputation techniques. Classification is carried out using KNN, J48, SMO, and RBFN classifiers. The experimental analysis showed that, after imputation of medical records using the proposed technique, the classification accuracies reported by the KNN, J48, and SMO classifiers improved compared with other existing benchmark imputation techniques.
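
The abstract does not define the proposed MBI similarity measure, so the sketch below substitutes a generic one: a root-mean-square distance over the attributes both records have observed, used for k-nearest-neighbour imputation. It is a stand-in to show where a similarity measure plugs into imputation, not the paper's method; the records, `k`, and function names are hypothetical.

```python
import math


def distance(a, b):
    """Root-mean-square distance over attributes observed in both records (None = missing)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(records, k=2):
    """Fill each missing attribute with the mean of that attribute over the k most
    similar records that have it observed. The input records are left unchanged."""
    filled = [list(r) for r in records]
    for i, rec in enumerate(records):
        for j, val in enumerate(rec):
            if val is not None:
                continue
            # Candidate donors: every other record that has attribute j observed.
            donors = [(distance(rec, other), other[j])
                      for idx, other in enumerate(records)
                      if idx != i and other[j] is not None]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled


if __name__ == "__main__":
    patients = [
        [1.0, 2.0, 3.0],
        [1.1, 2.1, None],   # third attribute missing
        [5.0, 6.0, 7.0],
        [1.2, 1.9, 3.4],
    ]
    for row in knn_impute(patients, k=2):
        print(row)
```

Swapping in a different similarity function changes which donors are chosen, which is the lever the abstract says drives imputation quality and, in turn, classifier accuracy.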
