The Estimation of Missing Values in Highly Correlated Data

Author(s):  
R. C. Tabony


1993 ◽  
Vol 17 ◽  
pp. 131-136 ◽  
Author(s):  
Kenneth C. Jezek ◽  
Carolyn J. Merry ◽  
Don J. Cavalieri

Spaceborne data are becoming sufficiently extensive spatially and sufficiently lengthy over time to provide important gauges of global change. There is a potentially long record of microwave brightness temperature from NASA's Scanning Multichannel Microwave Radiometer (SMMR), followed by the Navy's Special Sensor Microwave Imager (SSM/I). It is therefore natural to combine data from successive satellite programs into a single, long record. To do this, we compare brightness temperature data collected during the brief overlap period (7 July-20 August 1987) of SMMR and SSM/I. Only data collected over the Antarctic ice sheet are used, to limit the spatial and temporal complications associated with the open ocean and sea ice. Linear regressions are computed from scatter plots of complementary pairs of channels from each sensor; the regressions reveal highly correlated data sets and support the argument that there are important relative calibration differences between the two instruments. The resulting calibration scheme was applied to a set of average monthly brightness temperatures for a sector of East Antarctica.
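The cross-calibration step described here can be illustrated with a minimal sketch: regress one sensor's brightness temperatures on the other's over the overlap period, then use the fitted gain and offset to place the older record on the newer scale. The data below are synthetic, with an assumed gain and offset standing in for the (unspecified) SMMR/SSM/I differences.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical SMMR brightness temperatures (K) over the ice sheet
tb_smmr = rng.uniform(150.0, 220.0, size=500)

# Simulate the matching SSM/I channel with an assumed relative
# calibration gain and offset, plus a little instrument noise.
true_gain, true_offset = 1.02, -3.5
tb_ssmi = true_gain * tb_smmr + true_offset + rng.normal(0.0, 0.5, size=500)

# Linear regression of SSM/I on SMMR over the overlap period
gain, offset = np.polyfit(tb_smmr, tb_ssmi, deg=1)

# Pearson correlation confirms the two channels are highly correlated
r = np.corrcoef(tb_smmr, tb_ssmi)[0, 1]

# Adjust the SMMR record onto the SSM/I scale to build one long record
tb_smmr_adjusted = gain * tb_smmr + offset
```

With highly correlated data of this kind, the fitted gain and offset recover the relative calibration difference closely, which is what makes the merged record usable.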


2006 ◽  
Vol 163 (suppl_11) ◽  
pp. S227-S227
Author(s):  
R F MacLehose ◽  
D B Dunson ◽  
A H Herring ◽  
J S Kaufman ◽  
K E Hartmann ◽  
...  

2020 ◽  
Vol 12 (23) ◽  
pp. 10124
Author(s):  
Bodin Singpai ◽  
Desheng Wu

Each country needs to monitor progress on its Sustainable Development Goals (SDGs) to develop strategies that meet the expectations of the United Nations. Data envelopment analysis (DEA) can help identify best practices for the SDGs by setting goals to compete against. Automated machine learning (AutoML) simplifies machine learning, allowing researchers to predict future situations with less time and manpower. This work introduces an integrative method that combines DEA and AutoML to assess and predict performance on the SDGs. Two experiments with different data properties, in terms of interval and correlation, demonstrate the approach. Three prediction targets are set to measure performance with regression, classification, and multi-target regression algorithms. A back-propagation neural network (BPNN) is used to validate the outputs of the AutoML. The results show that AutoML can outperform the BPNN on regression and classification prediction problems. Data with low standard deviation (SD) lead to poor prediction performance for the BPNN but do not have a significant impact on AutoML. Highly correlated data yield higher accuracy but do not significantly affect the R-squared values between the actual and predicted values. This integrative approach can accurately predict the projected outputs, which can be used as national goals to transform an inefficient country into an efficient one.
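The DEA step can be sketched with the standard input-oriented CCR envelopment model, solved as one linear program per decision-making unit (here, per country). This is a generic textbook formulation, not necessarily the authors' exact specification, and the four-country data set is hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def dea_ccr_input(X, Y):
    """Input-oriented CCR efficiency score for each DMU.

    X: inputs, shape (m_inputs, n_dmus); Y: outputs, shape (s_outputs, n_dmus).
    Returns theta per DMU; theta == 1 marks an efficient unit.
    """
    m, n = X.shape
    s = Y.shape[0]
    scores = []
    for k in range(n):
        # Decision variables: [theta, lambda_1, ..., lambda_n]
        c = np.r_[1.0, np.zeros(n)]                 # minimise theta
        # X @ lam <= theta * x_k   ->   -theta * x_k + X @ lam <= 0
        A_in = np.hstack([-X[:, [k]], X])
        b_in = np.zeros(m)
        # Y @ lam >= y_k           ->   -Y @ lam <= -y_k
        A_out = np.hstack([np.zeros((s, 1)), -Y])
        b_out = -Y[:, k]
        res = linprog(c, A_ub=np.vstack([A_in, A_out]),
                      b_ub=np.r_[b_in, b_out],
                      bounds=[(None, None)] + [(0, None)] * n)
        scores.append(res.x[0])
    return np.array(scores)

# Hypothetical example: 4 countries, 1 input, 1 output
X = np.array([[2.0, 4.0, 6.0, 8.0]])
Y = np.array([[2.0, 4.0, 5.0, 4.0]])
theta = dea_ccr_input(X, Y)
```

The efficient units (theta = 1) define the frontier; an inefficient country's projected outputs on that frontier are the kind of targets the paper feeds into the prediction step.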


IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 53542-53554
Author(s):  
Haoli Zhao ◽  
Shuxue Ding ◽  
Xiang Li ◽  
Lingjun Zhao


1988 ◽  
Vol 11 ◽  
pp. 1073-1076 ◽  
Author(s):  
Paul J. Ossenbruggen ◽  
Marie Gaudard ◽  
M.Robin Collins

2020 ◽  
Author(s):  
Sara Bahrami

Respondent burden due to long questionnaires in surveys can negatively affect the response rate as well as the quality of responses. A solution to this problem is the split questionnaire design (SQD). In an SQD, the items of the long questionnaire are divided into subsets, and only a fraction of the item subsets is assigned to each random subsample of individuals. This leads to several shorter questionnaires, each administered to a random subsample of individuals. The completed sub-questionnaires are then combined, and the missing values due to the design are imputed by multiple imputation. Identification problems can be avoided in advance by ensuring that the combinations of variables in the analysis model of interest are jointly observed on at least a subsample of individuals. Furthermore, including an appropriate combination of items in each sub-questionnaire is the most important concern in designing an SQD so as to reduce information loss; i.e., highly correlated items that explain each other well should not be jointly missing. For this reason, training data must be available from previous surveys or a pilot study to exploit the associations between the variables. In this thesis two SQDs are proposed. In the first study, a potential design for NEPS data is introduced. The data consist of items which can be divided and allocated into blocks according to their context, with the objective that the within-block correlations are higher than the between-block correlations. According to the design, the target sample is divided into subsamples. In addition to the items of one whole block assigned to each subsample, a fraction of the items of the remaining blocks is randomly drawn and assigned to each subsample, where items belonging to blocks with relatively higher correlations are drawn with lower probability. The design is evaluated by means of several ex-post investigations: the design is imposed on complete data, and several models are estimated for both the complete data and the data deleted by design. The design is also compared with a random multiple matrix sampling (MMS) design, which assigns a random subset of items to each sampled individual. In the second study, a genetic algorithm is used to search a vast number of SQDs for the optimal design. The algorithm evaluates designs by the fraction of missing information (FMI) induced by the design; the optimal design is the one with the smallest FMI. The optimal design is evaluated by means of several simulation studies and compared with a random MMS design.
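The fitness criterion used by the genetic algorithm, the fraction of missing information, comes from Rubin's rules for pooling multiple-imputation results. A minimal sketch of the large-sample version follows (the thesis may use the degrees-of-freedom-adjusted variant); the estimates and variances below are invented for illustration.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Combine m multiple-imputation estimates by Rubin's rules.

    Returns the pooled estimate, total variance, and the large-sample
    fraction of missing information (FMI).
    """
    m = len(estimates)
    qbar = np.mean(estimates)           # pooled point estimate
    ubar = np.mean(variances)           # within-imputation variance
    b = np.var(estimates, ddof=1)       # between-imputation variance
    t = ubar + (1 + 1 / m) * b          # total variance
    fmi = (1 + 1 / m) * b / t           # large-sample FMI
    return qbar, t, fmi

# Hypothetical: a mean estimated from m = 5 imputed data sets
est = [10.2, 10.5, 9.9, 10.1, 10.3]
var = [0.40, 0.42, 0.39, 0.41, 0.40]
qbar, t, fmi = pool_rubin(est, var)
```

A design that leaves highly correlated items jointly missing inflates the between-imputation variance and hence the FMI, which is why the genetic algorithm can use the FMI directly as its objective.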

