The Estimation of Missing Values in Highly Correlated Data

Author(s):  
R. C. Tabony


1993 ◽  
Vol 17 ◽  
pp. 131-136 ◽  
Author(s):  
Kenneth C. Jezek ◽  
Carolyn J. Merry ◽  
Don J. Cavalieri

Spaceborne data are becoming sufficiently extensive spatially and sufficiently lengthy over time to provide important gauges of global change. There is a potentially long record of microwave brightness temperature from NASA's Scanning Multichannel Microwave Radiometer (SMMR), followed by the Navy's Special Sensor Microwave Imager (SSM/I). It is therefore natural to combine data from successive satellite programs into a single, long record. To do this, we compare brightness temperature data collected during the brief overlap period (7 July-20 August 1987) of SMMR and SSM/I. Only data collected over the Antarctic ice sheet are used, to limit the spatial and temporal complications associated with the open ocean and sea ice. Linear regressions are computed from scatter plots of complementary pairs of channels from each sensor; the regressions reveal highly correlated data sets and support the argument that there are important relative calibration differences between the two instruments. The resulting calibration scheme was applied to a set of average monthly brightness temperatures for a sector of East Antarctica.
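The cross-calibration step described here can be illustrated with a minimal sketch: regress one sensor's brightness temperatures on the other's over the overlap period, then use the fitted gain and offset to place the older record on the newer scale. The data below are synthetic, with an assumed gain and offset standing in for the (unspecified) SMMR/SSM/I differences.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical SMMR brightness temperatures (K) over the ice sheet
tb_smmr = rng.uniform(150.0, 220.0, size=500)

# Simulate the matching SSM/I channel with an assumed relative
# calibration gain and offset, plus a little instrument noise.
true_gain, true_offset = 1.02, -3.5
tb_ssmi = true_gain * tb_smmr + true_offset + rng.normal(0.0, 0.5, size=500)

# Linear regression of SSM/I on SMMR over the overlap period
gain, offset = np.polyfit(tb_smmr, tb_ssmi, deg=1)

# Pearson correlation confirms the two channels are highly correlated
r = np.corrcoef(tb_smmr, tb_ssmi)[0, 1]

# Adjust the SMMR record onto the SSM/I scale to build one long record
tb_smmr_adjusted = gain * tb_smmr + offset
```

With highly correlated data of this kind, the fitted gain and offset recover the relative calibration difference closely, which is what makes the merged record usable.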


2006 ◽  
Vol 163 (suppl_11) ◽  
pp. S227-S227
Author(s):  
R F MacLehose ◽  
D B Dunson ◽  
A H Herring ◽  
J S Kaufman ◽  
K E Hartmann ◽  
...  

2020 ◽  
Vol 12 (23) ◽  
pp. 10124
Author(s):  
Bodin Singpai ◽  
Desheng Wu

Each country needs to monitor progress on its Sustainable Development Goals (SDGs) to develop strategies that meet the expectations of the United Nations. Data envelopment analysis (DEA) can help identify best practices for the SDGs by setting goals to compete against. Automated machine learning (AutoML) simplifies machine learning, allowing researchers to predict future situations with less time and manpower. This work introduces an integrative method that combines DEA and AutoML to assess and predict performance on the SDGs. Two experiments with different data properties, in terms of interval and correlation, demonstrate the approach. Three prediction targets are set to measure performance with regression, classification, and multi-target regression algorithms. A back-propagation neural network (BPNN) is used to validate the outputs of the AutoML. The results show that AutoML can outperform the BPNN on regression and classification prediction problems. Data with low standard deviation (SD) lead to poor prediction performance for the BPNN but do not have a significant impact on AutoML. Highly correlated data yield higher accuracy but do not significantly affect the R-squared values between the actual and predicted values. This integrative approach can accurately predict the projected outputs, which can be used as national goals to transform an inefficient country into an efficient one.
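The DEA step can be sketched with the standard input-oriented CCR envelopment model, solved as one linear program per decision-making unit (here, per country). This is a generic textbook formulation, not necessarily the authors' exact specification, and the four-country data set is hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def dea_ccr_input(X, Y):
    """Input-oriented CCR efficiency score for each DMU.

    X: inputs, shape (m_inputs, n_dmus); Y: outputs, shape (s_outputs, n_dmus).
    Returns theta per DMU; theta == 1 marks an efficient unit.
    """
    m, n = X.shape
    s = Y.shape[0]
    scores = []
    for k in range(n):
        # Decision variables: [theta, lambda_1, ..., lambda_n]
        c = np.r_[1.0, np.zeros(n)]                 # minimise theta
        # X @ lam <= theta * x_k   ->   -theta * x_k + X @ lam <= 0
        A_in = np.hstack([-X[:, [k]], X])
        b_in = np.zeros(m)
        # Y @ lam >= y_k           ->   -Y @ lam <= -y_k
        A_out = np.hstack([np.zeros((s, 1)), -Y])
        b_out = -Y[:, k]
        res = linprog(c, A_ub=np.vstack([A_in, A_out]),
                      b_ub=np.r_[b_in, b_out],
                      bounds=[(None, None)] + [(0, None)] * n)
        scores.append(res.x[0])
    return np.array(scores)

# Hypothetical example: 4 countries, 1 input, 1 output
X = np.array([[2.0, 4.0, 6.0, 8.0]])
Y = np.array([[2.0, 4.0, 5.0, 4.0]])
theta = dea_ccr_input(X, Y)
```

The efficient units (theta = 1) define the frontier; an inefficient country's projected outputs on that frontier are the kind of targets the paper feeds into the prediction step.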


IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 53542-53554
Author(s):  
Haoli Zhao ◽  
Shuxue Ding ◽  
Xiang Li ◽  
Lingjun Zhao


1988 ◽  
Vol 11 ◽  
pp. 1073-1076 ◽  
Author(s):  
Paul J. Ossenbruggen ◽  
Marie Gaudard ◽  
M.Robin Collins

2020 ◽  
Author(s):  
Sara Bahrami

Respondent burden due to long questionnaires in surveys can negatively affect the response rate as well as the quality of responses. A solution to this problem is the split questionnaire design (SQD). In an SQD, the items of the long questionnaire are divided into subsets, and only a fraction of the item subsets is assigned to each random subsample of individuals. This leads to several shorter questionnaires, each administered to a random subsample of individuals. The completed sub-questionnaires are then combined, and the missing values due to the design are imputed by multiple imputation. Identification problems can be avoided in advance by ensuring that the combinations of variables in the analysis model of interest are jointly observed on at least a subsample of individuals. Furthermore, including an appropriate combination of items in each sub-questionnaire is the most important concern in designing an SQD so as to reduce information loss; i.e., highly correlated items that explain each other well should not be jointly missing. For this reason, training data must be available from previous surveys or a pilot study to exploit the associations between the variables. In this thesis two SQDs are proposed. In the first study, a potential design for NEPS data is introduced. The data consist of items which can be divided and allocated into blocks according to their context, with the objective that the within-block correlations are higher than the between-block correlations. According to the design, the target sample is divided into subsamples. In addition to the items of one whole block assigned to each subsample, a fraction of the items of the remaining blocks is randomly drawn and assigned to each subsample, where items belonging to blocks with relatively higher correlations are drawn with lower probability. The design is evaluated by means of several ex-post investigations: the design is imposed on complete data, and several models are estimated for both the complete data and the data deleted by design. The design is also compared with a random multiple matrix sampling (MMS) design, which assigns a random subset of items to each sampled individual. In the second study, a genetic algorithm is used to search a vast number of SQDs for the optimal design. The algorithm evaluates designs by the fraction of missing information (FMI) induced by the design; the optimal design is the one with the smallest FMI. The optimal design is evaluated by means of several simulation studies and compared with a random MMS design.
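The fitness criterion used by the genetic algorithm, the fraction of missing information, comes from Rubin's rules for pooling multiple-imputation results. A minimal sketch of the large-sample version follows (the thesis may use the degrees-of-freedom-adjusted variant); the estimates and variances below are invented for illustration.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Combine m multiple-imputation estimates by Rubin's rules.

    Returns the pooled estimate, total variance, and the large-sample
    fraction of missing information (FMI).
    """
    m = len(estimates)
    qbar = np.mean(estimates)           # pooled point estimate
    ubar = np.mean(variances)           # within-imputation variance
    b = np.var(estimates, ddof=1)       # between-imputation variance
    t = ubar + (1 + 1 / m) * b          # total variance
    fmi = (1 + 1 / m) * b / t           # large-sample FMI
    return qbar, t, fmi

# Hypothetical: a mean estimated from m = 5 imputed data sets
est = [10.2, 10.5, 9.9, 10.1, 10.3]
var = [0.40, 0.42, 0.39, 0.41, 0.40]
qbar, t, fmi = pool_rubin(est, var)
```

A design that leaves highly correlated items jointly missing inflates the between-imputation variance and hence the FMI, which is why the genetic algorithm can use the FMI directly as its objective.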

