Robust Multivariate Correlation Techniques: A Confirmation Analysis using Covid-19 Data Set

2021 ◽  
Vol 29 (2) ◽  
Author(s):  
Friday Zinzendoff Okwonu ◽  
Nor Aishah Ahad ◽  
Joshua Sarduana Apanapudor ◽  
Festus Irismisose Arunaye

Robust multivariate correlation techniques are proposed to determine the strength of the association between two or more variables of interest, since the existing multivariate correlation techniques are susceptible to random outliers in the data set. The performance of the proposed techniques was compared with that of the conventional multivariate correlation techniques. All techniques under study were applied to COVID-19 data sets for Malaysia and Nigeria to determine the level of association between the study variables, namely confirmed, discharged, and death cases. The techniques' performance was evaluated using the multivariate correlation (R), the multivariate coefficient of determination (R^2), and the adjusted R^2. The proposed techniques gave R = 0.99, whereas R for the conventional methods ranged from 0.44 to 0.73. The R^2 and adjusted R^2 for the proposed methods were 0.98 and 0.97, while for the conventional methods R^2 was 0.53, 0.44, and 0.19 and the adjusted R^2 was 0.52, 0.43, and 0.18, respectively. The proposed techniques strongly affirmed that for any patient to be discharged or to die of COVID-19, the patient must first be confirmed COVID-19 positive, whereas the conventional methods gave only moderate to very weak affirmation. Based on these results, the proposed techniques are robust and show a much stronger association between the variables of interest than the conventional techniques.
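
As a rough illustration of the evaluation metrics named above, the sketch below computes R, R^2, and adjusted R^2 from an ordinary least-squares fit. It is not the authors' robust estimator (the abstract does not specify it); the data and variable names are hypothetical.

```python
import numpy as np

def correlation_metrics(X, y):
    """Fit y on X by ordinary least squares and return (R, R^2, adjusted R^2).

    Illustrative only: the paper's robust techniques would replace this
    plain OLS fit with an outlier-resistant estimator.
    """
    n, p = X.shape
    X1 = np.column_stack([np.ones(n), X])          # add an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)  # OLS coefficients
    resid = y - X1 @ beta
    ss_res = float(resid @ resid)                  # residual sum of squares
    ss_tot = float(((y - y.mean()) ** 2).sum())    # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return np.sqrt(max(r2, 0.0)), r2, adj_r2

# Hypothetical example: relate discharged cases to confirmed and death cases.
rng = np.random.default_rng(0)
confirmed = rng.uniform(100, 1000, size=60)
deaths = 0.02 * confirmed + rng.normal(0, 2, size=60)
discharged = 0.9 * confirmed + rng.normal(0, 20, size=60)
R, R2, adj_R2 = correlation_metrics(np.column_stack([confirmed, deaths]), discharged)
print(f"R = {R:.2f}, R^2 = {R2:.2f}, adjusted R^2 = {adj_R2:.2f}")
```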


1999 ◽  
Vol 1 (4) ◽  
pp. 313-323 ◽  
Author(s):  
Boris P. Kovatchev ◽  
Leon S. Farhy ◽  
Daniel J. Cox ◽  
Martin Straume ◽  
Vladimir I. Yankov ◽  
...  

A dynamical network model of insulin-glucose interactions in subjects with Type I Diabetes was developed and applied to data sets for 40 subjects. Each data set contained the amount of dextrose + insulin infused and blood glucose (BG) determinations, sampled every 5 minutes during a one-hour standardized euglycemic hyperinsulinemic clamp and a subsequent one-hour BG reduction to moderate hypoglycemic levels. The model approximated the temporal pattern of BG and on that basis predicted the counterregulatory response of each subject. The nonlinear fits explained more than 95% of the variance of subjects' BG fluctuations, with a median coefficient of determination of 97.7%. For all subjects, the model-predicted counterregulatory responses correlated with measured plasma epinephrine concentrations. The observed nadirs of BG during the tests correlated negatively with the model-predicted insulin utilization coefficient (r = -0.51, p < 0.001) and counterregulation rates (r = -0.63, p < 0.001). Subjects with a history of multiple severe hypoglycemic episodes demonstrated a slower onset of counterregulation than subjects with no such history (p < 0.03).
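
The abstract does not give the model equations, so the sketch below fits only a generic one-compartment BG decay with a counterregulatory restoring term as a stand-in for the paper's network model; the parameters k_ins and k_cr, the 5 mmol/L set point, and all numbers are hypothetical.

```python
import numpy as np
from scipy.integrate import odeint
from scipy.optimize import curve_fit

def bg_model(t, k_ins, k_cr, bg0):
    """Toy BG trajectory: insulin-driven decay plus a counterregulatory
    term pulling BG back toward a set point. NOT the paper's model."""
    def rhs(bg, t):
        return -k_ins * bg + k_cr * (5.0 - bg)    # 5 mmol/L set point (assumed)
    return odeint(rhs, bg0, t).ravel()

t = np.arange(0.0, 120.0, 5.0)                    # 5-minute sampling over 2 hours
rng = np.random.default_rng(1)
bg_obs = bg_model(t, 0.02, 0.01, 9.0) + rng.normal(0, 0.1, t.size)

# Nonlinear least-squares fit of the trajectory, then variance explained.
popt, _ = curve_fit(bg_model, t, bg_obs, p0=[0.01, 0.01, 9.0])
fitted = bg_model(t, *popt)
ss_res = np.sum((bg_obs - fitted) ** 2)
ss_tot = np.sum((bg_obs - bg_obs.mean()) ** 2)
print("coefficient of determination:", 1 - ss_res / ss_tot)
```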


Data Mining ◽  
2013 ◽  
pp. 28-49
Author(s):  
Anthony Scime ◽  
Karthik Rajasethupathy ◽  
Kulathur S. Rajasethupathy ◽  
Gregg R. Murray

Data mining is a collection of algorithms for finding interesting and unknown patterns or rules in data. However, different algorithms can result in different rules from the same data. The process presented here exploits these differences to find particularly robust, consistent, and noteworthy rules among much larger potential rule sets. More specifically, this research focuses on using association rules and classification mining to select the persistently strong association rules. Persistently strong association rules are association rules that are verifiable by classification mining the same data set. The process for finding persistent strong rules was executed against two data sets obtained from the American National Election Studies. Analysis of the first data set resulted in one persistent strong rule and one persistent rule, while analysis of the second data set resulted in 11 persistent strong rules and 10 persistent rules. The persistent strong rule discovery process suggests these rules are the most robust, consistent, and noteworthy among the much larger potential rule sets.
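
A minimal sketch of the two-miner idea described above: brute-force association rules from one-hot data, then classification mining of the same data, with rules counted as persistently strong when the classifier reproduces them. The attribute names are hypothetical (not ANES fields), and the matching criterion here is a simplification of the paper's process.

```python
import pandas as pd
from itertools import combinations
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy one-hot survey data; attribute names are hypothetical, not ANES fields.
df = pd.DataFrame({
    "follows_news": [1, 1, 0, 0, 1, 0, 1, 1],
    "party_member": [1, 0, 0, 0, 1, 0, 1, 0],
    "urban":        [1, 1, 1, 0, 0, 0, 1, 1],
    "votes":        [1, 1, 1, 0, 1, 0, 1, 1],
}).astype(bool)

def rule_stats(df, antecedent, consequent):
    """Support and confidence of the rule: antecedent -> consequent."""
    mask_a = df[list(antecedent)].all(axis=1)
    both = (mask_a & df[consequent]).sum()
    return both / len(df), (both / mask_a.sum()) if mask_a.any() else 0.0

# Step 1: association-rule mining (brute force over small itemsets).
target, features = "votes", ["follows_news", "party_member", "urban"]
rules = [(a, target, s, c)
         for r in (1, 2) for a in combinations(features, r)
         for s, c in [rule_stats(df, a, target)]
         if s >= 0.3 and c >= 0.8]

# Step 2: classification mining of the same data; a rule counts as
# "persistently strong" when the tree's decision paths reproduce it.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(df[features], df[target])
for antecedent, consequent, s, c in rules:
    print(f"{antecedent} -> {consequent} (support {s:.2f}, confidence {c:.2f})")
print(export_text(tree, feature_names=features))
```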


2019 ◽  
Vol 2019 ◽  
pp. 1-11 ◽  
Author(s):  
Ahmet Irvem ◽  
Mustafa Ozbuldu

Use of satellite and reanalysis precipitation products as supplementary data sources is steadily rising for hydrometeorological applications, especially in data-sparse areas. However, the accuracy of these data sets is often lacking, especially in Turkey. This study evaluates the accuracy of a satellite precipitation product (TRMM 3B42V7) and a reanalysis precipitation product (NCEP-CFSR) against rain gauge observations for the 1998–2010 period. Average annual precipitation for the 25 basins in Turkey was calculated using rain gauge precipitation data from 225 stations. The inverse distance weighting (IDW) method was used in GIS to calculate areal precipitation for each basin. According to the results of the statistical analysis, the coefficient of determination for the TRMM product gave satisfactory results (R2 > 0.88). However, R2 for the CFSR data set ranges from 0.35 for the Eastern Black Sea basin to 0.93 for the West Mediterranean basin. RMSE was calculated to be 95.679 mm for the TRMM data and 128.097 mm for the CFSR data. The NSE results for the TRMM data showed very good performance for 6 basins, while the PBias values showed very good performance for 7 basins. The NSE results for the CFSR data showed very good performance for 3 basins, while the PBias values showed very good performance for 6 basins.
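
For reference, the sketch below computes the four goodness-of-fit statistics reported above (R2, RMSE, NSE, PBias) for a simulated series against gauge observations. The precipitation values are made up for illustration, and the sign convention for PBias varies between references.

```python
import numpy as np

def evaluate_precip(obs, sim):
    """Goodness-of-fit metrics for a satellite/reanalysis series against
    rain gauge observations (obs, sim: 1-D arrays in the same units)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    rmse = np.sqrt(np.mean((sim - obs) ** 2))
    # Nash-Sutcliffe efficiency: 1 is perfect, below 0 is worse than the mean.
    nse = 1 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)
    # Percent bias; note that sign conventions differ between references.
    pbias = 100.0 * np.sum(sim - obs) / np.sum(obs)
    r = np.corrcoef(obs, sim)[0, 1]
    return {"RMSE": rmse, "NSE": nse, "PBias": pbias, "R2": r ** 2}

# Hypothetical basin-average annual precipitation (mm), not the study's data.
gauge = np.array([820.0, 640.0, 910.0, 560.0, 730.0])
trmm  = np.array([790.0, 668.0, 874.0, 598.0, 755.0])
print(evaluate_precip(gauge, trmm))
```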


2019 ◽  
Vol 9 (5) ◽  
pp. 115 ◽  
Author(s):  
Ömer Türk ◽  
Mehmet Siraç Özerdem

Studies using electroencephalogram (EEG) signals are progressing very rapidly: thanks to new methods developed in this field, brain-computer interfaces (BCI) and disease detection are carried out with considerable success. The effective use of these signals, especially in disease detection, is very important in terms of both time and cost. Currently, EEG studies employ deep learning networks, which have recently achieved great success, alongside conventional methods. The main reason is that in conventional methods, increasing classification accuracy demands substantial human effort: feature extraction is the most important step in processing EEG, and it is both time-consuming and dependent on investigating many candidate feature methods. There is therefore a need for methods that do not require human effort in this area and can learn the features themselves. On this basis, two-dimensional (2D) frequency-time scalograms were obtained in this study by applying the Continuous Wavelet Transform to EEG records containing five different classes. A Convolutional Neural Network was used to learn the properties of these scalogram images, and its classification performance was compared with studies in the literature. For this comparison, the data set of the University of Bonn was used. It consists of five EEG record sets, labeled A, B, C, D, and E, covering healthy subjects and epilepsy patients. In the study, the A-E and B-E data sets were classified with 99.50% accuracy and the A-D and B-D data sets with 100% accuracy in binary classification; the A-D-E data sets with 99.00% accuracy in triple classification; the A-C-D-E data sets with 90.50% and the B-C-D-E data sets with 91.50% accuracy in quaternary classification; and the A-B-C-D-E data sets with 93.60% accuracy in five-class classification.
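
A minimal sketch of the pipeline described above, assuming PyWavelets for the continuous wavelet transform (a Morlet wavelet is assumed) and PyTorch for the classifier; the scale range, layer sizes, and filter counts are illustrative, not the paper's configuration.

```python
import numpy as np
import pywt
import torch
import torch.nn as nn

# Step 1: turn a 1-D EEG segment into a 2-D time-frequency scalogram
# via the continuous wavelet transform.
rng = np.random.default_rng(0)
eeg = rng.standard_normal(1024).astype(np.float32)   # stand-in for one EEG record
scales = np.arange(1, 65)
coeffs, _ = pywt.cwt(eeg, scales, "morl")            # shape: (64, 1024)
scalogram = torch.tensor(np.abs(coeffs), dtype=torch.float32)[None, None]  # NCHW

# Step 2: a small CNN classifier over the scalogram image; layer sizes
# are hypothetical.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(4),
    nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(4),
    nn.Flatten(),
    nn.LazyLinear(5),                                # 5 classes: A, B, C, D, E
)
logits = model(scalogram)
print(logits.shape)                                  # torch.Size([1, 5])
```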


2017 ◽  
Vol 32 (1) ◽  
pp. 77-84 ◽  
Author(s):  
Zdenka Stojanovska ◽  
Kremena Ivanova ◽  
Peter Bossew ◽  
Blazo Boev ◽  
Zora Zunic ◽  
...  

We present a method for the estimation of annual radon concentration based on short-term (three-month) measurements. The study involves results from two independent sets of indoor radon concentration measurements performed in 16 cities of the Republic of Macedonia. The first data set contains winter and annual radon concentrations obtained during the National survey in 2010; the second contains only radon concentrations measured during the winter of 2013. Both data sets pertain to radon concentrations from the same cities, measured with the same methodology in ground-floor dwellings. The results appeared to be consistent, and the dispersion of the radon concentrations was low. Linear regression of the radon concentration measured in the winter of 2010 against the 2010 annual radon concentration revealed a high coefficient of determination, R2 = 0.92, with a relative uncertainty of 3%. This model was then used to estimate the annual radon concentration solely from the winter-term measurements performed in 2013. The geometric mean of the estimated annual radon concentration for 2013 (98 Bq m-3) was almost equal to the geometric mean of the annual radon concentration from 2010 (99 Bq m-3). The influence of building characteristics, such as the presence or absence of a basement and the dominant building material, on the estimated annual radon concentration is also reported. Our results show that a low number of relatively short-term radon measurements may give a reasonable insight into the gross average obtained in a larger survey.
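
The estimation step is a plain linear calibration, sketched below with made-up concentrations (the paper's data are not reproduced here): fit annual on winter values from the calibration survey, then apply the fit to winter-only measurements.

```python
import numpy as np

# Winter vs. annual radon concentrations (Bq/m^3); values are hypothetical.
winter_2010 = np.array([60.0, 95.0, 120.0, 150.0, 210.0, 80.0])
annual_2010 = np.array([55.0, 88.0, 105.0, 132.0, 180.0, 74.0])

# Fit annual = a * winter + b on the calibration (2010) survey.
a, b = np.polyfit(winter_2010, annual_2010, deg=1)
pred = a * winter_2010 + b
ss_res = np.sum((annual_2010 - pred) ** 2)
ss_tot = np.sum((annual_2010 - annual_2010.mean()) ** 2)
print("R^2 =", 1 - ss_res / ss_tot)

# Estimate annual concentrations from winter-only (2013) measurements,
# then summarize with the geometric mean, as reported in the paper.
winter_2013 = np.array([70.0, 110.0, 95.0])
annual_2013_est = a * winter_2013 + b
print("GM =", np.exp(np.mean(np.log(annual_2013_est))))
```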


2018 ◽  
Vol 154 (2) ◽  
pp. 149-155
Author(s):  
Michael Archer

1. Yearly records of worker Vespula germanica (Fabricius) taken in suction traps at Silwood Park (28 years) and at Rothamsted Research (39 years) are examined. 2. Using the autocorrelation function (ACF), a significant negative 1-year lag followed by a lesser non-significant positive 2-year lag was found in all, or parts of, each data set, indicating an underlying population dynamic of a 2-year cycle with a damped waveform. 3. The minimum number of years before the 2-year cycle with damped waveform was shown varied between 17 and 26, or was not found in some data sets. 4. Ecological factors delaying or preventing the occurrence of the 2-year cycle are considered.
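
The diagnostic used above is the sample autocorrelation function with an approximate significance bound; the sketch below reproduces it on a made-up alternating count series (the counts are hypothetical, not the Silwood Park or Rothamsted records).

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation function of a yearly count series."""
    x = np.asarray(x, float) - np.mean(x)
    denom = np.sum(x * x)
    return np.array([np.sum(x[k:] * x[:len(x) - k]) / denom
                     for k in range(max_lag + 1)])

# Hypothetical yearly wasp counts with an alternating (2-year) tendency.
counts = np.array([120, 40, 150, 60, 170, 55, 140, 70, 160, 50,
                   130, 65, 155, 45, 145, 60, 150, 55, 135, 70])
r = acf(counts, max_lag=4)
ci = 1.96 / np.sqrt(len(counts))     # approximate 95% significance bound
for k, rk in enumerate(r):
    flag = "*" if k > 0 and abs(rk) > ci else ""
    print(f"lag {k}: r = {rk:+.2f} {flag}")
# A significant negative lag-1 value followed by a positive lag-2 value
# is the signature of the damped 2-year cycle described above.
```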


2018 ◽  
Vol 21 (2) ◽  
pp. 117-124 ◽  
Author(s):  
Bakhtyar Sepehri ◽  
Nematollah Omidikia ◽  
Mohsen Kompany-Zareh ◽  
Raouf Ghavami

Aims & Scope: In this research, 8 variable selection approaches were used to investigate the effect of variable selection on the predictive power and stability of CoMFA models. Materials & Methods: Three data sets including 36 EPAC antagonists, 79 CD38 inhibitors and 57 ATAD2 bromodomain inhibitors were modelled by CoMFA. First of all, for all three data sets, CoMFA models with all CoMFA descriptors were created then by applying each variable selection method a new CoMFA model was developed so for each data set, 9 CoMFA models were built. Obtained results show noisy and uninformative variables affect CoMFA results. Based on created models, applying 5 variable selection approaches including FFD, SRD-FFD, IVE-PLS, SRD-UVEPLS and SPA-jackknife increases the predictive power and stability of CoMFA models significantly. Result & Conclusion: Among them, SPA-jackknife removes most of the variables while FFD retains most of them. FFD and IVE-PLS are time consuming process while SRD-FFD and SRD-UVE-PLS run need to few seconds. Also applying FFD, SRD-FFD, IVE-PLS, SRD-UVE-PLS protect CoMFA countor maps information for both fields.


Author(s):  
Kyungkoo Jun

Background & Objective: This paper proposes a Fourier transform inspired method to classify human activities from time series sensor data. Methods: Our method begins by decomposing 1D input signal into 2D patterns, which is motivated by the Fourier conversion. The decomposition is helped by Long Short-Term Memory (LSTM) which captures the temporal dependency from the signal and then produces encoded sequences. The sequences, once arranged into the 2D array, can represent the fingerprints of the signals. The benefit of such transformation is that we can exploit the recent advances of the deep learning models for the image classification such as Convolutional Neural Network (CNN). Results: The proposed model, as a result, is the combination of LSTM and CNN. We evaluate the model over two data sets. For the first data set, which is more standardized than the other, our model outperforms previous works or at least equal. In the case of the second data set, we devise the schemes to generate training and testing data by changing the parameters of the window size, the sliding size, and the labeling scheme. Conclusion: The evaluation results show that the accuracy is over 95% for some cases. We also analyze the effect of the parameters on the performance.


2019 ◽  
Vol 73 (8) ◽  
pp. 893-901
Author(s):  
Sinead J. Barton ◽  
Bryan M. Hennelly

Cosmic ray artifacts may be present in all photo-electric readout systems. In spectroscopy, they present as random unidirectional sharp spikes that distort spectra and may affect post-processing, possibly altering the results of multivariate statistical classification. A number of methods have previously been proposed to remove cosmic ray artifacts from spectra, but the goal of removing the artifacts while making no other change to the underlying spectrum is challenging. One of the most successful and commonly applied methods for the removal of cosmic ray artifacts involves the capture of two sequential spectra that are compared in order to identify spikes. The disadvantage of this approach is that at least two recordings are necessary, which may be problematic for dynamically changing spectra, and which can reduce the signal-to-noise (S/N) ratio compared with a single recording of equivalent duration, due to the inclusion of two instances of read noise. In this paper, a cosmic ray artifact removal algorithm is proposed that works in a similar way to the double acquisition method but requires only a single capture, so long as a data set of similar spectra is available. The method employs normalized covariance to identify a similar spectrum in the data set, from which a direct comparison reveals the presence of cosmic ray artifacts; these are then replaced with the corresponding values from the matching spectrum. The advantage of the proposed method over the double acquisition method is investigated in the context of the S/N ratio, and the method is applied to various data sets of Raman spectra recorded from biological cells.
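
A sketch of the single-capture idea described above: match the spectrum to its nearest neighbour in the data set by correlation, flag points where the two disagree sharply, and substitute the matching values. The robust z-score spike criterion and all data below are assumptions for illustration, not the paper's exact rule.

```python
import numpy as np

def remove_cosmic_rays(spectrum, library, threshold=5.0):
    """Single-capture cosmic ray removal via a matching library spectrum."""
    # 1. Find the best-matching library spectrum by correlation coefficient.
    corr = [np.corrcoef(spectrum, ref)[0, 1] for ref in library]
    match = library[int(np.argmax(corr))]

    # 2. Least-squares scale the match to the spectrum; inspect the residual.
    scale = np.dot(spectrum, match) / np.dot(match, match)
    resid = spectrum - scale * match
    mad = np.median(np.abs(resid - np.median(resid))) + 1e-12
    spikes = resid > threshold * 1.4826 * mad    # unidirectional: positive only

    # 3. Replace flagged points with the (rescaled) matching values.
    cleaned = spectrum.copy()
    cleaned[spikes] = scale * match[spikes]
    return cleaned, spikes

# Hypothetical Raman spectra: a library of similar cell spectra plus one
# capture contaminated by a cosmic ray spike.
rng = np.random.default_rng(2)
base = np.exp(-0.5 * ((np.arange(500) - 250) / 40.0) ** 2)
library = [base + 0.01 * rng.standard_normal(500) for _ in range(10)]
spec = base + 0.01 * rng.standard_normal(500)
spec[123] += 5.0                                 # the injected artifact
cleaned, spikes = remove_cosmic_rays(spec, library)
print("spike located at index:", np.flatnonzero(spikes))
```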

