A Robust Rerank Approach for Feature Selection and Its Application to Pooling-Based GWA Studies

2013 ◽  
Vol 2013 ◽  
pp. 1-11 ◽  
Author(s):  
Jia-Rou Liu ◽  
Po-Hsiu Kuo ◽  
Hung Hung

Large-p-small-n datasets are commonly encountered in modern biomedical studies. To detect differences between two groups, conventional methods may fail because of the instability in estimating variances in the t-test and a high proportion of tied values in AUC (area under the receiver operating characteristic curve) estimates. The significance analysis of microarrays (SAM) may also be unsatisfactory, since its performance is sensitive to a tuning parameter whose selection is not straightforward. In this work, we propose a robust rerank approach to overcome these difficulties. In particular, we obtain a rank-based statistic for each feature based on the concept of “rank-over-variable.” The techniques of “random subset” and “rerank” are then applied iteratively to rank features, and the leading features are selected for further study. The proposed rerank approach is especially applicable to large-p-small-n datasets. Moreover, it is insensitive to the choice of tuning parameters, an appealing property for practical implementation. Simulation studies and a real data analysis of pooling-based genome-wide association (GWA) studies demonstrate the usefulness of our method.
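
The abstract does not spell out the statistic or the aggregation rule, so the following is only a minimal Python sketch of the general idea under stated assumptions: each sample's values are ranked across the features in a random subset (our reading of "rank-over-variable"), each feature is scored by the absolute difference of its mean within-sample rank between the two groups, the features are then reranked within the subset, and the normalized ranks are averaged over many random subsets. The names `rank_with_ties`, `subset_scores` and `rerank`, and the parameters `subset_size` and `n_iter`, are hypothetical and not taken from the paper.

```python
import random
from statistics import mean


def rank_with_ties(values):
    """Return 1-based ranks of `values`, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def subset_scores(X, y, features):
    """Rank each sample's values over `features` (assumed rank-over-variable step),
    then score each feature by |mean rank in group 1 - mean rank in group 0|."""
    within = [rank_with_ties([row[f] for f in features]) for row in X]
    scores = {}
    for pos, f in enumerate(features):
        g1 = [within[s][pos] for s in range(len(X)) if y[s] == 1]
        g0 = [within[s][pos] for s in range(len(X)) if y[s] == 0]
        scores[f] = abs(mean(g1) - mean(g0))
    return scores


def rerank(X, y, subset_size, n_iter=200, seed=0):
    """Return feature indices ordered from most to least promising."""
    rng = random.Random(seed)
    p = len(X[0])
    total = [0.0] * p
    count = [0] * p
    for _ in range(n_iter):
        features = rng.sample(range(p), subset_size)
        scores = subset_scores(X, y, features)
        # Rerank within the subset: the best-scoring feature gets rank subset_size.
        for r, f in enumerate(sorted(features, key=lambda g: scores[g]), start=1):
            total[f] += r / subset_size  # normalized rank in (0, 1]
            count[f] += 1
    avg = [total[f] / count[f] if count[f] else 0.0 for f in range(p)]
    return sorted(range(p), key=lambda f: -avg[f])


if __name__ == "__main__":
    noise = random.Random(1)
    y = [1, 1, 1, 0, 0, 0]
    # Feature 0 separates the groups perfectly; the other 19 features are noise.
    X = [[100.0 if label else -100.0] + [noise.gauss(0, 1) for _ in range(19)] for label in y]
    print(rerank(X, y, subset_size=5)[:5])
```

In this toy example feature 0 should be listed first. Averaging normalized ranks rather than raw scores is one way to make results from subsets of equal size comparable; the paper may aggregate differently.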

2021 ◽  
Vol 9 (524) ◽  
pp. 243-249
Author(s):  
S. M. Osypenko ◽  
T. V. Romanchyk ◽  
O. M. Tesnikov ◽  
I. O. Kuruch ◽  
...  

The relevance of managing the competitiveness of service-sector enterprises under modern economic conditions is established. The task of managing enterprise competitiveness is formulated in substantive terms, and a scheme for its implementation is proposed, comprising the following stages: formation of the factors that determine competitiveness; substantiation of the competitiveness indicator and its model; computation of the competitiveness indicator and its subsequent factor analysis; determination of reserves for growth of the competitiveness index; formulation of the task of increasing competitiveness; formation of a list of measures for implementing the task; selection of measures for implementation; and implementation of the measures, with control and regulation. The procedure for computing the integral indicator of competitiveness is considered; it includes groups of indicators that characterize the state of the enterprise and the competitiveness of its products, as well as the indicators within each group. A methodology for the economic analysis of an enterprise's level of competitiveness is proposed that makes it possible to assess its overall competitive position relative to competitors, determine how individual factors contribute to the difference between its position and that of its competitors, calculate the reserves for growth of the competitiveness indicator, and develop measures to realize them. In accordance with the enterprise's competitive strategy and the task of increasing the competitiveness indicator, and drawing on the theory of economic efficiency, a list of measures is substantiated whose practical implementation will accomplish the task with a minimum of resources. Operational control over the implementation of these measures and their impact on the competitiveness indicator is envisaged so that corrective managerial decisions can be made in a timely manner.
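
The abstract does not give the formula for the integral indicator. As a purely illustrative sketch, one common convention (an assumption here, not necessarily the authors' model) divides each of the enterprise's indicators by the competitor's value, combines these ratios with weights inside each group (enterprise state, product competitiveness), and then combines the group indices with group weights. All names, weights and figures below are hypothetical.

```python
def group_index(indicators):
    """Weighted sum of own/competitor ratios for one group.

    `indicators` is a list of (own_value, competitor_value, weight) tuples.
    Assumes higher values are better and that weights within a group sum to 1.
    """
    return sum(w * own / comp for own, comp, w in indicators)


def integral_indicator(groups, group_weights):
    """Combine group indices with group weights (assumed to sum to 1).

    A result above 1 means the enterprise outperforms the competitor overall.
    """
    return sum(group_weights[name] * group_index(ind) for name, ind in groups.items())


groups = {
    "enterprise_state": [(12.0, 10.0, 0.5), (4.0, 4.0, 0.5)],
    "product": [(3.0, 3.0, 1.0)],
}
weights = {"enterprise_state": 0.6, "product": 0.4}
print(round(integral_indicator(groups, weights), 4))  # 1.06
```

Because the indicator is a weighted sum, each ratio's weighted shortfall below 1 shows directly how much that factor holds the enterprise back, which is the kind of factor analysis and reserve calculation the abstract describes.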


Author(s):  
Yingjie Guo ◽  
Chenxi Wu ◽  
Zhian Yuan ◽  
Yansu Wang ◽  
Zhen Liang ◽  
...  

Among the myriad statistical methods that identify gene–gene interactions in qualitative genome-wide association studies, gene-based interaction methods are not only statistically powerful but also biologically interpretable. However, their detection power is limited by the assumptions they make about the association between traits and single nucleotide polymorphisms. Thus, a gene-based method derived from XGBoost (GGInt-XGBoost) is proposed in this article. Assuming that the log odds ratios for the disease trait are additive when the pair of genes does not interact, the difference in error between XGBoost models with and without an additive constraint can indicate a gene–gene interaction; we then use a permutation-based statistical test to assess this difference and to obtain a p-value representing the significance of the interaction. Experimental results on both simulated and real data showed that our approach outperformed previous approaches in detecting gene–gene interactions.
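
The additivity assumption can be made concrete in the simplest setting, where each gene is reduced to a single binary factor (a simplification; the paper works with whole genes and XGBoost models). If the log odds of disease is additive in the two factors, the interaction contrast logit(p11) - logit(p10) - logit(p01) + logit(p00) is zero, and any departure from zero signals interaction on the log-odds scale. The sketch below, with hypothetical helper names, checks this numerically; it is not the GGInt-XGBoost statistic itself.

```python
import math


def logit(p):
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))


def expit(x):
    """Inverse of logit."""
    return 1 / (1 + math.exp(-x))


def interaction_contrast(p00, p01, p10, p11):
    """Departure from additivity on the log-odds scale.

    p_ab is the disease probability when factor A = a and factor B = b.
    The value is 0 exactly when logit(p_ab) = b0 + bA*a + bB*b.
    """
    return logit(p11) - logit(p10) - logit(p01) + logit(p00)


def cell_probs(b0, b_a, b_b, b_ab):
    """Hypothetical logistic model; b_ab is the interaction coefficient."""
    return {(a, b): expit(b0 + b_a * a + b_b * b + b_ab * a * b)
            for a in (0, 1) for b in (0, 1)}


for b_ab in (0.0, 0.7):
    p = cell_probs(-1.0, 0.5, 0.8, b_ab)
    # The recovered contrast equals b_ab up to floating-point error.
    print(b_ab, interaction_contrast(p[0, 0], p[0, 1], p[1, 0], p[1, 1]))
```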


Entropy ◽  
2021 ◽  
Vol 23 (3) ◽  
pp. 324
Author(s):  
S. Ejaz Ahmed ◽  
Saeid Amiri ◽  
Kjell Doksum

Regression models provide prediction frameworks for multivariate mutual information analysis that uses information concepts when choosing covariates (also called features) that are important for analysis and prediction. We consider a high-dimensional regression framework where the number of covariates (p) exceeds the sample size (n). Recent work in high-dimensional regression analysis has embraced an ensemble subspace approach that consists of selecting random subsets of fewer than p covariates, performing statistical analysis on each subset, and then merging the results from the subsets. We examine conditions under which penalty methods such as Lasso perform better when used in the ensemble approach by computing mean squared prediction errors for simulations and a real data example. Linear models with both random and fixed designs are considered. We examine two versions of penalty methods: one where the tuning parameter is selected by cross-validation, and one where the final predictor is a trimmed average of individual predictors corresponding to the members of a set of fixed tuning parameters. We find that the ensemble approach improves on penalty methods for several important real data and model scenarios. The improvement occurs when covariates are strongly associated with the response and when the complexity of the model is high. In such cases, the trimmed-average version of ensemble Lasso is often the best predictor.
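
To make the ensemble subspace idea concrete, here is a minimal pure-Python sketch under our own assumptions: Lasso is fitted by plain cyclic coordinate descent without an intercept (so the data are assumed centered), each random subset of `subset_size` covariates yields one prediction per tuning parameter in a fixed grid `lams`, the per-subset prediction is a trimmed mean over that grid, and the final prediction averages over subsets. The function names, the trimming fraction and the way the two averages are nested are illustrative choices, not the procedure used in the paper.

```python
import random


def soft_threshold(z, gamma):
    """Soft-thresholding operator used in Lasso coordinate descent."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=100):
    """Minimize (1/(2n))||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.

    No intercept is fitted, so X and y are assumed to be centered.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Correlation of column j with the partial residual that excludes b_j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


def trimmed_mean(values, trim):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    s = sorted(values)
    k = int(len(s) * trim)
    kept = s[k:len(s) - k] or s
    return sum(kept) / len(kept)


def ensemble_lasso_predict(X, y, x_new, subset_size, lams, n_subsets=20, trim=0.2, seed=0):
    """Average over random covariate subsets of a trimmed mean over a fixed lam grid."""
    rng = random.Random(seed)
    p = len(X[0])
    subset_preds = []
    for _ in range(n_subsets):
        cols = rng.sample(range(p), subset_size)
        Xs = [[row[j] for j in cols] for row in X]
        preds = []
        for lam in lams:
            b = lasso_cd(Xs, y, lam)
            preds.append(sum(bj * x_new[j] for bj, j in zip(b, cols)))
        subset_preds.append(trimmed_mean(preds, trim))
    return sum(subset_preds) / len(subset_preds)


if __name__ == "__main__":
    rng = random.Random(42)
    # p = 40 covariates, n = 30 samples, so p > n as in the abstract's setting.
    X = [[rng.gauss(0, 1) for _ in range(40)] for _ in range(30)]
    y = [3.0 * row[0] - 2.0 * row[1] for row in X]
    x_new = [1.0, 1.0] + [0.0] * 38
    print(ensemble_lasso_predict(X, y, x_new, subset_size=5, lams=[0.01, 0.05, 0.1, 0.2, 0.5]))
```

Trimming over the fixed grid of tuning parameters avoids cross-validation inside each subset, which is the practical appeal of the second version described in the abstract.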


2011 ◽  
Vol 18 (5) ◽  
pp. 491-519 ◽  
Author(s):  
Heather Gowans ◽  
Nadja Kanellopoulou ◽  
Naomi Hawkins ◽  
Liam Curren ◽  
Karen Melham ◽  
...  

Abstract Consent forms are the principal method for obtaining informed consent from biomedical research participants. The significance of these forms is increasing as more secondary research is undertaken on existing research samples and information, and as samples are deposited in biobanks accessible to many researchers. We reviewed a selection of consent forms used in European genome-wide association studies (GWAS) and identified four common elements that were found in every consent form. Our analysis showed that only two of these four elements were required by UK law. This raises questions about what should be included in informed consent forms for research participants. These findings could inform the formulation of participant information and consent documentation in future studies.


2021 ◽  
Vol 12 ◽  
Author(s):  
Yingjie Guo ◽  
Honghong Cheng ◽  
Zhian Yuan ◽  
Zhen Liang ◽  
Yang Wang ◽  
...  

Unexplained genetic variation that causes complex diseases is often induced by gene-gene interactions (GGIs). Gene-based methods are among the current statistical methodologies for discovering GGIs in case-control genome-wide association studies, and they are not only statistically powerful but also biologically interpretable. However, most approaches make assumptions about the form of GGIs, which results in poor statistical performance. We therefore propose a gene-based test built on the maximal neighborhood coefficient (MNC), called gene-based gene-gene interaction through a maximal neighborhood coefficient (GBMNC). MNC is a metric that captures a wide range of relationships between two random vectors of arbitrary, and not necessarily equal, dimensions. Based on the assumption that the joint distribution of two genes should not differ substantially between cases and controls if there is no interaction between them, we established a statistic that uses the difference between the MNC in cases and in controls as an indication of GGIs. We then used a permutation-based statistical test to evaluate this statistic and calculate a p-value representing the significance of the interaction. Experimental results using both simulated and real data showed that our approach outperformed earlier methods for detecting GGIs.
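
The testing logic (compare a dependence measure between the two genes in cases and in controls, then calibrate the difference by permuting case/control labels, which are exchangeable under the stated null hypothesis) can be sketched as below. MNC itself is not implemented here: as a labeled stand-in we use the absolute Pearson correlation between two scalar gene summaries, so `gene_a`, `gene_b` and all helper names are hypothetical, and the sketch is not GBMNC.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation; assumes neither input is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def dependence(a, b):
    """Stand-in for MNC: absolute Pearson correlation of two scalar summaries."""
    return abs(pearson(a, b))


def diff_statistic(gene_a, gene_b, labels):
    """|dependence in cases - dependence in controls|."""
    cases = [i for i, lab in enumerate(labels) if lab == 1]
    controls = [i for i, lab in enumerate(labels) if lab == 0]
    d_case = dependence([gene_a[i] for i in cases], [gene_b[i] for i in cases])
    d_ctrl = dependence([gene_a[i] for i in controls], [gene_b[i] for i in controls])
    return abs(d_case - d_ctrl)


def permutation_p_value(gene_a, gene_b, labels, n_perm=999, seed=0):
    """Permute case/control labels to build the null distribution of the statistic."""
    rng = random.Random(seed)
    observed = diff_statistic(gene_a, gene_b, labels)
    perm = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        if diff_statistic(gene_a, gene_b, perm) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(3)
    gene_a = [rng.gauss(0, 1) for _ in range(60)]
    # The two genes co-vary in cases (first 30 samples) but not in controls.
    gene_b = [gene_a[i] + 0.1 * rng.gauss(0, 1) if i < 30 else rng.gauss(0, 1) for i in range(60)]
    labels = [1] * 30 + [0] * 30
    print(permutation_p_value(gene_a, gene_b, labels, n_perm=199))
```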


Author(s):  
N. I. Pak ◽  
E. V. Asaulenko

This study is motivated by the need to increase the efficiency of students' independent work in solving computational problems. A theoretical rationale is proposed, and the practical implementation of an automated training and diagnostic system for developing the skills to solve problems according to the “white box” model is described. The central idea of the study is the construction of mental schemes for a given topic, which make it possible to visualize the dynamics of changes in the learner's ability to solve computational problems. Methods for accounting for the forgetting of educational material and methods for the personalized selection of tasks are substantiated. The website for self-management of users' independent work is available at http://msbx.ru. The materials of the article are of practical value for teachers who use e-learning tools in the educational process.


2020 ◽  
Vol 7 (2) ◽  
pp. 34-41
Author(s):  
Vladimir Nikonov ◽
Anton Zobov

The construction and selection of a suitable bijective function, that is, a substitution, is becoming an important applied task, particularly for building block encryption systems. Many articles have proposed different approaches to assessing the quality of a substitution, but most of them are computationally expensive. Solving this problem would significantly expand the range of methods for constructing and analyzing schemes in information protection systems. The purpose of this research is to find easily measurable characteristics of substitutions that allow their quality to be evaluated, as well as measures of how close a particular substitution is to a random one, or how far it is from one. For this purpose, two characteristics, the difference characteristic and the polynomial characteristic, are proposed in this work; their mathematical expectations are derived, as well as the variance of the difference characteristic. This makes it possible to draw conclusions about the quality of a substitution by comparing the value of the characteristic computed for that substitution with its mathematical expectation. From a computational point of view, the results of the article are of particular interest because of the simplicity of the algorithm for quantifying the quality of bijective substitutions. Computing the difference characteristic amounts to a simple summation of integer terms in a small, fixed range. Such an operation is built into the logic of a wide range of functional elements in both current and prospective element bases, especially when computations are implemented in the optical range or on other carriers related to nanotechnology.
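
The abstract does not define the difference characteristic precisely. To illustrate the general approach of comparing a cheap integer-sum characteristic of a substitution with its expectation over random substitutions, the sketch below uses Spearman's footrule, the sum of |π(i) - i|, as a stand-in (our assumption, not the authors' definition). Its terms are integers in the small range 0..n-1, and for a uniformly random permutation of n elements its expectation is (n^2 - 1)/3, which the code confirms by exhaustive enumeration for small n.

```python
import random
from fractions import Fraction
from itertools import permutations


def footrule(perm):
    """Sum of |perm[i] - i|: an integer-sum distance from the identity map."""
    return sum(abs(v - i) for i, v in enumerate(perm))


def expected_footrule(n):
    """Closed-form mean of the footrule over all n! permutations."""
    return Fraction(n * n - 1, 3)


def exhaustive_mean(n):
    """Exact mean of the footrule by enumerating every permutation of range(n)."""
    perms = list(permutations(range(n)))
    return Fraction(sum(footrule(p) for p in perms), len(perms))


# Check the closed form against brute force for small n.
for n in range(1, 7):
    assert exhaustive_mean(n) == expected_footrule(n)

# Compare a concrete (here randomly generated) substitution on 256 symbols
# with the random-case mean.
rng = random.Random(0)
sbox = list(range(256))
rng.shuffle(sbox)
print(footrule(sbox), float(expected_footrule(256)))
```

A value far from the expectation, relative to the spread over random permutations, would suggest that a substitution is structured rather than random-looking; the identity map, for instance, scores 0.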


2021 ◽  
Vol 10 (7) ◽  
pp. 435
Author(s):  
Yongbo Wang ◽  
Nanshan Zheng ◽  
Zhengfu Bian

Since pairwise registration is a necessary step for the seamless fusion of point clouds from neighboring stations, this paper proposes a closed-form solution to planar feature-based registration of LiDAR (Light Detection and Ranging) point clouds. Based on the Plücker coordinate representation of linear features in three-dimensional space, a quad-tuple representation of planar features is introduced, which makes it possible to directly determine the difference between any two planar features. Dual quaternions are employed to represent the spatial transformation, and operations between dual quaternions and the quad-tuple representation of planar features are defined, from which an error norm is constructed. The proposed solution is derived step by step on the basis of L2-norm minimization. Two experiments, one using simulated data and one using real data, were designed to verify the correctness and feasibility of the proposed solution. With the simulated data, the calculated registration results were consistent with the pre-established parameters, which verifies the correctness of the presented solution. With the real data, the calculated registration results were consistent with those obtained by iterative methods. Two conclusions can be drawn from the experiments: (1) the proposed solution does not require any initial estimates of the unknown parameters, which ensures the stability and robustness of the solution; (2) using dual quaternions to represent the spatial transformation greatly reduces the number of additional constraints in the estimation process.
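
As an illustration of how planar features can be compared after a rigid transformation, the sketch below represents a plane by one natural four-component tuple, a unit normal n and an offset d with n·x = d (the paper's Plücker-based quad-tuple may be defined differently). A rotation given by a unit quaternion q and a translation t map the plane to n' = R(q)n and d' = d + n'·t, and the squared differences between transformed source planes and target planes form an L2 error. The sketch uses an ordinary rotation quaternion plus a translation vector rather than the paper's dual-quaternion formulation and does not implement the closed-form solution; all function names are hypothetical.

```python
import math


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def rotate(q, v):
    """Rotate vector v by unit quaternion q = (w, x, y, z): v + 2w(u x v) + 2u x (u x v)."""
    w, u = q[0], q[1:]
    uv = cross(u, v)
    uuv = cross(u, uv)
    return tuple(v[i] + 2 * w * uv[i] + 2 * uuv[i] for i in range(3))


def transform_plane(plane, q, t):
    """Map plane (n, d), defined by n . x = d, under the rigid motion x -> R(q) x + t."""
    n, d = plane
    n_new = rotate(q, n)
    return n_new, d + dot(n_new, t)


def plane_error(src_planes, dst_planes, q, t):
    """Sum of squared differences between transformed source and target 4-tuples."""
    err = 0.0
    for src, (n_dst, d_dst) in zip(src_planes, dst_planes):
        n_new, d_new = transform_plane(src, q, t)
        err += sum((a - b) ** 2 for a, b in zip(n_new, n_dst)) + (d_new - d_dst) ** 2
    return err


# 90-degree rotation about the x-axis, followed by a translation.
h = math.sqrt(0.5)
q = (h, h, 0.0, 0.0)
t = (1.0, 2.0, 3.0)
print(transform_plane(((0.0, 0.0, 1.0), 1.0), q, t))  # approximately ((0, -1, 0), -1)
```

Note that (n, d) and (-n, -d) describe the same plane, so a consistent normal orientation must be enforced before differencing two tuples.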


2020 ◽  
Vol 30 (Supplement_5) ◽  
Author(s):  
T M Mikkola ◽  
H Kautiainen ◽  
M Mänty ◽  
M B von Bonsdorff ◽  
T Kröger ◽  
...  

Abstract Purpose Mortality appears to be lower in family caregivers than in the general population. However, it is not known whether the difference in mortality between family caregivers and the general population depends on age. The purpose of this study was to analyze all-cause mortality in relation to age in family caregivers and to study their cause-specific mortality using data from multiple Finnish national registers. Methods The data included all individuals who received a family caregiver's allowance in Finland in 2012 (n = 42 256, mean age 67 years, 71% women) and a control population matched for age, sex, and municipality of residence (n = 83 618). Information on dates and causes of death between 2012 and 2017 was obtained from the Finnish Causes of Death Register. Flexible parametric survival modeling and competing risk regression adjusted for socioeconomic status were used. Results The total follow-up time was 717 877 person-years. Family caregivers had lower all-cause mortality than the controls over the follow-up (8.1% vs. 11.6%), both among women (hazard ratio [HR]: 0.64, 95% CI: 0.61-0.68) and among men (HR: 0.73, 95% CI: 0.70-0.77). Younger adult caregivers had equal or only slightly lower mortality than their controls, but after age 60 the difference increased markedly, resulting in over 10% lower mortality in favor of the caregivers in the oldest age groups. Caregivers had lower mortality than the controls for all causes of death studied, namely cardiovascular, cancer, neurological, external, respiratory, gastrointestinal, and dementia. The risk was lowest for dementia (subhazard ratio = 0.29, 95% CI: 0.25-0.34). Conclusions Older family caregivers have lower mortality than age-matched controls from the general population, whereas younger caregivers have mortality similar to that of their peers. This age-dependent mortality advantage is likely to reflect the selection of healthier individuals into the family caregiver role.
Key messages The difference in mortality between family caregivers and the age-matched general population varies considerably with age. The mortality advantage observed in family caregiver studies is likely to reflect the selection of healthier individuals into the caregiver role, which leads to underestimation of the adverse effects of caregiving.

