Feature screening with large scale and high dimensional survival data

Machine learning has proven effective in various application areas, such as object and speech recognition on mobile systems. Since a key to machine learning success is the availability of large training data, many datasets are being disclosed and published online. From a data consumer's or manager's point of view, measuring data quality is an important first step in the learning process. We need to determine which datasets to use, update, and maintain. However, few practical ways to measure data quality are available today, especially for large-scale high-dimensional data such as images and videos. This paper proposes two data quality measures that can compute class separability and in-class variability, two important aspects of data quality, for a given dataset. Classical data quality measures tend to focus only on class separability; however, we suggest that in-class variability is another important data quality factor. We provide efficient algorithms to compute our quality measures based on random projections and bootstrapping with statistical benefits on large-scale high-dimensional data. In experiments, we show that our measures are compatible with classical measures on small-scale data and can be computed much more efficiently on large-scale high-dimensional datasets.
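As a rough illustration of how random projections can make a class-separability score cheap to compute on high-dimensional data, the sketch below averages a one-dimensional Fisher ratio over random directions. This is not the measure proposed in the abstract above, whose definition is not given here; the Fisher ratio is a swapped-in classical score, bootstrapping is omitted, and every function name and the toy Gaussian data are assumptions made for illustration.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a direction uniformly on the unit sphere (hypothetical helper)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def project(points, direction):
    """Project each point onto a single direction, giving 1-D values."""
    return [sum(p * d for p, d in zip(point, direction)) for point in points]


def fisher_ratio(a, b):
    """Squared mean gap divided by the summed variances of two 1-D samples."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    var_a = sum((x - mean_a) ** 2 for x in a) / len(a)
    var_b = sum((x - mean_b) ** 2 for x in b) / len(b)
    # Small constant guards against division by zero for degenerate samples.
    return (mean_a - mean_b) ** 2 / (var_a + var_b + 1e-12)


def projected_separability(class_a, class_b, n_projections=50, seed=0):
    """Average the 1-D Fisher ratio over random projection directions."""
    rng = random.Random(seed)
    dim = len(class_a[0])
    total = 0.0
    for _ in range(n_projections):
        u = random_unit_vector(dim, rng)
        total += fisher_ratio(project(class_a, u), project(class_b, u))
    return total / n_projections


def gaussian_cloud(center, dim, n, rng):
    """Toy data: n points with unit-variance Gaussian noise around a center."""
    return [[center + rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
```

Each projection costs one pass over the data, so the score scales linearly in the number of points, dimensions, and projections, which is the practical appeal of projection-based measures.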

Investigation of Improved Cooperative Coevolution for Large-Scale Global Optimization Problems

Algorithms ◽ 10.3390/a14050146 ◽ 2021 ◽ Vol 14 (5) ◽ pp. 146

Author(s): Aleksei Vakhnin ◽ Evgenii Sopov

Keyword(s): Global Optimization ◽ Evolutionary Algorithms ◽ Numerical Experiments ◽ Large Scale ◽ Optimization Problems ◽ State Of The Art ◽ Fixed Number ◽ High Dimensional ◽ Cooperative Coevolution ◽ Large Scale Problems

Modern real-valued optimization problems are complex and high-dimensional, and they are known as “large-scale global optimization (LSGO)” problems. Classic evolutionary algorithms (EAs) perform poorly on this class of problems because of the curse of dimensionality. Cooperative Coevolution (CC) is a high-performance framework that decomposes large-scale problems into smaller and easier subproblems by grouping objective variables. The efficiency of CC strongly depends on the size of the groups and the grouping approach. In this study, an improved CC (iCC) approach for solving LSGO problems is proposed and investigated. iCC changes the number of variables in subcomponents dynamically during the optimization process. The SHADE algorithm is used as a subcomponent optimizer. We have investigated the performance of iCC-SHADE and CC-SHADE on fifteen problems from the LSGO CEC’13 benchmark set provided by the IEEE Congress on Evolutionary Computation. The results of numerical experiments show that iCC-SHADE outperforms, on average, CC-SHADE with a fixed number of subcomponents. We have also compared iCC-SHADE with some state-of-the-art LSGO metaheuristics. The experimental results show that the proposed algorithm is competitive with other efficient metaheuristics.
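The decomposition idea behind cooperative coevolution can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the iCC-SHADE algorithm from the abstract: it uses a fixed number of groups, random regrouping each cycle, and a simple Gaussian-perturbation search in place of SHADE. The `sphere` objective and all parameter names are hypothetical.

```python
import random


def sphere(x):
    """Toy separable objective (assumption: stands in for an LSGO benchmark)."""
    return sum(v * v for v in x)


def cooperative_coevolution(f, dim, n_groups, cycles=20, trials=30,
                            step=0.5, seed=0):
    """Optimize f by improving one group of variables at a time.

    The 'context' vector holds the best full solution found so far; each
    group is perturbed while all other variables stay fixed at their
    context values.
    """
    rng = random.Random(seed)
    context = [rng.uniform(-5.0, 5.0) for _ in range(dim)]
    best = f(context)
    history = [best]
    indices = list(range(dim))
    for _ in range(cycles):
        rng.shuffle(indices)  # random grouping, redrawn every cycle
        groups = [indices[g::n_groups] for g in range(n_groups)]
        for group in groups:
            for _ in range(trials):
                candidate = context[:]
                for i in group:  # only this group's variables change
                    candidate[i] += rng.gauss(0.0, step)
                value = f(candidate)
                if value < best:  # greedy acceptance
                    context, best = candidate, value
        history.append(best)
    return context, best, history
```

Calling `cooperative_coevolution(sphere, dim=12, n_groups=3)` returns the final solution, its objective value, and the best value after each cycle. An adaptive variant in the spirit of the abstract would vary `n_groups` during the run rather than fixing it.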

High-dimensional optimization of large-scale steel truss structures using guided stochastic search

Structures ◽ 10.1016/j.istruc.2021.05.035 ◽ 2021 ◽ Vol 33 ◽ pp. 1439-1456

Author(s): Saeid Kazemzadeh Azad ◽ Saman Aminbakhsh

Keyword(s): Large Scale ◽ Stochastic Search ◽ Truss Structures ◽ High Dimensional ◽ Steel Truss ◽ Dimensional Optimization

Fundamental Dynamical Modes Underlying Human Brain Synchronization

Computational and Mathematical Methods in Medicine ◽ 10.1155/2012/912729 ◽ 2012 ◽ Vol 2012 ◽ pp. 1-8 ◽ Cited By ~ 1

Author(s): Catalina Alvarado-Rojas ◽ Michel Le Van Quyen

Keyword(s): Large Scale ◽ Brain Regions ◽ Electrode Placement ◽ High Dimensional ◽ Sleep Cycle ◽ Epileptic Patients ◽ Brain States ◽ Intracranial Recordings ◽ Characteristic Dynamic

Little is known about the long-term dynamics of widely interacting cortical and subcortical networks during the wake-sleep cycle. Using large-scale intracranial recordings of epileptic patients during seizure-free periods, we investigated local- and long-range synchronization between multiple brain regions over several days. For such high-dimensional data, summary information is required for understanding and modelling the underlying dynamics. Here, we suggest that a compact yet useful representation is given by a state space based on the first principal components. Using this representation, we report, with a remarkable similarity across the patients with different locations of electrode placement, that the seemingly complex patterns of brain synchrony during the wake-sleep cycle can be represented by a small number of characteristic dynamic modes. In this space, transitions between behavioral states occur through specific trajectories from one mode to another. These findings suggest that, at a coarse level of temporal resolution, the different brain states are correlated with several dominant synchrony patterns which are successively activated across wake-sleep states.
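To make the idea of a state space built from the first principal components concrete, the sketch below extracts the leading principal component of a small data matrix by power iteration and returns each observation's coordinate along it. It is an illustrative sketch only: the abstract does not say how the components were computed, and the function name and start vector are assumptions.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, scores) for the leading principal component.

    rows: list of equal-length observations (needs at least two rows).
    Power iteration on the sample covariance matrix; it can fail to
    converge to the leading component if the start vector happens to be
    orthogonal to it.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Deterministic, non-uniform start vector (an arbitrary choice).
    v = [1.0 + j for j in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance at all: keep the current vector
            break
        v = [x / norm for x in w]
    # Coordinates of each observation in the 1-D state space.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores
```

Repeating the procedure on the data with the first component removed (deflation) would give further components, so a low-dimensional state space can be assembled one axis at a time.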

Parallel Framework for Dimensionality Reduction of Large-Scale Datasets

Scientific Programming ◽ 10.1155/2015/180214 ◽ 2015 ◽ Vol 2015 ◽ pp. 1-12 ◽ Cited By ~ 3

Author(s): Sai Kiranmayee Samudrala ◽ Jaroslaw Zola ◽ Srinivas Aluru ◽ Baskar Ganapathysubramanian

Keyword(s): Dimensionality Reduction ◽ Organic Solar Cells ◽ Large Scale ◽ Parallel Implementation ◽ High Dimensional Data ◽ Real Life ◽ Processing Parameters ◽ High Dimensional ◽ Morphology Evolution ◽ Reduction Techniques

Dimensionality reduction refers to a set of mathematical techniques used to reduce the complexity of the original high-dimensional data while preserving its selected properties. Improvements in simulation strategies and experimental data collection methods are resulting in a deluge of heterogeneous and high-dimensional data, which often makes dimensionality reduction the only viable way to gain qualitative and quantitative understanding of the data. However, existing dimensionality reduction software often does not scale to datasets arising in real-life applications, which may consist of thousands of points with millions of dimensions. In this paper, we propose a parallel framework for dimensionality reduction of large-scale data. We identify key components underlying the spectral dimensionality reduction techniques and propose their efficient parallel implementation. We show that the resulting framework can be used to process datasets consisting of millions of points when executed on a 16,000-core cluster, which is beyond the reach of currently available methods. To further demonstrate the applicability of our framework, we perform dimensionality reduction of 75,000 images representing morphology evolution during manufacturing of organic solar cells in order to identify how processing parameters affect morphology evolution.

Mobile spine chordoma: results of 166 patients from the AOSpine Knowledge Forum Tumor database

Journal of Neurosurgery Spine ◽ 10.3171/2015.7.spine15201 ◽ 2016 ◽ Vol 24 (4) ◽ pp. 644-651 ◽ Cited By ~ 30

Author(s): Ziya L. Gokaslan ◽ Patricia L. Zadnik ◽ Daniel M. Sciubba ◽ Niccole Germscheid ◽ C. Rory Goodwin ◽ ...

Keyword(s): Local Recurrence ◽ Survival Data ◽ Large Scale ◽ Surgical Techniques ◽ En Bloc Resection ◽ Bloc Resection ◽ En Bloc ◽ Increased Risk ◽ Significant Difference ◽ Mobile Spine

OBJECT A chordoma is an indolent primary spinal tumor that has devastating effects on the patient's life. These lesions are chemoresistant, resistant to conventional radiotherapy, and moderately sensitive to proton therapy; however, en bloc resection remains the preferred treatment for optimizing patient outcomes. While multiple small and largely retrospective studies have investigated the outcomes following en bloc resection of chordomas in the sacrum, there have been few large-scale studies on patients with chordomas of the mobile spine. The goal of this study was to review the outcomes of surgically treated patients with mobile spine chordomas at multiple international centers with respect to local recurrence and survival. This multiinstitutional retrospective study collected data between 1988 and 2012 about prognosis-predicting factors, including various clinical characteristics and surgical techniques for mobile spine chordoma. Tumors were classified according to the Enneking principles and analyzed in 2 treatment cohorts: Enneking-appropriate (EA) and Enneking-inappropriate (EI) cohorts. Patients were categorized as EA when the final pathological assessment of the margin matched the Enneking recommendation; otherwise, they were categorized as EI.

METHODS Descriptive statistics were used to summarize the data (Student t-test, chi-square, and Fisher exact tests). Recurrence and survival data were analyzed using Kaplan-Meier survival curves, log-rank tests, and multivariate Cox proportional hazard modeling.

RESULTS A total of 166 patients (55 female and 111 male patients) with mobile spine chordoma were included. The median patient follow-up was 2.6 years (range 1 day to 22.5 years). Fifty-eight (41%) patients were EA and 84 (59%) patients were EI. The type of biopsy (p < 0.001), spinal location (p = 0.018), and whether the patient received adjuvant therapy (p < 0.001) were significantly different between the 2 cohorts. Overall, 58 (35%) patients developed local recurrence and 57 (34%) patients died. Median survival was 7.0 years postoperative: 8.4 years postoperative for EA patients and 6.4 years postoperative for EI patients (p = 0.023). The multivariate analysis showed that the EI cohort was significantly associated with an increased risk of local recurrence in comparison with the EA cohort (HR 7.02; 95% CI 2.96–16.6; p < 0.001), although no significant difference in survival was observed.

CONCLUSIONS EA resection plays a major role in decreasing the risk for local recurrence in patients with chordoma of the mobile spine.
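For readers unfamiliar with the Kaplan-Meier survival curves mentioned in the abstract, the sketch below shows the product-limit estimator in its simplest form. It is a minimal illustration on made-up toy data, not a reanalysis of the study; the function name and input layout are assumptions.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times:  follow-up time for each subject.
    events: True if the event (e.g. death) was observed at that time,
            False if the subject was censored.
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather all subjects sharing this follow-up time.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            removed += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored subjects leave the risk set.
        at_risk -= removed
    return curve
```

With the toy input `kaplan_meier([1, 2, 3, 4], [True, False, True, True])`, the censored subject at time 2 lowers the number at risk without producing a step in the curve, which is how the estimator uses incomplete follow-up.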

Feature screening with large scale and high dimensional survival data

Feature Screening for High-Dimensional Survival Data via Censored Quantile Correlation

Robust feature screening for high-dimensional survival data

Model-free feature screening for high-dimensional survival data

A Fast Clustering Algorithm for Large-scale and High Dimensional Data

Data Quality Measures and Efficient Evaluation Algorithms for Large-Scale High-Dimensional Data

Investigation of Improved Cooperative Coevolution for Large-Scale Global Optimization Problems

High-dimensional optimization of large-scale steel truss structures using guided stochastic search

Fundamental Dynamical Modes Underlying Human Brain Synchronization

Parallel Framework for Dimensionality Reduction of Large-Scale Datasets

Mobile spine chordoma: results of 166 patients from the AOSpine Knowledge Forum Tumor database
