Improving Classification for Microarray Data Sets by Constructing Synthetic Data

Identification of Candidate Genetic Markers and a Novel 4-genes Diagnostic Model in Osteoarthritis through Integrating Multiple Microarray Data

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207323666200428120310 ◽

2020 ◽

Vol 23 (8) ◽

pp. 805-813

Author(s):

Ai Jiang ◽

Peng Xu ◽

Zhenda Zhao ◽

Qizhao Tan ◽

Shang Sun ◽

...

Keyword(s):

Signaling Pathway ◽

Microarray Data ◽

Differential Expression Analysis ◽

Enrichment Analysis ◽

Mapk Signaling ◽

Functional Enrichment ◽

Joint Disease ◽

Support Vector ◽

Diagnostic Model ◽

Data Sets

Background: Osteoarthritis (OA) is a joint disease that leads to a high disability rate and a low quality of life. With the development of modern molecular biology techniques, some key genes and diagnostic markers have been reported. However, the etiology and pathogenesis of OA are still unknown.

Objective: To develop a gene signature for OA.

Method: In this study, five microarray data sets were integrated to conduct a comprehensive network and pathway analysis of the biological functions of OA-related genes, providing information to further explore the etiology and pathogenesis of OA.

Results and Discussion: Differential expression analysis identified 180 genes that were significantly differentially expressed in OA. Functional enrichment analysis showed that the up-regulated genes were associated with rheumatoid arthritis (p < 0.01). The down-regulated genes were enriched in the biological process of negative regulation of kinase activity and in signaling pathways such as the MAPK signaling pathway (p < 0.001) and the IL-17 signaling pathway (p < 0.001). In addition, an OA-specific protein-protein interaction (PPI) network was constructed from the differentially expressed genes. Analysis of the network's topological attributes showed that the up-regulated genes VEGFA, MYC, ATF3 and JUN were hub genes of the network; these genes may influence the occurrence and development of OA by regulating the cell cycle or apoptosis, and are potential biomarkers of OA. Finally, a support vector machine (SVM) was used to build a diagnostic model of OA, which not only had excellent predictive power on internal and external data sets (AUC > 0.9) but also performed well across different chip platforms (AUC > 0.9) and was effective on blood samples (AUC > 0.8).

Conclusion: The 4-gene diagnostic model may be of great help in the early diagnosis and prediction of OA.
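
The diagnostic model above is evaluated by AUC, the area under the ROC curve. As a minimal illustration of what that number measures (not the authors' SVM pipeline), the sketch below computes AUC as the probability that a randomly chosen case scores higher than a randomly chosen control, counting ties as one half. The function name `roc_auc` and the example scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC via pairwise (Mann-Whitney) comparison; O(n_pos * n_neg), fine for small sets.

    labels: 1 for cases (e.g. OA), 0 for controls.
    scores: classifier outputs, higher meaning "more likely a case".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # case ranked above control
            elif p == q:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical scores for two cases and two controls.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported AUC > 0.9 and AUC > 0.8 figures should be read.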

Methods for estimating uncertainty in factor analytic solutions

Atmospheric Measurement Techniques ◽

10.5194/amt-7-781-2014 ◽

2014 ◽

Vol 7 (3) ◽

pp. 781-797 ◽

Cited By ~ 174

Author(s):

P. Paatero ◽

S. Eberly ◽

S. G. Brown ◽

G. A. Norris

Keyword(s):

Environmental Protection Agency ◽

Synthetic Data ◽

Analytic Solutions ◽

Data Sets ◽

Random Errors ◽

Data Set ◽

Factor Analytic ◽

Uncertainty Estimates ◽

Multilinear Engine ◽

Analytic Models

Abstract. The EPA PMF (Environmental Protection Agency positive matrix factorization) version 5.0 and the underlying multilinear engine-executable ME-2 contain three methods for estimating uncertainty in factor analytic models: classical bootstrap (BS), displacement of factor elements (DISP), and bootstrap enhanced by displacement of factor elements (BS-DISP). The goal of these methods is to capture the uncertainty of PMF analyses due to random errors and rotational ambiguity. It is shown that the three methods complement each other: depending on characteristics of the data set, one method may provide better results than the other two. Results are presented using synthetic data sets, including interpretation of diagnostics, and recommendations are given for parameters to report when documenting uncertainty estimates from EPA PMF or ME-2 applications.
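
Of the three methods, classical bootstrap (BS) is the most general. The sketch below shows only its core idea, resampling observations with replacement and reading a percentile interval off the replicated statistic, applied to a simple mean. It is a toy under stated assumptions, not the EPA PMF or ME-2 procedure, which refits the full factor model on each resampled data set; `bootstrap_interval`, its defaults, and the sample values are hypothetical.

```python
import random
import statistics


def bootstrap_interval(data, stat=statistics.fmean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `stat` computed on `data`."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(data)
    replicates = sorted(
        stat([data[rng.randrange(n)] for _ in range(n)])  # resample n rows with replacement
        for _ in range(n_boot)
    )
    lo = replicates[int(alpha / 2 * n_boot)]
    hi = replicates[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical concentrations of one species in ten samples.
sample = [2.1, 1.8, 2.5, 3.0, 2.2, 1.9, 2.7, 2.4, 2.0, 2.6]
print(bootstrap_interval(sample))
```

When the statistic is a set of factor profiles rather than a scalar, each replicate's factors must also be matched to those of the base solution before intervals are formed; this scalar toy skips that step, and it captures only random-error uncertainty, not the rotational ambiguity that DISP targets.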

Reverse time migration

Geophysics ◽

10.1190/1.1441434 ◽

1983 ◽

Vol 48 (11) ◽

pp. 1514-1524 ◽

Cited By ~ 855

Author(s):

Edip Baysal ◽

Dan D. Kosloff ◽

John W. C. Sherwood

Keyword(s):

Wave Amplitude ◽

Synthetic Data ◽

Data Sets ◽

Field Observations ◽

Reverse Time ◽

Reverse Time Migration ◽

Time Migration ◽

Migration Method ◽

Zero Offset ◽

Time Extrapolation

Migration of stacked or zero‐offset sections is based on deriving the wave amplitude in space from wave field observations at the surface. Conventionally this calculation has been carried out through a depth extrapolation. We examine the alternative of carrying out the migration through a reverse time extrapolation. This approach may offer improvements over existing migration methods, especially in cases of steeply dipping structures with strong velocity contrasts. This migration method is tested using appropriate synthetic data sets.
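
The idea of migrating by reverse time extrapolation can be illustrated with a deliberately small 1D toy, under assumptions that are not from the paper: a uniform medium, a single "exploding reflector" at depth index k (the zero-offset picture, where in practice the velocity is halved to account for two-way travel), and a grid with c·dt/dz = 1, at which the 1D leapfrog scheme for the acoustic wave equation moves pulses exactly one cell per step and simple one-way boundaries are exact. The forward pass records a surface trace; the reverse pass runs the same wave equation backward in time, injecting the trace at the surface, and the wavefield at t = 0 is the image. All function names are hypothetical.

```python
def leapfrog(u, u_other):
    """One leapfrog step of u_tt = c^2 u_zz at Courant number 1.

    Returns the field one step away from `u`, given the field `u_other` on the
    opposite side in time. The update is symmetric, so it runs forward or backward.
    Boundary values (index 0 and -1) are left for the caller to set.
    """
    out = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        out[i] = u[i + 1] + u[i - 1] - u_other[i]
    return out


def model_zero_offset_trace(n_z, k, n_t):
    """Forward-model the surface record of a reflector 'exploding' at depth index k."""
    u = [0.0] * n_z
    u[k] = 1.0  # impulsive source at t = 0
    # Start-up step for zero initial velocity: at Courant number 1 this is the
    # average of the two neighbours, which splits the pulse into up/down halves.
    u_next = [0.0] * n_z
    for i in range(1, n_z - 1):
        u_next[i] = 0.5 * (u[i + 1] + u[i - 1])
    u_next[0], u_next[-1] = u[1], u[-2]  # one-way (transparent) boundaries
    trace = [u[0], u_next[0]]
    u_prev, u = u, u_next
    for _ in range(2, n_t):
        u_next = leapfrog(u, u_prev)
        u_next[0], u_next[-1] = u[1], u[-2]  # let waves leave the grid
        u_prev, u = u, u_next
        trace.append(u[0])  # receiver at the surface
    return trace


def reverse_time_migrate(trace, n_z):
    """Back-propagate a surface trace in reverse time; the field at t = 0 is the image."""
    n_t = len(trace)
    u_later = [0.0] * n_z  # field at step n + 1
    u = [0.0] * n_z        # field at step n = n_t - 1
    u[0] = trace[-1]
    for n in range(n_t - 1, 0, -1):
        u_earlier = leapfrog(u, u_later)
        u_earlier[0] = trace[n - 1]  # inject the recorded data at the surface
        u_earlier[-1] = u[-2]        # transparent bottom in reverse time
        u_later, u = u, u_earlier
    return u


trace = model_zero_offset_trace(n_z=60, k=25, n_t=80)
image = reverse_time_migrate(trace, n_z=60)
print(max(range(60), key=lambda i: abs(image[i])))  # 25
```

Only the upgoing half of the exploding pulse reaches the surface, so the image recovers the reflector position at half the source amplitude; a realistic implementation would work in 2D with variable velocity, where the advantage the paper claims for steep dips and strong contrasts becomes relevant.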

Quasi-2D inversion of DCR and TDEM data for shallow investigations

Geophysics ◽

10.1190/1.3587218 ◽

2011 ◽

Vol 76 (4) ◽

pp. F239-F250 ◽

Cited By ~ 10

Author(s):

Fernando A. Monteiro Santos ◽

Hesham M. El-Kaliouby

Keyword(s):

Joint Inversion ◽

Synthetic Data ◽

Data Sets ◽

Complex Environments ◽

Inversion Algorithm ◽

2D Inversion ◽

New Approach ◽

Earth Models ◽

Time Domain Electromagnetic ◽

Inversion Techniques

Joint or sequential inversion of direct current resistivity (DCR) and time-domain electromagnetic (TDEM) data is commonly performed for individual soundings assuming layered earth models. DCR and TDEM have different and complementary sensitivities to resistive and conductive structures, which makes them well suited to joint inversion. Joint inversion of DCR and TDEM data has been used by several authors to reduce the ambiguities of the models obtained from each method separately. A new approach for the joint inversion of these data sets, based on a laterally constrained algorithm, was developed for the interpretation of soundings collected along a line over 1D or 2D geology. The inversion algorithm was tested on two synthetic data sets, as well as on field data from Saudi Arabia. The results show that the algorithm is efficient and stable in producing quasi-2D models from DCR and TDEM data acquired in relatively complex environments.
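
The laterally constrained idea can be shown on a toy linear problem. Assume, hypothetically and not from the paper, that each sounding i along the line has a single model parameter m_i, that the DCR and TDEM responses are linear in it with sensitivities a1 and a2, and that neighbouring soundings are tied by a penalty lam*(m_i - m_{i+1})^2. Minimising the combined data misfit plus that penalty gives a symmetric tridiagonal system, solved here with the Thomas algorithm. Real DCR and TDEM forward models are nonlinear and layered and are inverted iteratively; the function name and all numbers below are illustrative.

```python
def lci_joint_scalar(d1, d2, a1, a2, lam):
    """Minimise sum_i (d1_i - a1*m_i)^2 + (d2_i - a2*m_i)^2 + lam * sum_i (m_i - m_{i+1})^2.

    Setting the gradient to zero gives the tridiagonal system
        (a1^2 + a2^2 + lam*deg_i) m_i - lam*m_{i-1} - lam*m_{i+1} = a1*d1_i + a2*d2_i,
    where deg_i is the number of lateral neighbours of sounding i (1 at the ends, 2 inside).
    """
    n = len(d1)
    diag = [a1 * a1 + a2 * a2 + lam * ((i > 0) + (i < n - 1)) for i in range(n)]
    off = -lam  # identical sub- and super-diagonal entries
    rhs = [a1 * x + a2 * y for x, y in zip(d1, d2)]
    # Thomas algorithm: forward elimination ...
    c = [0.0] * n
    g = [0.0] * n
    c[0] = off / diag[0]
    g[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off * c[i - 1]
        c[i] = off / denom
        g[i] = (rhs[i] - off * g[i - 1]) / denom
    # ... then back substitution.
    m = [0.0] * n
    m[-1] = g[-1]
    for i in range(n - 2, -1, -1):
        m[i] = g[i] - c[i] * m[i + 1]
    return m


# Hypothetical line of five soundings with an anomaly under the middle one.
print(lci_joint_scalar([0, 0, 10, 0, 0], [0, 0, 0, 0, 0], 1.0, 0.5, lam=0.0))
print(lci_joint_scalar([0, 0, 10, 0, 0], [0, 0, 0, 0, 0], 1.0, 0.5, lam=5.0))
```

With lam = 0 each sounding is inverted on its own; increasing lam spreads the anomaly to neighbouring soundings, which is how lateral constraints stabilise a line of 1D inversions into a quasi-2D section.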

Boosting Instance Segmentation with Synthetic Data: A study to overcome the limits of real world data sets

10.1109/iccvw54120.2021.00110 ◽

2021 ◽

Author(s):

Florentin Poucin ◽

Andrea Kraus ◽

Martin Simon

Keyword(s):

Real World ◽

Synthetic Data ◽

Data Sets ◽

Real World Data ◽

World Data ◽

Instance Segmentation

A Non-Parametric Model for Accurate and Provably Private Synthetic Data Sets

Proceedings of the 12th International Conference on Availability, Reliability and Security - ARES '17 ◽

10.1145/3098954.3098962 ◽

2017 ◽

Cited By ~ 1

Author(s):

Jordi Soria-Comas ◽

Josep Domingo-Ferrer

Keyword(s):

Synthetic Data ◽

Parametric Model ◽

Data Sets ◽

Non Parametric

Identification of Genes Expressed in Hyperpigmented Skin Using Meta-Analysis of Microarray Data Sets

Journal of Investigative Dermatology ◽

10.1038/jid.2015.179 ◽

2015 ◽

Vol 135 (10) ◽

pp. 2455-2463 ◽

Cited By ~ 7

Author(s):

Lanlan Yin ◽

Sergio G. Coelho ◽

Julio C. Valencia ◽

Dominik Ebsen ◽

Andre Mahns ◽

...

Keyword(s):

Microarray Data ◽

Meta Analysis ◽

Data Sets

Bayesian Classifier for Sparsity-Promoting Feature Selection

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001415500226 ◽

2015 ◽

Vol 29 (06) ◽

pp. 1550022 ◽

Cited By ~ 1

Author(s):

Danlei Xu ◽

Lan Du ◽

Hongwei Liu ◽

Penghui Wang

Keyword(s):

Feature Selection ◽

Synthetic Data ◽

Original Data ◽

Radar Data ◽

Bayesian Classifier ◽

Classification Model ◽

Data Sets ◽

Data Set ◽

Classification Boundary ◽

Nonlinear Mappings

A Bayesian classifier for sparsity-promoting feature selection is developed in this paper, in which a set of nonlinear mappings of the original data is applied as a pre-processing step. With these mappings from the original input space to a nonlinear transformation space, the linear classification model can both construct a nonlinear classification boundary and perform feature selection on the original data. A zero-mean Gaussian prior with Gamma precision and a finite approximation of the Beta process prior are used to promote sparsity in the use of features and of nonlinear mappings in the model, respectively. We derive the variational Bayesian (VB) inference algorithm for the proposed linear classifier. Experimental results on a synthetic data set, a measured radar data set, a high-dimensional gene expression data set, and several benchmark data sets demonstrate the aggressive and robust feature selection capability of our method and its classification accuracy, which is comparable to that of some other existing classifiers.
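
The key modelling step, a fixed nonlinear mapping of the inputs followed by a linear classifier, can be seen on a one-dimensional toy where no single threshold on x separates the classes. The sketch maps x to the hypothetical features (1, x, x^2) and fits plain logistic regression by gradient descent. It deliberately omits the paper's Gaussian-Gamma and Beta-process priors and its VB inference, which are what make the real model select features and mappings sparsely; names, data, and step size are illustrative.

```python
import math


def feature_map(x):
    """Hypothetical fixed nonlinear mapping phi(x) = (1, x, x^2)."""
    return [1.0, x, x * x]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(xs, ys, lr=0.1, n_iter=5000):
    """Plain (non-Bayesian) logistic regression on mapped features, full-batch gradient descent."""
    feats = [feature_map(x) for x in xs]
    w = [0.0] * len(feats[0])
    n = len(xs)
    for _ in range(n_iter):
        grad = [0.0] * len(w)
        for f, y in zip(feats, ys):
            p = sigmoid(sum(wj * fj for wj, fj in zip(w, f)))
            for j, fj in enumerate(f):
                grad[j] += (p - y) * fj / n  # gradient of the mean log-loss
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def predict(w, x):
    """Linear decision rule in feature space, i.e. a quadratic boundary in x."""
    return 1 if sum(wj * fj for wj, fj in zip(w, feature_map(x))) > 0 else 0


# Toy 1D data: class 1 lies outside [-1, 1], class 0 inside, so no threshold on x works.
xs = [-2.0, -1.5, -0.5, 0.0, 0.5, 1.5, 2.0]
ys = [1, 1, 0, 0, 0, 1, 1]
w = train_logistic(xs, ys)
print([predict(w, x) for x in xs])  # [1, 1, 0, 0, 0, 1, 1]
```

The learned boundary w0 + w1*x + w2*x^2 = 0 is quadratic in x even though the classifier is linear in its features; a sparsity prior on w would additionally switch off mappings that do not help, which is the role the priors play in the paper.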

Sparse reflectivity inversion for nonstationary seismic data with surface-related multiples: Numerical and field-data experiments

Geophysics ◽

10.1190/geo2016-0520.1 ◽

2017 ◽

Vol 82 (3) ◽

pp. R199-R217 ◽

Cited By ~ 3

Author(s):

Xintao Chai ◽

Shangxu Wang ◽

Genyang Tang

Keyword(s):

Seismic Data ◽

Resolution Enhancement ◽

Synthetic Data ◽

Data Sets ◽

Data Set ◽

Anelastic Attenuation ◽

Seismic Resolution ◽

Q Filtering ◽

The Stability ◽

Reflectivity Inversion

Seismic data are nonstationary due to subsurface anelastic attenuation and dispersion effects. These effects, also referred to as the earth's Q-filtering effects, can diminish seismic resolution. We previously developed a method of nonstationary sparse reflectivity inversion (NSRI) for resolution enhancement, which avoids the intrinsic instability associated with inverse Q filtering and generates superior Q compensation results. Applying NSRI to data sets that contain multiples (addressing surface-related multiples only) requires a demultiple preprocessing step, because NSRI cannot distinguish primaries from multiples and will treat them as interference convolved with incorrect Q values. However, multiples contain information about subsurface properties. To use the information carried by multiples, we combine the feedback model with NSRI theory and adapt NSRI to nonstationary seismic data with surface-related multiples. Consequently, the benefits of NSRI (e.g., circumventing the intrinsic instability associated with inverse Q filtering) carry over to this setting, and multiples are taken into account as well. Our method is limited to a 1D implementation. Theoretical and numerical analyses verify that, given a wavelet, the input Q values primarily affect the inverted reflectivities and have little effect on the estimated multiples; i.e., multiple estimation need not consider Q filtering effects explicitly. Nevertheless, NSRI benefits from considering multiples: the periodicity and amplitude of the multiples indicate the positions of the reflectivities and the amplitude of the wavelet, and multiples help overcome the scaling and shifting ambiguities of conventional problems in which multiples are not considered.
Experiments using a 1D algorithm on a synthetic data set, the publicly available Pluto 1.5 data set, and a marine data set support the aforementioned findings and reveal the stability, capabilities, and limitations of the proposed method.
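
The statement that the periodicity and amplitude of multiples encode the reflectivity follows from the feedback model. In a simplified 1D, noise-free setting with an impulsive wavelet, no attenuation, and a free-surface reflection coefficient of -1 (all assumptions for illustration, not the paper's nonstationary NSRI formulation), the recorded trace d satisfies d = r - r * d, where r is the primary reflectivity series and * is convolution. Because r is zero at lag 0, the relation can be evaluated recursively in time; the function name is hypothetical.

```python
def trace_with_surface_multiples(r):
    """Evaluate the 1D feedback model d = r - r * d (free-surface coefficient -1).

    `r` is the primary reflectivity sampled in two-way time; r[0] must be 0 so the
    recursion is causal: d[t] depends only on d[0..t-1].
    """
    if r[0] != 0:
        raise ValueError("r[0] must be zero for the causal recursion")
    d = [0.0] * len(r)
    for t in range(len(r)):
        conv = sum(r[s] * d[t - s] for s in range(1, t + 1))  # (r * d)[t]
        d[t] = r[t] - conv
    return d


# One hypothetical reflector with coefficient 0.5 at two-way time sample 10.
r = [0.0] * 40
r[10] = 0.5
d = trace_with_surface_multiples(r)
print({t: v for t, v in enumerate(d) if v != 0.0})  # {10: 0.5, 20: -0.25, 30: 0.125}
```

The multiples repeat at the primary's two-way time with amplitudes 0.5, -0.25, 0.125, that is (-1)^(k+1) * 0.5^k, so their spacing fixes the reflector time and their decay fixes its coefficient; with a real wavelet, each spike would be replaced by a (possibly attenuated) copy of the wavelet.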

Synthetic Data Sets

Encyclopedia of Social Network Analysis and Mining ◽

10.1007/978-1-4614-7163-9_110190-1 ◽

2017 ◽

pp. 1-4

Author(s):

Sargur N. Srihari

Keyword(s):

Synthetic Data ◽

Data Sets