A method for adequate selection of training data sets to reconstruct seismic field data using a convolutional U-Net

Geophysics ◽  
2021 ◽  
pp. 1-103
Author(s):  
Jiho Park ◽  
Jihun Choi ◽  
Soon Jee Seol ◽  
Joongmoo Byun ◽  
Young Kim

Deep learning (DL) methods have recently been introduced for seismic signal processing, and many researchers have adopted these novel techniques to construct DL models for seismic data reconstruction. The performance of DL-based methods depends heavily on what is learned from the training data. We focus on constructing a DL model that well reflects the features of the target data sets. The main goal is to integrate DL with an intuitive data analysis approach that compares similar patterns prior to the DL training stage. We have developed a sequential method consisting of two stages: (i) analyzing the training and target data sets simultaneously to determine a target-informed training set and (ii) training the DL model with this training set to effectively interpolate the seismic data. Here, we introduce the convolutional autoencoder t-distributed stochastic neighbor embedding (CAE t-SNE) analysis, which provides insight into the interpolation results through the analysis of both the training and target data sets prior to DL model training. The proposed method was tested with synthetic and field data. Dense seismic gathers (e.g., common-shot gathers; CSGs) were used as the labeled training data set, and relatively sparse seismic gathers (e.g., common-receiver gathers; CRGs) were reconstructed in both cases. The reconstructed results and SNRs demonstrated that the training data can be efficiently selected using CAE t-SNE analysis and that the spatial aliasing of the CRGs was successfully alleviated by the DL model trained with this training data, which contains the target features. These results imply that data analysis for selecting a target-informed training set is very important for successful DL interpolation. Additionally, the proposed analysis method can also be applied to investigate the similarities between training and target data sets for other DL-based seismic data reconstruction tasks.
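A minimal sketch of the kind of CAE t-SNE screening described above is given below, assuming pre-extracted seismic patches and using PyTorch with scikit-learn's t-SNE; the patch size, file names, network depth, and hyperparameters are illustrative assumptions, not the authors' configuration.

```python
# Hedged sketch: convolutional autoencoder (CAE) latent features + t-SNE to
# compare candidate training (CSG) and target (CRG) patches before training
# an interpolation network. Patch shapes and file names are assumptions.
import numpy as np
import torch
import torch.nn as nn
from sklearn.manifold import TSNE

class CAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 2, stride=2),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

def latent_features(model, patches):
    """Encode (N, 1, 64, 64) patches and flatten the latent maps."""
    with torch.no_grad():
        _, z = model(torch.from_numpy(patches).float())
    return z.flatten(1).numpy()

# csg_patches / crg_patches: assumed pre-extracted (N, 1, 64, 64) arrays
csg_patches = np.load("csg_patches.npy")   # candidate training patches
crg_patches = np.load("crg_patches.npy")   # target patches to reconstruct

model = CAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
x = torch.from_numpy(csg_patches).float()
for epoch in range(50):                     # unsupervised reconstruction training
    recon, _ = model(x)
    loss = loss_fn(recon, x)
    opt.zero_grad(); loss.backward(); opt.step()

# Joint t-SNE embedding of training and target latent vectors; training
# patches whose embedding overlaps the target cloud would be kept as the
# target-informed training set.
feats = np.vstack([latent_features(model, csg_patches),
                   latent_features(model, crg_patches)])
emb = TSNE(n_components=2, perplexity=30, init="pca").fit_transform(feats)
```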

2020 ◽  
pp. 666-679 ◽  
Author(s):  
Xuhong Zhang ◽  
Toby C. Cornish ◽  
Lin Yang ◽  
Tellen D. Bennett ◽  
Debashis Ghosh ◽  
...  

PURPOSE We focus on the problem of scarcity of annotated training data for nucleus recognition in Ki-67 immunohistochemistry (IHC)–stained pancreatic neuroendocrine tumor (NET) images. We hypothesize that deep learning–based domain adaptation is helpful for nucleus recognition when image annotations are unavailable in target data sets. METHODS We considered 2 different institutional pancreatic NET data sets: one (ie, source) containing 38 cases with 114 annotated images and the other (ie, target) containing 72 cases with 20 annotated images. The gold standards were manually annotated by 1 pathologist. We developed a novel deep learning–based domain adaptation framework to count different types of nuclei (ie, immunopositive tumor, immunonegative tumor, and nontumor nuclei). We compared the proposed method with several recent fully supervised deep learning models, such as the fully convolutional network-8s (FCN-8s), U-Net, the fully convolutional regression networks FCRN-A and FCRN-B, and the fully residual convolutional network (FRCN). We also evaluated the proposed method by learning with a mixture of converted source images and real target annotations. RESULTS Our method achieved F1 scores of 81.3% and 62.3% for nucleus detection and classification in the target data set, respectively. Our method outperformed FCN-8s (53.6% and 43.6% for nucleus detection and classification, respectively), U-Net (61.1% and 47.6%), FCRN-A (63.4% and 55.8%), and FCRN-B (68.2% and 60.6%) in terms of F1 score and was competitive with FRCN (81.7% and 70.7%). In addition, learning with a mixture of converted source images and only a small set of real target labels could further boost the performance. CONCLUSION This study demonstrates that deep learning–based domain adaptation is helpful for nucleus recognition in Ki-67 IHC–stained images when target data annotations are not available. This would improve the applicability of deep learning models designed for downstream supervised learning tasks on different data sets.
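For readers reproducing detection numbers like those above, a small sketch of a standard centroid-matching F1 metric for nucleus detection is given below; the matching radius, greedy matching rule, and (x, y) centroid format are assumptions rather than the authors' exact evaluation protocol.

```python
# Hedged sketch: greedy centroid matching and F1 score for nucleus detection.
import numpy as np
from scipy.spatial.distance import cdist

def detection_f1(pred_pts, gt_pts, radius=16.0):
    """Greedy one-to-one matching of predicted and ground-truth centroids."""
    if len(pred_pts) == 0 or len(gt_pts) == 0:
        return 0.0
    d = cdist(np.asarray(pred_pts), np.asarray(gt_pts))
    matched_pred, matched_gt = set(), set()
    # match the closest pairs first, stopping once pairs exceed the radius
    for i, j in sorted(np.ndindex(d.shape), key=lambda ij: d[ij]):
        if d[i, j] > radius:
            break
        if i not in matched_pred and j not in matched_gt:
            matched_pred.add(i); matched_gt.add(j)
    tp = len(matched_pred)
    precision = tp / len(pred_pts)
    recall = tp / len(gt_pts)
    return 2 * precision * recall / (precision + recall + 1e-8)
```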


2021 ◽  
Vol 14 (2) ◽  
pp. 120-128
Author(s):  
Mohammed Ehsan Safi ◽  
Eyad I. Abbas

In personal image recognition algorithms, two effective factors govern the system's evaluation: the recognition rate and the size of the database. Unfortunately, the recognition rate is proportional to the size of the training set; consequently, larger training sets increase processing time and cause memory limitation problems. This paper's main goal is to present a robust algorithm with minimal training data and a high recognition rate. Images of ten persons were chosen as a database: nine images per individual as the full version of the training data set, and one image per person, outside the training set, as a test pattern before the database reduction procedure. The proposed algorithm integrates Principal Component Analysis (PCA) as a feature extraction technique with the minimum mean of clusters and the Euclidean distance to achieve personal recognition. After indexing the training set for each person, the clustering of the differences is determined. The person is recognized by the minimum mean index, and this process is repeated with each reduction. The experimental results show that the recognition rate remains 100% even when the training set is reduced to 44% of its original size, while the recognition rate decreases to 70% when the reduction reaches 89%. These results support reducing the training set while retaining a high recognition rate, depending on application requirements.
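A compact sketch of the decision rule described above — PCA features followed by the minimum mean Euclidean distance over each person's cluster — is given below; the number of components, whitening, and image layout are assumptions.

```python
# Hedged sketch: PCA feature extraction + per-person mean Euclidean distance,
# approximating the minimum-mean-of-clusters recognition rule.
import numpy as np
from sklearn.decomposition import PCA

def train_pca(train_images, n_components=25):
    """train_images: (n_samples, n_pixels) flattened grayscale face images."""
    pca = PCA(n_components=n_components, whiten=True)
    feats = pca.fit_transform(train_images)
    return pca, feats

def recognize(pca, feats, labels, test_image):
    """Assign the identity whose cluster has the minimum mean distance."""
    q = pca.transform(test_image.reshape(1, -1))[0]
    means = {}
    for person in np.unique(labels):
        cluster = feats[labels == person]            # that person's training projections
        means[person] = np.linalg.norm(cluster - q, axis=1).mean()
    return min(means, key=means.get)                 # minimum mean index
```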


Author(s):  
Feng Qian ◽  
Cangcang Zhang ◽  
Lingtian Feng ◽  
Cai Lu ◽  
Gulan Zhang ◽  
...  

2014 ◽  
Vol 539 ◽  
pp. 181-184
Author(s):  
Wan Li Zuo ◽  
Zhi Yan Wang ◽  
Ning Ma ◽  
Hong Liang

Accurate classification of text is a basic premise for extracting various types of information from the Web efficiently and utilizing network resources properly. In this paper, a new text classification method is proposed. The consistency analysis method is an iterative algorithm that trains different (weak) classifiers on the same training set and then gathers these classifiers to test the consistency of the various classification methods on the same text, thereby exposing the knowledge of each type of classifier. It determines the weight of each sample according to whether the sample was classified correctly in each training round, as well as the accuracy of the last overall classification, and then passes the reweighted data set to the subordinate classifier for training. Finally, the classifiers obtained during training are integrated into the final decision classifier. The classifier with consistency analysis can eliminate some unnecessary training data characteristics and focus on the key training data. According to the experimental results, the average accuracy of this method is 91.0%, while the average recall rate is 88.1%.
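The iterative reweighting described above follows the general shape of boosting; a generic sketch under that reading is shown below, with a decision-stump weak learner, binary labels in {-1, +1}, and an AdaBoost-style update as assumptions in place of the authors' exact rule.

```python
# Hedged sketch: AdaBoost-style reweighting loop as one concrete reading of
# the iterative "consistency analysis" training described above.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_consistency_ensemble(X, y, rounds=10):
    """y in {-1, +1}; returns weak classifiers and their vote weights."""
    n = len(y)
    w = np.full(n, 1.0 / n)               # per-sample weights
    learners, alphas = [], []
    for _ in range(rounds):
        clf = DecisionTreeClassifier(max_depth=1)
        clf.fit(X, y, sample_weight=w)
        pred = clf.predict(X)
        err = np.sum(w[pred != y]) / np.sum(w)
        if err >= 0.5:                    # weak learner no better than chance
            break
        alpha = 0.5 * np.log((1 - err) / (err + 1e-12))
        w *= np.exp(-alpha * y * pred)    # emphasize misclassified samples
        w /= w.sum()
        learners.append(clf); alphas.append(alpha)
    return learners, alphas

def predict_ensemble(learners, alphas, X):
    # integrate the weak classifiers into the final decision classifier
    votes = sum(a * clf.predict(X) for a, clf in zip(alphas, learners))
    return np.sign(votes)
```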


Geophysics ◽  
2017 ◽  
Vol 82 (3) ◽  
pp. R199-R217 ◽  
Author(s):  
Xintao Chai ◽  
Shangxu Wang ◽  
Genyang Tang

Seismic data are nonstationary due to subsurface anelastic attenuation and dispersion effects. These effects, also referred to as the earth's Q-filtering effects, can diminish seismic resolution. We previously developed a method of nonstationary sparse reflectivity inversion (NSRI) for resolution enhancement, which avoids the intrinsic instability associated with inverse Q filtering and generates superior Q compensation results. Applying NSRI to data sets that contain multiples (addressing surface-related multiples only) requires a demultiple preprocessing step because NSRI cannot distinguish primaries from multiples and will treat them as interference convolved with incorrect Q values. However, multiples contain information about subsurface properties. To use the information carried by multiples, with the feedback model and NSRI theory, we adapt NSRI to the context of nonstationary seismic data with surface-related multiples. Consequently, not only are the benefits of NSRI (e.g., circumventing the intrinsic instability associated with inverse Q filtering) extended, but multiples are also considered. Our method is limited to a 1D implementation. Theoretical and numerical analyses verify that, given a wavelet, the input Q values primarily affect the inverted reflectivities and exert little effect on the estimated multiples; i.e., multiple estimation need not consider Q-filtering effects explicitly. However, there are benefits to NSRI considering multiples. The periodicity and amplitude of the multiples imply the position of the reflectivities and the amplitude of the wavelet. Multiples assist in overcoming the scaling and shifting ambiguities of conventional problems in which multiples are not considered. Experiments using a 1D algorithm on a synthetic data set, the publicly available Pluto 1.5 data set, and a marine data set support the aforementioned findings and reveal the stability, capabilities, and limitations of the proposed method.
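A toy 1D illustration of the nonstationary sparse-spike idea behind NSRI is sketched below: each column of the forward operator is the source wavelet attenuated by a constant-Q amplitude filter for that sample's traveltime, and reflectivity is recovered by L1-regularized least squares. The Ricker wavelet, Q value, solver, and the neglect of velocity dispersion and multiples are all simplifying assumptions, not the published algorithm.

```python
# Hedged sketch: toy 1D nonstationary sparse-spike inversion.
import numpy as np
from numpy.fft import rfft, irfft, rfftfreq
from sklearn.linear_model import Lasso

def ricker(f0, dt, n):
    t = (np.arange(n) - n // 2) * dt
    return (1 - 2 * (np.pi * f0 * t) ** 2) * np.exp(-(np.pi * f0 * t) ** 2)

def nonstationary_operator(nt, dt, f0=30.0, Q=60.0):
    """G[:, j] = source wavelet attenuated for traveltime j*dt, centered at sample j."""
    W = rfft(ricker(f0, dt, nt))
    f = rfftfreq(nt, dt)
    G = np.zeros((nt, nt))
    for j in range(nt):
        att = np.exp(-np.pi * f * (j * dt) / Q)    # constant-Q amplitude decay
        wj = irfft(W * att, nt)
        G[:, j] = np.roll(wj, j - nt // 2)         # shift wavelet to sample j
    return G

# synthetic example: three reflectors, forward modeled and inverted back
nt, dt = 256, 0.002
r = np.zeros(nt); r[[60, 120, 190]] = [0.8, -0.5, 0.4]
G = nonstationary_operator(nt, dt)
d = G @ r                                          # attenuated, nonstationary trace
r_inv = Lasso(alpha=1e-3, max_iter=50000, fit_intercept=False).fit(G, d).coef_
```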


2021 ◽  
Author(s):  
Louise Bloch ◽  
Christoph M. Friedrich

Abstract Background: The prediction of whether subjects with Mild Cognitive Impairment (MCI) will prospectively develop Alzheimer's Disease (AD) is important for the recruitment and monitoring of subjects for therapy studies. Machine Learning (ML) is suitable to improve early AD prediction. The etiology of AD is heterogeneous, which leads to noisy data sets. Additional noise is introduced by multicentric study designs and varying acquisition protocols. This article examines whether an automatic and fair data valuation method based on Shapley values can identify subjects with noisy data. Methods: An ML workflow was developed and trained for a subset of the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort. The validation was executed for an independent ADNI test data set and for the Australian Imaging, Biomarker and Lifestyle Flagship Study of Ageing (AIBL) cohort. The workflow included volumetric Magnetic Resonance Imaging (MRI) feature extraction, subject sample selection using data Shapley, Random Forest (RF) and eXtreme Gradient Boosting (XGBoost) for model training, and Kernel SHapley Additive exPlanations (SHAP) values for model interpretation. This model interpretation enables clinically relevant explanations of individual predictions. Results: The XGBoost models that excluded 116 of the 467 subjects from the training data set based on their Logistic Regression (LR) data Shapley values outperformed the models trained on the entire training data set, which reached a mean classification accuracy of 58.54 %, by 14.13 % (8.27 percentage points) on the independent ADNI test data set. The XGBoost models trained on the entire training data set reached a mean accuracy of 60.35 % for the AIBL data set. An improvement of 24.86 % (15.00 percentage points) could be reached for the XGBoost models if the 72 subjects with the smallest RF data Shapley values were excluded from the training data set. Conclusion: The data Shapley method was able to improve the classification accuracies for the test data sets. Noisy data was associated with the number of ApoE ε4 alleles and volumetric MRI measurements. Kernel SHAP showed that the black-box models learned biologically plausible associations.
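A compact Monte Carlo data-Shapley sketch in the spirit of the valuation step described above is shown below, using logistic regression as the value function; the feature matrices, validation split, permutation budget, and chance-level baseline are assumptions rather than the study's exact workflow.

```python
# Hedged sketch: Monte Carlo data-Shapley valuation of training subjects.
# Each subject's value is its average marginal contribution to validation
# accuracy over random permutations of the training set.
import numpy as np
from sklearn.linear_model import LogisticRegression

def data_shapley(X_tr, y_tr, X_val, y_val, n_perm=200, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y_tr)
    values = np.zeros(n)
    for _ in range(n_perm):
        perm = rng.permutation(n)
        prev_score = 0.5                      # chance-level baseline accuracy
        for k, idx in enumerate(perm, start=1):
            subset = perm[:k]
            if len(np.unique(y_tr[subset])) < 2:
                score = prev_score            # cannot fit a classifier yet
            else:                             # O(n) refits per permutation; truncate in practice
                clf = LogisticRegression(max_iter=1000).fit(X_tr[subset], y_tr[subset])
                score = clf.score(X_val, y_val)
            values[idx] += score - prev_score # marginal contribution of this subject
            prev_score = score
    return values / n_perm

# subjects with the most negative values would be candidates for exclusion:
# values = data_shapley(X_train, y_train, X_val, y_val)
# keep = values > np.quantile(values, 0.25)
```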


Geophysics ◽  
2018 ◽  
Vol 83 (4) ◽  
pp. M41-M48 ◽  
Author(s):  
Hongwei Liu ◽  
Mustafa Naser Al-Ali

The ideal approach for continuous reservoir monitoring allows generation of fast and accurate images to cope with the massive data sets acquired for such a task. Conventionally, rigorous depth-oriented velocity-estimation methods are performed to produce sufficiently accurate velocity models. Unlike the traditional way, the target-oriented imaging technology based on the common-focus point (CFP) theory can be an alternative for continuous reservoir monitoring. The solution is based on a robust data-driven iterative operator updating strategy without deriving a detailed velocity model. The same focusing operator is applied on successive 3D seismic data sets for the first time to generate efficient and accurate 4D target-oriented seismic stacked images from time-lapse field seismic data sets acquired in a CO2 injection project in Saudi Arabia. Using the focusing operator, target-oriented prestack angle domain common-image gathers (ADCIGs) could be derived to perform amplitude-versus-angle analysis. To preserve the amplitude information in the ADCIGs, an amplitude-balancing factor is applied by embedding a synthetic data set using the real acquisition geometry to remove the geometry imprint artifact. Applying the CFP-based target-oriented imaging to time-lapse data sets revealed changes at the reservoir level in the poststack and prestack time-lapse signals, which is consistent with the CO2 injection history and rock physics.
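A toy illustration of reusing one focusing operator across vintages is sketched below: per-trace traveltime shifts are applied to a baseline and a monitor gather, and the focused stacks are differenced. The trace geometry, sampling interval, and the operator itself are assumptions; the actual CFP workflow involves iterative operator updating and prestack angle gathers that are not shown here.

```python
# Hedged sketch: applying a fixed focusing operator (per-trace traveltime
# shifts) to two vintages and differencing the focused stacks.
import numpy as np

def focus_and_stack(gather, operator_times, dt):
    """gather: (n_traces, n_samples); shift each trace by its operator time, then stack."""
    n_traces, n_samples = gather.shape
    stacked = np.zeros(n_samples)
    for i in range(n_traces):
        shift = int(round(operator_times[i] / dt))
        stacked += np.roll(gather[i], -shift)   # align energy at the focus point (toy wrap-around)
    return stacked / n_traces

# same operator applied to baseline and monitor surveys; the difference
# highlights time-lapse changes at the target level
# trace_b = focus_and_stack(baseline_gather, op_times, dt=0.002)
# trace_m = focus_and_stack(monitor_gather, op_times, dt=0.002)
# timelapse = trace_m - trace_b
```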


2021 ◽  
Vol 87 (6) ◽  
pp. 445-455
Author(s):  
Yi Ma ◽  
Zezhong Zheng ◽  
Yutang Ma ◽  
Mingcang Zhu ◽  
Ran Huang ◽  
...  

Many manifold learning algorithms conduct an eigenvector analysis on a data-similarity matrix with a size of N×N, where N is the number of data points. Thus, the memory complexity of the analysis is no less than O(N²). We present in this article an incremental manifold learning approach to handle large hyperspectral data sets for land use identification. In our method, the number of dimensions for the high-dimensional hyperspectral-image data set is obtained with the training data set. A local curvature variation algorithm is utilized to sample a subset of data points as landmarks. Then a manifold skeleton is identified based on the landmarks. Our method is validated on three AVIRIS hyperspectral data sets, outperforming the comparison algorithms with a k-nearest-neighbor classifier and achieving the second best performance with a support vector machine.
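A small sketch of one way to score local curvature variation for landmark sampling, in the spirit of the step described above, is given below: neighborhood PCA residual variance serves as the score, and the highest-scoring pixels are kept. The neighborhood size, intrinsic dimension, score definition, and landmark budget are assumptions, not the authors' exact algorithm.

```python
# Hedged sketch: landmark sampling by a local-curvature-variation proxy
# (residual variance of a local PCA fit in each k-NN neighborhood).
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.decomposition import PCA

def curvature_landmarks(X, k=15, d_intrinsic=10, n_landmarks=2000):
    """X: (n_pixels, n_bands) hyperspectral samples; returns landmark indices."""
    nbrs = NearestNeighbors(n_neighbors=k).fit(X)
    _, idx = nbrs.kneighbors(X)
    scores = np.empty(len(X))
    for i, neigh in enumerate(idx):
        local = X[neigh] - X[neigh].mean(axis=0)
        explained = PCA(n_components=min(d_intrinsic, k - 1)).fit(local)
        scores[i] = 1.0 - explained.explained_variance_ratio_.sum()  # residual variance
    return np.argsort(scores)[-n_landmarks:]       # keep the most "curved" neighborhoods
```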

