Machine-learning based data recovery and its contribution to seismic acquisition: simultaneous application of deblending, trace reconstruction and low-frequency extrapolation

Acquisition of incomplete data, i.e., blended, sparsely sampled, and narrowband data, allows for cost-effective and efficient field seismic operations. This strategy becomes technically acceptable, provided that a satisfactory recovery of the complete data, i.e., deblended, well-sampled and broadband data, is attainable. Hence, we explore a machine-learning approach that simultaneously performs suppression of blending noise, reconstruction of missing traces and extrapolation of low frequencies. We apply a deep convolutional neural network in the framework of supervised learning where we train a network using pairs of incomplete-complete datasets. Incomplete data, which are never used for training and employ different subsurface properties and acquisition scenarios, are subsequently fed into the trained network to predict complete data. We describe matrix representations indicating the contributions of different acquisition strategies to reducing the field operational effort. We also illustrate that the simultaneous implementation of source blending, sparse geometry and band limitation leads to a significant data compression where the size of the incomplete data in the frequency-space domain is much smaller than the size of the complete data. This reduction is indicative of survey cost and duration that our acquisition strategy can save. Both synthetic and field data examples demonstrate the applicability of the proposed approach. Despite the reduced amount of information available in the incomplete data, the results obtained from both numerical and field data cases clearly show that the machine-learning scheme effectively performs deblending, trace reconstruction, and low-frequency extrapolation in a simultaneous fashion. It is noteworthy that no discernible difference in prediction errors between extrapolated frequencies and preexisting frequencies is observed. The approach potentially allows seismic data to be acquired in a significantly compressed manner, while subsequently recovering data of satisfactory quality.

Download Full-text

Bioinspired Low-Frequency Material Characterisation

Advances in Acoustics and Vibration ◽

10.1155/2012/927903 ◽

2012 ◽

Vol 2012 ◽

pp. 1-12 ◽

Cited By ~ 1

Author(s):

C. Hopper ◽

S. Assous ◽

P. B. Wilkinson ◽

D. A. Gunn ◽

P. D. Jackson ◽

...

Keyword(s):

Sound Velocity ◽

Medical Physics ◽

Low Frequency ◽

High Sensitivity ◽

Prediction Errors ◽

Low Frequencies ◽

Longitudinal Sound Velocity ◽

Velocity Prediction ◽

Thickness Measurements

New-coded signals, transmitted by high-sensitivity broadband transducers in the 40–200 kHz range, allow subwavelength material discrimination and thickness determination of polypropylene, polyvinylchloride, and brass samples. Frequency domain spectra enable simultaneous measurement of material properties including longitudinal sound velocity and the attenuation constant as well as thickness measurements. Laboratory test measurements agree well with model results, with sound velocity prediction errors of less than 1%, and thickness discrimination of at least wavelength/15. The resolution of these measurements has only been matched in the past through methods that utilise higher frequencies. The ability to obtain the same resolution using low frequencies has many advantages, particularly when dealing with highly attenuating materials. This approach differs significantly from past biomimetic approaches where actual or simulated animal signals have been used and consequently has the potential for application in a range of fields where both improved penetration and high resolution are required, such as nondestructive testing and evaluation, geophysics, and medical physics.

Download Full-text

Low Frequency Radio Astronomy from Earth Orbit

International Astronomical Union Colloquium ◽

10.1017/s0252921100077563 ◽

1990 ◽

Vol 123 ◽

pp. 508-508

Author(s):

Kurt W. Weiler ◽

Namir E. Kassim

Keyword(s):

Wavelength Range ◽

Radio Astronomy ◽

Low Cost ◽

Low Frequency ◽

Earth Orbit ◽

The Moon ◽

Low Frequencies ◽

Frequency Space ◽

Current Technology

AbstractLow frequency radio astronomy for the purpose of this discussion is defined as frequencies ≲100 MHz. Since the technology is fairly simple at these frequencies and even Jansky’s original observations were made at 20.5 MHz, there have been many years of research at these wavelengths. However, though radio astronomers have been working at low frequencies since the first days of science, the observing limitations and the move of much of the effort to ever shorter wavelengths has meant that most areas still remain to be fully exploited with modern techniques and instruments. In particular, the possibilities for pursuing the very lowest frequencies by interferometry of ground to space, in Earth orbit, or from the Moon promises a rebirth of work in this wavelength range.We present concepts for space-ground VLBI and a fully space-based array in high Earth orbit to pursue the astrophysics which can only be probed at these frequencies. An Orbiting Low Frequency Radio Astronomy Satellite (OLFRAS) and a Low Frequency Space Array (LFSA) are two concepts which will open this last, poorly explored area of astronomy at relatively low cost and well within the limits of current technology.

Download Full-text

Machine Learning Data Imputation and Prediction of Foraging Group Size in a Kleptoparasitic Spider

Mathematics ◽

10.3390/math9040415 ◽

2021 ◽

Vol 9 (4) ◽

pp. 415

Author(s):

Yong-Chao Su ◽

Cheng-Yu Wu ◽

Cheng-Hong Yang ◽

Bo-Sheng Li ◽

Sin-Hua Moi ◽

...

Keyword(s):

Machine Learning ◽

Group Size ◽

Incomplete Data ◽

Field Data ◽

Ideal Free Distribution ◽

Rank Test ◽

Significant Feature ◽

P Value ◽

Data Imputation ◽

Spider Webs

Cost–benefit analysis is widely used to elucidate the association between foraging group size and resource size. Despite advances in the development of theoretical frameworks, however, the empirical systems used for testing are hindered by the vagaries of field surveys and incomplete data. This study developed the three approaches to data imputation based on machine learning (ML) algorithms with the aim of rescuing valuable field data. Using 163 host spider webs (132 complete data and 31 incomplete data), our results indicated that the data imputation based on random forest algorithm outperformed classification and regression trees, the k-nearest neighbor, and other conventional approaches (Wilcoxon signed-rank test and correlation difference have p-value from < 0.001–0.030). We then used rescued data based on a natural system involving kleptoparasitic spiders from Taiwan and Vietnam (Argyrodes miniaceus, Theridiidae) to test the occurrence and group size of kleptoparasites in natural populations. Our partial least-squares path modelling (PLS-PM) results demonstrated that the size of the host web (T = 6.890, p = 0.000) is a significant feature affecting group size. The resource size (T = 2.590, p = 0.010) and the microclimate (T = 3.230, p = 0.001) are significant features affecting the presence of kleptoparasites. The test of conformation of group size distribution to the ideal free distribution (IFD) model revealed that predictions pertaining to per-capita resource size were underestimated (bootstrap resampling mean slopes <IFD predicted slopes, p < 0.001). These findings highlight the importance of applying appropriate ML methods to the handling of missing field data.

Download Full-text

On a modified algorithm for the autoregressive recovery of the acoustic impedance

Geophysics ◽

10.1190/1.1441635 ◽

1984 ◽

Vol 49 (12) ◽

pp. 2190-2192 ◽

Cited By ~ 4

Author(s):

Tad. J. Ulrych ◽

Colin Walker

Keyword(s):

Acoustic Impedance ◽

Low Frequency ◽

Prediction Errors ◽

Low Frequencies ◽

Limited Frequency ◽

Band Limited ◽

Ar Modeling ◽

Ar Process ◽

Backward Prediction ◽

Zero Frequency

In a recent paper, Walker and Ulrych (1983) presented an algorithm for the recovery of the acoustic impedance from band‐limited seismic reflection records. The approach used is based on the autoregressive (AR) modeling of the band‐limited frequency transform of the data. This modeling procedure allows prediction of both the high and low missing frequencies. The low frequencies, which are particularly important in the inversion for the acoustic impedance, are determined by considering the low‐frequency band as a gap of missing data which is centered at zero frequency. The gap is filled by minimizing the sum of the squared forward and backward prediction errors which result when the known spectral data are modeled as an AR process.

Download Full-text

Self–Training With Quantile Errors for Multivariate Missing Data Imputation for Regression Problems in Electronic Medical Records: Algorithm Development Study

JMIR Public Health and Surveillance ◽

10.2196/30824 ◽

2021 ◽

Vol 7 (10) ◽

pp. e30824

Author(s):

Hansle Gwon ◽

Imjin Ahn ◽

Yunha Kim ◽

Hee Jun Kang ◽

Hyeram Seo ◽

...

Keyword(s):

Machine Learning ◽

Incomplete Data ◽

Missing Values ◽

Pearson Correlation ◽

Laboratory Data ◽

Complete Data ◽

Learning System ◽

Rank Test ◽

P Value ◽

Missing Value

Background When using machine learning in the real world, the missing value problem is the first problem encountered. Methods to impute this missing value include statistical methods such as mean, expectation-maximization, and multiple imputations by chained equations (MICE) as well as machine learning methods such as multilayer perceptron, k-nearest neighbor, and decision tree. Objective The objective of this study was to impute numeric medical data such as physical data and laboratory data. We aimed to effectively impute data using a progressive method called self-training in the medical field where training data are scarce. Methods In this paper, we propose a self-training method that gradually increases the available data. Models trained with complete data predict the missing values in incomplete data. Among the incomplete data, the data in which the missing value is validly predicted are incorporated into the complete data. Using the predicted value as the actual value is called pseudolabeling. This process is repeated until the condition is satisfied. The most important part of this process is how to evaluate the accuracy of pseudolabels. They can be evaluated by observing the effect of the pseudolabeled data on the performance of the model. Results In self-training using random forest (RF), mean squared error was up to 12% lower than pure RF, and the Pearson correlation coefficient was 0.1% higher. This difference was confirmed statistically. In the Friedman test performed on MICE and RF, self-training showed a P value between .003 and .02. A Wilcoxon signed-rank test performed on the mean imputation showed the lowest possible P value, 3.05e-5, in all situations. Conclusions Self-training showed significant results in comparing the predicted values and actual values, but it needs to be verified in an actual machine learning system. And self-training has the potential to improve performance according to the pseudolabel evaluation method, which will be the main subject of our future research.

Download Full-text

Self–Training With Quantile Errors for Multivariate Missing Data Imputation for Regression Problems in Electronic Medical Records: Algorithm Development Study (Preprint)

10.2196/preprints.30824 ◽

2021 ◽

Author(s):

Hansle Gwon ◽

Imjin Ahn ◽

Yunha Kim ◽

Hee Jun Kang ◽

Hyeram Seo ◽

...

Keyword(s):

Machine Learning ◽

Incomplete Data ◽

Missing Values ◽

Pearson Correlation ◽

Laboratory Data ◽

Complete Data ◽

Learning System ◽

Training Data ◽

Rank Test ◽

Missing Value

BACKGROUND When using machine learning in the real world, the missing value problem is the first problem encountered. Methods to impute this missing value include statistical methods such as mean, expectation-maximization, and multiple imputations by chained equations (MICE) as well as machine learning methods such as multilayer perceptron, k-nearest neighbor, and decision tree. OBJECTIVE The objective of this study was to impute numeric medical data such as physical data and laboratory data. We aimed to effectively impute data using a progressive method called self-training in the medical field where training data are scarce. METHODS In this paper, we propose a self-training method that gradually increases the available data. Models trained with complete data predict the missing values in incomplete data. Among the incomplete data, the data in which the missing value is validly predicted are incorporated into the complete data. Using the predicted value as the actual value is called pseudolabeling. This process is repeated until the condition is satisfied. The most important part of this process is how to evaluate the accuracy of pseudolabels. They can be evaluated by observing the effect of the pseudolabeled data on the performance of the model. RESULTS In self-training using random forest (RF), mean squared error was up to 12% lower than pure RF, and the Pearson correlation coefficient was 0.1% higher. This difference was confirmed statistically. In the Friedman test performed on MICE and RF, self-training showed a P value between .003 and .02. A Wilcoxon signed-rank test performed on the mean imputation showed the lowest possible P value, 3.05e-5, in all situations. CONCLUSIONS Self-training showed significant results in comparing the predicted values and actual values, but it needs to be verified in an actual machine learning system. And self-training has the potential to improve performance according to the pseudolabel evaluation method, which will be the main subject of our future research.

Download Full-text

The Low frequency Space Array (LFSA)

Symposium - International Astronomical Union ◽

10.1017/s0074180900135260 ◽

1988 ◽

Vol 129 ◽

pp. 459-460

Author(s):

K. W. Weiler ◽

B. K. Dennison ◽

K. J. Johnston ◽

R. S. Simon ◽

J. H. Spencer ◽

...

Keyword(s):

High Resolution ◽

Milky Way ◽

Low Frequency ◽

High Sensitivity ◽

Electromagnetic Spectrum ◽

Low Frequencies ◽

Frequency Space ◽

Physical Limit ◽

Path Lengths ◽

Fundamental Physical

At the lowest radio frequencies (≤30 MHz), the Earth's ionosphere transmits poorly or not at all. This relatively unexplored region of the electromagnetic spectrum is thus an area where high resolution, high sensitivity observations can open a new window for astronomical investigations. Also, extending observations down to very low frequencies brings astronomy to a fundamental physical limit where the Milky Way becomes optically thick over relatively short path lengths due to diffuse free-free absorption.

Download Full-text

USING FIELD DATA FOR THE EVALUATION OF THE IMPACT OF ULTRA-LOW FREQUENCY MOTIONS ON THE FATIGUE OF MOORING LINES

10.26678/abcm.cobem2019.cob2019-0608 ◽

2019 ◽

Author(s):

Guilherme Borzacchiello ◽

Carl Albrecht ◽

Fabricio N Correa ◽

Breno Jacob ◽

Guilherme da Silva Leal

Keyword(s):

Field Data ◽

Low Frequency ◽

Mooring Lines ◽

The Impact

Download Full-text

Application of Machine Learning Approaches for the Design and Study of Anticancer Drugs

Current Drug Targets ◽

10.2174/1389450119666180809122244 ◽

2019 ◽

Vol 20 (5) ◽

pp. 488-500 ◽

Cited By ~ 6

Author(s):

Yan Hu ◽

Yi Lu ◽

Shuo Wang ◽

Mengying Zhang ◽

Xiaosheng Qu ◽

...

Keyword(s):

Machine Learning ◽

Drug Design ◽

Anticancer Drugs ◽

Nearest Neighbor ◽

Cost Effective ◽

Support Vector ◽

Learning Approaches ◽

K Nearest Neighbor ◽

Activity Prediction ◽

Linear Discriminant

Background: Globally the number of cancer patients and deaths are continuing to increase yearly, and cancer has, therefore, become one of the world's highest causes of morbidity and mortality. In recent years, the study of anticancer drugs has become one of the most popular medical topics. Objective: In this review, in order to study the application of machine learning in predicting anticancer drugs activity, some machine learning approaches such as Linear Discriminant Analysis (LDA), Principal components analysis (PCA), Support Vector Machine (SVM), Random forest (RF), k-Nearest Neighbor (kNN), and Naïve Bayes (NB) were selected, and the examples of their applications in anticancer drugs design are listed. Results: Machine learning contributes a lot to anticancer drugs design and helps researchers by saving time and is cost effective. However, it can only be an assisting tool for drug design. Conclusion: This paper introduces the application of machine learning approaches in anticancer drug design. Many examples of success in identification and prediction in the area of anticancer drugs activity prediction are discussed, and the anticancer drugs research is still in active progress. Moreover, the merits of some web servers related to anticancer drugs are mentioned.

Download Full-text

An Experimental Campaign in Antarctica for the Calibration of Low-frequency Space-borne Radiometers

2006 IEEE International Symposium on Geoscience and Remote Sensing ◽

10.1109/igarss.2006.1021 ◽

2006 ◽

Cited By ~ 3

Author(s):

G. Macelloni ◽

M. Brogioni ◽

P. Pampaloni

Keyword(s):

Low Frequency ◽

Frequency Space ◽

Experimental Campaign

Download Full-text