Inpainting radar missing data regions with deep learning

Abstract. Missing and low-quality data regions are a frequent problem for weather radars. They stem from a variety of sources: beam blockage, instrument failure, near-ground blind zones, and many others. Filling in missing data regions is often useful for estimating local atmospheric properties and the application of high-level data processing schemes without the need for preprocessing and error-handling steps – feature detection and tracking, for instance. Interpolation schemes are typically used for this task, though they tend to produce unrealistically spatially smoothed results that are not representative of the atmospheric turbulence and variability that are usually resolved by weather radars. Recently, generative adversarial networks (GANs) have achieved impressive results in the area of photo inpainting. Here, they are demonstrated as a tool for infilling radar missing data regions. These neural networks are capable of extending large-scale cloud and precipitation features that border missing data regions into the regions while hallucinating plausible small-scale variability. In other words, they can inpaint missing data with accurate large-scale features and plausible local small-scale features. This method is demonstrated on a scanning C-band and vertically pointing Ka-band radar that were deployed as part of the Cloud Aerosol and Complex Terrain Interactions (CACTI) field campaign. Three missing data scenarios are explored: infilling low-level blind zones and short outage periods for the Ka-band radar and infilling beam blockage areas for the C-band radar. Two deep-learning-based approaches are tested, a convolutional neural network (CNN) and a GAN that optimize pixel-level error or combined pixel-level error and adversarial loss respectively. Both deep-learning approaches significantly outperform traditional inpainting schemes under several pixel-level and perceptual quality metrics.
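As a point of reference for the interpolation baseline mentioned in the abstract, the following is a minimal sketch of filling a short outage in a time-height radar field by linear interpolation along time. The function name `interpolate_outage` and the array layout are assumptions made for illustration; the abstract does not specify which interpolation schemes were compared.

```python
import numpy as np

def interpolate_outage(field, missing_cols):
    """Linearly interpolate across missing time columns, one height level at a time.

    field: array of shape (n_heights, n_times); values in missing columns are ignored.
    missing_cols: boolean array of shape (n_times,), True where data are missing.
    """
    times = np.arange(field.shape[1])
    known = ~missing_cols
    filled = field.astype(float).copy()
    for level in range(field.shape[0]):
        # np.interp draws a straight line between the nearest known samples,
        # so any small-scale variability inside the gap is lost.
        filled[level, missing_cols] = np.interp(
            times[missing_cols], times[known], field[level, known]
        )
    return filled
```

The straight-line fill illustrates why such schemes look spatially smooth: nothing inside the gap can vary faster than the values at its edges.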

Self-Supervised Pre-Training of Transformers for Satellite Image Time Series Classification

10.36227/techrxiv.13025039.v1 ◽

2020 ◽

Author(s):

Yuan Yuan ◽

Lei Lin

Keyword(s):

Time Series ◽

Deep Learning ◽

Large Scale ◽

Temporal Structure ◽

Satellite Image ◽

Fine Tuning ◽

Small Scale ◽

Model Parameters ◽

Learning Approaches ◽

Wide Range

Satellite image time series (SITS) classification is a major research topic in remote sensing and is relevant for a wide range of applications. Deep learning approaches have been commonly employed for SITS classification and have provided state-of-the-art performance. However, deep learning methods suffer from overfitting when labeled data is scarce. To address this problem, we propose a novel self-supervised pre-training scheme to initialize a Transformer-based network by utilizing large-scale unlabeled data. In detail, the model is asked to predict randomly contaminated observations given an entire time series of a pixel. The main idea of our proposal is to leverage the inherent temporal structure of satellite time series to learn general-purpose spectral-temporal representations related to land cover semantics. Once pre-training is completed, the pre-trained network can be further adapted to various SITS classification tasks by fine-tuning all the model parameters on small-scale task-related labeled data. In this way, the general knowledge and representations about SITS can be transferred to a label-scarce task, thereby improving the generalization performance of the model as well as reducing the risk of overfitting. Comprehensive experiments have been carried out on three benchmark datasets over large study areas. Experimental results demonstrate the effectiveness of the proposed method, with classification accuracy gains ranging from 1.91% to 6.69%. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
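The pretext task described above (predicting randomly contaminated observations from a pixel's full time series) can be sketched as follows. This is only an illustration under stated assumptions: the contamination rate, the uniform noise model, and the names `contaminate` and `masked_mse` are hypothetical and are not taken from the abstract, and the Transformer itself is omitted.

```python
import numpy as np

def contaminate(series, rate=0.15, noise_scale=0.5, rng=None):
    """Add noise to a random subset of time steps of one pixel's series.

    series: array of shape (T, B) with T acquisition dates and B spectral bands.
    Returns the contaminated copy and a boolean mask of shape (T,) marking
    the altered time steps. The rate and noise model are assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(series.shape[0]) < rate
    noisy = series.astype(float).copy()
    noise = rng.uniform(-noise_scale, noise_scale,
                        size=(int(mask.sum()), series.shape[1]))
    noisy[mask] += noise
    return noisy, mask

def masked_mse(prediction, target, mask):
    """Mean squared error restricted to the contaminated time steps."""
    if not mask.any():
        return 0.0
    return float(np.mean((prediction[mask] - target[mask]) ** 2))
```

In a training loop, a network would receive `noisy` and be scored with `masked_mse` against the clean `series`, so that no labels are needed during pre-training.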

Progressive System: A Deep-Learning Framework for Real-Time Data in Industrial Production

Processes ◽

10.3390/pr8060649 ◽

2020 ◽

Vol 8 (6) ◽

pp. 649

Author(s):

Yifeng Liu ◽

Wei Zhang ◽

Wenhao Du

Keyword(s):

Deep Learning ◽

Real Time ◽

Large Scale ◽

Quality Data ◽

Time Data ◽

High Quality ◽

Real Time System ◽

High Quality Data ◽

Learning Framework ◽

Data Accumulation

Deep learning based on large amounts of high-quality data plays an important role in many industries. However, deep learning is hard to embed directly in a real-time system, because such a system accumulates data only through real-time acquisition, while its analysis tasks must also be carried out in real time and therefore cannot wait for data to accumulate over a long period. To address the problems of high-quality data accumulation, the timeliness of data analysis, and the difficulty of embedding deep-learning algorithms directly in real-time systems, this paper proposes a new progressive deep-learning framework and conducts experiments on image recognition. The experimental results show that the proposed framework is effective, performs well, and reaches conclusions similar to those of a deep-learning framework based on large-scale data.

Deep learning for intensity mapping observations: component extraction

Monthly Notices of the Royal Astronomical Society Letters ◽

10.1093/mnrasl/slaa088 ◽

2020 ◽

Vol 496 (1) ◽

pp. L54-L58 ◽

Cited By ~ 2

Author(s):

Kana Moriwaki ◽

Nina Filippova ◽

Masato Shirasaki ◽

Naoki Yoshida

Keyword(s):

Deep Learning ◽

Galaxy Formation ◽

Large Scale ◽

Generative Adversarial Networks ◽

Intensity Mapping ◽

Adversarial Networks ◽

Galaxy Formation And Evolution ◽

Formation And Evolution ◽

Emission Line Galaxies ◽

Two Populations

Line intensity mapping (LIM) is an emerging observational method to study the large-scale structure of the Universe and its evolution. LIM does not resolve individual sources but probes the fluctuations of integrated line emissions. A serious limitation with LIM is that contributions of different emission lines from sources at different redshifts are all confused at an observed wavelength. We propose a deep learning application to solve this problem. We use conditional generative adversarial networks to extract designated information from LIM. We consider a simple case with two populations of emission-line galaxies; Hα emitting galaxies at z = 1.3 are confused with [O III] emitters at z = 2.0 in a single observed waveband at 1.5 μm. Our networks trained with 30 000 mock observation maps are able to extract the total intensity and the spatial distribution of Hα emitting galaxies at z = 1.3. The intensity peaks are successfully located with 74 per cent precision. The precision increases to 91 per cent when we combine five networks. The mean intensity and the power spectrum are reconstructed with an accuracy of ∼10 per cent. The extracted galaxy distributions at a wider range of redshift can be used for studies on cosmology and on galaxy formation and evolution.

Machine learning based imputation techniques for estimating phylogenetic trees from incomplete distance matrices

10.1101/744789 ◽

2019 ◽

Author(s):

Ananya Bhattacharjee ◽

Md. Shamsuzzoha Bayzid

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Missing Data ◽

Phylogenetic Trees ◽

Large Scale ◽

Missing Values ◽

Gene Tree ◽

Estimation Methods ◽

Learning Technique ◽

Distance Matrices

Background: Due to recent advances in sequencing technologies and species tree estimation methods capable of taking gene tree discordance into account, notable progress has been achieved in constructing large-scale phylogenetic trees from genome-wide data. However, substantial challenges remain in leveraging this huge amount of molecular data. One of the foremost among these challenges is the need for efficient tools that can handle missing data. Popular distance-based methods such as neighbor joining and UPGMA require that the input distance matrix does not contain any missing values. Results: We introduce two highly accurate machine learning based distance imputation techniques. One of our approaches is based on matrix factorization, and the other one is an autoencoder based deep learning technique. We evaluate these two techniques on a collection of simulated and biological datasets, and show that our techniques match or improve upon the best alternate techniques for distance imputation. Moreover, our proposed techniques can handle substantial amounts of missing data, to the extent where the best alternate methods fail. Conclusions: This study shows for the first time the power and feasibility of applying deep learning techniques for imputing distance matrices. The autoencoder based deep learning technique is highly accurate and scalable to large datasets. We have made these techniques freely available as cross-platform software (available at https://github.com/Ananya-Bhattacharjee/ImputeDistances).
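To make the idea of matrix-factorization imputation concrete, here is a minimal sketch that fills missing entries of a symmetric distance matrix with a low-rank fit. It is a generic ridge-regularized alternating least squares routine written for illustration, not the authors' implementation; the function name `impute_distances` and all parameter defaults are assumptions.

```python
import numpy as np

def impute_distances(D, rank=2, sweeps=50, ridge=0.1, seed=0):
    """Fill NaN entries of a symmetric distance matrix with a low-rank fit.

    Fits D ~ U @ V.T on the observed entries only, then keeps every observed
    value and fills the missing ones from the symmetrized approximation.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    observed = ~np.isnan(D)
    rng = np.random.default_rng(seed)
    U = rng.normal(size=(n, rank))
    V = rng.normal(size=(n, rank))
    reg = ridge * np.eye(rank)  # keeps every least-squares solve well posed
    for _ in range(sweeps):
        for i in range(n):  # update row factors with V held fixed
            cols = observed[i]
            U[i] = np.linalg.solve(V[cols].T @ V[cols] + reg,
                                   V[cols].T @ D[i, cols])
        for j in range(n):  # update column factors with U held fixed
            rows = observed[:, j]
            V[j] = np.linalg.solve(U[rows].T @ U[rows] + reg,
                                   U[rows].T @ D[rows, j])
    approx = U @ V.T
    approx = (approx + approx.T) / 2          # enforce symmetry
    filled = np.where(observed, D, np.maximum(approx, 0.0))
    np.fill_diagonal(filled, 0.0)             # self-distance is zero
    return filled
```

The completed matrix can then be passed to a distance-based method such as neighbor joining, which requires that no entries are missing.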

Deep learning with feature embedding for compound-protein interaction prediction

10.1101/086033 ◽

2016 ◽

Cited By ~ 16

Author(s):

Fangping Wan ◽

Jianyang (Michael) Zeng

Keyword(s):

Deep Learning ◽

Protein Interaction ◽

Protein Interactions ◽

Large Scale ◽

Computational Models ◽

Drug Repositioning ◽

Representation Learning ◽

Small Scale ◽

Interaction Prediction ◽

Protein Interaction Prediction

Accurately identifying compound-protein interactions in silico can deepen our understanding of the mechanisms of drug action and significantly facilitate the drug discovery and development process. Traditional similarity-based computational models for compound-protein interaction prediction rarely exploit the latent features from currently available large-scale unlabelled compound and protein data, and their use is often limited to relatively small-scale datasets. We propose a new scheme that combines feature embedding (a technique of representation learning) with deep learning for predicting compound-protein interactions. Our method automatically learns low-dimensional, implicit but expressive features for compounds and proteins from the massive amount of unlabelled data. Combining effective feature embedding with powerful deep learning techniques, our method provides a general computational pipeline for accurate compound-protein interaction prediction, even when the interaction knowledge of compounds and proteins is entirely unknown. Evaluations on current large-scale databases of measured compound-protein affinities, such as ChEMBL and BindingDB, as well as known drug-target interactions from DrugBank, have demonstrated the superior prediction performance of our method and suggested that it can offer a useful tool for drug development and drug repositioning.

Self-Supervised Pre-Training of Transformers for Satellite Image Time Series Classification

10.36227/techrxiv.13025039.v3 ◽

2020 ◽

Author(s):

Yuan Yuan ◽

Lei Lin

Keyword(s):

Time Series ◽

Deep Learning ◽

Large Scale ◽

Temporal Structure ◽

Satellite Image ◽

Fine Tuning ◽

Small Scale ◽

Model Parameters ◽

Learning Approaches ◽

Wide Range

Satellite image time series (SITS) classification is a major research topic in remote sensing and is relevant for a wide range of applications. Deep learning approaches have been commonly employed for SITS classification and have provided state-of-the-art performance. However, deep learning methods suffer from overfitting when labeled data is scarce. To address this problem, we propose a novel self-supervised pre-training scheme to initialize a Transformer-based network by utilizing large-scale unlabeled data. In detail, the model is asked to predict randomly contaminated observations given an entire time series of a pixel. The main idea of our proposal is to leverage the inherent temporal structure of satellite time series to learn general-purpose spectral-temporal representations related to land cover semantics. Once pre-training is completed, the pre-trained network can be further adapted to various SITS classification tasks by fine-tuning all the model parameters on small-scale task-related labeled data. In this way, the general knowledge and representations about SITS can be transferred to a label-scarce task, thereby improving the generalization performance of the model as well as reducing the risk of overfitting. Comprehensive experiments have been carried out on three benchmark datasets over large study areas. Experimental results demonstrate the effectiveness of the proposed method, with classification accuracy gains ranging from 2.38% to 5.27%. The code and the pre-trained model will be available at https://github.com/linlei1214/SITS-BERT upon publication. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

Overcoming challenges to data quality in the ASPREE clinical trial

Trials ◽

10.1186/s13063-019-3789-2 ◽

2019 ◽

Vol 20 (1) ◽

Author(s):

Jessica E. Lockery ◽

Taya A. Collyer ◽

Christopher M. Reid ◽

Michael E. Ernst ◽

...

Keyword(s):

Clinical Trials ◽

Missing Data ◽

Data Quality ◽

Large Scale ◽

Controlled Trial ◽

Data Entry ◽

Quality Data ◽

Community Based ◽

Data Set ◽

The Impact

Background: Large-scale studies risk generating inaccurate and missing data due to the complexity of data collection. Technology has the potential to improve data quality by providing operational support to data collectors. However, this potential is under-explored in community-based trials. The Aspirin in reducing events in the elderly (ASPREE) trial developed a data suite that was specifically designed to support data collectors: the ASPREE Web Accessible Relational Database (AWARD). This paper describes AWARD and the impact of system design on data quality. Methods: AWARD’s operational requirements, conceptual design, key challenges and design solutions for data quality are presented. Impact of design features is assessed through comparison of baseline data collected prior to implementation of key functionality (n = 1000) with data collected post implementation (n = 18,114). Overall data quality is assessed according to data category. Results: At baseline, implementation of user-driven functionality reduced staff error (from 0.3% to 0.01%), out-of-range data entry (from 0.14% to 0.04%) and protocol deviations (from 0.4% to 0.08%). In the longitudinal data set, which contained more than 39 million data values collected within AWARD, 96.6% of data values were entered within specified query range or found to be accurate upon querying. The remaining data were missing (3.4%). Participant non-attendance at scheduled study activity was the most common cause of missing data. Costs associated with cleaning data in ASPREE were lower than expected compared with reports from other trials. Conclusions: Clinical trials undertake complex operational activity in order to collect data, but technology rarely provides sufficient support. We find the AWARD suite provides proof of principle that designing technology to support data collectors can mitigate known causes of poor data quality and produce higher-quality data. Health information technology (IT) products that support the conduct of scheduled activity in addition to traditional data entry will enhance community-based clinical trials. A standardised framework for reporting data quality would aid comparisons across clinical trials. Trial registration: International Standard Randomized Controlled Trial Number Register, ISRCTN83772183. Registered on 3 March 2005.

LSUN-Stanford Car Dataset: Enhancing Large-Scale Car Image Datasets Using Deep Learning for Usage in GAN Training

Applied Sciences ◽

10.3390/app10144913 ◽

2020 ◽

Vol 10 (14) ◽

pp. 4913

Author(s):

Tin Kramberger ◽

Božidar Potočnik

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Large Scale ◽

Generative Adversarial Networks ◽

Adversarial Networks ◽

Image Dataset ◽

Image Datasets

Currently, there is no adequate publicly available dataset for training Generative Adversarial Networks (GANs) on car images. All available car datasets differ in noise, pose, and zoom levels. Thus, the objective of this work was to create an improved car image dataset better suited for GAN training. To improve the performance of the GAN, we combined the LSUN and Stanford car datasets. The merged dataset was then pruned in order to adjust zoom levels and reduce image noise. This process left fewer images for training, but of higher quality. The pruned dataset was evaluated by training StyleGAN with its original settings. Pruning the combined LSUN and Stanford datasets resulted in 2,067,710 images of cars with less noise and better-adjusted zoom levels. Training StyleGAN on the LSUN-Stanford car dataset proved superior to training on the LSUN dataset alone by 3.7%, using the Fréchet Inception Distance (FID) as the metric. The results indicate that the proposed LSUN-Stanford car dataset is more consistent and better suited for training GANs than other currently available large car datasets.

Automatic Extraction of Seismic Landslides in Large Areas with Complex Environments Based on Deep Learning: An Example of the 2018 Iburi Earthquake, Japan

Remote Sensing ◽

10.3390/rs12233992 ◽

2020 ◽

Vol 12 (23) ◽

pp. 3992

Author(s):

Pengfei Zhang ◽

Chong Xu ◽

Siyuan Ma ◽

Xiaoyi Shao ◽

Yingying Tian ◽

...

Keyword(s):

Deep Learning ◽

Large Scale ◽

Rapid Identification ◽

Small Scale ◽

Automatic Extraction ◽

Complex Environments ◽

Loss Assessment ◽

Affected Area ◽

Emergency Rescue ◽

Seismic Landslides

After a major earthquake, the rapid identification and mapping of co-seismic landslides across the whole affected area is of great significance for emergency rescue and loss assessment of seismic hazards. In recent years, researchers have achieved good results on this problem at small scales and in settings with a single type of environment. However, for a whole earthquake-affected area that is large in scale and has complex environments, the accuracy of co-seismic landslide extraction remains low, and there is no ideal method to solve this problem. In this paper, Planet satellite images with a spatial resolution of 3 m are used to train a seismic landslide recognition model based on deep learning, in order to carry out rapid and automatic extraction of landslides triggered by the 2018 Iburi earthquake, Japan. The study area is about 671.87 km², of which 60% is used to train the model and the remaining 40% is used to verify its accuracy. The results show that most of the co-seismic landslides can be identified by this method. In this experiment, the verification precision of the model is 0.7965 and the F1 score is 0.8288. This method can automatically identify and map landslides triggered by earthquakes from Planet images. It is practical and accurate, and it can assist earthquake emergency rescue and rapid disaster assessment.
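The abstract reports precision and F1 but not recall. Under the standard definition F1 = 2PR / (P + R), and assuming both figures come from the same validation set, the implied recall can be recovered with a short calculation. The helper name `recall_from_precision_f1` is hypothetical, and the result is a derived value rather than one reported by the authors.

```python
def recall_from_precision_f1(precision, f1):
    """Solve F1 = 2PR / (P + R) for recall R, giving R = F1 * P / (2P - F1)."""
    denominator = 2 * precision - f1
    if denominator <= 0:
        raise ValueError("no valid recall exists for this precision/F1 pair")
    return f1 * precision / denominator

# Precision 0.7965 and F1 0.8288 imply a recall of roughly 0.864.
implied_recall = recall_from_precision_f1(0.7965, 0.8288)
```

A recall higher than the precision would mean the model misses relatively few landslides but includes some false detections, which is consistent with the statement that most co-seismic landslides are identified.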

The applications of deep neural networks to sdBV classification

Open Astronomy ◽

10.1515/astro-d-17-0450 ◽

2017 ◽

Vol 26 (1) ◽

Author(s):

Thomas M. Boudreaux

Keyword(s):

Neural Network ◽

Deep Learning ◽

Large Scale ◽

Feature Detection ◽

Synthetic Data ◽

Acoustic Mode ◽

Training Data ◽

Accurate Analysis ◽

Pulsating Stars ◽

High Speeds

With several new large-scale surveys on the horizon, including LSST, TESS, ZTF, and Evryscope, faster and more accurate analysis methods will be required to adequately process the enormous amount of data produced. Deep learning, used in industry for years now, allows for advanced feature detection in minimally prepared datasets at very high speeds; however, despite the advantages of this method, its application to astrophysics has not yet been extensively explored. This dearth may be due to a lack of training data available to researchers. Here we generate synthetic data loosely mimicking the properties of acoustic mode pulsating stars and show that two separate paradigms of deep learning, the Artificial Neural Network and the Convolutional Neural Network, can both be used to classify this synthetic data effectively, and that this classification can be performed at relatively high levels of accuracy with minimal time spent adjusting network hyperparameters.
