Fault-Guided Seismic Stratigraphy Interpretation via Semi-Supervised Learning

Mapping Intimacies ◽

10.2118/207218-ms ◽

2021 ◽

Author(s):

Haibin Di ◽

Chakib Kada Kloucha ◽

Cen Li ◽

Aria Abubakar ◽

Zhun Li ◽

...

Keyword(s):

Machine Learning ◽

Supervised Learning ◽

Model Building ◽

Structural Information ◽

Mapping Function ◽

Seismic Stratigraphy ◽

Training Data ◽

Entire Study ◽

Depositional Process ◽

Convolutional Autoencoder

Abstract Delineating seismic stratigraphic features and depositional facies is of importance to successful reservoir mapping and identification in the subsurface. Robust seismic stratigraphy interpretation is confronted with two major challenges. The first one is to maximally automate the process particularly with the increasing size of seismic data and complexity of target stratigraphies, while the second challenge is to efficiently incorporate available structures into stratigraphy model building. Machine learning, particularly convolutional neural network (CNN), has been introduced into assisting seismic stratigraphy interpretation through supervised learning. However, the small amount of available expert labels greatly restricts the performance of such supervised CNN. Moreover, most of the exiting CNN implementations are based on only amplitude, which fails to use necessary structural information such as faults for constraining the machine learning. To resolve both challenges, this paper presents a semi-supervised learning workflow for fault-guided seismic stratigraphy interpretation, which consists of two components. The first component is seismic feature engineering (SFE), which aims at learning the provided seismic and fault data through a unsupervised convolutional autoencoder (CAE), while the second one is stratigraphy model building (SMB), which aims at building an optimal mapping function between the features extracted from the SFE CAE and the target stratigraphic labels provided by an experienced interpreter through a supervised CNN. Both components are connected by embedding the encoder of the SFE CAE into the SMB CNN, which forces the SMB learning based on these features commonly existing in the entire study area instead of those only at the limited training data; correspondingly, the risk of overfitting is greatly eliminated. More innovatively, the fault constraint is introduced by customizing the SMB CNN of two output branches, with one to match the target stratigraphies and the other to reconstruct the input fault, so that the fault continues contributing to the process of SMB learning. The performance of such fault-guided seismic stratigraphy interpretation is validated by an application to a real seismic dataset, and the machine prediction not only matches the manual interpretation accurately but also clearly illustrates the depositional process in the study area.

Download Full-text

A review: preprocessing techniques and data augmentation for sentiment analysis

Computational Social Networks ◽

10.1186/s40649-020-00080-x ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Huu-Thanh Duong ◽

Tram-Anh Nguyen-Thi

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Supervised Learning ◽

Data Augmentation ◽

Original Data ◽

Training Data ◽

Unseen Data ◽

Augmentation Techniques ◽

User Intervention

AbstractIn literature, the machine learning-based studies of sentiment analysis are usually supervised learning which must have pre-labeled datasets to be large enough in certain domains. Obviously, this task is tedious, expensive and time-consuming to build, and hard to handle unseen data. This paper has approached semi-supervised learning for Vietnamese sentiment analysis which has limited datasets. We have summarized many preprocessing techniques which were performed to clean and normalize data, negation handling, intensification handling to improve the performances. Moreover, data augmentation techniques, which generate new data from the original data to enrich training data without user intervention, have also been presented. In experiments, we have performed various aspects and obtained competitive results which may motivate the next propositions.

Download Full-text

Antibody Complementarity Determining Region Design Using High-Capacity Machine Learning

10.1101/682880 ◽

2019 ◽

Cited By ~ 1

Author(s):

Ge Liu ◽

Haoyang Zeng ◽

Jonas Mueller ◽

Brandon Carter ◽

Ziheng Wang ◽

...

Keyword(s):

Machine Learning ◽

Structural Information ◽

High Capacity ◽

Training Data ◽

Proper Function ◽

Integrative Approach ◽

Machine Learning Method ◽

Learning Method ◽

Target Specificity

AbstractThe precise targeting of antibodies and other protein therapeutics is required for their proper function and the elimination of deleterious off-target effects. Often the molecular structure of a therapeutic target is unknown and randomized methods are used to design antibodies without a model that relates antibody sequence to desired properties. Here we present a machine learning method that can design human Immunoglobulin G (IgG) antibodies with target affinities that are superior to candidates from phage display panning experiments within a limited design budget. We also demonstrate that machine learning can improve target-specificity by the modular composition of models from different experimental campaigns, enabling a new integrative approach to improving target specificity. Our results suggest a new path for the discovery of therapeutic molecules by demonstrating that predictive and differentiable models of antibody binding can be learned from high-throughput experimental data without the need for target structural data.SignificanceAntibody based therapeutics must meet both affinity and specificity metrics, and existing in vitro methods for meeting these metrics are based upon randomization and empirical testing. We demonstrate that with sufficient target-specific training data machine learning can suggest novel antibody variable domain sequences that are superior to those observed during training. Our machine learning method does not require any target structural information. We further show that data from disparate antibody campaigns can be combined by machine learning to improve antibody specificity.

Download Full-text

Using satellite imagery to understand and promote sustainable development

Science ◽

10.1126/science.abe8628 ◽

2021 ◽

Vol 371 (6535) ◽

pp. eabe8628

Author(s):

Marshall Burke ◽

Anne Driscoll ◽

David B. Lobell ◽

Stefano Ermon

Keyword(s):

Machine Learning ◽

Sustainable Development ◽

Satellite Imagery ◽

Model Building ◽

Model Performance ◽

Training Data ◽

Learning Approaches ◽

Research Directions ◽

Development Outcomes ◽

Research And Policy

Accurate and comprehensive measurements of a range of sustainable development outcomes are fundamental inputs into both research and policy. We synthesize the growing literature that uses satellite imagery to understand these outcomes, with a focus on approaches that combine imagery with machine learning. We quantify the paucity of ground data on key human-related outcomes and the growing abundance and improving resolution (spatial, temporal, and spectral) of satellite imagery. We then review recent machine learning approaches to model-building in the context of scarce and noisy training data, highlighting how this noise often leads to incorrect assessment of model performance. We quantify recent model performance across multiple sustainable development domains, discuss research and policy applications, explore constraints to future progress, and highlight research directions for the field.

Download Full-text

Noise Removal Process from Label Classification using Machine Learning

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.c3920.098319 ◽

2019 ◽

Vol 8 (3) ◽

pp. 172-175

Keyword(s):

Machine Learning ◽

Big Data ◽

Supervised Learning ◽

Noise Removal ◽

Error Rates ◽

Training Data ◽

Learning Performance ◽

Training Dataset ◽

Noise Filtering ◽

Label Noise

Text classification and clustering approach is essential for big data environments. In supervised learning applications many classification algorithms have been proposed. In the era of big data, a large volume of training data is available in many machine learning works. However, there is a possibility of mislabeled or unlabeled data that are not labeled properly. Some labels may be incorrect resulted in label noise which in turn regress learning performance of a classifier. A general approach to address label noise is to apply noise filtering techniques to identify and remove noise before learning. A range of noise filtering approaches have been developed to improve the classifiers performance. This paper proposes noise filtering approach in text data during the training phase. Many supervised learning algorithms generates high error rates due to noise in training dataset, our work eliminates such noise and provides accurate classification system.

Download Full-text

Adaptive Context-Aware Energy Optimization for Services on Mobile Devices with Use of Machine Learning

Wireless Personal Communications ◽

10.1007/s11277-020-07657-9 ◽

2020 ◽

Vol 115 (3) ◽

pp. 1839-1867

Author(s):

Piotr Nawrocki ◽

Bartlomiej Sniezynski

Keyword(s):

Machine Learning ◽

Supervised Learning ◽

Mobile Devices ◽

Mobile Device ◽

Learning Algorithm ◽

Service Selection ◽

Mobile Systems ◽

Training Data ◽

Learning Mechanisms ◽

Network Connection

AbstractIn this paper we present an original adaptive task scheduling system, which optimizes the energy consumption of mobile devices using machine learning mechanisms and context information. The system learns how to allocate resources appropriately: how to schedule services/tasks optimally between the device and the cloud, which is especially important in mobile systems. Decisions are made taking the context into account (e.g. network connection type, location, potential time and cost of executing the application or service). In this study, a supervised learning agent architecture and service selection algorithm are proposed to solve this problem. Adaptation is performed online, on a mobile device. Information about the context, task description, the decision made and its results such as power consumption are stored and constitute training data for a supervised learning algorithm, which updates the knowledge used to determine the optimal location for the execution of a given type of task. To verify the solution proposed, appropriate software has been developed and a series of experiments have been conducted. Results show that as a result of the experience gathered and the learning process performed, the decision module has become more efficient in assigning the task to either the mobile device or cloud resources.

Download Full-text

Comparative Analysis of Machine Learning Models for Day-Ahead Photovoltaic Power Production Forecasting

Energies ◽

10.3390/en14041081 ◽

2021 ◽

Vol 14 (4) ◽

pp. 1081

Author(s):

Spyros Theocharides ◽

Marios Theristis ◽

George Makrides ◽

Marios Kynigos ◽

Chrysovalantis Spanias ◽

...

Keyword(s):

Machine Learning ◽

Supervised Learning ◽

Regression Tree ◽

Training Data ◽

Support Vector ◽

Learning Models ◽

Bayesian Neural Network ◽

Production Forecasting ◽

Main Challenge ◽

Machine Learning Models

A main challenge for integrating the intermittent photovoltaic (PV) power generation remains the accuracy of day-ahead forecasts and the establishment of robust performing methods. The purpose of this work is to address these technological challenges by evaluating the day-ahead PV production forecasting performance of different machine learning models under different supervised learning regimes and minimal input features. Specifically, the day-ahead forecasting capability of Bayesian neural network (BNN), support vector regression (SVR), and regression tree (RT) models was investigated by employing the same dataset for training and performance verification, thus enabling a valid comparison. The training regime analysis demonstrated that the performance of the investigated models was strongly dependent on the timeframe of the train set, training data sequence, and application of irradiance condition filters. Furthermore, accurate results were obtained utilizing only the measured power output and other calculated parameters for training. Consequently, useful information is provided for establishing a robust day-ahead forecasting methodology that utilizes calculated input parameters and an optimal supervised learning approach. Finally, the obtained results demonstrated that the optimally constructed BNN outperformed all other machine learning models achieving forecasting accuracies lower than 5%.

Download Full-text

GIS-BASED RAPID EARTHQUAKE EXPOSURE AND VULNERABILITY MAPPING USING LIDAR DEM AND MACHINE LEARNING ALGORITHMS: CASE OF PORAC, PAMPANGA

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xlvi-4-w6-2021-125-2021 ◽

2021 ◽

Vol XLVI-4/W6-2021 ◽

pp. 125-132

Author(s):

M. J. D. De Los Santos ◽

J. A. Principe

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Seismic Vulnerability ◽

Model Building ◽

Data Access ◽

Machine Learning Algorithms ◽

Training Data ◽

Vulnerability Mapping ◽

Support Vector ◽

Surface Model

Abstract. Disaster risk reduction and management (DRRM) not only requires a thorough understanding of hazards but also knowledge of how much built-up structures are exposed and vulnerable to a specific hazard. This study proposed a rapid earthquake exposure and vulnerability mapping methodology using the municipality of Porac, Pampanaga as a case study. To address the challenges and limitations of data access and availability in DRRM operations, this study utilized Light Detection and Ranging (LiDAR) data and machine learning (ML) algorithms to produce an exposure database and conduct vulnerability estimation in the study area. Buildings were delineated through image thresholding and classification of the normalized Digital Surface Model (nDSM) and an exposure database containing building attributes was created using Geographic Information System (GIS). ML algorithms such as Support Vector Machine (SVM), logistic regression, and Random Forest (RF) were then used to predict the model building type (MBT) of delineated buildings to estimate seismic vulnerability. Results showed that the SVM model yielded the lowest accuracy (53%) while logistic regression and RF models performed fairly (72% and 78% respectively) as indicated by their F-1 scores. To improve the accuracy of the exposure database and vulnerability estimation, this study recommends that the proposed building delineation process be further refined by experimenting with more appropriate thresholds or by conducting point cloud classification instead of pixel-based image classification. Moreover, ground truth MBT samples should be used as training data for MBT prediction. For future work, the methodology proposed in this study can be implemented when conducting earthquake damage assessments.

Download Full-text

Seismic stratigraphy interpretation by deep convolutional neural networks: A semisupervised workflow

Geophysics ◽

10.1190/geo2019-0433.1 ◽

2020 ◽

Vol 85 (4) ◽

pp. WA77-WA86 ◽

Cited By ~ 3

Author(s):

Haibin Di ◽

Zhun Li ◽

Hiren Maniar ◽

Aria Abubakar

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Seismic Data ◽

Network Architecture ◽

Domain Knowledge ◽

Model Building ◽

Seismic Stratigraphy ◽

Training Data ◽

Data Sets ◽

Deep Convolutional Neural Networks

Depicting geologic sequences from 3D seismic surveying is of significant value to subsurface reservoir exploration, but it is usually time- and labor-intensive for manual interpretation by experienced seismic interpreters. We have developed a semisupervised workflow for efficient seismic stratigraphy interpretation by using the state-of-the-art deep convolutional neural networks (CNNs). Specifically, the workflow consists of two components: (1) seismic feature self-learning (SFSL) and (2) stratigraphy model building (SMB), each of which is formulated as a deep CNN. Whereas the SMB is supervised by knowledge from domain experts and the associated CNN uses a similar network architecture typically used in image segmentation, the SFSL is designed as an unsupervised process and thus can be performed backstage while an expert prepares the training labels for the SMB CNN. Compared with conventional approaches, the our workflow is superior in two aspects. First, the SMB CNN, initialized by the SFSL CNN, successfully inherits the prior knowledge of the seismic features in the target seismic data. Therefore, it becomes feasible for completing the supervised training of the SMB CNN more efficiently using only a small amount of training data, for example, less than 0.1% of the available seismic data as demonstrated in this paper. Second, for the convenience of seismic experts in translating their domain knowledge into training labels, our workflow is designed to be applicable to three scenarios, trace-wise, paintbrushing, and full-sectional annotation. The performance of the new workflow is well-verified through application to three real seismic data sets. We conclude that the new workflow is not only capable of providing robust stratigraphy interpretation for a given seismic volume, but it also holds great potential for other problems in seismic data analysis.

Download Full-text

Business environmental analysis for textual data using data mining and sentence-level classification

Industrial Management & Data Systems ◽

10.1108/imds-07-2017-0317 ◽

2019 ◽

Vol 119 (1) ◽

pp. 69-88 ◽

Cited By ~ 5

Author(s):

Yoon-Sung Kim ◽

Hae-Chang Rim ◽

Do-Gil Lee

Keyword(s):

Machine Learning ◽

Supervised Learning ◽

Environmental Analysis ◽

Environmental Variables ◽

Classification Systems ◽

Training Data ◽

Semantic Features ◽

Content Type ◽

Textual Data ◽

Traditional Classification

Purpose The purpose of this paper is to propose a methodology to analyze a large amount of unstructured textual data into categories of business environmental analysis frameworks. Design/methodology/approach This paper uses machine learning to classify a vast amount of unstructured textual data by category of business environmental analysis framework. Generally, it is difficult to produce high quality and massive training data for machine-learning-based system in terms of cost. Semi-supervised learning techniques are used to improve the classification performance. Additionally, the lack of feature problem that traditional classification systems have suffered is resolved by applying semantic features by utilizing word embedding, a new technique in text mining. Findings The proposed methodology can be used for various business environmental analyses and the system is fully automated in both the training and classifying phases. Semi-supervised learning can solve the problems with insufficient training data. The proposed semantic features can be helpful for improving traditional classification systems. Research limitations/implications This paper focuses on classifying sentences that contain the information of business environmental analysis in large amount of documents. However, the proposed methodology has a limitation on the advanced analyses which can directly help managers establish strategies, since it does not summarize the environmental variables that are implied in the classified sentences. Using the advanced summarization and recommendation techniques could extract the environmental variables among the sentences, and they can assist managers to establish effective strategies. Originality/value The feature selection technique developed in this paper has not been used in traditional systems for business and industry, so that the whole process can be fully automated. It also demonstrates practicality so that it can be applied to various business environmental analysis frameworks. In addition, the system is more economical than traditional systems because of semi-supervised learning, and can resolve the lack of feature problem that traditional systems suffer. This work is valuable for analyzing environmental factors and establishing strategies for companies.

Download Full-text

Generating Video From Images using GAN and CVAE

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.e6425.018520 ◽

2020 ◽

Vol 8 (5) ◽

pp. 1401-1404

Keyword(s):

Machine Learning ◽

Supervised Learning ◽

Generative Models ◽

Generative Model ◽

Learning Systems ◽

Training Data ◽

Variational Autoencoder ◽

Discriminative Model ◽

To Come

In a given scene, people can often easily predict a lot of quick future occasions that may occur. However generalized pixel-level expectation in Machine Learning systems is difficult in light of the fact that it struggles with the ambiguity inherent in predicting what's to come. However, the objective of the paper is to concentrate on predicting the dense direction of pixels in a scene — what will move in the scene, where it will travel, and how it will deform through the span of one second for which we propose a conditional variational autoencoder as a solution for this issue. We likewise propose another structure for assessing generative models through an adversarial procedure, wherein we simultaneously train two models, a generative model G that catches the information appropriation, and a discriminative model D that gauges the likelihood that an example originated from the training data instead of G. We focus on two uses of GANs semi-supervised learning, and the age of pictures that human's find visually realistic. We present the Moments in Time Dataset, an enormous scale human-clarified assortment of one million short recordings relating to dynamic situations unfolding within three seconds.

Download Full-text