Semi-supervised Wafer Map Pattern Recognition using Domain-Specific Data Augmentation and Contrastive Learning

Author(s):  
Hanbin Hu ◽  
Chen He ◽  
Peng Li


2019 ◽
Vol 49 (6) ◽  
pp. 1676-1683 ◽  
Author(s):  
Michael Gadermayr ◽  
Kexin Li ◽  
Madlaine Müller ◽  
Daniel Truhn ◽  
Nils Krämer ◽  
...  

2020 ◽  
Author(s):  
Dean Sumner ◽  
Jiazhen He ◽  
Amol Thakkar ◽  
Ola Engkvist ◽  
Esben Jannik Bjerrum

SMILES randomization, a form of data augmentation, has previously been shown to increase the performance of deep learning models over non-augmented baselines. Here, we propose a novel data augmentation method we call “Levenshtein augmentation”, which considers local SMILES sub-sequence similarity between reactants and their respective products when creating training pairs. The performance of Levenshtein augmentation was tested using two state-of-the-art models: a transformer and a sequence-to-sequence recurrent neural network with attention. Levenshtein augmentation yielded increased performance over both non-augmented data and conventional SMILES-randomization augmentation when used to train the baseline models. Furthermore, Levenshtein augmentation seemingly results in what we define as “attentional gain”: an enhancement of the underlying network’s ability to recognize molecular motifs.
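
The abstract leaves the pairing criterion at a high level; one plausible reading is to generate several randomized SMILES for a reactant and keep the variant with the smallest Levenshtein (edit) distance to the product SMILES, so that training pairs share local sub-sequences. Below is a minimal Python sketch of that reading, assuming RDKit for SMILES randomization; the selection rule is illustrative, not the authors' exact procedure.

```python
# Hypothetical sketch of Levenshtein-guided SMILES pair augmentation.
# Assumes RDKit; the "keep the closest randomized variant" criterion is an
# illustrative reading of the abstract, not the authors' exact procedure.
from rdkit import Chem

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def randomized_smiles(smiles: str, n: int = 20) -> list[str]:
    """Generate n random (non-canonical) SMILES spellings of a molecule."""
    mol = Chem.MolFromSmiles(smiles)
    return [Chem.MolToSmiles(mol, canonical=False, doRandom=True)
            for _ in range(n)]

def levenshtein_augment(reactant: str, product: str, n: int = 20) -> str:
    """Pick the randomized reactant SMILES closest to the product SMILES."""
    variants = randomized_smiles(reactant, n)
    return min(variants, key=lambda s: levenshtein(s, product))

# Example: toluene -> benzoic acid (illustrative reactant/product pair)
pair = levenshtein_augment("Cc1ccccc1", "OC(=O)c1ccccc1")
```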


2017 ◽  
Vol 93 (4) ◽  
pp. 177-202 ◽  
Author(s):  
Emily E. Griffith

ABSTRACT: Auditors are more likely to identify misstatements in complex estimates if they recognize problematic patterns among an estimate's underlying assumptions. Rich problem representations aid pattern recognition, but auditors likely have difficulty developing them given auditors' limited domain-specific expertise in this area. In two experiments, I predict and find that a relational cue in a specialist's work highlighting aggressive assumptions improves auditors' problem representations and subsequent judgments about estimates. However, this improvement only occurs when a situational factor (e.g., risk) increases auditors' epistemic motivation to incorporate the cue into their problem representations. These results suggest that auditors do not always respond to cues in specialists' work. More generally, this study highlights the role of situational factors in increasing auditors' epistemic motivation to develop rich problem representations, which contribute to high-quality audit judgments in this and other domains where pattern recognition is important.


2020 ◽  
Author(s):  
Geoffrey Schau ◽  
Erik Burlingame ◽  
Young Hwan Chang

ABSTRACT: Deep learning systems have emerged as powerful mechanisms for learning domain translation models. However, in many cases, complete information in one domain is assumed to be necessary for sufficient cross-domain prediction. In this work, we motivate a formal justification for domain-specific information separation in a simple linear case and illustrate that a self-supervised approach enables domain translation between data domains while filtering out domain-specific data features. We introduce a novel approach to identify domain-specific information from sets of unpaired measurements in complementary data domains by considering a deep learning cross-domain autoencoder architecture designed to learn shared latent representations of data while enabling domain translation. We introduce an orthogonal gate block designed to enforce orthogonality of input feature sets by explicitly removing non-sharable information specific to each domain, and illustrate separability of domain-specific information on a toy dataset.
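
The orthogonal gate block is described only at a high level. A common way to enforce such orthogonality between shared and domain-specific codes is a squared-Frobenius penalty on their cross-correlation, as in domain separation networks; the PyTorch sketch below illustrates that assumption. The module and loss names are ours, not the paper's.

```python
# Minimal PyTorch sketch of an orthogonality constraint between shared and
# domain-specific latent codes. The squared-Frobenius penalty is a common
# choice (cf. domain separation networks); the paper's exact gate may differ.
import torch
import torch.nn as nn

class TwoHeadEncoder(nn.Module):
    """Encodes a sample into a shared code and a domain-specific code."""
    def __init__(self, in_dim: int, latent_dim: int):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, latent_dim), nn.ReLU(),
                                    nn.Linear(latent_dim, latent_dim))
        self.private = nn.Sequential(nn.Linear(in_dim, latent_dim), nn.ReLU(),
                                     nn.Linear(latent_dim, latent_dim))

    def forward(self, x):
        return self.shared(x), self.private(x)

def orthogonality_loss(shared: torch.Tensor, private: torch.Tensor) -> torch.Tensor:
    """Penalize correlation between the two codes: ||S^T P||_F^2 over a batch."""
    s = shared - shared.mean(dim=0)
    p = private - private.mean(dim=0)
    return (s.t() @ p).pow(2).sum() / shared.shape[0] ** 2

# Usage: weight this penalty and add it to the reconstruction / translation loss.
enc = TwoHeadEncoder(in_dim=128, latent_dim=32)
x = torch.randn(64, 128)
s, p = enc(x)
loss = orthogonality_loss(s, p)
```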


2020 ◽  
Vol 8 ◽  
pp. 141-155
Author(s):  
Kai Sun ◽  
Dian Yu ◽  
Dong Yu ◽  
Claire Cardie

Machine reading comprehension tasks require a machine reader to answer questions relevant to the given document. In this paper, we present the first free-form multiple-Choice Chinese machine reading Comprehension dataset (C3), containing 13,369 documents (dialogues or more formally written mixed-genre texts) and their associated 19,577 multiple-choice free-form questions collected from Chinese-as-a-second-language examinations. We present a comprehensive analysis of the prior knowledge (i.e., linguistic, domain-specific, and general world knowledge) needed for these real-world problems. We implement rule-based and popular neural methods and find that there is still a significant performance gap between the best-performing model (68.5%) and human readers (96.0%), especially on problems that require prior knowledge. We further study the effects of distractor plausibility and of data augmentation based on translated relevant English datasets on model performance. We expect C3 to present great challenges to existing systems, as answering 86.8% of questions requires both knowledge within and beyond the accompanying document, and we hope that C3 can serve as a platform to study how to leverage various kinds of prior knowledge to better understand a given written or orally oriented text. C3 is available at https://dataset.org/c3/.
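
To make the task format concrete, the sketch below scores a trivial always-pick-the-first-choice baseline on a C3-style JSON file. The per-instance layout assumed here (document sentences, questions with choice and answer fields, instance id) is our assumption and should be checked against the actual release at https://dataset.org/c3/.

```python
# Hypothetical scoring loop for a C3-style multiple-choice file. The assumed
# per-instance layout ([document sentences, questions, id], with "choice" and
# "answer" fields per question) should be verified against the real release.
import json

def first_choice_accuracy(path: str) -> float:
    """Score a trivial baseline that always picks the first answer choice."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    correct = total = 0
    for document, questions, _instance_id in data:  # assumed layout
        for q in questions:
            correct += q["choice"][0] == q["answer"]
            total += 1
    return correct / total

# print(first_choice_accuracy("c3-m-train.json"))  # filename assumed
```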


2019 ◽  
Vol 134 ◽  
pp. 62-71 ◽  
Author(s):  
Yongxin Liu ◽  
Jianqiang Li ◽  
Zhong Ming ◽  
Houbing Song ◽  
Xiaoxiong Weng ◽  
...  
