DeepChannel: Salience Estimation by Contrastive Learning for Extractive Document Summarization

Diverse Decoding for Abstractive Document Summarization

Applied Sciences ◽

10.3390/app9030386 ◽

2019 ◽

Vol 9 (3) ◽

pp. 386 ◽

Cited By ~ 2

Author(s):

Xu-Wang Han ◽

Hai-Tao Zheng ◽

Jin-Yuan Chen ◽

Cong-Zhi Zhao

Keyword(s):

Experimental Evaluation ◽

State Of The Art ◽

Attention Mechanism ◽

Beam Search ◽

Daily Mail ◽

Document Summarization ◽

Novel Method ◽

Search Approach ◽

Abstractive Summarization ◽

Information Coverage

Recently, neural sequence-to-sequence models have made impressive progress in abstractive document summarization. Unfortunately, as neural abstractive summarization research is in a primitive stage, the performance of these models is still far from ideal. In this paper, we propose a novel method called Neural Abstractive Summarization with Diverse Decoding (NASDD). This method augments the standard attentional sequence-to-sequence model in two aspects. First, we introduce a diversity-promoting beam search approach in the decoding process, which alleviates the serious diversity issue caused by standard beam search and hence increases the possibility of generating summary sequences that are more informative. Second, we creatively utilize the attention mechanism combined with the key information of the input document as an estimation of the salient information coverage, which aids in finding the optimal summary sequence. We carry out the experimental evaluation with state-of-the-art methods on the CNN/Daily Mail summarization dataset, and the results demonstrate the superiority of our proposed method.
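The abstract does not say which diversity-promoting scheme NASDD uses. As a minimal sketch only, the block below assumes a sibling-rank penalty: candidates expanded from the same parent hypothesis are ranked, and each loses `gamma` times its rank when the beam is selected. The names `diverse_beam_search`, `toy_model`, and `gamma` are hypothetical, and the toy model stands in for a real decoder.

```python
import math

def diverse_beam_search(next_log_probs, beam_size, steps, gamma=1.0):
    """Beam search with a sibling-rank penalty (gamma=0 gives standard beam search)."""
    beams = [((), 0.0)]  # (token sequence, accumulated log-probability)
    for _ in range(steps):
        candidates = []
        for prefix, score in beams:
            # Rank this parent's continuations from most to least probable.
            ranked = sorted(next_log_probs(prefix).items(), key=lambda kv: -kv[1])
            for rank, (token, logp) in enumerate(ranked):
                true_score = score + logp
                # The penalty affects selection only; the true score is carried forward.
                candidates.append((true_score - gamma * rank, true_score, prefix + (token,)))
        candidates.sort(key=lambda c: -c[0])
        beams = [(seq, true) for _, true, seq in candidates[:beam_size]]
    return beams

def toy_model(prefix):
    # Hypothetical next-token distribution, chosen so that parent "a" dominates.
    if not prefix:
        return {"a": math.log(0.6), "b": math.log(0.4)}
    if prefix[-1] == "a":
        return {"x": math.log(0.5), "y": math.log(0.45), "z": math.log(0.05)}
    return {"x": math.log(0.6), "y": math.log(0.4)}

plain = diverse_beam_search(toy_model, beam_size=2, steps=2, gamma=0.0)
diverse = diverse_beam_search(toy_model, beam_size=2, steps=2, gamma=1.0)
print([seq for seq, _ in plain])
print([seq for seq, _ in diverse])
```

With `gamma=0.0` both surviving hypotheses in this toy setup descend from the same first token, while the penalized run keeps one hypothesis per parent. That is the kind of diversity the abstract describes, though the paper's own scoring may differ.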

Graph Networks for Molecular Design

10.26434/chemrxiv.12843137 ◽

2020 ◽

Author(s):

Rocío Mercado ◽

Tobias Rastemo ◽

Edvard Lindelöf ◽

Günter Klambauer ◽

Ola Engkvist ◽

...

Keyword(s):

Network Architecture ◽

Deep Neural Network ◽

Molecular Design ◽

State Of The Art ◽

Generative Models ◽

Single Bond ◽

Neural Network Architecture ◽

Training Set ◽

Design Studies ◽

Graph Neural Networks

Deep learning methods applied to chemistry can be used to accelerate the discovery of new molecules. This work introduces GraphINVENT, a platform developed for graph-based molecular design using graph neural networks (GNNs). GraphINVENT uses a tiered deep neural network architecture to probabilistically generate new molecules a single bond at a time. All models implemented in GraphINVENT can quickly learn to build molecules resembling the training set molecules without any explicit programming of chemical rules. The models have been benchmarked using the MOSES distribution-based metrics, showing how GraphINVENT models compare well with state-of-the-art generative models. This work is one of the first thorough graph-based molecular design studies, and illustrates how GNN-based models are promising tools for molecular discovery.

Neural Network Model for Assessing the Physical and Mechanical Properties of a Metal Material Based on Deep Learning

Journal of Digital Science ◽

10.33847/2686-8296.2.1_2 ◽

2020 ◽

pp. 18-28

Author(s):

Andrei Kliuev ◽

Roman Klestov ◽

Valerii Stolbov

Keyword(s):

Neural Network ◽

Mechanical Properties ◽

Deep Neural Network ◽

Physical And Mechanical Properties ◽

Training Set ◽

Test Set ◽

Algorithmic Stability ◽

Test Sets ◽

Trained Network ◽

Basic Test

The paper investigates the algorithmic stability of training a deep neural network for recognizing the microstructure of materials. It is shown that at an 8% quantitative deviation in the basic test set, the trained network loses stability. This means that with such a quantitative or qualitative deviation in the training or test sets, the results obtained with the trained network can hardly be trusted. Although the results of this study apply to a particular case, i.e., microstructure recognition using ResNet-152, the authors propose a cheaper method for studying stability based on analysis of the test set rather than the training set.

Identification Method for Series Arc Faults Based on Wavelet Transform and Deep Neural Network

Energies ◽

10.3390/en13010142 ◽

2019 ◽

Vol 13 (1) ◽

pp. 142 ◽

Cited By ~ 2

Author(s):

Qiongfang Yu ◽

Yaqian Hu ◽

Yi Yang

Keyword(s):

Neural Network ◽

Wavelet Transform ◽

Power Supply ◽

Distribution Systems ◽

Deep Neural Network ◽

Experimental Result ◽

Discrete Wavelet ◽

Training Set ◽

Test Set ◽

Power Distribution System

The power supply quality and safety of a low-voltage residential power distribution system are seriously affected by the occurrence of series arc faults. These faults are difficult to detect and extinguish because of their small current, high stochasticity, and strong concealment. To improve the overall safety of residential distribution systems, this paper proposes a novel method based on the discrete wavelet transform (DWT) and a deep neural network (DNN) to detect series arc faults. An experimental test bed is built to obtain current signals under two states, normal and arcing. The collected signals are decomposed at different scales by applying the DWT. The wavelet coefficient sequences are used to form the training set and the test set. The deep neural network, trained on the training set under 4 different loads, adaptively learns the features of arc faults. The accuracy of arc fault recognition, obtained by feeding the test set into the model, is about 97.75%. The experimental result shows that this method has good accuracy and generality under different types of loading.
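To illustrate the decomposition step, here is a minimal sketch of a multi-level discrete wavelet transform. The abstract does not name the wavelet family or the number of levels, so the Haar wavelet, `levels=2`, and the sample values in `current` are all assumptions made for illustration.

```python
import math

def haar_dwt(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_decompose(signal, levels):
    """Multi-level decomposition: returns [approx_L, detail_L, ..., detail_1]."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(detail)
    return [approx] + details[::-1]

# Hypothetical current samples with a short disturbance near the end.
current = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 4.0, -1.0]
coeffs = haar_decompose(current, levels=2)
print([len(band) for band in coeffs])
```

Each band in `coeffs` is one coefficient sequence at a given scale. Sequences of this kind are what the abstract describes as the input used to form the training and test sets.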

Graph Networks for Molecular Design

10.26434/chemrxiv.12843137.v1 ◽

2020 ◽

Author(s):

Rocío Mercado ◽

Tobias Rastemo ◽

Edvard Lindelöf ◽

Günter Klambauer ◽

Ola Engkvist ◽

...

Keyword(s):

Network Architecture ◽

Deep Neural Network ◽

Molecular Design ◽

State Of The Art ◽

Generative Models ◽

Single Bond ◽

Neural Network Architecture ◽

Training Set ◽

Design Studies ◽

Graph Neural Networks

Deep learning methods applied to chemistry can be used to accelerate the discovery of new molecules. This work introduces GraphINVENT, a platform developed for graph-based molecular design using graph neural networks (GNNs). GraphINVENT uses a tiered deep neural network architecture to probabilistically generate new molecules a single bond at a time. All models implemented in GraphINVENT can quickly learn to build molecules resembling the training set molecules without any explicit programming of chemical rules. The models have been benchmarked using the MOSES distribution-based metrics, showing how GraphINVENT models compare well with state-of-the-art generative models. This work is one of the first thorough graph-based molecular design studies, and illustrates how GNN-based models are promising tools for molecular discovery.

Do We Train on Test Data? Purging CIFAR of Near-Duplicates

Journal of Imaging ◽

10.3390/jimaging6060041 ◽

2020 ◽

Vol 6 (6) ◽

pp. 41 ◽

Cited By ~ 1

Author(s):

Björn Barz ◽

Joachim Denzler

Keyword(s):

Classification Accuracy ◽

State Of The Art ◽

Classification Performance ◽

Abstract Concepts ◽

Original Performance ◽

Generalization Capability ◽

Training Set ◽

Significant Drop ◽

Test Set ◽

Test Sets

The CIFAR-10 and CIFAR-100 datasets are two of the most heavily benchmarked datasets in computer vision and are often used to evaluate novel methods and model architectures in the field of deep learning. However, we find that 3.3% and 10% of the images from the test sets of these datasets have duplicates in the training set. These duplicates are easily recognizable by memorization and may, hence, bias the comparison of image recognition techniques regarding their generalization capability. To eliminate this bias, we provide the “fair CIFAR” (ciFAIR) dataset, where we replaced all duplicates in the test sets with new images sampled from the same domain. The training set remains unchanged, in order not to invalidate pre-trained models. We then re-evaluate the classification performance of various popular state-of-the-art CNN architectures on these new test sets to investigate whether recent research has overfitted to memorizing data instead of learning abstract concepts. We find a significant drop in classification accuracy of between 9% and 14% relative to the original performance on the duplicate-free test set. We make both the ciFAIR dataset and pre-trained models publicly available and furthermore maintain a leaderboard for tracking the state of the art.
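The abstract does not describe how the duplicates were found. As a rough sketch of one common approach, the block below flags a test item when its nearest training item exceeds a cosine-similarity threshold in some feature space. The feature vectors, the `threshold` value, and the function names are hypothetical, and vectors are assumed to be non-zero.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def find_near_duplicates(train, test, threshold=0.99):
    """Return (test_index, train_index, similarity) for each flagged test item."""
    hits = []
    for i, t in enumerate(test):
        # Nearest training item by cosine similarity.
        j, sim = max(((j, cosine(t, x)) for j, x in enumerate(train)),
                     key=lambda pair: pair[1])
        if sim >= threshold:
            hits.append((i, j, sim))
    return hits

# Hypothetical feature vectors; in practice these would come from an image encoder.
train = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.7]]
test = [[0.99, 0.01, 0.0], [0.0, 0.0, 1.0]]
hits = find_near_duplicates(train, test)
print(hits)
```

Pairs flagged this way would still need manual review before any test images are replaced, since a similarity threshold alone cannot distinguish true duplicates from merely similar images.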

DiffChaser: Detecting Disagreements for Deep Neural Networks

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/800 ◽

2019 ◽

Cited By ~ 14

Author(s):

Xiaofei Xie ◽

Lei Ma ◽

Haijun Wang ◽

Yuekang Li ◽

Yang Liu ◽

...

Keyword(s):

Deep Neural Network ◽

Deep Neural Networks ◽

State Of The Art ◽

Optimization Procedure ◽

Black Box ◽

Massive Data ◽

Test Set ◽

Testing Framework ◽

Development Lifecycle ◽

Black Box Testing

Platform migration and customization have become an indispensable part of the deep neural network (DNN) development lifecycle. A high-precision but complex DNN trained in the cloud on massive data and powerful GPUs often goes through an optimization phase (e.g., quantization, compression) before deployment to a target device (e.g., a mobile device). A test set that effectively uncovers the disagreements between a DNN and its optimized variant provides feedback for debugging and further enhancing the optimization procedure. However, minor inconsistencies between a DNN and its optimized version are often hard to detect and easily bypass the original test set. This paper proposes DiffChaser, an automated black-box testing framework to detect untargeted/targeted disagreements between version variants of a DNN. We demonstrate 1) its effectiveness by comparing it with state-of-the-art techniques, and 2) its usefulness in real-world DNN product deployment involving quantization and optimization.
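To make the notion of a disagreement concrete, the sketch below compares a tiny linear classifier with a coarsely quantized copy and looks for inputs where their predicted labels differ. It uses plain random perturbation of seed inputs, which is an assumption made for illustration and not DiffChaser's search procedure. The weights, `step`, `eps`, and all function names are hypothetical.

```python
import random

def predict(weights, x):
    """Black-box view of a model: return only the index of the top-scoring class."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)

def quantize(weights, step=0.5):
    """Round every weight to the nearest multiple of `step`."""
    return [[round(w / step) * step for w in row] for row in weights]

def find_disagreements(model, variant, seeds, trials=200, eps=0.1, rng=None):
    """Perturb each seed at random; keep the first input on which the labels differ."""
    rng = rng or random.Random(0)
    found = []
    for seed in seeds:
        for _ in range(trials):
            x = [xi + rng.uniform(-eps, eps) for xi in seed]
            if predict(model, x) != predict(variant, x):
                found.append(x)
                break
    return found

model = [[1.0, 0.3], [0.2, 1.0]]  # hypothetical full-precision weights
variant = quantize(model)          # becomes [[1.0, 0.5], [0.0, 1.0]]
found = find_disagreements(model, variant, seeds=[[0.7, 1.0]])
print(found)
```

Only predicted labels are compared, which matches the black-box setting in the abstract. A guided search would be needed when disagreements are rare.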

Training machine learning models on patient level data segregation is crucial in practical clinical applications

10.1101/2020.04.23.20076406 ◽

2020 ◽

Author(s):

Mustafa Umit Oner ◽

Yi-Chih Cheng ◽

Hwee Kuan Lee ◽

Wing-Kin Sung

Keyword(s):

Neural Network ◽

Machine Learning ◽

Real World ◽

Deep Neural Network ◽

Strongly Correlated ◽

Training Set ◽

Test Set ◽

The Real ◽

Patient Level ◽

Test Sets

This article discusses the effect of segregating histopathology image data into three sets: a training set for training the machine learning model, a validation set for model selection, and a test set for testing model performance. We found that one must be cautious when segregating histological image data (slides) into training, validation and test sets, because subtle mishandling of data can introduce data leakage and give illusorily good results on the test set. We performed this study on gene mutation prediction performance using the deep neural network in the paper of Coudray et al. [1]. Using the provided code and the same set of data, we discovered that the data segregation method of the paper suffered from a data leakage problem [2]. The paper pools all the slides from all patients and then segregates them exclusively into training, validation and test sets. In this way, no slide is used in more than one set. This seems to be a clean separation of the data. However, the paper did not consider that some slides were strongly correlated. For example, if the tumor of a patient is cut and stained to produce multiple slides, these slides are strongly correlated. If one slide is used for training and another is used for testing, the deep neural network can essentially memorize the pattern on the slide in the training set and apply this memory to the slide in the test set. Hence, by memorization, the deep neural network can predict very well on the slide in the test set. This mechanism of prediction is not useful in a practical clinical setting, since no two tumors are the same in the real world. In this real setting, we require the deep neural network to generalize across patients and tumors. Hereafter, we call this way of data segregation slide-level segregation. There is a better way to perform data segregation that is compatible with deployment of a deep learning model in practical clinical settings.
First, the patients are segregated exclusively into training, validation and test sets. All the slides belonging to the patients in the training set are used solely for training. Similarly, all the slides belonging to the patients in the test set are used for testing only. Segregating data in this way forces the deep neural network to generalize across patients. We call this way of data segregation patient-level segregation. In the slide-level segregation analysis, we obtained results similar to those presented in the paper by Coudray et al. [1]: overall performance on the test set was good. However, it was illusory due to data leakage. The model gave very good test results on slides that come from a patient who also has slides in the training set. On the other hand, the test results were quite bad on slides that come from a patient who does not have any slides in the training set. Hereafter, we call a slide in the test set seen-patient data if the corresponding patient also has some slides in the training set, and unseen-patient data if the corresponding patient does not have slides in the training set. Furthermore, we analyzed the performance of the model on data segregated by the patient-level segregation approach. Note that, in this approach, all patients in the test set are unseen by the model, which mimics the real-world clinical workflow. We observed a significant drop in the performance of the model on the test set of the patient-level segregation approach compared to its performance on the test set of the slide-level segregation approach. Moreover, the performance of the model on the test set of the patient-level segregation approach was very similar to its performance on the unseen-patient data in the test set of the slide-level segregation approach.
Hence, we conclude that the patient-level segregation approach is crucial and appropriate for simulating a real-world scenario, where each patient in the test set can be thought of as a patient walking into the clinic tomorrow.
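The patient-level segregation described above can be sketched in a few lines: group slides by patient, shuffle the patients, and assign whole patients to each set. This is a minimal illustration, not the authors' code. The function name, the `(slide_id, patient_id)` record layout, the split fractions, and the synthetic data are all assumptions.

```python
import random
from collections import defaultdict

def patient_level_split(slides, fractions=(0.7, 0.15, 0.15), seed=0):
    """Split (slide_id, patient_id) pairs so that no patient spans two sets."""
    by_patient = defaultdict(list)
    for slide_id, patient_id in slides:
        by_patient[patient_id].append(slide_id)

    # Shuffle patients, not slides, so correlated slides stay together.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    n_train = int(len(patients) * fractions[0])
    n_val = int(len(patients) * fractions[1])
    groups = {
        "train": patients[:n_train],
        "val": patients[n_train:n_train + n_val],
        "test": patients[n_train + n_val:],  # remaining patients
    }
    return {name: [(s, p) for p in members for s in by_patient[p]]
            for name, members in groups.items()}

# Hypothetical data: 20 patients with 3 slides each.
slides = [(f"slide_{p}_{k}", f"patient_{p}") for p in range(20) for k in range(3)]
split = patient_level_split(slides)
print({name: len(items) for name, items in split.items()})
```

A slide-level split would instead shuffle `slides` directly, which is exactly what allows slides from one patient to land in both the training and test sets.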

Document Summarization Based on Coverage with Noise Injection and Word Association

Information ◽

10.3390/info11110536 ◽

2020 ◽

Vol 11 (11) ◽

pp. 536

Author(s):

Heechan Kim ◽

Soowon Lee

Keyword(s):

Language Processing ◽

Word Association ◽

State Of The Art ◽

The State ◽

Daily Mail ◽

Document Summarization ◽

The Third ◽

Word Sequence ◽

Noise Injection ◽

Automatic Document Summarization

Automatic document summarization is a field of natural language processing that is rapidly improving with the development of end-to-end deep learning models. In this paper, we propose a novel summarization model that consists of three methods. The first is a coverage method based on noise injection that makes the attention mechanism select only important words by defining previous context information as noise. This alleviates the problem of the summarization model generating the same word sequence repeatedly. The second is a word association method that updates the information of each word by comparing the information of the current step with the information of all previous decoding steps. This captures a change in the meaning of an already decoded word in light of the words that follow it. The third is a method using a suppression loss function that explicitly minimizes the probabilities of non-answer words. The proposed summarization model showed good performance on some recall-oriented understudy for gisting evaluation (ROUGE) metrics compared to the state-of-the-art models in the CNN/Daily Mail summarization task, and the results were achieved with very few learning steps compared to the state-of-the-art models.

Using Deep Neural Network to Diagnose Thyroid Nodules on Ultrasound in Patients With Hashimoto’s Thyroiditis

Frontiers in Oncology ◽

10.3389/fonc.2021.614172 ◽

2021 ◽

Vol 11 ◽

Author(s):

Yiqing Hou ◽

Chao Chen ◽

Lu Zhang ◽

Wei Zhou ◽

Qinyang Lu ◽

...

Keyword(s):

Neural Network ◽

High Performance ◽

Deep Neural Network ◽

Thyroid Nodules ◽

Hashimoto's Thyroiditis ◽

Training Set ◽

Test Set ◽

The One ◽

Benign Nodules

Objective: The aim of this study is to develop a model using Deep Neural Network (DNN) to diagnose thyroid nodules in patients with Hashimoto’s Thyroiditis. Methods: In this retrospective study, we included 2,932 patients with thyroid nodules who underwent thyroid ultrasonogram in our hospital from January 2017 to August 2019. 80% of them were included as training set and 20% as test set. Nodules suspected for malignancy underwent FNA or surgery for pathological results. Two DNN models were trained to diagnose thyroid nodules, and we chose the one with better performance. The features of nodules as well as parenchyma around nodules will be learned by the model to achieve better performance under diffused parenchyma. 10-fold cross-validation and an independent test set were used to evaluate the performance of the algorithm. The performance of the model was compared with that of the three groups of radiologists with clinical experience of <5 years, 5–10 years, >10 years respectively. Results: In total, 9,127 images were collected from 2,932 patients with 7,301 images for the training set and 1,806 for the test set. 56% of the patients enrolled had Hashimoto’s Thyroiditis. The model achieved an AUC of 0.924 for distinguishing malignant and benign nodules in the test set. It showed similar performance under diffused thyroid parenchyma and normal parenchyma with sensitivity of 0.881 versus 0.871 (p = 0.938) and specificity of 0.846 versus 0.822 (p = 0.178). In patients with HT, the model achieved an AUC of 0.924 to differentiate malignant and benign nodules which was significantly higher than that of the three groups of radiologists (AUC = 0.824, 0.857, 0.863 respectively, p < 0.05). Conclusion: The model showed high performance in diagnosing thyroid nodules under both normal and diffused parenchyma. In patients with Hashimoto’s Thyroiditis, the model showed a better performance compared to radiologists with various years of experience.