DuEE: A Large-Scale Dataset for Chinese Event Extraction in Real-World Scenarios

Author(s):  
Xinyu Li ◽  
Fayuan Li ◽  
Lu Pan ◽  
Yuguang Chen ◽  
Weihua Peng ◽  
...  

Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Kongfan Zhu ◽  
Rundong Guo ◽  
Weifeng Hu ◽  
Zeqiang Li ◽  
Yujun Li

Legal judgment prediction (LJP), an effective and critical application in legal assistant systems, aims to determine judgment results from the information established during fact determination. In real-world criminal cases, judges rely not only on the fact description but also on external information, such as the defendant's basic information and the court view. However, most existing work takes the fact description as the sole input for LJP and ignores this external information. We propose a Transformer-Hierarchical-Attention-Multi-Extra (THME) Network to make full use of the information based on the fact determination. We conduct experiments on a real-world large-scale dataset of criminal cases in the civil law system. Experimental results show that our method outperforms state-of-the-art LJP methods on all judgment prediction tasks.





2019 ◽  
Vol 66 ◽  
pp. 243-278
Author(s):  
Shashi Narayan ◽  
Shay B. Cohen ◽  
Mirella Lapata

We introduce "extreme summarization," a new single-document summarization task which aims at creating a short, one-sentence news summary answering the question "What is the article about?". We argue that extreme summarization, by nature, is not amenable to extractive strategies and requires an abstractive modeling approach. In the hope of driving research on this task further: (a) we collect a real-world, large-scale dataset by harvesting online articles from the British Broadcasting Corporation (BBC); and (b) we propose a novel abstractive model which is conditioned on the article's topics and based entirely on convolutional neural networks. We demonstrate experimentally that this architecture captures long-range dependencies in a document and recognizes pertinent content, outperforming an oracle extractive system and state-of-the-art abstractive approaches when evaluated automatically and by humans on the extreme summarization dataset.



Author(s):  
Shen Gao ◽  
Xiuying Chen ◽  
Piji Li ◽  
Zhaochun Ren ◽  
Lidong Bing ◽  
...  

In the field of neural abstractive summarization, conventional sequence-to-sequence models often summarize the wrong aspect of a document rather than its main aspect. To tackle this problem, we propose the task of reader-aware abstractive summary generation, which uses reader comments to help the model produce a better summary of the main aspect. Unlike the traditional abstractive summarization task, reader-aware summarization confronts two main challenges: (1) comments are informal and noisy; (2) jointly modeling the news document and the reader comments is challenging. To tackle these challenges, we design an adversarial learning model named reader-aware summary generator (RASG), which consists of four components: (1) a sequence-to-sequence based summary generator; (2) a reader attention module capturing the reader-focused aspects; (3) a supervisor modeling the semantic gap between the generated summary and the reader-focused aspects; (4) a goal tracker producing the goal for each generation step. The supervisor and the goal tracker are used to guide the training of our framework in an adversarial manner. Extensive experiments are conducted on our large-scale real-world text summarization dataset, and the results show that RASG achieves state-of-the-art performance in terms of both automatic metrics and human evaluations. The experimental results also demonstrate the effectiveness of each module in our framework. We release our large-scale dataset for further research.



2021 ◽  
Author(s):  
Changhun Jung ◽  
Mohammed Abuhamad ◽  
David Mohaisen ◽  
Kyungja Han ◽  
DaeHun Nyang

Background: Computer-aided methods for analyzing white blood cells (WBC) are popular due to the complexity of the manual alternatives. Recent works have shown highly accurate segmentation and detection of white blood cells from microscopic blood images. However, the classification of the observed cells is still a challenge, in part due to the distribution of the five types that affect the condition of the immune system.
Methods: (i) This work proposes W-Net, a CNN-based method for WBC classification. We evaluate W-Net on a real-world large-scale dataset that includes 6,562 real images of the five WBC types. (ii) For further benefits, we generate synthetic WBC images using a Generative Adversarial Network, to be shared for education and research purposes.
Results: (i) W-Net achieves an average accuracy of 97%. In comparison to state-of-the-art methods in the field of WBC classification, we show that W-Net outperforms other CNN- and RNN-based model architectures. Moreover, we show the benefits of using pre-trained W-Net in a transfer learning context when fine-tuned to a specific task or adapted to another dataset. (ii) The synthetic WBC images are confirmed by experiments and a domain expert to have a high degree of similarity to the original images. The pre-trained W-Net and the generated WBC dataset are available for the community to facilitate reproducibility and follow-up research work.
Conclusion: This work proposed W-Net, a CNN-based architecture with a small number of layers, to accurately classify the five WBC types. We evaluated W-Net on a real-world large-scale dataset and addressed several challenges such as the transfer learning property and the class imbalance. W-Net achieved an average classification accuracy of 97%. We synthesized a dataset of new WBC image samples using DCGAN, which we released to the public for education and research purposes.



Sensors ◽  
2020 ◽  
Vol 20 (23) ◽  
pp. 6733
Author(s):  
Hao Luo ◽  
Qingbo Wu ◽  
King Ngi Ngan ◽  
Hanxiao Luo ◽  
Haoran Wei ◽  
...  

Removing raindrops from a single image is a challenging problem due to the complex changes in shape, scale, and transparency among raindrops. Previous explorations have mainly been limited in two ways. First, publicly available raindrop image datasets have limited capacity in terms of modeling raindrop characteristics (e.g., raindrop collision and fusion) in real-world scenes. Second, recent deraining methods tend to apply shape-invariant filters to cope with diverse rainy images and fail to remove raindrops that are especially varied in shape and scale. In this paper, we address these raindrop removal problems from two perspectives. First, we establish a large-scale dataset named RaindropCityscapes, which includes 11,583 pairs of raindrop and raindrop-free images, covering a wide variety of raindrops and background scenarios. Second, a two-branch Multi-scale Shape Adaptive Network (MSANet) is proposed to detect and remove diverse raindrops, effectively filtering the occluded raindrop regions and keeping the clean background well-preserved. Extensive experiments on synthetic and real-world datasets demonstrate that the proposed method achieves significant improvements over the recent state-of-the-art raindrop removal methods. Moreover, the extension of our method towards the rainy image segmentation and detection tasks validates the practicality of the proposed method in outdoor applications.



Author(s):  
Lei Chen ◽  
Shao-En Weng ◽  
Chu-Jun Peng ◽  
Hong-Han Shuai ◽  
Wen-Huang Cheng


Author(s):  
Chongsheng Zhang ◽  
Ruixing Zong ◽  
Shuang Cao ◽  
Yi Men ◽  
Bofeng Mo

Oracle Bone Inscriptions (OBI) research is very meaningful for both history and literature. In this paper, we introduce our contributions in AI-Powered Oracle Bone (OB) fragments rejoining and OBI recognition. (1) We build a real-world dataset OB-Rejoin, and propose an effective OB rejoining algorithm which yields a top-10 accuracy of 98.39%. (2) We design a practical annotation software to facilitate OBI annotation, and build OracleBone-8000, a large-scale dataset with character-level annotations. We adopt deep learning based scene text detection algorithms for OBI localization, which yield an F-score of 89.7%. We propose a novel deep template matching algorithm for OBI recognition which achieves an overall accuracy of 80.9%. Since we have been cooperating closely with OBI domain experts, our effort above helps advance their research. The resources of this work are available at https://github.com/chongshengzhang/OracleBone.



Diabetes ◽  
2020 ◽  
Vol 69 (Supplement 1) ◽  
pp. 1588-P
Author(s):  
ROMIK GHOSH ◽  
ASHOK K. DAS ◽  
AMBRISH MITHAL ◽  
SHASHANK JOSHI ◽  
K.M. PRASANNA KUMAR ◽  
...  


Diabetes ◽  
2020 ◽  
Vol 69 (Supplement 1) ◽  
pp. 2258-PUB
Author(s):  
ROMIK GHOSH ◽  
ASHOK K. DAS ◽  
SHASHANK JOSHI ◽  
AMBRISH MITHAL ◽  
K.M. PRASANNA KUMAR ◽  
...  

