Sentence Embeddings and High-Speed Similarity Search for Fast Computer Assisted Annotation of Legal Documents

Author(s): Hannes Westermann, Jaromír Šavelka, Vern R. Walker, Kevin D. Ashley, Karim Benyekhlef

Human-performed annotation of sentences in legal documents is an important prerequisite for many machine-learning-based systems supporting legal tasks. Typically, the annotation is done sequentially, sentence by sentence, which is often time-consuming and, hence, expensive. In this paper, we introduce a proof-of-concept system for annotating sentences “laterally.” The approach is based on the observation that sentences that are similar in meaning often have the same label in terms of a particular type system. We use this observation to allow annotators to quickly view and annotate sentences that are semantically similar to a given sentence, across an entire corpus of documents. Here, we present the interface of the system and empirically evaluate the approach. The experiments show that lateral annotation has the potential to make the annotation process quicker and more consistent.
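Lateral annotation rests on nearest-neighbour search over sentence embeddings. The following is a minimal sketch, not the authors' system: it assumes the embeddings are already computed (the hand-made three-dimensional vectors in the usage stand in for a real sentence encoder) and uses exact brute-force cosine search rather than a high-speed index. The names `most_similar` and `lateral_candidates`, and the `k` and `threshold` parameters with their defaults, are hypothetical. Neighbours are returned as candidates for the annotator to confirm, not labelled automatically.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical mapping from sentence id to a precomputed sentence embedding.
Embeddings = Dict[str, Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_id: str, embeddings: Embeddings, k: int) -> List[Tuple[str, float]]:
    """Exact (brute-force) k-nearest-neighbour search by cosine similarity."""
    query = embeddings[query_id]
    scored = [
        (sid, cosine(query, vec))
        for sid, vec in embeddings.items()
        if sid != query_id  # never return the query sentence itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def lateral_candidates(
    query_id: str,
    embeddings: Embeddings,
    labels: Dict[str, str],
    k: int = 10,
    threshold: float = 0.9,
) -> List[str]:
    """Still-unlabelled neighbours of an annotated sentence whose similarity
    reaches the threshold; the annotator confirms or rejects each one."""
    return [
        sid
        for sid, score in most_similar(query_id, embeddings, k)
        if score >= threshold and sid not in labels
    ]
```

For example, with `{"s1": [1.0, 0.0, 0.1], "s2": [0.9, 0.1, 0.1], "s3": [0.0, 1.0, 0.0], "s4": [0.7, 0.3, 0.0]}` and `s1` already labelled, a threshold of 0.95 proposes only `s2`, while 0.9 proposes `s2` and `s4`. For a large corpus, the brute-force loop would typically be swapped for an approximate nearest-neighbour index; the candidate-then-confirm flow stays the same.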


2020
Author(s): Mikołaj Morzy, Bartłomiej Balcerzak, Adam Wierzbicki

BACKGROUND With the rapidly accelerating dissemination of false medical information on the Web, establishing the credibility of online sources of medical information has become a pressing necessity. The sheer number of websites offering questionable medical information, presented as reliable and actionable suggestions with possibly harmful effects, poses an additional requirement: potential solutions have to scale to the size of the problem. Machine learning is one such solution which, when properly deployed, can be an effective tool in fighting medical disinformation on the Web.

OBJECTIVE We present a comprehensive framework for designing and curating machine learning training datasets for online medical information credibility assessment. We show how the annotation process should be constructed and what pitfalls should be avoided. Our main objective is to provide researchers from the medical and computer science communities with guidelines on how to construct datasets for machine learning models in various areas of medical information wars.

METHODS The key component of our approach is the active annotation process. We begin by outlining the annotation protocol for the curation of a high-quality training dataset, which can then be augmented and rapidly extended by applying the human-in-the-loop paradigm to machine learning training. To circumvent the cold-start problem of insufficient gold-standard annotations, we propose a pre-processing pipeline consisting of representation learning, clustering, and re-ranking of sentences, which accelerates the training process and optimizes the use of the human resources involved in the annotation.

RESULTS We collect over 10 000 annotations of sentences related to selected subjects (psychiatry, cholesterol, autism, antibiotics, vaccines, steroids, birth methods, food allergy testing) for less than $7 000, employing 9 highly qualified annotators (certified medical professionals), and we release this dataset to the general public. We develop an active annotation framework for more efficient annotation of non-credible medical statements. The results of the qualitative analysis support our claims about the efficacy of the presented method.

CONCLUSIONS A very diverse set of incentives drives the widespread dissemination of medical disinformation on the Web. An effective strategy for countering this spread is to use machine learning to automatically establish the credibility of online medical information. This, however, requires a thoughtful design of the training pipeline. In this paper we present a comprehensive framework of active annotation. In addition, we publish a large curated dataset of medical statements labelled as credible, non-credible, or neutral.
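The methods name representation learning, clustering, and re-ranking for the cold start, followed by an active (human-in-the-loop) annotation loop, but do not specify the algorithms. The following is a minimal sketch under assumed choices, not the authors' pipeline: sentence embeddings are taken as given (the representation-learning step is not shown), clustering is plain k-means, re-ranking orders each cluster by distance to its centroid and interleaves clusters round-robin, and the active step uses entropy-based uncertainty sampling over the three labels (credible, non-credible, neutral). All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 20) -> Tuple[List[int], List[List[float]]]:
    """Plain Lloyd's k-means (requires k <= len(points)). The first k points
    seed the centroids so results are deterministic; this is a simplification,
    not a recommended initialisation."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [points[i] for i, a in enumerate(assign) if a == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids


def cold_start_order(points: List[Vector], k: int) -> List[int]:
    """Order sentence indices for the first annotation rounds: inside each
    cluster, sentences closest to the centroid come first; clusters are then
    interleaved round-robin so early batches cover every cluster."""
    assign, centroids = kmeans(points, k)
    per_cluster = []
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        members.sort(key=lambda i: sq_dist(points[i], centroids[c]))
        per_cluster.append(members)
    order: List[int] = []
    rank = 0
    while len(order) < len(points):
        for members in per_cluster:
            if rank < len(members):
                order.append(members[rank])
        rank += 1
    return order


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a predicted label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def most_uncertain(prob_rows: List[Sequence[float]], n: int) -> List[int]:
    """Uncertainty sampling: indices of the n sentences whose predicted
    (credible, non-credible, neutral) distribution has the highest entropy."""
    ranked = sorted(range(len(prob_rows)), key=lambda i: entropy(prob_rows[i]), reverse=True)
    return ranked[:n]
```

Interleaving clusters means the first annotation batch spans every cluster instead of exhausting the largest one. Once enough gold labels exist to train a classifier, a selector such as `most_uncertain` would take over from the cold-start order in choosing the next batch for the annotators.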



Author(s): Stephanie Owen, Samuel Cureton, Mathew Szuhan, Joel McCarten, Panagiota Arvanitis, ...


CrystEngComm, 2021
Author(s): Wancheng Yu, Can Zhu, Yosuke Tsunooka, Wei Huang, Yifan Dang, ...

This study proposes a new high-speed method for designing crystal growth systems. It can optimize large numbers of parameters simultaneously, which is difficult with traditional experimental and computational techniques.



2021, Vol 10 (1)
Author(s): Elena Goi, Xi Chen, Qiming Zhang, Benjamin P. Cumming, Steffen Schoenhardt, ...

Optical machine learning has emerged as an important research area that, by leveraging the advantages inherent to optical signals, such as parallelism and high speed, paves the way for a future where optical hardware can process data at the speed of light. In this work, we present such optical devices for data processing in the form of single-layer nanoscale holographic perceptrons trained to perform optical inference tasks. We experimentally demonstrate the functionality of these passive optical devices using the example of decryptors trained to perform optical inference of single keys or whole classes of keys through symmetric and asymmetric decryption. The decryptors, designed for operation in the near-infrared region, are nanoprinted on complementary metal-oxide–semiconductor chips by galvo-dithered two-photon nanolithography with axial nanostepping of 10 nm [1,2], achieving a neuron density of >500 million neurons per square centimetre. This power-efficient combination of machine learning and on-chip integration may have a transformative impact on optical decryption [3], sensing [4], medical diagnostics [5] and computing [6,7].
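To put the reported density in perspective, here is a back-of-the-envelope conversion into area per neuron and neuron pitch. It assumes, since the abstract does not say, that the neurons sit on a regular square grid; $A_{\text{neuron}}$ and $p$ are symbols introduced here for illustration.

```latex
\begin{align*}
1\,\text{cm}^2 &= \left(10^{4}\,\mu\text{m}\right)^2 = 10^{8}\,\mu\text{m}^2,\\
A_{\text{neuron}} &< \frac{10^{8}\,\mu\text{m}^2}{5\times 10^{8}} = 0.2\,\mu\text{m}^2,\\
p = \sqrt{A_{\text{neuron}}} &< \sqrt{0.2}\,\mu\text{m} \approx 0.447\,\mu\text{m} \approx 447\,\text{nm}.
\end{align*}
```

In other words, a density above 500 million neurons per square centimetre corresponds to less than 0.2 µm² per neuron, or a square-grid pitch of under roughly 450 nm.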


