Sentence Embeddings and High-Speed Similarity Search for Fast Computer Assisted Annotation of Legal Documents

Frontiers in Artificial Intelligence and Applications - Legal Knowledge and Information Systems ◽

10.3233/faia200860 ◽

2020 ◽

Author(s):

Hannes Westermann ◽

Jaromír Šavelka ◽

Vern R. Walker ◽

Kevin D. Ashley ◽

Karim Benyekhlef

Keyword(s):

Machine Learning ◽

Similarity Search ◽

High Speed ◽

Type System ◽

Computer Assisted ◽

Proof Of Concept ◽

Concept System ◽

Legal Documents ◽

Fast Computer ◽

Annotation Process

Human-performed annotation of sentences in legal documents is an important prerequisite for many machine-learning-based systems supporting legal tasks. Typically, the annotation is done sequentially, sentence by sentence, which is often time-consuming and, hence, expensive. In this paper, we introduce a proof-of-concept system for annotating sentences “laterally.” The approach is based on the observation that sentences that are similar in meaning often have the same label in terms of a particular type system. We use this observation to allow annotators to quickly view and annotate sentences that are semantically similar to a given sentence, across an entire corpus of documents. Here, we present the interface of the system and empirically evaluate the approach. The experiments show that lateral annotation has the potential to make the annotation process quicker and more consistent.
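
The retrieval step behind lateral annotation, finding the sentences closest in meaning to the one being labeled, can be sketched as a nearest-neighbor search over sentence vectors. This is a minimal illustration and not the authors' system: the three-dimensional vectors, the sentence ids, and the function names `cosine` and `most_similar` are all hypothetical, and a real system would presumably obtain vectors from a learned sentence encoder and use a fast similarity index rather than the brute-force scan shown here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)

def most_similar(query_id, embeddings, k=2):
    """Return the k sentence ids most similar to query_id, best first."""
    query = embeddings[query_id]
    scored = [
        (cosine(query, vec), sid)
        for sid, vec in embeddings.items()
        if sid != query_id  # do not return the query sentence itself
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sid for _, sid in scored[:k]]

# Hypothetical toy embeddings (assumption: real vectors would come from
# a sentence encoder applied to every sentence in the corpus).
embeddings = {
    "s1": [0.9, 0.1, 0.0],
    "s2": [0.8, 0.2, 0.1],
    "s3": [0.0, 0.1, 0.9],
    "s4": [0.7, 0.3, 0.0],
}

neighbors = most_similar("s1", embeddings, k=2)
print(neighbors)
```

In an annotation tool built on this idea, the retrieved neighbors would be shown to the annotator, who could then confirm or reject the same label for each of them.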

Faculty Opinions recommendation of An interpretation algorithm for molecular diagnosis of bacterial vaginosis in a maternity hospital using machine learning: proof-of-concept study.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.736997952.793579734 ◽

2020 ◽

Author(s):

Ronnie Lamont

Keyword(s):

Machine Learning ◽

Bacterial Vaginosis ◽

Molecular Diagnosis ◽

Maternity Hospital ◽

Proof Of Concept ◽

Concept Study ◽

Interpretation Algorithm

Digging for the truth: the case for active annotation in evaluating the credibility of online medical information (Preprint)

10.2196/preprints.25920 ◽

2020 ◽

Author(s):

Mikołaj Morzy ◽

Bartłomiej Balcerzak ◽

Adam Wierzbicki

Keyword(s):

Machine Learning ◽

Medical Information ◽

Representation Learning ◽

Training Dataset ◽

Highly Qualified ◽

Human In The Loop ◽

Annotation Process ◽

Comprehensive Framework ◽

Online Sources ◽

The Web

BACKGROUND With the rapidly accelerating dissemination of false medical information on the Web, establishing the credibility of online sources of medical information has become a pressing necessity. The sheer number of websites that present questionable medical information as reliable and actionable advice, with possibly harmful effects, poses an additional requirement for potential solutions: they have to scale to the size of the problem. Machine learning is one such solution which, when properly deployed, can be an effective tool in fighting medical disinformation on the Web. OBJECTIVE We present a comprehensive framework for designing and curating machine learning training datasets for online medical information credibility assessment. We show how the annotation process should be constructed and what pitfalls should be avoided. Our main objective is to provide researchers from the medical and computer science communities with guidelines on how to construct datasets for machine learning models for various areas of medical information wars. METHODS The key component of our approach is the active annotation process. We begin by outlining the annotation protocol for the curation of a high-quality training dataset, which can then be augmented and rapidly extended by applying the human-in-the-loop paradigm to machine learning training. To circumvent the cold start problem of insufficient gold standard annotations, we propose a pre-processing pipeline consisting of representation learning, clustering, and re-ranking of sentences to accelerate the training process and optimize the human resources involved in the annotation. RESULTS We collect over 10 000 annotations of sentences related to selected subjects (psychiatry, cholesterol, autism, antibiotics, vaccines, steroids, birth methods, food allergy testing) for less than $7 000, employing 9 highly qualified annotators (certified medical professionals), and we release this dataset to the general public. We develop an active annotation framework for more efficient annotation of non-credible medical statements. The results of the qualitative analysis support our claims of the efficacy of the presented method. CONCLUSIONS A set of very diverse incentives is driving the widespread dissemination of medical disinformation on the Web. An effective strategy for countering this spread is to use machine learning to automatically establish the credibility of online medical information. This, however, requires a thoughtful design of the training pipeline. In this paper we present a comprehensive framework of active annotation. In addition, we publish a large curated dataset of medical statements labelled as credible, non-credible, or neutral.

Microplastic adulteration in homogenized fish and seafood - a mid-infrared and machine learning proof of concept

Spectrochimica Acta Part A Molecular and Biomolecular Spectroscopy ◽

10.1016/j.saa.2021.119985 ◽

2021 ◽

pp. 119985

Author(s):

Stephanie Owen ◽

Samuel Cureton ◽

Mathew Szuhan ◽

Joel McCarten ◽

Panagiota Arvanitis ◽

...

Keyword(s):

Machine Learning ◽

Proof Of Concept ◽

Mid Infrared

High-Speed and Accurate Meat Composition Imaging by Mechanically-Flexible Electrical Impedance Tomography With k-Nearest Neighbor and Fuzzy k-Means Machine Learning Approaches

IEEE Access ◽

10.1109/access.2021.3064315 ◽

2021 ◽

Vol 9 ◽

pp. 38792-38801

Author(s):

P. N. Darma ◽

M. Takei

Keyword(s):

Machine Learning ◽

Electrical Impedance Tomography ◽

High Speed ◽

Electrical Impedance ◽

Nearest Neighbor ◽

Learning Approaches ◽

K Nearest Neighbor ◽

Impedance Tomography ◽

Meat Composition

ECHR-DB: On building an integrated open repository of legal documents for machine learning applications

Information Systems ◽

10.1016/j.is.2021.101822 ◽

2021 ◽

pp. 101822

Author(s):

Alexandre Quemy ◽

Robert Wrembel

Keyword(s):

Machine Learning ◽

Legal Documents ◽

Machine Learning Applications

Exploring real-time fault detection of high-speed train traction motor based on machine learning and wavelet analysis

Neural Computing and Applications ◽

10.1007/s00521-021-06284-0 ◽

2021 ◽

Author(s):

Yanshu Li

Keyword(s):

Machine Learning ◽

Fault Detection ◽

Wavelet Analysis ◽

Real Time ◽

High Speed ◽

High Speed Train ◽

Traction Motor

Chatter Prediction in High Speed Machining of Titanium Alloy (Ti-6Al-4V) using Machine Learning Techniques

Materials Today Proceedings ◽

10.1016/j.matpr.2020.04.286 ◽

2020 ◽

Vol 24 ◽

pp. 350-358

Author(s):

Koshy Zacharia ◽

P. Krishnakumar

Keyword(s):

Machine Learning ◽

Titanium Alloy ◽

High Speed ◽

Machine Learning Techniques ◽

High Speed Machining ◽

Learning Techniques ◽

Chatter Prediction

Geometrical design of a crystal growth system guided by a machine learning algorithm

CrystEngComm ◽

10.1039/d1ce00106j ◽

2021 ◽

Author(s):

Wancheng Yu ◽

Can Zhu ◽

Yosuke Tsunooka ◽

Wei Huang ◽

Yifan Dang ◽

...

Keyword(s):

Machine Learning ◽

Crystal Growth ◽

High Speed ◽

Learning Algorithm ◽

Computational Techniques ◽

Machine Learning Algorithm ◽

Geometrical Design ◽

Large Numbers ◽

Growth System ◽

Speed Method

This study proposes a new high-speed method for designing crystal growth systems. It can optimize large numbers of parameters simultaneously, which is difficult for traditional experimental and computational techniques.

Machine Learning Cutting Force, Surface Roughness, and Tool Life in High Speed Turning Processes

Manufacturing Letters ◽

10.1016/j.mfglet.2021.07.005 ◽

2021 ◽

Author(s):

Yun Zhang ◽

Xiaojie Xu

Keyword(s):

Machine Learning ◽

Surface Roughness ◽

Cutting Force ◽

Tool Life ◽

High Speed ◽

High Speed Turning

Nanoprinted high-neuron-density optical linear perceptrons performing near-infrared inference on a CMOS chip

Light Science & Applications ◽

10.1038/s41377-021-00483-z ◽

2021 ◽

Vol 10 (1) ◽

Author(s):

Elena Goi ◽

Xi Chen ◽

Qiming Zhang ◽

Benjamin P. Cumming ◽

Steffen Schoenhardt ◽

...

Keyword(s):

Machine Learning ◽

High Speed ◽

Near Infrared ◽

Single Layer ◽

Infrared Region ◽

Optical Devices ◽

Medical Diagnostics ◽

Oxide Semiconductor ◽

Process Data ◽

Neuron Density

Optical machine learning has emerged as an important research area that, by leveraging the advantages inherent to optical signals, such as parallelism and high speed, paves the way for a future where optical hardware can process data at the speed of light. In this work, we present such optical devices for data processing in the form of single-layer nanoscale holographic perceptrons trained to perform optical inference tasks. We experimentally show the functionality of these passive optical devices in the example of decryptors trained to perform optical inference of single or whole classes of keys through symmetric and asymmetric decryption. The decryptors, designed for operation in the near-infrared region, are nanoprinted on complementary metal-oxide–semiconductor chips by galvo-dithered two-photon nanolithography with axial nanostepping of 10 nm [1,2], achieving a neuron density of >500 million neurons per square centimetre. This power-efficient commixture of machine learning and on-chip integration may have a transformative impact on optical decryption [3], sensing [4], medical diagnostics [5] and computing [6,7].