Fine Grained Classification of Personal Data Entities with Language Models

2022 ◽  
Author(s):  
Abhinav Nagpal ◽  
Riddhiman Dasgupta ◽  
Balaji Ganesan
Author(s):  
Jonas Austerjost ◽  
Robert Söldner ◽  
Christoffer Edlund ◽  
Johan Trygg ◽  
David Pollard ◽  
...  

Machine vision is a powerful technology that has become increasingly popular and accurate during the last decade due to rapid advances in the field of machine learning. The majority of machine vision applications are currently found in consumer electronics, automotive applications, and quality control, yet the potential for bioprocessing applications is tremendous. For instance, detecting and controlling foam emergence is important for all upstream bioprocesses, but the lack of robust foam sensing often leads to batch failures from foam-outs or overaddition of antifoam agents. Here, we report a new low-cost, flexible, and reliable foam sensor concept for bioreactor applications. The concept applies convolutional neural networks (CNNs), a state-of-the-art machine learning system for image processing. The implemented method shows high accuracy for both binary foam detection (foam/no foam) and fine-grained classification of foam levels.


2020 ◽  
Vol 54 (3) ◽  
pp. 647-696
Author(s):  
Beatriz Fernández ◽  
Fernando Zúñiga ◽  
Ane Berro

Abstract This paper explores the formal expression of two Basque dative argument types in combination with psych nouns and adjectives, in intransitive and transitive clauses: (i) those that express the experiencer, and (ii) those that express the stimulus of the psychological state denoted by the psych noun and adjective. In the intransitive structure involving a dative experiencer (DatExpIS), the stimulus is in the absolutive case, and the intransitive copula izan ‘be’ shows both dative and absolutive agreement. This construction basically corresponds to those built upon the piacere type of psychological verbs in the three-way classification of Italian psych verbs by Belletti & Rizzi (Belletti, Adriana & Luigi Rizzi. 1988. Psych-verbs and θ-theory. Natural Language and Linguistic Theory 6. 291–352). In the intransitive structure involving a dative stimulus (DatStimIS), the experiencer is marked by absolutive case, and the same intransitive copula shows both absolutive and dative agreement (with the latter corresponding to the dative stimulus and not to the experiencer). We show that the dative argument behaves in opposite ways in the two constructions across a number of morphosyntactic tests, including agreement, constituency, hierarchy and selection. Additionally, we explore two parallel transitive constructions that involve either a dative experiencer and an ergative stimulus (DatExpTS) or a dative stimulus and an ergative experiencer (DatStimTS), which employ the transitive copula *edun ‘have’. Considering these configurations, we propose an extended and more fine-grained typology of psych predicates.


2021 ◽  
Vol 2021 (2) ◽  
pp. 88-110
Author(s):  
Duc Bui ◽  
Kang G. Shin ◽  
Jong-Min Choi ◽  
Junbum Shin

Abstract Privacy policies are documents required by law and regulations that notify users of the collection, use, and sharing of their personal information on services or applications. While extracting personal data objects and the practices applied to them is one of the fundamental steps in the automated analysis of privacy policies, it remains challenging because policy statements are complex and written in vague legal language. Prior work is limited by small or generated datasets and manually created rules. We formulate the extraction of fine-grained personal data phrases and the corresponding data collection or sharing practices as a sequence-labeling problem that can be solved by an entity-recognition model. We create a large dataset with 4.1k sentences (97k tokens) and 2.6k annotated fine-grained data practices from 30 real-world privacy policies to train and evaluate neural networks. We present a fully automated system, called PI-Extract, which accurately extracts privacy practices with a neural model and outperforms strong rule-based baselines by a large margin. We conduct a user study on the effects of data practice annotation, which highlights and describes the data practices extracted by PI-Extract to help users better understand privacy-policy documents. Our experimental evaluation shows that the annotation significantly improves users’ reading comprehension of policy texts, as indicated by a 26.6% increase in the average total reading score.
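
The sequence-labeling formulation described in this abstract can be illustrated with a minimal sketch. It assumes the common BIO tagging scheme and uses hypothetical label names (`COLLECT`, `SHARE`) and a made-up sentence; none of these are taken from PI-Extract, and the actual label set and model may differ.

```python
from typing import List, Tuple

# A span is (start_token, end_token_exclusive, label).
Span = Tuple[int, int, str]


def spans_to_bio(tokens: List[str], spans: List[Span]) -> List[str]:
    """Convert annotated token spans into one BIO tag per token.

    Assumption: spans do not overlap. Label names are hypothetical.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if not (0 <= start < end <= len(tokens)):
            raise ValueError(f"span ({start}, {end}) is out of range")
        tags[start] = f"B-{label}"          # first token of the phrase
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the phrase
    return tags


# Illustrative sentence, not drawn from the dataset in the paper.
tokens = "We collect your email address and share your location with partners".split()
spans = [(2, 5, "COLLECT"), (7, 9, "SHARE")]
tags = spans_to_bio(tokens, spans)

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

An entity-recognition model trained on such data predicts one tag per token, and contiguous `B-`/`I-` runs are read back as data phrases together with their practice type.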


2018 ◽  
Vol 20 (4) ◽  
pp. 28-36 ◽  
Author(s):  
Mohamed Elkholy ◽  
Ahmed Elfatatry

Author(s):  
Kseniia Antipova

This article explores the main approaches of Russian and foreign authors to the definition of big data; presents a classification of data and the components of big data; and provides a comparative characterization of the legal regulation of big data. The subject of this research is the legislation of the Russian Federation and of the European Union regulating the collection, processing, and use of big data, personal data, and information; judicial and arbitration practice of the Russian Federation in the sphere of personal data; normative legal acts of the Russian Federation; governmental regulation in the Russian Federation and foreign countries of the processing, use, and transmission of data; and legal doctrine on the nature of big data. The research is relevant because there is as yet no conceptual uniformity with regard to big data worldwide, and the essence and methods of regulating big data have not been fully explored. The goal of this research is to determine the legal qualification of the data that comprise big data. The tasks are to define the term “big data”; to describe the approaches to determining the legal nature of big data; to classify big data; to outline the criteria for distinguishing the data that make up big data; and to formulate a model for optimal regulation of relations arising in the collection, processing, and use of data. An original definition of big data in the narrow and broad sense is provided. As a result, the author distinguishes the types of data and sets out the legal qualification of data depending on its category: industrial data, user data, and personal data. Attention is also given to the contractual form of big data circulation.


2021 ◽  
Author(s):  
Ming Xie ◽  
Yunpeng Jia ◽  
Ying Li ◽  
Xiaohua Cai ◽  
Kai Cao

Abstract Laser-induced fluorescence (LIF) is an effective, all-weather oil spill identification method that has been widely applied to oil spill monitoring. However, the distinguishability of oil types is seldom considered when selecting the excitation wavelength. This study aims to find the optimal excitation wavelength for fine-grained classification of refined oil pollutants using LIF by comparing the distinguishability of fluorometric spectra under various excitation wavelengths for some typical types of refined-oil samples. The results show that the fluorometric spectra of the oil samples vary significantly under different excitation wavelengths, and that the four types of oil used in this study are most readily distinguished under excitation wavelengths of 395 nm and 420 nm. This study is expected to improve the ability to identify oil types using the LIF method without increasing time or other costs, and it also provides a theoretical basis for the development of portable LIF devices for oil spill identification.

