Fine Grained Classification of Personal Data Entities with Language Models

Machine vision is a powerful technology that has become increasingly popular and accurate during the last decade due to rapid advances in the field of machine learning. The majority of machine vision applications are currently found in consumer electronics, automotive applications, and quality control, yet the potential for bioprocessing applications is tremendous. For instance, detecting and controlling foam emergence is important for all upstream bioprocesses, but the lack of robust foam sensing often leads to batch failures from foam-outs or overaddition of antifoam agents. Here, we report a new low-cost, flexible, and reliable foam sensor concept for bioreactor applications. The concept applies convolutional neural networks (CNNs), a state-of-the-art machine learning system for image processing. The implemented method shows high accuracy for both binary foam detection (foam/no foam) and fine-grained classification of foam levels.

Datives with psych nouns and adjectives in Basque

Folia Linguistica ◽

10.1515/flin-2020-2050 ◽

2020 ◽

Vol 54 (3) ◽

pp. 647-696

Author(s):

Beatriz Fernández ◽

Fernando Zúñiga ◽

Ane Berro

Keyword(s):

Natural Language ◽

Linguistic Theory ◽

Psychological State ◽

Formal Expression ◽

Fine Grained ◽

Psych Verbs ◽

Other Regarding ◽

Psychological Verbs

Abstract This paper explores the formal expression of two Basque dative argument types in combination with psych nouns and adjectives, in intransitive and transitive clauses: (i) those that express the experiencer, and (ii) those that express the stimulus of the psychological state denoted by the psych noun and adjective. In the intransitive structure involving a dative experiencer (DatExpIS), the stimulus is in the absolutive case, and the intransitive copula izan ‘be’ shows both dative and absolutive agreement. This construction basically corresponds to those built upon the piacere type of psychological verbs in Belletti and Rizzi's three-way classification of Italian psych verbs (Belletti, Adriana & Luigi Rizzi. 1988. Psych-verbs and θ-theory. Natural Language and Linguistic Theory 6. 291–352). In the intransitive structure involving a dative stimulus (DatStimIS), the experiencer is marked by absolutive case, and the same intransitive copula shows both absolutive and dative agreement (with the latter corresponding to the dative stimulus and not to the experiencer). We show that the dative argument behaves in exactly opposite ways in the two constructions with respect to a number of morphosyntactic tests, including agreement, constituency, hierarchy and selection. Additionally, we explore two parallel transitive constructions that involve either a dative experiencer and an ergative stimulus (DatExpTS) or a dative stimulus and an ergative experiencer (DatStimTS), which employ the transitive copula *edun ‘have’. Considering these configurations, we propose an extended and more fine-grained typology of psych predicates.

Automated Extraction and Presentation of Data Practices in Privacy Policies

Proceedings on Privacy Enhancing Technologies ◽

10.2478/popets-2021-0019 ◽

2021 ◽

Vol 2021 (2) ◽

pp. 88-110

Author(s):

Duc Bui ◽

Kang G. Shin ◽

Jong-Min Choi ◽

Junbum Shin

Keyword(s):

User Study ◽

Personal Information ◽

Personal Data ◽

Neural Model ◽

Automated Analysis ◽

Entity Recognition ◽

Automated System ◽

Privacy Policies ◽

Fine Grained ◽

Data Practices

Abstract Privacy policies are documents required by law and regulations that notify users of the collection, use, and sharing of their personal information on services or applications. While the extraction of personal data objects and their usage thereon is one of the fundamental steps in their automated analysis, it remains challenging due to the complex policy statements written in legal (vague) language. Prior work is limited by small/generated datasets and manually created rules. We formulate the extraction of fine-grained personal data phrases and the corresponding data collection or sharing practices as a sequence-labeling problem that can be solved by an entity-recognition model. We create a large dataset with 4.1k sentences (97k tokens) and 2.6k annotated fine-grained data practices from 30 real-world privacy policies to train and evaluate neural networks. We present a fully automated system, called PI-Extract, which accurately extracts privacy practices by a neural model and outperforms, by a large margin, strong rule-based baselines. We conduct a user study on the effects of data practice annotation which highlights and describes the data practices extracted by PI-Extract to help users better understand privacy-policy documents. Our experimental evaluation results show that the annotation significantly improves the users’ reading comprehension of policy texts, as indicated by a 26.6% increase in the average total reading score.
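The sequence-labeling formulation described in this abstract can be illustrated with a minimal sketch. It assumes the common BIO tagging scheme, in which each token is tagged as beginning (B-), inside (I-), or outside (O) a labeled phrase. The label names `COLLECT` and `SHARE`, the example sentence, and the helper `decode_bio` are hypothetical and chosen for illustration only; they are not taken from PI-Extract, and the tags are written by hand rather than predicted by a model.

```python
from typing import List, Tuple

# (start token index, end token index exclusive, label)
Span = Tuple[int, int, str]


def decode_bio(tags: List[str]) -> List[Span]:
    """Turn a BIO tag sequence into (start, end, label) spans."""
    spans: List[Span] = []
    start, label = None, None
    # A trailing "O" sentinel closes any span that runs to the last token.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        elif tag.startswith("I-") and start is None:
            # Tolerate an I- tag with no preceding B- by opening a new span.
            start, label = i, tag[2:]
    return spans


# Hypothetical sentence with hand-written tags (one tag per token).
tokens = "We collect your email address and share device identifiers with partners".split()
tags = ["O", "O", "B-COLLECT", "I-COLLECT", "I-COLLECT", "O",
        "O", "B-SHARE", "I-SHARE", "O", "O"]

for start, end, label in decode_bio(tags):
    print(label, "->", " ".join(tokens[start:end]))
```

In a full system the tag sequence would come from a trained entity-recognition model; only the decoding step from tags to labeled phrases is shown here.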

Automatic Score Range Classification of Korean Essays Using Deep Learning-based Korean Language Models - The Case of KoBERT & KoGPT2 -

Journal of the International Network for Korean Language and Culture ◽

10.15652/ink.2021.18.1.217 ◽

2021 ◽

Vol 18 (1) ◽

pp. 217-241

Author(s):

Heeryeon Cho ◽

Yumi Yi ◽

Hyeonyeol Im ◽

Junwoo Cha ◽

...

Keyword(s):

Deep Learning ◽

Language Models ◽

Korean Language ◽

Score Range

Fine-grained Classification of Malicious Code Based on CNN and Multi-resolution Feature Fusion

10.1109/iccia52886.2021.00031 ◽

2021 ◽

Author(s):

Junmiao Liang ◽

Zhenhu Ning ◽

Yihua Zhou ◽

Dongzhi Cao

Keyword(s):

Feature Fusion ◽

Malicious Code ◽

Fine Grained

Change Taxonomy: A Fine-Grained Classification of Software Change

IT Professional ◽

10.1109/mitp.2018.043141666 ◽

2018 ◽

Vol 20 (4) ◽

pp. 28-36 ◽

Cited By ~ 1

Author(s):

Mohamed Elkholy ◽

Ahmed Elfatatry

Keyword(s):

Fine Grained ◽

Software Change

Multi-value Classification of Ambiguous Personal Data

Communications in Computer and Information Science - New Trends in Model and Data Engineering ◽

10.1007/978-3-030-32213-7_16 ◽

2019 ◽

pp. 202-208

Author(s):

Sigal Assaf ◽

Ariel Farkash ◽

Micha Moffie

Keyword(s):

Personal Data

Privacy-Preserving Classification of Personal Data with Fully Homomorphic Encryption: An Application to High-Quality Ionospheric Data Prediction

Machine Learning for Cyber Security - Lecture Notes in Computer Science ◽

10.1007/978-3-030-62223-7_38 ◽

2020 ◽

pp. 437-446

Author(s):

Zheng Li ◽

Maohua Sun

Keyword(s):

Homomorphic Encryption ◽

Personal Data ◽

Privacy Preserving ◽

Fully Homomorphic Encryption ◽

High Quality ◽

Data Prediction

Methods of big data definition: Russian and foreign experience

Юридические исследования ◽

10.25136/2409-7136.2021.9.36591 ◽

2021 ◽

pp. 143-157

Author(s):

Kseniia Antipova

Keyword(s):

Big Data ◽

Russian Federation ◽

Personal Data ◽

Legal Regulation ◽

The European Union ◽

Legal Doctrine ◽

Original Definition ◽

Data Definition ◽

The Russian Federation

This article explores the main approaches of Russian and foreign authors to defining big data, presents a classification of data and the components of big data, and compares approaches to the legal regulation of big data. The subject of this research is the legislation of the Russian Federation and of the European Union regulating the collection, processing, and use of big data, personal data, and information; judicial and arbitration practice of the Russian Federation in the sphere of personal data; normative legal acts of the Russian Federation; governmental regulation in the Russian Federation and foreign countries of the processing, use, and transmission of data; and legal doctrine on the nature of big data. The research is relevant because there is as yet no conceptual uniformity worldwide with regard to big data, and the essence and methods of regulating big data are not fully explored. The goal of this research is to determine the legal qualification of the data that comprise big data. The tasks are to define the term “big data”; to demonstrate the approaches to determining the legal nature of big data; to classify big data; to outline the criteria for distinguishing the data that fall within the concept of big data; and to formulate a model for the optimal regulation of relations arising in the collection, processing, and use of data. An original definition of big data in the narrow and broad sense is provided. As a result, the author distinguishes the types of data and sets out the legal qualification of data depending on the category of data contained therein: industrial data, user data, and personal data. Attention is also given to the contractual form of big data circulation.

Experimental analysis on the optimal excitation wavelength for fine-grained identification of refined oil pollutants on water surface based on laser-induced fluorescence

10.21203/rs.3.rs-756586/v2 ◽

2021 ◽

Author(s):

Ming Xie ◽

Yunpeng Jia ◽

Ying Li ◽

Xiaohua Cai ◽

Kai Cao

Keyword(s):

Oil Spill ◽

Theoretical Basis ◽

Water Surface ◽

Laser Induced Fluorescence ◽

Excitation Wavelength ◽

Refined Oil ◽

Fine Grained ◽

Optimal Excitation ◽

Oil Spill Identification

Abstract Laser-induced fluorescence (LIF) is an effective, all-weather oil spill identification method that has been widely applied for oil spill monitoring. However, the distinguishability of oil types is seldom considered when selecting the excitation wavelength. This study aims to find the optimal excitation wavelength for fine-grained classification of refined oil pollutants using LIF by comparing the distinguishability of fluorometric spectra under various excitation wavelengths for several typical types of refined-oil samples. The results show that the fluorometric spectra of oil samples vary significantly under different excitation wavelengths, and that the four types of oil used in this study are most likely to be distinguished under excitation wavelengths of 395 nm and 420 nm. This study is expected to improve the identification of oil types using the LIF method without increasing time or other costs, and it also provides a theoretical basis for the development of portable LIF devices for oil spill identification.
