Privacy Protection in Enterprise Social Networks Using a Hybrid De-Identification System

2021 · Vol 15 (1) · pp. 138-152
Author(s):
Mohamed Abdou Souidi
Noria Taghezout

Enterprise social networks (ESNs) are widely used within organizations as a communication infrastructure that allows employees to collaborate with each other and share files and documents. The shared documents may contain a large amount of sensitive information that affects the privacy of individuals, such as phone numbers, and must be protected against any kind of disclosure or unauthorized access. In this study, the authors propose a hybrid de-identification system that extracts sensitive information from textual documents shared in ESNs. The system is based on both machine learning and rule-based classifiers. The gradient boosted trees (GBT) algorithm is used as the machine learning classifier. Experiments run on a modified CoNLL 2003 dataset show that the GBT algorithm achieves a very high F1-score (95%). Additionally, the rule-based classifier consists of regular expressions and gazetteers that complement the machine learning classifier. Thereafter, the sensitive information extracted by the two classifiers is merged and encrypted using the Format-Preserving Encryption method.
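The rule-based half of such a system can be sketched with a regular expression for phone numbers plus a gazetteer lookup for known names; the pattern, gazetteer entries, and labels below are illustrative, not the paper's actual rules.

```python
import re

# Hypothetical rule-based extractor: a regular expression for phone
# numbers combined with a gazetteer (lookup list) of person names.
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")
NAME_GAZETTEER = {"alice", "bob"}

def extract_sensitive(text):
    """Return (value, label) pairs found by the rules."""
    found = [(m.group(), "PHONE") for m in PHONE_RE.finditer(text)]
    for token in text.split():
        word = token.strip(".,")
        if word.lower() in NAME_GAZETTEER:
            found.append((word, "PERSON"))
    return found

print(extract_sensitive("Call Alice at 555-123-4567."))
```

In the full system described above, matches like these would be merged with the machine learning classifier's output and then encrypted with a format-preserving scheme.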

2012 · Vol 5s1 · pp. BII.S8963
Author(s):
Wenbo Wang
Lu Chen
Ming Tan
Shaojun Wang
Amit P. Sheth

This paper presents our solution for the i2b2 sentiment classification challenge. Our hybrid system consists of machine learning and rule-based classifiers. For the machine learning classifier, we investigate a variety of lexical, syntactic and knowledge-based features, and show how much these features contribute to the performance of the classifier through experiments. For the rule-based classifier, we propose an algorithm to automatically extract effective syntactic and lexical patterns from training examples. The experimental results show that the rule-based classifier outperforms the baseline machine learning classifier using unigram features. By combining the machine learning classifier and the rule-based classifier, the hybrid system gains a better trade-off between precision and recall, and yields the highest micro-averaged F-measure (0.5038), which is better than the mean (0.4875) and median (0.5027) micro-average F-measures among all participating teams.
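The micro-averaged F-measure used to rank systems pools true positives, false positives, and false negatives across all classes before computing precision and recall; a minimal sketch (with invented per-class counts):

```python
# Micro-averaged F-measure: aggregate counts over all classes first,
# then compute precision, recall, and their harmonic mean.
def micro_f1(per_class_counts):
    tp = sum(c["tp"] for c in per_class_counts)
    fp = sum(c["fp"] for c in per_class_counts)
    fn = sum(c["fn"] for c in per_class_counts)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

counts = [{"tp": 8, "fp": 2, "fn": 4},   # class A (hypothetical)
          {"tp": 5, "fp": 3, "fn": 1}]   # class B (hypothetical)
print(round(micro_f1(counts), 4))
```

Unlike macro-averaging, this weighting favors performance on frequent classes, which is why it is a common choice for imbalanced sentiment categories.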


Literator · 2008 · Vol 29 (1) · pp. 21-42
Author(s):
S. Pilon
M.J. Puttkammer
G.B. Van Huyssteen

The development of a hyphenator and compound analyser for Afrikaans

The development of two core technologies for Afrikaans, viz. a hyphenator and a compound analyser, is described in this article. As no annotated Afrikaans data existed prior to this project to serve as training data for a machine learning classifier, the core technologies in question were first developed using a rule-based approach. The rule-based hyphenator and compound analyser were evaluated; the hyphenator obtained an f-score of 90.84%, while the compound analyser only reached an f-score of 78.20%. Since these results are somewhat disappointing and/or insufficient for practical implementation, it was decided that a machine learning technique (memory-based learning) would be used instead. Training data for each of the two core technologies was then developed using "TurboAnnotate", an interface designed to improve the accuracy and speed of manual annotation. The hyphenator developed using machine learning was trained with 39 943 words and reaches an f-score of 98.11%, while the f-score of the compound analyser is 90.57% after being trained with 77 589 annotated words. It is concluded that machine learning (specifically memory-based learning) seems an appropriate approach for developing core technologies for Afrikaans.
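Memory-based learning, the technique adopted here, stores training instances verbatim and classifies a new instance by its most similar stored example. A toy sketch, assuming character-window features around a candidate hyphen position (the windows and labels below are invented, not the article's data):

```python
# Memory-based (instance-based) classification: keep every training
# example and predict with the label of the best-matching one,
# using simple positional feature overlap as the similarity measure.
def overlap(a, b):
    return sum(x == y for x, y in zip(a, b))

def classify(instance, memory):
    return max(memory, key=lambda ex: overlap(instance, ex[0]))[1]

# Hypothetical character windows around a position: label True means
# a hyphen may be inserted there (e.g. wa-ter).
memory = [
    (("a", "t", "e", "r"), True),
    (("i", "e", "t", "s"), False),
]
print(classify(("a", "t", "e", "n"), memory))
```

Real memory-based learners (e.g. TiMBL-style k-NN) add feature weighting and k > 1 voting, but the store-and-compare core is the same.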


2020 · Vol 14 · pp. e171481
Author(s):
Alexandre Moreira Nascimento
Vinicius Veloso De Melo
Anna Carolina Muller Queiroz
Thomas Brashear-Alejandro
Fernando de Souza Meirelles

The purpose of this study is to develop a predictive model that increases the accuracy of business operational planning using data from a small business. By using Machine Learning (ML) feature expansion, resampling, and combination techniques, it was possible to address several limitations in the existing research. The use of a novel feature-engineering technique then allowed us to increase the accuracy of the model by finding 10 new features, derived from the original ones and constructed automatically from the nonlinear relationships found between them. Finally, we built a rule-based classifier that predicts the store's revenue with high accuracy. The results show that the proposed approach opens new possibilities for ML research applied to small and medium businesses.
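Automatic feature construction from nonlinear relationships can be illustrated by expanding a feature row with interaction and quadratic terms; the feature names and transformations below are illustrative, not the study's actual engineered features.

```python
from itertools import combinations

# Sketch of feature expansion: derive new features from nonlinear
# combinations (products, squares) of the original ones.
def expand_features(row):
    names = sorted(row)
    out = dict(row)
    for a, b in combinations(names, 2):
        out[f"{a}*{b}"] = row[a] * row[b]   # pairwise interaction term
    for a in names:
        out[f"{a}^2"] = row[a] ** 2         # quadratic term
    return out

print(expand_features({"visits": 3.0, "ticket": 20.0}))
```

A downstream learner (or rule-based classifier) can then select whichever derived features actually improve revenue prediction.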


Author(s):
Kazuma Matsumoto
Takato Tatsumi
Hiroyuki Sato
Tim Kovacs
Keiki Takadama
...

Deep learning, which is machine learning with neural networks, has improved the classification accuracy of neural networks, and in some fields their accuracy exceeds that of the human brain. This paper proposes a hybrid system of a neural network and a Learning Classifier System (LCS). An LCS is an evolutionary rule-based machine learning method that uses reinforcement learning. To increase classification accuracy, we combine the neural network and the LCS. This paper reports benchmark experiments conducted to verify the proposed system. The experiments revealed that: 1) the classification accuracy of the proposed system is higher than that of the conventional LCS (XCSR) and a plain neural network; and 2) the covering mechanism of XCSR raises the classification accuracy of the proposed system.
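The covering mechanism highlighted in the findings can be sketched as follows: when no rule in the population matches the current real-valued input, XCSR generates a new interval-based rule that covers it. The interval spread and action set below are illustrative.

```python
import random

# Toy XCSR-style covering: rules are lists of (low, high) intervals
# plus an action; covering creates a rule around an unmatched input.
def matches(rule, x):
    return all(lo <= xi <= hi for (lo, hi), xi in zip(rule["cond"], x))

def cover(x, n_actions=2, spread=0.2):
    cond = [(xi - random.uniform(0, spread), xi + random.uniform(0, spread))
            for xi in x]
    return {"cond": cond, "action": random.randrange(n_actions)}

population = []
x = [0.4, 0.9]
if not any(matches(r, x) for r in population):
    population.append(cover(x))
print(matches(population[0], x))  # the new rule covers x by construction
```

Covering guarantees every input activates at least one rule, which is one plausible reason it raises accuracy when combined with a neural network front end.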


2009 · Vol 2009 · pp. 1-25
Author(s):
Ryan J. Urbanowicz
Jason H. Moore

If complexity is your problem, learning classifier systems (LCSs) may offer a solution. These rule-based, multifaceted, machine learning algorithms originated and have evolved in the cradle of evolutionary biology and artificial intelligence. The LCS concept has inspired a multitude of implementations adapted to manage the different problem domains to which it has been applied (e.g., autonomous robotics, classification, knowledge discovery, and modeling). One field that is taking increasing notice of LCS is epidemiology, where there is a growing demand for powerful tools to facilitate etiological discovery. Unfortunately, implementation optimization is nontrivial, and a cohesive encapsulation of implementation alternatives seems to be lacking. This paper aims to provide an accessible foundation for researchers of different backgrounds interested in selecting or developing their own LCS. Included is a simple yet thorough introduction, a historical review, and a roadmap of algorithmic components, emphasizing differences in alternative LCS implementations.


2019 · Vol 26 (11) · pp. 1247-1254
Author(s):
Michel Oleynik
Amila Kugic
Zdenko Kasáč
Markus Kreuzthaler

Objective: Automated clinical phenotyping is challenging because word-based features quickly turn it into a high-dimensional problem, in which small, privacy-restricted training datasets might lead to overfitting. Pretrained embeddings might solve this issue by reusing input representation schemes trained on a larger dataset. We sought to evaluate shallow and deep learning text classifiers and the impact of pretrained embeddings in a small clinical dataset.

Materials and Methods: We participated in the 2018 National NLP Clinical Challenges (n2c2) Shared Task on cohort selection and received an annotated dataset with medical narratives of 202 patients for multilabel binary text classification. We set our baseline to a majority classifier, to which we compared a rule-based classifier and orthogonal machine learning strategies: support vector machines, logistic regression, and long short-term memory neural networks. We evaluated logistic regression and long short-term memory using both self-trained and pretrained BioWordVec word embeddings as input representation schemes.

Results: The rule-based classifier showed the highest overall micro F1 score (0.9100), with which we finished first in the challenge. Shallow machine learning strategies showed lower overall micro F1 scores, but still higher than deep learning strategies and the baseline. We could not show a difference in classification efficiency between self-trained and pretrained embeddings.

Discussion: Clinical context, negation, and value-based criteria hindered shallow machine learning approaches, while deep learning strategies could not capture term diversity due to the small training dataset.

Conclusion: Shallow methods for clinical phenotyping can still outperform deep learning methods on small imbalanced data, even when supported by pretrained embeddings.
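The majority-classifier baseline for multilabel binary classification simply predicts, for each label, the class most frequent in training. A minimal sketch, with invented label names and data (not the n2c2 criteria):

```python
from collections import Counter

# Majority baseline: per label, memorize the most common training class.
def fit_majority(y_train):
    """y_train: list of dicts mapping label -> 0/1."""
    return {label: Counter(row[label] for row in y_train).most_common(1)[0][0]
            for label in y_train[0]}

train = [{"DIABETES": 1, "ASP-FOR-MI": 0},   # hypothetical annotations
         {"DIABETES": 1, "ASP-FOR-MI": 1},
         {"DIABETES": 0, "ASP-FOR-MI": 0}]
print(fit_majority(train))  # {'DIABETES': 1, 'ASP-FOR-MI': 0}
```

On imbalanced labels such a baseline can score deceptively well, which is why beating it is the minimum bar for the learned and rule-based systems compared above.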


Author(s):
Padmavathi S.
M. Chidambaram

Text classification has become more significant in managing and organizing text data due to the tremendous growth of online information. It classifies documents into a fixed number of predefined categories. The rule-based approach and the machine learning approach are the two ways of performing text classification. In the rule-based approach, documents are classified based on manually defined rules. In the machine learning approach, classification rules or a classifier are learned automatically from example documents; this approach offers higher recall and faster processing. This paper presents an investigation of text classification using different machine learning techniques.
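The rule-based side of the contrast above can be sketched as a keyword classifier with manually written rules; the categories and keyword lists are invented for illustration.

```python
# Manually defined keyword rules: a document is assigned the category
# whose keyword set overlaps its tokens the most (None if no overlap).
RULES = {"sports": {"match", "score", "team"},
         "finance": {"stock", "market", "profit"}}

def rule_classify(doc):
    tokens = set(doc.lower().split())
    best = max(RULES, key=lambda cat: len(RULES[cat] & tokens))
    return best if RULES[best] & tokens else None

print(rule_classify("The team won the match"))  # 'sports'
```

A machine learning classifier would instead induce such keyword weights automatically from labeled example documents, which is what gives it the recall advantage noted above.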

