De-Identification of Unstructured Textual Data using Artificial Immune System for Privacy Preserving
The development of new technologies has led the world into a tipping point. One of these technologies is the big data which made the revolution of computer sciences. Big data has come with new challenges. These challenges can be resumed in the aim of creating scalable and efficient services that can treat huge amounts of heterogeneous data in small scale of time while preserving users' privacy. Textual data occupy a wide space in internet. These data could contain information that can lead to identify users. For that, the development of such approaches that can detect and remove any identifiable information has become a critical research area known as de-identification. This paper tackle the problem of privacy in textual data. The authors' proposed approach consists of using artificial immune systems and MapReduce to detect and hide identifiable words with no matter on their variants using the personnel information of the user from his profile. After many experiments, the system shows a high efficiency in term of number of detected words, the way they are hided with, and time of execution.