Why Data Protection And Transparency Are Not Enough When Facing Social Problems Of Machine Learning In A Big Data Context

2019 ◽  
pp. 42-45
Electronics ◽  
2020 ◽  
Vol 9 (11) ◽  
pp. 1782
Author(s):  
Aurelio López-Fernández ◽  
Domingo S. Rodríguez-Baena ◽  
Francisco Gómez-Vela

Nowadays, Biclustering is one of the most widely used machine learning techniques for discovering local patterns in datasets from areas such as energy consumption, marketing, social networks and bioinformatics. In bioinformatics in particular, Biclustering techniques have become extremely time-consuming, and the number of results they generate has become huge, owing to the continuous increase in database size over the last few years. For this reason, validation techniques must be adapted to this new environment to help researchers focus their efforts on a specific subset of results in an efficient, fast and reliable way. This situation may well be considered a Big Data context. In this sense, many machine learning techniques have been implemented with Graphics Processing Unit (GPU) technology and the CUDA architecture to accelerate the processing of large databases. However, as far as we know, this technology has not yet been applied to any bicluster validation technique. In this work, a multi-GPU version of one of the most used bicluster validation measures, Mean Squared Residue (MSR), is presented. This version, gMSR, takes advantage of all the hardware and memory resources offered by GPU devices. Because of this, gMSR is able to validate a massive number of biclusters in any Biclustering-based study within a Big Data context.
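
For reference, MSR is commonly defined (following Cheng and Church) as the mean, over all cells of a bicluster, of the squared residue a_ij - a_iJ - a_Ij + a_IJ, where a_iJ is the row mean, a_Ij the column mean and a_IJ the overall mean of the bicluster. The sketch below is a plain, single-threaded Python illustration of that definition only. It is not the multi-GPU gMSR implementation, and the function name and list-of-lists input are assumptions made for illustration.

```python
def mean_squared_residue(matrix, rows, cols):
    """Return the MSR of the bicluster selected by `rows` and `cols`.

    `matrix` is a list of equal-length lists of numbers; `rows` and `cols`
    are lists of indices. These names are hypothetical, not taken from gMSR.
    """
    n_rows, n_cols = len(rows), len(cols)
    if n_rows == 0 or n_cols == 0:
        raise ValueError("a bicluster needs at least one row and one column")

    # Extract the submatrix that forms the bicluster.
    sub = [[matrix[i][j] for j in cols] for i in rows]

    # Row means (a_iJ), column means (a_Ij) and overall mean (a_IJ).
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows

    # Average the squared residues over every cell of the bicluster.
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue * residue
    return total / (n_rows * n_cols)
```

A bicluster whose rows differ only by a constant shift has an MSR of zero, so lower values indicate a more coherent bicluster. Each bicluster is scored independently of the others, which is what makes validating a large batch of them a natural fit for parallel hardware.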


2019 ◽  
Vol 11 (1) ◽  
pp. 196 ◽  
Author(s):  
Jong Hwan Suh

In the digital age, the abundant unstructured data on the Internet, particularly online news articles, provide opportunities for identifying social problems and understanding social systems for sustainability. However, previous works have not paid attention to the social-problem-specific perspectives of such big data, and it is currently unclear how information technologies can use big data to identify and manage ongoing social problems. In this context, this paper introduces and focuses on social-problem-specific key noun terms, namely SocialTERMs, which can be used not only to search the Internet for social-problem-related data, but also to monitor ongoing and future events related to social problems. Moreover, to reduce the time-consuming human effort needed to identify the SocialTERMs, this paper designs and examines the SocialTERM-Extractor, an automatic approach for identifying the key noun terms of social-problem-related topics, namely SPRTs, in a large number of online news articles and predicting the SocialTERMs among the identified key noun terms. The paper is novel as the first attempt to identify and predict the SocialTERMs from a large number of online news articles. It contributes to the literature by proposing three types of text-mining-based features, namely temporal weight, sentiment, and complex network structural features, and by comparing the performance of these features across various machine learning techniques, including deep learning. In particular, when applied to a large number of online news articles published in South Korea over a 12-month period and mostly written in Korean, the experimental results showed that Boosting Decision Tree gave the best performance with the full feature sets. They also showed that the SocialTERMs can be predicted with high performance by the proposed SocialTERM-Extractor. Ultimately, this paper can benefit individuals or organizations who want to explore and use social-problem-related data in a systematic manner to understand and manage social problems, even if they are unfamiliar with ongoing social problems.


2019 ◽  
Author(s):  
Peter Kieseberg ◽  
Lukas Daniel Klausner ◽  
Andreas Holzinger

In discussions on the General Data Protection Regulation (GDPR), anonymisation and deletion are frequently mentioned as suitable technical and organisational methods (TOMs) for privacy protection. The major problem of distortion in machine learning environments and the related privacy issues are rarely mentioned. The Big Data Analytics project addresses these issues.

