Why Data Protection And Transparency Are Not Enough When Facing Social Problems Of Machine Learning In A Big Data Context

Biclustering is now one of the most widely used machine learning techniques for discovering local patterns in datasets from areas such as energy consumption, marketing, social networks and bioinformatics. In bioinformatics in particular, Biclustering techniques have become extremely time-consuming and generate a huge number of results, owing to the continuous growth of databases over the last few years. For this reason, validation techniques must be adapted to this new environment so that researchers can focus their efforts on a specific subset of results in an efficient, fast and reliable way. This situation may well be considered a Big Data context. Many machine learning techniques have been implemented with Graphics Processing Unit (GPU) technology and the CUDA architecture to accelerate the processing of large databases. However, as far as we know, this technology has not yet been applied to any bicluster validation technique. In this work, a multi-GPU version of one of the most widely used bicluster validation measures, Mean Squared Residue (MSR), is presented. This version, gMSR, takes advantage of all the hardware and memory resources offered by GPU devices. Because of this, gMSR is able to validate a massive number of biclusters in any Biclustering-based study within a Big Data context.
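For readers unfamiliar with the measure, the following is a minimal sketch of how MSR can be computed for a single bicluster, assuming the usual Cheng and Church definition (the mean of the squared residues a_ij - a_iJ - a_Ij + a_IJ over the bicluster's rows I and columns J). It is a plain, CPU-only Python illustration and not the multi-GPU gMSR implementation described in the abstract; the function name `mean_squared_residue` and the toy matrices are hypothetical.

```python
from typing import Sequence


def mean_squared_residue(
    matrix: Sequence[Sequence[float]],
    rows: Sequence[int],
    cols: Sequence[int],
) -> float:
    """Return the MSR of the bicluster selected by `rows` and `cols`.

    Hypothetical helper for illustration only; assumes the Cheng and Church
    definition of the residue: a_ij - rowmean_i - colmean_j + overall_mean.
    """
    n_rows, n_cols = len(rows), len(cols)
    if n_rows == 0 or n_cols == 0:
        raise ValueError("a bicluster needs at least one row and one column")

    # Extract the submatrix that forms the bicluster.
    sub = [[float(matrix[i][j]) for j in cols] for i in rows]

    # Row means, column means and overall mean of the submatrix.
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows

    # Average the squared residues over every cell of the bicluster.
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue * residue
    return total / (n_rows * n_cols)


# Toy example: a checkerboard-like 2x2 block is not additively coherent.
print(mean_squared_residue([[1, 0], [0, 1]], [0, 1], [0, 1]))  # 0.25
```

A lower MSR indicates a more coherent bicluster; a perfectly additive pattern (each row equal to another row plus a constant) gives a value of zero. Each bicluster is scored independently of the others, which is what makes validating a very large number of them a natural candidate for parallel hardware.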

SocialTERM-Extractor: Identifying and Predicting Social-Problem-Specific Key Noun Terms from a Large Number of Online News Articles Using Text Mining and Machine Learning Techniques

Sustainability ◽ 10.3390/su11010196 ◽ 2019 ◽ Vol 11 (1) ◽ pp. 196 ◽ Cited By ~ 3

Author(s): Jong Hwan Suh

Keyword(s): Machine Learning ◽ Big Data ◽ Text Mining ◽ Social Problem ◽ Social Problems ◽ Online News ◽ Machine Learning Techniques ◽ The Internet ◽ Related Data ◽ Learning Techniques

In the digital age, the abundant unstructured data on the Internet, particularly online news articles, provide opportunities for identifying social problems and understanding social systems for sustainability. However, previous work has not paid attention to the social-problem-specific perspectives of such big data, and it is currently unclear how information technologies can use this big data to identify and manage ongoing social problems. In this context, this paper introduces and focuses on social-problem-specific key noun terms, namely SocialTERMs, which can be used not only to search the Internet for social-problem-related data, but also to monitor ongoing and future events related to social problems. Moreover, to reduce the time-consuming human effort of identifying SocialTERMs, this paper designs and examines the SocialTERM-Extractor, an automatic approach for identifying the key noun terms of social-problem-related topics, namely SPRTs, in a large number of online news articles and predicting the SocialTERMs among the identified key noun terms. The paper is novel as the first attempt to identify and predict SocialTERMs from a large number of online news articles. It contributes to the literature by proposing three types of text-mining-based features, namely temporal weight, sentiment, and complex network structural features, and by comparing the performance of these features with various machine learning techniques, including deep learning. In particular, when the approach was applied to a large number of online news articles that had been published in South Korea over a 12-month period and were mostly written in Korean, the experimental results showed that Boosting Decision Tree gave the best performance with the full feature sets. The results also showed that SocialTERMs can be predicted with high performance by the proposed SocialTERM-Extractor. Ultimately, this paper can benefit individuals or organizations who want to explore and use social-problem-related data in a systematic manner for understanding and managing social problems, even if they are unfamiliar with ongoing social problems.

Adaptation of Classical Machine Learning Algorithms to Big Data Context: Problems and Challenges : Case Study: Hidden Markov Models Under Spark

2019 1st International Conference on Smart Systems and Data Science (ICSSD) ◽ 10.1109/icssd47982.2019.9002857 ◽ 2019

Author(s): Imad SASSI ◽ Sara OUAFTOUH ◽ Samir ANTER

Keyword(s): Machine Learning ◽ Big Data ◽ Hidden Markov Models ◽ Markov Models ◽ Hidden Markov ◽ Learning Algorithms ◽ Machine Learning Algorithms ◽ Data Context

Trending Algorithms in Machine Learning and issues along with Big data Context in real time Data Processing

International Journal of Emerging Trends in Engineering Research ◽ 10.30534/ijeter/2020/138892020 ◽ 2020 ◽ Vol 8 (9) ◽ pp. 5760-5763

Keyword(s): Machine Learning ◽ Big Data ◽ Data Processing ◽ Real Time ◽ Time Data ◽ Real Time Data ◽ Real Time Data Processing ◽ Data Context

Distortion in Real-World Analytic Processes

10.31219/osf.io/psz9y ◽ 2019

Author(s): Peter Kieseberg ◽ Lukas Daniel Klausner ◽ Andreas Holzinger

Keyword(s): Machine Learning ◽ Big Data ◽ Learning Environments ◽ Real World ◽ Data Protection ◽ Privacy Protection ◽ Data Analytics ◽ Big Data Analytics ◽ General Data Protection Regulation ◽ General Data

In discussions of the General Data Protection Regulation (GDPR), anonymisation and deletion are frequently mentioned as suitable technical and organisational measures (TOMs) for privacy protection. The major problem of distortion in machine learning environments, along with related privacy issues, is rarely mentioned. The Big Data Analytics project addresses these issues.
