Big Data Analysis Based on Machine Learning Techniques

In recent years, big data has become an emerging trend that is increasingly attracting the attention of the R&D community in several fields (e.g., image processing, database engineering, data mining, artificial intelligence). Marine data is one of the fields accommodating this growth, hence the appearance of the marine big data paradigm, which advocates monitoring and assessing human impact on marine data. Nonetheless, support for acoustic sound classification is missing in such environments, particularly support that takes into account the diversity of such data (i.e., sounds of living undersea species, sounds of human activities, and sounds of environmental effects). To overcome this issue, we propose in this paper an approach that efficiently classifies diverse acoustic data using machine learning techniques. The aim is to provide automated support for marine big data analysis. We conducted a set of experiments on a real marine dataset to validate our approach and show its effectiveness and efficiency. To do so, three machine learning techniques are employed: (i) classic machine learning models (i.e., k-nearest neighbor and support vector machine), (ii) deep learning based on convolutional neural networks, and (iii) transfer learning based on the reuse of pretrained models.

Big Data Analysis for Trend Recognition Using Machine Learning Techniques

International Journal of Sensors Wireless Communications and Control ◽

10.2174/2210327910666200304141238 ◽

2020 ◽

Vol 10 (4) ◽

pp. 540-550

Author(s):

Cerene Mariam Abraham ◽

Mannathazhathu Sudheep Elayidom ◽

Thankappan Santhanakrishnan

Keyword(s):

Machine Learning ◽

Data Mining ◽

Big Data ◽

Data Analysis ◽

Data Analytics ◽

Research Work ◽

Big Data Analytics ◽

Big Data Analysis ◽

Machine Learning Techniques ◽

Derivative Market

Background: Machine learning is one of the most popular research areas today. It relates closely to the field of data mining, which extracts information and trends from large datasets. Aims: The objective of this paper is to (a) illustrate big data analytics for the Indian derivative market and (b) identify trends in the data. Methods: Based on input from experts in the equity domain, the data are verified statistically using data mining techniques. Specifically, ten years of daily derivative data are used for training and testing purposes. The methods adopted for this research include model generation using ARIMA and the Hadoop framework, which comprises mapping and reducing, for big data analysis. Results: The results of this work are the observation of a trend that indicates the rise and fall of price in derivatives, the generation of a time-series similarity graph, and the plotting of the frequency of temporal data. Conclusion: Big data analytics is an underexplored topic in the Indian derivative market, and the results from this paper can be used by investors to earn both short-term and long-term benefits.
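The "mapping and reducing" step mentioned in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' Hadoop job: the record layout (a date string plus a daily price change), the function names `map_phase` and `reduce_phase`, and the sample values are all assumptions made for illustration. The map step emits one key-value pair per trading day, and the reduce step sums the counts per key, giving a frequency of rises and falls per year.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Key = Tuple[str, str]  # (year, direction); a hypothetical key layout


def map_phase(records: Iterable[Tuple[str, float]]) -> Iterator[Tuple[Key, int]]:
    """Emit ((year, direction), 1) for each (date, price_change) record."""
    for date, change in records:
        if change > 0:
            direction = "rise"
        elif change < 0:
            direction = "fall"
        else:
            direction = "flat"
        yield (date[:4], direction), 1


def reduce_phase(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, int]:
    """Sum the emitted counts for each key."""
    counts: Dict[Key, int] = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


# Made-up sample records, not data from the paper.
sample = [
    ("2019-01-02", 1.5),
    ("2019-01-03", -0.4),
    ("2020-01-02", 2.0),
    ("2020-01-03", 0.7),
]
frequencies = reduce_phase(map_phase(sample))
print(frequencies)
```

In a real Hadoop deployment the two phases would run as distributed tasks over partitioned input; the sketch only shows the shape of the computation.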

An Empirical Comparison of Six Supervised Machine Learning Techniques on Spark Platform for Health Big Data

Smart Intelligent Computing and Applications - Smart Innovation, Systems and Technologies ◽

10.1007/978-981-13-1927-3_32 ◽

2018 ◽

pp. 299-307

Author(s):

Gayathri Nagarajan ◽

L. D. Dhinesh Babu

Keyword(s):

Machine Learning ◽

Big Data ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Empirical Comparison ◽

Learning Techniques

Big Data Classification and Internet of Things in Healthcare

International Journal of E-Health and Medical Communications ◽

10.4018/ijehmc.2020040102 ◽

2020 ◽

Vol 11 (2) ◽

pp. 20-37 ◽

Cited By ~ 1

Author(s):

Amine Rghioui ◽

Jaime Lloret ◽

Abedlmajid Oumnad

Keyword(s):

Machine Learning ◽

Big Data ◽

Data Analysis ◽

Big Data Analytics ◽

Data Classification ◽

Big Data Analysis ◽

Machine Intelligence ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Daunting Task

Every single day, a massive amount of data is generated by different medical data sources. Processing this wealth of data is indeed a daunting task, and it forces us to adopt smart and scalable computational strategies, including machine intelligence, big data analytics, and data classification. Big data analysis can support effective decision making in the healthcare domain using existing machine learning algorithms with some modifications. The fundamental purpose of this article is to summarize the role of big data analysis in healthcare and to provide a comprehensive analysis of the various techniques involved in mining big data. This article provides an overview of big data, its applicability in healthcare, some of the work in progress, and future work. Accordingly, this article proposes the use of machine learning techniques for real-time analysis of diabetic patient data from IoT devices and gateways.

Analysed potential of big data and supervised machine learning techniques in effectively forecasting travel times from fused data

PROMET - Traffic&Transportation ◽

10.7307/ptt.v27i6.1762 ◽

2015 ◽

Vol 27 (6) ◽

pp. 515-528 ◽

Cited By ~ 3

Author(s):

Ivana Šemanjski

Keyword(s):

Machine Learning ◽

Big Data ◽

Random Forest ◽

Data Sources ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Travel Times ◽

The Road ◽

Learning Techniques ◽

Nearest Neighbours

Travel time forecasting is an interesting topic for many ITS services. The increased availability of data collection sensors increases the availability of predictor variables but also highlights the processing challenges that come with this big data availability. In this paper we aimed to analyse the potential of big data and supervised machine learning techniques in effectively forecasting travel times. For this purpose we used fused data from three data sources (Global Positioning System vehicle tracks, road network infrastructure data and meteorological data) and four machine learning techniques (k-nearest neighbours, support vector machines, boosting trees and random forest). To evaluate the forecasting results we compared them across different road classes in terms of absolute values, measured in minutes, and the mean squared percentage error. For the road classes with high average speeds and long road segments, machine learning techniques forecasted travel times with small relative error, while for the road classes with low average speeds and short segment lengths this was a more demanding task. All three data sources proved to have a high impact on the travel time forecast accuracy, and the best results (taking into account all road classes) were achieved with the k-nearest neighbours and random forest techniques.
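The two evaluation measures named in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the paper's evaluation code: the mean squared percentage error is taken here as the mean of squared relative errors expressed as a fraction (definitions differ in scaling), the absolute measure is taken as the mean absolute error in minutes, and the travel times are made-up values.

```python
from typing import Sequence


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute forecast error, in the unit of the inputs (minutes here)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_percentage_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of squared relative errors, as a fraction (assumed definition)."""
    return sum(((a - p) / a) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical travel times in minutes: long segments vs. short segments.
long_actual, long_predicted = [20.0, 30.0], [21.0, 29.0]
short_actual, short_predicted = [2.0, 3.0], [3.0, 2.0]

mae_long = mean_absolute_error(long_actual, long_predicted)
mae_short = mean_absolute_error(short_actual, short_predicted)
mspe_long = mean_squared_percentage_error(long_actual, long_predicted)
mspe_short = mean_squared_percentage_error(short_actual, short_predicted)

print(f"long segments:  MAE={mae_long:.2f} min, MSPE={mspe_long:.4f}")
print(f"short segments: MAE={mae_short:.2f} min, MSPE={mspe_short:.4f}")
```

With these sample values both road classes have the same absolute error of one minute, yet the relative error is far larger on the short segments, which is consistent with the abstract's remark that short, slow road classes are the more demanding case.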

Big Data Classification and Internet of Things in Healthcare

10.4018/978-1-6684-3662-2.ch071 ◽

2022 ◽

pp. 1458-1476

Author(s):

Amine Rghioui ◽

Jaime Lloret ◽

Abedlmajid Oumnad

Keyword(s):

Machine Learning ◽

Big Data ◽

Data Analysis ◽

Data Classification ◽

Big Data Analysis ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Big Data Classification ◽

Iot Devices ◽

Effective Decision Making

Every single day, a massive amount of data is generated by different medical data sources. Processing this wealth of data is indeed a daunting task, and it forces us to adopt smart and scalable computational strategies, including machine intelligence, big data analytics, and data classification. Big data analysis can support effective decision making in the healthcare domain using existing machine learning algorithms with some modifications. The fundamental purpose of this article is to summarize the role of big data analysis in healthcare and to provide a comprehensive analysis of the various techniques involved in mining big data. This article provides an overview of big data, its applicability in healthcare, some of the work in progress, and future work. Accordingly, this article proposes the use of machine learning techniques for real-time analysis of diabetic patient data from IoT devices and gateways.

A Study on Tourism Mobile Web Application based on Big Data Analysis Platform for the South of Thailand

2018 22nd International Computer Science and Engineering Conference (ICSEC) ◽

10.1109/icsec.2018.8712646 ◽

2018 ◽

Author(s):

Mallika Subongkod ◽

Sarun Duangsuwan ◽

Punyawi Jamjareegulgarn

Keyword(s):

Big Data ◽

Data Analysis ◽

Web Application ◽

Big Data Analysis ◽

The South ◽

Mobile Web ◽

Analysis Platform

Bipolar Disorder and Oxidative Stress Injury Mechanism - Clinical Big Data Analysis Based on Machine Learning

Case Medical Research ◽

10.31525/ct1-nct03949218 ◽

2019 ◽

Author(s):

Keyword(s):

Oxidative Stress ◽

Machine Learning ◽

Bipolar Disorder ◽

Big Data ◽

Data Analysis ◽

Big Data Analysis ◽

Injury Mechanism ◽

Stress Injury ◽

Oxidative Stress Injury ◽

And Oxidative Stress

Application of Machine Learning Techniques to Predict Binding Affinity for Drug Targets: A Study of Cyclin-Dependent Kinase 2

Current Medicinal Chemistry ◽

10.2174/2213275912666191102162959 ◽

2020 ◽

Vol 28 (2) ◽

pp. 253-265 ◽

Cited By ~ 3

Author(s):

Gabriela Bitencourt-Ferreira ◽

Amauri Duarte da Silva ◽

Walter Filgueira de Azevedo

Keyword(s):

Machine Learning ◽

Binding Affinity ◽

Predictive Performance ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Scoring Functions ◽

Cyclin Dependent Kinase ◽

Learning Models ◽

Learning Techniques ◽

Machine Learning Models

Background: The elucidation of the structure of cyclin-dependent kinase 2 (CDK2) made it possible to develop targeted scoring functions for virtual screening aimed at identifying new inhibitors for this enzyme. CDK2 is a protein target for the development of drugs intended to modulate cell-cycle progression and control. Such drugs have potential anticancer activities. Objective: Our goal here is to review recent applications of machine learning methods to predict ligand-binding affinity for protein targets. To assess the predictive performance of classical scoring functions and targeted scoring functions, we focused our analysis on CDK2 structures. Methods: We have experimental structural data for hundreds of binary complexes of CDK2 with different ligands, many of them with inhibition constant information. We investigate here computational methods to calculate the binding affinity of CDK2 through classical scoring functions and machine-learning models. Results: Analysis of the predictive performance of classical scoring functions available in docking programs such as Molegro Virtual Docker, AutoDock4, and AutoDock Vina indicated that these methods failed to predict binding affinity with significant correlation with experimental data. Targeted scoring functions developed through supervised machine learning techniques showed a significant correlation with experimental data. Conclusion: Here, we described the application of supervised machine learning techniques to generate a scoring function to predict binding affinity. Machine learning models showed superior predictive performance when compared with classical scoring functions. Analysis of the computational models obtained through machine learning could capture essential structural features responsible for binding affinity against CDK2.

Local mortality estimates during the COVID-19 pandemic in Italy

Journal of Population Economics ◽

10.1007/s00148-021-00857-y ◽

2021 ◽

Author(s):

Augusto Cerqua ◽

Roberta Di Stefano ◽

Marco Letta ◽

Sara Miccoli

Keyword(s):

Machine Learning ◽

Excess Mortality ◽

Control Method ◽

Local Level ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Mortality Data ◽

Official Method ◽

Learning Techniques ◽

Mortality Estimates

Estimates of the real death toll of the COVID-19 pandemic have proven to be problematic in many countries, Italy being no exception. Mortality estimates at the local level are even more uncertain as they require stringent conditions, such as granularity and accuracy of the data at hand, which are rarely met. The “official” approach adopted by public institutions to estimate the “excess mortality” during the pandemic draws on a comparison between observed all-cause mortality data for 2020 and averages of mortality figures in the past years for the same period. In this paper, we apply the recently developed machine learning control method to build a more realistic counterfactual scenario of mortality in the absence of COVID-19. We demonstrate that supervised machine learning techniques outperform the official method by substantially improving the prediction accuracy of the local mortality in “ordinary” years, especially in small- and medium-sized municipalities. We then apply the best-performing algorithms to derive estimates of local excess mortality for the period between February and September 2020. Such estimates allow us to provide insights about the demographic evolution of the first wave of the pandemic throughout the country. To help improve diagnostic and monitoring efforts, our dataset is freely available to the research community.
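The "official" baseline described in the abstract, which compares observed deaths with the average of past years for the same period, can be written as a small sketch. The function name `excess_mortality`, the five-year window, and all figures are hypothetical and serve only to make the arithmetic concrete. The machine learning control method proposed in the paper replaces the simple average with a model-based counterfactual and is not shown here.

```python
from typing import Sequence


def excess_mortality(observed_deaths: float, past_years_deaths: Sequence[float]) -> float:
    """Observed deaths minus the mean of the same-period deaths in past years."""
    baseline = sum(past_years_deaths) / len(past_years_deaths)
    return observed_deaths - baseline


# Made-up figures for one municipality, same calendar period in each year.
past = [100, 90, 110, 95, 105]   # hypothetical five preceding years
observed_2020 = 130              # hypothetical observed deaths in 2020

excess = excess_mortality(observed_2020, past)
print(f"baseline={sum(past) / len(past):.1f}, excess={excess:.1f}")
```

For small municipalities the past-year average rests on few deaths and can fluctuate strongly from year to year, which is one reason the abstract gives for preferring a learned counterfactual at the local level.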
