On the Impact of Distance Metrics in Instance-Based Learning Algorithms

The motion picture industry is one of the largest industries worldwide and has significant importance in the global economy. Considering the high stakes and high risks in the industry, forecast models and decision support systems are gaining importance. Several attempts have been made to estimate the theatrical performance of a movie before or at the early stages of its release. Nevertheless, these models are mostly used for predicting domestic performance, and the industry still struggles to predict box office performance in overseas markets. In this study, the aim is to design a forecast model using different machine learning algorithms to estimate the theatrical success of US movies in Turkey. A dataset of 1559 movies is constructed from various sources. First, independent variables are grouped as pre-release, distributor type, and international distribution based on their characteristics. The number of attendances is discretized into three classes. Four popular machine learning algorithms (artificial neural networks, decision tree regression, gradient boosting trees and random forests) are employed, and the impact of each group is observed by comparing the performance of the models. Then the number of target classes is increased to five and eight, and the results are compared with previously developed models in the literature.

Audio-Based Drone Detection and Identification Using Deep Learning Techniques with Dataset Enhancement through Generative Adversarial Networks

Sensors ◽

10.3390/s21154953 ◽

2021 ◽

Vol 21 (15) ◽

pp. 4953

Author(s):

Sara Al-Emadi ◽

Abdulla Al-Ali ◽

Abdulaziz Al-Ali

Keyword(s):

Neural Network ◽

Deep Learning ◽

Recurrent Neural Network ◽

Learning Algorithms ◽

Generative Adversarial Networks ◽

Generative Adversarial Network ◽

Adversarial Networks ◽

Detection And Identification ◽

Learning Techniques ◽

The Impact

Drones are becoming increasingly popular not only for recreational purposes but also in day-to-day applications in engineering, medicine, logistics, security and other fields. Alongside these useful applications, an alarming concern regarding physical infrastructure security, safety and privacy has arisen due to their potential use in malicious activities. To address this problem, we propose a novel solution that automates the drone detection and identification processes using a drone’s acoustic features with different deep learning algorithms. However, the lack of acoustic drone datasets hinders the ability to implement an effective solution. In this paper, we aim to fill this gap by introducing a hybrid drone acoustic dataset composed of recorded drone audio clips and artificially generated drone audio samples using a state-of-the-art deep learning technique known as the Generative Adversarial Network. Furthermore, we examine the effectiveness of using drone audio with different deep learning algorithms, namely, the Convolutional Neural Network, the Recurrent Neural Network and the Convolutional Recurrent Neural Network, in drone detection and identification. Moreover, we investigate the impact of our proposed hybrid dataset on drone detection. Our findings prove the advantage of using deep learning techniques for drone detection and identification while confirming our hypothesis on the benefits of using Generative Adversarial Networks to generate realistic drone audio clips with the aim of enhancing the detection of new and unfamiliar drones.

Analyze the impact of the epidemic on New York taxis by machine learning algorithms and recommendations for optimal prediction algorithms

10.1145/3475851.3475861 ◽

2021 ◽

Author(s):

Zheng Liu ◽

Xinjing Xia ◽

Haipeng Zhang ◽

Zihui Xie

Keyword(s):

Machine Learning ◽

New York ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Optimal Prediction ◽

Prediction Algorithms ◽

The Impact

Efficient Incremental Instance-based Learning Algorithms for Open World Malware Classification

10.1109/atc52653.2021.9598272 ◽

2021 ◽

Author(s):

Kien Hoang Dang ◽

Dai Tho Nguyen ◽

Thu Trang Nguyen Thi

Keyword(s):

Learning Algorithms ◽

Malware Classification ◽

Open World ◽

Instance Based Learning

An analytical survey on the role of machine learning algorithms in case of intrusion detection

ACCENTS Transactions on Information Security ◽

10.19101/tis.2020.517002 ◽

2020 ◽

Vol 5 (19) ◽

pp. 32-35

Author(s):

Anand Vijay ◽

Kailash Patidar ◽

Manoj Yadav ◽

Rishi Kushwah

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Intrusion Detection ◽

Intrusion Detection System ◽

Detection System ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Handling Mechanism ◽

The Impact

This paper presents and discusses an analytical survey on the role of machine learning algorithms in intrusion detection. It covers the analytical aspects of developing an efficient intrusion detection system (IDS). Related work on the development of such systems is presented in terms of computational methods, namely data mining, artificial intelligence and machine learning, and is discussed along with attack parameters and attack types. The paper also elaborates on the impact of different attacks and handling mechanisms, based on previous papers.

Teleconsultations between Patients and Healthcare Professionals in Primary Care in Catalonia: the Evaluation of Text Classification Algorithms Using Machine Learning

10.20944/preprints201912.0220.v1 ◽

2019 ◽

Author(s):

Francesc López Seguí ◽

Ricardo Ander Egg Aguilar ◽

Gabriel de Maeztu ◽

Anna García-Altés ◽

Francesc García Cuyàs ◽

...

Keyword(s):

Machine Learning ◽

Primary Care ◽

Text Classification ◽

Learning Strategy ◽

Care Service ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Face To Face ◽

Classification Tool ◽

The Impact

Background: the primary care service in Catalonia has operated an asynchronous teleconsulting service between GPs and patients since 2015 (eConsulta), which has generated some 500,000 messages. New developments in big data analysis tools, particularly those involving natural language, can be used to accurately and systematically evaluate the impact of the service. Objective: the study was intended to examine the predictive potential of eConsulta messages through different combinations of vector representation of text and machine learning algorithms and to evaluate their performance. Methodology: 20 machine learning algorithms (based on 5 types of algorithms and 4 text representation techniques) were trained using a sample of 3,559 messages (169,102 words) corresponding to 2,268 teleconsultations (1.57 messages per teleconsultation) in order to predict the three variables of interest (avoiding the need for a face-to-face visit, increased demand and type of use of the teleconsultation). The performance of the various combinations was measured in terms of precision, sensitivity, F-value and the ROC curve. Results: the best-trained algorithms are generally effective, proving themselves to be more robust when approximating the two binary variables "avoiding the need of a face-to-face visit" and "increased demand" (precision = 0.98 and 0.97, respectively) rather than the variable "type of query" (precision = 0.48). Conclusion: to the best of our knowledge, this study is the first to investigate a machine learning strategy for text classification using primary care teleconsultation datasets. The study illustrates the possible capacities of text analysis using artificial intelligence. The development of a robust text classification tool could be feasible by validating it with more data, making it potentially more useful for decision support for health professionals.
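
For readers unfamiliar with the evaluation measures named in this abstract, the following is a minimal sketch of how precision, sensitivity and F-value are derived from confusion-matrix counts for a binary variable. The function name and the counts are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fp, fn):
    """Return (precision, sensitivity, f_value) from confusion counts.

    tp: true positives, fp: false positives, fn: false negatives.
    Each ratio falls back to 0.0 when its denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + sensitivity
    # F-value: harmonic mean of precision and sensitivity.
    f_value = 2 * precision * sensitivity / denom if denom else 0.0
    return precision, sensitivity, f_value


# Hypothetical counts, for illustration only.
precision, sensitivity, f_value = binary_metrics(tp=90, fp=10, fn=30)
print(precision, sensitivity, f_value)
```

With these made-up counts, precision is 90 / 100 and sensitivity is 90 / 120; the F-value lies between the two.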

Comparison of Machine Learning Algorithms in the Interpolation and Extrapolation of Flame Describing Functions

Volume 4B: Combustion, Fuels, and Emissions ◽

10.1115/gt2019-91319 ◽

2019 ◽

Author(s):

Michael McCartney ◽

Matthias Haeringer ◽

Wolfgang Polifke

Keyword(s):

Machine Learning ◽

Gaussian Processes ◽

Spline Interpolation ◽

Learning Algorithms ◽

Predictive Performance ◽

Machine Learning Algorithms ◽

Test Time ◽

Minimal Amount ◽

Data Points ◽

The Impact

This paper examines and compares the performance of commonly used machine learning algorithms in the interpolation and extrapolation of FDFs, based on experimental and simulation data. Algorithm performance is evaluated by interpolating and extrapolating FDFs, and the impact of errors on the limit cycle amplitudes is then evaluated using the xFDF framework. The best algorithms in interpolation and extrapolation were found to be the widely used cubic spline interpolation and the Gaussian Processes regressor. The data itself was found to be an important factor in defining the predictive performance of a model; therefore, a method of optimally selecting data points at test time using Gaussian Processes was demonstrated. The aim of this method is to allow a minimal number of data points to be collected while still providing enough information to model the FDF accurately. The extrapolation performance was shown to decay very quickly with distance from the domain, so emphasis should be put on selecting measurement points that expand the covered domain. Gaussian Processes also give an indication of confidence in their predictions and are used to carry out uncertainty quantification in order to understand model sensitivities. This was demonstrated through application to the xFDF framework.

The impact of Negative to Positive Training Dataset Ratio on Atrial Fibrillation Classification Machine Learning Algorithms Performance

Journal of Physics Conference Series ◽

10.1088/1742-6596/1500/1/012131 ◽

2020 ◽

Vol 1500 ◽

pp. 012131

Author(s):

Firdaus ◽

Andre Herviant Juliano ◽

Naufal Rachmatullah ◽

Sarifah Putri Rafflesia ◽

Dinna Yunika Hardiyanti ◽

...

Keyword(s):

Machine Learning ◽

Atrial Fibrillation ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Training Dataset ◽

The Impact

To Enhance the Impact of Deep Learning-Based Algorithms in Determining the Behavior of an Individual based on Communication on Social Media

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.l3841.1081219 ◽

2019 ◽

Vol 8 (12) ◽

pp. 4433-4435

Keyword(s):

Social Media ◽

Deep Learning ◽

Sentiment Analysis ◽

Social Networking Sites ◽

Opinion Mining ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Social Media Text ◽

The Impact

In this digitized world, the Internet has become a prominent source for gleaning various kinds of information. Today, people prefer virtual reality to one-to-one communication. The majority of the population prefers social networking sites to express themselves through posts, blogs, comments, likes and dislikes. Their sentiments can be traced using opinion mining, or sentiment analysis. Sentiment analysis of social media text is a useful technique for identifying people's positive, negative or neutral emotions, sentiments and opinions. It has gained special attention from researchers over the last few years. Traditionally, many machine learning algorithms have been used to implement it, such as Naive Bayes, Support Vector Machine and many more. To overcome the drawbacks of ML in terms of complex classification algorithms, different deep learning-based algorithms have been introduced, such as CNN, RNN and HNN. In this paper, we study different deep learning algorithms and intend to propose a deep learning-based model to analyze the behavior of an individual using social media text. The results given by the proposed model can be utilized in a range of fields such as business, education, industry, politics, psychology and security.

Analysis of the Threshold Variation of the FlexCon-C Algorithm for Semi-supervised Learning

10.5753/eniac.2018.4466 ◽

2018 ◽

Author(s):

Arthur C. Gorgônio ◽

Cainan T. Alves ◽

Amarildo J. F. Lucena ◽

Flavius L. Gorgônio ◽

Karliane M. O. Vale ◽

...

Keyword(s):

Supervised Learning ◽

Learning Algorithms ◽

High Confidence ◽

Supervised Learning Algorithms ◽

Threshold Variation ◽

The Impact

Semi-supervised learning algorithms are able to train classifiers from a small portion of initially labeled objects. The reliability of the classification process depends on several factors, including the type of classifier used and the set of parameters that customize it. One of the most important factors is a threshold that determines which instances are included per iteration, so that only instances with high confidence values are labeled. This article analyzes different values for the variation factor of the FlexCon-C algorithm and measures the impact of this change on its accuracy. The results consider thirty different databases, four classifiers and five different percentages of pre-labeled data.
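
The role of the confidence threshold described in this abstract can be illustrated with a generic self-training selection step. The sketch below is not the FlexCon-C algorithm and does not vary the threshold between iterations; it assumes a fixed, hypothetical threshold and made-up class probabilities, and the function name is an illustration only.

```python
def select_confident(probabilities, threshold):
    """Return (index, predicted_label) pairs for the instances whose
    highest class probability is at least `threshold`."""
    selected = []
    for i, probs in enumerate(probabilities):
        # Predicted label is the index of the largest class probability.
        label = max(range(len(probs)), key=probs.__getitem__)
        if probs[label] >= threshold:
            selected.append((i, label))
    return selected


# Made-up class probabilities for four unlabeled instances (two classes).
unlabeled_probs = [
    [0.97, 0.03],
    [0.55, 0.45],
    [0.10, 0.90],
    [0.02, 0.98],
]

# Only instances predicted with confidence >= 0.95 would be labeled
# and added to the training set in this iteration.
chosen = select_confident(unlabeled_probs, threshold=0.95)
print(chosen)
```

Lowering the threshold admits more instances per iteration at the cost of less reliable labels, which is the trade-off the article's analysis of threshold variation addresses.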
