SSL-VFC4.5: An approach to adapt Very Fast C4.5 classification algorithm to deal with semi-supervised learning

2021 ◽  
Author(s):  
Carlos Eduardo Nass ◽  
Agustín Alejandro Ortíz Díaz ◽  
Fabiano Baldo

The growing popularity of audio and video streaming, Industry 4.0, and IoT (Internet of Things) technologies contributes to the rapid growth in the generation of various types of data. To analyze these data for decision-making, supervised machine learning techniques therefore need to be fast while maintaining suitable predictive performance, even in the many real-life scenarios where labeled data are expensive and hard to obtain. To overcome this problem, this work proposes an adaptation of the Very Fast C4.5 (VFC4.5) algorithm that incorporates a semi-supervised impurity metric presented in the literature. The results indicate that this adaptation can slightly increase the accuracy of VFC4.5 when the datasets contain very few labeled instances, but it also increases the training time, especially as the number of labeled instances in the datasets grows.
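
C4.5-style trees choose splits by gain ratio, which is computed from labeled instances only. As a reference point for what a semi-supervised impurity metric modifies, here is a minimal sketch of the standard gain ratio for a numeric threshold split. It assumes unlabeled instances are marked with `None` and are simply ignored; the semi-supervised metric adopted in the paper is not reproduced here, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """C4.5 gain ratio of the binary split `value <= threshold`.

    Instances labeled None (unlabeled) are dropped, so this is the purely
    supervised criterion; a semi-supervised impurity would also use them.
    """
    pairs = [(v, y) for v, y in zip(values, labels) if y is not None]
    ys = [y for _, y in pairs]
    left = [y for v, y in pairs if v <= threshold]
    right = [y for v, y in pairs if v > threshold]
    if not left or not right:
        return 0.0  # degenerate split: no information
    n = len(ys)
    info_gain = (entropy(ys)
                 - len(left) / n * entropy(left)
                 - len(right) / n * entropy(right))
    # Split information penalizes splits that fragment the data.
    split_info = entropy(["L"] * len(left) + ["R"] * len(right))
    return info_gain / split_info


# Two of the six instances are unlabeled and are ignored here.
print(gain_ratio([1, 2, 3, 4, 5, 6], ["a", "a", None, "b", None, "b"], 2.5))
```

With very few labeled instances, this criterion sees only a handful of points; that is the situation the proposed semi-supervised metric is intended to improve.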

2016 ◽  
Author(s):  
Ευτύχιος Πρωτοπαπαδάκης

The term semi-supervised learning refers to a broad field of machine learning techniques that use unlabeled data to extract additional useful information. Semi-supervision addresses problems related to processing and exploiting large volumes of data and the associated costs (e.g., processing time, human errors). The ultimate goal is the safe extraction of conclusions, rules, or recommendations. Decision-making models that use semi-supervised learning techniques have various advantages. First, they need only a small number of labeled data for their initialization. Then, newly arriving data are exploited to modify the model accordingly. As a result, we obtain a continuously evolving decision-making model with the minimum possible effort. Techniques that adapt easily and economically are pre-eminently suited to controlling systems whose mode of operation changes frequently. Indicative application fields for flexible decision-support systems based on semi-supervised learning are: monitoring of production lines, surveillance of maritime borders, elderly care, financial risk assessment, inspection for structural defects, and the preservation of cultural heritage.
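
The workflow described above (initialize with a few labeled examples, then let new unlabeled data update the model) can be illustrated with self-training, one common semi-supervised technique. This is an assumption for illustration, not the method of the thesis: the sketch uses a hypothetical nearest-centroid classifier and treats closeness to the nearest centroid as confidence.

```python
import math


def centroids(points, labels):
    """Mean 2-D point of each class."""
    sums = {}
    for (x, y), c in zip(points, labels):
        sx, sy, n = sums.get(c, (0.0, 0.0, 0))
        sums[c] = (sx + x, sy + y, n + 1)
    return {c: (sx / n, sy / n) for c, (sx, sy, n) in sums.items()}


def nearest_centroid(cents, p):
    """Return (label, distance) of the centroid closest to point p."""
    return min(((c, math.dist(p, m)) for c, m in cents.items()),
               key=lambda t: t[1])


def self_train(labeled_pts, labels, unlabeled_pts):
    """Self-training: repeatedly pseudo-label the most confident point.

    Confidence is assumed to be closeness to the nearest centroid; the
    centroids are recomputed after every pseudo-labeled addition.
    """
    pts, ys = list(labeled_pts), list(labels)
    pool = list(unlabeled_pts)
    while pool:
        cents = centroids(pts, ys)
        best = min(pool, key=lambda p: nearest_centroid(cents, p)[1])
        pool.remove(best)
        pts.append(best)
        ys.append(nearest_centroid(cents, best)[0])
    return centroids(pts, ys)


# Two labeled seeds, three unlabeled points that refine the model.
model = self_train([(0, 0), (10, 10)], ["a", "b"], [(1, 0), (9, 10), (2, 1)])
print(model)
```

Self-training can reinforce its own early mistakes, which is one reason practical systems add confidence thresholds or periodic human review.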


Complexity ◽  
2019 ◽  
Vol 2019 ◽  
pp. 1-10 ◽  
Author(s):  
Rafael Vega Vega ◽  
Héctor Quintián ◽  
Carlos Cambra ◽  
Nuño Basurto ◽  
Álvaro Herrero ◽  
...  

The present research proposes the application of unsupervised and supervised machine-learning techniques to characterize Android malware families. More precisely, a novel unsupervised neural-projection method for dimensionality reduction, namely Beta Hebbian Learning (BHL), is applied to visually analyze such malware. Additionally, well-known supervised Decision Trees (DTs) are applied for the first time to improve the characterization of these families and to compare the original features identified as the most important. The proposed techniques are validated on real-life Android malware data from the well-known, publicly available Malgenome dataset. The results obtained support the proposed approach, confirming the validity of BHL and DTs for gaining deep knowledge of Android malware.
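
To illustrate how a decision tree can rank original features by importance, here is a minimal sketch that scores each feature by the Gini impurity decrease of a single split (a one-level tree, or stump). It assumes binary features (e.g., presence or absence of some app attribute); the feature names and family labels are hypothetical. A full decision tree typically accumulates such decreases over all of its splits.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_decrease(rows, labels, j):
    """Impurity decrease obtained by splitting on binary feature j."""
    present = [y for r, y in zip(rows, labels) if r[j]]
    absent = [y for r, y in zip(rows, labels) if not r[j]]
    n = len(labels)
    return (gini(labels)
            - len(present) / n * gini(present)
            - len(absent) / n * gini(absent))


def rank_features(rows, labels, names):
    """Feature names sorted from largest to smallest impurity decrease."""
    scores = {name: gini_decrease(rows, labels, j) for j, name in enumerate(names)}
    return sorted(names, key=lambda name: scores[name], reverse=True)


# Hypothetical binary feature matrix for two hypothetical malware families.
rows = [[1, 1], [1, 0], [0, 1], [0, 0]]
families = ["famA", "famA", "famB", "famB"]
print(rank_features(rows, families, ["feature_x", "feature_y"]))
```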


2021 ◽  
Vol 22 (1) ◽  
pp. 13-28
Author(s):  
Mir Shahnawaz Ahmad ◽  
Shahid Mehraj Shah

The interconnection of a large number of smart devices and sensors for critical information gathering and analysis over the internet has given rise to the Internet of Things (IoT) network. In recent times, IoT has emerged as a prime field for solving diverse real-life problems by providing smart and affordable solutions. The IoT network has various constraints, such as the limited computational capacity of sensors, the heterogeneity of devices, and limited energy resources and bandwidth. These constraints restrict the use of high-end security mechanisms, making such networks more vulnerable to various security attacks, including malicious insider attacks. Malicious insiders are also very difficult to detect because of their unpredictable behaviour, and the ubiquitous nature of the IoT network makes the task harder still. Machine learning techniques can help solve such problems, as they are able to learn the behaviour of the system and predict anomalies in it. In this paper, we discuss various security requirements and challenges in the IoT network. We also apply various supervised machine learning techniques to an available IoT dataset to determine which of them is best suited to detecting malicious insider attacks in the IoT network.
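
Deciding which supervised technique is best suited usually comes down to comparing classifiers under the same resampling protocol. The sketch below assumes k-fold cross-validated accuracy as the criterion and compares two hypothetical models (a majority-class baseline and 1-nearest-neighbour) on toy data; it does not reproduce the paper's dataset, models, or metrics.

```python
from collections import Counter


def majority_classifier(train_X, train_y):
    """Baseline: always predict the most frequent training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return lambda x: label


def one_nn_classifier(train_X, train_y):
    """1-nearest-neighbour with squared Euclidean distance."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, t)) for t in train_X]
        return train_y[dists.index(min(dists))]
    return predict


def cv_accuracy(fit, X, y, k=3):
    """Mean accuracy over k interleaved folds (fold i = indices i, i+k, ...)."""
    accs = []
    for i in range(k):
        test_idx = set(range(i, len(X), k))
        tr_X = [x for j, x in enumerate(X) if j not in test_idx]
        tr_y = [t for j, t in enumerate(y) if j not in test_idx]
        model = fit(tr_X, tr_y)
        hits = sum(model(X[j]) == y[j] for j in test_idx)
        accs.append(hits / len(test_idx))
    return sum(accs) / k


def best_model(models, X, y, k=3):
    """Return the name of the highest-scoring model and all scores."""
    scores = {name: cv_accuracy(fit, X, y, k) for name, fit in models.items()}
    return max(scores, key=scores.get), scores


# Hypothetical two-feature traffic records.
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = ["normal"] * 3 + ["malicious"] * 3
name, scores = best_model(
    {"majority": majority_classifier, "1-NN": one_nn_classifier}, X, y)
print(name, scores)
```

For intrusion data, where attacks are often rare, accuracy alone can be misleading; recall or F1 on the attack class is commonly reported alongside it.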


2020 ◽  
Vol 28 (2) ◽  
pp. 253-265 ◽  
Author(s):  
Gabriela Bitencourt-Ferreira ◽  
Amauri Duarte da Silva ◽  
Walter Filgueira de Azevedo

Background: The elucidation of the structure of cyclin-dependent kinase 2 (CDK2) made it possible to develop targeted scoring functions for virtual screening aimed at identifying new inhibitors of this enzyme. CDK2 is a protein target for the development of drugs intended to modulate cell-cycle progression and control. Such drugs have potential anticancer activity.
Objective: Our goal here is to review recent applications of machine learning methods to predict ligand-binding affinity for protein targets. To assess the predictive performance of classical and targeted scoring functions, we focused our analysis on CDK2 structures.
Methods: Experimental structural data are available for hundreds of binary complexes of CDK2 with different ligands, many of them with inhibition-constant information. Here, we investigate computational methods to calculate the binding affinity of ligands for CDK2 through classical scoring functions and machine-learning models.
Results: Analysis of the predictive performance of classical scoring functions available in docking programs such as Molegro Virtual Docker, AutoDock4, and AutoDock Vina indicated that these methods failed to predict binding affinities with a significant correlation to experimental data. Targeted scoring functions developed through supervised machine learning techniques showed a significant correlation with experimental data.
Conclusion: We described the application of supervised machine learning techniques to generate a scoring function that predicts binding affinity. Machine learning models showed superior predictive performance compared with classical scoring functions. Analysis of the models obtained through machine learning showed that they capture essential structural features responsible for binding affinity against CDK2.
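
The comparison in the Results rests on how well predicted affinities correlate with experimental ones. Below is a minimal sketch of the Pearson correlation coefficient, assuming paired lists of predicted scores and experimental values (e.g., pKi); the data are made up, and a significance test would still be needed before calling a correlation significant.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical predicted scores vs. hypothetical experimental pKi values.
predicted = [6.1, 7.0, 5.2, 8.3, 6.8]
experimental = [5.9, 7.4, 5.0, 8.0, 6.5]
print(round(pearson_r(predicted, experimental), 3))
```

On Python 3.10 and later, `statistics.correlation` computes the same coefficient.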


Author(s):  
Augusto Cerqua ◽  
Roberta Di Stefano ◽  
Marco Letta ◽  
Sara Miccoli

Estimates of the real death toll of the COVID-19 pandemic have proven problematic in many countries, and Italy is no exception. Mortality estimates at the local level are even more uncertain, as they require stringent conditions, such as granular and accurate data, that are rarely met. The “official” approach adopted by public institutions to estimate “excess mortality” during the pandemic compares observed all-cause mortality data for 2020 with the average mortality figures for the same period in past years. In this paper, we apply the recently developed machine learning control method to build a more realistic counterfactual scenario of mortality in the absence of COVID-19. We demonstrate that supervised machine learning techniques outperform the official method by substantially improving the prediction accuracy of local mortality in “ordinary” years, especially in small and medium-sized municipalities. We then apply the best-performing algorithms to derive estimates of local excess mortality for the period between February and September 2020. These estimates allow us to provide insights into the demographic evolution of the first wave of the pandemic throughout the country. To help improve diagnostic and monitoring efforts, we make our dataset freely available to the research community.
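
The official approach described above reduces to a simple baseline comparison. This sketch computes excess deaths for one area as observed deaths minus the mean of the same period in past years; the figures are hypothetical, and the machine learning control method proposed in the paper replaces this average with a model-based counterfactual.

```python
def official_excess(observed, past_years):
    """Excess deaths relative to the mean of the same period in past years.

    Returns (absolute excess, excess as a percentage of the baseline).
    """
    baseline = sum(past_years) / len(past_years)
    excess = observed - baseline
    return excess, 100.0 * excess / baseline


# Hypothetical municipality: deaths in the same months of five earlier years.
print(official_excess(130, [100, 110, 90, 95, 105]))
```

With small counts, a few deaths move the baseline considerably, which is one plausible reason a simple average is least reliable in small municipalities.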


Author(s):  
Vara Vundavalli ◽  
Farhat Barsha ◽  
Mohammad Masum ◽  
Hossain Shahriar ◽  
Hisham Haddad

2021 ◽  
Vol 14 (3) ◽  
pp. 1-21
Author(s):  
Roy Abitbol ◽  
Ilan Shimshoni ◽  
Jonathan Ben-Dov

The task of assembling fragments in a puzzle-like manner into a composite picture plays a significant role in archaeology, as it supports researchers in their attempts to reconstruct historic artifacts. In this article, we propose a method for matching and assembling pairs of ancient papyrus fragments containing mostly unknown scriptures. Papyrus paper is manufactured from papyrus plants and therefore exhibits characteristic thread patterns resulting from the plant’s stems. The proposed algorithm is founded on the hypothesis that these thread patterns contain unique local attributes, such that nearby fragments show similar patterns reflecting the continuations of the threads. We posit that these patterns can be exploited using image processing and machine learning techniques to identify matching fragments. The algorithm and system that we present support the quick, automated classification of matching pairs of papyrus fragments as well as the geometric alignment of each pair. The algorithm consists of a series of steps and is based on deep learning and machine learning methods. The first step decomposes the problem of matching fragments into the smaller problem of finding thread-continuation matches in local edge areas (squares) between pairs of fragments. This phase is solved using a convolutional neural network that ingests raw images of the edge areas and produces local matching scores. This stage yields very high recall but low precision. We therefore use these scores to decide whether entire fragment pairs match by means of an elaborate voting mechanism. We enhance this voting with geometric alignment techniques, from which we extract additional spatial information. Finally, we feed all the data collected in these steps into a Random Forest classifier to produce a higher-order classifier capable of predicting whether a pair of fragments is a match.
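
The step from local square scores to a pair-level decision can be pictured as aggregating many noisy votes into a few features for a second-stage classifier. The sketch below is hypothetical and is not the authors' voting mechanism: it counts squares whose score reaches an assumed threshold and summarizes the scores, producing features of the kind a Random Forest could consume.

```python
def pair_features(scores, score_threshold=0.9):
    """Aggregate local square-match scores of one fragment pair.

    `score_threshold` is an assumed cut-off for counting a square as a vote.
    """
    votes = sum(1 for s in scores if s >= score_threshold)
    return {
        "votes": votes,
        "vote_ratio": votes / len(scores) if scores else 0.0,
        "max_score": max(scores, default=0.0),
        "mean_score": sum(scores) / len(scores) if scores else 0.0,
    }


# Hypothetical CNN scores for four edge-square pairs of one fragment pair.
print(pair_features([0.95, 0.2, 0.91, 0.5]))
```

Summarizing votes this way lets a high-recall, low-precision first stage feed a second stage that can weigh agreement across many squares.
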
Our algorithm was trained on a batch of fragments that was excavated from the Dead Sea caves and is dated to circa the 1st century BCE. The algorithm shows excellent results on a validation set of similar origin and condition. We then ran the algorithm on a real-life set of fragments for which we have no prior knowledge or labeling of matches. This test batch is considered extremely challenging because of its poor condition and the small size of its fragments; numerous researchers have sought matches within it with very little success. Our algorithm’s performance on this batch was suboptimal, returning a relatively large ratio of false positives. However, the algorithm proved useful by eliminating 98% of the possible matches, thereby reducing the amount of work needed for manual inspection. Indeed, experts who reviewed the results identified some of the positive matches as potentially true and referred them for further investigation.
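
To put the 98% figure in perspective: n fragments yield n(n-1)/2 unordered candidate pairs, so the 2% that remain can still be a substantial number. The sketch below assumes a hypothetical batch size, since the number of fragments is not given.

```python
def remaining_pairs(n_fragments, eliminated_pct):
    """Total unordered pairs and pairs left after an integer-percent filter."""
    total = n_fragments * (n_fragments - 1) // 2  # n choose 2
    return total, total * (100 - eliminated_pct) // 100


# Hypothetical batch of 500 fragments.
print(remaining_pairs(500, 98))
```

For 500 fragments, this gives 124,750 candidate pairs, of which 2,495 would remain for manual inspection.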

