The BitcoinHeist: Classifications of Ransomware Crime Families

Author(s):  
Micheline Al Harrack

Ransomware attacks are on the rise: attackers hijack valuable information from critical infrastructures and businesses and demand ransom payments to release the encrypted files. Payments in cryptocurrencies are designed to evade tracing of the transactions and the recipients. With anonymity being paramount, tracing cryptocurrency payments tied to malicious activity and criminal transactions is a complicated process. It is therefore crucial to identify and label these transactions as either legitimate digital currency trade and exchange or malicious activity. Machine learning techniques are used to train a model to recognize specific transactions and trace them back to malicious or benign origins. I propose to work on the BitcoinHeist data set to classify the different malicious transactions. The transaction features are analyzed to predict a label among the families previously identified as ransomware or associated with malicious activity. I use decision tree classifiers and ensemble learning to implement a random forest classifier. Results are assessed to evaluate accuracy, precision, and recall. I limit the study design to known ransomware identified previously and made available in the Bitcoin transaction graph from January 2009 to December 2018.
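As a rough illustration of the approach described above, the sketch below trains a scikit-learn random forest on synthetic transaction features and reports accuracy, precision, and recall. The feature names in the comments echo the BitcoinHeist data set; the data and labels here are invented.

```python
# Sketch: label transactions as ransomware-like (1) or benign (0) with a
# random forest. Features and labels are synthetic, not BitcoinHeist data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 5))   # stand-ins for length, weight, count, neighbors, income
y = (X[:, 0] + 0.5 * X[:, 4] > 0).astype(int)   # toy "ransomware" rule

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)
acc = accuracy_score(y_te, pred)
prec = precision_score(y_te, pred)
rec = recall_score(y_te, pred)
```

Precision and recall matter here because mislabeling a benign exchange as ransomware and missing a ransomware address have very different costs.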

Author(s):  
Ritu Khandelwal ◽  
Hemlata Goyal ◽  
Rajveer Singh Shekhawat

Introduction: Machine learning is an intelligent technology that works as a bridge between businesses and data science. With the involvement of data science, the business goal focuses on findings that yield valuable insights from available data. A large part of Indian cinema is Bollywood, a multi-million-dollar industry. This paper attempts to predict whether an upcoming Bollywood movie will be a Blockbuster, Superhit, Hit, Average, or Flop, by applying machine learning techniques for classification and prediction. To build a classifier or prediction model, the first step is the learning stage, in which the training data set is used to train the model with some technique or algorithm; the rules generated in this stage form the model used to predict future trends in different types of organizations. Methods: Classification and prediction techniques such as Support Vector Machine (SVM), Random Forest, Decision Tree, Naïve Bayes, Logistic Regression, AdaBoost, and KNN are applied in an attempt to find efficient and effective results. All of these functionalities are available through GUI-based workflows with categories such as Data, Visualize, Model, and Evaluate. Result: The trained models generate rules from the training data set, and these rules form the models used to predict movie success. Conclusion: This paper focuses on a comparative analysis based on parameters such as accuracy and the confusion matrix to identify the best possible model for predicting movie success. Using advertisement propaganda, production houses can plan the best time to release a movie according to the predicted success rate to gain higher benefits.
Discussion: Data mining is the process of discovering patterns in large data sets; the relationships discovered help solve business problems and predict forthcoming trends. This prediction can help production houses with advertisement propaganda, and they can also plan their costs; by accounting for these factors, they can make the movie more profitable.
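The classifier comparison described above can be sketched with scikit-learn as follows; the synthetic features, the five "success" classes, and the cross-validation setup are illustrative stand-ins, not the paper's actual data or GUI workflow.

```python
# Sketch: compare several classifiers on a synthetic 5-class problem
# (Blockbuster / Superhit / Hit / Average / Flop) by cross-validated accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=600, n_features=8, n_informative=5,
                           n_classes=5, random_state=0)
models = {
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
}
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
best = max(scores, key=scores.get)   # model with highest mean CV accuracy
```

On real movie data one would also inspect the per-class confusion matrix, since a Flop misclassified as a Blockbuster is far costlier than an Average/Hit mix-up.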


2021 ◽  
pp. 155005942110608
Author(s):  
Jakša Vukojević ◽  
Damir Mulc ◽  
Ivana Kinder ◽  
Eda Jovičić ◽  
Krešimir Friganović ◽  
...  

In everyday clinical practice, there is an ongoing debate about the nature of major depressive disorder (MDD) in patients with borderline personality disorder (BPD). The existing research does not give us a clear distinction between the two entities, although depression is among the most frequent comorbid diagnoses in borderline personality patients. The notion that depression can be a distinct disorder but also a symptom of other psychopathologies led our team to try to delineate the two entities using 146 EEG recordings and machine learning. The algorithms, developed solely for this purpose, could not differentiate the two entities: given the data and methods used, patients suffering from MDD did not have significantly different EEGs from patients diagnosed with both MDD and BPD. By increasing the data set and the spatiotemporal specificity, one could obtain a more sensitive diagnostic approach when using EEG recordings. To our knowledge, this is the first study to use EEG recordings and advanced machine learning techniques in this setting, and it further confirms the close interrelationship between the two entities.


2018 ◽  
Vol 34 (3) ◽  
pp. 569-581 ◽  
Author(s):  
Sujata Rani ◽  
Parteek Kumar

Abstract In this article, an innovative approach to performing sentiment analysis (SA) is presented. The proposed system handles Romanized or abbreviated text and spelling variations in the text in order to perform sentiment analysis. The training data set of 3,000 movie reviews and tweets was manually labeled by native speakers of Hindi into three classes: positive, negative, and neutral. The system uses the WEKA (Waikato Environment for Knowledge Analysis) tool to convert the string data into numerical matrices and applies three machine learning techniques: Naive Bayes (NB), J48, and support vector machine (SVM). The proposed system has been tested on 100 movie reviews and tweets, and SVM performed best in comparison to the other classifiers, with an accuracy of 68% for movie reviews and 82% for tweets. The results of the proposed system are very promising and can be used in emerging applications like SA of product reviews and social media analysis. Additionally, the proposed system can be used for other cultural/social benefits like predicting/fighting human riots.
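A rough scikit-learn analogue of the pipeline above (the paper itself uses WEKA): string reviews are converted to numeric matrices and classified into the three sentiment classes with an SVM. The tiny labeled set is invented.

```python
# Sketch: text -> bag-of-words matrix -> linear SVM, a loose analogue of the
# paper's WEKA workflow. The six labeled reviews are made up for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

reviews = ["great movie loved it", "boring and too long",
           "average story decent acting", "fantastic direction",
           "terrible waste of time", "okay one time watch"]
labels = ["positive", "negative", "neutral",
          "positive", "negative", "neutral"]

model = make_pipeline(CountVectorizer(), LinearSVC())
model.fit(reviews, labels)
pred = model.predict(["loved the fantastic direction"])[0]
```

Handling Romanized Hindi and spelling variants, as the paper does, would additionally require normalization before vectorization.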


2021 ◽  
Author(s):  
Rogini Runghen ◽  
Daniel B Stouffer ◽  
Giulio Valentino Dalla Riva

Collecting network interaction data is difficult. Non-exhaustive sampling and complex hidden processes often result in an incomplete data set. Thus, identifying potentially present but unobserved interactions is crucial both for understanding the structure of large-scale data and for predicting how previously unseen elements will interact. Recent studies in network analysis have shown that accounting for metadata (such as node attributes) can improve both our understanding of how nodes interact with one another and the accuracy of link prediction. However, the dimension of the object we need to learn in order to predict interactions in a network grows quickly with the number of nodes, so the problem becomes computationally and conceptually challenging for large networks. Here, we present a new predictive procedure combining a graph embedding method with machine learning techniques to predict interactions based on node metadata. Graph embedding methods project the nodes of a network onto a low-dimensional latent feature space, and the positions of the nodes in that space can then be used to predict interactions between them. Learning a mapping from the nodes' metadata to their positions in the latent feature space then corresponds to a classic, low-dimensional, machine learning problem. In our study we used the Random Dot Product Graph model to estimate the embedding of an observed network, and we tested different neural network architectures to predict the positions of nodes in the latent feature space. Flexible machine learning techniques for mapping nodes onto their latent positions allow us to account for multivariate and possibly complex node metadata. To illustrate the utility of the proposed procedure, we apply it to a large dataset of tourist visits to destinations across New Zealand.
We found that our procedure accurately predicts interactions for both existing nodes and nodes newly added to the network, while being computationally feasible even for very large networks. Overall, our study highlights that by exploiting the properties of a well-understood statistical model for complex networks and combining it with standard machine learning techniques, we can simplify the link prediction problem when incorporating multivariate node metadata. Our procedure can be immediately applied to different types of networks and to a wide variety of data from different systems. As such, from both a network science and a data science perspective, our work offers a flexible and generalisable procedure for link prediction.
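The two-step procedure above can be sketched as follows, assuming a spectral (RDPG-style) adjacency embedding and a small scikit-learn neural network as the metadata-to-position map; the network and metadata are synthetic, and the authors' actual architectures may differ.

```python
# Sketch: (1) embed an observed network via its top eigenpairs (an RDPG-style
# spectral embedding), (2) learn metadata -> latent position with a small
# neural network, so unseen nodes can be placed and their links scored.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n, d = 60, 2
meta = rng.normal(size=(n, 3))                  # synthetic node metadata
true_pos = np.abs(meta[:, :2]) / 4 + 0.1        # latent positions drive links
P = true_pos @ true_pos.T                       # RDPG: P_ij = <x_i, x_j>
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1); A = A + A.T                  # symmetric, no self-loops

# (1) adjacency spectral embedding from the top-d eigenpairs of A
vals, vecs = np.linalg.eigh(A)
idx = np.argsort(vals)[-d:]
X_hat = vecs[:, idx] * np.sqrt(np.abs(vals[idx]))

# (2) map metadata to the embedding; this is a low-dimensional regression
reg = MLPRegressor(hidden_layer_sizes=(16,), max_iter=3000,
                   random_state=0).fit(meta, X_hat)
pos_new = reg.predict(meta[:1])                 # place a node from metadata alone
link_scores = X_hat @ pos_new.ravel()           # dot products ~ link propensities
```

The key saving is that the regression target has dimension d (here 2) regardless of network size, which is what makes the approach feasible for large networks.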


Author(s):  
Shashidhara Bola

A new method is proposed to classify lung nodules as benign or malignant, based on analysis of lung nodule shape, contour, and texture. The data set consists of 39 lung nodules from 39 patients, containing 19 benign and 20 malignant nodules. Lung regions are segmented using morphological operators, and lung nodules are detected based on shape and area features. The proposed algorithm was tested on LIDC (Lung Image Database Consortium) data sets and the results were found to be satisfactory. The performance of the method in distinguishing benign from malignant nodules was evaluated using receiver operating characteristic (ROC) analysis. The method achieved an area under the ROC curve of 0.903, which reduces the false positive rate.
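A minimal sketch of the ROC analysis used for evaluation, with synthetic classifier scores for 19 benign and 20 malignant nodules; the class counts mirror the study, but the scores are invented.

```python
# Sketch: ROC analysis for a benign (0) vs malignant (1) classifier.
# The scores are synthetic, not outputs of the paper's method.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

y_true = np.array([0] * 19 + [1] * 20)          # 19 benign, 20 malignant
rng = np.random.default_rng(0)
scores = y_true * 1.5 + rng.normal(size=39)     # higher score = more malignant
auc = roc_auc_score(y_true, scores)
fpr, tpr, thresholds = roc_curve(y_true, scores)
```

The reported AUC of 0.903 would correspond to a curve that reaches high sensitivity while keeping the false positive rate low; an operating threshold is then chosen from the `roc_curve` output.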


Author(s):  
Hesham M. Al-Ammal

Detection of anomalies in a given data set is a vital step in several cybersecurity applications, including intrusion detection, fraud detection, and social network analysis. Many of these techniques detect anomalies by examining graph-based data. Analyzing graphs makes it possible to capture relationships and communities, as well as anomalies. The advantage of using graphs is that many real-life situations can be easily modeled by a graph that captures their structure and inter-dependencies. Although anomaly detection in graphs dates back to the 1990s, recent research has applied machine learning methods to anomaly detection over graphs. This chapter concentrates on static graphs (both labeled and unlabeled) and summarizes recent studies in machine learning for anomaly detection in graphs, including methods such as support vector machines, neural networks, generative neural networks, and deep learning methods. The chapter reflects on the successes and challenges of using these methods in the context of graph-based anomaly detection.


Author(s):  
S. Prasanthi ◽  
S.Durga Bhavani ◽  
T. Sobha Rani ◽  
Raju S. Bapi

The vast majority of successful drugs or inhibitors achieve their activity by binding to, and modifying the activity of, a protein, leading to the concept of druggability. A target protein is druggable if it has the potential to bind drug-like molecules. Hence kinase inhibitors need to be studied to understand the specificity of a kinase inhibitor in choosing a particular kinase target. In this paper we focus on human kinase drug target sequences, since kinases are known to be potential drug targets. We also perform a preliminary analysis of kinase inhibitors in order to study the problem in the protein-ligand space in the future. The identification of druggable kinases is treated as a classification problem in which druggable kinases form the positive data set and non-druggable kinases form the negative data set. The classification problem is addressed using machine learning techniques, namely support vector machine (SVM) and decision tree (DT), with sequence-specific features. One of the challenges of this classification problem is the unbalanced data: only 48 druggable kinases are available against 509 non-druggable kinases present at UniProt. The accuracy of the decision tree classifier obtained is 57.65%, which is not satisfactory. A two-tier architecture of decision trees is carefully designed so that recognition on the non-druggable data set also improves. The overall model is thus shown to achieve a final accuracy of 88.37%. To the best of our knowledge, kinase druggability prediction using machine learning approaches has not been reported in the literature.
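A minimal sketch of the 48-vs-509 imbalance described above, handled here with a class-weighted decision tree in scikit-learn rather than the authors' two-tier architecture; the features are synthetic stand-ins for sequence-specific features.

```python
# Sketch: a heavily imbalanced druggable (1) vs non-druggable (0) problem.
# class_weight="balanced" reweights classes inversely to their frequency so
# the rare positive class is not ignored. Data is synthetic.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X_pos = rng.normal(1.0, 1.0, size=(48, 10))     # "druggable kinases"
X_neg = rng.normal(0.0, 1.0, size=(509, 10))    # "non-druggable kinases"
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 48 + [0] * 509)

clf = DecisionTreeClassifier(class_weight="balanced", max_depth=4, random_state=0)
recall_pos = cross_val_score(clf, X, y, cv=5, scoring="recall").mean()
```

With such imbalance, plain accuracy is misleading (predicting "non-druggable" everywhere already scores 509/557), which is why recall on the positive class is tracked here.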


2019 ◽  
Vol 11 (16) ◽  
pp. 1943 ◽  
Author(s):  
Omid Rahmati ◽  
Saleh Yousefi ◽  
Zahra Kalantari ◽  
Evelyn Uuemaa ◽  
Teimur Teimurian ◽  
...  

Mountainous areas are highly prone to a variety of nature-triggered disasters, which often cause disabling harm, death, destruction, and damage. In this work, an attempt was made to develop an accurate multi-hazard exposure map for a mountainous area (Asara watershed, Iran), based on state-of-the-art machine learning techniques. Hazard modeling for avalanches, rockfalls, and floods was performed using three state-of-the-art models: support vector machine (SVM), boosted regression tree (BRT), and generalized additive model (GAM). Topo-hydrological and geo-environmental factors were used as predictors in the models. A flood dataset (n = 133 flood events) was applied, which had been prepared using Sentinel-1-based processing and ground-based information. In addition, snow avalanche (n = 58) and rockfall (n = 101) data sets were used. The data set for each hazard type was randomly divided into two groups: training (70%) and validation (30%). Model performance was evaluated by the true skill score (TSS) and the area under the receiver operating characteristic curve (AUC). Using an exposure map, the multi-hazard map was converted into a multi-hazard exposure map. According to both validation criteria, the SVM model showed the highest accuracy for avalanches (AUC = 92.4%, TSS = 0.72) and rockfalls (AUC = 93.7%, TSS = 0.81), while BRT demonstrated the best performance for flood hazards (AUC = 94.2%, TSS = 0.80). Overall, multi-hazard exposure modeling revealed that valleys and areas close to the Chalous Road, one of the most important roads in Iran, were associated with high and very high levels of risk. The proposed multi-hazard exposure framework can be helpful in supporting decision making on mountain social-ecological systems facing multiple hazards.
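The two validation criteria above can be computed as in the sketch below, where TSS = sensitivity + specificity - 1; the labels and predicted probabilities are invented, not the study's outputs.

```python
# Sketch: compute TSS and AUC for a binary hazard model.
# TSS = sensitivity + specificity - 1. Labels and probabilities are invented.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])       # hazard occurred?
y_prob = np.array([0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.6, 0.1, 0.2, 0.3])
y_pred = (y_prob >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
tss = sensitivity + specificity - 1
auc = roc_auc_score(y_true, y_prob)
```

Unlike AUC, TSS depends on the chosen probability threshold (0.5 here), which is why the two criteria can rank models differently.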


2020 ◽  
Vol 10 (18) ◽  
pp. 6527 ◽  
Author(s):  
Omar Sharif ◽  
Mohammed Moshiul Hoque ◽  
A. S. M. Kayes ◽  
Raza Nowrozy ◽  
Iqbal H. Sarker

Due to the substantial growth of internet users and spontaneous access via electronic devices, the amount of electronic content has grown enormously in recent years through instant messaging, social networking posts, blogs, online portals, and other digital platforms. Unfortunately, misapplication of technology has increased with this rapid growth of online content, leading to a rise in suspicious activities. People misuse web media to disseminate malicious content, carry out illegal activities, abuse other people, and publicize suspicious content on the web. Suspicious content is usually available in the form of text, audio, or video, and text has been used in most cases to perform suspicious activities. Thus, one of the most challenging issues for NLP researchers is to develop a system that can identify suspicious text efficiently from specific contents. In this paper, a Machine Learning (ML)-based classification model (hereafter called STD) is proposed to classify Bengali text into non-suspicious and suspicious categories based on its contents. A set of ML classifiers with various features has been applied to our developed corpus of 7,000 Bengali text documents, of which 5,600 were used for training and 1,400 for testing. The performance of the proposed system is compared with a human baseline and existing ML techniques. The SGD classifier with tf-idf and a combination of unigram and bigram features achieved the highest accuracy of 84.57%.
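The winning configuration above can be sketched with scikit-learn as follows; the short English stand-in corpus is invented (the paper works on 7,000 Bengali documents), and the hyperparameters are library defaults, not the authors'.

```python
# Sketch: SGD classifier over tf-idf with combined unigram and bigram
# features, mirroring the paper's best configuration. Corpus is invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

docs = ["join the protest and burn the office", "weather is nice today",
        "attack them at the rally tonight", "the new cafe serves good tea",
        "spread the threat message widely", "school reopens next monday"]
labels = ["suspicious", "non-suspicious", "suspicious",
          "non-suspicious", "suspicious", "non-suspicious"]

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),   # unigrams + bigrams
    SGDClassifier(random_state=0),
)
model.fit(docs, labels)
pred = model.predict(["burn the office tonight"])[0]
```

Bigrams help because suspicious intent often lives in word pairs ("burn the", "attack them") rather than in individual words.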


2020 ◽  
Author(s):  
Sonam Wangchuk ◽  
Tobias Bolch

Accurate detection and mapping of glacial lakes in alpine regions such as the Himalayas, the Alps, and the Andes are challenged by many factors. These include 1) the small size of glacial lakes, 2) cloud cover in optical satellite images, 3) cast shadows from mountains and clouds, 4) seasonal snow in satellite images, 5) varying degrees of turbidity among glacial lakes, and 6) frozen glacial lake surfaces. In our study, we propose a fully automated approach that overcomes most of the above-mentioned challenges to detect and map glacial lakes accurately using multi-source data and machine learning techniques such as the random forest classifier. The multi-source data are Sentinel-1 synthetic aperture radar data (radar backscatter), Sentinel-2 multispectral instrument data (NDWI), and the SRTM digital elevation model (slope). We use these data as inputs for rule-based segmentation of potential glacial lakes, with decision rules implemented from an expert system. The potential glacial lake polygons are then classified either as glacial lakes or non-glacial lakes by a trained and tested random forest classifier. The performance of the method was assessed at eight test sites located across the alpine regions of the world (e.g., the Boshula mountain range and the Koshi basin in the Himalayas, the Tajik Pamirs, the Swiss Alps, and the Peruvian Andes). We show that the proposed method performs efficiently irrespective of geographic, geologic, climatic, and glacial lake conditions.
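A toy sketch of the workflow above: rule-based pre-segmentation on multi-source per-pixel features (SAR backscatter, NDWI, slope) followed by a random forest classifier. The thresholds and data are illustrative, not the authors' actual decision rules.

```python
# Sketch: per-pixel multi-source features -> rule-based candidate mask ->
# random forest classification. All thresholds and data are invented.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 500
backscatter = rng.normal(-15, 5, n)   # stand-in for Sentinel-1 backscatter (dB)
ndwi = rng.uniform(-1, 1, n)          # stand-in for Sentinel-2 NDWI
slope = rng.uniform(0, 45, n)         # stand-in for SRTM slope (degrees)

# rule-based pre-segmentation: water-like, flat pixels become candidates
candidate = (ndwi > 0.2) & (slope < 10)
y = (candidate & (backscatter < -12)).astype(int)   # toy "glacial lake" label

X = np.column_stack([backscatter, ndwi, slope])
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
acc = clf.score(X, y)
```

Combining radar backscatter with NDWI is what lets such a classifier reject the cloud-shadow and turbidity false positives that optical indices alone struggle with.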

