large scale analysis
Recently Published Documents


Total documents: 916 (five years: 313)
H-index: 62 (five years: 10)

2022, Vol 16 (2), pp. 1-29
Author(s): Kai Wang, Jun Pang, Dingjie Chen, Yu Zhao, Dapeng Huang, ...

Exploiting the anonymity mechanism of Bitcoin, ransomware activities demanding ransom in bitcoins have become rampant in recent years. Several existing studies quantify the impact of ransomware activities, mostly focusing on the amount of ransom. However, victims' reactions in Bitcoin, which can reflect the impact of ransomware activities well, have been largely neglected. In addition, existing studies track ransom transfers at the Bitcoin address level, which makes it difficult to uncover the patterns of ransom transfers from a macro perspective beyond individual addresses. In this article, we conduct a large-scale analysis of ransom payments, ransom transfers, and victim migrations in Bitcoin from 2012 to 2021. First, we develop a fine-grained address clustering method to cluster Bitcoin addresses into users, which enables us to identify more addresses controlled by ransomware criminals. Second, motivated by the fact that Bitcoin activities and their participants have already formed stable industries, such as Darknet and Miner, we train a multi-label classification model to identify the industry identifiers of users. Third, we identify ransom payment transactions and then quantify the amount of ransom and the number of victims in 63 ransomware activities. Finally, after analyzing the trajectories of ransom transferred across different industries and tracking victims' migrations across industries, we find that, to obscure the purposes of their transfer trajectories, most ransomware criminals (e.g., operators of Locky and Wannacry) prefer to spread ransom across multiple industries instead of using the services of Bitcoin mixers. Compared with other industries, Investment is highly resilient to ransomware activities in the sense that the number of users in Investment remains relatively stable. Moreover, we observe that a few victims become active in the Darknet after paying ransom.
Our findings can help authorities gain a deeper understanding of ransomware activities in Bitcoin. While our study focuses on ransomware, our methods are potentially applicable to other cybercriminal activities that have similarly adopted bitcoins as their means of payment.
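
The abstract above does not say how its fine-grained address clustering works. As a rough illustration of the general idea of grouping addresses into users, the sketch below uses the widely known multi-input (common-input-ownership) heuristic with a union-find structure. This is a swapped-in technique assumed for illustration, not the authors' method, and the names `UnionFind` and `cluster_addresses` are hypothetical.

```python
from collections import defaultdict


class UnionFind:
    """Minimal disjoint-set structure keyed by address strings."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Register unseen addresses as their own root.
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving keeps the trees shallow.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_addresses(transactions):
    """Group addresses that appear together as inputs of one transaction.

    `transactions` is an iterable of lists of input addresses (an assumed,
    simplified representation). Returns a sorted list of sorted clusters.
    """
    uf = UnionFind()
    for input_addrs in transactions:
        if not input_addrs:
            continue
        first = input_addrs[0]
        uf.find(first)  # make sure single-input addresses are registered
        for addr in input_addrs[1:]:
            uf.union(first, addr)

    clusters = defaultdict(set)
    for addr in list(uf.parent):
        clusters[uf.find(addr)].add(addr)
    return sorted(sorted(c) for c in clusters.values())


# Toy example with made-up address labels.
example = [["a", "b"], ["b", "c"], ["d"], ["e", "f"]]
result = cluster_addresses(example)
```

In this toy input, `a`, `b` and `c` end up in one cluster because `b` is co-spent with both of the others. A real pipeline would need additional rules (for example, for change addresses or CoinJoin-style transactions), which this sketch deliberately leaves out.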


2022, Vol 8, pp. e835
Author(s): David Schindler, Felix Bensmann, Stefan Dietze, Frank Krüger

Science across all disciplines has become increasingly data-driven, leading to additional needs with respect to software for collecting, processing and analysing data. Thus, transparency about the software used as part of the scientific process is crucial for understanding the provenance of individual research data and insights, is a prerequisite for reproducibility, and can enable macro-analysis of the evolution of scientific methods over time. However, a lack of rigor in software citation practices makes the automated detection and disambiguation of software mentions a challenging problem. In this work, we provide a large-scale analysis of software usage and citation practices, facilitated through an unprecedented knowledge graph of software mentions and affiliated metadata. The graph is generated through supervised information extraction models trained on a unique gold standard corpus and applied to more than 3 million scientific articles. Our information extraction approach distinguishes different types of software and mentions, disambiguates mentions, and significantly outperforms the state of the art, leading to the most comprehensive corpus of 11.8 M software mentions, described through a knowledge graph consisting of more than 300 M triples. Our analysis provides insights into the evolution of software usage and citation patterns across various fields, ranks of journals, and impact of publications. To the best of our knowledge, this is the most comprehensive analysis of software use and citation at the time of writing; all data and models are shared publicly to facilitate further research into the scientific use and citation of software.


2022, Vol 11 (1)
Author(s): Daniele Rama, Tiziano Piccardi, Miriam Redi, Rossano Schifanella

Wikipedia is the largest source of free encyclopedic knowledge and one of the most visited sites on the Web. To increase reader understanding of an article, Wikipedia editors add images within the text of the article's body. However, despite the widespread usage of images on web platforms and the huge volume of visual content on Wikipedia, little is known about the importance of images in the context of free knowledge environments. To bridge this gap, we collect data about English Wikipedia reader interactions with images during one month and perform the first large-scale analysis of how interactions with images happen on Wikipedia. First, we quantify the overall engagement with images, finding that one in 29 pageviews results in a click on at least one image, one order of magnitude higher than interactions with other types of article content. Second, we study which factors are associated with image engagement and observe that clicks on images occur more often in shorter articles, in articles about visual arts or transportation, and in biographies of less well-known people. Third, we look at interactions with Wikipedia article previews and find that images help support readers' information needs when navigating through the site, especially for more popular pages. The findings in this study deepen our understanding of the role of images for free knowledge and provide a guide for Wikipedia editors and web user communities to enrich the world's largest source of encyclopedic knowledge.


Author(s): Gabriele Scalia

Over the last few years, machine learning has revolutionized countless areas and fields. Nowadays, AI holds promise for analyzing data, extracting knowledge, and driving discovery across many scientific domains such as chemistry, biology, and genomics. However, the specific challenges posed by scientific data demand that machine learning techniques be adapted to new requirements. We investigate machine learning-driven scientific data analysis, focusing on a set of key requirements. These include the management of uncertainty for complex data and models, the estimation of system properties starting from low-volume and imprecise collected data, support for scientific model development through large-scale analysis of experimental data, and the machine learning-driven integration of complementary experimental technologies.


2021
Author(s): Rong Huang, Yinrong Liu, Jianling Chen, Zuyu Lu, Jiajia Wang, ...

Background: Angelica dahurica belongs to the Apiaceae family, and its dry root is a famous traditional Chinese medicine named "Bai zhi". There are two cultivars (A. dahurica cv. 'Hangbaizhi' and A. dahurica cv. 'Qibaizhi'), which have been domesticated for thousands of years. Long-term artificial selection has led to great changes in the root phenotypes of the two cultivars and has also decreased their adaptability to the environment. We hypothesized that the cultivars may have lost some genetic diversity and become highly differentiated from wild A. dahurica during the domestication process. However, few studies have been carried out on how domestication affects the genetic variation of this species. Here, we assessed the levels of genetic variation and differentiation within and between wild A. dahurica and its cultivars using 12 SSR markers. Results: The genetic diversity of the cultivars was much lower than that of wild A. dahurica, and A. dahurica cv. 'Qibaizhi' had lower genetic diversity than A. dahurica cv. 'Hangbaizhi'. AMOVA showed significant genetic differentiation between wild and cultivated A. dahurica, and between A. dahurica cv. 'Hangbaizhi' and A. dahurica cv. 'Qibaizhi'. The results of Bayesian, UPGMA, NJ and PCoA clustering analyses indicated that all 15 populations were assigned to two genetic clusters corresponding to the wild and cultivated resources. Bayesian clustering analysis further divided the cultivated resources into two sub-clusters corresponding to the two cultivars. Conclusions: Our study suggests that the domestication process is likely the major factor behind the loss of genetic diversity in cultivated A. dahurica and its significant genetic differentiation from the wild resources, due to founder effects and/or artificial directional selection.
This large-scale population genetics analysis could provide valuable information for genetic resource conservation and breeding programs of Angelica dahurica.


Author(s): O. G. Litvinova

One of the fundamental tasks in urban planning today is the study of the properties of settlement systems. In Russian and foreign historical and urban planning science, settlement is studied according to the hierarchical location of settlements. Small and medium-sized settlements are treated as elementary lower units of large cities, and their structure and formation processes are not studied. Accordingly, they are rarely considered when strategic programs of regional development are drawn up. The paper proposes the urban retrospective method, which provides a deep and large-scale analysis of the settlement system in the coastal area of the Angara River. The research is based on cartographic sources developed by governmental institutions whose activity depends on statistical data. These include the Ministry of Internal Affairs, the Ministry of Agriculture, and the Ministry of Railways. A comparative analysis of the sources makes it possible to model and identify the settlement system, with respect to small settlements, in the coastal area of the Angara River in different periods. Significant results include quantitative data on small settlements, which are otherwise overlooked by today's urban planners.

