large scale analysis Latest Research Papers

Towards transdisciplinary impact of scientific publications: A longitudinal, comprehensive, and large-scale analysis on Microsoft Academic Graph

Information Processing & Management ◽

10.1016/j.ipm.2021.102859 ◽

2022 ◽

Vol 59 (2) ◽

pp. 102859

Author(s):

Yong Huang ◽

Wei Lu ◽

Jialin Liu ◽

Qikai Cheng ◽

Yi Bu

Keyword(s):

Large Scale ◽

Scale Analysis ◽

Scientific Publications ◽

Large Scale Analysis ◽

Microsoft Academic

Probabilistic unsupervised classification for large-scale analysis of spectral imaging data

International Journal of Applied Earth Observation and Geoinformation ◽

10.1016/j.jag.2022.102675 ◽

2022 ◽

Vol 107 ◽

pp. 102675

Author(s):

Emmanuel Paradis

Keyword(s):

Large Scale ◽

Spectral Imaging ◽

Unsupervised Classification ◽

Scale Analysis ◽

Imaging Data ◽

Large Scale Analysis

A Large-scale Empirical Analysis of Ransomware Activities in Bitcoin

ACM Transactions on the Web ◽

10.1145/3494557 ◽

2022 ◽

Vol 16 (2) ◽

pp. 1-29

Author(s):

Kai Wang ◽

Jun Pang ◽

Dingjie Chen ◽

Yu Zhao ◽

Dapeng Huang ◽

...

Keyword(s):

Empirical Analysis ◽

Large Scale ◽

Classification Model ◽

Scale Analysis ◽

Clustering Method ◽

Fine Grained ◽

Large Scale Analysis ◽

The Impact ◽

Transfer Trajectories

Exploiting the anonymous mechanism of Bitcoin, ransomware activities demanding ransom in bitcoins have become rampant in recent years. Several existing studies quantify the impact of ransomware activities, mostly focusing on the amount of ransom. However, victims’ reactions in Bitcoin, which can well reflect the impact of ransomware activities, have been largely neglected. Besides, existing studies track ransom transfers at the Bitcoin address level, making it difficult for them to uncover the patterns of ransom transfers from a macro perspective beyond Bitcoin addresses. In this article, we conduct a large-scale analysis of ransom payments, ransom transfers, and victim migrations in Bitcoin from 2012 to 2021. First, we develop a fine-grained address clustering method to cluster Bitcoin addresses into users, which enables us to identify more addresses controlled by ransomware criminals. Second, motivated by the fact that Bitcoin activities and their participants have already formed stable industries, such as Darknet and Miner, we train a multi-label classification model to identify the industry identifiers of users. Third, we identify ransom payment transactions and then quantify the amount of ransom and the number of victims in 63 ransomware activities. Finally, after analyzing the trajectories of ransom transferred across different industries and tracking victims’ migrations across industries, we find that, to obscure the purposes of their transfer trajectories, most ransomware criminals (e.g., operators of Locky and Wannacry) prefer to spread ransom into multiple industries instead of utilizing the services of Bitcoin mixers. Compared with other industries, Investment is highly resilient to ransomware activities in the sense that the number of users in Investment remains relatively stable. Moreover, we also observe that a few victims become active in the Darknet after paying ransom. Our findings in this work can help authorities deeply understand ransomware activities in Bitcoin. While our study focuses on ransomware, our methods are potentially applicable to other cybercriminal activities that have similarly adopted bitcoins as their payments.
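
The abstract above does not spell out its fine-grained clustering method, so the following is only an illustrative sketch of the general idea of grouping addresses into users. It uses the widely known common-input-ownership heuristic (addresses spent together as inputs of one transaction are assumed to share an owner), which is a swapped-in stand-in rather than the authors' method. The function name `cluster_addresses` and the input format (one list of input addresses per transaction) are assumptions made for this example.

```python
from collections import defaultdict


class UnionFind:
    """Minimal disjoint-set structure keyed by address string."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Register unseen addresses as their own root.
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_addresses(transactions):
    """Group addresses that co-occur as inputs of the same transaction.

    `transactions` is a hypothetical format: an iterable of lists, each
    holding the input addresses of one transaction.
    """
    uf = UnionFind()
    for inputs in transactions:
        if not inputs:
            continue
        first = inputs[0]
        uf.find(first)  # make sure single-input addresses are registered
        for addr in inputs[1:]:
            uf.union(first, addr)

    clusters = defaultdict(set)
    for addr in list(uf.parent):
        clusters[uf.find(addr)].add(addr)
    # Sort for a deterministic result.
    return sorted(sorted(group) for group in clusters.values())


if __name__ == "__main__":
    txs = [["a", "b"], ["b", "c"], ["d"], ["e", "f"]]
    print(cluster_addresses(txs))
```

In this sketch, addresses `a`, `b`, and `c` end up in one cluster because `b` links the first two transactions. A real analysis would need further rules, for example to handle transactions in which several parties deliberately contribute inputs.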

The role of software in science: a knowledge graph-based analysis of software mentions in PubMed Central

PeerJ Computer Science ◽

10.7717/peerj-cs.835 ◽

2022 ◽

Vol 8 ◽

pp. e835

Author(s):

David Schindler ◽

Felix Bensmann ◽

Stefan Dietze ◽

Frank Krüger

Keyword(s):

Information Extraction ◽

Large Scale ◽

Automated Detection ◽

Knowledge Graph ◽

Pubmed Central ◽

Scientific Methods ◽

Large Scale Analysis ◽

Different Types ◽

Citation Practices

Science across all disciplines has become increasingly data-driven, leading to additional needs with respect to software for collecting, processing and analysing data. Thus, transparency about software used as part of the scientific process is crucial to understand provenance of individual research data and insights, is a prerequisite for reproducibility and can enable macro-analysis of the evolution of scientific methods over time. However, missing rigor in software citation practices renders the automated detection and disambiguation of software mentions a challenging problem. In this work, we provide a large-scale analysis of software usage and citation practices facilitated through an unprecedented knowledge graph of software mentions and affiliated metadata generated through supervised information extraction models trained on a unique gold standard corpus and applied to more than 3 million scientific articles. Our information extraction approach distinguishes different types of software and mentions, disambiguates mentions and outperforms the state-of-the-art significantly, leading to the most comprehensive corpus of 11.8 M software mentions that are described through a knowledge graph consisting of more than 300 M triples. Our analysis provides insights into the evolution of software usage and citation patterns across various fields, ranks of journals, and impact of publications. Whereas, to the best of our knowledge, this is the most comprehensive analysis of software use and citation at the time, all data and models are shared publicly to facilitate further research into scientific use and citation of software.

A large scale study of reader interactions with images on Wikipedia

EPJ Data Science ◽

10.1140/epjds/s13688-021-00312-8 ◽

2022 ◽

Vol 11 (1) ◽

Author(s):

Daniele Rama ◽

Tiziano Piccardi ◽

Miriam Redi ◽

Rossano Schifanella

Keyword(s):

Visual Arts ◽

Large Scale ◽

Scale Analysis ◽

Information Need ◽

Visual Content ◽

Large Scale Analysis ◽

User Communities ◽

Order Of Magnitude ◽

The Web

Wikipedia is the largest source of free encyclopedic knowledge and one of the most visited sites on the Web. To increase reader understanding of the article, Wikipedia editors add images within the text of the article’s body. However, despite their widespread usage on web platforms and the huge volume of visual content on Wikipedia, little is known about the importance of images in the context of free knowledge environments. To bridge this gap, we collect data about English Wikipedia reader interactions with images during one month and perform the first large-scale analysis of how interactions with images happen on Wikipedia. First, we quantify the overall engagement with images, finding that one in 29 pageviews results in a click on at least one image, one order of magnitude higher than interactions with other types of article content. Second, we study what factors are associated with image engagement and observe that clicks on images occur more often in shorter articles, in articles about visual arts or transport, and in biographies of less well-known people. Third, we look at interactions with Wikipedia article previews and find that images help support readers’ information needs when navigating through the site, especially for more popular pages. The findings in this study deepen our understanding of the role of images for free knowledge and provide a guide for Wikipedia editors and web user communities to enrich the world’s largest source of encyclopedic knowledge.

Corrigendum: The Dihydrofolate Reductase Protein-Fragment Complementation Assay: A Survival-Selection Assay for Large-Scale Analysis of Protein–Protein Interactions

Cold Spring Harbor Protocols ◽

10.1101/pdb.corr107812 ◽

2022 ◽

Vol 2022 (1) ◽

pp. pdb.corr107812

Author(s):

Stephen W. Michnick ◽

Emmanuel D. Levy ◽

Christian R. Landry ◽

Jacqueline Kowarzyk ◽

Vincent Messier

Keyword(s):

Dihydrofolate Reductase ◽

Protein Interactions ◽

Large Scale ◽

Scale Analysis ◽

Protein Protein Interactions ◽

Protein Fragment ◽

Large Scale Analysis ◽

Survival Selection ◽

Protein Fragment Complementation ◽

Protein Fragment Complementation Assay

Coping and Regulatory Responses on Social Media during Health Crisis: a Large-scale Analysis

10.24251/hicss.2022.457 ◽

2022 ◽

Author(s):

Olga Abramova ◽

Katharina Batzel ◽

Daniela Modesti

Keyword(s):

Social Media ◽

Large Scale ◽

Scale Analysis ◽

Health Crisis ◽

Large Scale Analysis

Machine Learning for Scientific Data Analysis

Special Topics in Information Technology - SpringerBriefs in Applied Sciences and Technology ◽

10.1007/978-3-030-85918-3_10 ◽

2022 ◽

pp. 115-126

Author(s):

Gabriele Scalia

Keyword(s):

Machine Learning ◽

Data Analysis ◽

Large Scale ◽

Model Development ◽

Scientific Data ◽

Machine Learning Techniques ◽

Scientific Model ◽

Large Scale Analysis ◽

Scientific Data Analysis ◽

System Properties

Over the last few years, machine learning has revolutionized countless areas and fields. Nowadays, AI bears promise for analyzing, extracting knowledge, and driving discovery across many scientific domains such as chemistry, biology, and genomics. However, the specific challenges posed by scientific data demand that machine learning techniques be adapted to new requirements. We investigate machine learning-driven scientific data analysis, focusing on a set of key requirements. These include the management of uncertainty for complex data and models, the estimation of system properties starting from low-volume and imprecise collected data, support for scientific model development through large-scale analysis of experimental data, and the machine learning-driven integration of complementary experimental technologies.

Limited Genetic Diversity and High Differentiation in Angelica Dahurica Resulted by Domestication: Insights to Breeding and Conservation

10.21203/rs.3.rs-1160144/v1 ◽

2021 ◽

Author(s):

Rong Huang ◽

Yinrong Liu ◽

Jianling Chen ◽

Zuyu Lu ◽

Jiajia Wang ◽

...

Keyword(s):

Genetic Diversity ◽

Genetic Variation ◽

Genetic Differentiation ◽

Clustering Analysis ◽

Large Scale ◽

Angelica Dahurica ◽

Significant Genetic Differentiation ◽

Large Scale Analysis ◽

Domestication Process ◽

Resources Conservation

Background: Angelica dahurica belongs to the Apiaceae family, and its dried root is a well-known traditional Chinese medicine called “Bai zhi”. Two cultivars (A. dahurica cv. ‘Hangbaizhi’ and A. dahurica cv. ‘Qibaizhi’) have been domesticated for thousands of years. Long-term artificial selection has led to great changes in the root phenotypes of the two cultivars and has also decreased their adaptability to the environment. We hypothesized that the cultivars may have lost some genetic diversity and become highly differentiated from wild A. dahurica during the domestication process. However, few studies have examined how domestication affects the genetic variation of this species. Here, we assessed the levels of genetic variation and differentiation within and between wild A. dahurica and its cultivars using 12 SSR markers. Results: The genetic diversity of the cultivars was much lower than that of wild A. dahurica, and A. dahurica cv. ‘Qibaizhi’ had lower genetic diversity than A. dahurica cv. ‘Hangbaizhi’. AMOVA showed significant genetic differentiation between the wild and cultivated A. dahurica, and between A. dahurica cv. ‘Hangbaizhi’ and A. dahurica cv. ‘Qibaizhi’. Bayesian, UPGMA, NJ and PCoA clustering analyses indicated that all 15 populations were assigned to two genetic clusters corresponding to the wild and cultivated resources. Bayesian clustering analysis further divided the cultivated resources into two sub-clusters corresponding to the two cultivars. Conclusions: Our study suggests that the domestication process is likely the major factor behind the loss of genetic diversity in cultivated A. dahurica and its significant genetic differentiation from the wild resources, due to founder effects and/or artificial directional selection. This large-scale analysis of population genetics could provide valuable information for genetic resources conservation and breeding programs of Angelica dahurica.
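
The abstract above compares genetic diversity between populations but does not name the statistic used. As a hedged illustration only, the sketch below computes expected heterozygosity, H_e = 1 - Σ p_i², a standard per-locus diversity measure for SSR data. The function name `expected_heterozygosity` and the allele sizes in the usage example are made up for this example and are not taken from the study.

```python
from collections import Counter


def expected_heterozygosity(alleles):
    """Return 1 - sum(p_i^2) over the allele frequencies at one locus.

    `alleles` is a flat list of observed allele calls (for example SSR
    fragment sizes) pooled across the individuals of one population.
    """
    n = len(alleles)
    if n == 0:
        raise ValueError("at least one allele observation is required")
    counts = Counter(alleles)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


if __name__ == "__main__":
    # Hypothetical allele sizes, for illustration only.
    print(expected_heterozygosity([150, 150, 152, 154]))
    print(expected_heterozygosity([150, 150, 150, 150]))
```

A population fixed for a single allele scores 0, while a population with many equally frequent alleles approaches 1, so a lower value in cultivars than in wild populations would be one way to express reduced diversity. This simple form omits the small-sample correction that population-genetics software often applies.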

Settlement system in the Angara coastal area in the 17th–21st centuries in terms of strategic planning

Vestnik Tomskogo gosudarstvennogo arkhitekturno-stroitel'nogo universiteta. JOURNAL of Construction and Architecture ◽

10.31675/1607-1859-2021-23-6-98-116 ◽

2021 ◽

Vol 23 (6) ◽

pp. 98-116

Author(s):

O. G. Litvinova

Keyword(s):

Urban Planning ◽

Coastal Area ◽

Large Scale ◽

Formation Processes ◽

Settlement System ◽

Large Cities ◽

Large Scale Analysis ◽

Hierarchical Location ◽

Governmental Institutions ◽

System Properties

The study of settlement system properties is currently one of the fundamental tasks of urban planning. In Russian and foreign historical and urban planning science, settlement is studied according to the hierarchical location of settlements. Small and medium-sized settlements are treated as elementary lower units of large cities, and their structure and formation processes are not studied. Accordingly, they are rarely considered when strategic programs of regional development are elaborated. The paper proposes the urban retrospective method, which provides a deep and large-scale analysis of the settlement system in the coastal area of the Angara River. The research is based on cartographic sources developed by governmental institutions whose activity depends on statistical data, including the Ministry of Internal Affairs, the Ministry of Agriculture, and the Ministry of Railways. The comparative analysis of the sources makes it possible to model and identify the settlement system with respect to small settlements in the coastal area of the Angara River in different periods. Significant results include the quantitative data on small settlements, which receive little attention from urban planners today.
