multiple data sources
Recently Published Documents


TOTAL DOCUMENTS: 397 (FIVE YEARS 96)

H-INDEX: 33 (FIVE YEARS 5)

2021 ◽  
Vol 580 ◽  
pp. 819-837
Author(s):  
Yang Ma ◽  
Xujun Zhao ◽  
Chaowei Zhang ◽  
Jifu Zhang ◽  
Xiao Qin

2021 ◽  
Vol 12 (5) ◽  
pp. 1-23
Author(s):  
Chuanbo Hu ◽  
Minglei Yin ◽  
Bin Liu ◽  
Xin Li ◽  
Yanfang Ye

Illicit drug trafficking via social media sites such as Instagram has become a severe problem, drawing a great deal of attention from law enforcement and public health agencies. Identifying illicit drug dealers from social media data remains a technical challenge for two reasons. On the one hand, the available data are limited because of privacy concerns with crawling social media sites; on the other hand, the diversity of drug dealing patterns makes it difficult to reliably distinguish drug dealers from common drug users. Unlike existing methods that focus on posting-based detection, we propose to tackle the problem of illicit drug dealer identification by constructing a large-scale multimodal dataset named Identifying Drug Dealers on Instagram (IDDIG). Nearly 4,000 user accounts, of which more than 1,400 are drug dealers, have been collected from Instagram with multiple data sources including post comments, post images, homepage bio, and homepage images. We then design a quadruple-based multimodal fusion method to combine the multiple data sources associated with each user account for drug dealer identification. Experimental results on the constructed IDDIG dataset demonstrate the effectiveness of the proposed method in identifying drug dealers (almost 95% accuracy). Moreover, we have developed a hashtag-based community detection technique for discovering evolving patterns, especially those related to geography and drug types.
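As a rough illustration of combining four per-account data sources, the sketch below concatenates one feature vector per source and appends a presence flag for each. It is a generic late-fusion baseline written for this listing, not the quadruple-based method the abstract describes. The function name `fuse_quadruple`, the modality keys, and the vector sizes are all assumptions.

```python
from typing import Dict, List, Optional

# Hypothetical keys for the four sources named in the abstract.
MODALITIES = ("post_comments", "post_images", "homepage_bio", "homepage_images")


def fuse_quadruple(
    features: Dict[str, Optional[List[float]]],
    dims: Dict[str, int],
) -> List[float]:
    """Concatenate four modality vectors into one account-level vector.

    A missing modality is zero-filled, and each modality contributes a
    trailing presence flag (1.0 if observed, 0.0 if missing).
    """
    fused: List[float] = []
    for name in MODALITIES:
        vec = features.get(name)
        if vec is None:
            fused.extend([0.0] * dims[name])
            fused.append(0.0)
        else:
            if len(vec) != dims[name]:
                raise ValueError(f"{name}: expected {dims[name]} values, got {len(vec)}")
            fused.extend(vec)
            fused.append(1.0)
    return fused


if __name__ == "__main__":
    dims = {name: 2 for name in MODALITIES}
    account = {"post_comments": [0.1, 0.2], "homepage_bio": [1.0, 0.0]}
    print(fuse_quadruple(account, dims))
```

The fused vector could then be passed to any standard classifier. The presence flags let a downstream model distinguish a missing source from one that is observed but all zeros.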


2021 ◽  
pp. 1-15
Author(s):  
Ali Reza Honarvar ◽  
Ashkan Sami

At present, air quality in populated urban areas is recognized as an environmental crisis. Air pollution affects the sustainability of the city. Air quality data are very important for controlling air pollution and protecting humans from its hazards. However, the costs of constructing and maintaining air quality recording infrastructure are very high, and air quality recorded at one point cannot be generalized to locations even a few kilometers away. Some gains come only from the integration of multiple data sources and can never be achieved through independent single-source processing. Urban organizations in each city independently produce and record data relevant to their own goals and objectives, which creates separate data silos associated with an urban system. These data vary in model and structure, and their integration provides an opportunity to discover knowledge that can be useful in urban planning and decision making. This paper aims to show the generality of our previous research, which proposed a novel model that integrates urban big data resources to predict Particulate Matter (PM), the main factor of air quality, in city regions where air quality sensors are not available. We extend the model and the experiments with various configurations for different smart city settings. This work extends the evaluation scenarios of the model with the extended dataset of the city of Aarhus, Denmark, and compares the model's performance against various specified baselines. This paper also presents details of removing the heterogeneity of multiple data sources in the Multiple Data Set Aggregator & Heterogeneity Remover (MDA&HR) and of improving the operation of the Train Data Splitter (TDS) part of the model by focusing on finding more similar air quality patterns. The acceptable accuracy of the results shows the generality of the model.


2021 ◽  
Author(s):  
Volker Hoffmann ◽  
Jonatan Ralf Axel Klemets ◽  
Bendik Nybakk Torsaeter ◽  
Gjert H. Rosenlund ◽  
Christian A. Andresen

PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e12113
Author(s):  
David L. Miller ◽  
David Fifield ◽  
Ewan Wakefield ◽  
Douglas B. Sigourney

Spatial models of density and abundance are widely used in both ecological research (e.g., to study habitat use) and wildlife management (e.g., for population monitoring and environmental impact assessment). Increasingly, modellers are tasked with integrating data from multiple sources, collected via different observation processes. Distance sampling is an efficient and widely used survey and analysis technique. Within this framework, observation processes are modelled via detection functions. We seek to take multiple data sources and fit them in a single spatial model. Density surface models (DSMs) are a two-stage approach: first accounting for detectability via distance sampling methods, then modelling distribution via a generalized additive model. However, current software and theory do not address the issue of multiple data sources. We extend the DSM approach to accommodate data from multiple surveys, collected via conventional distance sampling, double-observer distance sampling (used to account for incomplete detection at zero distance) and strip transects. Variance propagation ensures that uncertainty is correctly accounted for in final estimates of abundance. Methods described here are implemented in the dsm R package. We briefly analyse two datasets to illustrate these new developments. Our new methodology enables data from multiple distance sampling surveys of different types to be treated in a single spatial model, enabling more robust abundance estimation, potentially over wider geographical or temporal domains.
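To make the first (detectability) stage concrete, the sketch below shows one common way to put a line transect and a strip transect on the same footing: by computing an effective surveyed area for each. It assumes an untruncated half-normal detection function g(x) = exp(-x²/(2σ²)) with certain detection on the line, which is a simplifying assumption rather than the paper's specification. It does not reproduce the dsm R package, and all function names are hypothetical.

```python
import math
from typing import Sequence


def half_normal_sigma(distances: Sequence[float]) -> float:
    """Maximum-likelihood scale for an untruncated half-normal:
    sigma^2 = mean of squared perpendicular distances."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def effective_strip_half_width(sigma: float) -> float:
    """Integral of g(x) = exp(-x^2 / (2 sigma^2)) over x >= 0."""
    return sigma * math.sqrt(math.pi / 2)


def line_transect_effective_area(distances: Sequence[float], line_length: float) -> float:
    """Effective area for a line transect: both sides of the line,
    each as wide as the effective strip half-width."""
    sigma = half_normal_sigma(distances)
    return 2.0 * effective_strip_half_width(sigma) * line_length


def strip_transect_effective_area(half_width: float, line_length: float) -> float:
    """Strip transect: detection is assumed perfect inside the strip,
    so the effective area is simply the strip's area."""
    return 2.0 * half_width * line_length


if __name__ == "__main__":
    observed = [0.4, 1.1, 0.7, 2.0, 0.2]  # made-up perpendicular distances
    area_line = line_transect_effective_area(observed, line_length=10.0)
    area_strip = strip_transect_effective_area(half_width=0.5, line_length=10.0)
    print(len(observed) / area_line, area_strip)
```

Once each survey segment has an effective area, counts from both survey types can be modelled together in the second stage, for example with the log of the effective area as an offset in the spatial model.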


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Jin Chen ◽  
Tianyuan Chen ◽  
Yifei Song ◽  
Bin Hao ◽  
Ling Ma

Prior literature emphasizes the distinct roles of differently affiliated venture capitalists (VCs) in nurturing innovation and entrepreneurship. Although China has become the second largest VC market in the world, the unavailability of high-quality datasets on VC affiliation in China’s market hinders such research efforts. To fill this important gap, we compiled a new panel dataset of VC affiliation in China’s market from multiple data sources. Specifically, we drew on a list of 6,553 VCs that invested in China between 2000 and 2016 from the CVSource database, collected the VCs’ shareholder information from public sources, and developed a multi-stage procedure to label each VC as one of the following types: GVC (public agency-affiliated, state-owned enterprise-affiliated), CVC (corporate VC), IVC (independent VC), BVC (bank-affiliated VC), FVC (financial/non-bank-affiliated VC), UVC (university endowment/spin-out unit), and PenVC (pension-affiliated VC). We also denoted whether a VC has a foreign background. This dataset helps researchers conduct more nuanced investigations into the investment behaviors of different VCs and their distinct impacts on innovation and entrepreneurship in China’s context.

