massive data set
Recently Published Documents


TOTAL DOCUMENTS

16
(FIVE YEARS 6)

H-INDEX

3
(FIVE YEARS 1)

Insects ◽  
2021 ◽  
Vol 12 (12) ◽  
pp. 1127
Author(s):  
Jovana Bila Dubaić ◽  
Slađan Simonović ◽  
Milan Plećaš ◽  
Ljubiša Stanisavljević ◽  
Slobodan Davidović ◽  
...  

It is assumed that wild honey bees have become largely extinct across Europe since the 1980s, following the introduction of the exotic ectoparasitic mite Varroa and the associated spillover of various pathogens. However, several recent studies have reported unmanaged colonies that survived Varroa mite infestation. Here we present another case of an unmanaged, free-living honey bee population in SE Europe, a rare case of feral bees inhabiting a large, densely populated urban area: Belgrade, the capital of Serbia. We compiled a massive data set from opportunistic citizen science (>1300 records) over the 2011–2017 period, investigated whether these honey bee colonies and the high incidence of swarms could result from a stable, self-sustaining feral population (i.e., rather than from a regular inflow of swarms escaping from local managed apiaries), and discussed various explanations for its existence. We also present the possibilities and challenges associated with detecting and effectively monitoring feral/wild honey bees in urban settings, and the role of citizen science in such endeavors. Our results will underpin ongoing initiatives to better understand and support naturally selected resistance mechanisms against the Varroa mite, which should help alleviate current threats and risks to global apiculture and food production security.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Giovanni Bonaccorsi ◽  
Francesco Pierri ◽  
Francesco Scotti ◽  
Andrea Flori ◽  
Francesco Manaresi ◽  
...  

Lockdowns implemented to address the COVID-19 pandemic have disrupted human mobility flows around the globe to an unprecedented extent and with economic consequences which are unevenly distributed across territories, firms and individuals. Here we study socioeconomic determinants of mobility disruption during both the lockdown and the recovery phases in Italy. For this purpose, we analyze a massive data set on Italian mobility from February to October 2020 and we combine it with detailed data on pre-existing local socioeconomic features of Italian administrative units. Using a set of unsupervised and supervised learning techniques, we reliably show that the least and the most affected areas persistently belong to two different clusters. Notably, the former cluster features significantly higher income per capita and lower income inequality than the latter. This distinction persists once the lockdown is lifted. The least affected areas display a swift (V-shaped) recovery in mobility patterns, while poorer, most affected areas experience a much slower (U-shaped) recovery: as of October 2020, their mobility was still significantly lower than pre-lockdown levels. These results are then detailed and confirmed with a quantile regression analysis. Our findings show that economic segregation has, thus, strengthened during the pandemic.


Stats ◽  
2021 ◽  
Vol 4 (3) ◽  
pp. 682-700
Author(s):  
Jonatha Sousa Pimentel ◽  
Raydonal Ospina ◽  
Anderson Ara

The development of a country directly involves investing in the education of its citizens. Learning analytics/educational data mining (LA/EDM) provides access to big observational structured/unstructured data captured from educational settings and relies mostly on machine learning algorithms to extract useful information. Support vector regression (SVR) is a supervised statistical learning approach that makes it possible to model and predict students' performance trends and thereby guide strategic plans for the development of high-quality education. In Brazil, performance can be evaluated at the national level using a student's average grades on the National High School Exams (ENEMs), based on their socioeconomic information and school records. In this paper, we focus on increasing the computational efficiency of SVR applied to ENEM data for online requests. Based on an analysis of a massive data set of more than five million observations, the results indicate computational learning-time savings of more than 90%, while providing performance predictions compatible with traditional modeling.


2021 ◽  
Vol 3 ◽  
Author(s):  
Aengus Bridgman ◽  
Eric Merkley ◽  
Oleg Zhilin ◽  
Peter John Loewen ◽  
Taylor Owen ◽  
...  

The COVID-19 pandemic has occurred alongside a worldwide infodemic where unprecedented levels of misinformation have contributed to widespread misconceptions about the novel coronavirus. Conspiracy theories, poorly sourced medical advice, and information trivializing the virus have ignored national borders and spread quickly. This information spread has occurred despite generally strong preferences for domestic national media and social media networks that tend to be geographically bounded. How, then, is (mis)information crossing borders so rapidly? Using social media and survey data, we evaluate the extent to which consumption and propagation patterns of domestic and international traditional news and social media can help inform theorizing about cross-national information spread. In a detailed case study of Canada, we employ a large multi-wave survey and a massive data set of Canadian Twitter users. We show that the majority of misinformation circulating on Twitter that is shared by Canadian accounts is retweeted from U.S.-based accounts. Moreover, exposure to U.S.-based media outlets is associated with COVID-19 misperceptions and increased exposure to U.S.-based information on Twitter is associated with an increased likelihood to post misinformation. We thus theorize and empirically identify a key globalizing infodemic pathway: disregard for national origin of social media posting.


2021 ◽  
Author(s):  
Ismael Hernández-González ◽  
Valeria Mateo-Estrada ◽  
Santiago Castillo-Ramírez

Antimicrobial resistance (AR) is a major global threat to public health. Understanding the population dynamics of AR is critical to restraining and controlling this issue. However, no study has provided a global picture of the resistome of Acinetobacter baumannii, a very important nosocomial pathogen. Here we analyze 1450+ genomes (covering >40 countries and >4 decades) to infer the global population dynamics of the resistome of this species. We show that gene flow and horizontal transfer have driven the dissemination of AR genes in A. baumannii. We found considerable variation in AR gene content across lineages. Although the individual AR gene histories have been affected by recombination, the AR gene content has been shaped by the phylogeny. Furthermore, many AR genes have been transferred to other well-known pathogens, such as Pseudomonas aeruginosa or Klebsiella pneumoniae. Finally, despite using this massive data set, we were not able to sample the whole diversity of AR genes, which suggests that this species has an open resistome. Our results highlight the high mobilization risk of AR genes between important pathogens. From a broader perspective, this study provides a framework for an emerging (resistome-centric) perspective on the genome epidemiology (and surveillance) of bacterial pathogens.
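The "open resistome" inference rests on an accumulation argument: if adding more genomes keeps revealing AR genes not seen before, the sampled gene pool has not saturated. One common way to check this is a gene accumulation curve. The sketch below is a minimal illustration, not the authors' pipeline; it assumes each genome is reduced to a set of AR gene names (a hypothetical input format) and averages over random genome orderings with an arbitrary seed.

```python
import random


def accumulation_curve(gene_sets, permutations=100, seed=0):
    """Mean number of distinct genes seen after adding k genomes (k = 1..n).

    gene_sets: list of sets, one per genome (hypothetical representation).
    Genomes are added in random order; averaging over several permutations
    removes the dependence on any single ordering.
    """
    rng = random.Random(seed)
    n = len(gene_sets)
    totals = [0.0] * n
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)
        seen = set()
        for k, idx in enumerate(order):
            seen |= gene_sets[idx]      # union in this genome's genes
            totals[k] += len(seen)      # distinct genes after k + 1 genomes
    return [t / permutations for t in totals]
```

If the tail of the curve is still rising noticeably, new genomes keep contributing new genes (an open pattern); a flat tail suggests a closed one. A formal curve fit would be needed to make this judgment rigorous.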


2020 ◽  
Vol 12 (15) ◽  
pp. 6001 ◽  
Author(s):  
Eduardo Graells-Garrido ◽  
Vanessa Peña-Araya ◽  
Loreto Bravo

The rising availability of digital traces provides a fertile ground for data-driven solutions to problems in cities. However, even though a massive data set analyzed with data science methods may provide a powerful and cost-effective solution to a problem, its adoption by relevant stakeholders is not guaranteed due to adoption barriers such as lack of interpretability and interoperability. In this context, this paper proposes a methodology toward bridging two disciplines, data science and transportation, to identify, understand, and solve transportation planning problems with data-driven solutions that are suitable for adoption by urban planners and policy makers. The methodology is defined by four steps where people from both disciplines go from algorithm and model definition to the development of a potentially adoptable solution with evaluated outputs. We describe how this methodology was applied to define a model to infer commuting trips with mode of transportation from mobile phone data, and we report the lessons learned during the process.


2017 ◽  
Vol 7 (2) ◽  
pp. 251-275
Author(s):  
Edgar Dobriban

Researchers in data-rich disciplines (think of computational genomics and observational cosmology) often wish to mine large bodies of $P$-values looking for significant effects, while controlling the false discovery rate or family-wise error rate. Increasingly, researchers also wish to prioritize certain hypotheses, for example, those thought to have larger effect sizes, by upweighting, and to impose constraints on the underlying mining, such as monotonicity along a certain sequence. We introduce Princessp, a principled method for performing weighted multiple testing by constrained convex optimization. Our method elegantly allows one to prioritize certain hypotheses through upweighting and to discount others through downweighting, while constraining the underlying weights involved in the mining process. When the $P$-values derive from monotone likelihood ratio families such as the Gaussian means model, the new method allows exact solution of an important optimal weighting problem previously thought to be non-convex and computationally infeasible. Our method scales to massive data set sizes. We illustrate the applications of Princessp on a series of standard genomics data sets and offer comparisons with several previous ‘standard’ methods. Princessp offers both ease of operation and the ability to scale to extremely large problem sizes. The method is available as open-source software from github.com/dobriban/pvalue_weighting_matlab (accessed 11 October 2017).
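To make "upweighting" and "downweighting" concrete, the sketch below shows how a given weight vector can enter a standard weighted Benjamini–Hochberg step-up rule: each $P$-value is divided by its weight (weights nonnegative and averaging to one), so upweighted hypotheses become easier to reject. This is a generic textbook procedure offered as an assumption-labeled illustration; it does not choose weights by convex optimization, and it is neither the Princessp algorithm nor the interface of the linked MATLAB package.

```python
def weighted_bh(p_values, weights, alpha=0.05):
    """Weighted Benjamini-Hochberg step-up procedure (generic sketch).

    Rejects hypotheses whose weighted p-value p_i / w_i falls at or below
    the BH threshold alpha * k / m. Weights must be nonnegative and average
    to 1. Returns the set of indices of rejected hypotheses.
    """
    m = len(p_values)
    if m != len(weights):
        raise ValueError("p_values and weights must have equal length")
    if m == 0:
        return set()
    if any(w < 0 for w in weights) or abs(sum(weights) / m - 1.0) > 1e-9:
        raise ValueError("weights must be nonnegative and average to 1")
    # Weighted p-values; a zero weight means the hypothesis is never rejected.
    q = [p / w if w > 0 else float("inf") for p, w in zip(p_values, weights)]
    order = sorted(range(m), key=lambda i: q[i])
    # Step-up: find the largest rank k whose weighted p-value passes alpha*k/m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if q[i] <= alpha * rank / m:
            k_max = rank
    return set(order[:k_max])
```

With uniform weights this reduces to ordinary Benjamini–Hochberg. For example, with $P$-values `[0.001, 0.02, 0.03, 0.5]`, uniform weights reject the first three hypotheses, while weights `[2, 0.5, 0.5, 1]` reject only the first, because the second and third were downweighted.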


Author(s):  
Yang Yang ◽  
Tiezhu Li ◽  
Tao Zhang ◽  
Wanyu Yang

In recent years, a growing number of cities in China have successively rolled out bicycle-sharing systems to facilitate bicycle use, including not only metropolises but also some underdeveloped cities with populations of less than 1 million. One of those underdeveloped cities, Xuchang, launched its bicycle-sharing system in 2014. This service provides a convenient way for members to cycle for some of their short trips. Interest in the bicycle-sharing systems of metropolises is growing rapidly; however, studies on underdeveloped cities are still limited. This study investigated the factors influencing the adoption of a bicycle-sharing system in Xuchang by analyzing massive smart card data from July 2014 to mid-April 2015 and 500 intercept survey questionnaires collected in April 2015. The questionnaires asked members and nonmembers different questions, and the statistical results show the characteristics of users of the Xuchang bicycle-sharing system, including demographic characteristics, travel habits, and degree of satisfaction. Moreover, the space–time distribution characteristics of the Xuchang bicycle-sharing system were analyzed by dividing the massive data set into three groups: weekdays, weekends, and holidays. Results showed that, compared with its clearly defined role of resolving the “last-kilometer problem” in a metropolis, bicycle-sharing in underdeveloped cities acts as an alternative mode of transportation rather than a transfer mode. Results also showed that bicycle-sharing systems gained more popularity in underdeveloped cities than in metropolises because of the smaller extent of egression, resident travel habits, the traffic environment, and so on.
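The weekday/weekend/holiday split can be sketched as a simple partition of trip dates. This is a minimal illustration under stated assumptions: each smart-card trip is reduced to its calendar date, the holiday calendar is supplied by the analyst (the source does not give one, so the holiday set in the tests is hypothetical), and holidays take precedence over weekends, which the source does not specify. The function names are illustrative, not from the study.

```python
from datetime import date


def day_type(day: date, holidays: set[date]) -> str:
    """Classify a calendar date as 'holiday', 'weekend', or 'weekday'."""
    if day in holidays:  # assumption: a holiday overrides the weekend label
        return "holiday"
    # date.weekday(): Monday == 0 ... Saturday == 5, Sunday == 6
    return "weekend" if day.weekday() >= 5 else "weekday"


def split_by_day_type(trip_dates: list[date],
                      holidays: set[date]) -> dict[str, list[date]]:
    """Partition trip dates into the three groups analyzed separately."""
    groups: dict[str, list[date]] = {"weekday": [], "weekend": [], "holiday": []}
    for day in trip_dates:
        groups[day_type(day, holidays)].append(day)
    return groups
```

A real calendar may also need makeup working days that fall on weekends; that adjustment is left out of this sketch.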


2015 ◽  
Vol 05 (04) ◽  
pp. 1046-1056
Author(s):  
Radhika A. ◽  
Michael Arock ◽  

2014 ◽  
Vol 687-691 ◽  
pp. 1668-1671
Author(s):  
Bin Luo ◽  
Tong Zhou Zhao ◽  
De Hua Li ◽  
Dun Bo Cai

In this paper, we study long-range dependence in hydrological records using a high-frequency, massive data set. To detect breakpoints, we apply the Evolutionary Wavelet Spectrum (EWS) to segment the original time series, and we use rescaled range (R/S) analysis to estimate the Hurst exponent, which describes the long-range dependence phenomenon. The results affirm that the hydrological records exhibit long-range dependent (LRD) behavior.
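The R/S step can be illustrated with a minimal pure-Python sketch of classic rescaled range analysis. For each window size $n$, the series is cut into non-overlapping chunks; in each chunk the cumulative sum of deviations from the chunk mean gives the range $R$, which is divided by the chunk standard deviation $S$. The Hurst exponent $H$ is then estimated as the least-squares slope of $\log(R/S)$ against $\log n$. Values near 0.5 indicate no long-range dependence and values above 0.5 indicate persistence. The EWS segmentation is not shown, and the window sizes are caller-chosen assumptions, not those used in the paper.

```python
import math


def rescaled_range(chunk):
    """R/S statistic of one chunk: range of cumulative deviations / std."""
    n = len(chunk)
    mean = sum(chunk) / n
    cumulative, running = [], 0.0
    for x in chunk:
        running += x - mean
        cumulative.append(running)  # last value is ~0, so 0 is in the range
    r = max(cumulative) - min(cumulative)
    s = math.sqrt(sum((x - mean) ** 2 for x in chunk) / n)
    return r / s if s > 0 else float("nan")


def hurst_rs(series, window_sizes):
    """Estimate H as the OLS slope of log(mean R/S) against log(window size)."""
    log_n, log_rs = [], []
    for n in window_sizes:
        chunks = [series[i:i + n] for i in range(0, len(series) - n + 1, n)]
        values = [v for v in map(rescaled_range, chunks) if not math.isnan(v)]
        if values:
            log_n.append(math.log(n))
            log_rs.append(math.log(sum(values) / len(values)))
    if len(log_n) < 2:
        raise ValueError("need at least two usable window sizes")
    mx = sum(log_n) / len(log_n)
    my = sum(log_rs) / len(log_rs)
    num = sum((a - mx) * (b - my) for a, b in zip(log_n, log_rs))
    den = sum((a - mx) ** 2 for a in log_n)
    return num / den
```

Plain R/S is known to be biased upward for short windows, so estimates for uncorrelated noise typically land somewhat above 0.5.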

