OLTP In Real Life: A Large-scale Study of Database Behavior in Modern Online Retail

Neural networks with deep architectures have demonstrated significant performance improvements in computer vision, speech recognition, and natural language processing. The challenges in information retrieval (IR), however, are different from those in these other application areas. A common form of IR involves ranking documents, or short passages, in response to keyword-based queries. Effective IR systems must deal with the query-document vocabulary mismatch problem by modeling relationships between different query and document terms and how they indicate relevance. Models should also consider lexical matches, both to handle queries containing rare terms not seen during training (such as a person's name or a product model number) and to avoid retrieving semantically related but irrelevant results. In many real-life IR tasks, retrieval involves extremely large collections, such as the document index of a commercial Web search engine, containing billions of documents. Efficient IR methods should take advantage of specialized IR data structures, such as the inverted index, to retrieve efficiently from large collections. Given an information need, the IR system also mediates how much exposure an information artifact receives by deciding whether it should be displayed, and where it should be positioned, among other results. Exposure-aware IR systems may optimize for additional objectives besides relevance, such as parity of exposure for retrieved items and content publishers. In this thesis, we present novel neural architectures and methods motivated by the specific needs and challenges of IR tasks. We ground our contributions with a detailed survey of the growing body of neural IR literature [Mitra and Craswell, 2018].
Our key contribution towards improving the effectiveness of deep ranking models is the Duet principle [Mitra et al., 2017], which emphasizes the importance of incorporating evidence based on both patterns of exact term matches and similarities between learned latent representations of query and document. To efficiently retrieve from large collections, we develop a framework to incorporate query term independence [Mitra et al., 2019] into any deep model, which enables large-scale precomputation and the use of an inverted index for fast retrieval. In the context of stochastic ranking, we further develop optimization strategies for exposure-based objectives [Diaz et al., 2020]. Finally, this dissertation also summarizes our contributions towards benchmarking neural IR models in the presence of large training datasets [Craswell et al., 2019] and explores the application of neural methods to other IR tasks, such as query auto-completion.
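As a minimal sketch of the query term independence assumption described above, the Python below scores a document as a sum of per-term scores, so that term-document scores can be precomputed offline and stored in an inverted index. The function names (`term_score`, `build_index`, `retrieve`) and the toy relative-frequency scorer are assumptions for illustration only; in the framework of [Mitra et al., 2019] the per-term scorer would be a deep model.

```python
from collections import defaultdict


def term_score(term, doc_tokens):
    """Hypothetical stand-in for a learned per-term scorer: relative term frequency."""
    return doc_tokens.count(term) / len(doc_tokens) if doc_tokens else 0.0


def build_index(docs):
    """Precompute term-document scores offline into an inverted index.

    docs maps doc_id -> list of tokens; the result maps term -> {doc_id: score}.
    """
    index = defaultdict(dict)
    for doc_id, tokens in docs.items():
        for term in set(tokens):
            index[term][doc_id] = term_score(term, tokens)
    return index


def retrieve(index, query_terms, k=10):
    """Score documents as the sum of independent per-term scores.

    Only the posting lists of the query terms are touched at query time.
    """
    totals = defaultdict(float)
    for term in query_terms:
        for doc_id, score in index.get(term, {}).items():
            totals[doc_id] += score
    # Highest score first; ties broken by doc_id for determinism.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```

Because each query term contributes independently, no document outside the posting lists of the query terms needs to be scored at query time, which is what makes the inverted index usable.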

Multi-GPU approach to global induction of classification trees for large-scale data mining

Applied Intelligence ◽ 10.1007/s10489-020-01952-5 ◽ 2021

Author(s): Krzysztof Jurczuk ◽ Marcin Czajkowski ◽ Marek Kretowski

Keyword(s): Data Mining ◽ Large Scale ◽ Real Life ◽ Population Based ◽ Tree Structure ◽ Global Approach ◽ Data Parallel ◽ Large Scale Data ◽ The Impact ◽ Scale Data

Abstract: This paper concerns the evolutionary induction of decision trees (DT) for large-scale data. Such a global approach is one of the alternatives to top-down inducers. It searches for the tree structure and the tests simultaneously, and thus improves the prediction quality and size of the resulting classifiers in many situations. However, it is a population-based, iterative approach that can be too computationally demanding to apply directly to big data mining. The paper demonstrates that this barrier can be overcome by smart distributed/parallel processing. Moreover, we ask whether the global approach can truly compete with greedy systems on large-scale data. For this purpose, we propose a novel multi-GPU approach. It combines knowledge of global DT induction and evolutionary algorithm parallelization with efficient use of GPU memory and computing resources. The searches for the tree structure and tests are performed simultaneously on a CPU, while the fitness calculations are delegated to GPUs. A data-parallel decomposition strategy and the CUDA framework are applied. Experimental validation is performed on both artificial and real-life datasets. In both cases, the obtained acceleration is very satisfactory. The solution is able to process even billions of instances in a few hours on a single workstation equipped with 4 GPUs. The impact of data characteristics (size and dimension) on the convergence and speedup of the evolutionary search is also shown. As the number of GPUs grows, nearly linear scalability is observed, which suggests that data size boundaries for evolutionary DT mining are fading.
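The data-parallel decomposition mentioned in the abstract can be illustrated with a small Python sketch: the dataset is split into contiguous chunks, each chunk yields a partial count of correctly classified instances, and the partial counts are summed. This is only an illustration under stated assumptions, not the authors' CUDA implementation: the nested-dict tree layout, the helper names, and the use of plain accuracy as the fitness are all hypothetical, and the chunks are processed sequentially here where the paper delegates them to GPUs.

```python
def predict(tree, x):
    """Route one instance down a tree of nested dicts to a leaf label."""
    node = tree
    while "label" not in node:
        branch = "left" if x[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["label"]


def split_chunks(n_items, n_workers):
    """Return (start, stop) index ranges that partition n_items across workers."""
    base, extra = divmod(n_items, n_workers)
    bounds, start = [], 0
    for w in range(n_workers):
        stop = start + base + (1 if w < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def chunk_correct(tree, X, y, start, stop):
    """Work for one device: count correct predictions on its data chunk."""
    return sum(predict(tree, X[i]) == y[i] for i in range(start, stop))


def fitness(tree, X, y, n_workers=4):
    """Accuracy obtained by summing the per-chunk partial counts (the reduction step)."""
    partial = [chunk_correct(tree, X, y, a, b)
               for a, b in split_chunks(len(X), n_workers)]
    return sum(partial) / len(X)
```

Since the partial counts are simply added, the result does not depend on how many chunks the data is split into, which is the property that lets the per-chunk work run on separate devices.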

Robots for Elderly Care in the Home: A Landscape Analysis and Co-Design Toolkit

International Journal of Social Robotics ◽ 10.1007/s12369-021-00816-3 ◽ 2021

Author(s): Gianluca Bardaro ◽ Alessio Antonini ◽ Enrico Motta

Keyword(s): Large Scale ◽ Daily Life ◽ Elderly Care ◽ Real Life ◽ Robotic Platform ◽ Holistic View ◽ Personal Assistants ◽ Healthcare Interventions ◽ The Impact ◽ The Eu

Abstract: Over the last two decades, several deployments of robots for in-house assistance of older adults have been trialled. However, these solutions are mostly prototypes and remain unused in real-life scenarios. In this work, we review the historical and current landscape of the field to understand why robots have yet to succeed as personal assistants in daily life. Our analysis focuses on two complementary aspects: the capabilities of the physical platform and the logic of the deployment. The former analysis shows regularities in hardware configurations and functionalities, leading to the definition of a set of six application-level capabilities (exploration, identification, remote control, communication, manipulation, and digital situatedness). The latter focuses on the impact of robots on the daily life of users and categorises the deployment of robots for healthcare interventions using three types of services: support, mitigation, and response. Our investigation reveals that the value of healthcare interventions is limited by a stagnation of functionalities and a disconnection between the robotic platform and the design of the intervention. To address this issue, we propose a novel co-design toolkit, which uses an ecological framework for robot interventions in the healthcare domain. Our approach connects robot capabilities with known geriatric factors to create a holistic view encompassing both the physical platform and the logic of the deployment. As a case study-based validation, we discuss the use of the toolkit in the pre-design of the robotic platform for a pilot intervention, part of the large-scale pilot of the EU H2020 GATEKEEPER project.

Stacked Community Prediction: A Distributed Stacking-Based Community Extraction Methodology for Large Scale Social Networks

Big Data and Cognitive Computing ◽ 10.3390/bdcc5010014 ◽ 2021 ◽ Vol 5 (1) ◽ pp. 14

Author(s): Christos Makris ◽ Georgios Pispirigos

Keyword(s): Social Networks ◽ Graph Partitioning ◽ Large Scale ◽ Real Life ◽ Information Networks ◽ Digital Marketing ◽ Partitioning Problems ◽ Iterative Solutions ◽ Community Extraction ◽ Stability And Accuracy

Owing to the extensive use of information networks in a broad range of fields, e.g., bioinformatics, sociology, digital marketing, and computer science, graph theory applications have attracted significant scientific interest. Due to its apparent abstraction, community detection has become one of the most thoroughly studied graph partitioning problems. However, existing algorithms principally propose iterative solutions of high polynomial order that repeatedly require exhaustive analysis. These methods are too resource-demanding, unscalable, and inapplicable to big data graphs, such as today’s social networks. In this article, a novel, near-linear, and highly scalable community prediction methodology is introduced. Specifically, using a distributed, stacking-based model built on plain network topology characteristics of bootstrap-sampled subgraphs, the underlying community hierarchy of any given social network is efficiently extracted regardless of its size and density. The effectiveness of the proposed methodology has been examined on numerous real-life social networks and shown to be superior to various similar approaches in terms of performance, stability, and accuracy.
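To make the idea of learning from topology characteristics of bootstrap-sampled subgraphs more concrete, here is a minimal Python sketch. The sampling scheme (nodes drawn with replacement, then the induced subgraph) and the two edge features (common-neighbour count and Jaccard similarity) are assumptions chosen for illustration; the abstract does not specify which characteristics or which stacked learners the authors use, and the function names are hypothetical.

```python
import random


def bootstrap_subgraph(adj, n_samples, seed=0):
    """Induce a subgraph on nodes drawn with replacement.

    adj maps each node to the set of its neighbours (undirected graph).
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    chosen = {rng.choice(nodes) for _ in range(n_samples)}
    # Keep only edges whose endpoints were both sampled.
    return {u: adj[u] & chosen for u in chosen}


def edge_features(adj, u, v):
    """Topology-only features for the edge (u, v)."""
    common = adj[u] & adj[v]
    union = (adj[u] | adj[v]) - {u, v}
    jaccard = len(common) / len(union) if union else 0.0
    return {"common_neighbours": len(common), "jaccard": jaccard}
```

In a stacking setup, feature vectors like these, computed on many sampled subgraphs, would be the inputs to the base learners; that step is omitted here because the abstract gives no detail on it.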

Large-scale study finds no glyphosate-cancer connection

C&EN Global Enterprise ◽ 10.1021/cen-09546-govcon3 ◽ 2017 ◽ Vol 95 (46) ◽ pp. 15-15

Keyword(s): Large Scale ◽ Large Scale Study

Sex differences in the associations of nonmedical use of prescription drugs with self-injurious thoughts and behaviors among adolescents: A large-scale study in China

Journal of Affective Disorders ◽ 10.1016/j.jad.2021.02.034 ◽ 2021 ◽ Vol 285 ◽ pp. 29-36

Author(s): Bo Xie ◽ Beifang Fan ◽ Wanxin Wang ◽ Wenyan Li ◽ Ciyong Lu ◽ ...

Keyword(s): Sex Differences ◽ Prescription Drugs ◽ Large Scale ◽ Large Scale Study ◽ Nonmedical Use ◽ And Behaviors

Correction to: Methane and Electricity Production from Poultry Litter Digestion in the Amazon Region of Brazil: A Large-Scale Study

Waste and Biomass Valorization ◽ 10.1007/s12649-021-01447-5 ◽ 2021

Author(s): Marcelo Mendes Pedroza ◽ Wanderson Gomes da Silva ◽ Luciene Santos de Carvalho ◽ Alice Rocha de Souza ◽ Girlene Figueiredo Maciel

Keyword(s): Large Scale ◽ Poultry Litter ◽ Amazon Region ◽ Electricity Production ◽ Large Scale Study

First Detection of SARS-CoV-2 B.1.1.7 Variant of Concern in an Asymptomatic Dog in Spain

Viruses ◽ 10.3390/v13071379 ◽ 2021 ◽ Vol 13 (7) ◽ pp. 1379

Author(s): Sandra Barroso-Arévalo ◽ Belén Rivera ◽ Lucas Domínguez ◽ José M. Sánchez-Vizcaíno

Keyword(s): Active Surveillance ◽ Large Scale ◽ Infectious Virus ◽ Viral Isolation ◽ Viral Loads ◽ Large Scale Study ◽ Rectal Swabs

Natural SARS-CoV-2 infection in pets has been widely documented during the last year. Although the majority of reports suggest that dogs’ susceptibility to the infection is low, little is known about viral pathogenicity and transmissibility in this species for variants of concern such as B.1.1.7. Here, as part of a large-scale study on SARS-CoV-2 prevalence in pets in Spain, we detected the B.1.1.7 variant of concern (VOC) in a dog whose owners were infected with SARS-CoV-2. The animal did not present any symptoms, but viral loads were high in the nasal and rectal swabs. In addition, viral isolation was possible from both swabs, demonstrating that the dog was shedding infectious virus. Seroconversion occurred 23 days after the first sampling. This study documents the first detection of the B.1.1.7 VOC in a dog in Spain and emphasizes the importance of performing active surveillance and genomic investigation on infected animals.
