Accelerating Ranking in E-Commerce Search Engines through Contextual Factor Selection

In large-scale search systems, the quality of the ranking results is continually improved with the introduction of more factors from complex procedures. Meanwhile, the increase in factors demands more computation resources and increases system response latency. It has been observed that, under some certain context a search instance may require only a small set of useful factors instead of all factors in order to return high quality results. Therefore, removing ineffective factors accordingly can significantly improve system efficiency. In this paper, we report our experience incorporating our Contextual Factor Selection (CFS) approach into the Taobao e-commerce platform to optimize the selection of factors based on the context of each search query in order to simultaneously achieve high quality search results while significantly reducing latency time. This problem is treated as a combinatorial optimization problem which can be tackled through a sequential decision-making procedure. The problem can be efficiently solved by CFS through a deep reinforcement learning method with reward shaping to address the problems of reward signal scarcity and wide reward signal distribution in real-world search engines. Through extensive off-line experiments based on data from the Taobao.com platform, CFS is shown to significantly outperform state-of-the-art approaches. Online deployment on Taobao.com demonstrated that CFS is able to reduce average search latency time by more than 40% compared to the previous approach with negligible reduction in search result quality. Under peak usage during the Single's Day Shopping Festival (November 11th) in 2017, CFS reduced peak load search latency time by 33% compared to the previous approach, helping Taobao.com achieve 40% higher revenue than the same period during 2016.

Download Full-text

Improving Search Engine Efficiency through Contextual Factor Selection

AI Magazine ◽

10.1609/aimag.v42i2.15099 ◽

2021 ◽

Vol 42 (2) ◽

pp. 50-58

Author(s):

Anxiang Zeng ◽

Han Yu ◽

Qing Da ◽

Yusen Zhan ◽

Yang Yu ◽

...

Keyword(s):

Search Engine ◽

Large Scale ◽

Learning To Rank ◽

Contextual Factor ◽

System Response ◽

Search Query ◽

Previous Approach ◽

Engine Efficiency ◽

Search Result ◽

Selection Of

Learning to rank (LTR) is an important artificial intelligence (AI) approach supporting the operation of many search engines. In large-scale search systems, the ranking results are continually improved with the introduction of more factors to be considered by LTR. However, the more factors being considered, the more computation resources required, which in turn, results in increased system response latency. Therefore, removing redundant factors can significantly improve search engine efficiency. In this paper, we report on our experience incorporating our Contextual Factor Selection (CFS) deep reinforcement learning approach into the Taobao e-commerce platform to optimize the selection of factors based on the context of each search query to simultaneously maintaining search result quality while significantly reducing latency. Online deployment on Taobao.com demonstrated that CFS is able to reduce average search latency under everyday use scenarios by more than 40% compared to the previous approach with comparable search result quality. Under peak usage during the Single’s Day Shopping Festival (November 11th) in 2017, CFS reduced the average search latency by 20% compared to the previous approach.

Download Full-text

Vibration characteristics of triple-gear-rotor system in compressed air energy storage under variable torque load

Science Progress ◽

10.1177/0036850420987058 ◽

2021 ◽

Vol 104 (1) ◽

pp. 003685042098705

Author(s):

Xinran Wang ◽

Yangli Zhu ◽

Wen Li ◽

Dongxu Hu ◽

Xuehui Zhang ◽

...

Keyword(s):

Large Scale ◽

Three Dimensional ◽

Element Model ◽

Rotor System ◽

Dynamic Responses ◽

System Response ◽

Rotating Speed ◽

Torque Load ◽

Set Up ◽

Two Stages

This paper focuses on the effects of the off-design operation of CAES on the dynamic characteristics of the triple-gear-rotor system. A finite element model of the system is set up with unbalanced excitations, torque load excitations, and backlash which lead to variations of tooth contact status. An experiment is carried out to verify the accuracy of the mathematical model. The results show that when the system is subjected to large-scale torque load lifting at a high rotating speed, it has two stages of relatively strong periodicity when the torque load is light, and of chaotic when the torque load is heavy, with the transition between the two states being relatively quick and violent. The analysis of the three-dimensional acceleration spectrum and the meshing force shows that the variation in the meshing state and the fluctuation of the meshing force is the basic reasons for the variation in the system response with the torque load. In addition, the three rotors in the triple-gear-rotor system studied show a strong similarity in the meshing states and meshing force fluctuations, which result in the similarity in the dynamic responses of the three rotors.

Download Full-text

The Matter of Chance: Auditing Web Search Results Related to the 2020 U.S. Presidential Primary Elections Across Six Search Engines

Social Science Computer Review ◽

10.1177/08944393211006863 ◽

2021 ◽

pp. 089443932110068

Author(s):

Aleksandra Urman ◽

Mykola Makhortykh ◽

Roberto Ulloa

Keyword(s):

Search Engine ◽

Search Engines ◽

Large Scale ◽

Web Search ◽

Primary Elections ◽

Virtual Agents ◽

Search Results ◽

Presidential Primary ◽

Large Scale Analysis ◽

Algorithmic Information

We examine how six search engines filter and rank information in relation to the queries on the U.S. 2020 presidential primary elections under the default—that is nonpersonalized—conditions. For that, we utilize an algorithmic auditing methodology that uses virtual agents to conduct large-scale analysis of algorithmic information curation in a controlled environment. Specifically, we look at the text search results for “us elections,” “donald trump,” “joe biden,” “bernie sanders” queries on Google, Baidu, Bing, DuckDuckGo, Yahoo, and Yandex, during the 2020 primaries. Our findings indicate substantial differences in the search results between search engines and multiple discrepancies within the results generated for different agents using the same search engine. It highlights that whether users see certain information is decided by chance due to the inherent randomization of search results. We also find that some search engines prioritize different categories of information sources with respect to specific candidates. These observations demonstrate that algorithmic curation of political information can create information inequalities between the search engine users even under nonpersonalized conditions. Such inequalities are particularly troubling considering that search results are highly trusted by the public and can shift the opinions of undecided voters as demonstrated by previous research.

Download Full-text

Tiered Sampling

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3441299 ◽

2021 ◽

Vol 15 (5) ◽

pp. 1-52

Author(s):

Lorenzo De Stefani ◽

Erisa Terolli ◽

Eli Upfal

Keyword(s):

Large Scale ◽

Analysis Of Algorithms ◽

Base Layer ◽

Single Edge ◽

Real World Data ◽

High Quality ◽

Large Graphs ◽

Massive Graphs ◽

Variance Estimate ◽

Low Probability

We introduce Tiered Sampling , a novel technique for estimating the count of sparse motifs in massive graphs whose edges are observed in a stream. Our technique requires only a single pass on the data and uses a memory of fixed size M , which can be magnitudes smaller than the number of edges. Our methods address the challenging task of counting sparse motifs—sub-graph patterns—that have a low probability of appearing in a sample of M edges in the graph, which is the maximum amount of data available to the algorithms in each step. To obtain an unbiased and low variance estimate of the count, we partition the available memory into tiers (layers) of reservoir samples. While the base layer is a standard reservoir sample of edges, other layers are reservoir samples of sub-structures of the desired motif. By storing more frequent sub-structures of the motif, we increase the probability of detecting an occurrence of the sparse motif we are counting, thus decreasing the variance and error of the estimate. While we focus on the designing and analysis of algorithms for counting 4-cliques, we present a method which allows generalizing Tiered Sampling to obtain high-quality estimates for the number of occurrence of any sub-graph of interest, while reducing the analysis effort due to specific properties of the pattern of interest. We present a complete analytical analysis and extensive experimental evaluation of our proposed method using both synthetic and real-world data. Our results demonstrate the advantage of our method in obtaining high-quality approximations for the number of 4 and 5-cliques for large graphs using a very limited amount of memory, significantly outperforming the single edge sample approach for counting sparse motifs in large scale graphs.

Download Full-text

Large-Scale and Deep-Seated Gravitational Slope Deformations on Mars: A Review

Geosciences ◽

10.3390/geosciences11040174 ◽

2021 ◽

Vol 11 (4) ◽

pp. 174

Author(s):

Marco Emanuele Discenza ◽

Carlo Esposito ◽

Goro Komatsu ◽

Enrico Miccadei

Keyword(s):

Large Scale ◽

High Quality ◽

Surface Data ◽

Geomorphological Processes ◽

Gravitational Processes ◽

Quality Surface ◽

High Quality Surface ◽

Mars Missions

The availability of high-quality surface data acquired by recent Mars missions and the development of increasingly accurate methods for analysis have made it possible to identify, describe, and analyze many geological and geomorphological processes previously unknown or unstudied on Mars. Among these, the slow and large-scale slope deformational phenomena, generally known as Deep-Seated Gravitational Slope Deformations (DSGSDs), are of particular interest. Since the early 2000s, several studies were conducted in order to identify and analyze Martian large-scale gravitational processes. Similar to what happens on Earth, these phenomena apparently occur in diverse morpho-structural conditions on Mars. Nevertheless, the difficulty of directly studying geological, structural, and geomorphological characteristics of the planet makes the analysis of these phenomena particularly complex, leaving numerous questions to be answered. This paper reports a synthesis of all the known studies conducted on large-scale deformational processes on Mars to date, in order to provide a complete and exhaustive picture of the phenomena. After the synthesis of the literature studies, the specific characteristics of the phenomena are analyzed, and the remaining main open issued are described.

Download Full-text

FaceScape: A Large-Scale High Quality 3D Face Dataset and Detailed Riggable 3D Face Prediction

2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) ◽

10.1109/cvpr42600.2020.00068 ◽

2020 ◽

Cited By ~ 1

Author(s):

Haotian Yang ◽

Hao Zhu ◽

Yanru Wang ◽

Mingkai Huang ◽

Qiu Shen ◽

...

Keyword(s):

Large Scale ◽

High Quality ◽

3D Face

Download Full-text

57 Precision neoantigen discovery using novel algorithms and expanded HLA-ligandome datasets

Journal for ImmunoTherapy of Cancer ◽

10.1136/jitc-2020-sitc2020.0057 ◽

2020 ◽

Vol 8 (Suppl 3) ◽

pp. A62-A62

Author(s):

Dattatreya Mellacheruvu ◽

Rachel Pyke ◽

Charles Abbott ◽

Nick Phillips ◽

Sejal Desai ◽

...

Keyword(s):

Machine Learning ◽

Cell Lines ◽

Antigen Processing ◽

Large Scale ◽

Prediction Models ◽

K562 Cells ◽

Machine Learning Algorithms ◽

Training Data ◽

High Quality ◽

Tissue Samples

BackgroundAccurately identified neoantigens can be effective therapeutic agents in both adjuvant and neoadjuvant settings. A key challenge for neoantigen discovery has been the availability of accurate prediction models for MHC peptide presentation. We have shown previously that our proprietary model based on (i) large-scale, in-house mono-allelic data, (ii) custom features that model antigen processing, and (iii) advanced machine learning algorithms has strong performance. We have extended upon our work by systematically integrating large quantities of high-quality, publicly available data, implementing new modelling algorithms, and rigorously testing our models. These extensions lead to substantial improvements in performance and generalizability. Our algorithm, named Systematic HLA Epitope Ranking Pan Algorithm (SHERPA™), is integrated into the ImmunoID NeXT Platform®, our immuno-genomics and transcriptomics platform specifically designed to enable the development of immunotherapies.MethodsIn-house immunopeptidomic data was generated using stably transfected HLA-null K562 cells lines that express a single HLA allele of interest, followed by immunoprecipitation using W6/32 antibody and LC-MS/MS. Public immunopeptidomics data was downloaded from repositories such as MassIVE and processed uniformly using in-house pipelines to generate peptide lists filtered at 1% false discovery rate. Other metrics (features) were either extracted from source data or generated internally by re-processing samples utilizing the ImmunoID NeXT Platform.ResultsWe have generated large-scale and high-quality immunopeptidomics data by using approximately 60 mono-allelic cell lines that unambiguously assign peptides to their presenting alleles to create our primary models. Briefly, our primary ‘binding’ algorithm models MHC-peptide binding using peptide and binding pockets while our primary ‘presentation’ model uses additional features to model antigen processing and presentation. Both primary models have significantly higher precision across all recall values in multiple test data sets, including mono-allelic cell lines and multi-allelic tissue samples. To further improve the performance of our model, we expanded the diversity of our training set using high-quality, publicly available mono-allelic immunopeptidomics data. Furthermore, multi-allelic data was integrated by resolving peptide-to-allele mappings using our primary models. We then trained a new model using the expanded training data and a new composite machine learning architecture. The resulting secondary model further improves performance and generalizability across several tissue samples.ConclusionsImproving technologies for neoantigen discovery is critical for many therapeutic applications, including personalized neoantigen vaccines, and neoantigen-based biomarkers for immunotherapies. Our new and improved algorithm (SHERPA) has significantly higher performance compared to a state-of-the-art public algorithm and furthers this objective.

Download Full-text

A Novel and Efficient High-Yield Method for Preparing Bacterial Ghosts

Toxins ◽

10.3390/toxins13060420 ◽

2021 ◽

Vol 13 (6) ◽

pp. 420

Author(s):

Yi Ma ◽

Liu Cui ◽

Meng Wang ◽

Qiuli Sun ◽

Kaisheng Liu ◽

...

Keyword(s):

Large Scale ◽

Expression System ◽

High Yield ◽

Wild Type ◽

High Quality ◽

Efficient Protection ◽

Bacterial Ghosts ◽

Cell Envelopes ◽

Extracellular Structures ◽

First Time

Bacterial ghosts (BGs) are empty cell envelopes possessing native extracellular structures without a cytoplasm and genetic materials. BGs are proposed to have significant prospects in biomedical research as vaccines or delivery carriers. The applications of BGs are often limited by inefficient bacterial lysis and a low yield. To solve these problems, we compared the lysis efficiency of the wild-type protein E (EW) from phage ΦX174 and the screened mutant protein E (EM) in the Escherichia coli BL21(DE3) strain. The results show that the lysis efficiency mediated by protein EM was improved. The implementation of the pLysS plasmid allowed nearly 100% lysis efficiency, with a high initial cell density as high as OD600 = 2.0, which was higher compared to the commonly used BG preparation method. The results of Western blot analysis and immunofluorescence indicate that the expression level of protein EM was significantly higher than that of the non-pLysS plasmid. High-quality BGs were observed by SEM and TEM. To verify the applicability of this method in other bacteria, the T7 RNA polymerase expression system was successfully constructed in Salmonella enterica (S. Enterica, SE). A pET vector containing EM and pLysS were introduced to obtain high-quality SE ghosts which could provide efficient protection for humans and animals. This paper describes a novel and commonly used method to produce high-quality BGs on a large scale for the first time.

Download Full-text

Research on Modulation Multiplexing System and Technique Employing Tilted Grating

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.397-400.1643 ◽

2013 ◽

Vol 397-400 ◽

pp. 1643-1647

Author(s):

Hui Bo Wang ◽

Zhi Quan Li

Keyword(s):

Delay Time ◽

Large Scale ◽

Intelligent System ◽

Simulation Analysis ◽

Response Speed ◽

Signal Frequency ◽

System Response ◽

Photodiode Array ◽

Speed Simulation ◽

Demodulation Method

A dual demodulation technique based on tilted grating and InGaAs photodiode array is proposed; using the coupling modes of the cladding, a wavelength demodulation method with the tilted grating as the spectroscopic device is realized. This method can achieve that the demodulation of the channel in which the sensing information changed and the optimization of collection rules of the system. Two tunable F-P filters scan and demodulate the sensing path simultaneously to further improve the system response speed. Simulation analysis and experiments results indicate that the average demodulation time is 40ms and the average signal frequency can reach 15Hz. In addition, the demodulation bandwidth is 40nm, and its wavelength demodulation precision can reach 20pm. The system has advantages of the shorter delay time, and the demodulation time is immune to the number of channels, etc.. Therefore, this system is able to meet the smart requirement of some complex systems and large scale distributed intelligent system.

Download Full-text