A Theoretical and Experimental Comparison of Large-Scale Join Algorithms in Spark

2021 ◽  
Vol 2 (5) ◽  
Author(s):  
Anh-Cang Phan ◽  
Thuong-Cang Phan ◽  
Thanh-Ngoan Trieu ◽  
Thi-To-Quyen Tran


Author(s):
Jonathan D. Realmuto ◽  
Suresh B. Sadineni ◽  
Srikanth Madala ◽  
Robert F. Boehm

The photovoltaic (PV) industry has seen remarkable progress in recent years, especially in materials and cell architecture. The potential of these technologies is investigated in a high-insolation region of the Southwestern United States, namely Las Vegas, where an abundance of surrounding barren land is available for large-scale installations. The basis of this study is an experimental comparison of different PV technologies (HIT-Si, poly-c-Si, a-Si, and triple-junction a-Si) under identical climatic conditions. All tested modules share the same operating conditions, i.e., the same installation plane, geographic location, and climate. The experiment verifies the temperature independence of thin-film modules and the superior performance of HIT-Si, and summarizes the winter energy production of popular technologies in this climate. Lastly, an economic analysis compares the different technologies for prospective utility-scale PV installations in southern Nevada and similar climatic regions.


2005 ◽  
Vol 68 (10) ◽  
pp. 2163-2168 ◽  
Author(s):  
RICHARD PEPPERELL ◽  
CAROL-ANN REID ◽  
SILVIA NICOLAU SOLANO ◽  
MICHAEL L. HUTCHISON ◽  
LISA D. WALTERS ◽  
...  

Bovine sides, ovine carcasses, and porcine carcasses were individually inoculated by dipping in various suspensions of a marker organism (Escherichia coli K-12 or Pseudomonas fluorescens), alone or in combination with two meat-derived bacterial strains, and were sampled by two standard methods: cotton wet-dry swabbing and excision. The samples were examined for bacterial counts on plate count agar (PCA plate counts) and on violet red bile glucose agar (VRBGA plate counts) by standard International Organization for Standardization (ISO) methods. Average bacterial recoveries by swabbing, expressed as a percentage of the corresponding recoveries achieved by excision, varied widely (2 to 100%). Several factors that potentially contributed to the relatively low and highly variable bacterial recoveries obtained by swabbing were investigated in separate experiments. Neither the size of the swabbed area (10, 50, or 100 cm² on beef carcasses) nor the time of swabbing (20 or 60 min after inoculation of pig carcasses) had a significant effect on the swabbing recoveries of the marker organism used. In an experiment with swabs preinoculated with the marker organism and then used for carcass swabbing, on average 12% of the total bacterial load was transferred in the reverse direction (i.e., from the swab to the carcass) during the standard swabbing procedure. In another experiment, on average 14% of the total bacterial load was not released from the swab into the diluent during standard swab homogenization. Custom-made swabs with abrasive butts, around which metal pieces of pan scourers were wound, markedly increased PCA plate count recoveries from noninoculated lamb carcasses at commercial abattoirs compared with cotton swabs. In spite of the observed inferiority of cotton wet-dry swabbing to excision for bacterial recovery, the former is clearly preferred by the meat industry because it does not damage the carcass. Therefore, a further large-scale evaluation of the two carcass sampling methods has been undertaken under commercial conditions and is reported separately.


Symmetry ◽  
2021 ◽  
Vol 13 (3) ◽  
pp. 407
Author(s):  
Jiayan Shen ◽  
Xiucheng Guo ◽  
Wenzong Zhou ◽  
Yiming Zhang ◽  
Juchen Li

Aerial images are large-scale and susceptible to lighting changes, and traditional feature point matching algorithms cannot achieve satisfactory matching accuracy on them. This paper proposes a recursive diffusion algorithm that is scale-invariant and can be used to extract symmetrical areas from different images. It narrows the matching range of feature points by extracting high-density areas of the image and improves matching accuracy through correlation analysis of those high-density areas. Experimental comparison shows that the recursive diffusion algorithm achieves higher matching accuracy on aerial images than the correlation coefficient method and the mean shift algorithm, especially when the lighting of the aerial images changes greatly.
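The abstract does not spell out the recursive diffusion steps, but the correlation-analysis stage it describes can be illustrated with a classical building block: normalized cross-correlation (NCC) between candidate high-density regions. The following is a minimal sketch, assuming the density-based extraction step has already produced lists of same-sized grayscale patches; `ncc` and `match_regions` are hypothetical names, not the paper's implementation.

```python
import numpy as np

def ncc(patch_a: np.ndarray, patch_b: np.ndarray) -> float:
    """Normalized cross-correlation between two equally sized patches."""
    a = patch_a.astype(np.float64).ravel()
    b = patch_b.astype(np.float64).ravel()
    a -= a.mean()
    b -= b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def match_regions(regions_a, regions_b, threshold=0.8):
    """Greedily pair high-density regions from two images by peak NCC.

    regions_a / regions_b are lists of same-sized grayscale patches,
    the hypothetical output of the density-based extraction step.
    """
    matches = []
    for i, region_a in enumerate(regions_a):
        scores = [ncc(region_a, region_b) for region_b in regions_b]
        j = int(np.argmax(scores))
        if scores[j] >= threshold:
            matches.append((i, j, scores[j]))
    return matches
```

Restricting the correlation analysis to extracted high-density regions, rather than to every feature point, is what narrows the matching range as described above.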


2021 ◽  
pp. 1-18
Author(s):  
Salahaldeen Rababa ◽  
Amer Al-Badarneh

Large-scale datasets collected from heterogeneous sources often require a join operation to extract valuable information. MapReduce is an efficient programming model for processing large-scale data, but it has limitations when processing heterogeneous datasets because of the large number of redundant intermediate records transferred through the network. Several filtering techniques have been developed to improve join performance, but they require multiple MapReduce jobs to process the input datasets. To address this issue, this paper presents adaptive filter-based join algorithms. Specifically, three join algorithms are introduced that perform filter creation and redundant-record elimination within a single MapReduce job. A cost analysis of the introduced join algorithms shows that the I/O cost is reduced compared to state-of-the-art filter-based join algorithms. The performance of the join algorithms was evaluated in terms of total execution time and the total amount of I/O data transferred. The experimental results show that the adaptive Bloom join, semi-adaptive intersection Bloom join, and adaptive intersection Bloom join decrease the total execution time by 30%, 25%, and 35%, respectively, and reduce the total amount of I/O data transferred by 18%, 25%, and 50%, respectively.
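The paper's adaptive algorithms are not reproduced in the abstract, but the core filter-based join idea is standard: build a Bloom filter over the join keys of one dataset and use it to discard redundant records from the other before the actual join. Below is a minimal single-process sketch in plain Python (not MapReduce); the `BloomFilter` and `bloom_join` helpers are illustrative assumptions, not the paper's code.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter over string keys (illustrative, not tuned)."""
    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 5):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key: str):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

def bloom_join(left, right):
    """Join two lists of (key, value) pairs, pruning right-side records
    whose keys cannot appear in `left`. The filter admits false positives,
    so surviving records are still verified by the hash join."""
    bf = BloomFilter()
    for key, _ in left:
        bf.add(key)
    candidates = [(k, v) for k, v in right if bf.might_contain(k)]
    table = {}
    for k, v in left:
        table.setdefault(k, []).append(v)
    return [(k, lv, rv) for k, rv in candidates for lv in table.get(k, [])]
```

In a MapReduce setting, the same filter would be built and applied so that redundant records are dropped before the shuffle, which is where the I/O savings the abstract reports come from.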


2014 ◽  
Vol 21 (2) ◽  
pp. 129-145
Author(s):  
Hans Westerbeek ◽  
Marije van Amelsvoort ◽  
Alfons Maes ◽  
Marc Swerts

We present an analytic and a large-scale experimental comparison of two informationally equivalent displays of soccer statistics. Both displays were presented by the BBC during the 2010 FIFA World Cup. The displays differ mainly in the number and types of cognitively natural mappings between visual variables and meaning. In theory, such natural form-meaning mappings help users interpret the information quickly and easily. However, our analysis indicates that the design containing most of these mappings is inevitably inconsistent in how forms and meanings are mapped to each other. The experiment shows that this inconsistency was detrimental to how quickly people could find information in the display and affected which display they preferred to use. Our findings shed new light on the well-established cognitive design principle of natural mapping: while in theory information designs may benefit from natural mapping, in practice its applicability may be limited. Information designs that contain a high number of form-meaning mappings, for example for aesthetic reasons, risk being inconsistent and too complex for users, leading them to find information less quickly and less easily.


2016 ◽  
Vol 4 (4) ◽  
pp. 508-530 ◽  
Author(s):  
CHRISTIAN L. STAUDT ◽  
ALEKSEJS SAZONOVS ◽  
HENNING MEYERHENKE

We introduce NetworKit, an open-source software package for analyzing the structure of large complex networks. Appropriate algorithmic solutions are required to handle increasingly common large graph data sets containing up to billions of connections. We describe the methodology applied to develop scalable solutions to network analysis problems, including techniques such as parallelization, heuristics for computationally expensive problems, efficient data structures, and modular software architecture. Our goal for the software is to package the results of our algorithm engineering efforts and put them into the hands of domain experts. NetworKit is implemented as a hybrid that combines performance-critical kernels written in C++ with a Python frontend, enabling integration into the Python ecosystem of tested tools for data analysis and scientific computing. The package provides a wide range of functionality (including common and novel analytics algorithms and graph generators) through a convenient interface. In an experimental comparison with related software, NetworKit shows the best performance on a range of typical analysis tasks.
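As a small illustration of the hybrid C++/Python design described above, here is a minimal NetworKit session using the Python frontend. Exact module paths and signatures may vary between NetworKit versions, so treat this as a sketch rather than canonical usage.

```python
import networkit as nk

nk.setNumberOfThreads(4)  # the heavy lifting runs in parallel C++ kernels

# Generate a random graph rather than loading one, to keep this self-contained.
G = nk.generators.ErdosRenyiGenerator(10_000, 0.001).generate()
print(G.numberOfNodes(), "nodes,", G.numberOfEdges(), "edges")

# Connected components.
cc = nk.components.ConnectedComponents(G)
cc.run()
print("components:", cc.numberOfComponents())

# Sampling-based betweenness approximation, one of the heuristics used
# for computationally expensive problems.
bc = nk.centrality.EstimateBetweenness(G, 100)
bc.run()
print("top nodes by betweenness:", bc.ranking()[:5])
```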


2020 ◽  
Vol 54 (2) ◽  
pp. 1-2
Author(s):  
Harrie Oosterhuis

Ranking systems form the basis of online search engines and recommendation services. They process large collections of items, for instance web pages or e-commerce products, and present the user with a small ordered selection. The goal of a ranking system is to help a user find the items they are looking for with the least amount of effort; thus the rankings it produces should place the most relevant or preferred items at the top. Learning to rank is a field within machine learning that covers methods which optimize ranking systems w.r.t. this goal. Traditional supervised learning to rank methods use expert judgements to evaluate and learn; however, in many situations such judgements are impossible or infeasible to obtain. As a solution, methods have been introduced that perform learning to rank based on user clicks instead. The difficulty with clicks is that they are affected not only by user preferences but also by which rankings were displayed. These methods therefore have to avoid being biased by factors other than user preference. This thesis concerns learning to rank methods based on user clicks and specifically aims to unify the different families of these methods.

The first part of the thesis consists of three chapters on online learning to rank algorithms, which learn by directly interacting with users. The first chapter considers large-scale evaluation and shows that existing methods do not guarantee correctness and user experience; we then introduce a novel method that can guarantee both. The second chapter proposes a novel pairwise method for learning from clicks that contrasts with the previously prevalent dueling-bandit methods; our experiments show that the pairwise method greatly outperforms the dueling-bandit approach. The third chapter confirms these findings in an extensive experimental comparison and furthermore shows that the theory behind the dueling-bandit approach is unsound w.r.t. deterministic ranking systems.

The second part of the thesis consists of four chapters on counterfactual learning to rank algorithms, which learn from historically logged click data. The first chapter takes the existing approach and makes it applicable to top-k settings where not all items can be displayed at once; it also shows that state-of-the-art supervised learning to rank methods can be applied in the counterfactual scenario. The second chapter introduces a method that combines the robust generalization of feature-based models with the high-performance specialization of tabular models. The third chapter looks at evaluation and introduces a method for finding the optimal logging policy, one that collects click data in a way that minimizes the variance of estimated ranking metrics; applying this method while gathering clicks turns counterfactual evaluation into online evaluation. The fourth chapter proposes a novel counterfactual estimator that accounts for the possibility that the logging policy was updated during the gathering of click data; as a result, it can learn much more efficiently when deployed in an online scenario where interventions can take place. The resulting approach is thus both online and counterfactual, and our experimental results show that its performance matches the state of the art in both the online and the counterfactual scenario. As a whole, the second part of this thesis proposes a framework that bridges many gaps between online, counterfactual, and supervised learning to rank: it takes approaches previously considered independent and unifies them into a single methodology for widely applicable and effective learning to rank from user clicks.

Awarded by: University of Amsterdam, Amsterdam, The Netherlands. Supervised by: Maarten de Rijke. Available at: https://hdl.handle.net/11245.1/8ff3aa38-97fb-4d2a-8127-a29a03af4d5c.
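As context for the counterfactual part of the thesis, a standard building block in this literature (not necessarily the thesis's exact estimator) is inverse propensity scoring: each click is reweighted by the probability that the logged ranker exposed the item, which corrects for position bias. A minimal sketch, assuming per-item examination propensities are available in the logs:

```python
import numpy as np

def ips_dcg_estimate(logged_sessions):
    """Inverse-propensity-scored estimate of a new ranking policy's DCG
    from logged click data (an illustrative, generic estimator).

    Each session is a list of (clicked, propensity, new_rank) tuples:
    `clicked` is the logged click, `propensity` the probability that the
    logging policy let the user examine the item, and `new_rank` the
    position (1-based) the policy under evaluation would assign it.
    """
    total = 0.0
    for session in logged_sessions:
        for clicked, propensity, new_rank in session:
            if clicked:
                # Weighting each click by 1/propensity corrects for the
                # fact that low-ranked items were rarely examined.
                total += (1.0 / propensity) * (1.0 / np.log2(new_rank + 1))
    return total / len(logged_sessions)
```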


Entropy ◽  
2020 ◽  
Vol 22 (6) ◽  
pp. 643 ◽  
Author(s):  
Qianchen Xia ◽  
Jianghua Lv ◽  
Shilong Ma ◽  
Bocheng Gao ◽  
Zhenhua Wang

With the development of online advertising technology, accurate targeted advertising based on user preferences is clearly better suited to both the market and users. Conversions can be increased by predicting the user's purchasing intention via the advertising Conversion Rate (CVR). Given the high-dimensional and sparse characteristics of historical behavior sequences, this paper proposes an LSLM_LSTM model for advertising CVR prediction on large-scale sparse data. The model minimizes its loss using the Adaptive Moment Estimation (Adam) optimization algorithm and automatically mines the nonlinear patterns hidden in the data. Experimental comparison with a variety of typical CVR prediction models shows that the proposed LSLM_LSTM model exploits the time series characteristics of user behavior sequences more effectively and mines the potential relationships hidden in the features, bringing higher accuracy and faster training than models that consider only low-order or high-order features.
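The abstract names the ingredients (sparse behavior sequences, an LSTM, the Adam optimizer, a binary conversion label) without specifying the LSLM_LSTM architecture, so the following Keras sketch is only a generic illustration of those ingredients; the vocabulary size, sequence length, and layer widths are all assumptions.

```python
import tensorflow as tf

VOCAB_SIZE = 100_000   # number of sparse behavior/feature ids (assumed)
SEQ_LEN = 50           # length of a user's recent behavior sequence (assumed)

# Embedding turns sparse ids into dense vectors; the LSTM captures the
# temporal order of behavior events; the sigmoid head outputs P(conversion).
inputs = tf.keras.Input(shape=(SEQ_LEN,), dtype="int32")
x = tf.keras.layers.Embedding(VOCAB_SIZE, 32, mask_zero=True)(inputs)
x = tf.keras.layers.LSTM(64)(x)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
model = tf.keras.Model(inputs, outputs)

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # Adam, as in the paper
    loss="binary_crossentropy",
    metrics=[tf.keras.metrics.AUC(name="auc")],
)
# model.fit(behavior_sequences, conversion_labels, batch_size=512, epochs=3)
```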

