Large-Scale Inference of Multivariate Regression for Heavy-Tailed and Asymmetric Data

2023 ◽  
Author(s):  
Youngseok Song ◽  
Wen Zhou ◽  
Wen-Xin Zhou
2012 ◽  
Vol 15 (03n04) ◽  
pp. 1150019 ◽  
Author(s):  
GERHARD JÄGER

The paper investigates the quantitative distribution of language types across the languages of the world. The studies are based on three large-scale typological databases: the World Color Survey, the Automated Similarity Judgment Project database, and the World Atlas of Language Structures. The main finding is that a surprisingly large and varied collection of linguistic typologies shows power-law behavior. The bulk of the paper deals with the statistical validation of these findings.
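The abstract does not say which estimator the paper uses. As a hedged illustration of what fitting a power law involves, the sketch below applies the standard continuous maximum-likelihood estimator for the exponent, alpha = 1 + n / sum(ln(x_i / x_min)), to synthetic data. The function names, the sampler, and the choice of x_min are assumptions. Real typological frequencies are discrete, so a discrete variant and a goodness-of-fit test would be needed in practice.

```python
import math
import random

def powerlaw_alpha_mle(data, x_min):
    """Continuous power-law exponent MLE: alpha = 1 + n / sum(ln(x_i / x_min)),
    using only the n observations with x_i >= x_min."""
    tail = [x for x in data if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if not tail or log_sum == 0.0:
        raise ValueError("need observations strictly above x_min")
    return 1.0 + len(tail) / log_sum

def sample_powerlaw(n, alpha, x_min, rng):
    """Inverse-transform sampling from p(x) proportional to x**(-alpha), x >= x_min."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

# Synthetic data with a known exponent, standing in for type-frequency counts.
rng = random.Random(0)
sizes = sample_powerlaw(50_000, alpha=2.5, x_min=1.0, rng=rng)
alpha_hat = powerlaw_alpha_mle(sizes, x_min=1.0)
print(f"estimated alpha = {alpha_hat:.3f}")
```

Recovering the known exponent on synthetic data is a sanity check only; it says nothing about whether a given typology actually follows a power law.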


2020 ◽  
Vol 34 (10) ◽  
pp. 13769-13770
Author(s):  
Xiuying Chen ◽  
Daorui Xiao ◽  
Shen Gao ◽  
Guojun Liu ◽  
Wei Lin ◽  
...  

Sponsored search optimizes revenue and relevance, as measured by Revenue Per Mille (RPM). Existing sponsored search models are all based on traditional statistical models, which have poor RPM performance when queries follow a heavy-tailed distribution. Here, we propose an RPM-oriented Query Rewriting Framework (RQRF) that outputs related bid keywords that can yield high RPM. RQRF embeds both queries and bid keywords as vectors in the same implicit space, converting the rewriting probability between each query and keyword into the distance between the two vectors. For label construction, we propose an RPM-oriented sample construction method that labels keywords according to whether or not they can lead to high RPM. Extensive experiments are conducted to evaluate the performance of RQRF. On one month of large-scale, real-world traffic from an e-commerce sponsored search system, the proposed model significantly outperforms the traditional baseline.
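RPM is revenue per thousand ("mille") units of traffic; the sketch below assumes impressions as the denominator, although platforms differ. It also illustrates, under stated assumptions, how distances in a shared embedding space can be turned into rewriting probabilities. The softmax over negative squared Euclidean distances, the function names, and the toy two-dimensional vectors are hypothetical choices, not RQRF's actual model.

```python
import math

def rpm(revenue, impressions):
    """Revenue Per Mille: revenue per 1,000 impressions (denominator assumed)."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return revenue * 1000.0 / impressions

def rewrite_probabilities(query_vec, keyword_vecs):
    """Hypothetical scoring: softmax over negative squared Euclidean distances, so
    keywords embedded closer to the query get a higher rewriting probability."""
    neg_sq_dist = [-sum((q - k) ** 2 for q, k in zip(query_vec, vec)) for vec in keyword_vecs]
    top = max(neg_sq_dist)                      # shift for numerical stability
    weights = [math.exp(v - top) for v in neg_sq_dist]
    total = sum(weights)
    return [w / total for w in weights]

print(rpm(revenue=42.0, impressions=12_000))  # 3.5

# Toy 2-D embeddings; a learned model would use high-dimensional vectors.
keywords = {"running shoes": [0.1, 0.0], "trail sneakers": [0.5, 0.5], "phone case": [3.0, 3.0]}
probs = rewrite_probabilities([0.0, 0.0], list(keywords.values()))
ranked = sorted(zip(keywords, probs), key=lambda pair: pair[1], reverse=True)
print(ranked[0][0])  # running shoes
```

Mapping distance to probability through a monotone function means ranking keywords by probability is the same as ranking them by closeness to the query.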


Author(s):  
Dylan Marcus T. Ordoñez ◽  
Rene C. Batac

In this paper, we present a simple discrete model of cascade behavior in an actual geographical space with built environments. By simultaneously triggering and relaxing random locations in a network of Voronoi cells interacting via the gravity model, we observe nontrivial statistics with heavy-tailed distributions of the cells and actual area extents involved in the cascade. The distributions of these affected areas follow unimodal statistics, unlike other externally driven models operating over uniform neighborhoods, which exhibit power laws. The majority of cascades are limited to the immediate neighborhoods of adjacent Voronoi cells, even for sufficiently large triggering magnitudes. The results are viewed from the perspective of inhomogeneous driving in sandpile-based models and benchmarked against distributions obtained from other geographic datasets. The method offers a complexity perspective on the generation of large-scale events in physical and intangible flows, and traces their origins to cascaded accumulations of slow, random, and intermittent processes.
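The toy sketch below shows the kind of gravity-coupled, sandpile-style relaxation the abstract describes. It is not the authors' model and makes several assumptions: random points stand in for Voronoi cell centroids, every cell interacts with every other cell (true Voronoi adjacency is not computed), coupling weights follow a gravity law m_i * m_j / d_ij**2, and a fixed 10% of the load is dissipated per toppling so cascades terminate. All names and parameters are hypothetical.

```python
import math
import random

def gravity_weights(points, masses, exponent=2.0):
    """Gravity-model coupling w_ij = m_i * m_j / d_ij**exponent for i != j,
    normalised per row so a toppling cell spreads its load over all other cells."""
    n = len(points)
    weights = []
    for i in range(n):
        row = [0.0 if i == j else masses[i] * masses[j] / math.dist(points[i], points[j]) ** exponent
               for j in range(n)]
        total = sum(row)
        weights.append([w / total for w in row])
    return weights

def relax(loads, weights, threshold=1.0, dissipation=0.1):
    """Topple every cell whose load reaches the threshold. A toppling cell is emptied and
    passes (1 - dissipation) of its load to the others in proportion to the gravity weights;
    the dissipated fraction guarantees the cascade stops. Returns the set of toppled cells."""
    toppled = set()
    unstable = [i for i, load in enumerate(loads) if load >= threshold]
    while unstable:
        i = unstable.pop()
        if loads[i] < threshold:
            continue  # already emptied by an earlier toppling
        amount = loads[i]
        loads[i] = 0.0
        toppled.add(i)
        for j, w in enumerate(weights[i]):
            if w > 0.0:
                loads[j] += (1.0 - dissipation) * amount * w
                if loads[j] >= threshold:
                    unstable.append(j)
    return toppled

rng = random.Random(1)
n = 200
points = [(rng.random(), rng.random()) for _ in range(n)]  # hypothetical cell centroids
masses = [rng.uniform(0.5, 1.5) for _ in range(n)]         # hypothetical built-up mass per cell
W = gravity_weights(points, masses)
loads = [rng.uniform(0.0, 0.9) for _ in range(n)]
sizes = []
for _ in range(2000):
    loads[rng.randrange(n)] += rng.uniform(0.0, 0.5)       # random external trigger
    sizes.append(len(relax(loads, W)))                     # cascade footprint in cells
```

Because the gravity weights decay with distance squared, most redistributed load lands on nearby cells, which is the mechanism that keeps cascades local; a histogram of `sizes` gives the simulated footprint distribution for this toy setup only.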


2017 ◽  
Vol 25 (2) ◽  
pp. 241-259 ◽  
Author(s):  
Gregory Eady

What explains why some survey respondents answer a sensitive survey question truthfully, while others do not? This question is central to our understanding of a wide variety of attitudes, beliefs, and behaviors, but has remained difficult to investigate empirically due to the inherent problem of distinguishing those who are telling the truth from those who are misreporting. This article proposes a solution to this problem. It develops a method to model, within a multivariate regression context, whether survey respondents provide one response to a sensitive item in a list experiment but answer otherwise when asked to reveal that belief openly in response to a direct question. As an empirical application, the method is applied to an original large-scale list experiment to investigate whether those on the ideological left are more likely than those on the right to misreport their responses to questions about prejudice. The method is implemented for researchers as open-source software.
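The article's method is a multivariate regression model; the sketch below shows only the basic, unadjusted quantities it builds on. In a list experiment, the control group reports how many of J non-sensitive items apply to them and the treatment group sees the same list plus the sensitive item, so the difference in mean counts estimates the share holding the sensitive view. Subtracting the share who admit it to a direct question gives a rough misreporting estimate. The function names and response data are made up for illustration.

```python
from statistics import mean

def list_prevalence(control_counts, treatment_counts):
    """Difference-in-means list-experiment estimator: the treatment list adds the
    sensitive item, so the difference in mean counts estimates its prevalence."""
    return mean(treatment_counts) - mean(control_counts)

def misreporting_gap(control_counts, treatment_counts, direct_answers):
    """Share holding the sensitive view (per the list experiment) minus the share
    admitting it to the direct question (1 = yes, 0 = no)."""
    return list_prevalence(control_counts, treatment_counts) - mean(direct_answers)

# Made-up responses for illustration only.
control = [1, 2, 2, 3, 1, 2, 2, 3]    # items endorsed out of J = 3 non-sensitive items
treatment = [2, 3, 2, 3, 2, 3, 2, 3]  # same list plus the sensitive item
direct = [0, 1, 0, 0, 0, 1, 0, 0]     # direct question asked openly

print(list_prevalence(control, treatment))           # 0.5
print(misreporting_gap(control, treatment, direct))  # 0.25
```

A regression approach like the article's replaces these group means with models conditioned on respondent covariates, which is what allows misreporting to be compared across groups such as left and right.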


Author(s):  
Matthew P. Reed ◽  
Matthew B. Parkinson

Anthropometric data are widely used in the design of chairs, seats, and other furniture intended for seated use. These data are valuable for determining the overall height, width, and depth of a chair, but contain little information about body shape that can be used to choose appropriate contours for backrests. A new method is presented for statistical modeling of three-dimensional torso shape for use in designing chairs and seats. Laser-scan data from a large-scale civilian anthropometric survey were extracted and analyzed using principal component analysis. Multivariate regression was applied to predict the average body shape as a function of overall anthropometric variables. For optimization applications, the statistical model can be exercised to randomly sample the space of torso shapes for automated virtual fitting trials. This approach also facilitates trade-off analyses and the application of other design decision-making methods. Although seating is the specific example here, the method is generally applicable to other design-for-human-variability problems in which suitable body contour data are available.
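The PCA-plus-regression pipeline described above can be sketched with NumPy. This is a minimal illustration under assumptions, not the authors' code: the data are synthetic, each scan is assumed to be already reduced to corresponding surface coordinates, and the number of retained components and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (hypothetical): 300 subjects, each torso scan already reduced
# to 50 corresponding surface coordinates, plus 3 overall anthropometric predictors.
n_subjects, n_coords, n_predictors = 300, 50, 3
X = rng.normal(size=(n_subjects, n_predictors))
true_map = rng.normal(size=(n_predictors, n_coords))
shapes = X @ true_map + 0.1 * rng.normal(size=(n_subjects, n_coords))

# 1) Principal component analysis via SVD of the mean-centred shape matrix.
mean_shape = shapes.mean(axis=0)
_, _, Vt = np.linalg.svd(shapes - mean_shape, full_matrices=False)
k = 5                                          # retained components (assumed)
components = Vt[:k]                            # (k, n_coords)
scores = (shapes - mean_shape) @ components.T  # (n_subjects, k)

# 2) Multivariate regression of all PC scores on the predictors at once.
design = np.column_stack([np.ones(n_subjects), X])
coef, *_ = np.linalg.lstsq(design, scores, rcond=None)  # (1 + n_predictors, k)
resid_sd = (scores - design @ coef).std(axis=0)

def predict_shape(x_new):
    """Average torso shape predicted from the anthropometric variables x_new."""
    pc_scores = np.concatenate([[1.0], x_new]) @ coef
    return mean_shape + pc_scores @ components

def sample_shape(x_new, rng):
    """Random torso shape for virtual fitting: prediction plus residual PC variation."""
    pc_scores = np.concatenate([[1.0], x_new]) @ coef + rng.normal(scale=resid_sd)
    return mean_shape + pc_scores @ components

avg = predict_shape(X[0])
draw = sample_shape(X[0], rng)
```

Regressing the low-dimensional PC scores rather than every coordinate keeps the model small, and sampling residual variation in PC space yields plausible shapes for automated fitting trials.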


Webology ◽  
2021 ◽  
Vol 18 (2) ◽  
pp. 462-474
Author(s):  
Marischa Elveny ◽  
Mahyuddin KM Nasution ◽  
Muhammad Zarlis ◽  
Syahril Efendi

Business intelligence can be described as the techniques and tools for acquiring raw data and transforming it into meaningful and useful information for business analysis. This study aims to build business intelligence for optimizing large-scale data based on e-metrics, which are data generated by electronic customer behavior. As more and more large datasets become available, the challenge of analyzing them grows. Business intelligence therefore faces new challenges, but also interesting opportunities, such as describing the needs of the market in real time. Optimization is performed with adaptive multivariate regression, which can handle high-dimensional data, produces accurate predictions of the response variables, and yields continuous models whose knots are chosen by the smallest generalized cross-validation (GCV) value. Large and diverse data are simplified and then modeled according to the level of behavioral similarity, using basic measurements of distances, attributes, times, places, and transactions between social actors. Customer purchases represent each preferred behavior, and a formula is used to calculate a score for each customer from 7 input variables. Adaptive multivariate regression models customer behavior by reducing the deviation, which is the determining factor for performance on the data. The results identify strategies and information needed for a sustainable business; in particular, fast-food merchants and food stalls are more in demand among customers.
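The knot-selection step mentioned above can be illustrated with a deliberately reduced sketch: one input instead of the study's 7, one pair of mirrored hinge functions, and GCV in one standard form, GCV = mean squared residual / (1 - C/N)^2 with effective parameters C = (number of terms) + penalty x (number of knots). The penalty of 3, the synthetic data, and all names are assumptions, not the study's configuration.

```python
import numpy as np

def hinge_pair(x, t):
    """Mirrored hinge basis functions max(0, x - t) and max(0, t - x)."""
    return np.maximum(0.0, x - t), np.maximum(0.0, t - x)

def gcv(y, y_hat, n_terms, n_knots, penalty=3.0):
    """Generalized cross-validation: mean squared residual divided by (1 - C/N)^2,
    where C = n_terms + penalty * n_knots is the effective parameter count."""
    n = len(y)
    c = n_terms + penalty * n_knots
    return np.mean((y - y_hat) ** 2) / (1.0 - c / n) ** 2

def best_single_knot(x, y, penalty=3.0):
    """Try each interior data value as the knot of y ~ b0 + b1*max(0, x-t) + b2*max(0, t-x)
    and return (gcv, knot, coefficients) for the knot with the smallest GCV."""
    best = None
    for t in np.unique(x)[1:-1]:
        h_pos, h_neg = hinge_pair(x, t)
        design = np.column_stack([np.ones_like(x), h_pos, h_neg])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        score = gcv(y, design @ coef, n_terms=3, n_knots=1, penalty=penalty)
        if best is None or score < best[0]:
            best = (score, t, coef)
    return best

# Hypothetical single input with a change in slope at x = 4.
rng = np.random.default_rng(7)
x = rng.uniform(0.0, 10.0, size=200)
y = np.where(x < 4.0, 1.0, 1.0 + 2.0 * (x - 4.0)) + rng.normal(scale=0.3, size=200)

score, knot, coef = best_single_knot(x, y)

# Baseline without knots: a straight-line fit, scored with the same criterion.
line = np.column_stack([np.ones_like(x), x])
line_coef, *_ = np.linalg.lstsq(line, y, rcond=None)
line_score = gcv(y, line @ line_coef, n_terms=2, n_knots=0)
```

The denominator penalizes each added knot, so a knot is only worth keeping when it lowers the residual error by more than the penalty costs; comparing `score` with `line_score` is that trade-off in miniature.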


2011 ◽  
Vol 243-249 ◽  
pp. 3685-3688
Author(s):  
Rui Gao

Textural stress has a great effect on rock stability. Based on measured geo-stress data, the geo-stress field was obtained using FEM combined with linear multivariate regression. With these methods, a diversion tunnel of a large-scale hydropower station was analyzed to study its stress distribution. The results show that stress concentration occurred at the bottom of the wall and at the arch top, while the stress in the wall itself was small; failure occurred first at the bottom of the wall and the arch top, then at the bottom board and in some areas far from the tunnel. When textural stress was not considered, the stress concentration area was located in the wall, and failure occurred first at the bottom of the wall, then in the middle of the wall and at the arch top.
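The abstract does not give the regression model. A common formulation in geo-stress back-analysis, assumed here, runs FEM once per unit boundary condition (for example self-weight and several tectonic load cases) and fits the measured stresses as a linear combination of those unit-case stresses by least squares; the fitted coefficients then superpose the unit fields into the initial stress field around the tunnel. The FEM outputs below are random stand-ins, and the load cases, array sizes, and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical stand-in for FEM output: each row is one stress component at one
# measurement point; each column is one unit load case (e.g. self-weight, tectonic
# push in x, tectonic push in y, horizontal shear).
n_points, n_components, n_cases = 4, 6, 4
fem_unit = rng.normal(size=(n_points * n_components, n_cases))

# Synthetic "measured" stresses from assumed true coefficients plus measurement noise.
true_b = np.array([1.0, 2.5, 1.2, 0.4])
measured = fem_unit @ true_b + 0.05 * rng.normal(size=n_points * n_components)

# Multivariate linear regression without intercept: measured ~ fem_unit @ b.
b, *_ = np.linalg.lstsq(fem_unit, measured, rcond=None)

# Goodness of fit of the regression at the measurement points.
r2 = 1.0 - np.sum((measured - fem_unit @ b) ** 2) / np.sum((measured - measured.mean()) ** 2)

# The same coefficients superpose unit-case FEM stresses anywhere in the model,
# e.g. at hypothetical evaluation points around the tunnel.
fem_unit_tunnel = rng.normal(size=(10, n_cases))
initial_stress = fem_unit_tunnel @ b
```

Because FEM is linear in the boundary loads, only one run per unit load case is needed; the regression then does the fitting, and dropping the tectonic columns corresponds to the "without textural stress" comparison.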


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Swarup Chattopadhyay ◽  
Tanujit Chakraborty ◽  
Kuntal Ghosh ◽  
Asit K. Das
