Troubleshooting Customer Behaviour Against Merchants with Adaptive Multivariate Regression

Webology
2021
Vol 18 (2)
pp. 462-474
Author(s):  
Marischa Elveny
Mahyuddin KM Nasution
Muhammad Zarlis
Syahril Efendi

Business intelligence comprises the techniques and tools for acquiring and transforming raw data into meaningful, useful information for business analysis. This study aims to build business intelligence that optimizes large-scale data based on e-metrics, data created from electronic-based customer behaviour. As more and more large data sets become available, the challenge of analyzing them grows accordingly. Business intelligence therefore faces new challenges, but also interesting opportunities: describing the needs of a market share in real time. Optimization is done using adaptive multivariate regression, which can handle high-dimensional data, produce accurate predictions of the response variables, and yield models that are continuous at the knots, selected by the smallest generalized cross-validation (GCV) value. Large and diverse data are first simplified and then modeled by level of behavioural similarity, using basic measurements of distances, attributes, times, places, and transactions between social actors. Customer purchases represent each preferred behaviour, and a formula scores each customer from 7 input variables. Adaptive multivariate regression searches customer behaviour to obtain the pruned deviations that determine performance on the data. The results show the strategies and information needed for a sustainable business: merchants who sell fast food or run food stalls are the most in demand among customers.
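The knot-selection criterion mentioned above can be made concrete. The sketch below is a rough illustration rather than the authors' implementation: it computes the generalized cross-validation (GCV) score that MARS-style models minimize when choosing knots. The penalty value and function names are assumptions, not values from the paper.

```python
import numpy as np

def gcv(y_true, y_pred, n_params, n_knots, penalty=3.0):
    """GCV score for a MARS-style fit: residual error inflated by
    effective model complexity. Lower GCV means a better knot set.
    penalty=3.0 is a common default, not a value from the paper."""
    n = len(y_true)
    mse = np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)
    c_eff = n_params + penalty * n_knots   # effective parameter count
    return mse / (1.0 - c_eff / n) ** 2

# Each candidate knot configuration would be fitted, scored with
# gcv(), and the configuration with the smallest score retained.
```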

Author(s):  
Lior Shamir

Abstract Several recent observations using large data sets of galaxies showed a non-random distribution of the spin directions of spiral galaxies, even when the galaxies are too far from each other to have gravitational interaction. Here, a data set of $\sim8.7\cdot10^3$ spiral galaxies imaged by the Hubble Space Telescope (HST) is used to test and profile a possible asymmetry between galaxy spin directions. The asymmetry between galaxies with opposite spin directions is compared to the asymmetry of galaxies from the Sloan Digital Sky Survey (SDSS). The two data sets contain different galaxies at different redshift ranges, and each data set was annotated using a different annotation method. Both data sets show a similar asymmetry in the COSMOS field, which is covered by both telescopes. Fitting the asymmetry of the galaxies to a cosine dependence shows a dipole axis with probabilities of $\sim2.8\sigma$ and $\sim7.38\sigma$ in HST and SDSS, respectively. The most likely dipole axis identified in the HST galaxies is at $(\alpha=78^{\circ},\delta=47^{\circ})$, well within the $1\sigma$ error range of the most likely dipole axis in the SDSS galaxies with $z>0.15$, identified at $(\alpha=71^{\circ},\delta=61^{\circ})$.
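To make the cosine fit concrete: each galaxy contributes its spin sign weighted by the cosine of its angular distance from a candidate dipole axis, and the axis maximizing the asymmetry amplitude is selected. The following numpy sketch illustrates that idea in simplified form; it is not the authors' pipeline, and the grid resolution and scoring are illustrative assumptions.

```python
import numpy as np

def radec_to_unit(ra_deg, dec_deg):
    """Convert (RA, Dec) in degrees to 3D unit vectors."""
    ra, dec = np.radians(ra_deg), np.radians(dec_deg)
    return np.stack([np.cos(dec) * np.cos(ra),
                     np.cos(dec) * np.sin(ra),
                     np.sin(dec)], axis=-1)

def best_dipole_axis(ra, dec, spin, grid_step=5):
    """Grid search for the axis maximizing the cosine-weighted
    spin asymmetry; spin is +1 or -1 per galaxy."""
    gal = radec_to_unit(ra, dec)
    best, best_score = None, -np.inf
    for a in range(0, 360, grid_step):
        for d in range(-90, 91, grid_step):
            axis = radec_to_unit(a, d)
            cos_ang = gal @ axis                     # cos(angle to axis)
            score = np.abs(np.sum(spin * cos_ang))   # dipole amplitude
            if score > best_score:
                best, best_score = (a, d), score
    return best, best_score
```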


GigaScience
2020
Vol 9 (1)
Author(s):  
T Cameron Waller
Jordan A Berg
Alexander Lex
Brian E Chapman
Jared Rutter

Abstract
Background: Metabolic networks represent all chemical reactions that occur between molecular metabolites in an organism's cells. They offer a biological context in which to integrate, analyze, and interpret omic measurements, but their large scale and extensive connectivity present unique challenges. While it is practical to simplify these networks by placing constraints on compartments and hubs, it is unclear how these simplifications alter the structure of metabolic networks and the interpretation of metabolomic experiments.
Results: We curated and adapted the latest systemic model of human metabolism and developed customizable tools to define metabolic networks with and without compartmentalization in subcellular organelles and with or without inclusion of prolific metabolite hubs. Compartmentalization made networks larger, less dense, and more modular, whereas hubs made networks larger, denser, and less modular. When present, these hubs also dominated shortest paths in the network, yet their exclusion exposed the subtler prominence of other metabolites that are typically more relevant to metabolomic experiments. We applied the non-compartmental network without metabolite hubs in a retrospective, exploratory analysis of metabolomic measurements from 5 studies on human tissues. Network clusters identified individual reactions that might experience differential regulation between experimental conditions, several of which were not apparent in the original publications.
Conclusions: Exclusion of specific metabolite hubs exposes modularity in both compartmental and non-compartmental metabolic networks, improving detection of relevant clusters in omic measurements. Better computational detection of metabolic network clusters in large data sets has the potential to identify differential regulation of individual genes, transcripts, and proteins.
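The effect of excluding prolific hubs can be illustrated on a toy metabolite graph. This is not the curated human model from the paper; the metabolites and edges below are invented for illustration only.

```python
import networkx as nx

# Toy metabolite-metabolite network: edges connect metabolites that
# share a reaction. "ATP" and "H2O" act as prolific currency hubs.
edges = [("glucose", "G6P"), ("G6P", "F6P"), ("F6P", "F16BP"),
         ("glucose", "ATP"), ("G6P", "ATP"), ("F6P", "ATP"),
         ("F16BP", "ATP"), ("pyruvate", "ATP"), ("G6P", "H2O"),
         ("F16BP", "pyruvate"), ("pyruvate", "lactate")]
G = nx.Graph(edges)

hubs = {"ATP", "H2O"}                 # currency metabolites to exclude
G_nohub = G.subgraph(set(G) - hubs)   # network without the hubs

# Hubs shortcut paths between otherwise distant metabolites; removing
# them lowers density and exposes the modular pathway structure.
for name, g in [("with hubs", G), ("without hubs", G_nohub)]:
    print(name, "density:", round(nx.density(g), 3))
```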


Author(s):  
Richard J. Anthony
John P. Clark
Stephen W. Kennedy
John M. Finnegan
Dean Johnson
...  

This paper describes a large-scale heat flux instrumentation effort for the AFRL HIT Research Turbine. The work provides a uniquely large set of high-frequency instrumentation to acquire fast-response unsteady heat flux in a fully rotational, cooled turbine rig, along with unsteady pressure data, to investigate thermal loading and unsteady aerodynamic airfoil interactions. Over 1200 dynamic sensors are installed on the 1-1/2 stage turbine rig. Airfoil instrumentation includes 658 double-sided thin-film gauges for heat flux, 289 fast-response Kulite pressure sensors for unsteady aerodynamic measurements, and over 40 thermocouples. An overview of the instrumentation is given, with an in-depth focus on the non-commercial thin-film heat transfer sensors designed and produced in the Heat Flux Instrumentation Laboratory at WPAFB. The paper further describes the upgrade of data acquisition systems and signal-conditioning electronics needed to handle the increased channel requirements of the HIT Research Turbine. More modern, reliable, and efficient data processing and analysis code provides better handling of large data sets and allows easy integration with the turbine design and analysis system under development at AFRL. Example data from cooled transient blowdown tests in the TRF are included, along with measurement uncertainty.


2017
pp. 83-99
Author(s):  
Sivamathi Chokkalingam
Vijayarani S.

The term Big Data refers to large-scale information management and analysis technologies that exceed the capability of traditional data processing technologies. Big Data is differentiated from traditional technologies in three ways: the volume, velocity, and variety of the data. Big Data analytics is the process of analyzing large data sets that contain a variety of data types to uncover hidden patterns, unknown correlations, market trends, customer preferences, and other useful business information. Since Big Data is a newly emerging field, there is a need to develop new technologies and algorithms for handling it. The main objective of this paper is to provide knowledge about the various research challenges of Big Data analytics. A brief overview of the various types of Big Data analytics is discussed in this paper. For each type of analytics, the paper describes the process steps and tools, and gives a banking application. Some of the research challenges of Big Data analytics, and possible solutions to them, are also discussed.


Author(s):  
Yu-Cheng Chou
David Ko
Harry H. Cheng
Roger L. Davis
Bo Chen

Two challenging problems in the area of scientific computation are long computation times and large-scale, distributed, and diverse data sets. As the scale of science and engineering applications rapidly expands, these two problems become more manifest than ever. This paper presents the concept of Mobile Agent-based Computational Steering (MACS) for distributed simulation. MACS allows users to apply new or modified algorithms to a running application by altering certain sections of the program code without stopping the execution and recompiling the code. The concept has been validated through an application for dynamic CFD data post-processing. The validation results show that MACS has great potential to enhance the productivity and data manageability of large-scale distributed computational systems.
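As a loose analogy for MACS-style steering: the sketch below swaps analysis code into a running loop without stopping it. The paper's approach uses mobile agents rather than Python module reloading, and `postproc` is a hypothetical user module, so this only illustrates the general idea.

```python
import importlib
import time

import postproc  # hypothetical module defining process(step_data)

# Minimal steering loop: each iteration re-imports the analysis code,
# so edits saved to postproc.py take effect on the next step without
# stopping the simulation or recompiling anything.
for step in range(1000):
    step_data = {"step": step}     # stand-in for live CFD output
    importlib.reload(postproc)     # pick up any modified algorithm
    postproc.process(step_data)    # apply the current analysis code
    time.sleep(1.0)                # stand-in for computation time
```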


Author(s):  
Brian Hoeschen
Darcy Bullock
Mark Schlappi

Historically, stopped delay was used to characterize the operation of intersection movements because it was relatively easy to measure. During the past decade, the traffic engineering community has moved away from stopped delay and now uses control delay. That measurement is more precise but quite difficult to extract from large data sets if strict definitions are used to derive it. This paper evaluates two procedures for estimating control delay. The first is based on the historical approximation that control delay is 30% larger than stopped delay. The second is new and based on segment delay. The procedures are applied to a diverse data set collected in Phoenix, Arizona, and compared with control delay calculated using the formal definition. The new approximation was observed to be better than the historical stopped-delay procedure; it provided an accurate prediction of control delay. Because it is an approximation, this methodology is most appropriately applied to large data sets collected from travel time studies for ranking and prioritizing intersections for further analysis.
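The two estimation procedures reduce to simple arithmetic. A minimal sketch follows, with the segment-delay variant stated in simplified form as excess travel time over free flow; the paper's exact segment definition may differ.

```python
def control_delay_from_stopped(stopped_delay_s):
    """Historical approximation: control delay is about 30% larger
    than stopped delay."""
    return 1.30 * stopped_delay_s

def control_delay_from_segment(segment_travel_time_s, free_flow_time_s):
    """Segment-based estimate (simplified): the extra time spent
    traversing an approach segment relative to free-flow conditions."""
    return max(0.0, segment_travel_time_s - free_flow_time_s)

# Example: 24 s of stopped delay -> ~31 s of control delay
print(control_delay_from_stopped(24.0))
# Example: 95 s observed vs 62 s free flow -> 33 s of control delay
print(control_delay_from_segment(95.0, 62.0))
```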


2017
Vol 14 (S339)
pp. 310-313
Author(s):  
R. Kgoadi
I. Whittingham
C. Engelbrecht

Abstract Clustering algorithms constitute a multi-disciplinary analytical tool commonly used to summarise large data sets. Astronomical classifications are based on similarity: celestial objects are assigned to a specific class according to specific physical features. The aim of this project is to obtain relevant information from high-dimensional data (at least three input variables in a data frame) derived from stellar light curves using a number of clustering algorithms, such as K-means and Expectation Maximisation. In addition to identifying the best-performing algorithm, we also identify a subset of features that best define stellar groups. Three methodologies are applied to a sample of Kepler time series in the temperature range 6500–19,000 K. In that spectral range, at least four classes of variable stars are expected to be found: δ Scuti, γ Doradus, Slowly Pulsating B (SPB), and (the still equivocal) Maia stars.
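A minimal sketch of the two named algorithms applied to light-curve features, using scikit-learn on synthetic data. Real inputs would be features extracted from Kepler light curves (e.g. dominant frequency, amplitude, effective temperature); the random matrix here is only a placeholder.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Placeholder feature matrix: rows are stars, columns are
# light-curve features. Values are synthetic, not Kepler data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X_std = StandardScaler().fit_transform(X)   # weight features equally

# K-means with k=4, matching the four expected variability classes
km_labels = KMeans(n_clusters=4, n_init=10,
                   random_state=0).fit_predict(X_std)

# Expectation Maximisation via a 4-component Gaussian mixture
gm_labels = GaussianMixture(n_components=4,
                            random_state=0).fit_predict(X_std)
```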


2001
Vol 79 (7)
pp. 1209-1231
Author(s):  
Rich Mooi

The fossil record of the Echinodermata is relatively complete, and is represented by specimens retaining an abundance of features comparable to that found in extant forms. This yields a half-billion-year record of evolutionary novelties unmatched in any other major group, making the Echinodermata a primary target for studies of biological change. Not all of this change can be understood by studying the rocks alone, leading to synthetic research programs. Study of literature from the past 20 years indicates that over 1400 papers on echinoderm paleontology appeared in that time, and that overall productivity has remained almost constant. Analysis of papers appearing since 1990 shows that research is driven by new finds including, but not restricted to, possible Precambrian echinoderms, bizarre new edrioasteroids, early crinoids, exquisitely preserved homalozoans, echinoids at the K-T boundary, and Antarctic echinoids, stelleroids, and crinoids. New interpretations of echinoderm body wall homologies, broad-scale syntheses of embryological information, the study of developmental trajectories through molecular markers, and the large-scale ecological and phenotypic shifts being explored through morphometry and analyses of large data sets are integrated with study of the fossils themselves. Therefore, recent advances reveal a remarkable and continuing synergistic expansion in our understanding of echinoderm evolutionary history.


2020
Vol 20 (6)
pp. 5-17
Author(s):  
Hrachya Astsatryan
Aram Kocharyan
Daniel Hagimont
Arthur Lalayan

Abstract The optimization of large-scale data sets depends on the technologies and methods used. The MapReduce model, implemented on Apache Hadoop or Spark, allows splitting large data sets into a set of blocks distributed on several machines. Data compression reduces data size and transfer time between disks and memory, but requires additional processing. Therefore, finding an optimal tradeoff is a challenge, as a high compression factor may underload input/output but overload the processor. The paper presents a system enabling the selection of compression tools and the tuning of the compression factor to reach the best performance in Apache Hadoop and Spark infrastructures, based on simulation analyses.
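The tradeoff the system tunes can be demonstrated in miniature with Python's built-in gzip module, standing in for the Snappy- and zlib-style codecs used in Hadoop and Spark; the ratios and timings below are machine-dependent.

```python
import gzip
import time

# A few MB of repetitive sample data, a rough stand-in for a block
payload = b"metric,value\n" + b"cpu_load,0.93\n" * 200_000

# Sweep compression levels to expose the CPU/IO tradeoff: higher
# levels shrink transfer size but cost more processor time.
for level in (1, 5, 9):
    t0 = time.perf_counter()
    compressed = gzip.compress(payload, compresslevel=level)
    dt = time.perf_counter() - t0
    ratio = len(payload) / len(compressed)
    print(f"level {level}: ratio {ratio:.1f}x in {dt * 1000:.0f} ms")
```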


2019
Author(s):  
N. Tessa Pierce
Luiz Irber
Taylor Reiter
Phillip Brooks
C. Titus Brown

The sourmash software package uses MinHash-based sketching to create “signatures”: compressed representations of DNA, RNA, and protein sequences that can be stored, searched, explored, and taxonomically annotated. sourmash signatures can be used to estimate sequence similarity between very large data sets quickly and in low memory, and to search large databases of genomes for matches to query genomes and metagenomes. sourmash is implemented in C++, Rust, and Python, and is freely available under the BSD license at http://github.com/dib-lab/sourmash.
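The idea behind MinHash sketching can be shown in a few lines of plain Python. This is a simplification, not sourmash's implementation: it ignores canonical (reverse-complement) k-mers and sourmash's num/scaled parameters.

```python
import hashlib

def minhash_signature(seq, ksize=31, n=1000):
    """Bottom-sketch MinHash: hash every k-mer and keep the n
    smallest hash values as a compressed signature of the sequence."""
    hashes = set()
    for i in range(len(seq) - ksize + 1):
        kmer = seq[i:i + ksize].encode()
        h = int.from_bytes(hashlib.sha1(kmer).digest()[:8], "big")
        hashes.add(h)
    return set(sorted(hashes)[:n])

def jaccard_estimate(sig_a, sig_b):
    """Estimate sequence similarity as the Jaccard index of the two
    sketches; with equal-size sketches this approximates the true
    k-mer Jaccard similarity without comparing full sequences."""
    return len(sig_a & sig_b) / len(sig_a | sig_b)
```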

