Troubleshooting Customer Behaviour Against Merchants with Adaptive Multivariate Regression

Webology
2021
Vol 18 (2)
pp. 462-474
Author(s):  
Marischa Elveny
Mahyuddin KM Nasution
Muhammad Zarlis
Syahril Efendi

Business intelligence comprises the techniques and tools for acquiring and transforming raw data into meaningful, useful information for business analysis. This study aims to build business intelligence that optimizes large-scale data based on e-metrics, data created from electronic-based customer behaviour. As more and more large data sets become available, the challenge of analyzing them grows accordingly. Business intelligence therefore faces new challenges, but also interesting opportunities: describing the needs of a market share in real time. Optimization is done using adaptive multivariate regression, which can handle high-dimensional data, produce accurate predictions of the response variables, and yield models that are continuous at the knots, selected by the smallest generalized cross-validation (GCV) value. Large and diverse data are first simplified and then modeled by level of behavioural similarity, using basic measurements of distances, attributes, times, places, and transactions between social actors. Customer purchases represent each preferred behaviour, and a formula scores each customer from 7 input variables. Adaptive multivariate regression searches customer behaviour to obtain the pruned deviations that determine performance on the data. The results show the strategies and information needed for a sustainable business: merchants who sell fast food or run food stalls are the most in demand among customers.
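The knot-selection criterion mentioned above can be made concrete. The sketch below is a rough illustration rather than the authors' implementation: it computes the generalized cross-validation (GCV) score that MARS-style models minimize when choosing knots. The penalty value and function names are assumptions, not values from the paper.

```python
import numpy as np

def gcv(y_true, y_pred, n_params, n_knots, penalty=3.0):
    """GCV score for a MARS-style fit: residual error inflated by
    effective model complexity. Lower GCV means a better knot set.
    penalty=3.0 is a common default, not a value from the paper."""
    n = len(y_true)
    mse = np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)
    c_eff = n_params + penalty * n_knots   # effective parameter count
    return mse / (1.0 - c_eff / n) ** 2

# Each candidate knot configuration would be fitted, scored with
# gcv(), and the configuration with the smallest score retained.
```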

Author(s):  
Lior Shamir

Abstract Several recent observations using large data sets of galaxies showed a non-random distribution of the spin directions of spiral galaxies, even when the galaxies are too far from each other to have gravitational interaction. Here, a data set of $\sim8.7\cdot10^3$ spiral galaxies imaged by the Hubble Space Telescope (HST) is used to test and profile a possible asymmetry between galaxy spin directions. The asymmetry between galaxies with opposite spin directions is compared to the asymmetry of galaxies from the Sloan Digital Sky Survey (SDSS). The two data sets contain different galaxies at different redshift ranges, and each data set was annotated using a different annotation method. Both data sets show a similar asymmetry in the COSMOS field, which is covered by both telescopes. Fitting the asymmetry of the galaxies to a cosine dependence shows a dipole axis with probabilities of $\sim2.8\sigma$ and $\sim7.38\sigma$ in HST and SDSS, respectively. The most likely dipole axis identified in the HST galaxies is at $(\alpha=78^{\circ},\delta=47^{\circ})$, well within the $1\sigma$ error range of the most likely dipole axis in the SDSS galaxies with $z>0.15$, identified at $(\alpha=71^{\circ},\delta=61^{\circ})$.
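To make the cosine fit concrete: each galaxy contributes its spin sign weighted by the cosine of its angular distance from a candidate dipole axis, and the axis maximizing the asymmetry amplitude is selected. The following numpy sketch illustrates that idea in simplified form; it is not the authors' pipeline, and the grid resolution and scoring are illustrative assumptions.

```python
import numpy as np

def radec_to_unit(ra_deg, dec_deg):
    """Convert (RA, Dec) in degrees to 3D unit vectors."""
    ra, dec = np.radians(ra_deg), np.radians(dec_deg)
    return np.stack([np.cos(dec) * np.cos(ra),
                     np.cos(dec) * np.sin(ra),
                     np.sin(dec)], axis=-1)

def best_dipole_axis(ra, dec, spin, grid_step=5):
    """Grid search for the axis maximizing the cosine-weighted
    spin asymmetry; spin is +1 or -1 per galaxy."""
    gal = radec_to_unit(ra, dec)
    best, best_score = None, -np.inf
    for a in range(0, 360, grid_step):
        for d in range(-90, 91, grid_step):
            axis = radec_to_unit(a, d)
            cos_ang = gal @ axis                     # cos(angle to axis)
            score = np.abs(np.sum(spin * cos_ang))   # dipole amplitude
            if score > best_score:
                best, best_score = (a, d), score
    return best, best_score
```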


GigaScience
2020
Vol 9 (1)
Author(s):  
T Cameron Waller
Jordan A Berg
Alexander Lex
Brian E Chapman
Jared Rutter

Abstract
Background: Metabolic networks represent all chemical reactions that occur between molecular metabolites in an organism's cells. They offer a biological context in which to integrate, analyze, and interpret omic measurements, but their large scale and extensive connectivity present unique challenges. While it is practical to simplify these networks by placing constraints on compartments and hubs, it is unclear how these simplifications alter the structure of metabolic networks and the interpretation of metabolomic experiments.
Results: We curated and adapted the latest systemic model of human metabolism and developed customizable tools to define metabolic networks with and without compartmentalization in subcellular organelles and with or without inclusion of prolific metabolite hubs. Compartmentalization made networks larger, less dense, and more modular, whereas hubs made networks larger, denser, and less modular. When present, these hubs also dominated shortest paths in the network, yet their exclusion exposed the subtler prominence of other metabolites that are typically more relevant to metabolomic experiments. We applied the non-compartmental network without metabolite hubs in a retrospective, exploratory analysis of metabolomic measurements from 5 studies on human tissues. Network clusters identified individual reactions that might experience differential regulation between experimental conditions, several of which were not apparent in the original publications.
Conclusions: Exclusion of specific metabolite hubs exposes modularity in both compartmental and non-compartmental metabolic networks, improving detection of relevant clusters in omic measurements. Better computational detection of metabolic network clusters in large data sets has the potential to identify differential regulation of individual genes, transcripts, and proteins.
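The effect of excluding prolific hubs can be illustrated on a toy metabolite graph. This is not the curated human model from the paper; the metabolites and edges below are invented for illustration only.

```python
import networkx as nx

# Toy metabolite-metabolite network: edges connect metabolites that
# share a reaction. "ATP" and "H2O" act as prolific currency hubs.
edges = [("glucose", "G6P"), ("G6P", "F6P"), ("F6P", "F16BP"),
         ("glucose", "ATP"), ("G6P", "ATP"), ("F6P", "ATP"),
         ("F16BP", "ATP"), ("pyruvate", "ATP"), ("G6P", "H2O"),
         ("F16BP", "pyruvate"), ("pyruvate", "lactate")]
G = nx.Graph(edges)

hubs = {"ATP", "H2O"}                 # currency metabolites to exclude
G_nohub = G.subgraph(set(G) - hubs)   # network without the hubs

# Hubs shortcut paths between otherwise distant metabolites; removing
# them lowers density and exposes the modular pathway structure.
for name, g in [("with hubs", G), ("without hubs", G_nohub)]:
    print(name, "density:", round(nx.density(g), 3))
```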


Author(s):  
Richard J. Anthony
John P. Clark
Stephen W. Kennedy
John M. Finnegan
Dean Johnson
...  

This paper describes a large-scale heat flux instrumentation effort for the AFRL HIT Research Turbine. The work provides a uniquely large set of high-frequency instrumentation to acquire fast-response unsteady heat flux in a fully rotational, cooled turbine rig, along with unsteady pressure data, to investigate thermal loading and unsteady aerodynamic airfoil interactions. Over 1200 dynamic sensors are installed on the 1-1/2 stage turbine rig. Airfoil instrumentation includes 658 double-sided thin-film gauges for heat flux, 289 fast-response Kulite pressure sensors for unsteady aerodynamic measurements, and over 40 thermocouples. An overview of the instrumentation is given, with an in-depth focus on the non-commercial thin-film heat transfer sensors designed and produced in the Heat Flux Instrumentation Laboratory at WPAFB. The paper further describes the upgrade of data acquisition systems and signal-conditioning electronics needed to handle the increased channel requirements of the HIT Research Turbine. More modern, reliable, and efficient data processing and analysis code provides better handling of large data sets and allows easy integration with the turbine design and analysis system under development at AFRL. Example data from cooled transient blowdown tests in the TRF are included, along with measurement uncertainty.


2017
pp. 83-99
Author(s):  
Sivamathi Chokkalingam
Vijayarani S.

The term Big Data refers to large-scale information management and analysis technologies that exceed the capability of traditional data processing technologies. Big Data is differentiated from traditional technologies in three ways: the volume, velocity, and variety of the data. Big Data analytics is the process of analyzing large data sets that contain a variety of data types to uncover hidden patterns, unknown correlations, market trends, customer preferences, and other useful business information. Since Big Data is a newly emerging field, there is a need to develop new technologies and algorithms for handling it. The main objective of this paper is to provide knowledge about the various research challenges of Big Data analytics. A brief overview of the various types of Big Data analytics is discussed in this paper. For each type of analytics, the paper describes the process steps and tools, and gives a banking application. Some of the research challenges of Big Data analytics, and possible solutions to them, are also discussed.


Author(s):  
Yu-Cheng Chou
David Ko
Harry H. Cheng
Roger L. Davis
Bo Chen

Two challenging problems in the area of scientific computation are long computation times and large-scale, distributed, and diverse data sets. As the scale of science and engineering applications rapidly expands, these two problems become more manifest than ever. This paper presents the concept of Mobile Agent-based Computational Steering (MACS) for distributed simulation. MACS allows users to apply new or modified algorithms to a running application by altering certain sections of the program code without stopping the execution and recompiling the code. The concept has been validated through an application for dynamic CFD data post-processing. The validation results show that MACS has great potential to enhance the productivity and data manageability of large-scale distributed computational systems.
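As a loose analogy for MACS-style steering: the sketch below swaps analysis code into a running loop without stopping it. The paper's approach uses mobile agents rather than Python module reloading, and `postproc` is a hypothetical user module, so this only illustrates the general idea.

```python
import importlib
import time

import postproc  # hypothetical module defining process(step_data)

# Minimal steering loop: each iteration re-imports the analysis code,
# so edits saved to postproc.py take effect on the next step without
# stopping the simulation or recompiling anything.
for step in range(1000):
    step_data = {"step": step}     # stand-in for live CFD output
    importlib.reload(postproc)     # pick up any modified algorithm
    postproc.process(step_data)    # apply the current analysis code
    time.sleep(1.0)                # stand-in for computation time
```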


Author(s):  
Brian Hoeschen
Darcy Bullock
Mark Schlappi

Historically, stopped delay was used to characterize the operation of intersection movements because it was relatively easy to measure. During the past decade, the traffic engineering community has moved away from stopped delay and now uses control delay. That measurement is more precise but quite difficult to extract from large data sets if strict definitions are used to derive it. This paper evaluates two procedures for estimating control delay. The first is based on the historical approximation that control delay is 30% larger than stopped delay. The second is new and based on segment delay. The procedures are applied to a diverse data set collected in Phoenix, Arizona, and compared with control delay calculated using the formal definition. The new approximation was observed to be better than the historical stopped-delay procedure; it provided an accurate prediction of control delay. Because it is an approximation, this methodology is most appropriately applied to large data sets collected from travel time studies for ranking and prioritizing intersections for further analysis.
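The two estimation procedures reduce to simple arithmetic. A minimal sketch follows, with the segment-delay variant stated in simplified form as excess travel time over free flow; the paper's exact segment definition may differ.

```python
def control_delay_from_stopped(stopped_delay_s):
    """Historical approximation: control delay is about 30% larger
    than stopped delay."""
    return 1.30 * stopped_delay_s

def control_delay_from_segment(segment_travel_time_s, free_flow_time_s):
    """Segment-based estimate (simplified): the extra time spent
    traversing an approach segment relative to free-flow conditions."""
    return max(0.0, segment_travel_time_s - free_flow_time_s)

# Example: 24 s of stopped delay -> ~31 s of control delay
print(control_delay_from_stopped(24.0))
# Example: 95 s observed vs 62 s free flow -> 33 s of control delay
print(control_delay_from_segment(95.0, 62.0))
```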


2017
Vol 14 (S339)
pp. 310-313
Author(s):  
R. Kgoadi
I. Whittingham
C. Engelbrecht

Abstract Clustering algorithms constitute a multi-disciplinary analytical tool commonly used to summarise large data sets. Astronomical classifications are based on similarity: celestial objects are assigned to a specific class according to specific physical features. The aim of this project is to obtain relevant information from high-dimensional data (at least three input variables in a data frame) derived from stellar light curves using a number of clustering algorithms, such as K-means and Expectation Maximisation. In addition to identifying the best-performing algorithm, we also identify a subset of features that best define stellar groups. Three methodologies are applied to a sample of Kepler time series in the temperature range 6500–19,000 K. In that spectral range, at least four classes of variable stars are expected to be found: δ Scuti, γ Doradus, Slowly Pulsating B (SPB), and (the still equivocal) Maia stars.
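A minimal sketch of the two named algorithms applied to light-curve features, using scikit-learn on synthetic data. Real inputs would be features extracted from Kepler light curves (e.g. dominant frequency, amplitude, effective temperature); the random matrix here is only a placeholder.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Placeholder feature matrix: rows are stars, columns are
# light-curve features. Values are synthetic, not Kepler data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X_std = StandardScaler().fit_transform(X)   # weight features equally

# K-means with k=4, matching the four expected variability classes
km_labels = KMeans(n_clusters=4, n_init=10,
                   random_state=0).fit_predict(X_std)

# Expectation Maximisation via a 4-component Gaussian mixture
gm_labels = GaussianMixture(n_components=4,
                            random_state=0).fit_predict(X_std)
```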


2001
Vol 79 (7)
pp. 1209-1231
Author(s):  
Rich Mooi

The fossil record of the Echinodermata is relatively complete, and is represented by specimens retaining an abundance of features comparable to that found in extant forms. This yields a half-billion-year record of evolutionary novelties unmatched in any other major group, making the Echinodermata a primary target for studies of biological change. Not all of this change can be understood by studying the rocks alone, leading to synthetic research programs. Study of literature from the past 20 years indicates that over 1400 papers on echinoderm paleontology appeared in that time, and that overall productivity has remained almost constant. Analysis of papers appearing since 1990 shows that research is driven by new finds including, but not restricted to, possible Precambrian echinoderms, bizarre new edrioasteroids, early crinoids, exquisitely preserved homalozoans, echinoids at the K-T boundary, and Antarctic echinoids, stelleroids, and crinoids. New interpretations of echinoderm body wall homologies, broad-scale syntheses of embryological information, the study of developmental trajectories through molecular markers, and the large-scale ecological and phenotypic shifts being explored through morphometry and analyses of large data sets are integrated with study of the fossils themselves. Therefore, recent advances reveal a remarkable and continuing synergistic expansion in our understanding of echinoderm evolutionary history.


2020
Vol 20 (6)
pp. 5-17
Author(s):  
Hrachya Astsatryan
Aram Kocharyan
Daniel Hagimont
Arthur Lalayan

Abstract The optimization of large-scale data sets depends on the technologies and methods used. The MapReduce model, implemented on Apache Hadoop or Spark, allows splitting large data sets into a set of blocks distributed on several machines. Data compression reduces data size and transfer time between disks and memory, but requires additional processing. Therefore, finding an optimal tradeoff is a challenge, as a high compression factor may underload input/output but overload the processor. The paper presents a system enabling the selection of compression tools and the tuning of the compression factor to reach the best performance in Apache Hadoop and Spark infrastructures, based on simulation analyses.
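The tradeoff the system tunes can be demonstrated in miniature with Python's built-in gzip module, standing in for the Snappy- and zlib-style codecs used in Hadoop and Spark; the ratios and timings below are machine-dependent.

```python
import gzip
import time

# A few MB of repetitive sample data, a rough stand-in for a block
payload = b"metric,value\n" + b"cpu_load,0.93\n" * 200_000

# Sweep compression levels to expose the CPU/IO tradeoff: higher
# levels shrink transfer size but cost more processor time.
for level in (1, 5, 9):
    t0 = time.perf_counter()
    compressed = gzip.compress(payload, compresslevel=level)
    dt = time.perf_counter() - t0
    ratio = len(payload) / len(compressed)
    print(f"level {level}: ratio {ratio:.1f}x in {dt * 1000:.0f} ms")
```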


2019
Author(s):  
N. Tessa Pierce
Luiz Irber
Taylor Reiter
Phillip Brooks
C. Titus Brown

The sourmash software package uses MinHash-based sketching to create “signatures”: compressed representations of DNA, RNA, and protein sequences that can be stored, searched, explored, and taxonomically annotated. sourmash signatures can be used to estimate sequence similarity between very large data sets quickly and in low memory, and to search large databases of genomes for matches to query genomes and metagenomes. sourmash is implemented in C++, Rust, and Python, and is freely available under the BSD license at http://github.com/dib-lab/sourmash.
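The idea behind MinHash sketching can be shown in a few lines of plain Python. This is a simplification, not sourmash's implementation: it ignores canonical (reverse-complement) k-mers and sourmash's num/scaled parameters.

```python
import hashlib

def minhash_signature(seq, ksize=31, n=1000):
    """Bottom-sketch MinHash: hash every k-mer and keep the n
    smallest hash values as a compressed signature of the sequence."""
    hashes = set()
    for i in range(len(seq) - ksize + 1):
        kmer = seq[i:i + ksize].encode()
        h = int.from_bytes(hashlib.sha1(kmer).digest()[:8], "big")
        hashes.add(h)
    return set(sorted(hashes)[:n])

def jaccard_estimate(sig_a, sig_b):
    """Estimate sequence similarity as the Jaccard index of the two
    sketches; with equal-size sketches this approximates the true
    k-mer Jaccard similarity without comparing full sequences."""
    return len(sig_a & sig_b) / len(sig_a | sig_b)
```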

