data sets
Recently Published Documents





2022 ◽  
Vol 34 (3) ◽  
pp. 0-0

Emerging network e-commerce is driven by the development of wireless broadband technology, smart terminal technology, near-field networks, and related technologies. It is a form of e-commerce that arises from the continuous development of modern e-commerce and its integration with commerce. This paper uses Michael Porter’s cluster theory, an income increasing algorithm, and the spatial Gini coefficient method to review and analyze research results on industrial agglomeration, to further study the agglomeration mechanism of the e-commerce industry, and to build an agglomeration simulation model and a centripetal force model of the industrial agglomeration area. By analyzing the production factors of the e-commerce industry, it then studies the influence of each factor on the development of the industry. Finally, this paper uses 16 standard mechanical data sets to investigate and analyze the agglomeration mechanism of the e-commerce industry, which verifies the accuracy and overall applicability of the method.
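The abstract does not say which formulation of the spatial Gini coefficient the paper uses. As a minimal sketch, assuming the commonly cited form G = Σ_i (s_i − x_i)², where s_i is region i's share of the industry and x_i is region i's share of the overall economy, the coefficient could be computed as follows. The function name and the input quantities (for example, employment counts) are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def spatial_gini(industry_by_region: Sequence[float],
                 total_by_region: Sequence[float]) -> float:
    """Spatial Gini coefficient G = sum_i (s_i - x_i)^2 (assumed formulation).

    industry_by_region: size of the industry in each region (e.g. employment).
    total_by_region:    size of the whole economy in each region, same order.
    A value of 0 means the industry is spread exactly like the overall
    economy; larger values indicate stronger geographic concentration.
    """
    if len(industry_by_region) != len(total_by_region):
        raise ValueError("both sequences must cover the same regions")
    industry_total = sum(industry_by_region)
    overall_total = sum(total_by_region)
    if industry_total <= 0 or overall_total <= 0:
        raise ValueError("totals must be positive")
    return sum(
        (ind / industry_total - tot / overall_total) ** 2
        for ind, tot in zip(industry_by_region, total_by_region)
    )


# Hypothetical figures: the industry sits entirely in the first of two
# equally sized regions.
concentrated = spatial_gini([100, 0], [50, 50])
```

Here the shares differ by 0.5 in each region, so `concentrated` is 0.5. An industry distributed in proportion to the overall economy gives 0.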

2022 ◽  
Vol 16 (2) ◽  
pp. 1-21
Michael Nelson ◽  
Sridhar Radhakrishnan ◽  
Chandra Sekharan ◽  
Amlan Chatterjee ◽  
Sudhindra Gopal Krishna

Time-evolving web and social network graphs are modeled as a set of pages/individuals (nodes) and their arcs (links/relationships) that change over time. Due to their popularity, they have become increasingly massive in terms of their number of nodes, arcs, and lifetimes. However, these graphs are extremely sparse throughout their lifetimes. For example, it is estimated that Facebook has over a billion vertices, yet at any point in time, it has far fewer than 0.001% of all possible relationships. The space required to store these large sparse graphs may not fit in most main memories using underlying representations such as a series of adjacency matrices or adjacency lists. We propose building a compressed data structure that has a compressed binary tree corresponding to each row of each adjacency matrix of the time-evolving graph. We do not explicitly construct the adjacency matrix, and our algorithms take the time-evolving arc list representation as input for its construction. Our compressed structure allows for directed and undirected graphs, faster arc and neighborhood queries, as well as the ability for arcs and frames to be added and removed directly from the compressed structure (streaming operations). We use publicly available network data sets such as Flickr, Yahoo!, and Wikipedia in our experiments and show that our new technique performs as well as or better than our benchmarks on all data sets in terms of compression size and other vital metrics.
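The idea of one binary tree per adjacency-matrix row, built directly from a time-stamped arc list, can be illustrated with a short sketch. This is not the authors' compressed structure: it is a plain pointer-based tree in which all-zero subtrees are simply left out, and the class names, method names, and the `(source, target, frame)` tuple layout are assumptions made for illustration.

```python
class RowTree:
    """Pruned binary tree over the column range [0, size) of one matrix row.

    None stands for an all-zero subtree, True for a leaf holding an arc,
    and a two-element list [left, right] for an internal node.
    """

    def __init__(self, num_columns):
        self.num_columns = num_columns
        self.size = 1
        while self.size < num_columns:  # round up to a power of two
            self.size *= 2
        self.root = None

    def add(self, col):
        if not 0 <= col < self.num_columns:
            raise IndexError("column out of range")
        self.root = self._update(self.root, 0, self.size, col, present=True)

    def remove(self, col):
        if 0 <= col < self.num_columns:
            self.root = self._update(self.root, 0, self.size, col, present=False)

    def _update(self, node, lo, hi, col, present):
        if hi - lo == 1:
            return True if present else None
        if node is None:
            if not present:
                return None  # nothing to delete in an empty subtree
            node = [None, None]
        mid = (lo + hi) // 2
        side = 0 if col < mid else 1
        bounds = (lo, mid) if side == 0 else (mid, hi)
        node[side] = self._update(node[side], bounds[0], bounds[1], col, present)
        # Prune internal nodes whose two halves are both empty.
        return None if node[0] is None and node[1] is None else node

    def has(self, col):
        if not 0 <= col < self.num_columns:
            return False
        node, lo, hi = self.root, 0, self.size
        while node is not None and hi - lo > 1:
            mid = (lo + hi) // 2
            if col < mid:
                node, hi = node[0], mid
            else:
                node, lo = node[1], mid
        return node is not None

    def columns(self):
        """Return the stored columns in ascending order."""
        found, stack = [], [(self.root, 0, self.size)]
        while stack:
            node, lo, hi = stack.pop()
            if node is None:
                continue
            if hi - lo == 1:
                found.append(lo)
                continue
            mid = (lo + hi) // 2
            stack.append((node[1], mid, hi))  # right pushed first,
            stack.append((node[0], lo, mid))  # so left is visited first
        return found


class TimeEvolvingGraph:
    """Maps frame -> source node -> RowTree; no adjacency matrix is built."""

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        self.frames = {}

    @classmethod
    def from_arcs(cls, num_nodes, arcs):
        graph = cls(num_nodes)
        for source, target, frame in arcs:
            graph.add_arc(frame, source, target)
        return graph

    def add_arc(self, frame, source, target):
        rows = self.frames.setdefault(frame, {})
        if source not in rows:
            rows[source] = RowTree(self.num_nodes)
        rows[source].add(target)

    def remove_arc(self, frame, source, target):
        row = self.frames.get(frame, {}).get(source)
        if row is not None:
            row.remove(target)

    def has_arc(self, frame, source, target):
        row = self.frames.get(frame, {}).get(source)
        return row is not None and row.has(target)

    def neighbors(self, frame, source):
        row = self.frames.get(frame, {}).get(source)
        return row.columns() if row is not None else []


graph = TimeEvolvingGraph.from_arcs(8, [(0, 1, 0), (0, 5, 0), (3, 2, 1)])
```

With this sketch, `graph.neighbors(0, 0)` returns `[1, 5]` and `graph.has_arc(1, 0, 1)` is `False`, because that arc exists only in frame 0. An undirected graph could be handled by inserting each arc in both directions. A space-efficient version would replace the Python lists with a succinct bit-level encoding, which this sketch does not attempt.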

2022 ◽  
Vol 13 (1) ◽  
pp. 1-25
Fan Chen ◽  
Jiaoxiong Xia ◽  
Honghao Gao ◽  
Huahu Xu ◽  
Wei Wei

The management of public opinion and the use of big data monitoring to accurately judge and verify all kinds of information are valuable aspects in the enterprise management decision-making process. The sentiment analysis of reviews is a key decision-making tool for e-commerce development. Most existing review sentiment analysis methods involve sequential modeling but do not focus on the semantic relationships. However, Chinese semantics are different from English semantics in terms of the sentence structure. Irrelevant contextual words may be incorrectly identified as cues for sentiment prediction. The influence of the target words in reviews must be considered. Thus, this paper proposes the TRG-DAtt model for sentiment analysis based on target relational graph (TRG) and double attention network (DAtt) to analyze the emotional information to support decision making. First, dependency tree-based TRG is introduced to independently and fully mine the semantic relationships. We redefine and constrain the dependency and use it as the edges to connect the target and context words. Second, we design dependency graph attention network (DGAT) and interactive attention network (IAT) to form the DAtt and obtain the emotional features of the target words and reviews. DGAT models the dependency of the TRG by aggregating the semantic information. Next, the target emotional enhancement features obtained by the DGAT are input to the IAT. The influence of each target word on the review can be obtained through the interaction. Finally, the target emotional enhancement features are weighted by the impact factor to generate the review's emotional features. In this study, extensive experiments were conducted on the car and Meituan review data sets, which contain consumer reviews on cars and stores, respectively. The results demonstrate that the proposed model outperforms the existing models.

2023 ◽  
Vol 55 (1) ◽  
pp. 1-33
Fan Xu ◽  
Victor S. Sheng ◽  
Mingwen Wang

With the proliferation of social sensing, large numbers of observations are contributed by people or devices. However, these observations contain disinformation. Disinformation can propagate across online social networks at a relatively low cost, yet it results in a series of major problems in our society. In this survey, we provide a comprehensive overview of disinformation and truth discovery in social sensing under a unified perspective, including basic concepts and the taxonomy of existing methodologies. Furthermore, we summarize the mechanism of disinformation from four different perspectives (i.e., text only, text with image/multi-modal, text with propagation, and fusion models). In addition, we review existing solutions based on these requirements, compare their pros and cons, and provide usage guidance based on detailed lessons learned. To facilitate future studies in this field, we summarize related publicly accessible real-world data sets and open-source code. Last but most importantly, we emphasize potential future research topics and challenges in this domain through a deep analysis of the most recent methods.

2022 ◽  
Vol 22 (2) ◽  
pp. 1-31
Monica Babeş-Vroman ◽  
Thuytien N. Nguyen ◽  
Thu D. Nguyen

With the number of jobs in computer occupations on the rise, there is a greater need for computer science (CS) graduates than ever. At the same time, most CS departments across the country are only seeing 25–30% of women students in their classes, meaning that we are failing to draw interest from a large portion of the population. In this work, we explore the gender gap in CS at Rutgers University–New Brunswick, a large public R1 research university, using three data sets that span thousands of students across six academic years. Specifically, we combine these data sets to study the gender gaps in four core CS courses and explore the correlation of several factors with retention and the impact of these factors on changes to the gender gap as students proceed through the CS courses toward completing the CS major. For example, we find that a significant percentage of women students taking the introductory CS1 course for majors do not intend to major in CS, which may be a contributing factor to a large increase in the gender gap immediately after CS1. This finding implies that part of the retention task is attracting these women students to further explore the major. Results from our study include both novel findings and findings that are consistent with known challenges for increasing gender diversity in CS. In both cases, we provide extensive quantitative data in support of the findings.

In cloud-based big data applications, Hadoop has been widely adopted for distributed processing of large-scale data sets. However, energy wastage in data centers remains an important research topic because of resource overuse and extra overhead costs. Dynamic scaling of resources in a Hadoop YARN cluster is a practical way to address this challenge. This paper proposes a dynamic scaling approach for Hadoop YARN (DSHYARN) that adds or removes nodes automatically based on workload. It relies on two algorithms (scaling up and scaling down) that automate the scaling process in the cluster. The aim is to ensure the energy efficiency and performance of Hadoop YARN clusters. To validate the effectiveness of DSHYARN, a case study with sentiment analysis of tweets about the COVID-19 vaccine is provided; the goal is to analyze tweets that people posted on Twitter. The results showed improvements in CPU utilization, RAM utilization, and job completion time. In addition, energy consumption was reduced by 16% under an average workload.
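The abstract does not give the details of the two scaling algorithms. A minimal sketch of a workload-driven decision rule, assuming simple utilization thresholds, might look like the following. The thresholds, node limits, step size, and all names are hypothetical, and the code only computes a target node count; it does not call any Hadoop or YARN API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    """Hypothetical policy parameters; none of these values come from the paper."""
    scale_up_threshold: float = 0.80    # add a node above this utilization
    scale_down_threshold: float = 0.30  # remove a node below this utilization
    min_nodes: int = 2
    max_nodes: int = 10
    step: int = 1                       # nodes added or removed per decision


def target_node_count(current_nodes: int,
                      utilization: float,
                      policy: ScalingPolicy = ScalingPolicy()) -> int:
    """Return the node count the cluster should move to.

    utilization is the average cluster load in [0, 1], for example the
    fraction of allocated CPU or memory (an assumed workload measure).
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must lie in [0, 1]")
    if utilization > policy.scale_up_threshold:
        # Scaling up: add nodes, but never exceed the configured maximum.
        return min(current_nodes + policy.step, policy.max_nodes)
    if utilization < policy.scale_down_threshold:
        # Scaling down: release nodes, but keep the configured minimum.
        return max(current_nodes - policy.step, policy.min_nodes)
    return current_nodes  # load is in the comfortable band; do nothing


busy = target_node_count(4, 0.90)   # overloaded cluster of 4 nodes
idle = target_node_count(4, 0.10)   # lightly loaded cluster of 4 nodes
```

In this sketch `busy` is 5 and `idle` is 3. Keeping a gap between the two thresholds avoids adding and removing nodes in rapid succession when the load hovers around a single cutoff.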

2022 ◽  
Vol 8 (1) ◽  
pp. 1-32
Sajid Hasan Apon ◽  
Mohammed Eunus Ali ◽  
Bishwamittra Ghosh ◽  
Timos Sellis

Social networks with location-enabling technologies, also known as geo-social networks, allow users to share their location-specific activities and preferences through check-ins. A user in such a geo-social network can be attributed to an associated location (spatial), her preferences as keywords (textual), and the connectivity (social) with her friends. The fusion of social, spatial, and textual data of a large number of users in these networks provides an interesting insight for finding meaningful geo-social groups of users, supporting many real-life applications, including activity planning and recommendation systems. In this article, we introduce a novel query, namely, the Top-k Flexible Socio-Spatial Keyword-aware Group Query (SSKGQ), which finds the best k groups of varying sizes around different points of interest (POIs), where the groups are ranked based on the social and textual cohesiveness among members, the spatial closeness to the corresponding POI, and the number of members in the group. We develop an efficient approach to solve the SSKGQ problem based on our theoretical upper bounds on distance, social connectivity, and textual similarity. We prove that the SSKGQ problem is NP-Hard and provide an approximate solution based on our derived relaxed bounds, which runs much faster than the exact approach by sacrificing the group quality slightly. Our extensive experiments on real data sets show the effectiveness of our approaches in different real-life settings.

2022 ◽  
Vol 55 (1) ◽  
Nie Zhao ◽  
Chunming Yang ◽  
Fenggang Bian ◽  
Daoyou Guo ◽  
Xiaoping Ouyang

In situ synchrotron small-angle X-ray scattering (SAXS) is a powerful tool for studying dynamic processes during material preparation and application. The processing and analysis of large data sets generated from in situ X-ray scattering experiments are often tedious and time consuming. However, data processing software for in situ experiments is relatively rare, especially for grazing-incidence small-angle X-ray scattering (GISAXS). This article presents an open-source software suite (SGTools) to perform data processing and analysis for SAXS and GISAXS experiments. The processing modules in this software include (i) raw data calibration and background correction; (ii) data reduction by multiple methods; (iii) animation generation and intensity mapping for in situ X-ray scattering experiments; and (iv) further data analysis for the sample with an order degree and interface correlation. This article provides the main features and framework of SGTools. The workflow of the software is also elucidated to allow users to develop new features. Three examples are demonstrated to illustrate the use of SGTools for dealing with SAXS and GISAXS data. Finally, the limitations and future features of the software are also discussed.

2022 ◽  
Jens-Erik Lund Snee ◽  
Elizabeth L. Miller

ABSTRACT The paleogeographic evolution of the western U.S. Great Basin from the Late Cretaceous to the Cenozoic is critical to understanding how the North American Cordillera at this latitude transitioned from Mesozoic shortening to Cenozoic extension. According to a widely applied model, Cenozoic extension was driven by collapse of elevated crust supported by crustal thicknesses that were potentially double the present ~30–35 km. This model is difficult to reconcile with more recent estimates of moderate regional extension (≤50%) and the discovery that most high-angle, Basin and Range faults slipped rapidly ca. 17 Ma, tens of millions of years after crustal thickening occurred. Here, we integrated new and existing geochronology and geologic mapping in the Elko area of northeast Nevada, one of the few places in the Great Basin with substantial exposures of Paleogene strata. We improved the age control for strata that have been targeted for studies of regional paleoelevation and paleoclimate across this critical time span. In addition, a regional compilation of the ages of material within a network of middle Cenozoic paleodrainages that developed across the Great Basin shows that the age of basal paleovalley fill decreases southward roughly synchronous with voluminous ignimbrite flareup volcanism that swept south across the region ca. 45–20 Ma. Integrating these data sets with the regional record of faulting, sedimentation, erosion, and magmatism, we suggest that volcanism was accompanied by an elevation increase that disrupted drainage systems and shifted the continental divide east into central Nevada from its Late Cretaceous location along the Sierra Nevada arc. The north-south Eocene–Oligocene drainage divide defined by mapping of paleovalleys may thus have evolved as a dynamic feature that propagated southward with magmatism. 
Despite some local faulting, the northern Great Basin became a vast, elevated volcanic tableland that persisted until dissection by Basin and Range faulting that began ca. 21–17 Ma. Based on this more detailed geologic framework, it is unlikely that Basin and Range extension was driven by Cretaceous crustal overthickening; rather, preexisting crustal structure was just one of several factors that led to Basin and Range faulting after ca. 17 Ma, in addition to thermal weakening of the crust associated with Cenozoic magmatism, thermally supported elevation, and changing boundary conditions. Because these causal factors evolved long after crustal thickening ended, during final removal and fragmentation of the shallowly subducting Farallon slab, they are compatible with normal-thickness (~45–50 km) crust beneath the Great Basin prior to extension and do not require development of a strongly elevated, Altiplano-like region during Mesozoic shortening.

2022 ◽  
Vol 55 (1) ◽  
Ruth Birch ◽  
Thomas Benjamin Britton

Materials with an allotropic phase transformation can form microstructures where grains have orientation relationships determined by the transformation history. These microstructures influence the final material properties. In zirconium alloys, there is a solid-state body-centred cubic (b.c.c.) to hexagonal close-packed (h.c.p.) phase transformation, where the crystal orientations of the h.c.p. phase can be related to the parent b.c.c. structure via the Burgers orientation relationship (BOR). In the present work, a reconstruction code, developed for steels and which uses a Markov chain clustering algorithm to analyse electron backscatter diffraction maps, is adapted and applied to the h.c.p./b.c.c. BOR. This algorithm is released as open-source code (via github, as ParentBOR). The algorithm enables new post-processing of the original and reconstructed data sets to analyse the variants of the h.c.p. α phase that are present and understand shared crystal planes and shared lattice directions within each parent β grain; it is anticipated that this will assist in understanding the transformation-related deformation properties of the final microstructure. Finally, the ParentBOR code is compared with recently released reconstruction codes implemented in MTEX to reveal differences and similarities in how the microstructure is described.
