Repair of Voids in Multi-Labeled Triangular Mesh

2021, Vol. 11 (19), pp. 9275
Author(s): Deyun Zhong, Benyu Li, Tiandong Shi, Zhaopeng Li, Liguan Wang, ...

In this paper, we propose a novel mesh repairing method for repairing voids across several meshes so as to ensure the desired topological correctness. The input to our method is several closed and manifold meshes without labels. The basic idea of the method is to search for and repair voids using a multi-labeled mesh data structure and ideas from graph theory. We propose judgment rules for voids between the input meshes and a void-repairing method based on specified model priorities. The method consists of three steps: (a) converting the input meshes into a multi-labeled mesh; (b) searching for quasi-voids with a breadth-first search and determining true voids via the judgment rules; (c) repairing voids by modifying mesh labels. The method repairs voids accurately, and only a few invalid triangular facets are removed. In general, it can repair meshes with one hundred thousand facets in approximately one second on very modest hardware. Moreover, it can easily be extended to process large-scale polygon models with millions of polygons. Experimental results on several data sets show the reliability and performance of the void repairing method based on the multi-labeled triangular mesh.
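The breadth-first search in step (b) is the heart of the void detection. As a rough illustration only, the following Python sketch groups connected unlabeled facets into candidate quasi-void regions; the `facets`, `adjacency`, and `labels` inputs are hypothetical stand-ins, and the paper's multi-labeled mesh structure and judgment rules are not reproduced here.

```python
from collections import deque

def find_quasi_voids(facets, adjacency, labels):
    """Breadth-first search over the facet adjacency graph, grouping
    connected unlabeled facets into candidate (quasi-) void regions.

    facets    -- iterable of facet ids
    adjacency -- dict: facet id -> list of neighbouring facet ids
    labels    -- dict: facet id -> mesh label, or None if unlabeled
    """
    visited = set()
    regions = []
    for f in facets:
        if f in visited or labels.get(f) is not None:
            continue
        # Grow one candidate void region from this seed facet.
        region, queue = [], deque([f])
        visited.add(f)
        while queue:
            cur = queue.popleft()
            region.append(cur)
            for nb in adjacency[cur]:
                if nb not in visited and labels.get(nb) is None:
                    visited.add(nb)
                    queue.append(nb)
        regions.append(region)
    return regions
```

Each returned region would then be tested against the judgment rules before any labels are modified.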

2014, Vol. 571-572, pp. 497-501
Author(s): Qi Lv, Wei Xie

Real-time log analysis on large scale data is important for applications. Specifically, real-time refers to UI latency within 100ms. Therefore, techniques which efficiently support real-time analysis over large log data sets are desired. MongoDB provides well query performance, aggregation frameworks, and distributed architecture which is suitable for real-time data query and massive log analysis. In this paper, a novel implementation approach for an event driven file log analyzer is presented, and performance comparison of query, scan and aggregation operations over MongoDB, HBase and MySQL is analyzed. Our experimental results show that HBase performs best balanced in all operations, while MongoDB provides less than 10ms query speed in some operations which is most suitable for real-time applications.
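For flavour, a minimal pymongo sketch of the kind of aggregation such a log analyzer might run is shown below; the `logs` collection and its `level`/`source` fields are hypothetical, not taken from the paper.

```python
from pymongo import MongoClient

# Connect to a local MongoDB instance (assumed running on the default port).
client = MongoClient("mongodb://localhost:27017")
logs = client.logdb.logs

# Count ERROR-level log entries per source, most frequent first.
pipeline = [
    {"$match": {"level": "ERROR"}},
    {"$group": {"_id": "$source", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
    {"$limit": 10},
]
for row in logs.aggregate(pipeline):
    print(row["_id"], row["count"])
```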


2020, Vol. 223 (3), pp. 1837-1863
Author(s): M C Manassero, J C Afonso, F Zyserman, S Zlotnik, I Fomin

SUMMARY Simulation-based probabilistic inversions of 3-D magnetotelluric (MT) data are arguably the best option to deal with the nonlinearity and non-uniqueness of the MT problem. However, the computational cost associated with the modelling of 3-D MT data has so far precluded the community from adopting and/or pursuing full probabilistic inversions of large MT data sets. In this contribution, we present a novel and general inversion framework, driven by Markov Chain Monte Carlo (MCMC) algorithms, which combines (i) an efficient parallel-in-parallel structure to solve the 3-D forward problem, (ii) a reduced order technique to create fast and accurate surrogate models of the forward problem and (iii) adaptive strategies for both the MCMC algorithm and the surrogate model. In particular, and contrary to traditional implementations, the adaptation of the surrogate is integrated into the MCMC inversion. This circumvents the need for costly offline stages to build the surrogate and further increases the overall efficiency of the method. We demonstrate the feasibility and performance of our approach to invert for large-scale conductivity structures with two numerical examples using different parametrizations and dimensionalities. In both cases, we report staggering gains in computational efficiency compared to traditional MCMC implementations. Our method finally removes the main bottleneck of probabilistic inversions of 3-D MT data and opens up new opportunities for both stand-alone MT inversions and multi-observable joint inversions for the physical state of the Earth’s interior.
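As a schematic sketch of point (iii), interleaving surrogate adaptation with sampling might look like the following Metropolis-Hastings loop. Everything here is a placeholder, not the authors' implementation: `full_forward(x)` stands for the exact (expensive) log-likelihood, `surrogate.loglike(x)` for the reduced-order approximation, and `surrogate.update(x, y)` for refining the surrogate with a new exact evaluation.

```python
import numpy as np

def mcmc_with_adaptive_surrogate(x0, full_forward, surrogate, n_steps,
                                 step=0.1, refine_every=50):
    """Metropolis-Hastings in which an adaptive surrogate stands in for
    the expensive 3-D forward problem (illustrative only)."""
    x, ll = x0, surrogate.loglike(x0)
    chain = [x]
    for i in range(n_steps):
        prop = x + step * np.random.randn(*np.shape(x))
        ll_prop = surrogate.loglike(prop)
        if np.log(np.random.rand()) < ll_prop - ll:
            x, ll = prop, ll_prop
        # Adaptation is interleaved with sampling: periodically run the
        # full solver and fold the result back into the surrogate.
        if i % refine_every == 0:
            surrogate.update(x, full_forward(x))
            ll = surrogate.loglike(x)  # re-evaluate under refined model
        chain.append(x)
    return np.array(chain)
```

The key design point mirrored here is that no separate offline training stage is needed: the surrogate improves exactly where the chain spends its time.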


2020, Vol. 496 (1), pp. 629-637
Author(s): Ce Yu, Kun Li, Shanjiang Tang, Chao Sun, Bin Ma, ...

ABSTRACT Time series data of celestial objects are commonly used to study valuable and unexpected objects such as extrasolar planets and supernovae in time-domain astronomy. Due to the rapid growth of data volume, traditional manual methods are becoming infeasible for continuously analysing the accumulated observation data. To meet such demands, we designed and implemented a special tool named AstroCatR that can efficiently and flexibly reconstruct time series data from large-scale astronomical catalogues. AstroCatR can load original catalogue data from Flexible Image Transport System (FITS) files or databases, match each item to determine which object it belongs to, and finally produce time series data sets. To support high-performance parallel processing of large-scale data sets, AstroCatR uses an extract-transform-load (ETL) pre-processing module to create sky zone files and balance the workload. The matching module uses an overlapped indexing method and an in-memory reference table to improve accuracy and performance. The output of AstroCatR can be stored in CSV files or transformed into other formats as needed. At the same time, the module-based software architecture ensures the flexibility and scalability of AstroCatR. We evaluated AstroCatR with actual observation data from the three Antarctic Survey Telescopes (AST3). The experiments demonstrate that AstroCatR can efficiently and flexibly reconstruct all time series data by setting relevant parameters and configuration files. Furthermore, the tool is approximately 3× faster than methods using relational database management systems at matching massive catalogues.
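To make the matching idea concrete, here is a toy Python sketch of zone-partitioned positional matching against an in-memory reference table. The zone height, radius, flat-sky separation formula, and the `ref_table` layout are all illustrative simplifications, not AstroCatR's actual overlapped indexing method.

```python
import numpy as np

def zone_index(dec, zone_height=0.5):
    """Assign a declination zone id (degrees), the flavour of sky
    partitioning used to split catalogues and balance workload."""
    return int((dec + 90.0) // zone_height)

def match_to_reference(ra, dec, ref_table, radius=1.0 / 3600):
    """Return the id of the reference object within 'radius' degrees
    of (ra, dec), or None. ref_table maps zone id -> list of
    (obj_id, ra, dec) rows; neighbouring zones are also checked so
    objects near a zone boundary are not missed."""
    z = zone_index(dec)
    for zone in (z - 1, z, z + 1):
        for obj_id, r, d in ref_table.get(zone, []):
            # Small-angle separation, adequate for a sketch.
            sep = np.hypot((ra - r) * np.cos(np.radians(dec)), dec - d)
            if sep < radius:
                return obj_id
    return None
```

Checking the two neighbouring zones is a simple analogue of the "overlapped" aspect of the indexing: detections near zone borders still find their counterpart.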


2020
Author(s): Axel Lauer, Fernando Iglesias-Suarez, Veronika Eyring, the ESMValTool development team

The Earth System Model Evaluation Tool (ESMValTool) has been developed with the aim of taking model evaluation to the next level by facilitating analysis of many different ESM components, providing well-documented source code and scientific background for the implemented diagnostics and metrics, and allowing for traceability and reproducibility of results (provenance). This has been made possible by a lively and growing development community that continuously improves the tool, supported by multiple national and European projects. The latest version (2.0) of the ESMValTool has been developed as a large community effort to specifically target the increased data volume of the Coupled Model Intercomparison Project Phase 6 (CMIP6) and the related challenges posed by the analysis and evaluation of output from multiple high-resolution and complex ESMs. For this, the core functionalities have been completely rewritten to take advantage of state-of-the-art computational libraries and methods and to allow for efficient and user-friendly data processing. Common operations on the input data, such as regridding or the computation of multi-model statistics, are now centralized in a highly optimized preprocessor written in Python. The diagnostic part of the ESMValTool includes a large collection of standard recipes for reproducing peer-reviewed analyses of many variables across the atmosphere, ocean, and land domains, with diagnostics and performance metrics focusing on the mean state, trends, variability, important processes and phenomena, as well as emergent constraints. While most of the diagnostics use observational data sets (in particular satellite and ground-based observations) or reanalysis products for model evaluation, some are also based on model-to-model comparisons. This presentation introduces the diagnostics newly implemented in ESMValTool v2.0, including an extended set of large-scale diagnostics for quasi-operational and comprehensive evaluation of ESMs; new diagnostics for extreme events, regional model and impact evaluation, and analysis of ESMs; as well as diagnostics for emergent constraints and analysis of future projections from ESMs. The new diagnostics are illustrated with examples using results from the well-established CMIP5 and the newly available CMIP6 data sets.
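Purely as an illustration of the kind of multi-model statistic such a centralized preprocessor computes (this is generic NumPy, not the ESMValTool preprocessor API), consider:

```python
import numpy as np

def multi_model_stats(fields):
    """Multi-model mean and spread across model output fields that are
    assumed to be already regridded to a common grid (a step the real
    preprocessor would also perform)."""
    stacked = np.stack(fields)  # shape: (model, lat, lon)
    return stacked.mean(axis=0), stacked.std(axis=0)

# Hypothetical usage with three toy model fields on a small grid:
models = [np.random.rand(4, 8) for _ in range(3)]
mm_mean, mm_std = multi_model_stats(models)
```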


2021
Author(s): Murtadha Al-Habib, Yasser Al-Ghamdi

Abstract Extensive computing resources are required to leverage today's advanced geoscience workflows that are used to explore and characterize giant petroleum resources. In these cases, high-performance workstations are often unable to adequately handle the scale of computing required. The workflows typically utilize complex and massive data sets, which require advanced computing resources to store, process, manage, and visualize various forms of the data throughout their lifecycle. This work describes a large-scale geoscience end-to-end interpretation platform customized to run on a cluster-based remote visualization environment. A team of computing infrastructure and geoscience workflow experts was established to collaborate on the deployment, which was broken down into separate phases. Initially, an evaluation and analysis phase was conducted to analyze computing requirements and assess potential solutions. A testing environment was then designed, implemented, and benchmarked. The third phase used the test environment to determine the scale of infrastructure required for the production environment. Finally, the full-scale customized production environment was deployed for end users. During the testing phase, aspects such as connectivity, stability, interactivity, functionality, and performance were investigated using the largest available geoscience datasets. Multiple computing configurations were benchmarked until optimal performance was achieved, under applicable corporate information security guidelines. It was observed that the customized production environment was able to execute workflows that could not run on local user workstations. For example, while conducting connectivity, stability, and interactivity benchmarking, the test environment was operated for extended periods to ensure stability for workflows that require multiple days to run. To estimate the scale of the required production environment, user portfolio categories were determined based on data type, scale, and workflow. Continuous monitoring of system resources and utilization enabled continuous improvements to the final solution. The utilization of a fit-for-purpose, customized remote visualization solution may reduce or ultimately eliminate the need to deploy high-end workstations to all end users. Rather, a shared, scalable, and reliable cluster-based solution can serve a much larger user community in a highly performant manner.


In cloud-based Big Data applications, Hadoop has been widely adopted for distributed processing of large-scale data sets. However, the wasted energy consumption of data centers remains an important axis of research, owing to resource overuse and extra overhead costs. A practical way to overcome this challenge is dynamic scaling of resources in a Hadoop YARN cluster. This paper proposes a dynamic scaling approach for Hadoop YARN (DSHYARN) that adds or removes nodes automatically based on workload. It is built on two algorithms (scaling up and scaling down) that automate the scaling process in the cluster. The aim is to ensure both the energy efficiency and the performance of Hadoop YARN clusters. To validate the effectiveness of DSHYARN, a case study of sentiment analysis on tweets about the COVID-19 vaccine is provided; the goal is to analyze tweets posted by users on Twitter. The results showed improvements in CPU utilization, RAM utilization, and job completion time. In addition, energy consumption was reduced by 16% under an average workload.
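A minimal sketch of a threshold-based scale-up/scale-down decision of the kind DSHYARN automates is shown below; the thresholds, node limits, and the idea of returning a +1/-1/0 action are hypothetical illustrations, not the paper's actual algorithms or values.

```python
# Illustrative thresholds for mean cluster CPU utilization (0..1).
UP_THRESHOLD = 0.80     # above this, grow the cluster
DOWN_THRESHOLD = 0.30   # below this, shrink the cluster
MIN_NODES, MAX_NODES = 2, 20

def scaling_decision(cpu_loads, n_nodes):
    """Return +1 (add a node), -1 (remove a node) or 0 (hold),
    given per-node CPU loads and the current cluster size."""
    mean_load = sum(cpu_loads) / len(cpu_loads)
    if mean_load > UP_THRESHOLD and n_nodes < MAX_NODES:
        return +1
    if mean_load < DOWN_THRESHOLD and n_nodes > MIN_NODES:
        return -1
    return 0
```

In a real deployment the returned action would trigger YARN node commissioning or graceful decommissioning, typically with a cooldown period to avoid oscillation.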


2014
Author(s): R Daniel Kortschak, David L Adelson

bíogo is a framework designed to ease the development and maintenance of computationally intensive bioinformatics applications. The library is written in the Go programming language, a garbage-collected, strictly typed compiled language with built-in support for concurrent processing and performance comparable to C and Java. It provides a variety of data types and utility functions to facilitate manipulation and analysis of large-scale genomic and other biological data. bíogo uses a concise and expressive syntax, lowering the barriers to entry for researchers needing to process large data sets with custom analyses while retaining computational safety and ease of code review. We believe bíogo provides an excellent environment for training and research in computational biology because of its combination of strict typing, simple and expressive syntax, and high performance.


2020
Author(s): Yannick Spreen, Maximilian Miller

Motivation: The applicability and reproducibility of bioinformatics methods and results often depend on the structure and software architecture of their development. Exponentially growing data sets demand ever more optimization and performance, which conventional computing capacities increasingly fail to deliver. This creates a large overhead for software development in a research area that is primarily interested in solving complex biological problems rather than developing new, performant software solutions. In computer science, new structures in the field of web development have produced more efficient processes for container-based software solutions. The advantages of these structures have rarely been explored at a broader scientific scale. The same holds for the trend of migrating computations from on-premise resources to the cloud. Results: We created Bio-Node, a new platform for large-scale bio data analysis utilizing cloud compute resources (publicly available at https://bio-node.de). Bio-Node enables building complex workflows using a sophisticated web interface. We applied Bio-Node to implement bioinformatic workflows for rapid metagenome function annotation. We further developed "Auto-Clustering", a workflow that automatically extracts the most suitable clustering parameters for specific data types and subsequently enables optimal segregation of unknown samples of the same type. Compared to existing methods and approaches, Bio-Node improves the performance and cost of bioinformatics data analyses while providing an easier and faster development process with a focus on reproducibility and reusability.
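As a generic stand-in for the parameter extraction an "Auto-Clustering" style workflow performs (the paper's actual search space and selection criterion are not reproduced here), one could sweep a clustering parameter and keep the value that maximizes the silhouette score:

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def auto_cluster(X, k_range=range(2, 11)):
    """Pick the number of clusters that maximizes the silhouette score;
    a purely illustrative sketch of automatic parameter selection."""
    best_k, best_score, best_labels = None, -1.0, None
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(X)
        score = silhouette_score(X, labels)
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    return best_k, best_labels
```

The selected parameters can then be reused to segregate new, unlabeled samples of the same data type.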


2013, Vol. 9 (4), pp. 19-43
Author(s): Bo Hu, Nuno Carvalho, Takahide Matsutsuka

In light of the challenges of effectively managing Big Data, the authors are witnessing a gradual shift towards the increasingly popular Linked Open Data (LOD) paradigm. LOD aims to impose a machine-readable semantic layer over structured as well as unstructured data, and hence to automate data analysis tasks over content that was not designed for machine consumption. The convergence of Big Data and LOD is, however, not straightforward: the semantic layer of LOD and the large-scale storage of Big Data do not get along easily. Meanwhile, the sheer data size envisioned by Big Data rules out certain computationally expensive semantic technologies, rendering the latter much less efficient than their performance on relatively small data sets. In this paper, the authors propose a mechanism allowing LOD to take advantage of existing large-scale data stores while sustaining its “semantic” nature. The authors demonstrate how RDF-based semantic models can be distributed across multiple storage servers, and they examine how a fundamental semantic operation can be tuned to meet the requirements of distributed and parallel data processing. The authors' future work will focus on stress tests of the platform on the scale of tens of billions of triples, as well as comparative studies in usability and performance against similar offerings.
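A toy illustration of distributing RDF triples across storage servers, in the spirit of (but not identical to) the paper's approach, is to shard by subject so that all triples about one resource land on the same server; the server names and example triple below are invented for the sketch.

```python
import hashlib

# Hypothetical storage servers participating in the distributed store.
SERVERS = ["store-0", "store-1", "store-2", "store-3"]

def server_for(subject):
    """Hash the triple's subject to pick a shard, so every triple about
    a given resource is co-located on the same server."""
    digest = hashlib.sha1(subject.encode("utf-8")).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]

triple = ("http://example.org/alice", "foaf:knows", "http://example.org/bob")
print(server_for(triple[0]))  # all of alice's triples hash to one shard
```

Co-locating a subject's triples keeps common graph-pattern lookups on a single node, while the hash spreads distinct resources evenly across the cluster.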

