VAPOR: A Visualization Package Tailored to Analyze Simulation Data in Earth System Science

Visualization is an essential tool for analysis of data and communication of findings in the sciences, and the Earth System Sciences (ESS) are no exception. However, within ESS, specialized visualization requirements and data models, particularly for those data arising from numerical models, often make general purpose visualization packages difficult, if not impossible, to use effectively. This paper presents VAPOR: a domain-specific visualization package that targets the specialized needs of ESS modelers, particularly those working in research settings where highly-interactive exploratory visualization is beneficial. We specifically describe VAPOR’s ability to handle ESS simulation data from a wide variety of numerical models, as well as a multi-resolution representation that enables interactive visualization on very large data while using only commodity computing resources. We also describe VAPOR’s visualization capabilities, paying particular attention to features for geo-referenced data and advanced rendering algorithms suitable for time-varying, 3D data. Finally, we illustrate VAPOR’s utility in the study of a numerically- simulated tornado. Our results demonstrate both ease-of-use and the rich capabilities of VAPOR in such a use case.

Download Full-text

VAPOR: A Visualization Package Tailored to Analyze Simulation Data in Earth System Science

10.20944/preprints201907.0280.v1 ◽

2019 ◽

Author(s):

Shaomeng Li ◽

Stanislaw Jaroszynski ◽

Scott Pearse ◽

Leigh Orf ◽

John Clyne

Keyword(s):

Numerical Models ◽

Large Data ◽

General Purpose ◽

Ease Of Use ◽

Earth System ◽

Simulation Data ◽

Earth System Science ◽

System Science ◽

3D Data ◽

The Rich

Visualization is an essential tool for analysis of data and communication of findings in the sciences, and the Earth System Science (ESS) are no exception. However, within ESS specialized visualization requirements and data models --- particularly for those data arising from numerical models --- often make general-purpose visualization packages difficult, if not impossible, to effectively use. This paper presents VAPOR: a domain-specific visualization package that targets the specialized needs of ESS modelers, particularly those working in research settings where highly interactive exploratory visualization is beneficial. We specifically describe VAPOR’s ability to handle ESS simulation data from a wide variety of numerical models, as well as a multi-resolution representation that enables interactive visualization on very large data while using only commodity computing resources. We also describe VAPOR’s visualization capabilities, paying particular attention to features for geo-referenced data and advanced rendering algorithms suitable for time-varying, 3D data. Finally, we illustrate VAPOR's utility in the study of a numerically simulated tornado. Our results demonstrate both ease-of-use and the rich capabilities of VAPOR in such a use case.

Download Full-text

Reinforced Transformer with Cross-Lingual Distillation for Cross-Lingual Aspect Sentiment Classification

Electronics ◽

10.3390/electronics10030270 ◽

2021 ◽

Vol 10 (3) ◽

pp. 270

Author(s):

Hanqian Wu ◽

Zhike Wang ◽

Feng Qing ◽

Shoushan Li

Keyword(s):

General Purpose ◽

Sentiment Classification ◽

Training Data ◽

Target Language ◽

Source Language ◽

Domain Specific ◽

Novel Approach ◽

The Rich ◽

Target Languages ◽

Cross Lingual

Though great progress has been made in the Aspect-Based Sentiment Analysis(ABSA) task through research, most of the previous work focuses on English-based ABSA problems, and there are few efforts on other languages mainly due to the lack of training data. In this paper, we propose an approach for performing a Cross-Lingual Aspect Sentiment Classification (CLASC) task which leverages the rich resources in one language (source language) for aspect sentiment classification in a under-resourced language (target language). Specifically, we first build a bilingual lexicon for domain-specific training data to translate the aspect category annotated in the source-language corpus and then translate sentences from the source language to the target language via Machine Translation (MT) tools. However, most MT systems are general-purpose, it non-avoidably introduces translation ambiguities which would degrade the performance of CLASC. In this context, we propose a novel approach called Reinforced Transformer with Cross-Lingual Distillation (RTCLD) combined with target-sensitive adversarial learning to minimize the undesirable effects of translation ambiguities in sentence translation. We conduct experiments on different language combinations, treating English as the source language and Chinese, Russian, and Spanish as target languages. The experimental results show that our proposed approach outperforms the state-of-the-art methods on different target languages.

Download Full-text

Fengyun Meteorological Satellite Products for Earth System Science Applications

Advances in Atmospheric Sciences ◽

10.1007/s00376-021-0425-3 ◽

2021 ◽

Author(s):

Di Xian ◽

Peng Zhang ◽

Ling Gao ◽

Ruijing Sun ◽

Haizhen Zhang ◽

...

Keyword(s):

Data Sharing ◽

Satellite Data ◽

Prediction Models ◽

Weather Forecasting ◽

Numerical Models ◽

Weather Prediction ◽

Vegetation Indices ◽

Open Data ◽

Earth System ◽

Inversion Algorithm

AbstractFollowing the progress of satellite data assimilation in the 1990s, the combination of meteorological satellites and numerical models has changed the way scientists understand the earth. With the evolution of numerical weather prediction models and earth system models, meteorological satellites will play a more important role in earth sciences in the future. As part of the space-based infrastructure, the Fengyun (FY) meteorological satellites have contributed to earth science sustainability studies through an open data policy and stable data quality since the first launch of the FY-1A satellite in 1988. The capability of earth system monitoring was greatly enhanced after the second-generation polar orbiting FY-3 satellites and geostationary orbiting FY-4 satellites were developed. Meanwhile, the quality of the products generated from the FY-3 and FY-4 satellites is comparable to the well-known MODIS products. FY satellite data has been utilized broadly in weather forecasting, climate and climate change investigations, environmental disaster monitoring, etc. This article reviews the instruments mounted on the FY satellites. Sensor-dependent level 1 products (radiance data) and inversion algorithm-dependent level 2 products (geophysical parameters) are introduced. As an example, some typical geophysical parameters, such as wildfires, lightning, vegetation indices, aerosol products, soil moisture, and precipitation estimation have been demonstrated and validated by in-situ observations and other well-known satellite products. To help users access the FY products, a set of data sharing systems has been developed and operated. The newly developed data sharing system based on cloud technology has been illustrated to improve the efficiency of data delivery.

Download Full-text

Symbiotic general-purpose and domain-specific languages

2012 34th International Conference on Software Engineering (ICSE) ◽

10.1109/icse.2012.6227102 ◽

2012 ◽

Cited By ~ 7

Author(s):

Colin Atkinson ◽

Ralph Gerbig ◽

Bastian Kennel

Keyword(s):

General Purpose ◽

Domain Specific Languages ◽

Domain Specific

Download Full-text

ByShard

Proceedings of the VLDB Endowment ◽

10.14778/3476249.3476275 ◽

2021 ◽

Vol 14 (11) ◽

pp. 2230-2243

Author(s):

Jelle Hellings ◽

Mohammad Sadoghi

Keyword(s):

Data Management ◽

Phase Locking ◽

Large Data ◽

General Purpose ◽

Management Systems ◽

Two Phase ◽

Data Management Systems ◽

Malicious Behavior ◽

Byzantine Failures ◽

Do So

The emergence of blockchains has fueled the development of resilient systems that can deal with Byzantine failures due to crashes, bugs, or even malicious behavior. Recently, we have also seen the exploration of sharding in these resilient systems, this to provide the scalability required by very large data-based applications. Unfortunately, current sharded resilient systems all use system-specific specialized approaches toward sharding that do not provide the flexibility of traditional sharded data management systems. To improve on this situation, we fundamentally look at the design of sharded resilient systems. We do so by introducing BYSHARD, a unifying framework for the study of sharded resilient systems. Within this framework, we show how two-phase commit and two-phase locking ---two techniques central to providing atomicity and isolation in traditional sharded databases---can be implemented efficiently in a Byzantine environment, this with a minimal usage of costly Byzantine resilient primitives. Based on these techniques, we propose eighteen multi-shard transaction processing protocols. Finally, we practically evaluate these protocols and show that each protocol supports high transaction throughput and provides scalability while each striking its own trade-off between throughput, isolation level, latency , and abort rate. As such, our work provides a strong foundation for the development of ACID-compliant general-purpose and flexible sharded resilient data management systems.

Download Full-text

Towards Matching of Domain-Specific Schemas Using General-Purpose External Background Knowledge

The Semantic Web: ESWC 2020 Satellite Events - Lecture Notes in Computer Science ◽

10.1007/978-3-030-62327-2_42 ◽

2020 ◽

pp. 270-279

Author(s):

Jan Philipp Portisch

Keyword(s):

Background Knowledge ◽

General Purpose ◽

Domain Specific

Download Full-text

Evaluating Word Similarity Measure of Embeddings Through Binary Classification

Journal of Computer Science Research ◽

10.30564/jcsr.v1i3.1268 ◽

2019 ◽

Vol 1 (3) ◽

Author(s):

A. Aziz Altowayan ◽

Lixin Tao

Keyword(s):

Similarity Measure ◽

Binary Classification ◽

General Purpose ◽

Feature Representation ◽

Entity Recognition ◽

Language Models ◽

Data Set ◽

Word Similarity ◽

Domain Specific ◽

Retrieval Rate

We consider the following problem: given neural language models (embeddings) each of which is trained on an unknown data set, how can we determine which model would provide a better result when used for feature representation in a downstream task such as text classification or entity recognition? In this paper, we assess the word similarity measure through analyzing its impact on word embeddings learned from various datasets and how they perform in a simple classification task. Word representations were learned and assessed under the same conditions. For training word vectors, we used the implementation of Continuous Bag of Words described in [1]. To assess the quality of the vectors, we applied the analogy questions test for word similarity described in the same paper. Further, to measure the retrieval rate of an embedding model, we introduced a new metric (Average Retrieval Error) which measures the percentage of missing words in the model. We observe that scoring a high accuracy of syntactic and semantic similarities between word pairs is not an indicator of better classification results. This observation can be justified by the fact that a domain-specific corpus contributes to the performance better than a general-purpose corpus. For reproducibility, we release our experiments scripts and results.

Download Full-text

Implementing Visual Analytics Pipelines with Simulation Data

10.5772/intechopen.96152 ◽

2021 ◽

Author(s):

Taimur Khan ◽

Syed Samad Shakeel ◽

Afzal Gul ◽

Hamza Masud ◽

Achim Ebert

Keyword(s):

Visual Analytics ◽

Visual Analysis ◽

Simulated Data ◽

Evaluation Study ◽

Ease Of Use ◽

Preliminary Evaluation ◽

Simulation Data ◽

Visual Data Analytics ◽

High Level ◽

Simulation Parameters

Visual analytics has been widely studied in the past decade both in academia and industry to improve data exploration, minimize the overall cost, and improve data analysis. In this chapter, we explore the idea of visual analytics in the context of simulation data. This would then provide us with the capability to not only explore our data visually but also to apply machine learning models in order to answer high-level questions with respect to scheduling, choosing optimal simulation parameters, finding correlations, etc. More specifically, we examine state-of-the-art tools to be able to perform these above-mentioned tasks. Further, to test and validate our methodology we followed the human-centered design process to build a prototype tool called ViDAS (Visual Data Analytics of Simulated Data). Our preliminary evaluation study illustrates the intuitiveness and ease-of-use of our approach with regards to visual analysis of simulated data.

Download Full-text

Domain-specific Evaluation Dataset Generator for Multilingual Text Analysis

Journal of Intelligent Systems with Applications ◽

10.54856/jiswa.201912084 ◽

2019 ◽

pp. 140-147

Author(s):

Emrah Inan ◽

Vahab Mostafapour ◽

Fatif Tekbacak

Keyword(s):

Text Analysis ◽

General Purpose ◽

Entity Linking ◽

Named Entity ◽

Domain Specific ◽

Benchmark Datasets ◽

Concise Information ◽

Multilingual Text ◽

The Given ◽

Specific Evaluation

Web enables to retrieve concise information about specific entities including people, organizations, movies and their features. Additionally, large amount of Web resources generally lies on a unstructured form and it tackles to find critical information for specific entities. Text analysis approaches such as Named Entity Recognizer and Entity Linking aim to identify entities and link them to relevant entities in the given knowledge base. To evaluate these approaches, there are a vast amount of general purpose benchmark datasets. However, it is difficult to evaluate domain-specific approaches due to lack of evaluation datasets for specific domains. This study presents WeDGeM that is a multilingual evaluation set generator for specific domains exploiting Wikipedia category pages and DBpedia hierarchy. Also, Wikipedia disambiguation pages are used to adjust the ambiguity level of the generated texts. Based on this generated test data, a use case for well-known Entity Linking systems supporting Turkish texts are evaluated in the movie domain.

Download Full-text

Bootstrap Domain-Specific Sentiment Classifiers from Unlabeled Corpora

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00020 ◽

2018 ◽

Vol 6 ◽

pp. 269-285 ◽

Cited By ~ 3

Author(s):

Andrius Mudinas ◽

Dell Zhang ◽

Mark Levene

Keyword(s):

Supervised Learning ◽

Learning Algorithms ◽

General Purpose ◽

Machine Learning Algorithms ◽

Sentiment Classification ◽

Two Phase ◽

Transductive Learning ◽

Domain Specific ◽

Sentiment Lexicon ◽

Supervised Learning Algorithms

There is often the need to perform sentiment classification in a particular domain where no labeled document is available. Although we could make use of a general-purpose off-the-shelf sentiment classifier or a pre-built one for a different domain, the effectiveness would be inferior. In this paper, we explore the possibility of building domain-specific sentiment classifiers with unlabeled documents only. Our investigation indicates that in the word embeddings learned from the unlabeled corpus of a given domain, the distributed word representations (vectors) for opposite sentiments form distinct clusters, though those clusters are not transferable across domains. Exploiting such a clustering structure, we are able to utilize machine learning algorithms to induce a quality domain-specific sentiment lexicon from just a few typical sentiment words (“seeds”). An important finding is that simple linear model based supervised learning algorithms (such as linear SVM) can actually work better than more sophisticated semi-supervised/transductive learning algorithms which represent the state-of-the-art technique for sentiment lexicon induction. The induced lexicon could be applied directly in a lexicon-based method for sentiment classification, but a higher performance could be achieved through a two-phase bootstrapping method which uses the induced lexicon to assign positive/negative sentiment scores to unlabeled documents first, a nd t hen u ses those documents found to have clear sentiment signals as pseudo-labeled examples to train a document sentiment classifier v ia supervised learning algorithms (such as LSTM). On several benchmark datasets for document sentiment classification, our end-to-end pipelined approach which is overall unsupervised (except for a tiny set of seed words) outperforms existing unsupervised approaches and achieves an accuracy comparable to that of fully supervised approaches.

Download Full-text