MeteoIO 2.4.2: a preprocessing library for meteorological data

2014 ◽  
Vol 7 (3) ◽  
pp. 3595-3645 ◽  
Author(s):  
M. Bavay ◽  
T. Egger

Abstract. Using numerical models which require large meteorological data sets is sometimes difficult and problems can often be traced back to the Input/Output functionality. Complex models are usually developed by the environmental sciences community with a focus on the core modelling issues. As a consequence, the I/O routines that are costly to properly implement are often error-prone, lacking flexibility and robustness. With the increasing use of such models in operational applications, this situation ceases to be simply uncomfortable and becomes a major issue. The MeteoIO library has been designed for the specific needs of numerical models that require meteorological data. The whole task of data preprocessing has been delegated to this library, namely retrieving, filtering and resampling the data if necessary as well as providing spatial interpolations and parametrizations. The focus has been to design an Application Programming Interface (API) that (i) provides a uniform interface to meteorological data in the models; (ii) hides the complexity of the processing taking place; and (iii) guarantees a robust behaviour in case of format errors, erroneous or missing data. Moreover, in an operational context, this error handling should avoid unnecessary interruptions in the simulation process. A strong emphasis has been put on simplicity and modularity in order to make it extremely easy to support new data formats or protocols and to allow contributors with diverse backgrounds to participate. This library can also be used in the context of High Performance Computing in a parallel environment. Finally, it is released under an Open Source license and is available at http://models.slf.ch/p/meteoio. This paper gives an overview of the MeteoIO library from the point of view of conceptual design, architecture, features and computational performance. A scientific evaluation of the produced results is not given here since the scientific algorithms that are used have already been published elsewhere.
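As a sketch of what this uniform interface looks like from a model's point of view, the fragment below requests filtered and resampled station data for one timestamp through the library's I/O manager. It is a minimal sketch rather than the paper's own example: the class and method names (Config, IOManager, getMeteoData, MeteoData) follow MeteoIO's documented C++ API as we understand it, and the io.ini file, time zone and date are placeholders.

```cpp
#include <iostream>
#include <vector>
#include <meteoio/MeteoIO.h>

int main() {
    // All I/O plugins, filters and resampling choices are declared in an INI
    // file, so the model code stays identical whatever the data source is.
    mio::Config cfg("io.ini");                    // placeholder configuration file
    mio::IOManager io(cfg);

    mio::Date d;
    mio::IOUtils::convertString(d, "2014-01-15T12:00", 1.);  // placeholder date, TZ = +1

    // One call returns filtered/resampled data for all configured stations.
    std::vector<mio::MeteoData> vecMeteo;
    io.getMeteoData(d, vecMeteo);

    for (size_t ii = 0; ii < vecMeteo.size(); ++ii)
        std::cout << "station " << ii << ": TA = "
                  << vecMeteo[ii](mio::MeteoData::TA) << " K\n";  // air temperature, SI units
    return 0;
}
```

The model never sees which plugin, filter or resampling algorithm produced the data; that is what the abstract means by hiding the complexity of the processing.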

2014 ◽  
Vol 7 (6) ◽  
pp. 3135-3151 ◽  
Author(s):  
M. Bavay ◽  
T. Egger

Abstract. Using numerical models which require large meteorological data sets is sometimes difficult and problems can often be traced back to the Input/Output functionality. Complex models are usually developed by the environmental sciences community with a focus on the core modelling issues. As a consequence, the I/O routines that are costly to properly implement are often error-prone, lacking flexibility and robustness. With the increasing use of such models in operational applications, this situation ceases to be simply uncomfortable and becomes a major issue. The MeteoIO library has been designed for the specific needs of numerical models that require meteorological data. The whole task of data preprocessing has been delegated to this library, namely retrieving, filtering and resampling the data if necessary as well as providing spatial interpolations and parameterizations. The focus has been to design an Application Programming Interface (API) that (i) provides a uniform interface to meteorological data in the models, (ii) hides the complexity of the processing taking place, and (iii) guarantees a robust behaviour in the case of format errors, erroneous or missing data. Moreover, in an operational context, this error handling should avoid unnecessary interruptions in the simulation process. A strong emphasis has been put on simplicity and modularity in order to make it extremely easy to support new data formats or protocols and to allow contributors with diverse backgrounds to participate. This library is also regularly evaluated for computing performance and further optimized where necessary. Finally, it is released under an Open Source license and is available at http://models.slf.ch/p/meteoio. This paper gives an overview of the MeteoIO library from the point of view of conceptual design, architecture, features and computational performance. A scientific evaluation of the produced results is not given here since the scientific algorithms that are used have already been published elsewhere.
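The spatial interpolation side of the API follows the same pattern: the model asks for a given parameter on a given grid and the library applies the configured interpolation algorithm. The fragment below is again a hedged sketch of that call path (readDEM, getMeteoData with a DEM and a Grid2DObject), reusing the placeholder configuration of the previous example; the exact signatures should be checked against the MeteoIO documentation.

```cpp
#include <meteoio/MeteoIO.h>

// Sketch: interpolate air temperature over the model's DEM for one timestamp.
// Assumes the same placeholder io.ini as above; method names reflect the
// MeteoIO C++ API as we understand it.
void interpolate_ta(mio::IOManager& io, const mio::Date& d) {
    mio::DEMObject dem;
    io.readDEM(dem);                 // grid definition comes from the INI configuration

    mio::Grid2DObject ta;
    io.getMeteoData(d, dem, mio::MeteoData::TA, ta);  // spatially interpolated TA grid
}
```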


2018 ◽  
Author(s):  
John M Macdonald ◽  
Christopher M Lalansingh ◽  
Christopher I Cooper ◽  
Anqi Yang ◽  
Felix Lam ◽  
...  

Abstract. Background: Most biocomputing pipelines are run on clusters of computers. Each type of cluster has its own API (application programming interface), which defines how a program must request the submission, content and monitoring of the jobs to be run on the cluster. Sometimes it is desirable to run the same pipeline on different types of cluster. This can happen in situations including when:
- different labs are collaborating, but they do not use the same type of cluster;
- a pipeline is released to other labs as open source or commercial software;
- a lab has access to multiple types of cluster and wants to choose between them for scaling, cost or other purposes;
- a lab is migrating its infrastructure from one cluster type to another;
- during testing or travelling, it is often desired to run on a single computer.
However, since each type of cluster has its own API, code that runs jobs on one type of cluster needs to be rewritten to run on a different type of cluster. To resolve this problem, we created a software module to generalize the submission of pipelines across computing environments, including local compute, clouds and clusters.
Results: HPCI (High Performance Computing Interface) is a Perl module that provides the interface to a standardized generic cluster. When the HPCI module is used, it accepts a parameter specifying the cluster type and uses it to load a driver, HPCD::<cluster>, which translates the abstract HPCI interface to the specific software interface. Simply by changing the cluster parameter, the same pipeline can be run on a different type of cluster with no other changes.
Conclusion: The HPCI module assists in writing Perl programs that can be run in different lab environments, with different site configuration requirements and different types of hardware clusters. Rather than having to rewrite portions of the program, it is only necessary to change a configuration file. Using HPCI, an application can manage collections of jobs to be run, specify ordering dependencies, detect the success or failure of jobs, and allow automatic retry of failed jobs (allowing for the possibility of a changed configuration, such as when the original attempt specified an inadequate memory allotment).
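The core idea, a single abstract job interface whose cluster-specific details live in interchangeable drivers selected by a configuration parameter, is language-independent. The sketch below illustrates that pattern in C++ rather than Perl; the type and method names (Job, Driver, submit, the "local"/"slurm" backends) are invented for illustration and do not correspond to HPCI's actual interface.

```cpp
#include <cstdlib>
#include <memory>
#include <stdexcept>
#include <string>

// Illustrative only: a generic job-submission interface with per-cluster
// drivers selected by a configuration string, mirroring the idea of HPCI
// loading an HPCD::<cluster> driver. All names here are hypothetical.
struct Job { std::string name, command; };

class Driver {                              // abstract cluster backend
public:
    virtual ~Driver() = default;
    virtual void submit(const Job& job) = 0;
};

class LocalDriver : public Driver {         // run directly on the local machine
public:
    void submit(const Job& job) override {
        std::system(job.command.c_str());
    }
};

class SlurmDriver : public Driver {         // hand the job to a SLURM scheduler
public:
    void submit(const Job& job) override {
        const std::string cmd = "sbatch --job-name=" + job.name +
                                " --wrap='" + job.command + "'";
        std::system(cmd.c_str());
    }
};

// The pipeline only ever sees this factory plus the Driver interface,
// so switching cluster types is a one-word configuration change.
std::unique_ptr<Driver> make_driver(const std::string& cluster) {
    if (cluster == "local") return std::make_unique<LocalDriver>();
    if (cluster == "slurm") return std::make_unique<SlurmDriver>();
    throw std::runtime_error("unknown cluster type: " + cluster);
}
```

A pipeline written against Driver never names a scheduler directly, which is the portability property the abstract attributes to HPCI.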


Author(s):  
Diandian Zhang ◽  
Han Zhang ◽  
Jeronimo Castrillon ◽  
Torsten Kempf ◽  
Bart Vanthournout ◽  
...  

Efficient runtime resource management in multi-processor systems-on-chip (MPSoCs) for achieving high performance and low energy consumption is one of the key challenges for system designers. OSIP, an operating system application-specific instruction-set processor, together with its well-defined programming model, provides a promising solution. It delivers high computational performance for dynamic task scheduling and mapping and, being programmable, can easily be adapted to different systems. However, the distributed computation among the different processing elements introduces complexity into the communication architecture, which tends to become the bottleneck of such systems. In this work, the authors highlight the vital importance of the communication architecture for OSIP-based systems and optimize it. Furthermore, the effects of OSIP and the communication architecture are investigated jointly from the system point of view, based on a broad case study of a real-life application (H.264) and a synthetic benchmark application.


2002 ◽  
Author(s):  
Christopher J. Freitas

The development of commodity-off-the-shelf computer hardware components has enabled a trend in high performance computing away from vendor-proprietary computer systems. A Beowulf computer system is a high performance computer assembled from commodity-off-the-shelf hardware that uses application programming interface libraries and open source operating systems to create a unified computing environment. In this paper, a Beowulf computer system is described and a performance benchmarking exercise is presented. The simulation is a benchmark problem relevant to hydrocode simulations; specifically, it simulates the high-speed impact and penetration of a long rod. Through this simulation study and a comparison to similar simulations performed on other computer systems, the price/performance advantage of a Beowulf system is demonstrated.
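The abstract does not name the message-passing library used to bind the commodity nodes into a unified environment; MPI is the conventional choice on Beowulf clusters, so the minimal sketch below assumes it. It shows the usual structure of such a benchmark run: every node executes the same program, works on its share of the domain, and the slowest rank sets the reported wall-clock time.

```cpp
#include <mpi.h>
#include <cstdio>

// Minimal sketch of a Beowulf-style parallel run. MPI is assumed here;
// the paper does not specify which API library its benchmark used.
int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double t0 = MPI_Wtime();
    // ... each rank would compute its portion of the hydrocode domain here ...
    double local = MPI_Wtime() - t0;

    double slowest = 0.0;            // wall-clock time is set by the slowest node
    MPI_Reduce(&local, &slowest, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    if (rank == 0)
        std::printf("%d ranks, slowest rank took %.3f s\n", size, slowest);

    MPI_Finalize();
    return 0;
}
```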


2018 ◽  
Vol 2 ◽  
pp. e25828
Author(s):  
Chihjen Ko ◽  
Lex Wang

Herbaria in Taiwan face critical data challenges:
- Different taxonomic views prevent data exchange;
- There is a lack of development practices keeping up with standards and technological advances;
- Data is disconnected from the researchers' perspective, so it is difficult to demonstrate the value of taxonomists' activities, even though a few herbaria have partially exposed their specimen catalogues in Darwin Core.
In consultation with the Herbarium of the Taiwan Forestry Research Institute (TAIF), the Herbarium of the National Taiwan University (TAI) and the Herbarium of the Biodiversity Research Center, Academia Sinica (HAST), which together host the most important collections of the island's vegetation, we have planned the following activities to address these data challenges:
- Investigate a new data model for scientific names that accommodates different taxonomic views, and create a web service for access to taxonomic data (a sketch of such a model follows this abstract);
- Refactor the existing herbarium systems to use the aforementioned service so the three herbaria can share and maintain a standardized name database;
- Create an Application Programming Interface (API) layer to support multiple types of accessing devices;
- Conduct behavioral research on the various personas engaged in the curatorial workflow;
- Create a unified front-end that supports data management, data discovery and data analysis activities with user experience improvements.
To manage these developments at various levels while maximizing the contribution of the participating parties, it is crucial to use a proven methodological framework. As the creative industry has been leading in solution development, the concept of design thinking and the design thinking process (Brown and Katz 2009) came onto our radar. Design thinking is a systematic approach to handling problems and generating new opportunities (Pal 2016). From requirement capture to actual implementation, it helps consolidate ideas and identify agreed-on key priorities by iterating through a series of interactive divergence and convergence steps:
- Empathize: a divergent step. We learn about our audience, which in this case includes curators and visitors of the herbarium systems, about what they do and how they interact with the system, and collate our findings.
- Define: a convergent step. We construct a point of view based on audience needs.
- Ideate: a divergent step. We brainstorm and come up with creative solutions, which might be novel or based on existing practice.
- Prototype: a convergent step. We build representations of the idea chosen in the previous step.
- Test: we use the prototype to test whether the idea works, then refine from step 3 if the problems lay with the prototype, or even from step 1 if the point of view needs to be revisited.
The benefits of adopting this process are:
- Instead of "design for you", we "design together", which strengthens the sense of community and helps communicate what the revision and refactoring will achieve;
- When put in context, it increases awareness and understanding of biodiversity data standards, such as Darwin Core (DwC) and Access to Biological Collections Data (ABCD);
- As we hand the responsibility of process control to an external facilitator, we are able to focus on each step as participants.
We illustrate how the planned activities are conducted through these five iterative steps.
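As a purely hypothetical illustration of what a name data model accommodating different taxonomic views could look like, the sketch below separates a name string from the taxon concepts ("views") that use it; every type and field name is invented for this sketch and does not describe the system the authors are planning.

```cpp
#include <string>
#include <vector>

// Hypothetical sketch only: a scientific name kept separate from the
// taxonomic views that use it, so several herbaria can attach their own
// placements to the same name record served by a shared web service.
struct ScientificName {
    std::string id;                 // stable identifier served by the name service
    std::string canonical;          // e.g. "Cinnamomum camphora"
    std::string authorship;         // e.g. "(L.) J.Presl"
};

struct NameUsage {                  // one taxonomic view of that name
    std::string name_id;            // refers to ScientificName::id
    std::string according_to;       // source of the view, e.g. a checklist or herbarium
    std::string accepted_name_id;   // differs when this view treats the name as a synonym
    std::string parent_usage_id;    // placement within that view's classification
};

struct NameService {                // shared by the three herbaria
    std::vector<ScientificName> names;
    std::vector<NameUsage> usages;  // many usages per name, one per taxonomic view
};
```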


Author(s):  
Louis Nashih Uluwan Arif ◽  
Ali Ridho Barakbah ◽  
Amang Sudarsono ◽  
Renovita Edelani

Indonesia is the country with the highest level of earthquake risk in the world. In the past 10 years, approximately 90,000 earthquake events have been recorded, and the volume keeps growing as new earthquake data arrive continuously. Collecting and analyzing these data requires considerable effort and long computation times. In this paper, we propose a new system to acquire, store, manage and process earthquake data in Indonesia in a real-time, fast and dynamic way by utilizing features of a Big Data environment. The system improves computational performance in managing and analyzing Indonesian earthquake data by combining and integrating earthquake data from several providers into a single, complete data set. In addition, an API (Application Programming Interface) embedded in the system provides access to the results of the earthquake data analysis, such as density, probability density function and seismic data association between provinces in Indonesia. The processing in this system is carried out in parallel, which improves computing performance: preprocessing on a single-core master node requires 55.6 minutes, whereas distributed processing over 15 cores reduces this to only 4.82 minutes.
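Taken at face value, and assuming the single-core and 15-core runs processed the same workload, these timings correspond to the following speedup and parallel efficiency:

    speedup    S = 55.6 min / 4.82 min ≈ 11.5
    efficiency E = S / 15 ≈ 0.77

so the distributed run recovers roughly 77% of the ideal 15x speedup, the remainder being serial work and coordination overhead in the preprocessing.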

