Adapting Reproducible Research Capabilities to Resilient Distributed Calculations

Author(s):  
Manuel Rodríguez-Pascual ◽  
Christos Kanellopoulos ◽  
Antonio Juan Rubio-Montero ◽  
Diego Darriba ◽  
Ognjen Prnjat ◽  
...  

Nowadays, scientific calculations are becoming increasingly demanding, and satisfying that demand requires the huge pool of computing resources now available. This demand must be met in terms of both computational efficiency and resilience, which are hard to guarantee on distributed and heterogeneous platforms. Moreover, the data obtained are often either reused by other researchers or recalculated. In this work, a set of tools is presented that overcomes the problem of creating and executing fault-tolerant distributed applications in dynamic environments. The tool set also ensures the reproducibility of the performed experiments by providing a portable, unattended and resilient framework that hides the infrastructure-dependent operations from application developers and users, and allows experiments to be performed on Open Access data repositories. In this way, users can seamlessly search for datasets and later access them, so that they are automatically retrieved as input data for a code already integrated in the proposed workflow. The search is based on metadata standards and relies on Persistent Identifiers (PIDs) to address specific repositories. The applications build on Distributed Toolbox, a framework devoted to the creation and execution of distributed applications that includes tools for unattended cluster and grid execution with full fault tolerance. By decoupling the definition of the remote tasks from their execution and control, the development, execution and maintenance of distributed applications is significantly simplified with respect to previous solutions, increasing their robustness and allowing them to run on different computational platforms with little effort. The integration with Open Access databases and the use of PIDs as long-lasting references ensure that the data related to the experiments will persist, closing a complete research cycle of data access, processing, storage and dissemination of results.
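As a minimal sketch of the PID-driven data staging step described above (not the Distributed Toolbox API itself), the snippet below resolves a dataset DOI through the public DataCite REST API and downloads the registered URL as input for a task; the DOI, function name and local path are illustrative assumptions.

```python
# Minimal sketch (not the Distributed Toolbox API): resolve a dataset PID via the
# public DataCite REST API and stage the referenced content as input for a remote task.
# The DOI, function name and local staging path are illustrative assumptions.
import requests

DATACITE_API = "https://api.datacite.org/dois/"

def stage_dataset(doi: str, dest: str) -> str:
    """Resolve a DOI to its metadata record and download the URL registered for it."""
    meta = requests.get(DATACITE_API + doi, timeout=30)
    meta.raise_for_status()
    attrs = meta.json()["data"]["attributes"]   # DataCite JSON:API response
    content_url = attrs["url"]                  # URL registered for the PID (often a landing page)
    payload = requests.get(content_url, timeout=60)
    payload.raise_for_status()
    with open(dest, "wb") as fh:
        fh.write(payload.content)
    return dest

# Example with a hypothetical DOI; the staged file then becomes the input of a remote task.
# stage_dataset("10.1234/example-dataset", "input.dat")
```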

2020 ◽  
Vol 23 (1) ◽  
Author(s):  
Eder Ávila-Barrientos

The objective of this work is to analyze the theoretical and methodological principles related to the description and accessibility of research data. A state-of-the-art review of research data was carried out, addressing aspects of their citation, description and systematization; hermeneutics and discourse analysis were applied to literature specialized in research data, access to and description of research data, and data repositories. The metadata elements for describing research datasets included in the DataCite Metadata Schema were identified and analyzed in order to propose a descriptive profile applicable to such datasets and usable in data repositories. If research data are properly described, their accessibility and reuse will be further promoted. To this end, academic and research institutions must participate in generating open access policies for their research data.
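To make the descriptive profile concrete, here is a hedged sketch of a minimal dataset record covering the mandatory DataCite properties (Identifier, Creator, Title, Publisher, PublicationYear, ResourceType) plus a few recommended ones; all values are made up and the field names only loosely follow the DataCite kernel.

```python
# Illustrative only: a minimal dataset record using the mandatory DataCite Metadata
# Schema properties plus a few recommended ones often used in descriptive profiles.
# All values are made up; field names loosely follow the DataCite kernel.
dataset_record = {
    "identifier": {"identifierType": "DOI", "identifier": "10.1234/example-dataset"},
    "creators": [{"creatorName": "Surname, Name", "affiliation": "Example Institute"}],
    "titles": [{"title": "Example research dataset"}],
    "publisher": "Example Data Repository",
    "publicationYear": "2020",
    "resourceType": {"resourceTypeGeneral": "Dataset", "resourceType": "Survey data"},
    # Recommended/optional properties that improve discoverability and reuse:
    "subjects": [{"subject": "information science"}],
    "descriptions": [{"descriptionType": "Abstract", "description": "Short abstract."}],
    "rightsList": [{"rights": "CC BY 4.0"}],
    "relatedIdentifiers": [
        {"relatedIdentifierType": "DOI", "relationType": "IsSupplementTo",
         "relatedIdentifier": "10.1234/example-article"}
    ],
}
```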


2010 ◽  
Vol 7 (3) ◽  
Author(s):  
Rasmus H. Fogh ◽  
Wayne Boucher ◽  
John M.C. Ionides ◽  
Wim F. Vranken ◽  
Tim J. Stevens ◽  
...  

Summary: In recent years the amount of biological data has exploded to the point where much useful information can only be extracted by complex computational analyses. Such analyses are greatly facilitated by metadata standards, both in terms of the ability to compare data originating from different sources, and in terms of exchanging data in standard forms, e.g. when running processes on a distributed computing infrastructure. However, standards thrive on stability whereas science tends to constantly move, with new methods being developed and old ones modified. Therefore maintaining both metadata standards, and all the code that is required to make them useful, is a non-trivial problem. Memops is a framework that uses an abstract definition of the metadata (described in UML) to generate internal data structures and subroutine libraries for data access (application programming interfaces, APIs, currently in Python, C and Java) and data storage (in XML files or databases). For the individual project these libraries obviate the need for writing code for input parsing, validity checking or output. Memops also ensures that the code is always internally consistent, massively reducing the need for code reorganisation. Across a scientific domain a Memops-supported data model makes it easier to support complex standards that can capture all the data produced in a scientific area, share them among all programs in a complex software pipeline, and carry them forward to deposition in an archive. The principles behind the Memops generation code will be presented, along with example applications in Nuclear Magnetic Resonance (NMR) spectroscopy and structural biology.
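The following is a toy sketch of the generation principle only, not the actual Memops or CCPN code: an abstract model description is turned into classes whose attributes are validity-checked automatically, so no per-project input parsing or checking code has to be hand-written. The model entries and field names are invented for illustration.

```python
# Toy illustration of the Memops principle (not the actual CCPN/Memops generator):
# classes are generated from an abstract model description, with validity checks
# produced automatically instead of being hand-written per project.
MODEL = {
    "Spectrum": {"name": str, "numDimensions": int},
    "Peak": {"height": float, "position": list},
}

def make_class(class_name, fields):
    def __init__(self, **kwargs):
        for field, ftype in fields.items():
            value = kwargs.get(field)
            if not isinstance(value, ftype):            # generated validity check
                raise TypeError(f"{class_name}.{field} must be {ftype.__name__}")
            setattr(self, field, value)
    return type(class_name, (object,), {"__init__": __init__, "_fields": fields})

# Generate the data-access API from the model once, instead of writing each class by hand.
api = {name: make_class(name, fields) for name, fields in MODEL.items()}

spectrum = api["Spectrum"](name="hsqc", numDimensions=2)     # passes validation
# api["Peak"](height="tall", position=[7.2, 120.4])          # would raise TypeError
```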


2020 ◽  
Author(s):  
Isra Revenia

This article is written to explain the purpose and administrative functions of school record-keeping, which assist the leader of an organization in making decisions and doing the right thing; such recording serves information needs and also pertains to accountability and control functions. Administrative record-keeping is the activity of recording everything that happens in the organization so that it can be used as information for the leadership. Administration itself covers all processing activities, starting from collecting (receiving), recording, processing, duplicating, minimizing and storing all the correspondence information needed by the organization. Administration is thus an activity for capturing everything that happens in the organization, to be used as information material by the leadership, and it covers all activities ranging from producing, managing and organizing through to preparing all the information the organization needs.


1994 ◽  
Vol 30 (1) ◽  
pp. 167-175
Author(s):  
Alan H. Vicory ◽  
Peter A. Tennant

With the attainment of secondary treatment by virtually all municipal discharges in the United States, control of water pollution from combined sewer overflows (CSOs) has assumed a high priority. Accordingly, a national strategy was issued in 1989 which, in 1993, was expanded into a national policy on CSO control. The national policy establishes as an objective the attainment of receiving water quality standards, rather than a design storm/treatment technology based approach. A significant percentage of the CSOs in the U.S. are located along the Ohio River. The states along the Ohio have decided to coordinate their CSO control efforts through the Ohio River Valley Water Sanitation Commission (ORSANCO). With the Commission assigned the responsibility of developing a monitoring approach which would allow the definition of CSO impacts on the Ohio, research by the Commission found that very little information existed on the monitoring and assessment of large rivers for the determination of CSO impacts. It was therefore necessary to develop a strategy for coordinated efforts by the states, the CSO dischargers, and ORSANCO to identify and apply appropriate monitoring approaches. A workshop was held in June 1993 to receive input from a variety of experts. Taking into account this input, a strategy has been developed which sets forth certain approaches and concepts to be considered in assessing CSO impacts. In addition, the strategy calls for frequent sharing of findings in order that the data collection efforts by the several agencies can be mutually supportive and lead to technically sound answers regarding CSO impacts and control needs.


1996 ◽  
Vol 118 (3) ◽  
pp. 482-488 ◽  
Author(s):  
Sergio Bittanti ◽  
Fabrizio Lorito ◽  
Silvia Strada

In this paper, Linear Quadratic (LQ) optimal control concepts are applied to the active control of vibrations in helicopters. The study is based on an identified dynamic model of the rotor. The vibration effect is captured by suitably augmenting the state vector of the rotor model. Then, Kalman filtering concepts can be used to obtain a real-time estimate of the vibration, which is then fed back to form a suitable compensation signal. This design rationale is derived here starting from a rigorous problem formulation in an optimal control context. Among other things, this calls for a suitable definition of the performance index, of nonstandard type. The application of these ideas to a test helicopter, by means of computer simulations, shows good performance both in terms of disturbance rejection effectiveness and control effort limitation. The performance of the obtained controller is compared with that achievable by the so-called Higher Harmonic Control (HHC) approach, well known within the helicopter community.
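A generic sketch of the augmented-state LQ setup this kind of design implies is given below; the symbols and structure are assumptions for illustration, not the paper's notation or its exact performance index.

```latex
% Generic augmented-state LQ sketch (assumed notation, not the paper's):
% the rotor state x_r is augmented with a vibration exosystem state x_v,
% a Kalman filter estimates the full state from measurements y, and the
% controller minimizes a quadratic index on the vibration output z and input u.
\begin{align*}
  \dot{x}_r &= A_r x_r + B_r u, \qquad \dot{x}_v = A_v x_v
    \quad \text{(vibration modeled as a marginally stable exosystem)} \\
  x &= \begin{bmatrix} x_r \\ x_v \end{bmatrix}, \qquad
  y = C x + v \quad \text{(measurements feeding the Kalman filter)} \\
  J &= \lim_{T\to\infty} \frac{1}{T}\,
       \mathbb{E}\!\left[\int_0^T \big( z^{\mathsf T} Q z + u^{\mathsf T} R u \big)\, dt\right],
  \qquad z = C_z x \ \text{(vibration output to be attenuated)}
\end{align*}
```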


Author(s):  
Mathias Stefan Roeser ◽  
Nicolas Fezans

Abstract: A flight test campaign for system identification is a costly and time-consuming task. Models derived from wind tunnel experiments and CFD calculations must be validated and/or updated with flight data to match the real aircraft stability and control characteristics. Classical maneuvers for system identification are mostly one-surface-at-a-time inputs and need to be performed several times at each flight condition. Various methods for defining very rich multi-axis maneuvers, for instance based on multisine/sum of sines signals, already exist. A new design method based on the wavelet transform allowing the definition of multi-axis inputs in the time-frequency domain has been developed. The compact representation chosen allows the user to define fairly complex maneuvers with very few parameters. This method is demonstrated using simulated flight test data from a high-quality Airbus A320 dynamic model. System identification is then performed with this data, and the results show that aerodynamic parameters can still be accurately estimated from these fairly simple multi-axis maneuvers.
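For context, here is a short sketch of the multisine (sum of sines) excitation mentioned above as one of the existing design methods; it is not the wavelet-based design of the paper, and the frequencies, amplitudes and control axes are made-up parameters.

```python
# Illustrative multisine excitation (an existing method referenced in the abstract),
# NOT the paper's wavelet-based design. All signal parameters are made up.
import numpy as np

def multisine(t, freqs_hz, amps, phases):
    """Sum-of-sines input signal evaluated at times t."""
    return sum(a * np.sin(2 * np.pi * f * t + p)
               for f, a, p in zip(freqs_hz, amps, phases))

t = np.arange(0.0, 20.0, 0.01)   # 20 s maneuver sampled at 100 Hz
# Interleaved frequency lines per axis keep the three inputs weakly correlated.
elevator = multisine(t, [0.1, 0.4, 0.7], [1.0, 0.8, 0.5], [0.0, 1.3, 2.1])
aileron  = multisine(t, [0.2, 0.5, 0.8], [1.0, 0.8, 0.5], [0.7, 1.9, 0.2])
rudder   = multisine(t, [0.3, 0.6, 0.9], [1.0, 0.8, 0.5], [1.1, 0.4, 2.6])
```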


2019 ◽  
Vol 6 (1) ◽  
Author(s):  
Mahdi Torabzadehkashi ◽  
Siavash Rezaei ◽  
Ali HeydariGorji ◽  
Hosein Bobarshad ◽  
Vladimir Alves ◽  
...  

Abstract: In the era of big data applications, the demand for more sophisticated data centers and high-performance data processing mechanisms is increasing drastically. Data are originally stored in storage systems. To process data, application servers need to fetch them from storage devices, which imposes the cost of moving data through the system. This cost has a direct relation with the distance of the processing engines from the data. This is the key motivation for the emergence of distributed processing platforms such as Hadoop, which move processing closer to the data. Computational storage devices (CSDs) push the "move process to data" paradigm to its ultimate boundaries by deploying embedded processing engines inside storage devices to process data. In this paper, we introduce Catalina, an efficient and flexible computational storage platform that provides a seamless environment to process data in-place. Catalina is the first CSD equipped with a dedicated application processor running a full-fledged operating system that provides filesystem-level data access for the applications. Thus, a vast spectrum of applications can be ported to run on Catalina CSDs. Due to these unique features, to the best of our knowledge, Catalina CSD is the only in-storage processing platform that can be seamlessly deployed in clusters to run distributed applications such as Hadoop MapReduce and HPC applications in-place without any modifications to the underlying distributed processing framework. As a proof of concept, we build a fully functional Catalina prototype and a CSD-equipped platform using 16 Catalina CSDs to run Intel HiBench Hadoop and HPC benchmarks to investigate the benefits of deploying Catalina CSDs in distributed processing environments. The experimental results show up to 2.2× improvement in performance and 4.3× reduction in energy consumption for running Hadoop MapReduce benchmarks. Additionally, thanks to the Neon SIMD engines, the performance and energy efficiency of DFT algorithms are improved up to 5.4× and 8.9×, respectively.
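The back-of-the-envelope sketch below illustrates why the "move process to data" argument can pay off for scan-heavy jobs; every number in it is an assumption for illustration, not a measurement from the Catalina evaluation.

```python
# Back-of-the-envelope sketch of the "move process to data" argument; the bandwidths,
# data size and compute rates are illustrative assumptions, not Catalina results.
data_gb        = 512    # dataset scanned by the job
host_link_gbps = 4      # effective storage-to-host bandwidth (GB/s)
host_scan_gbps = 8      # rate at which host CPUs can filter data (GB/s)
csd_scan_gbps  = 1      # slower embedded cores inside each CSD (GB/s)
num_csds       = 16     # drives scanning their own data in parallel

# Conventional path: move all data to the host, then process it there.
host_time = data_gb / host_link_gbps + data_gb / host_scan_gbps

# In-storage path: each CSD scans only its share, with no bulk transfer over the link.
csd_time = (data_gb / num_csds) / csd_scan_gbps

print(f"host-side processing : {host_time:6.1f} s")   # 192.0 s with these assumptions
print(f"in-storage processing: {csd_time:6.1f} s")    #  32.0 s with these assumptions
```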


2009 ◽  
Vol 27 (24) ◽  
pp. 4014-4020 ◽  
Author(s):  
Elizabeth Goss ◽  
Michael P. Link ◽  
Suanna S. Bruinooge ◽  
Theodore S. Lawrence ◽  
Joel E. Tepper ◽  
...  

Purpose The American Society of Clinical Oncology (ASCO) Cancer Research Committee designed a qualitative research project to assess the attitudes of cancer researchers and compliance officials regarding compliance with the US Privacy Rule and to identify potential strategies for eliminating perceived or real barriers to achieving compliance. Methods A team of three interviewers asked 27 individuals (13 investigators and 14 compliance officials) from 13 institutions to describe the anticipated approach of their institutions to Privacy Rule compliance in three hypothetical research studies. Results The interviews revealed that although researchers and compliance officials share the view that patients' cancer diagnoses should enjoy a high level of privacy protection, there are significant tensions between the two groups related to the proper standards for compliance necessary to protect patients. The disagreements are seen most clearly with regard to the appropriate definition of a “future research use” of protected health information in biospecimen and data repositories and the standards for a waiver of authorization for disclosure and use of such data. Conclusion ASCO believes that disagreements related to compliance and the resulting delays in certain projects and abandonment of others might be eased by additional institutional training programs and consultation on Privacy Rule issues during study design. ASCO also proposes the development of best practices documents to guide 1) creation of data repositories, 2) disclosure and use of data from such repositories, and 3) the design of survivorship and genetics studies.


2021 ◽  
pp. 43-58
Author(s):  
S. S. Yudachev ◽  
P. A. Monakhov ◽  
N. A. Gordienko

This article describes an attempt to create open-source software equivalent to LabVIEW data collection and control software. The proposed solution uses GNU Radio, OpenCV, Scilab, Xcos, and Comedi on Linux. GNU Radio provides a user-friendly graphical interface. GNU Radio is also a software-defined radio toolkit, so experiments can be carried out in software rather than with the usual hardware implementation. Blocks for data propagation and for code removal, with and without code tracking, are created using a zero correlation zone code (ZCZ, a combination of ternary codes taking the values 1, 0, and -1, which is specified in the program). Unlike MATLAB Simulink, GNU Radio is open source, i.e. free, and its concepts can be easily grasped by people without much programming experience by using pre-written blocks. Calculations can be performed using OpenCV or Scilab and Xcos. Xcos is an application that is part of the Scilab mathematical modeling system; it allows developers to design systems in the fields of mechanics, hydraulics and electronics, as well as queuing systems. Xcos is a graphical interactive environment based on block modeling. The application is designed to solve problems of dynamic and situational modeling of systems, processes and devices, as well as testing and analyzing these systems. In this case, the modeled object (a system, device or process) is represented graphically by its functional parametric block diagram, which includes blocks of system elements and the connections between them. The device drivers listed in Comedi are used for real-time data access. We also present an improved PyGTK-based graphical user interface for GNU Radio. An English version of the article is available at: https://panor.ru/articles/industry-40-digital-technology-for-data-collection-and-management/65216.html
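To make the ZCZ notion concrete, the sketch below checks the zero correlation zone property of a pair of ternary {1, 0, -1} sequences using plain NumPy; the sequences are made up for illustration and are not the codes used in the article.

```python
# Toy check of the zero correlation zone (ZCZ) property for ternary {1, 0, -1}
# sequences; the example sequences are illustrative, not the article's codes.
import numpy as np

def periodic_correlation(a, b):
    """Periodic cross-correlation of two equal-length sequences for all cyclic shifts."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return np.array([np.dot(a, np.roll(b, k)) for k in range(len(a))])

def zero_zone(corr):
    """Number of consecutive nonzero shifts around zero delay with (near-)zero correlation."""
    z = 0
    for k in range(1, len(corr)):
        if abs(corr[k]) > 1e-9:
            break
        z += 1
    return z

code_a = [1, 0, -1, 0, 1, 0, -1, 0]   # made-up ternary sequences
code_b = [1, 0, 1, 0, -1, 0, -1, 0]

auto  = periodic_correlation(code_a, code_a)
cross = periodic_correlation(code_a, code_b)
print("autocorrelation  :", auto,  "zero zone:", zero_zone(auto))
print("cross-correlation:", cross, "zero zone:", zero_zone(cross))
```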


2021 ◽  
Vol 20 (5s) ◽  
pp. 1-22
Author(s):  
Haoran Li ◽  
Chenyang Lu ◽  
Christopher D. Gill

Fault-tolerant coordination services have been widely used in distributed applications in cloud environments. Recent years have witnessed the emergence of time-sensitive applications deployed in edge computing environments, which introduces both challenges and opportunities for coordination services. On one hand, coordination services must recover from failures in a timely manner. On the other hand, edge computing employs local networked platforms that can be exploited to achieve timely recovery. In this work, we first identify the limitations of the leader election and recovery protocols underlying Apache ZooKeeper, the prevailing open-source coordination service. To reduce recovery latency from leader failures, we then design RT-ZooKeeper with a set of novel features including a fast-convergence election protocol, a quorum channel notification mechanism, and a distributed epoch persistence protocol. We have implemented RT-ZooKeeper based on ZooKeeper version 3.5.8. Empirical evaluation shows that RT-ZooKeeper achieves 91% reduction in maximum recovery latency in comparison to ZooKeeper. Furthermore, a case study demonstrates that fast failure recovery in RT-ZooKeeper can benefit a common messaging service like Kafka in terms of message latency.
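As background for the recovery-latency problem, here is a toy illustration of the ordering used in ZooKeeper-style leader election (prefer the peer with the freshest state, breaking ties by server id); it is a simplified sketch and is not RT-ZooKeeper's fast-convergence protocol.

```python
# Toy illustration of ZooKeeper-style leader election ordering (highest epoch, then
# highest last zxid, then highest server id). Simplified background only, NOT the
# RT-ZooKeeper fast-convergence protocol; the peer values below are made up.
from dataclasses import dataclass

@dataclass(frozen=True)
class Vote:
    epoch: int       # last accepted leader epoch
    zxid: int        # last committed transaction id (state freshness)
    server_id: int   # tie-breaker

def elect(votes):
    """All live peers converge on the same maximum under this total ordering."""
    return max(votes, key=lambda v: (v.epoch, v.zxid, v.server_id))

alive = [Vote(epoch=5, zxid=1042, server_id=1),
         Vote(epoch=5, zxid=1040, server_id=2),
         Vote(epoch=5, zxid=1042, server_id=3)]   # previous leader has just failed

leader = elect(alive)
print("new leader:", leader.server_id)   # 3: freshest state, highest id among ties
```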

