intermediate data
Recently Published Documents

TOTAL DOCUMENTS: 196 (FIVE YEARS: 68)
H-INDEX: 14 (FIVE YEARS: 2)

2022 ◽  
Vol 40 (4) ◽  
pp. 1-32
Author(s):  
Jinze Wang ◽  
Yongli Ren ◽  
Jie Li ◽  
Ke Deng

Factorization models have been successfully applied to recommendation problems and have had a significant impact on both academia and industry in the field of Collaborative Filtering (CF). However, the intermediate data generated in factorization models' decision-making process (or training process, i.e., their training footprint) have been overlooked, even though they may provide rich information to further improve recommendations. In this article, we introduce the concept of the Convergence Pattern, which records how ratings are learned step by step in factorization models in the field of CF. We show that the concept of the Convergence Pattern exists in both the model perspective (e.g., classical Matrix Factorization (MF) and deep-learning factorization) and the training (learning) perspective (e.g., stochastic gradient descent (SGD), alternating least squares (ALS), and Markov Chain Monte Carlo (MCMC)). By utilizing the Convergence Pattern, we propose a prediction model to estimate the prediction reliability of missing ratings and then improve the quality of recommendations. Two applications have been investigated: (1) how to evaluate the reliability of predicted missing ratings and thus recommend those ratings with high reliability; and (2) how to exploit the estimated reliability to adjust the predicted ratings and further improve prediction accuracy. Extensive experiments have been conducted on several benchmark datasets on three recommendation tasks: decision-aware recommendation, rating prediction, and Top-N recommendation. The experimental results verify the effectiveness of the proposed methods in various aspects.
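To make the idea of a per-rating convergence pattern concrete, here is a minimal sketch (not the authors' code): a plain MF model trained with SGD that snapshots, after every epoch, the predicted value of each observed rating. The function name mf_sgd_with_footprint, the hyperparameters, and the toy rating matrix are all illustrative assumptions.

import numpy as np

def mf_sgd_with_footprint(R, k=8, epochs=20, lr=0.01, reg=0.05, seed=0):
    """Train a plain matrix factorization model with SGD and record, for every
    observed (user, item) pair, how its prediction evolves epoch by epoch --
    the per-rating "convergence pattern" (training footprint)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = 0.1 * rng.standard_normal((n_users, k))
    Q = 0.1 * rng.standard_normal((n_items, k))
    observed = np.argwhere(R > 0)
    footprint = {(int(u), int(i)): [] for u, i in observed}

    for _ in range(epochs):
        rng.shuffle(observed)                      # shuffle training order each epoch
        for u, i in observed:
            err = R[u, i] - P[u] @ Q[i]
            pu = P[u].copy()
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * pu - reg * Q[i])
        for (u, i) in footprint:                   # snapshot the intermediate predictions
            footprint[(u, i)].append(float(P[u] @ Q[i]))
    return P, Q, footprint

# Toy usage: zeros mark missing ratings.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 0, 5, 4]], dtype=float)
P, Q, footprint = mf_sgd_with_footprint(R)
print(footprint[(0, 0)])   # how the prediction for user 0, item 0 converged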


2021 ◽  
Vol 6 (2 (114)) ◽  
pp. 6-18
Author(s):  
Serhii Semenov ◽  
Liqiang Zhang ◽  
Weiling Cao ◽  
Serhii Bulba ◽  
Vira Babenko ◽  
...  

This paper has established the relevance of improving the accuracy of the results of mathematical modeling of the software security testing process. Fuzzy GERT-modeling methods have been analyzed. The necessity and possibility of improving the accuracy of the mathematical formalization of the process of studying software vulnerabilities under conditions of fuzziness of the input and intermediate data have been determined. To this end, based on the mathematical apparatus of fuzzy network modeling, a fuzzy GERT model has been built for investigating software vulnerabilities. A distinctive feature of this model is that it takes into consideration the probabilistic characteristics of transitions from state to state along with their time characteristics. As part of the simulation, the following stages of the study were performed. To schematically describe the procedures for studying software vulnerabilities, a structural model of this process was constructed. A "reference GERT model" was developed for investigating software vulnerabilities, and the process was described in the form of a standard GERT network. The algorithm of equivalent transformations of the GERT network has been improved; it differs from known ones in that it accounts for an extended range of typical structures of parallel branches between neighboring nodes. Analytical expressions are presented for calculating the average time spent in the branches and the probability of successful completion of the studies in each node. These probabilistic-temporal characteristics were calculated from the simplified equivalent fuzzy GERT network of the process of investigating software vulnerabilities. Comparative studies were conducted to confirm the accuracy and reliability of the results obtained. The results of the experiment showed that, in comparison with the reference model, the fuzziness of the input time characteristic of the software vulnerability studies was reduced, which made it possible to improve the accuracy of the simulation results.
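For readers unfamiliar with the equivalent-transformation step, the sketch below shows the standard crisp (non-fuzzy) GERT reduction rules on (branch probability, mean time) pairs: series composition, mutually exclusive parallel branches, and a self-loop. It is only a simplified illustration under crisp assumptions, not the paper's fuzzy model, and the example numbers are made up.

# Crisp (non-fuzzy) GERT reductions on (probability, mean time) pairs.
def series(b1, b2):
    """Two branches traversed one after the other."""
    (p1, t1), (p2, t2) = b1, b2
    return p1 * p2, t1 + t2

def parallel(b1, b2):
    """Two mutually exclusive branches between the same pair of nodes."""
    (p1, t1), (p2, t2) = b1, b2
    p = p1 + p2
    return p, (p1 * t1 + p2 * t2) / p

def self_loop(forward, loop):
    """A forward branch whose start node carries a self-loop (rework)."""
    (p1, t1), (p2, t2) = forward, loop
    return p1 / (1.0 - p2), t1 + p2 * t2 / (1.0 - p2)

# Example: two alternative analysis branches, then a verification step that is
# repeated with probability 0.2 (all values are illustrative).
scan = parallel((0.6, 4.0), (0.4, 7.0))
path = series(scan, self_loop((0.8, 2.0), (0.2, 1.5)))
print(path)  # (equivalent success probability, expected duration)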


Author(s):  
Yassine Sabri ◽  
Aouad Siham

Multi-area and multi-faceted remote sensing (RS) datasets are widely used due to the increasing demand for accurate and up-to-date information on resources and the environment for regional and global monitoring. In general, the processing of RS data involves a complex multi-step sequence that includes several independent processing steps, depending on the type of RS application. The processing of RS data for regional disaster and environmental monitoring is recognized as computationally and data demanding. By combining cloud computing and HPC technology, we propose a method to efficiently address these problems with a large-scale RS data processing system suitable for various applications and capable of real-time, on-demand service. The ubiquity, elasticity, and high-level transparency of the cloud computing model make it possible to run massive RS data management and processing tasks for monitoring dynamic environments in any cloud via a web interface. Hilbert-based data indexing methods are used to optimally query and access RS images, RS data products, and intermediate data. The core of the cloud service provides a parallel file system for large RS data and an interface for accessing RS data that improves data locality and optimizes I/O performance. Our experimental analysis demonstrates the effectiveness of the proposed platform.
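As a point of reference for the Hilbert-based indexing mentioned above, here is a generic 2-D Hilbert-curve key function (the classic coordinate-to-distance mapping). It is not the system's actual indexing code; the grid order and tile coordinates are illustrative.

def hilbert_index(order, x, y):
    """Map tile coordinates (x, y) on a 2**order x 2**order grid to their
    position along the Hilbert curve, so spatially close tiles get close keys."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # rotate/flip the quadrant so the curve stays continuous
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d

# Example: key for tile (column 5, row 9) on a 16 x 16 tile grid (order 4).
print(hilbert_index(4, 5, 9))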


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Hang Chen ◽  
Zhengjun Liu ◽  
Camel Tanougast ◽  
Walter Blondel

An asymmetric cryptosystem is presented for encrypting multiple images in gyrator transform domains. In the encryption approach, a devil's spiral Fresnel lens variable pure phase mask is first designed for each image band to be encrypted, using a devil's mask, a random spiral phase, and a Fresnel mask, respectively. Subsequently, a novel random devil's spiral Fresnel transform in the optical gyrator transform is implemented to obtain the intermediate output. Then, the intermediate data are divided into two masks by employing random modulus decomposition in the asymmetric process. Finally, a random permutation matrix is utilized to obtain the ciphertext of the whole algorithm. For the decryption approach, the two divided masks (private key and ciphertext) need to be imported into the optical gyrator input plane simultaneously. Numerical experiments are given to verify the effectiveness and capability of this asymmetric cryptosystem.
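To illustrate only the mask-splitting and permutation steps in numerical form, here is a much-simplified numpy sketch: it uses a plain additive split of the intermediate complex field rather than the paper's actual random modulus decomposition, and a pixel permutation in place of the full optical pipeline; array sizes and seeds are arbitrary.

import numpy as np

rng = np.random.default_rng(42)
# A stand-in for the intermediate complex field produced by the transform stage.
intermediate = rng.random((4, 4)) * np.exp(1j * 2 * np.pi * rng.random((4, 4)))

# Split the intermediate field into a private-key mask and a data mask
# (simple additive split, standing in for random modulus decomposition).
private_key = rng.random((4, 4)) * np.exp(1j * 2 * np.pi * rng.random((4, 4)))
data_mask = intermediate - private_key

# Scramble the data mask with a random permutation to form the ciphertext.
perm = rng.permutation(data_mask.size)
ciphertext = data_mask.ravel()[perm].reshape(data_mask.shape)

# Decryption: undo the permutation, then recombine with the private key.
inv = np.argsort(perm)
recovered = ciphertext.ravel()[inv].reshape(data_mask.shape) + private_key
assert np.allclose(recovered, intermediate)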


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
David Froelicher ◽  
Juan R. Troncoso-Pastoriza ◽  
Jean Louis Raisaro ◽  
Michel A. Cuendet ◽  
Joao Sa Sousa ◽  
...  

Using real-world evidence in biomedical research, an indispensable complement to clinical trials, requires access to large quantities of patient data that are typically held separately by multiple healthcare institutions. We propose FAMHE, a novel federated analytics system that, based on multiparty homomorphic encryption (MHE), enables privacy-preserving analyses of distributed datasets by yielding highly accurate results without revealing any intermediate data. We demonstrate the applicability of FAMHE to essential biomedical analysis tasks, including Kaplan-Meier survival analysis in oncology and genome-wide association studies in medical genetics. Using our system, we accurately and efficiently reproduce two published centralized studies in a federated setting, enabling biomedical insights that are not possible from individual institutions alone. Our work represents a necessary key step towards overcoming the privacy hurdle in enabling multi-centric scientific collaborations.
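As a conceptual illustration of how encrypted local aggregates can be combined without exposing any site's intermediate data, the toy below uses single-key additive Paillier encryption via the python-paillier (phe) package. FAMHE itself relies on multiparty homomorphic encryption and secure federated protocols, so this is only a loose analogy, and the per-site counts are invented.

from phe import paillier

# One key pair stands in for the collective key of the participating sites.
public_key, private_key = paillier.generate_paillier_keypair()

# Each site contributes (events, at-risk) counts for one Kaplan-Meier time point.
site_counts = [(3, 120), (5, 98), (2, 143)]          # invented numbers
encrypted = [(public_key.encrypt(e), public_key.encrypt(r)) for e, r in site_counts]

# The aggregator sums ciphertexts without ever seeing the per-site values.
total_events, total_at_risk = encrypted[0]
for enc_e, enc_r in encrypted[1:]:
    total_events += enc_e
    total_at_risk += enc_r

# Only the final aggregate is decrypted.
print(private_key.decrypt(total_events), private_key.decrypt(total_at_risk))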


Author(s):  
Eric Sokol

Two programs that provide high-quality long-term ecological data, the Environmental Data Initiative (EDI) and the National Ecological Observatory Network (NEON), have recently teamed up with data users interested in synthesizing biodiversity data, such as the ecological synthesis working groups supported by the US Long Term Ecological Research (LTER) Network Office, to make their data more Findable, Accessible, Interoperable, and Reusable (FAIR). To this end, we have developed a flexible intermediate data design pattern for ecological community data (L1 formatted data in Fig. 1, see Fig. 2 for design details) called "ecocomDP" (O'Brien et al. 2021), and we provide tools to work with data packages in which this design pattern has been implemented. The ecocomDP format provides a data pattern commonly used for reporting community-level data, such as repeated observations of species-level measures of biomass, abundance, percent cover, or density across multiple locations. The ecocomDP library for R includes tools to search for data packages, download or import data packages into an R session in a standard format, and visualization tools for the data exploration steps that are recommended for data users prior to any cross-study synthesis work. To date, EDI has created 70 ecocomDP data packages derived from their holdings, which include data from the US Long Term Ecological Research (US LTER) program, the Long Term Research in Environmental Biology (LTREB) program, and other projects; these are now discoverable and accessible using the ecocomDP library. Similarly, NEON data products for 12 taxonomic groups are discoverable using the ecocomDP search tool. Input from data users guided the ecocomDP developers in mapping the NEON data products to the ecocomDP format to facilitate interoperability with the ecocomDP data packages available from the EDI repository. The standardized data design pattern allows common data visualizations across data packages and has the potential to facilitate the development of new tools and workflows for biodiversity synthesis. The broader impacts of this collaboration are intended to lower the barriers for researchers in ecology and the environmental sciences to access and work with long-term biodiversity data, and to provide a hub around which data providers and data users can develop best practices that will build a diverse and inclusive community of practice.
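For readers unfamiliar with "long" community-data layouts, the toy table below sketches the general idea behind such an intermediate format: one row per location, date, taxon, and measured variable. The column names are illustrative only and are not the actual ecocomDP table schema (ecocomDP itself is an R package; see O'Brien et al. 2021 for the real design).

import pandas as pd

# Illustrative long-format community observations (NOT the ecocomDP schema):
# one row per location, date, taxon, and measured variable.
obs = pd.DataFrame([
    {"location_id": "site_A", "date": "2019-06-01", "taxon": "Daphnia pulex",
     "variable": "density", "value": 12.5, "unit": "individuals_per_liter"},
    {"location_id": "site_A", "date": "2019-06-01", "taxon": "Bosmina longirostris",
     "variable": "density", "value": 3.2, "unit": "individuals_per_liter"},
    {"location_id": "site_B", "date": "2019-06-01", "taxon": "Daphnia pulex",
     "variable": "density", "value": 7.8, "unit": "individuals_per_liter"},
])

# A typical pre-synthesis exploration step: taxon richness per site and date.
richness = (obs.groupby(["location_id", "date"])["taxon"]
               .nunique()
               .rename("richness"))
print(richness)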


2021 ◽  
Vol 11 (18) ◽  
pp. 8476
Author(s):  
June Choi ◽  
Jaehyun Lee ◽  
Jik-Soo Kim ◽  
Jaehwan Lee

In this paper, we present several optimization strategies that can improve the overall performance of the distributed in-memory computing system Apache Spark. Despite its distributed memory management capability for iterative jobs and intermediate data, Spark suffers significant performance degradation when the available amount of main memory (DRAM, typically used for data caching) is limited. To address this problem, we leverage an SSD (solid-state drive) to supplement the limited main memory. Specifically, we present an effective optimization methodology for Apache Spark by collectively investigating the effects of changing the capacity fraction ratios of the shuffle and storage spaces in the Spark JVM heap configuration and applying different RDD caching policies (e.g., SSD-backed memory caching). Our extensive experimental results show that by utilizing the proposed optimization techniques, we can improve the overall performance by up to 42%.
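The following minimal PySpark sketch shows the two kinds of knobs discussed above: the heap-fraction settings and the RDD persistence (caching) level. The values, paths, and dataset are illustrative, not the paper's tuned configuration; MEMORY_AND_DISK only approximates "SSD-backed memory caching" when spark.local.dir points at an SSD.

from pyspark import SparkConf, SparkContext, StorageLevel

conf = (SparkConf()
        .setAppName("caching-demo")
        .set("spark.memory.fraction", "0.6")          # execution+storage share of the heap
        .set("spark.memory.storageFraction", "0.5")   # portion of that reserved for cached blocks
        .set("spark.local.dir", "/mnt/ssd/spark"))    # hypothetical SSD-backed spill/cache directory
sc = SparkContext(conf=conf)

rdd = sc.textFile("hdfs:///data/ratings.csv").map(lambda line: line.split(","))
# Spill cached partitions to the SSD instead of recomputing them when DRAM runs short.
rdd.persist(StorageLevel.MEMORY_AND_DISK)
print(rdd.count())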


2021 ◽  
Vol 12 (5) ◽  
pp. 267-273
Author(s):  
P. M. Nikolaev ◽  

The use of parallel computing tools can significantly reduce the execution time of calculations in many engineering tasks. One of the main difficulties in the development of multithreaded programs remains the organization of simultaneous access to shared data from different threads. The most common solution to this problem is to use locking facilities when accessing shared data. There are a number of tasks where data sharing is not needed, but access to a limited resource, such as a temporary buffer, still has to be synchronized. In such tasks there is no data exchange between different threads, but there is an object that at any given time can be used by the code of only one thread. One such task is calculating the value of a B-spline. The software implementation of the functions for calculating B-splines, performed according to classical algorithms, requires the use of blocking objects when accessing the common array of intermediate data from different threads. This reduces the degree of parallelism and reduces the efficiency of computational programs using B-splines running on multiprocessor computing systems. The article discusses a way to improve the efficiency of calculating B-splines in parallel programming tasks by eliminating locks on shared modifiable data. A software implementation is presented in the form of a C++ class template, which places the temporary array used for calculating a B-spline into a local buffer of a given size, with the possibility of enlarging it if necessary. Using the developed template in conjunction with the thread_local qualifier reduces the number of requests to enlarge the buffer for high-degree B-splines (those exceeding the initially specified buffer size). It is also possible to implement this scheme using the std::vector template of the C++ Standard Library (STL). The results of applying the developed class when calculating the values of B-splines in a multithreaded environment are presented, showing a reduction in the calculation time in proportion to the increase in the number of processors. The methods of specifying arrays for storing intermediate calculation results considered in this article can be used in other parallel programming tasks.
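The same lock-free idea can be sketched outside C++ as well. Below is a loose Python analogue (not the article's class template) that keeps a per-thread scratch array via threading.local() and grows it only when a higher-degree B-spline needs more room; the Cox-de Boor recursion and all sizes are illustrative.

import threading
import numpy as np
from concurrent.futures import ThreadPoolExecutor

# Each thread keeps its own scratch buffer, so no locking is needed
# when evaluating B-splines concurrently.
_tls = threading.local()

def _scratch(size):
    """Return a thread-local work array, growing it only when needed."""
    buf = getattr(_tls, "buf", None)
    if buf is None or buf.size < size:
        buf = np.empty(size)
        _tls.buf = buf
    return buf[:size]

def bspline_basis(knots, degree, i, x):
    """Evaluate the i-th B-spline basis function of the given degree at x
    (Cox-de Boor recursion using a thread-local temporary array)."""
    n = degree + 1
    N = _scratch(n)
    # degree-0 basis functions on the relevant knot spans
    for j in range(n):
        N[j] = 1.0 if knots[i + j] <= x < knots[i + j + 1] else 0.0
    # triangular recursion up to the requested degree
    for d in range(1, n):
        for j in range(n - d):
            left = 0.0
            den = knots[i + j + d] - knots[i + j]
            if den != 0.0:
                left = (x - knots[i + j]) / den * N[j]
            right = 0.0
            den = knots[i + j + d + 1] - knots[i + j + 1]
            if den != 0.0:
                right = (knots[i + j + d + 1] - x) / den * N[j + 1]
            N[j] = left + right
    return N[0]

# Example: cubic basis function N_{2,3} evaluated from several threads, lock-free.
knots = np.arange(10.0)
xs = np.linspace(2.0, 5.9, 8)
with ThreadPoolExecutor(max_workers=4) as pool:
    print(list(pool.map(lambda x: bspline_basis(knots, 3, 2, x), xs)))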


2021 ◽  
Author(s):  
Ali Al-Haboobi ◽  
Gabor Kecskemeti

Scientific workflows have been an increasingly important research area of distributed systems (such as cloud computing). Researchers have shown an increased interest in the automated processing of scientific applications such as workflows. Recently, Function as a Service (FaaS) has emerged as a novel distributed systems platform for processing non-interactive applications. FaaS has limitations in resource use (e.g., CPU and RAM) as well as in state management. In spite of these limitations, initial studies have already demonstrated using FaaS for processing scientific workflows. DEWE v3 executes workflows in this fashion, but it often suffers from duplicate data transfers while using FaaS. This behaviour is due to the handling of intermediate data dependencies before and after each function invocation. These data dependencies could fill the temporary storage of the function environment. Our approach alters the job dispatch algorithm of DEWE v3 to reduce data dependency transfers. The proposed algorithm schedules jobs with precedence requirements to primarily run in the same function invocation. We evaluate our proposed algorithm and the original algorithm with small- and large-scale Montage workflows. Our results show that the improved system can reduce the total workflow execution time of scientific workflows over DEWE v3 by about 10% when using AWS Lambda.
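To make the dispatch idea concrete, here is a toy sketch (not DEWE v3's actual scheduler) that greedily merges single-parent/single-child chains of jobs so that each chain runs inside one function invocation and its intermediate files never leave that invocation's temporary storage. The job names and DAG are hypothetical, only loosely inspired by Montage.

from collections import defaultdict

def chain_jobs(jobs, edges):
    """jobs: iterable of job ids in topological order; edges: (parent, child) pairs."""
    children = defaultdict(list)
    parents = defaultdict(list)
    for p, c in edges:
        children[p].append(c)
        parents[c].append(p)

    assigned = set()
    invocations = []
    for job in jobs:
        if job in assigned:
            continue
        chain = [job]
        assigned.add(job)
        cur = job
        # follow the chain while the dependency is unambiguous
        while len(children[cur]) == 1 and len(parents[children[cur][0]]) == 1:
            cur = children[cur][0]
            if cur in assigned:
                break
            chain.append(cur)
            assigned.add(cur)
        invocations.append(chain)
    return invocations

# Example: a small, hypothetical Montage-like fragment.
jobs = ["mProject1", "mDiff1", "mFit1", "mBackground1"]
edges = [("mProject1", "mDiff1"), ("mDiff1", "mFit1"), ("mFit1", "mBackground1")]
print(chain_jobs(jobs, edges))  # -> [['mProject1', 'mDiff1', 'mFit1', 'mBackground1']]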

