The graphical data representation subsystem of the Selena teaching/research system for automated design of remote data processing networks

1995 ◽  
Vol 74 (5) ◽  
pp. 1211-1213
Author(s):  
V. M. Gostev

2020 ◽  
Vol 53 (5-6) ◽  
pp. 255-273
Author(s):  
Wei Liang ◽  
Litao Lu ◽  
Hongyao Wang

2018 ◽  
Vol 210 ◽  
pp. 05016
Author(s):  
Mariusz Chmielewski ◽  
Damian Frąszczak ◽  
Dawid Bugajewski

This paper discusses experiences and architectural concepts, developed and tested, aimed at the acquisition and processing of biomedical data in a large-scale system for monitoring elderly patients. Major assumptions of the research included the utilisation of wearable and mobile technologies supporting the maximum number of inertial and biomedical data streams feeding the decision algorithms. Although medical diagnostics and decision algorithms were not the main aim of the research, this preliminary phase was crucial for testing the capabilities of existing off-the-shelf technologies and the functional responsibilities of the system's logic components. The architecture variants contained several schemes for data processing, moving the responsibility for signal feature extraction, data classification and pattern recognition from wearable devices to mobile devices and up to server facilities. Analysis of transmission and processing delays provided the pros and cons of each architecture variant and, most of all, knowledge about applicability in the medical, military and fitness domains. To evaluate and construct the architecture, a set of alternative technology stacks and quantitative measures has been defined. The major architecture characteristics (high availability, scalability, reliability) have been defined, imposing asynchronous processing of sensor data, efficient data representation, iterative reporting, event-driven processing, and restriction of pulling operations. Sensor data processing persists the original data on handhelds but is mainly aimed at extracting a chosen set of signal features calculated over specific time windows, which vary with the analysed signals and the sensor data acquisition rates. Long-term monitoring of patients also requires the development of mechanisms which probe the patient and, on detecting anomalies or drastic changes in characteristics, tune the data acquisition process. The paper describes experiences connected with the design of a scalable decision-support tool and evaluation techniques for the architectural concepts implemented within the mobile and server software.
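To make the windowed feature-extraction step concrete, the following is a minimal Python sketch (using NumPy) of per-window feature computation for a single sensor channel. The function name, window length, feature set and acquisition rate are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def window_features(signal, fs, window_s):
    """Compute per-window features (mean, std, RMS) for one sensor channel.

    signal   : 1-D array of raw samples persisted on the handheld
    fs       : acquisition rate in Hz (varies per sensor, per the paper)
    window_s : window length in seconds (illustrative value)
    """
    win = int(fs * window_s)
    n_windows = len(signal) // win
    feats = []
    for i in range(n_windows):
        w = signal[i * win:(i + 1) * win]
        feats.append({
            "mean": float(np.mean(w)),
            "std": float(np.std(w)),
            "rms": float(np.sqrt(np.mean(w ** 2))),
        })
    return feats

# Example: 10 s of a hypothetical 50 Hz accelerometer axis, 2 s windows
samples = np.random.randn(500)
print(window_features(samples, fs=50, window_s=2)[:2])
```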


Author(s):  
Anandakumar H ◽  
Tamilselvan T ◽  
Nandni S ◽  
Subashree R ◽  
Vinodhini E

Big data stands for the effective handling of large amounts of data for research, mining and intelligence. On social media, large amounts of data are uploaded continuously; social media platforms handle large amounts of data such as photos, videos and songs using big data techniques. When it comes to big data, a large amount of data must be handled effectively. Big data faces various challenges such as data clustering, visualization, data representation, data processing, pattern mining, data tracking and analysing user behaviour. In this paper, the emoji in messages are decoded and their Unicode values are recorded. Based on the emoji, user interest can be understood in a better way. Another part involves the replacement of repeated data using the MapReduce algorithm. Mapping data to key values is used to reduce the storage size.
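The two processing steps mentioned above, decoding emoji to Unicode and MapReduce-style elimination of repeated data, can be sketched as follows. This is a minimal Python illustration; the function names, the use of a content hash as the key, and the sample posts are assumptions, not the paper's implementation.

```python
import unicodedata

# --- Emoji decoding: map each emoji in a message to its Unicode code point and name ---
def decode_emoji(message):
    decoded = []
    for ch in message:
        if unicodedata.category(ch) == "So":  # "Symbol, other" covers most emoji
            decoded.append((ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")))
    return decoded

# --- MapReduce-style deduplication: map items to key-value pairs, reduce by key ---
def map_phase(items):
    for item in items:
        yield (hash(item), item)           # key = content hash, value = the item

def reduce_phase(pairs):
    unique = {}
    for key, value in pairs:
        unique.setdefault(key, value)      # repeated data collapses onto one key
    return list(unique.values())

posts = ["Great day 😀", "Great day 😀", "New song 🎵"]
print(decode_emoji(posts[0]))
print(reduce_phase(map_phase(posts)))
```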


BMC Genomics ◽  
2020 ◽  
Vol 21 (S10) ◽  
Author(s):  
Tanveer Ahmad ◽  
Nauman Ahmed ◽  
Zaid Al-Ars ◽  
H. Peter Hofstee

Abstract
Background: Immense improvements in sequencing technologies enable the production of large amounts of high-throughput and cost-effective next-generation sequencing (NGS) data. This data needs to be processed efficiently for further downstream analyses. Computing systems need these large amounts of data closer to the processor (with low latency) for fast and efficient processing. However, existing workflows depend heavily on disk storage and access, so processing this data incurs huge disk I/O overheads. Previously, due to the cost, volatility and other physical constraints of DRAM memory, it was not feasible to place large working data sets in memory. However, recent developments in storage-class memory and non-volatile memory technologies have enabled computing systems to place large data sets in memory and process them directly, avoiding disk I/O bottlenecks. To exploit the benefits of such memory systems efficiently, properly formatted data placement in memory and high-throughput access to it are necessary, avoiding (de)serialization and copy overheads between processes. For this purpose, we use the newly developed Apache Arrow, a cross-language development framework that provides a language-independent columnar in-memory data format for efficient in-memory big data analytics. This allows genomics applications developed in different programming languages to communicate in memory without having to access disk storage, avoiding (de)serialization and copy overheads.
Implementation: We integrate the Apache Arrow in-memory based Sequence Alignment/Map (SAM) format and its shared-memory object store library into widely used genomics high-throughput data processing applications such as BWA-MEM, Picard and GATK to allow in-memory communication between these applications. In addition, this allows us to exploit the cache locality of tabular data and parallel processing capabilities through shared-memory objects.
Results: Our implementation shows that adopting the in-memory SAM representation in genomics high-throughput data processing applications results in better system resource utilization, fewer memory accesses due to high cache-locality exploitation, and parallel scalability due to shared-memory objects. Our implementation focuses on the GATK best-practices recommended workflows for germline analysis on whole genome sequencing (WGS) and whole exome sequencing (WES) data sets. We compare a number of existing in-memory data placement and sharing techniques, such as ramDisk and Unix pipes, to show how the columnar in-memory data representation outperforms both. We achieve speedups of 4.85x and 4.76x for WGS and WES data, respectively, in the overall execution time of the variant-calling workflows. Similarly, speedups of 1.45x and 1.27x for these data sets, respectively, are achieved compared to the second-fastest workflow. In some individual tools, particularly sorting, duplicate removal and base quality score recalibration, the speedup is even more promising.
Availability: The code and scripts used in our experiments are available in both container and repository form at: https://github.com/abs-tudelft/ArrowSAM.
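The columnar in-memory layout described above can be illustrated with a small Python sketch using the pyarrow library. The field names follow the standard SAM specification, but the schema, file name and example records are illustrative assumptions rather than the ArrowSAM schema itself, and the sketch uses the Arrow IPC file format with memory mapping rather than the shared-memory object store integration described in the paper (see https://github.com/abs-tudelft/ArrowSAM for the actual implementation).

```python
import pyarrow as pa

# Columnar, language-independent in-memory layout for a few SAM fields
# (illustrative schema and records only).
sam_table = pa.table({
    "QNAME": ["read001", "read002"],
    "FLAG":  pa.array([99, 147], type=pa.int32()),
    "RNAME": ["chr1", "chr1"],
    "POS":   pa.array([10468, 10520], type=pa.int64()),
    "MAPQ":  pa.array([60, 60], type=pa.int8()),
    "CIGAR": ["8M", "8M"],
    "SEQ":   ["ACGTTGCA", "TTGACCGA"],
})

# Write once as an Arrow IPC file; another process (possibly in another language)
# can memory-map it and read the columns without parsing or deserialization.
with pa.OSFile("alignments.arrow", "wb") as sink:
    with pa.ipc.new_file(sink, sam_table.schema) as writer:
        writer.write_table(sam_table)

with pa.memory_map("alignments.arrow", "r") as source:
    shared = pa.ipc.open_file(source).read_all()
    print(shared.column("POS"))   # zero-copy column access
```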

