SPATIO-TEMPORAL DATA TYPES AND DATA ANALYSES IN THE BIG DATA PARADIGM

A model for managing large series of spatio-temporal data is implemented on Apache Spark, an open-source platform for storing and processing large data series on distributed computing systems built from commercially available workstations. The algorithms for processing spatio-temporal data are defined according to the rules of the Spark SQL programming model, and the relational operations on DataFrames (a specialized system of data frames) are expressed using a domain-specific language (DSL). Introducing spatio-temporal data types enables a standardized approach within the Big Data paradigm.

Big data platform development with a domain specific language for telecom industries

2013 High Capacity Optical Networks and Emerging/Enabling Technologies ◽

10.1109/honet.2013.6729768 ◽

2013 ◽

Cited By ~ 3

Author(s):

Cuneyt Senbalci ◽

Serkan Altuntas ◽

Zeki Bozkus ◽

Taner Arsan

Keyword(s):

Big Data ◽

Domain Specific Language ◽

Specific Language ◽

Domain Specific ◽

Data Platform ◽

Platform Development

Cutevariant: a GUI-based desktop application to explore genetics variations

10.1101/2021.02.10.430619 ◽

2021 ◽

Author(s):

Sacha Schutz ◽

Pierre Marijon ◽

Tristan Montier ◽

Emmanuelle Genin

Keyword(s):

Open Source ◽

Relational Database ◽

Variant Calling ◽

Genomic Research ◽

Domain Specific Language ◽

Specific Language ◽

Domain Specific ◽

Desktop Application ◽

User Friendly ◽

Client Side

Abstract: Cutevariant is a user-friendly, GUI-based desktop application for genomic research, designed to search for variations in DNA samples collected in annotated files and encoded in the Variant Calling Format. The application imports data into a local relational database, from which complex filter queries can be built either through the intuitive GUI or with a Domain Specific Language (DSL). Cutevariant provides more features than any existing application without compromising on performance. The plugin-based architecture provides highly customizable features. Cutevariant is distributed as multiplatform client-side software under an open source licence and is available at https://github.com/labsquare/Cutevariant. It has been designed from the beginning to be easily adopted by IT-agnostic end-users.
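As a rough illustration of the import-then-filter workflow the abstract describes, the sketch below loads a few made-up variant records into an in-memory SQLite database and runs a parameterized filter query. The table layout, column names, helper functions, and sample rows are all assumptions chosen for illustration; they are not Cutevariant's actual schema, API, or DSL.

```python
import sqlite3

# Hypothetical variant records (chrom, pos, ref, alt, qual); not real data.
VARIANTS = [
    ("chr1", 12345, "A", "G", 50.0),
    ("chr1", 67890, "C", "T", 12.5),
    ("chr2", 13579, "G", "A", 99.0),
]


def load_variants(rows):
    """Create an in-memory database and insert the variant rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE variants "
        "(chrom TEXT, pos INTEGER, ref TEXT, alt TEXT, qual REAL)"
    )
    conn.executemany("INSERT INTO variants VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def filter_variants(conn, chrom, min_qual):
    """Return (pos, ref, alt) for one chromosome with qual >= min_qual, ordered by position."""
    cur = conn.execute(
        "SELECT pos, ref, alt FROM variants "
        "WHERE chrom = ? AND qual >= ? ORDER BY pos",
        (chrom, min_qual),
    )
    return cur.fetchall()


conn = load_variants(VARIANTS)
hits = filter_variants(conn, "chr1", 30.0)
```

A GUI filter panel or a small query language could both compile down to this kind of parameterized SQL, which is one plausible reason to keep the variants in a relational store.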

SGL: A Domain-Specific Language for Large-Scale Analysis of Open-Source Code

2018 IEEE Cybersecurity Development (SecDev) ◽

10.1109/secdev.2018.00016 ◽

2018 ◽

Author(s):

Darius Foo ◽

Ming Yi Ang ◽

Jason Yeo ◽

Asankhaya Sharma

Keyword(s):

Open Source ◽

Large Scale ◽

Source Code ◽

Domain Specific Language ◽

Scale Analysis ◽

Specific Language ◽

Open Source Code ◽

Domain Specific ◽

Large Scale Analysis

NG-meta-profiler: fast processing of metagenomes using NGLess, a domain-specific language

10.1101/367755 ◽

2018 ◽

Cited By ~ 2

Author(s):

Luis Pedro Coelho ◽

Renato Alves ◽

Paulo Monteiro ◽

Jaime Huerta-Cepas ◽

Ana Teresa Freitas ◽

...

Keyword(s):

Open Source ◽

Open Source Software ◽

Domain Specific Language ◽

Next Generation ◽

Specific Language ◽

Sequence Processing ◽

Domain Specific ◽

Link Type ◽

Fast Processing ◽

User Friendly

Abstract: NGLess is a domain-specific language for describing next-generation sequence processing pipelines. It was developed with the goal of enabling user-friendly computational reproducibility. Using this framework, we developed NG-meta-profiler, a fast profiler for metagenomes which performs sequence preprocessing, mapping to bundled databases, filtering of the mapping results, and profiling (taxonomic and functional). It is significantly faster than either MOCAT2 or htseq-count and (as it builds on NGLess) its results are perfectly reproducible. These pipelines can easily be customized and extended with other tools. NGLess and NG-meta-profiler are open source software (under the liberal MIT licence) and can be downloaded from http://ngless.embl.de or installed through bioconda.

Efficient Search Mechanism from Large Scale Corpora for Domain-Specific Language Modeling in Speech Recognition

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f8416.088619 ◽

2019 ◽

Vol 8 (6) ◽

pp. 1682-1689

Keyword(s):

Big Data ◽

Speech Recognition ◽

Language Model ◽

Search Space ◽

Document Retrieval ◽

Language Modeling ◽

Domain Specific Language ◽

Specific Language ◽

Recognition Process ◽

Domain Specific

With the growth of the Internet and the World Wide Web, large corpora in a variety of forms are being produced continuously and can be treated as big data. One essential use of such corpora is language modeling for large-vocabulary continuous speech recognition. Language modeling is an indispensable module in the speech recognition architecture and plays a vital role in reducing the search space during the recognition process. Moreover, a language model that is close to the domain of the speech can shrink the search space and increase recognition accuracy. In this paper, an efficient search mechanism for domain-specific document retrieval from large corpora is presented using Elasticsearch, a distributed and efficient search engine for big data. This helped us tune the language model to the domain and reduced the search time by more than 90% compared with the conventional search and retrieval mechanism used in our earlier work. Word-level and phrase-level retrieval processes for creating a domain-specific language model have been implemented. The system is evaluated on the basis of the word error rate (WER) and perplexity (PPL) of the speech recognition system. The results show a nearly 10% decrease in WER and a major reduction in PPL, which helped boost the performance of the speech recognition process. From the results, it can be concluded that Elasticsearch is an efficient alternative to topic modeling toolkits for domain-specific document retrieval from large corpora.
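For readers unfamiliar with the WER metric mentioned above, here is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) between the recognizer output and a reference transcript, divided by the number of reference words. The function name and the example sentences are illustrative assumptions; this is not the evaluation code used in the paper.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between the two strings, divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcript pair: one reference word ("the") is missing from the output.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many inserted words.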
