database size
Recently Published Documents

TOTAL DOCUMENTS: 71 (FIVE YEARS: 23)
H-INDEX: 7 (FIVE YEARS: 1)

Author(s):  
Sonal Tuteja ◽  
Rajeev Kumar

Abstract: The incorporation of heterogeneous data models into large-scale e-commerce applications incurs various complexities and overheads, such as data redundancy, maintenance of different data models, and communication among models during query processing. Graphs have emerged as data modelling techniques for large-scale applications with heterogeneous, schemaless, and relationship-centric data. Models exist for mapping different types of data to a graph; however, the unification of data from heterogeneous source models into a single graph model has not received much attention. To address this, we propose a new framework in this study. The proposed framework first transforms data from various source models into graph models individually and then unifies them into a single graph. To justify the applicability of the proposed framework in e-commerce applications, we analyse and compare the query performance, scalability, and database size of the unified graph against the heterogeneous source data models for a predefined set of queries. We also assess qualitative measures, such as flexibility, completeness, consistency, and maturity, for the proposed unified graph. Based on the experimental results, the unified graph outperforms the heterogeneous source models in query performance and scalability; however, it falls behind in database size.
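The framework's core step, transforming each source model into a graph and then unifying the graphs, can be illustrated with a minimal sketch. The table and document inputs, node labels, and merge-on-key rule below are illustrative assumptions, not the authors' implementation; the sketch only shows how relational rows and JSON-like documents can land in one property graph.

```python
import networkx as nx

def relational_to_graph(rows, label, key):
    """Map relational rows to nodes: one node per row, identified by its primary key."""
    g = nx.MultiDiGraph()
    for row in rows:
        g.add_node((label, row[key]), **row)
    return g

def document_to_graph(docs, label, key, ref_field, ref_label):
    """Map JSON-like documents to nodes, turning reference fields into edges."""
    g = nx.MultiDiGraph()
    for doc in docs:
        node = (label, doc[key])
        g.add_node(node, **{k: v for k, v in doc.items() if k != ref_field})
        for ref in doc.get(ref_field, []):
            g.add_edge(node, (ref_label, ref), type=ref_field)
    return g

# Illustrative e-commerce sources: a customer table and order documents.
customers = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
orders = [{"order_id": 10, "total": 99.5, "customer": [1]}]

# Unify the per-source graphs; nodes with the same (label, key) identity are merged.
unified = nx.compose(
    relational_to_graph(customers, "Customer", "id"),
    document_to_graph(orders, "Order", "order_id", "customer", "Customer"),
)
print(unified.nodes(data=True))
print(unified.edges(data=True))
```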


2021 ◽  
Vol 50 (4) ◽  
pp. 627-644
Author(s):  
Shariq Bashir ◽  
Daphne Teck Ching Lai

Approximate frequent itemset (AFI) mining from noisy databases is computationally more expensive than traditional frequent itemset mining, because AFI mining algorithms generate a large number of candidate itemsets. This article proposes an algorithm to mine AFIs using a pattern growth approach. The major contribution of the proposed approach is that it mines core patterns and examines the approximate conditions of candidate AFIs directly, in a single phase with two full scans of the database. Related algorithms apply an Apriori-based candidate generation-and-test approach and require multiple phases to obtain the complete set of AFIs: the first phase generates core patterns, and the second phase examines the approximate conditions of those core patterns. Specifically, the article proposes novel techniques for mapping transactions onto an approximate FP-tree and for mining AFIs from the conditional patterns of the approximate FP-tree. The approximate FP-tree maps transactions onto shared branches when the transactions share a similar set of items. This reduces the size of the database representation and helps to compute the approximate conditions of candidate itemsets efficiently. We compare the performance of our algorithm with state-of-the-art AFI mining algorithms on benchmark databases. The experiments are analysed by comparing the processing time and the scalability of the algorithms over varying database sizes and transaction lengths. The results show that the pattern growth approach mines AFIs in less processing time than related Apriori-based algorithms.
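A minimal sketch of the shared-branch idea follows. The item ordering, the toy transactions, and the plain prefix tree are illustrative assumptions and do not reproduce the paper's approximate FP-tree or its approximate-support conditions; the sketch only shows how transactions that share items end up on shared, counted branches, which is what keeps the structure small.

```python
from collections import defaultdict

class Node:
    def __init__(self, item):
        self.item = item
        self.count = 0
        self.children = {}

def build_prefix_tree(transactions):
    """Insert each transaction along a branch; identical prefixes are stored only once."""
    # Order items by global frequency so frequent items share prefixes more often.
    freq = defaultdict(int)
    for t in transactions:
        for item in t:
            freq[item] += 1
    root = Node(None)
    for t in transactions:
        node = root
        for item in sorted(set(t), key=lambda i: (-freq[i], i)):
            node = node.children.setdefault(item, Node(item))
            node.count += 1  # branch counts support later frequency checks
    return root

def dump(node, depth=0):
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        dump(child, depth + 1)

transactions = [["a", "b", "c"], ["a", "b"], ["a", "c", "d"]]
dump(build_prefix_tree(transactions))
```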


2021 ◽  
Author(s):  
Laura Fancello ◽  
Thomas Burger

Abstract. Background: Proteogenomics aims to identify variant or unknown proteins in bottom-up proteomics by searching transcriptome- or genome-derived custom protein databases. However, empirical observations reveal that these large proteogenomic databases produce lower-sensitivity peptide identifications. Various strategies have been proposed to avoid this, including the generation of reduced transcriptome-informed protein databases (i.e., built from reference protein databases only retaining proteins whose transcripts are detected in the sample-matched transcriptome), which were found to increase peptide identification sensitivity. Here, we present a detailed evaluation of this approach. Results: First, we established that the increased sensitivity in peptide identification is in fact a statistical artifact, directly resulting from the limited capability of target-decoy competition to accurately model incorrect target matches when using excessively small databases. As anti-conservative FDRs are likely to hamper the robustness of the resulting biological conclusions, we advocate for alternative FDR control methods that are less sensitive to database size. Nevertheless, reduced transcriptome-informed databases are useful, as they reduce the ambiguity of protein identifications, yielding fewer shared peptides. Furthermore, searching the reference database and subsequently filtering out proteins whose transcripts are not expressed reduces protein identification ambiguity to a similar extent, but is more transparent and reproducible. Conclusion: In summary, using transcriptome information is an interesting strategy that has not been promoted for the right reasons. While the increase in peptide identifications from searching reduced transcriptome-informed databases is an artifact caused by the use of an FDR control method unsuitable for excessively small databases, transcriptome information can reduce the ambiguity of protein identifications.
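The target-decoy competition estimate under discussion is, at its core, a decoys-passing over targets-passing ratio at a score threshold. The sketch below uses simulated scores (an assumption, not proteomics data) and only illustrates how few passing matches make that ratio unstable when the search space is small; it does not model the bias mechanism the authors analyse.

```python
import random

def tdc_fdr(target_scores, decoy_scores, threshold):
    """Classic target-decoy FDR estimate: decoys passing / targets passing."""
    targets_passing = sum(s >= threshold for s in target_scores)
    decoys_passing = sum(s >= threshold for s in decoy_scores)
    return decoys_passing / targets_passing if targets_passing else 0.0

random.seed(0)
for db_size in (10_000, 100):  # a large vs. an excessively small search space
    targets = [random.gauss(10, 2) for _ in range(db_size)]  # simulated match scores
    decoys = [random.gauss(9, 2) for _ in range(db_size)]
    print(db_size, round(tdc_fdr(targets, decoys, threshold=12), 3))
```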


2021 ◽  
Author(s):  
Ramy Shahin ◽  
Murad Akhundov ◽  
Marsha Chechik

Applying program analyses to Software Product Lines (SPLs) has been a fundamental research problem at the intersection of Product Line Engineering and software analysis. Different attempts have been made to "lift" particular product-level analyses to run on the entire product line. In this paper, we tackle the class of Datalog-based analyses (e.g., pointer and taint analyses), study the theoretical aspects of lifting Datalog inference, and implement a lifted inference algorithm inside the Soufflé Datalog engine. We evaluate our implementation on a set of Java and C-language benchmark product lines. We show significant savings in processing time and fact database size (billions of times faster on one of the benchmarks) compared to brute-force analysis of each product individually.
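A toy sketch of the lifting idea: each fact carries a presence condition (here simplified to a set of products, an assumption; SPL analyses typically use feature formulas), and rule application intersects those conditions, so one inference run covers the whole product line instead of one run per product. This is illustrative only and unrelated to the Soufflé implementation.

```python
# Lifted facts for a toy edge relation (think points-to or taint-flow edges),
# each annotated with the set of products in which the fact holds.
edges = {
    ("a", "b"): {"P1", "P2"},
    ("b", "c"): {"P1"},
    ("c", "d"): {"P1", "P2"},
}

def lifted_transitive_closure(edges):
    """Datalog-style rule path(x, z) :- path(x, y), edge(y, z),
    with presence sets intersected instead of re-running per product."""
    path = dict(edges)
    changed = True
    while changed:
        changed = False
        for (x, y), pres_xy in list(path.items()):
            for (y2, z), pres_yz in edges.items():
                if y != y2:
                    continue
                pres = pres_xy & pres_yz       # derived fact holds only where both premises hold
                if not pres:
                    continue
                old = path.get((x, z), set())
                if not pres <= old:            # new products learned for this fact
                    path[(x, z)] = old | pres
                    changed = True
    return path

for fact, pres in sorted(lifted_transitive_closure(edges).items()):
    print(fact, sorted(pres))
```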


Sensors ◽  
2021 ◽  
Vol 21 (12) ◽  
pp. 3965
Author(s):  
Alexander A. Kharlamov ◽  
Aleksei N. Raskhodchikov ◽  
Maria Pilgun

The article presents the results of an analysis of how metropolis IT technologies were adapted to solve operational problems in extreme conditions during the COVID-19 pandemic. The material for the study was Russian-language data from social networks, microblogs, blogs, instant messengers, forums, reviews, video hosting services, thematic portals, online media, print media and TV related to the first wave of the COVID-19 pandemic in Russia. The data were collected between 1 March 2020 and 1 June 2020, and the database size is 85,493,717 characters. To analyze the content of social media, a multimodal approach was used involving neural network technologies, text analysis, sentiment analysis and analysis of lexical associations. The transformation of old digital services and applications, as well as the emergence of new ones, was analyzed in terms of the actors' perception of digital communications.


Author(s):  
Setiawan Budiman ◽  
Faisal Fadhila ◽  
Vian Ardiyansyah Saputro ◽  
Ema Utami ◽  
Khusnawi Khusnawi

The digital world now reaches every aspect of life, and databases play a central role in digitization: every digital company maintains databases for its business. As these databases grow over time, an ideal database system is needed. In this study, we compare the data processing speed of SQL (MySQL) against NoSQL (MongoDB) on a VPS running an Apache web server with PHP, a stack commonly used by many companies. Our methodology measures the speed of SELECT, INSERT, DELETE and UPDATE operations on datasets of 1,000, 10,000, 100,000, 1,000,000, 2,000,000 and 5,000,000 records. For SELECT and INSERT in particular, we loop over the data as is commonly done in PHP scripts (do-while). The results show that, with millions of records, SELECT, INSERT and UPDATE are faster on MongoDB than on MySQL. The DELETE operation is different: MySQL has a better response time than MongoDB. This is because MongoDB uses the PHP program as the application that runs the query process.
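A minimal timing harness in the spirit of the paper's methodology is sketched below. The connection settings, table and collection names, and the use of mysql-connector-python and pymongo are assumptions; the paper's tests were driven from PHP against a VPS Apache stack, and absolute numbers depend entirely on the environment.

```python
import time
import mysql.connector            # pip install mysql-connector-python
from pymongo import MongoClient   # pip install pymongo

N = 1_000  # scale up toward the paper's 1,000,000+ records as resources allow

def time_mysql_inserts():
    cnx = mysql.connector.connect(user="root", password="secret", database="bench")
    cur = cnx.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS t (id INT PRIMARY KEY, val VARCHAR(32))")
    start = time.perf_counter()
    for i in range(N):  # row-by-row loop, mirroring the do-while style of the PHP tests
        cur.execute("INSERT INTO t (id, val) VALUES (%s, %s)", (i, f"v{i}"))
    cnx.commit()
    return time.perf_counter() - start

def time_mongo_inserts():
    coll = MongoClient("mongodb://localhost:27017")["bench"]["t"]
    start = time.perf_counter()
    for i in range(N):
        coll.insert_one({"_id": i, "val": f"v{i}"})
    return time.perf_counter() - start

print("MySQL inserts:  ", time_mysql_inserts(), "s")
print("MongoDB inserts:", time_mongo_inserts(), "s")
```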


2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Wenju Xu ◽  
Baocang Wang ◽  
Rongxing Lu ◽  
Quanbo Qu ◽  
Yange Chen ◽  
...  

A private information retrieval (PIR) protocol is a powerful cryptographic tool that has received considerable attention in recent years, as it not only helps users retrieve the needed data from database servers but also prevents the servers from learning which data were retrieved. Although many PIR protocols have been proposed, it remains an open problem to design an efficient PIR protocol whose communication overhead is independent of the database size N. In this paper, to answer this open problem, we present a new communication-efficient PIR protocol based on our proposed single-ciphertext fully homomorphic encryption (FHE) scheme, which supports unlimited computations on a single variable over a single ciphertext, even without access to the secret key. Specifically, our proposed PIR protocol combines our single-ciphertext FHE with the Lagrange interpolating polynomial technique to achieve better communication efficiency. Security analyses show that the proposed PIR protocol efficiently protects the privacy of the user and of the data in the database. In addition, both theoretical analyses and experimental evaluations are conducted, and the results indicate that our proposed PIR protocol is more efficient and practical than previously reported ones. To the best of our knowledge, our proposed protocol is the first PIR protocol to achieve O(1) communication efficiency on the user side, independent of the database size N.
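The Lagrange-interpolation part can be sketched without the FHE layer: encode the database as the unique polynomial f of degree below N with f(i) = db[i], so retrieving record i reduces to evaluating f at i. In the actual protocol the server would perform this evaluation over an encrypted index so the query stays hidden; the prime modulus, toy database, and plaintext evaluation below are illustrative assumptions.

```python
P = 2**61 - 1  # a prime modulus, chosen arbitrarily for illustration

def lagrange_eval(points, x):
    """Evaluate the interpolating polynomial through `points` at `x`, modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P  # modular inverse of den
    return total

database = [42, 7, 99, 13]              # db[i] for i = 0..N-1
points = list(enumerate(database))      # interpolation constraints f(i) = db[i]

index = 2                               # the record the user wants
assert lagrange_eval(points, index) == database[index]
print("retrieved:", lagrange_eval(points, index))
```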


PLoS ONE ◽  
2021 ◽  
Vol 16 (4) ◽  
pp. e0249410
Author(s):  
Sajal Dash ◽  
Sarthok Rasique Rahman ◽  
Heather M. Hines ◽  
Wu-chun Feng

Search results from local alignment search tools are reported with statistical scores that are sensitive to the size of the database searched. For example, NCBI BLAST reports the best matches using similarity scores and expect values (i.e., e-values) calculated against the database size. Given the astronomical growth of genomics data, sequence databases keep growing over the course of a genomic research investigation as new sequences are continuously added. As a consequence, the results (e.g., best hits) and associated statistics (e.g., e-values) for a specific set of queries may change over the course of the investigation. Thus, to update the results of a previously conducted BLAST search and find the best matches in an updated database, scientists must currently rerun the BLAST search against the entire updated database, which translates into irrecoverable and, in turn, wasted execution time, money, and computational resources. To address this issue, we devise a novel and efficient method to redeem past BLAST searches by introducing iBLAST. iBLAST leverages previous BLAST search results to run the same query search only on the incremental (i.e., newly added) part of the database, recomputes the associated critical statistics such as e-values, and combines these results with the previous ones to produce updated search results. Our experimental results and fidelity analyses show that iBLAST delivers search results identical to those of NCBI BLAST at a substantially reduced computational cost, i.e., iBLAST performs (1 + δ)/δ times faster than NCBI BLAST, where δ represents the fraction of database growth. We then present three different use cases to demonstrate that iBLAST enables efficient biological discovery at a much faster speed with a substantially reduced computational cost.
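Two pieces of this bookkeeping are easy to illustrate: since BLAST e-values scale roughly linearly with the search-space size, hits from the old and the incremental search can be put on a common footing by rescaling to the full updated database before merging, and searching only the new δ fraction instead of the whole 1 + δ database gives the quoted (1 + δ)/δ speedup. The sketch below is a simplification under that linear-scaling assumption, not the tool's exact statistics.

```python
def rescale_evalue(e, searched_size, full_size):
    """Approximation: e-values grow linearly with database size, so rescale to the full database."""
    return e * full_size / searched_size

def merge_hits(old_hits, new_hits, old_size, delta_size):
    """Rescale both result sets to the updated database size and rank by e-value."""
    full = old_size + delta_size
    merged = [(q, s, rescale_evalue(e, old_size, full)) for q, s, e in old_hits]
    merged += [(q, s, rescale_evalue(e, delta_size, full)) for q, s, e in new_hits]
    return sorted(merged, key=lambda hit: hit[2])  # smaller e-value = better hit

# (query, subject, e-value) triples from the old search and the incremental search
old_hits = [("q1", "s_old", 1e-30)]
new_hits = [("q1", "s_new", 1e-35)]
print(merge_hits(old_hits, new_hits, old_size=1_000_000, delta_size=100_000))

delta = 0.1  # the database grew by 10%
print("speedup vs. full re-search:", (1 + delta) / delta)  # 11x
```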


Bangla is a useful language for studying nasal vowels because every vowel has a corresponding nasal counterpart. Vowel nasality generation is an important task for producing artificial nasality in a speech synthesizer. Various methods have been employed by many researchers for generating vowel nasality, but vowel nasality generation for a rule-based speech synthesizer has not yet been studied for Bangla. This study discusses several methods using the full spectrum and the partial spectrum for generating vowel nasality for use in a rule-based Bangla text-to-speech (TTS) system based on demisyllables. A demisyllable-based Bangla TTS needs 1400 demisyllables stored in its database; transforming the vowel part of a demisyllable into its nasal counterpart reduces the speech database size to 700 demisyllables. Comparative study of the e…

