data partitions Latest Research Papers

The Internet of Things (IoT) security is one of the most important issues developers have to face. Data tampering must be prevented in IoT devices and some or all of the confidentiality, integrity, and authenticity of sensible data files must be assured in most practical IoT applications, especially when data are stored in removable devices such as microSD cards, which is very common. Software solutions are usually applied, but their effectiveness is limited due to the reduced resources available in IoT systems. This paper introduces a hardware-based security framework for IoT devices (Embedded LUKS) similar to the Linux Unified Key Setup (LUKS) solution used in Linux systems to encrypt data partitions. Embedded LUKS (E-LUKS) extends the LUKS capabilities by adding integrity and authentication methods, in addition to the confidentiality already provided by LUKS. E-LUKS uses state-of-the-art encryption and hash algorithms such as PRESENT and SPONGENT. Both are recognized as adequate solutions for IoT devices being PRESENT incorporated in the ISO/IEC 29192-2:2019 for lightweight block ciphers. E-LUKS has been implemented in modern XC7Z020 FPGA chips, resulting in a smaller hardware footprint compared to previous LUKS hardware implementations, a footprint of about a 10% of these LUKS implementations, making E-LUKS a great alternative to provide Full Disk Encryption (FDE) alongside authentication to a wide range of IoT devices.

Download Full-text

Sfaira accelerates data and model reuse in single cell genomics

Genome Biology ◽

10.1186/s13059-021-02452-6 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

David S. Fischer ◽

Leander Dony ◽

Martin König ◽

Abdul Moeed ◽

Luke Zappia ◽

...

Keyword(s):

Single Cell ◽

Data Sets ◽

Rna Seq ◽

Cell Type ◽

Training Models ◽

Public Data ◽

Data Partitions ◽

Cell Data ◽

Type Classification ◽

Different Levels

AbstractSingle-cell RNA-seq datasets are often first analyzed independently without harnessing model fits from previous studies, and are then contextualized with public data sets, requiring time-consuming data wrangling. We address these issues with sfaira, a single-cell data zoo for public data sets paired with a model zoo for executable pre-trained models. The data zoo is designed to facilitate contribution of data sets using ontologies for metadata. We propose an adaption of cross-entropy loss for cell type classification tailored to datasets annotated at different levels of coarseness. We demonstrate the utility of sfaira by training models across anatomic data partitions on 8 million cells.

Download Full-text

Analisa Performa Klastering Data Besar pada Hadoop

Infotek : Jurnal Informatika dan Teknologi ◽

10.29408/jit.v4i2.3565 ◽

2021 ◽

Vol 4 (2) ◽

pp. 174-183

Author(s):

Hadian Mandala Putra ◽

◽

Taufik Akbar ◽

Ahwan Ahmadi ◽

Muhammad Iman Darmawan ◽

...

Keyword(s):

Big Data ◽

Relational Model ◽

Data Types ◽

Analysis Process ◽

A Value ◽

Silhouette Coefficient ◽

Data Partitions ◽

Hadoop Cluster ◽

Map Function ◽

Cluster Quality

Big Data is a collection of data with a large and complex size, consisting of various data types and obtained from various sources, overgrowing quickly. Some of the problems that will arise when processing big data, among others, are related to the storage and access of big data, which consists of various types of data with high complexity that are not able to be handled by the relational model. One technology that can solve the problem of storing and accessing big data is Hadoop. Hadoop is a technology that can store and process big data by distributing big data into several data partitions (data blocks). Problems arise when an analysis process requires all data spread out into one data entity, for example, in the data clustering process. One alternative solution is to do a parallel and scattered analysis, then perform a centralized analysis of the results of the scattered analysis. This study examines and analyzes two methods, namely K-Medoids Mapreduce and K-Modes without Mapreduce. The dataset used is a dataset about cars consisting of 3.5 million rows of data with 400MB distributed in a Hadoop Cluster (consisting of more than one engine). Hadoop has a MapReduce feature, consisting of 2 functions, namely map and reduce. The map function performs a selection to retrieve a key, value pairs, and returns a value in the form of a collection of key value pairs, and then the reduce function combines all key value pairs from several map functions. The results of the cluster quality evaluation are tested using the Silhouette Coefficient testing metric. The K-Medoids MapReduce algorithm for the car dataset gives a silhouette value of 0.99 with a total of 2 clusters.

Download Full-text

A methodology using classification for traffic prediction: Featuring the impact of COVID-19

Integrated Computer-Aided Engineering ◽

10.3233/ica-210663 ◽

2021 ◽

pp. 1-19

Author(s):

Stergios Liapis ◽

Konstantinos Christantonis ◽

Victor Chazan-Pantzalis ◽

Anastassios Manos ◽

Despina Elizabeth Filippidou ◽

...

Keyword(s):

Data Model ◽

Research Question ◽

Meteorological Conditions ◽

Traffic Prediction ◽

Time Intervals ◽

Traffic State ◽

Data Partitions ◽

The Impact ◽

Training Subset ◽

And Training

This paper presents a novel methodology using classification for day-ahead traffic prediction. It addresses the research question whether traffic state can be forecasted based on meteorological conditions, seasonality, and time intervals, as well as COVID-19 related restrictions. We propose reliable models utilizing smaller data partitions. Apart from feature selection, we incorporate new features related to movement restrictions due to COVID-19, forming a novel data model. Our methodology explores the desired training subset. Results showed that various models can be developed, with varying levels of success. The best outcome was achieved when factoring in all relevant features and training on a proposed subset. Accuracy improved significantly compared to previously published work.

Download Full-text

Comparative analyses of Mikania (Asteraceae: Eupatorieae) plastomes and impact of data partitioning and inference methods on phylogenetic relationships

Scientific Reports ◽

10.1038/s41598-021-92727-6 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Verônica A. Thode ◽

Caetano T. Oliveira ◽

Benoît Loeuille ◽

Carolina M. Siniscalchi ◽

José R. Pirani

Keyword(s):

Phylogenetic Relationships ◽

Phylogenetic Trees ◽

Data Partitioning ◽

Gene Content ◽

Data Partition ◽

Variable Regions ◽

Multispecies Coalescent ◽

Data Partitions ◽

Inference Methods ◽

The Impact

AbstractWe assembled new plastomes of 19 species of Mikania and of Ageratina fastigiata, Litothamnus nitidus, and Stevia collina, all belonging to tribe Eupatorieae (Asteraceae). We analyzed the structure and content of the assembled plastomes and used the newly generated sequences to infer phylogenetic relationships and study the effects of different data partitions and inference methods on the topologies. Most phylogenetic studies with plastomes ignore that processes like recombination and biparental inheritance can occur in this organelle, using the whole genome as a single locus. Our study sought to compare this approach with multispecies coalescent methods that assume that different parts of the genome evolve at different rates. We found that the overall gene content, structure, and orientation are very conserved in all plastomes of the studied species. As observed in other Asteraceae, the 22 plastomes assembled here contain two nested inversions in the LSC region. The plastomes show similar length and the same gene content. The two most variable regions within Mikania are rpl32-ndhF and rpl16-rps3, while the three genes with the highest percentage of variable sites are ycf1, rpoA, and psbT. We generated six phylogenetic trees using concatenated maximum likelihood and multispecies coalescent methods and three data partitions: coding and non-coding sequences and both combined. All trees strongly support that the sampled Mikania species form a monophyletic group, which is further subdivided into three clades. The internal relationships within each clade are sensitive to the data partitioning and inference methods employed. The trees resulting from concatenated analysis are more similar among each other than to the correspondent tree generated with the same data partition but a different method. The multispecies coalescent analysis indicate a high level of incongruence between species and gene trees. The lack of resolution and congruence among trees can be explained by the sparse sampling (~ 0.45% of the currently accepted species) and by the low number of informative characters present in the sequences. Our study sheds light into the impact of data partitioning and methods over phylogenetic resolution and brings relevant information for the study of Mikania diversity and evolution, as well as for the Asteraceae family as a whole.

Download Full-text

Assessment of Cluster Tendency Methods for Visualizing the Data Partitions

Advanced Aspects of Engineering Research Vol. 11 ◽

10.9734/bpi/aaer/v11/8945d ◽

2021 ◽

pp. 33-40

Author(s):

M. Suleman Basha ◽

S. K. Mouleeswaran ◽

K. Rajendra Prasad

Keyword(s):

Data Partitions

Download Full-text

Systematic review of research design and reporting of imaging studies applying convolutional neural networks for radiological cancer diagnosis

European Radiology ◽

10.1007/s00330-021-07881-2 ◽

2021 ◽

Author(s):

Robert J. O’Shea ◽

Amy Rose Sharkey ◽

Gary J. R. Cook ◽

Vicky Goh

Keyword(s):

Neural Network ◽

Systematic Review ◽

Convolutional Neural Network ◽

Cancer Diagnosis ◽

Model Performance ◽

Network Models ◽

Imaging Studies ◽

Neural Network Models ◽

Eligibility Criteria ◽

Data Partitions

Abstract Objectives To perform a systematic review of design and reporting of imaging studies applying convolutional neural network models for radiological cancer diagnosis. Methods A comprehensive search of PUBMED, EMBASE, MEDLINE and SCOPUS was performed for published studies applying convolutional neural network models to radiological cancer diagnosis from January 1, 2016, to August 1, 2020. Two independent reviewers measured compliance with the Checklist for Artificial Intelligence in Medical Imaging (CLAIM). Compliance was defined as the proportion of applicable CLAIM items satisfied. Results One hundred eighty-six of 655 screened studies were included. Many studies did not meet the criteria for current design and reporting guidelines. Twenty-seven percent of studies documented eligibility criteria for their data (50/186, 95% CI 21–34%), 31% reported demographics for their study population (58/186, 95% CI 25–39%) and 49% of studies assessed model performance on test data partitions (91/186, 95% CI 42–57%). Median CLAIM compliance was 0.40 (IQR 0.33–0.49). Compliance correlated positively with publication year (ρ = 0.15, p = .04) and journal H-index (ρ = 0.27, p < .001). Clinical journals demonstrated higher mean compliance than technical journals (0.44 vs. 0.37, p < .001). Conclusions Our findings highlight opportunities for improved design and reporting of convolutional neural network research for radiological cancer diagnosis. Key Points • Imaging studies applying convolutional neural networks (CNNs) for cancer diagnosis frequently omit key clinical information including eligibility criteria and population demographics. • Fewer than half of imaging studies assessed model performance on explicitly unobserved test data partitions. • Design and reporting standards have improved in CNN research for radiological cancer diagnosis, though many opportunities remain for further progress.

Download Full-text

Stability and Structure of CART and SPAN Search Generated Data Partitions for the Analysis of Low Birth Weight

Journal of Data Science ◽

10.6339/jds.201201_10(1).0005 ◽

2021 ◽

Vol 10 (1) ◽

pp. 61-73

Author(s):

Roger J. Marshall ◽

Panagiota Kitsantas

Keyword(s):

Birth Weight ◽

Low Birth Weight ◽

Data Partitions

Download Full-text

Fuzzy logic aggregation of crisp data partitions as learning analytics in triage decisions

Expert Systems with Applications ◽

10.1016/j.eswa.2020.113512 ◽

2020 ◽

Vol 158 ◽

pp. 113512

Author(s):

Giuseppe Pandolfo ◽

Antonio D’Ambrosio ◽

Lorella Cannavacciuolo ◽

Roberta Siciliano

Keyword(s):

Fuzzy Logic ◽

Learning Analytics ◽

Data Partitions ◽

Triage Decisions

Download Full-text

CORRELATIONS AND QUERY PROCESSING

International Journal of Advanced Research ◽

10.21474/ijar01/11726 ◽

2020 ◽

Vol 8 (9) ◽

pp. 811-816

Author(s):

Bhanu Shanker Prasad ◽

Keyword(s):

Query Processing ◽

Query Optimization ◽

Execution Time ◽

Statistical Characteristics ◽

Query Execution ◽

Execution Engine ◽

Join Queries ◽

Data Partitions ◽

Highly Correlated ◽

Horizontal Partitioning

It is known that optimization of join queries based on average selectivities is sub-optimal in highly correlated databases. Relations are naturally divided into partitions , each partition having substantially different statistical characteristics in such databases. It is very compelling to discover such data partitions during query optimization and create multiple plans for a given query , one plan being optimal for a particular combination of data partitions. This scenario calls for the sharing of state among plans, so that common intermediate results are not recomputed. We study this problem in a setting with a routing-based query execution engine based on eddies. Eddies naturally encapsulate horizontal partitioning and maximal state sharing across multiple plan. The purpose of this paper is to present faster execution time over traditional optimization for high correlations, while maintaining the same performance for low correlations.

Download Full-text

data partitions
Recently Published Documents

TOTAL DOCUMENTS

H-INDEX

Embedded LUKS (E-LUKS): A Hardware Solution to IoT Security

Sfaira accelerates data and model reuse in single cell genomics

Analisa Performa Klastering Data Besar pada Hadoop

A methodology using classification for traffic prediction: Featuring the impact of COVID-19

Comparative analyses of Mikania (Asteraceae: Eupatorieae) plastomes and impact of data partitioning and inference methods on phylogenetic relationships

Assessment of Cluster Tendency Methods for Visualizing the Data Partitions

Systematic review of research design and reporting of imaging studies applying convolutional neural networks for radiological cancer diagnosis

Stability and Structure of CART and SPAN Search Generated Data Partitions for the Analysis of Low Birth Weight

Fuzzy logic aggregation of crisp data partitions as learning analytics in triage decisions

CORRELATIONS AND QUERY PROCESSING

Export Citation Format

data partitionsRecently Published Documents

TOTAL DOCUMENTS

H-INDEX

Embedded LUKS (E-LUKS): A Hardware Solution to IoT Security

Sfaira accelerates data and model reuse in single cell genomics

Analisa Performa Klastering Data Besar pada Hadoop

A methodology using classification for traffic prediction: Featuring the impact of COVID-19

Comparative analyses of Mikania (Asteraceae: Eupatorieae) plastomes and impact of data partitioning and inference methods on phylogenetic relationships

Assessment of Cluster Tendency Methods for Visualizing the Data Partitions

Systematic review of research design and reporting of imaging studies applying convolutional neural networks for radiological cancer diagnosis

Stability and Structure of CART and SPAN Search Generated Data Partitions for the Analysis of Low Birth Weight

Fuzzy logic aggregation of crisp data partitions as learning analytics in triage decisions

CORRELATIONS AND QUERY PROCESSING

data partitions
Recently Published Documents