Statismo - A framework for PCA based statistical models

2012
Author(s):
Marcel Lüthi
Remi Blanc
Thomas Albrecht
Tobias Gass
Orcun Goksel
...  

This paper describes Statismo, a framework for PCA-based statistical models. Statistical models are used to describe the variability of an object within a population, learned from a set of training samples. Originally developed to model shapes, statistical models are now increasingly used to model the variation in other kinds of data, such as images, volumetric meshes or deformation fields. Statismo has been developed with the following main goals in mind: 1) to provide generic tools for learning different kinds of PCA-based statistical models, such as shape, appearance or deformation models; 2) to make the exchange of such models among different research groups easier and to improve the reproducibility of the models; 3) to allow new model-building methods to be integrated into the framework easily. To achieve the first goal, all aspects that are specific to a given model and data representation are abstracted into a user-defined class. This not only makes it possible to use Statismo to create different kinds of PCA models, but also allows Statismo to be used with any toolkit and data format. To facilitate data exchange, Statismo defines a storage format based on HDF5, which includes all the information necessary to use the model, as well as meta-data about the model creation, which helps to make model building reproducible. The last goal is achieved by providing a clear separation between data management, model building and model representation. In addition to the standard method for building PCA models, Statismo already includes two recently proposed algorithms for building conditional models, as well as convenience tools for cross-validation studies. Although Statismo has been designed to be independent of a particular toolkit, special efforts have been made to make it directly useful for VTK and ITK. Besides supporting model building for most data representations used by VTK and ITK, it also provides an ITK transform class, which allows Statismo to be integrated with the ITK registration framework and thereby gives ready access to the ITK project's powerful methods for model fitting.
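The core computation behind such models is ordinary PCA on flattened training samples. The sketch below illustrates that idea and a simple HDF5 export; it is not Statismo's actual API, and the dataset names at the end are illustrative stand-ins for the schema the framework defines.

```python
# Minimal sketch of PCA model building (not Statismo's API). Each training
# sample is flattened into a vector, e.g. the concatenated vertex coordinates
# of a mesh or the voxels of a deformation field.
import numpy as np
import h5py

def build_pca_model(samples, n_components):
    """samples: (n_samples, n_features) matrix of flattened training data."""
    mean = samples.mean(axis=0)
    centered = samples - mean
    # SVD of the centered data yields the principal components.
    _, singular_values, components = np.linalg.svd(centered, full_matrices=False)
    variances = singular_values ** 2 / (samples.shape[0] - 1)
    return mean, components[:n_components], variances[:n_components]

def draw_instance(mean, components, variances, coefficients):
    """Generate a new instance from coefficients given in standard deviations."""
    return mean + (coefficients * np.sqrt(variances)) @ components

rng = np.random.default_rng(0)
training = rng.normal(size=(20, 300))            # 20 samples, 300 features each
mean, basis, var = build_pca_model(training, n_components=5)
shape = draw_instance(mean, basis, var, np.array([1.0, -0.5, 0.0, 0.0, 0.0]))

# Store the model in HDF5; the layout below is illustrative only, the real
# Statismo format defines its own groups and meta-data.
with h5py.File("model.h5", "w") as f:
    f["model/mean"] = mean
    f["model/pcaBasis"] = basis
    f["model/pcaVariance"] = var
```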

2021
Vol 22 (1)
Author(s):
Rahi Jain
Wei Xu

Background: Developing statistical and machine learning methods on studies with missing information is a ubiquitous challenge in real-world biological research. The strategies in the literature rely on either removing the samples with missing values, as in complete case analysis (CCA), or imputing the missing information, as in predictive mean matching (PMM) implemented in MICE. Limitations of these strategies include information loss and the question of how close the imputed values are to the true missing values. Further, when medical data arrive piecemeal, these strategies must wait for data collection to finish before a complete dataset is available for statistical modelling. Method and results: This study proposes a dynamic model updating (DMU) approach, a different strategy for developing statistical models with missing data. DMU uses only the information available in the dataset to build the statistical models. It segments the original dataset into small complete datasets using hierarchical clustering and then fits a Bayesian regression on each of them. Predictor estimates are updated using the posterior estimates from each dataset. The performance of DMU is evaluated on both simulated data and real studies, with results that are better than, or at par with, approaches such as CCA and PMM. Conclusion: The DMU approach provides an alternative to the existing strategies of information elimination and imputation for processing datasets with missing values. While the study applies the approach to continuous cross-sectional data, it can be extended to longitudinal, categorical and time-to-event biological data.
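The updating step can be pictured as sequential conjugate Bayesian linear regression: each complete sub-dataset refines the posterior over the coefficients. The sketch below illustrates that idea under a known-noise-variance assumption; it is not the authors' implementation, and the loop simply stands in for the sub-datasets produced by hierarchical clustering.

```python
# Illustrative sketch of the dynamic-model-updating idea (not the authors' code):
# update a Gaussian posterior over regression weights with one complete
# sub-dataset at a time, assuming a known noise variance for a conjugate update.
import numpy as np

def bayesian_update(prior_mean, prior_cov, X, y, noise_var=1.0):
    """One conjugate update of the posterior over linear regression weights."""
    prior_precision = np.linalg.inv(prior_cov)
    post_precision = prior_precision + X.T @ X / noise_var
    post_cov = np.linalg.inv(post_precision)
    post_mean = post_cov @ (prior_precision @ prior_mean + X.T @ y / noise_var)
    return post_mean, post_cov

rng = np.random.default_rng(1)
true_w = np.array([2.0, -1.0])
mean, cov = np.zeros(2), np.eye(2) * 10.0   # vague prior

# Each segment stands in for one small complete dataset found by clustering.
for _ in range(5):
    X = rng.normal(size=(30, 2))
    y = X @ true_w + rng.normal(scale=1.0, size=30)
    mean, cov = bayesian_update(mean, cov, X, y)

print("posterior mean of the coefficients:", mean)
```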


Author(s):  
Atul Jain
ShashiKant Gupta

JavaScript Object Notation (JSON) is a text-based data exchange format for structuring data between a server and a client-side web application. It is fundamentally a data format, so it is not limited to Ajax-style web applications and can be used with APIs to exchange or store information. However, an application rarely needs the whole dataset; it needs to extract the pieces relevant to a requirement that varies from person to person and over time. Searching and filtering a JSON string can be cumbersome, and most existing studies cover only basic operations for querying data from a JSON object. The aim of this paper is to survey methods across different technologies for searching and filtering JSON data. It reviews previous research on the JSONiq FLWOR expression and compares it with the json-query module on npm for extracting information from JSON. The research aims to retrieve data from JSON using advanced operators, with the help of a prototype built on the json-query package for Node.js. Thus, the data can be filtered more efficiently and accurately without depending on another programming language. The main objective is to filter JSON data in the same way that an SQL query filters relational data.
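As a rough illustration of the SQL-style filtering the paper aims at, the snippet below expresses such a query directly over parsed JSON in Python; the paper's prototype itself uses the json-query package for Node.js, and the records here are made up.

```python
# Minimal Python analogue of SQL-like filtering over JSON data; this is not
# the json-query module's API, only an illustration of the same operation.
import json

records = json.loads("""
[
  {"name": "Asha",  "city": "Pune",  "age": 31},
  {"name": "Ravi",  "city": "Delhi", "age": 24},
  {"name": "Meera", "city": "Pune",  "age": 28}
]
""")

# Roughly: SELECT name FROM records WHERE city = 'Pune' AND age > 25
result = [r["name"] for r in records if r["city"] == "Pune" and r["age"] > 25]
print(result)  # ['Asha', 'Meera']
```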


2002
Vol 1804 (1)
pp. 144-150
Author(s):
Kenneth G. Courage
Scott S. Washburn
Jin-Tae Kim

The proliferation of traffic software programs on the market has resulted in many highly specialized programs, each intended to analyze one or two specific items within a transportation network. Consequently, traffic engineers use multiple programs on a single project, which ironically has resulted in new inefficiency for the traffic engineer. Most of these programs deal with the same core set of data, for example, physical roadway characteristics, traffic demand levels, and traffic control variables. However, each program has its own format for saving data files. Therefore, these programs cannot share information directly or communicate with each other because of incompatible data formats. Thus, the traffic engineer is faced with manually reentering common data from one program into another. In addition to being inefficient, this also creates additional opportunities for data entry errors. XML is catching on rapidly as a means for exchanging data between two systems or users who deal with the same data but in different formats. Specific vocabularies have been developed for statistics, mathematics, chemistry, and many other disciplines. The traffic model markup language (TMML) is introduced as a resource for traffic model data representation, storage, rendering, and exchange. TMML structure and vocabulary are described, and examples of their use are presented.
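To make the idea of an XML vocabulary for traffic data concrete, the sketch below parses a small XML fragment with Python's standard library. The element and attribute names are hypothetical stand-ins, not the actual TMML vocabulary described in the paper.

```python
# Illustrative only: hypothetical TMML-like element names for roadway and
# demand data, read with Python's built-in XML parser.
import xml.etree.ElementTree as ET

fragment = """
<network>
  <link id="A-B" lanes="2" lengthFt="1200" speedLimitMph="45"/>
  <link id="B-C" lanes="3" lengthFt="800"  speedLimitMph="35"/>
  <demand linkId="A-B" vehiclesPerHour="950"/>
</network>
"""

root = ET.fromstring(fragment)
for link in root.findall("link"):
    print(link.get("id"), link.get("lanes"), "lanes,", link.get("speedLimitMph"), "mph")
```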


2015
Vol 48 (1)
pp. 301-305
Author(s):
Mark Könnecke
Frederick A. Akeroyd
Herbert J. Bernstein
Aaron S. Brewster
Stuart I. Campbell
...  

NeXus is an effort by an international group of scientists to define a common data exchange and archival format for neutron, X-ray and muon experiments. NeXus is built on top of the scientific data format HDF5 and adds domain-specific rules for organizing data within HDF5 files, together with a dictionary of well-defined domain-specific field names. The NeXus data format has two purposes. First, it defines a format that can serve as a container for all relevant data associated with a beamline, which is in itself a very important use case. Second, it defines standards, in the form of application definitions, for the exchange of data between applications. NeXus provides structures for raw experimental data as well as for processed data.
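The sketch below writes a small HDF5 file organised along NeXus-style lines, with class attributes on groups and a plottable signal. It is deliberately simplified; the actual rules and field names come from the NeXus base classes and application definitions.

```python
# Simplified sketch of a NeXus-style HDF5 file (consult the NeXus application
# definitions for the authoritative group names and required fields).
import h5py
import numpy as np

with h5py.File("scan.nxs", "w") as f:
    entry = f.create_group("entry")
    entry.attrs["NX_class"] = "NXentry"

    data = entry.create_group("data")
    data.attrs["NX_class"] = "NXdata"
    data.attrs["signal"] = "counts"          # marks the default plottable field

    data.create_dataset("counts", data=np.random.poisson(100, size=64))
    data.create_dataset("two_theta", data=np.linspace(10.0, 74.0, 64))
```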


2003
Vol 4 (1)
pp. 16-19
Author(s):
Sandra Orchard
Paul Kersey
Henning Hermjakob
Rolf Apweiler

The Proteomics Standards Initiative (PSI) aims to define community standards for data representation in proteomics and to facilitate data comparison, exchange and verification. Initially, the fields of protein–protein interactions (PPI) and mass spectrometry have been targeted, and the inaugural meeting of the PSI addressed the questions of data storage and exchange in both of these areas. The PPI group rapidly reached consensus on the minimum requirements for a data exchange model; an XML draft is now being produced. The mass spectrometry group has achieved major advances in the definition of a required data model, and working groups are currently taking these discussions further. A further meeting is planned for January 2003 to advance both of these projects.


2004
Vol 5 (2)
pp. 184-189
Author(s):
H. Schoof
R. Ernst
K. F. X. Mayer

The completion of the Arabidopsis genome and the large collections of other plant sequences generated in recent years have sparked extensive functional genomics efforts. However, the utilization of this data is inefficient, as data sources are distributed and heterogeneous and efforts at data integration are lagging behind. PlaNet aims to overcome the limitations of individual efforts as well as the limitations of heterogeneous, independent data collections. PlaNet is a distributed effort among European bioinformatics groups and plant molecular biologists to establish a comprehensive integrated database in a collaborative network. Objectives are the implementation of infrastructure and data sources to capture plant genomic information into a comprehensive, integrated platform. This will facilitate the systematic exploration of Arabidopsis and other plants. New methods for data exchange, database integration and access are being developed to create a highly integrated, federated data resource for research. The connection between the individual resources is realized with BioMOBY. BioMOBY provides an architecture for the discovery and distribution of biological data through web services. While knowledge is centralized, data is maintained at its primary source without a need for warehousing. To standardize nomenclature and data representation, ontologies and generic data models are defined in interaction with the relevant communities. Minimal data models should make it simple to allow broad integration, while inheritance allows detail and depth to be added to more complex data objects without losing integration. To allow expert annotation and keep databases curated, local and remote annotation interfaces are provided. Easy and direct access to all data is key to the project.


1987
Vol 36 (3)
pp. 297-312
Author(s):
J.O. Fellman
A.W. Eriksson

Linear regression models are used to explain the variations in twinning rates. Data sets from different countries are analysed, with maternal age, parity and marital status as the main regressors. The model-building technique is also used to study the secular decline in the twinning rate. The linear regression technique makes it possible to compare the effects of different factors, but the method requires sufficiently disaggregated data.
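A regression of this form can be written and fitted in a few lines; the sketch below uses synthetic numbers, not the authors' data, and keeps only two of the regressors for brevity.

```python
# Illustrative OLS fit of twinning rate on maternal age and parity
# (synthetic values, per 1000 maternities; not the authors' data).
import numpy as np

maternal_age  = np.array([24.0, 27.0, 30.0, 33.0, 36.0, 39.0])
parity        = np.array([1.0, 1.0, 2.0, 2.0, 3.0, 3.0])
twinning_rate = np.array([9.1, 10.4, 11.8, 12.9, 14.3, 15.0])

X = np.column_stack([np.ones_like(maternal_age), maternal_age, parity])
coeffs, *_ = np.linalg.lstsq(X, twinning_rate, rcond=None)
print("intercept, age effect, parity effect:", coeffs)
```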


Author(s):  
Alina Andreica

The paper proposes design principles for data representation and simplification in order to design cloud services for data exchange between various information systems. We use equivalence algorithms and a canonical representation in the cloud database. The solution we describe brings important advantages in communication and cooperation between organizations and other entities, with societal benefits, and can be provided within cloud architectures. The generic design principles we apply also carry over to the design of the interchange services.
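One way to read "canonical representation" in this setting is sketched below: records from different systems are normalised into a single canonical form so that an equivalence check reduces to a simple comparison. This is only an illustration of the idea, not the paper's algorithm.

```python
# Minimal sketch of canonicalisation for record exchange (illustrative only):
# normalise field names and values so that equivalent records compare equal.
def canonicalize(record: dict) -> tuple:
    normalized = {
        key.strip().lower(): str(value).strip().lower()
        for key, value in record.items()
    }
    return tuple(sorted(normalized.items()))

a = {"Name": "Ada Lovelace ", "Country": "UK"}
b = {"country": "uk", "name": "ada lovelace"}
print(canonicalize(a) == canonicalize(b))  # True: the two records are equivalent
```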

