Statismo - A framework for PCA based statistical models

2012
Author(s):
Marcel Lüthi
Remi Blanc
Thomas Albrecht
Tobias Gass
Orcun Goksel
...  

This paper describes Statismo, a framework for PCA-based statistical models. Statistical models are used to describe the variability of an object within a population, learned from a set of training samples. Originally developed to model shapes, statistical models are now increasingly used to model the variation in other kinds of data, such as images, volumetric meshes or deformation fields. Statismo has been developed with the following main goals in mind: 1) to provide generic tools for learning different kinds of PCA-based statistical models, such as shape, appearance or deformation models; 2) to make the exchange of such models among different research groups easier and to improve the reproducibility of the models; 3) to allow new model-building methods to be integrated into the framework easily. To achieve the first goal, all aspects that are specific to a given model and data representation are abstracted into a user-defined class. This not only makes it possible to use Statismo to create different kinds of PCA models, but also allows Statismo to be used with any toolkit and data format. To facilitate data exchange, Statismo defines a storage format based on HDF5, which includes all the information necessary to use the model, as well as meta-data about the model creation, which helps to make model building reproducible. The last goal is achieved by providing a clear separation between data management, model building and model representation. In addition to the standard method for building PCA models, Statismo already includes two recently proposed algorithms for building conditional models, as well as convenience tools for cross-validation studies. Although Statismo has been designed to be independent of a particular toolkit, special efforts have been made to make it directly useful for VTK and ITK. Besides supporting model building for most data representations used by VTK and ITK, it also provides an ITK transform class, which allows Statismo to be integrated with the ITK registration framework and thereby gives ready access to the ITK project's powerful methods for model fitting.
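The core computation behind such models is ordinary PCA on flattened training samples. The sketch below illustrates that idea and a simple HDF5 export; it is not Statismo's actual API, and the dataset names at the end are illustrative stand-ins for the schema the framework defines.

```python
# Minimal sketch of PCA model building (not Statismo's API). Each training
# sample is flattened into a vector, e.g. the concatenated vertex coordinates
# of a mesh or the voxels of a deformation field.
import numpy as np
import h5py

def build_pca_model(samples, n_components):
    """samples: (n_samples, n_features) matrix of flattened training data."""
    mean = samples.mean(axis=0)
    centered = samples - mean
    # SVD of the centered data yields the principal components.
    _, singular_values, components = np.linalg.svd(centered, full_matrices=False)
    variances = singular_values ** 2 / (samples.shape[0] - 1)
    return mean, components[:n_components], variances[:n_components]

def draw_instance(mean, components, variances, coefficients):
    """Generate a new instance from coefficients given in standard deviations."""
    return mean + (coefficients * np.sqrt(variances)) @ components

rng = np.random.default_rng(0)
training = rng.normal(size=(20, 300))            # 20 samples, 300 features each
mean, basis, var = build_pca_model(training, n_components=5)
shape = draw_instance(mean, basis, var, np.array([1.0, -0.5, 0.0, 0.0, 0.0]))

# Store the model in HDF5; the layout below is illustrative only, the real
# Statismo format defines its own groups and meta-data.
with h5py.File("model.h5", "w") as f:
    f["model/mean"] = mean
    f["model/pcaBasis"] = basis
    f["model/pcaVariance"] = var
```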

2021
Vol 22 (1)
Author(s):
Rahi Jain
Wei Xu

Background: Developing statistical and machine learning methods on studies with missing information is a ubiquitous challenge in real-world biological research. The strategies in the literature rely on either removing the samples with missing values, as in complete case analysis (CCA), or imputing the missing information, as in predictive mean matching (PMM) implemented in MICE. Limitations of these strategies include information loss and the question of how close the imputed values are to the true missing values. Further, when medical data arrive piecemeal, these strategies must wait for data collection to finish before a complete dataset is available for statistical modelling. Method and results: This study proposes a dynamic model updating (DMU) approach, a different strategy for developing statistical models with missing data. DMU uses only the information available in the dataset to build the statistical models. It segments the original dataset into small complete datasets using hierarchical clustering and then fits a Bayesian regression on each of them. Predictor estimates are updated using the posterior estimates from each dataset. The performance of DMU is evaluated on both simulated data and real studies, with results that are better than, or at par with, approaches such as CCA and PMM. Conclusion: The DMU approach provides an alternative to the existing strategies of information elimination and imputation for processing datasets with missing values. While the study applies the approach to continuous cross-sectional data, it can be extended to longitudinal, categorical and time-to-event biological data.
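The updating step can be pictured as sequential conjugate Bayesian linear regression: each complete sub-dataset refines the posterior over the coefficients. The sketch below illustrates that idea under a known-noise-variance assumption; it is not the authors' implementation, and the loop simply stands in for the sub-datasets produced by hierarchical clustering.

```python
# Illustrative sketch of the dynamic-model-updating idea (not the authors' code):
# update a Gaussian posterior over regression weights with one complete
# sub-dataset at a time, assuming a known noise variance for a conjugate update.
import numpy as np

def bayesian_update(prior_mean, prior_cov, X, y, noise_var=1.0):
    """One conjugate update of the posterior over linear regression weights."""
    prior_precision = np.linalg.inv(prior_cov)
    post_precision = prior_precision + X.T @ X / noise_var
    post_cov = np.linalg.inv(post_precision)
    post_mean = post_cov @ (prior_precision @ prior_mean + X.T @ y / noise_var)
    return post_mean, post_cov

rng = np.random.default_rng(1)
true_w = np.array([2.0, -1.0])
mean, cov = np.zeros(2), np.eye(2) * 10.0   # vague prior

# Each segment stands in for one small complete dataset found by clustering.
for _ in range(5):
    X = rng.normal(size=(30, 2))
    y = X @ true_w + rng.normal(scale=1.0, size=30)
    mean, cov = bayesian_update(mean, cov, X, y)

print("posterior mean of the coefficients:", mean)
```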


Author(s):  
Atul Jain
ShashiKant Gupta

JavaScript Object Notation (JSON) is a text-based data exchange format for structuring data between a server and a client-side web application. It is fundamentally a data format, so it is not limited to Ajax-style web applications and can be used with APIs to exchange or store information. However, an application rarely needs the whole dataset; it needs to extract the pieces relevant to a requirement that varies from person to person and over time. Searching and filtering a JSON string can be cumbersome, and most existing studies cover only basic operations for querying data from a JSON object. The aim of this paper is to survey methods across different technologies for searching and filtering JSON data. It reviews previous research on the JSONiq FLWOR expression and compares it with the json-query module on npm for extracting information from JSON. The research aims to retrieve data from JSON using advanced operators, with the help of a prototype built on the json-query package for Node.js. Thus, the data can be filtered more efficiently and accurately without depending on another programming language. The main objective is to filter JSON data in the same way that an SQL query filters relational data.
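As a rough illustration of the SQL-style filtering the paper aims at, the snippet below expresses such a query directly over parsed JSON in Python; the paper's prototype itself uses the json-query package for Node.js, and the records here are made up.

```python
# Minimal Python analogue of SQL-like filtering over JSON data; this is not
# the json-query module's API, only an illustration of the same operation.
import json

records = json.loads("""
[
  {"name": "Asha",  "city": "Pune",  "age": 31},
  {"name": "Ravi",  "city": "Delhi", "age": 24},
  {"name": "Meera", "city": "Pune",  "age": 28}
]
""")

# Roughly: SELECT name FROM records WHERE city = 'Pune' AND age > 25
result = [r["name"] for r in records if r["city"] == "Pune" and r["age"] > 25]
print(result)  # ['Asha', 'Meera']
```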


2002
Vol 1804 (1)
pp. 144-150
Author(s):
Kenneth G. Courage
Scott S. Washburn
Jin-Tae Kim

The proliferation of traffic software programs on the market has resulted in many highly specialized programs, each intended to analyze one or two specific items within a transportation network. Consequently, traffic engineers use multiple programs on a single project, which ironically has resulted in new inefficiency for the traffic engineer. Most of these programs deal with the same core set of data, for example, physical roadway characteristics, traffic demand levels, and traffic control variables. However, each program has its own format for saving data files. Therefore, these programs cannot share information directly or communicate with each other because of incompatible data formats. Thus, the traffic engineer is faced with manually reentering common data from one program into another. In addition to being inefficient, this also creates additional opportunities for data entry errors. XML is catching on rapidly as a means for exchanging data between two systems or users who deal with the same data but in different formats. Specific vocabularies have been developed for statistics, mathematics, chemistry, and many other disciplines. The traffic model markup language (TMML) is introduced as a resource for traffic model data representation, storage, rendering, and exchange. TMML structure and vocabulary are described, and examples of their use are presented.
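To make the idea of an XML vocabulary for traffic data concrete, the sketch below parses a small XML fragment with Python's standard library. The element and attribute names are hypothetical stand-ins, not the actual TMML vocabulary described in the paper.

```python
# Illustrative only: hypothetical TMML-like element names for roadway and
# demand data, read with Python's built-in XML parser.
import xml.etree.ElementTree as ET

fragment = """
<network>
  <link id="A-B" lanes="2" lengthFt="1200" speedLimitMph="45"/>
  <link id="B-C" lanes="3" lengthFt="800"  speedLimitMph="35"/>
  <demand linkId="A-B" vehiclesPerHour="950"/>
</network>
"""

root = ET.fromstring(fragment)
for link in root.findall("link"):
    print(link.get("id"), link.get("lanes"), "lanes,", link.get("speedLimitMph"), "mph")
```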


2015
Vol 48 (1)
pp. 301-305
Author(s):
Mark Könnecke
Frederick A. Akeroyd
Herbert J. Bernstein
Aaron S. Brewster
Stuart I. Campbell
...  

NeXus is an effort by an international group of scientists to define a common data exchange and archival format for neutron, X-ray and muon experiments. NeXus is built on top of the scientific data format HDF5 and adds domain-specific rules for organizing data within HDF5 files, together with a dictionary of well-defined domain-specific field names. The NeXus data format has two purposes. First, it defines a format that can serve as a container for all relevant data associated with a beamline, which is in itself a very important use case. Second, it defines standards, in the form of application definitions, for the exchange of data between applications. NeXus provides structures for raw experimental data as well as for processed data.
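The sketch below writes a small HDF5 file organised along NeXus-style lines, with class attributes on groups and a plottable signal. It is deliberately simplified; the actual rules and field names come from the NeXus base classes and application definitions.

```python
# Simplified sketch of a NeXus-style HDF5 file (consult the NeXus application
# definitions for the authoritative group names and required fields).
import h5py
import numpy as np

with h5py.File("scan.nxs", "w") as f:
    entry = f.create_group("entry")
    entry.attrs["NX_class"] = "NXentry"

    data = entry.create_group("data")
    data.attrs["NX_class"] = "NXdata"
    data.attrs["signal"] = "counts"          # marks the default plottable field

    data.create_dataset("counts", data=np.random.poisson(100, size=64))
    data.create_dataset("two_theta", data=np.linspace(10.0, 74.0, 64))
```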


2003
Vol 4 (1)
pp. 16-19
Author(s):
Sandra Orchard
Paul Kersey
Henning Hermjakob
Rolf Apweiler

The Proteomics Standards Initiative (PSI) aims to define community standards for data representation in proteomics and to facilitate data comparison, exchange and verification. Initially, the fields of protein–protein interactions (PPI) and mass spectrometry have been targeted, and the inaugural meeting of the PSI addressed the questions of data storage and exchange in both of these areas. The PPI group rapidly reached consensus on the minimum requirements for a data exchange model; an XML draft is now being produced. The mass spectrometry group has achieved major advances in the definition of a required data model, and working groups are currently taking these discussions further. A further meeting is planned for January 2003 to advance both of these projects.


2004
Vol 5 (2)
pp. 184-189
Author(s):
H. Schoof
R. Ernst
K. F. X. Mayer

The completion of the Arabidopsis genome and the large collections of other plant sequences generated in recent years have sparked extensive functional genomics efforts. However, the utilization of this data is inefficient, as data sources are distributed and heterogeneous and efforts at data integration are lagging behind. PlaNet aims to overcome the limitations of individual efforts as well as the limitations of heterogeneous, independent data collections. PlaNet is a distributed effort among European bioinformatics groups and plant molecular biologists to establish a comprehensive integrated database in a collaborative network. Objectives are the implementation of infrastructure and data sources to capture plant genomic information into a comprehensive, integrated platform. This will facilitate the systematic exploration of Arabidopsis and other plants. New methods for data exchange, database integration and access are being developed to create a highly integrated, federated data resource for research. The connection between the individual resources is realized with BioMOBY. BioMOBY provides an architecture for the discovery and distribution of biological data through web services. While knowledge is centralized, data is maintained at its primary source without a need for warehousing. To standardize nomenclature and data representation, ontologies and generic data models are defined in interaction with the relevant communities. Minimal data models should make it simple to allow broad integration, while inheritance allows detail and depth to be added to more complex data objects without losing integration. To allow expert annotation and keep databases curated, local and remote annotation interfaces are provided. Easy and direct access to all data is key to the project.


1987
Vol 36 (3)
pp. 297-312
Author(s):
J.O. Fellman
A.W. Eriksson

Linear regression models are used to explain the variations in twinning rates. Data sets from different countries are analysed, with maternal age, parity and marital status as the main regressors. The model-building technique is also used to study the secular decline in the twinning rate. The linear regression technique makes it possible to compare the effects of different factors, but the method requires sufficiently disaggregated data.
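A regression of this form can be written and fitted in a few lines; the sketch below uses synthetic numbers, not the authors' data, and keeps only two of the regressors for brevity.

```python
# Illustrative OLS fit of twinning rate on maternal age and parity
# (synthetic values, per 1000 maternities; not the authors' data).
import numpy as np

maternal_age  = np.array([24.0, 27.0, 30.0, 33.0, 36.0, 39.0])
parity        = np.array([1.0, 1.0, 2.0, 2.0, 3.0, 3.0])
twinning_rate = np.array([9.1, 10.4, 11.8, 12.9, 14.3, 15.0])

X = np.column_stack([np.ones_like(maternal_age), maternal_age, parity])
coeffs, *_ = np.linalg.lstsq(X, twinning_rate, rcond=None)
print("intercept, age effect, parity effect:", coeffs)
```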


Author(s):  
Alina Andreica

The paper proposes design principles for data representation and simplification in order to design cloud services for data exchange between various information systems. We use equivalence algorithms and a canonical representation in the cloud database. The solution we describe brings important advantages in communication and cooperation between organizations and other entities, with societal benefits, and can be provided within cloud architectures. The generic design principles we apply also carry over to the design of the interchange services.
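One way to read "canonical representation" in this setting is sketched below: records from different systems are normalised into a single canonical form so that an equivalence check reduces to a simple comparison. This is only an illustration of the idea, not the paper's algorithm.

```python
# Minimal sketch of canonicalisation for record exchange (illustrative only):
# normalise field names and values so that equivalent records compare equal.
def canonicalize(record: dict) -> tuple:
    normalized = {
        key.strip().lower(): str(value).strip().lower()
        for key, value in record.items()
    }
    return tuple(sorted(normalized.items()))

a = {"Name": "Ada Lovelace ", "Country": "UK"}
b = {"country": "uk", "name": "ada lovelace"}
print(canonicalize(a) == canonicalize(b))  # True: the two records are equivalent
```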

