The Ersatz Brain Project

The Ersatz Brain Project develops programming techniques and software applications for a brain-like computing system. Its brain-like hardware architecture design is based on a select set of ideas taken from the anatomy of mammalian neo-cortex. In common with other such attempts it is based on a massively parallel, two-dimensional array of CPUs and their associated memory. The design used in this project: 1) Uses an approximation to cortical computation called the network of networks which holds that the basic computing unit in the cortex is not a single neuron but groups of neurons working together in attractor networks; 2) Assumes connections and data representations in cortex are sparse; 3) Makes extensive use of local lateral connections and topographic data representations, and 4) Scales in a natural way from small groups of neurons to the entire cortical regions. The resulting system computes effectively using techniques such as local data movement, sparse data representation, sparse connectivity, temporal coincidence, and the formation of discrete “module assemblies.” The authors discuss recent neuroscience in relation to their physiological assumptions and a set of experiments displaying what appear to be “concept-like” ensemble based cells in human cortex.

Download Full-text

Igniting Harmonized Digital Clinical Quality Measurement through Terminology, CQL, and FHIR

Applied Clinical Informatics ◽

10.1055/s-0039-3402755 ◽

2020 ◽

Vol 11 (01) ◽

pp. 023-033

Author(s):

Robert C. McClure ◽

Caroline L. Macumber ◽

Julia L. Skapik ◽

Anne Marie Smith

Keyword(s):

Care Delivery ◽

Quality Measurement ◽

Quality Measures ◽

Data Representation ◽

Evidence Based ◽

Data Mapping ◽

Local Data ◽

Clinical Quality ◽

Care Gaps ◽

Data Elements

Abstract Background Electronic clinical quality measures (eCQMs) seek to quantify the adherence of health care to evidence-based standards. This requires a high level of consistency to reduce the effort of data collection and ensure comparisons are valid. Yet, there is considerable variability in local data capture, in the use of data standards and in implemented documentation processes, so organizations struggle to implement quality measures and extract data reliably for comparison across patients, providers, and systems. Objective In this paper, we discuss opportunities for harmonization within and across eCQMs; specifically, at the level of the measure concept, the logical clauses or phrases, the data elements, and the codes and value sets. Methods The authors, experts in measure development, quality assurance, standards and implementation, reviewed measure structure and content to describe the state of the art for measure analysis and harmonization. Our review resulted in the identification of four measure component levels for harmonization. We provide examples for harmonization of each of the four measure components based on experience with current quality measurement programs including the Centers for Medicare and Medicaid Services eCQM programs. Results In general, there are significant issues with lack of harmonization across measure concepts, logical phrases, and data elements. This magnifies implementation problems, confuses users, and requires more elaborate data mapping and maintenance. Conclusion Comparisons using semantically equivalent data are needed to accurately measure performance and reduce workflow interruptions with the aim of reducing evidence-based care gaps. It comes as no surprise that electronic health record designed for purposes other than quality improvement and used within a fragmented care delivery system would benefit greatly from common data representation, measure harmony, and consistency. We suggest that by enabling measure authors and implementers to deliver consistent electronic quality measure content in four key areas; the industry can improve quality measurement.

Download Full-text

Transforming Graph Data for Statistical Relational Learning

Journal of Artificial Intelligence Research ◽

10.1613/jair.3659 ◽

2012 ◽

Vol 45 ◽

pp. 363-441 ◽

Cited By ~ 19

Author(s):

R. A. Rossi ◽

L. K. McDowell ◽

D. W. Aha ◽

J. Neville

Keyword(s):

Relational Learning ◽

Statistical Relational Learning ◽

Data Representation ◽

Information Networks ◽

Biological Information ◽

Relational Data ◽

Symmetric Representation ◽

Graph Data ◽

Data Representations ◽

Relational Domains

Relational data representations have become an increasingly important topic due to the recent proliferation of network datasets (e.g., social, biological, information networks) and a corresponding increase in the application of Statistical Relational Learning (SRL) algorithms to these domains. In this article, we examine and categorize techniques for transforming graph-based relational data to improve SRL algorithms. In particular, appropriate transformations of the nodes, links, and/or features of the data can dramatically affect the capabilities and results of SRL algorithms. We introduce an intuitive taxonomy for data representation transformations in relational domains that incorporates link transformation and node transformation as symmetric representation tasks. More specifically, the transformation tasks for both nodes and links include (i) predicting their existence, (ii) predicting their label or type, (iii) estimating their weight or importance, and (iv) system- atically constructing their relevant features. We motivate our taxonomy through detailed examples and use it to survey competing approaches for each of these tasks. We also dis- cuss general conditions for transforming links, nodes, and features. Finally, we highlight challenges that remain to be addressed.

Download Full-text

Data Layout and Data Representation Optimizations to Reduce Data Movement Keynote

2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) ◽

10.1109/cgo51591.2021.9370325 ◽

2021 ◽

Author(s):

Mary Hall

Keyword(s):

Data Representation ◽

Data Layout ◽

Data Movement

Download Full-text

Evaluation of Methods for Protein Representation Learning: A Quantitative Analysis

10.1101/2020.10.28.359828 ◽

2020 ◽

Author(s):

Serbulent Unsal ◽

Heval Ataş ◽

Muammer Albayrak ◽

Kemal Turhan ◽

Aybar C. Acar ◽

...

Keyword(s):

Deep Learning ◽

Language Processing ◽

Complex Traits ◽

Representation Learning ◽

Data Representation ◽

Language Models ◽

Learning Methods ◽

Advantages And Disadvantages ◽

Data Representations ◽

Novel Methods

AbstractData-centric approaches have been utilized to develop predictive methods for elucidating uncharacterized aspects of proteins such as their functions, biophysical properties, subcellular locations and interactions. However, studies indicate that the performance of these methods should be further improved to effectively solve complex problems in biomedicine and biotechnology. A data representation method can be defined as an algorithm that calculates numerical feature vectors for samples in a dataset, to be later used in quantitative modelling tasks. Data representation learning methods do this by training and using a model that employs statistical and machine/deep learning algorithms. These novel methods mostly take inspiration from the data-driven language models that have yielded ground-breaking improvements in the field of natural language processing. Lately, these learned data representations have been applied to the field of protein informatics and have displayed highly promising results in terms of extracting complex traits of proteins regarding sequence-structure-function relations. In this study, we conducted a detailed investigation over protein representation learning methods, by first categorizing and explaining each approach, and then conducting benchmark analyses on; (i) inferring semantic similarities between proteins, (ii) predicting ontology-based protein functions, and (iii) classifying drug target protein families. We examine the advantages and disadvantages of each representation approach over the benchmark results. Finally, we discuss current challenges and suggest future directions. We believe the conclusions of this study will help researchers in applying machine/deep learning-based representation techniques on protein data for various types of predictive tasks. Furthermore, we hope it will demonstrate the potential of machine learning-based data representations for protein science and inspire the development of novel methods/tools to be utilized in the fields of biomedicine and biotechnology.

Download Full-text

Statismo - A framework for PCA based statistical models

The Insight Journal ◽

10.54294/4eli51 ◽

2012 ◽

Author(s):

Marcel Lüthi ◽

Remi Blanc ◽

Thomas Albrecht ◽

Tobias Gass ◽

Orcun Goksel ◽

...

Keyword(s):

Statistical Models ◽

Data Exchange ◽

Model Building ◽

Model Fitting ◽

Data Representation ◽

Data Format ◽

Training Samples ◽

Data Representations ◽

Easy Integration ◽

Storage Format

This paper describes the Statismo framework, which is a framework for PCA based statistical models.Statistical models are used to describe the variability of an object within a population, learned from a set of training samples. Originally developed to model shapes, statistical models are now increasingly used to model the variation in different kind of data, such as for example images, volumetric meshes or deformation fields. Statismo has been developed with the following main goals in mind: 1) To provide generic tools for learning different kinds of PCA based statistical models, such as shape, appearance or deformations models. 2) To make the exchange of such models easier among different research groups and to improve the reproducibility of the models. 3) To allow for easy integration of new methods for model building into the framework. To achieve the first goal, we have abstracted all the aspects that are specific to a given model and data representation, into a user defined class. This does not only make it possible to use Statismo to create different kinds of PCA models, but also allows Statismo to be used with any toolkit and data format. To facilitate data exchange, Statismo defines a storage format based on HDF5, which includes all the information necessary to use the model, as well as meta-data about the model creation, which helps to make model building reproducible. The last goal is achieved by providing a clear separation between data management, model building and model representation. In addition to the standard method for building PCA models, Statismo already includes two recently proposed algorithms for building conditional models, as well as convenience tools for facilitating cross-validation studies. Although Statismo has been designed to be independent of a particular toolkit, special efforts have been made to make it directly useful for VTK and ITK. Besides supporting model building for most data representations used by VTK and ITK, it also provides an ITK transform class, which allows for the integration of Statismo with the ITK registration framework. This leverages the efforts from the ITK project to readily access powerful methods for model fitting.

Download Full-text

GRID DEPLOYMENT OF BIOINFORMATICS APPLICATIONS: A CASE STUDY IN PROTEIN SIMILARITY DETERMINATION

Parallel Processing Letters ◽

10.1142/s0129626404001817 ◽

2004 ◽

Vol 14 (02) ◽

pp. 163-176 ◽

Cited By ~ 3

Author(s):

MATTEO COMIN ◽

CARLO FERRARI ◽

CONCETTINA GUERRA

Keyword(s):

Structural Similarity ◽

Data Representation ◽

Service Model ◽

Grid Infrastructure ◽

Data Set ◽

Structure Comparison ◽

Standard Tool ◽

Programming Techniques

In this paper we present a scenario for the grid immersion of the procedures that solve the protein structural similarity determination problem. The emphasis is on the way various computational components and data resources are tied together into a workflow to be executed on a grid. The grid deployment has been organized according to the bag-of-service model: a set of different modules (with their data set) is made available to the application designers. Each module deals with a specific subproblem using a proper protein data representation. At the design level, the process of task selection produces a first general workflow that establishes which subproblems need to be solved and their temporal relations. A further refinement requires to select a procedure for each previously identified task that solves it: the choice is made among different available methods and representations. The final outcome is an instance of the workflow ready for execution on a grid. Our approach to protein structure comparison is based on a combination of indexing and dynamic programming techniques to achieve fast and reliable matching. All the components have been implemented on a grid infrastructure using Globus, and the overall tool has been tested by choosing proteins from different fold classes. The obtained results are compared against SCOP, a standard tool for the classification of known proteins.

Download Full-text

PENGEMBANGAN ACADEMIC INFORMATION DASHBOARD EXECUTIVE (A-INDEX) DENGAN PENTAHO DATA INTEGRATION DAN QLIKVIEW

Telematika ◽

10.31315/telematika.v13i1.1716 ◽

2016 ◽

Vol 13 (1) ◽

pp. 17

Author(s):

Herry Sofyan ◽

Simon Pulung Nugroho

Keyword(s):

Open Source Software ◽

Data Representation ◽

Application Development ◽

Incremental Method ◽

Data Representations ◽

Student Profiles ◽

Student Graduation ◽

Academic Information ◽

Software Development Methodologies ◽

Free Open Source

Information Dashboard Executive (INDEX) is a visual representation of data in the form of dashboards that are used to get a snapshot of performance in every business process so as to facilitate the executives took a quick response. Pentaho is a BI application is free open source software (FOSS) and runs on top of the Java platform. QlikView is focused on simplifying decision making for business users across the organization. Processing needs to be able to optimize data analysis functions of PDPT is developing an interactive dashboard visualization data. The dashboard will be built using the data pentaho integration as a gateway connecting between database applications with Data PDPT and data visualization are developed by using QlikView. Software development methodologies in application development work is incremental method which is a combination of linear and iterative method with parallel modifications in the process the iterative process so that the project done faster.The results of this study are is the data representation of the modeling query is constructed able to describe the activity / student profiles in a certain semester. The data representations constructed include active distribution per class, per student graduation force distribution, distribution of student status, distribution provinces of origin of students per class, the distribution of the number of class participants, distribution of credits lecturers and distribution of subject.

Download Full-text

Efficient Local Data Movement in Shared-Memory Multiprocessor Systems

10.21236/ada604009 ◽

1987 ◽

Author(s):

Shin-Yuan Tzou ◽

David P. Anderson ◽

G. S. Graham

Keyword(s):

Shared Memory ◽

Multiprocessor Systems ◽

Local Data ◽

Data Movement ◽

Shared Memory Multiprocessor

Download Full-text

A COVID-19 Reopening Readiness Index: The Key to Opening up the Economy

10.1101/2020.05.22.20110577 ◽

2020 ◽

Cited By ~ 3

Author(s):

Eunju Suh ◽

Mahdi Alhaery

Keyword(s):

Economic Activity ◽

Healthcare Systems ◽

Data Driven ◽

Local Data ◽

Working Together ◽

Great Pressure ◽

State And Local ◽

Governmental Response ◽

Opening Up ◽

Made In

SummaryWith respect to reopening the economy as a result of the COVID-19 restrictions, governmental response and messaging have been inconsistent, and policies have varied by state as this is a uniquely polarizing topic. Considering the urgent need to return to normalcy, a method was devised to determine the degree of progress any state has made in containing the spread of COVID-19. Using various measures for each state including mortality, hospitalizations, testing capacity, number of infections and infection rate has allowed for the creation of a composite COVID -19 Reopening Readiness Index. This index can serve as a comprehensive reliable and simple-to-use metric to assess the level of containment in any state and to determine the level of risk in further opening. As states struggle to contain the outbreak and at the same time face great pressure in resuming economic activity, an index that provides a data-driven and objective insight is urgently needed.BackgroundWe are in the midst of a once-in-a-lifetime pandemic. All levels of society and governments are working together to “flatten the curve” of the infection and slow the spread of COVID-19. The universal goal is to mitigate its adverse effects on everyday life across the globe and to reduce the number of fatalities. While a vaccine is being developed, the aim is to limit the number of hospitalizations as not to overwhelm healthcare systems in any given city or country. It is well documented that certain regions and localities are more affected than others. It is imperative that containment efforts utilize state and local data at their disposal to understand the readiness of any given area prior to opening its economy, and the level of restrictions that are needed.

Download Full-text

AnScalable Matrix Computing Unit Architecture for FPGA,and SCUMO User Design Interface

Electronics ◽

10.3390/electronics8010094 ◽

2019 ◽

Vol 8 (1) ◽

pp. 94 ◽

Cited By ~ 4

Author(s):

Asgar Abbaszadeh ◽

Taras Iakymchuk ◽

Manuel Bataller-Mompeán ◽

Jose Francés-Villora ◽

Alfredo Rosado-Muñoz

Keyword(s):

Machine Learning Algorithms ◽

Circulant Matrices ◽

Clock Frequency ◽

Design Environment ◽

Data Movement ◽

Computing Unit ◽

Matrix Operations ◽

User Design ◽

The Individual ◽

Matrix Vector

High dimensional matrix algebra is essential in numerous signal processing and machine learning algorithms. This work describes a scalable square matrix-computing unit designed on the basis of circulant matrices. It optimizes data flow for the computation of any sequence of matrix operations removing the need for data movement for intermediate results, together with the individual matrix operations’ performance in direct or transposed form (the transpose matrix operation only requires a data addressing modification). The allowed matrix operations are: matrix-by-matrix addition, subtraction, dot product and multiplication, matrix-by-vector multiplication, and matrix by scalar multiplication. The proposed architecture is fully scalable with the maximum matrix dimension limited by the available resources. In addition, a design environment is also developed, permitting assistance, through a friendly interface, from the customization of the hardware computing unit to the generation of the final synthesizable IP core. For N × N matrices, the architecture requires N ALU-RAM blocks and performs O ( N 2 ) , requiring N 2 + 7 and N + 7 clock cycles for matrix-matrix and matrix-vector operations, respectively. For the tested Virtex7 FPGA device, the computation for 500 × 500 matrices allows a maximum clock frequency of 346 MHz, achieving an overall performance of 173 GOPS. This architecture shows higher performance than other state-of-the-art matrix computing units.

Download Full-text