Integrating Diverse Data Sets to Assess the Risks of Airborne Pollutants

Author(s):  
R. O. McClellan ◽  
R. G. Cuddihy ◽  
W. C. Griffith ◽  
J. L. Mauderly
2011 ◽  
Vol 84 (8) ◽  
Author(s):  
Tracy Holsclaw ◽  
Ujjaini Alam ◽  
Bruno Sansó ◽  
Herbie Lee ◽  
Katrin Heitmann ◽  
...  

2017 ◽  
Vol 6 (2) ◽  
pp. 12
Author(s):  
Abhith Pallegar

The objective of this paper is to elucidate how interconnected biological systems can be better mapped and understood using the rapidly growing area of Big Data. We can harness network efficiencies by analyzing diverse medical data and probe how to effectively lower the economic cost of finding cures for rare diseases. Most rare diseases are due to genetic abnormalities, and many forms of cancer develop due to genetic mutations. Finding cures for rare diseases requires us to understand the biology and biological processes of the human body. In this paper, we explore what the historical shift of focus from pharmacology to biotechnology means for accelerating biomedical solutions. With biotechnology playing a leading role in medical research, we explore how network efficiencies can be harnessed by strengthening the existing knowledge base. Studying rare or orphan diseases provides rich observable statistical data that can be leveraged for finding solutions. Network effects can be drawn from working with diverse data sets, enabling us to generate the highest-quality medical knowledge with the fewest resources. This paper examines gene manipulation technologies such as Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) that can prevent diseases of genetic origin. We further explore the role of the emerging field of Big Data in analyzing large quantities of medical data, aided by the rapid growth of computing power, and some of the network efficiencies gained from this endeavor.


2020 ◽  
Author(s):  
Annika Tjuka ◽  
Robert Forkel ◽  
Johann-Mattis List

Psychologists and linguists have collected a great diversity of data for word and concept properties. In psychology, many studies accumulate norms and ratings, such as word frequencies or age of acquisition, often for a large number of words. Linguistics, on the other hand, provides valuable insights into relations between word meanings. We present a collection of these data sets for norms, ratings, and relations covering different languages: ‘NoRaRe.’ To enable a comparison between the diverse data types, we established workflows that facilitate the expansion of the database. A web application allows convenient access to the data (https://digling.org/norare/). Furthermore, a software API ensures consistent data curation by providing tests to validate the data sets. The NoRaRe collection is linked to the database curated by the Concepticon project (https://concepticon.clld.org), which offers a reference catalog of unified concept sets. The link between words in the data sets and the Concepticon concept sets makes cross-linguistic comparison possible. In three case studies, we test the validity of our approach, the accuracy of our workflow, and the applicability of our database. The results indicate that the NoRaRe database can be applied to the study of word properties across multiple languages. The data can be used by psychologists and linguists to benefit from the knowledge rooted in both research disciplines.
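As an illustration of the kind of comparison this concept linkage enables, the following minimal sketch joins two hypothetical norm tables on a shared Concepticon concept identifier. The column names, identifiers, and values are invented for illustration and do not reproduce the NoRaRe data sets or their API.

```python
# A minimal sketch of cross-linguistic comparison via a shared concept key.
# All identifiers and values below are hypothetical.
import pandas as pd

# Hypothetical English frequency norms keyed by a Concepticon-style concept ID.
english = pd.DataFrame({
    "concepticon_id": [906, 1247, 72],
    "english_word": ["dog", "water", "mother"],
    "log_frequency": [4.8, 5.1, 4.9],
})

# Hypothetical German age-of-acquisition ratings keyed the same way.
german = pd.DataFrame({
    "concepticon_id": [906, 1247, 72],
    "german_word": ["Hund", "Wasser", "Mutter"],
    "age_of_acquisition": [2.1, 1.9, 1.5],
})

# Because both tables reference the same unified concept sets, a simple join
# aligns word properties across languages.
merged = english.merge(german, on="concepticon_id")
print(merged[["english_word", "german_word", "log_frequency", "age_of_acquisition"]])
```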


Author(s):  
Logeswari Shanmugam ◽  
Premalatha K.

Biomedical literature is the primary repository of biomedical knowledge, and PubMed is the most comprehensive database for collecting, organizing, and analyzing this textual knowledge. The high dimensionality of natural language text makes text data quite noisy and sparse in the vector space. Hence, data preprocessing and feature selection are important steps in text processing. Ontologies select the meaningful terms semantically associated with the concepts in a document to reduce the dimensionality of the original text. In this chapter, semantic-based indexing approaches with cognitive search are proposed, which make use of domain ontologies to extract relevant information from large and diverse data sets for users.
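The sketch below illustrates the general idea of ontology-guided term selection: only terms that map to a domain-ontology concept are retained, so documents are indexed in a lower-dimensional concept space. The tiny ontology and sample sentence are invented for illustration and are not taken from the chapter.

```python
# A minimal sketch of ontology-based dimensionality reduction for text.
# The ontology and document below are hypothetical.
from collections import Counter

# Hypothetical domain ontology: concept -> surface terms associated with it.
ontology = {
    "neoplasm": {"tumor", "tumour", "carcinoma", "neoplasm"},
    "therapy": {"chemotherapy", "radiotherapy", "treatment"},
}
# Reverse index: term -> concept.
term_to_concept = {t: c for c, terms in ontology.items() for t in terms}

document = ("the patient received chemotherapy after the carcinoma was "
            "detected and treatment continued for six weeks").split()

# Index the document by ontology concepts instead of raw tokens,
# discarding terms with no semantic mapping.
concept_vector = Counter(term_to_concept[tok] for tok in document
                         if tok in term_to_concept)
print(concept_vector)   # Counter({'therapy': 2, 'neoplasm': 1})
```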


Author(s):  
Jung Hwan Oh ◽  
Jeong Kyu Lee ◽  
Sae Hwang

Data mining, which is defined as the process of extracting previously unknown knowledge and detecting interesting patterns from a massive set of data, has been an active research area. As a result, several commercial products and research prototypes are available nowadays. However, most of these studies have focused on corporate data, typically in an alpha-numeric database, and relatively less work has been pursued on the mining of multimedia data (Zaïane, Han, & Zhu, 2000). Digital multimedia differs from previous forms of combined media in that the bits representing texts, images, audio, and video can be treated as data by computer programs (Simoff, Djeraba, & Zaïane, 2002). One facet of these diverse data, in terms of underlying models and formats, is that they are synchronized and integrated and hence can be treated as integrated data records. The collection of such integrated data records constitutes a multimedia data set. The challenge of extracting meaningful patterns from such data sets has led to research and development in the area of multimedia data mining. This is a challenging field due to the non-structured nature of multimedia data. Such ubiquitous data are required in many applications such as financial, medical, advertising, and Command, Control, Communications and Intelligence (C3I) (Thuraisingham, Clifton, Maurer, & Ceruti, 2001). Multimedia databases are widespread and multimedia data sets are extremely large. There are tools for managing and searching within such collections, but the need for tools to extract hidden and useful knowledge embedded within multimedia data is becoming critical for many decision-making applications.
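One way to picture the "integrated data record" idea is as a single row that bundles the synchronized text, image, and audio channels. The minimal sketch below is only an illustration of that notion; the field names and feature choices are hypothetical and are not drawn from the cited work.

```python
# A minimal sketch of a synchronized multimedia data record; fields are illustrative.
from dataclasses import dataclass
from typing import List

@dataclass
class MultimediaRecord:
    timestamp: float                 # synchronization point, in seconds
    transcript: str                  # text channel
    image_features: List[float]      # e.g. a color histogram for the key frame
    audio_features: List[float]      # e.g. spectral coefficients
    label: str = "unknown"           # optional annotation for mining tasks

# A multimedia data set is then a collection of such integrated records.
dataset = [
    MultimediaRecord(0.0, "opening scene", [0.2, 0.5, 0.3], [1.1, 0.4], "intro"),
    MultimediaRecord(5.0, "crowd cheering", [0.1, 0.7, 0.2], [2.3, 0.9], "event"),
]
print(len(dataset), dataset[0].label)
```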


Author(s):  
Yu-Cheng Chou ◽  
David Ko ◽  
Harry H. Cheng ◽  
Roger L. Davis ◽  
Bo Chen

Two challenging problems in the area of scientific computation are long computation times and large-scale, distributed, and diverse data sets. As the scale of science and engineering applications rapidly expands, these two problems become more manifest than ever. This paper presents the concept of Mobile Agent-based Computational Steering (MACS) for distributed simulation. MACS allows users to apply new or modified algorithms to a running application by altering certain sections of the program code without the need to stop execution and recompile the program code. The concept has been validated through an application for dynamic CFD data post-processing. The validation results show that MACS has great potential to enhance the productivity and data manageability of large-scale distributed computational systems.
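The toy sketch below illustrates only the steering idea described above, namely swapping a post-processing routine while the computation keeps running; it is not the MACS implementation itself, which is mobile-agent based, and the routines and values are invented for illustration.

```python
# A toy illustration of computational steering: replace a post-processing
# routine mid-run without stopping or recompiling the program.
import time

def postprocess_v1(step, value):
    return f"step {step}: raw value {value:.3f}"

def postprocess_v2(step, value):
    # A "new algorithm" injected mid-run, e.g. reporting a scaled quantity.
    return f"step {step}: scaled value {value * 2.0:.3f}"

current_postprocess = postprocess_v1

for step in range(6):
    value = step * 0.5                       # stand-in for a simulation update
    print(current_postprocess(step, value))  # post-process the running data
    if step == 2:
        # Steering event: swap the routine while execution continues.
        current_postprocess = postprocess_v2
    time.sleep(0.01)
```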


Author(s):  
Brian Hoeschen ◽  
Darcy Bullock ◽  
Mark Schlappi

Historically, stopped delay was used to characterize the operation of intersection movements because it was relatively easy to measure. During the past decade, the traffic engineering community has moved away from using stopped delay and now uses control delay. That measurement is more precise but quite difficult to extract from large data sets if strict definitions are used to derive the data. This paper evaluates two procedures for estimating control delay. The first is based on a historical approximation that control delay is 30% larger than stopped delay. The second is new and based on segment delay. The procedures are applied to a diverse data set collected in Phoenix, Arizona, and compared with control delay calculated by using the formal definition. The new approximation was observed to be better than the historical stopped delay procedure; it provided an accurate prediction of control delay. Because it is an approximation, this methodology would be most appropriately applied to large data sets collected from travel time studies for ranking and prioritizing intersections for further analysis.
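The first procedure amounts to a simple scaling, as the numeric sketch below shows: control delay is estimated as 30% larger than stopped delay and compared against a measured value. The delay values are hypothetical, and the paper's segment-delay procedure is not reproduced here.

```python
# A minimal numeric sketch of the historical approximation: control delay
# estimated as 1.3 times stopped delay. All values are hypothetical.
stopped_delay_s = [12.0, 25.0, 40.0, 8.5]             # stopped delay (s/veh)
measured_control_delay_s = [16.1, 31.9, 54.0, 10.8]   # hypothetical measured values

for stopped, measured in zip(stopped_delay_s, measured_control_delay_s):
    estimate = 1.30 * stopped               # historical 30% approximation
    error = estimate - measured
    print(f"stopped={stopped:5.1f}s  estimate={estimate:5.1f}s  "
          f"measured={measured:5.1f}s  error={error:+.1f}s")
```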


2019 ◽  
Vol 47 (1) ◽  
pp. 88-96 ◽  
Author(s):  
Juli M. Bollinger ◽  
Abhi Sanka ◽  
Lena Dolman ◽  
Rachel G. Liao ◽  
Robert Cook-Deegan

Accessing BRCA1/2 data facilitates the detection of disease-associated variants, which is critical to informing clinical management of risks. BRCA1/2 data sharing is complex, and many practices exist. We describe current BRCA1/2 data-sharing practices, in the United States and globally, and discuss obstacles and incentives to sharing, based on 28 interviews with personnel at U.S. and non-U.S. clinical laboratories and databases. Our examination of the BRCA1/2 data-sharing landscape demonstrates strong support for and robust sharing of BRCA1/2 data around the world, increasing global access to diverse data sets.


2020 ◽  
Author(s):  
Mark Naylor ◽  
Kirsty Bayliss ◽  
Finn Lindgren ◽  
Francesco Serafini ◽  
Ian Main

Many earthquake forecasting approaches have developed bespoke codes to model and forecast the spatio-temporal evolution of seismicity. At the same time, the statistics community has been working on a range of point process modelling codes. For example, motivated by ecological applications, inlabru models spatio-temporal point processes as a log-Gaussian Cox process and is implemented in R. Here we present an initial implementation of inlabru to model seismicity. This fully Bayesian approach is computationally efficient because it uses a nested Laplace approximation such that posteriors are assumed to be Gaussian, so that their means and standard deviations can be estimated deterministically rather than having to be constructed through sampling. Further, building on existing packages in R for handling spatial data, it can construct covariate maps from diverse data types, such as fault maps, in an intuitive and simple manner.

Here we present an initial application to the California earthquake catalogue to determine the relative performance of different data sets for describing the spatio-temporal evolution of seismicity.
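For readers unfamiliar with the model class, the sketch below simulates a log-Gaussian Cox process on a one-dimensional grid: the point-process intensity is the exponential of a latent Gaussian random field. This is only an illustration of the idea in numpy under assumed parameter values; the work described above uses the R package inlabru and its approximate Bayesian inference, which is not reproduced here.

```python
# A minimal sketch of a log-Gaussian Cox process: intensity = exp(Gaussian field).
# Grid size, covariance parameters, and baseline are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 50                                   # grid cells along one spatial axis
x = np.linspace(0.0, 1.0, n)

# Squared-exponential covariance for the latent Gaussian field.
length_scale, variance = 0.2, 1.0
cov = variance * np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / length_scale ** 2)
latent = rng.multivariate_normal(np.zeros(n), cov + 1e-8 * np.eye(n))

# LGCP intensity and expected event counts per grid cell (cell width = 1/n).
intensity = np.exp(-1.0 + latent)        # assumed baseline log-intensity of -1.0
expected_counts = intensity * (1.0 / n)
events = rng.poisson(expected_counts)    # simulated event counts per cell
print("total simulated events:", events.sum())
```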

