The Computer Science Ontology: A Comprehensive Automatically-Generated Taxonomy of Research Areas

2020, Vol 2 (3), pp. 379-416
Author(s): Angelo A. Salatino, Thiviyan Thanapalasingam, Andrea Mannocci, Aliaksandr Birukou, Francesco Osborne, ...

Ontologies of research areas are important tools for characterizing, exploring, and analyzing the research landscape. Some fields of research are comprehensively described by large-scale taxonomies, e.g., MeSH in Biology and PhySH in Physics. Conversely, current Computer Science taxonomies are coarse-grained and tend to evolve slowly. For instance, the ACM classification scheme contains only about 2K research topics and the last version dates back to 2012. In this paper, we introduce the Computer Science Ontology (CSO), a large-scale, automatically generated ontology of research areas, which includes about 14K topics and 162K semantic relationships. It was created by applying the Klink-2 algorithm to a very large data set of 16M scientific articles. CSO presents two main advantages over the alternatives: i) it includes a very large number of topics that do not appear in other classifications, and ii) it can be updated automatically by running Klink-2 on recent corpora of publications. CSO powers several tools adopted by the editorial team at Springer Nature and has been used to enable a variety of solutions, such as classifying research publications, detecting research communities, and predicting research trends. To facilitate the uptake of CSO, we have also released the CSO Classifier, a tool for automatically classifying research papers, and the CSO Portal, a Web application that enables users to download, explore, and provide granular feedback on CSO. Through the portal, users can navigate and visualize sections of the ontology, rate topics and relationships, and suggest missing ones. The portal will support the publication of and access to regular new releases of CSO, with the aim of providing a comprehensive resource to the various research communities engaged with scholarly data.
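The abstract does not describe how the CSO Classifier uses the ontology's hierarchy. One step such a tool can plausibly perform, sketched here under assumptions, is enriching the topics detected in a paper with all of their broader topics by walking (broader, narrower) relationships transitively. The edge list below is made up for illustration, and `build_parents`, `ancestors`, and `enhance` are hypothetical names, not the CSO Classifier's API.

```python
from collections import defaultdict

def build_parents(edges):
    """Map each topic to its direct broader topics.

    edges: iterable of (broader, narrower) pairs (hypothetical format).
    """
    parents = defaultdict(set)
    for broader, narrower in edges:
        parents[narrower].add(broader)
    return parents

def ancestors(topic, parents):
    """Return every broader topic reachable from `topic` (iterative DFS)."""
    seen = set()
    stack = [topic]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:  # also guards against cycles
                seen.add(parent)
                stack.append(parent)
    return seen

def enhance(topics, parents):
    """Add all broader topics to a set of topics detected in a paper."""
    result = set(topics)
    for topic in topics:
        result |= ancestors(topic, parents)
    return result

# Illustrative, made-up hierarchy (not taken from CSO itself).
EDGES = [
    ("computer science", "artificial intelligence"),
    ("artificial intelligence", "machine learning"),
    ("machine learning", "deep learning"),
    ("computer science", "databases"),
]
parents = build_parents(EDGES)
print(sorted(enhance({"deep learning", "databases"}, parents)))
```

Using an explicit stack rather than recursion keeps the walk safe for deep hierarchies, and the `seen` set prevents revisiting topics reachable by more than one path.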

Author(s): Lior Shamir

Abstract. Several recent observations using large data sets of galaxies showed a non-random distribution of the spin directions of spiral galaxies, even when the galaxies are too far from each other to have gravitational interaction. Here, a data set of $\sim 8.7\cdot 10^3$ spiral galaxies imaged by the Hubble Space Telescope (HST) is used to test and profile a possible asymmetry between galaxy spin directions. The asymmetry between galaxies with opposite spin directions is compared to the asymmetry of galaxies from the Sloan Digital Sky Survey (SDSS). The two data sets contain different galaxies at different redshift ranges, and each data set was annotated using a different annotation method. The results show that both data sets exhibit a similar asymmetry in the COSMOS field, which is covered by both telescopes. Fitting the asymmetry of the galaxies to a cosine dependence shows a dipole axis with probabilities of $\sim 2.8\sigma$ and $\sim 7.38\sigma$ in HST and SDSS, respectively. The most likely dipole axis identified in the HST galaxies is at $(\alpha=78^\circ,\delta=47^\circ)$ and is well within the $1\sigma$ error range of the most likely dipole axis in the SDSS galaxies with $z>0.15$, identified at $(\alpha=71^\circ,\delta=61^\circ)$.
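The abstract does not give the fitting procedure. A minimal sketch, assuming each galaxy's spin direction (+1 or -1) is modelled as d·cos(φ), where φ is the angular distance between the galaxy and a candidate axis, and that the most likely axis is the one minimizing χ² over a coarse grid of (α, δ). The function names and the grid step are hypothetical, and the conversion of χ² into a σ significance (for example, by comparison with randomized spin directions) is omitted.

```python
import math

def angular_cos(ra1, dec1, ra2, dec2):
    """Cosine of the angular distance between two sky positions (degrees)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    return (math.sin(dec1) * math.sin(dec2)
            + math.cos(dec1) * math.cos(dec2) * math.cos(ra1 - ra2))

def fit_amplitude(galaxies, axis_ra, axis_dec):
    """Least-squares fit of spin ~ d * cos(phi) for one candidate axis.

    galaxies: list of (ra, dec, spin), spin = +1 or -1 (e.g. clockwise or
    counterclockwise). Returns the fitted amplitude d and the chi-square.
    """
    cs = [angular_cos(ra, dec, axis_ra, axis_dec) for ra, dec, _ in galaxies]
    spins = [s for _, _, s in galaxies]
    sxx = sum(c * c for c in cs)
    d = sum(c * s for c, s in zip(cs, spins)) / sxx if sxx > 0 else 0.0
    chi2 = sum((s - d * c) ** 2 for c, s in zip(cs, spins))
    return d, chi2

def best_axis(galaxies, step=10):
    """Coarse grid search; returns (chi2, ra, dec, d) of the best axis."""
    best = None
    for ra in range(0, 360, step):
        for dec in range(-90, 91, step):
            d, chi2 = fit_amplitude(galaxies, ra, dec)
            if best is None or chi2 < best[0]:
                best = (chi2, ra, dec, d)
    return best
```

Because cos(φ) flips sign at the antipode, an axis and its opposite give the same χ² with amplitudes of opposite sign, so a dipole axis is only defined up to that symmetry unless the sign of d is fixed.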


2021, Vol 12 (1)
Author(s): Esteban Moro, Dan Calacci, Xiaowen Dong, Alex Pentland

Abstract. Traditional understanding of urban income segregation is largely based on static coarse-grained residential patterns. However, these do not capture the income segregation experience implied by the rich social interactions that happen in places that may relate to individual choices, opportunities, and mobility behavior. Using a large-scale high-resolution mobility data set of 4.5 million mobile phone users and 1.1 million places in 11 large American cities, we show that income segregation experienced in places and by individuals can differ greatly even within close spatial proximity. To further understand these fine-grained income segregation patterns, we introduce a Schelling extension of a well-known mobility model, and show that experienced income segregation is associated with an individual’s tendency to explore new places (place exploration) as well as places with visitors from different income groups (social exploration). Interestingly, while the latter is more strongly associated with demographic characteristics, the former is more strongly associated with mobility behavioral variables. Our results suggest that mobility behavior plays an important role in experienced income segregation of individuals. To measure this form of income segregation, urban researchers should take into account mobility behavior and not only residential patterns.
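The abstract does not define how segregation is measured for a place. A minimal sketch of one simple measure, assumed here and not confirmed by the abstract: compute the share of total visit time contributed by each of four income quartiles and measure how far those shares deviate from an even 1/4 split, scaled so the result lies between 0 (all quartiles equally present) and 1 (a single quartile accounts for all the time). `place_segregation` is a hypothetical name.

```python
def place_segregation(time_by_quartile):
    """Segregation of one place from time spent there by each income quartile.

    ASSUMED measure: S = (2/3) * sum_q |share_q - 1/4|, which is 0 for an
    even split and 1 when one quartile accounts for all the time.
    time_by_quartile: four non-negative numbers (e.g. minutes of stay).
    """
    if len(time_by_quartile) != 4:
        raise ValueError("expected four income quartiles")
    total = sum(time_by_quartile)
    if total <= 0:
        raise ValueError("no visits recorded")
    shares = [t / total for t in time_by_quartile]
    return 2.0 * sum(abs(s - 0.25) for s in shares) / 3.0

print(place_segregation([30, 30, 30, 30]))  # evenly mixed place
print(place_segregation([90, 0, 0, 0]))     # fully segregated place
```

The factor 2/3 is simply the normalization that maps the maximum possible deviation (1.5, reached when one share is 1) to 1.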


2019
Author(s): Reto Sterchi, Pascal Haegeli, Patrick Mair

Abstract. While guides in mechanized skiing operations use a well-established terrain selection process to limit their exposure to avalanche hazard and keep the residual risk at an acceptable level, the relationship between the open/closed status of runs and environmental factors is complex and has so far received only limited attention from researchers. Using a large data set of over 25 000 operational run list codes from a mechanized skiing operation, we applied a general linear mixed effects model to explore the relationship between acceptable skiing terrain (i.e., status open) and avalanche hazard conditions. Our results show that the magnitude of the effect of avalanche hazard on run list codes depends on the type of terrain that is being assessed by the guiding team. Ski runs in severe alpine terrain with steep lines through large avalanche slopes are much more susceptible to increases in avalanche hazard than less severe terrain. However, our results also highlight the strong effects of recent skiing on the run coding and thus the importance of prior first-hand experience. Expressing these relationships numerically provides an important step towards the development of meaningful decision aids, which can assist commercial operations in managing their avalanche risk more effectively and efficiently.


2020
Author(s): Markus Wiedemann, Bernhard S.A. Schuberth, Lorenzo Colli, Hans-Peter Bunge, Dieter Kranzlmüller

Precise knowledge of the forces acting at the base of tectonic plates is of fundamental importance, but to date, models of mantle dynamics are still often qualitative in nature. One particular problem is that we cannot access the deep interior of our planet and therefore cannot make direct in situ measurements of the relevant physical parameters. Fortunately, modern software and powerful high-performance computing infrastructures allow us to generate complex three-dimensional models of the time evolution of mantle flow through large-scale numerical simulations.

In this project, we aim to visualize the resulting convective patterns that occur thousands of kilometres below our feet and to make them "accessible" using high-end virtual reality (VR) techniques.

Models with several hundred million grid cells are nowadays possible using modern supercomputing facilities, such as those available at the Leibniz Supercomputing Centre. These models provide quantitative estimates of inaccessible parameters, such as buoyancy and temperature, as well as predictions of the associated gravity field and seismic wavefield that can be tested against Earth observations.

3-D visualizations of the computed physical parameters allow us to inspect the models as if one were actually travelling down into the Earth. In this way, convective processes that occur thousands of kilometres below our feet become virtually accessible by combining the simulations with high-end VR techniques.

The large data set used here poses severe challenges for real-time visualization because it does not fit into graphics memory, yet it must be rendered under strict deadlines. This makes it necessary to balance the amount of displayed data against the time needed to render it.

As a solution, we introduce a rendering framework and describe our workflow that allows us to visualize this geoscientific dataset. Our example exceeds 16 TByte in size, which is beyond the capabilities of most visualization tools. To display this dataset in real time, we reduce and declutter it through isosurfacing and mesh optimization techniques.

Our rendering framework relies on multithreading and data decoupling mechanisms that allow data to be uploaded to graphics memory while maintaining high frame rates. The final visualization application can be executed in a CAVE installation as well as on head-mounted displays such as the HTC Vive or Oculus Rift. The latter devices will allow our example to be viewed on-site at the EGU conference.
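The abstract names multithreading and data decoupling but not their design. A minimal sketch of one common way to decouple loading from rendering, assuming a background thread that stages data chunks in a queue and a render loop that uploads at most a fixed byte budget per frame, so that frame time stays bounded. `loader`, `render_loop`, and the budget are hypothetical; actual GPU uploads and drawing are replaced by placeholders.

```python
import queue
import threading

def loader(chunks, q):
    """Background thread: stage chunks for upload, then send a sentinel."""
    for chunk in chunks:
        q.put(chunk)  # in practice: read from disk, isosurface, simplify mesh
    q.put(None)       # sentinel: no more data

def render_loop(q, budget_bytes):
    """Upload at most ~budget_bytes per frame; return bytes uploaded per frame."""
    per_frame = []
    pending = None  # chunk already fetched but deferred to a later frame
    done = False
    while not done or pending is not None:
        uploaded = 0
        while not done:
            if pending is None:
                try:
                    pending = q.get(timeout=0.01)
                except queue.Empty:
                    break  # nothing staged yet: draw with what is resident
                if pending is None:  # sentinel from the loader
                    done = True
                    break
            # Defer a chunk that would exceed this frame's budget
            # (an oversized chunk is still uploaded alone in an empty frame).
            if uploaded > 0 and uploaded + len(pending) > budget_bytes:
                break
            uploaded += len(pending)  # stand-in for the actual GPU upload
            pending = None
        per_frame.append(uploaded)
        # draw_frame() would run here every frame (hypothetical).
    return per_frame

chunks = [bytes(30) for _ in range(10)]
q = queue.Queue()
t = threading.Thread(target=loader, args=(chunks, q))
t.start()
frames = render_loop(q, budget_bytes=100)
t.join()
print(frames)
```

The render loop never blocks for long on the loader (the short `get` timeout lets it draw a frame with whatever is already resident), which is the property that keeps frame rates stable while a large dataset streams in.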


2016
Author(s): Aleksey V. Zimin, Daniela Puiu, Ming-Cheng Luo, Tingting Zhu, Sergey Koren, ...

Abstract. Long sequencing reads generated by single-molecule sequencing technology offer the possibility of dramatically improving the contiguity of genome assemblies. The biggest challenge today is that long reads have relatively high error rates, currently around 15%. The high error rates make it difficult to use these data alone, particularly with highly repetitive plant genomes. Errors in the raw data can lead to insertion or deletion errors (indels) in the consensus genome sequence, which in turn create significant problems for downstream analysis; for example, a single indel may shift the reading frame and incorrectly truncate a protein sequence. Here we describe an algorithm that solves the high error rate problem by combining long, high-error reads with shorter but much more accurate Illumina sequencing reads, whose error rates average <1%. Our hybrid assembly algorithm combines these two types of reads to construct mega-reads, which are both long and accurate, and then assembles the mega-reads using the CABOG assembler, which was designed for long reads. We apply this technique to a large data set of Illumina and PacBio sequences from Aegilops tauschii, a species whose large and highly repetitive genome has resisted previous attempts at assembly. We show that the resulting assembled contigs are far larger than in any previous assembly, with an N50 contig size of 486,807 bp. We compare the contigs to independently produced optical maps to evaluate their large-scale accuracy, and to a set of high-quality bacterial artificial chromosome (BAC)-based assemblies to evaluate base-level accuracy.
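The N50 reported above is a standard contiguity statistic: the contig length L such that contigs of length L or longer together contain at least half of all assembled bases. A minimal sketch of computing it from a list of contig lengths (the `n50` function is our own illustration, not part of the authors' pipeline):

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Sort contigs from longest to shortest and return the length of the
    contig at which the running total first reaches half the total size.
    """
    if not lengths:
        raise ValueError("no contigs")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer-safe "running >= total / 2"
            return length

print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 10 + 9 + 8 = 27 = 54 / 2
```

Comparing `2 * running` with `total` avoids floating-point division, so the result is exact for integer base counts.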


2007, Vol 4 (5), pp. 3639-3671
Author(s): A. V. Borges, B. Tilbrook, N. Metzl, A. Lenton, B. Delille

Abstract. We compiled a large data set of the partial pressure of CO2 (pCO2) in surface waters over the continental shelf (CS) and adjacent open ocean (43° to 46° S; 145° to 150° E) south of Tasmania, from 22 cruises spanning 1991 to 2003. Sea surface temperature (SST) anomalies (as intense as 2°C) are apparent in the subtropical zone (STZ) and subantarctic zone (SAZ). These SST anomalies also occur on the CS and seem to be related to large-scale coupled atmosphere-ocean oscillations. Anomalies of pCO2 normalized to a constant temperature are negatively related to SST anomalies. A depressed winter-time vertical input of dissolved inorganic carbon (DIC) during phases of positive SST anomalies, related to a poleward shift of westerly winds, and a concomitant local decrease in wind stress are the likely causes of the negative relationship between pCO2 and SST anomalies. The observed trend is an increase of the sink for atmospheric CO2 associated with positive SST anomalies, although strongly modulated by inter-annual variability of wind speed. Assuming that phases of positive SST anomalies are indicative of the future evolution of regional ocean biogeochemistry under global warming, we show, using a purely observation-based approach, that some provinces of the Southern Ocean could provide a potential negative feedback on increasing atmospheric CO2.
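The abstract does not state how pCO2 was normalized to a constant temperature. A minimal sketch, assuming the widely used isochemical temperature sensitivity of seawater pCO2 of about 4.23 % per °C (an assumption, not taken from the abstract), followed by anomalies computed as deviations from the series mean. The observation values are made up for illustration, and the function names are hypothetical.

```python
import math

# ASSUMPTION: isochemical temperature sensitivity of surface-seawater pCO2,
# about 4.23 % per degC; the abstract does not state the coefficient used.
TEMP_COEFF = 0.0423  # per degC

def normalize_pco2(pco2_obs, sst_obs, t_ref):
    """Normalize observed pCO2 (uatm) measured at sst_obs (degC) to t_ref (degC)."""
    return pco2_obs * math.exp(TEMP_COEFF * (t_ref - sst_obs))

def anomalies(values):
    """Deviations of each value from the mean of the series."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

# Illustrative (made-up) winter observations: (pCO2 in uatm, SST in degC).
obs = [(360.0, 10.5), (352.0, 11.8), (348.0, 12.6)]
t_mean = sum(sst for _, sst in obs) / len(obs)
npco2 = [normalize_pco2(p, sst, t_mean) for p, sst in obs]
print(anomalies(npco2))
print(anomalies([sst for _, sst in obs]))
```

Normalizing first removes the purely thermodynamic effect of warming on pCO2, so that any remaining negative relationship between the two anomaly series reflects processes such as reduced vertical DIC input, as argued in the abstract.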


2019
Author(s): Sylvain Lehmann, Christophe Hirtz, Jérôme Vialaret, Maxence Ory, Guillaume Gras Combes, ...

Summary. The extraction of accurate physiological parameters from clinical samples provides a unique perspective to understand disease etiology and evolution, including under therapy. We introduce a new proteomics framework to map patient proteome dynamics in vivo, either proteome-wide or in large targeted panels. We applied it to ventricular cerebrospinal fluid (CSF) and could determine the turnover parameters of almost 200 proteins, whereas only a handful were known previously. We covered a large number of neuron biology- and immune system-related proteins, including many biomarkers and drug targets. This first large data set revealed a significant relationship between turnover and protein origin that relates to our ability to investigate central nervous system physiology precisely in future studies. Our data constitute a reference in CSF biology as well as a repertoire of peptides for the community to design new proteome dynamics analyses. The disclosed methods apply to other fluids or tissues, provided sequential sample collection can be performed.
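The abstract does not describe the kinetic model behind the turnover parameters. A minimal sketch under a deliberately simple assumption: a single protein pool with a fully labeled precursor, so the labeled fraction follows f(t) = 1 - exp(-k·t). Then -ln(1 - f) = k·t, and the rate constant k can be fitted as a least-squares slope through the origin, with half-life ln 2 / k. Real in vivo studies typically need more elaborate models (e.g. for precursor enrichment), and the function names here are hypothetical.

```python
import math

def fit_turnover_rate(times, labeled_fraction):
    """Estimate a first-order turnover rate constant k (per unit time).

    ASSUMPTION: one-pool model with fully labeled precursor, so the labeled
    fraction follows f(t) = 1 - exp(-k t) and -ln(1 - f) = k t. The slope
    through the origin is fitted by least squares.
    """
    ys = [-math.log(1.0 - f) for f in labeled_fraction]
    return sum(t * y for t, y in zip(times, ys)) / sum(t * t for t in times)

def half_life(k):
    """Half-life implied by a first-order rate constant k."""
    return math.log(2.0) / k

# Synthetic check: fractions generated with k = 0.1 per hour.
times = [1, 2, 4, 8, 16]
fractions = [1.0 - math.exp(-0.1 * t) for t in times]
k = fit_turnover_rate(times, fractions)
print(k, half_life(k))
```

The log-linear transform turns an exponential fit into a one-parameter linear regression, which keeps the sketch dependency-free; with noisy data, a direct nonlinear fit of f(t) is usually preferred.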


2021
Author(s): Jin Kim

This article presents Exploratory Only: an intuitive tool for conducting large-scale exploratory analyses easily and quickly. Available in three forms (as a web application, a standalone program, and an R package) and launched as a point-and-click interface, Exploratory Only allows researchers to conduct all possible correlation, moderation, and mediation analyses among selected variables in their data set with minimal effort and time. Compared to a popular alternative, SPSS, Exploratory Only is shown to be orders of magnitude easier and faster at conducting exploratory analyses. The article demonstrates how to use Exploratory Only and discusses the main caveat of using it. As long as researchers use Exploratory Only as intended, that is, to discover novel hypotheses to investigate in follow-up studies rather than to confirm nonexistent a priori hypotheses (i.e., p-hacking), Exploratory Only can promote progress in behavioral science by encouraging more exploratory analyses and therefore more discoveries.
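A minimal sketch of only the "all possible correlations" part of such an exploratory pass, assuming Pearson correlations over every pair of selected variables. `pearson` and `all_correlations` are hypothetical names, not Exploratory Only's interface, and moderation and mediation analyses are not covered.

```python
import itertools
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def all_correlations(data):
    """Correlate every pair of variables.

    data: dict mapping variable name -> list of observations.
    Returns {(name_a, name_b): r} for all k * (k - 1) / 2 pairs.
    """
    return {(a, b): pearson(data[a], data[b])
            for a, b in itertools.combinations(data, 2)}

data = {"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "z": [4, 3, 2, 1]}
for pair, r in all_correlations(data).items():
    print(pair, round(r, 3))
```

The number of tests grows quadratically with the number of variables (k(k - 1)/2 pairs), which is why results from such a pass are hypothesis-generating and should be confirmed in follow-up studies, as the article's caveat about p-hacking stresses.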


Author(s): Vikrant Tiwari, Nimisha Sharma

In the absence of detailed COVID-19 epidemiological data or large benchmark studies, an effort has been made to explore how parameters such as the environment, economic indicators, and large-scale exposure to other prevalent diseases correlate with the spread and severity of COVID-19 across the affected countries. Data on environmental and socio-economic indicators and on other important infectious diseases were collected from reliable open sources such as the World Health Organization and the World Bank. This large data set was then used to study the worldwide spread of COVID-19 with simple statistical tools. A key observation of this study is the high degree of similarity in the temperature and humidity distributions among the cities severely affected by COVID-19. It is also surprising that, despite many environmental parameters that are considered favorable (e.g., clean air, clean water, EPI), many countries are suffering severe consequences of this disease. Lastly, a noticeable segregation among locations affected by different prevalent diseases (such as malaria, HIV, tuberculosis, and cholera) was also observed. Among the environmental factors considered, temperature, humidity, and EPI should be treated as important parameters in understanding and modelling the spread of COVID-19. Further, contrary to intuition, countries with strong economies, good health infrastructure, and cleaner environments suffered disproportionately severe outcomes from this disease. Therefore, policymakers should carefully review their countries' preparedness for potential future contagious diseases, whether natural or man-made.


Author(s): Angelo A. Salatino, Thiviyan Thanapalasingam, Andrea Mannocci, Francesco Osborne, Enrico Motta
