open datasets
Recently Published Documents


TOTAL DOCUMENTS

147
(FIVE YEARS 115)

H-INDEX

7
(FIVE YEARS 5)

Author(s):  
Evgeny Shirinyan ◽  
Dessislava Petrova-Antonova

3D city models integrate heterogeneous urban data from multiple sources in a unified geospatial representation, combining both semantics and geometry. Although in recent decades they have been used predominantly for visualization, today they support a wide range of tasks related to exploration, analysis, and management across multiple domains. The complexity of urban processes and the diversity of the urban environment bring challenges to the implementation of 3D city models. To address such challenges, this paper presents the development process of a 3D city model of a single neighborhood in Sofia based on the CityGML 2.0 standard. The model represents the buildings in LOD1, with a focus on CityGML features related to buildings such as building part, terrain intersection curve, and address. Similar building models of 18 cities provided as open datasets are explored and compared in order to extract good modeling practices. As a result, workflows for the generation of 3D building models in LOD1 are elaborated and improvements in the feature modeling are proposed. Two modeling options are examined: a building modeled as a single solid and a building modeled with separate building parts. Finally, the possibilities for visualization of the model in popular platforms such as ArcGIS Pro and Cesium Ion are explored.
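
To make the LOD1 notion concrete, the following minimal Python sketch extrudes a rectangular footprint into the six boundary surfaces of an LOD1 box solid. The footprint coordinates, height, and ring-orientation convention are illustrative assumptions, not the paper's workflow or a CityGML writer.

```python
# A minimal sketch (not a validated CityGML writer): extrude a 2D footprint
# by a building height to obtain the six boundary surfaces of an LOD1 solid.
# Footprint coordinates and height are illustrative placeholders.

def extrude_lod1(footprint, base_z, height):
    """Return the polygons (rings of (x, y, z) points) of an LOD1 box solid."""
    top_z = base_z + height
    floor = [(x, y, base_z) for x, y in reversed(footprint)]  # downward-facing ring
    roof = [(x, y, top_z) for x, y in footprint]              # upward-facing ring
    walls = []
    n = len(footprint)
    for i in range(n):
        (x1, y1), (x2, y2) = footprint[i], footprint[(i + 1) % n]
        walls.append([(x1, y1, base_z), (x2, y2, base_z),
                      (x2, y2, top_z), (x1, y1, top_z)])
    return [floor, roof] + walls

# Example: a 10 m x 6 m footprint extruded to 9 m (roughly three storeys).
polygons = extrude_lod1([(0, 0), (10, 0), (10, 6), (0, 6)], base_z=0.0, height=9.0)
print(len(polygons), "boundary surfaces")  # 6 for a simple box
```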


2021 ◽  
Vol 3 ◽  
Author(s):  
Ahmed Al-Hindawi ◽  
Ahmed Abdulaal ◽  
Timothy M. Rawson ◽  
Saleh A. Alqahtani ◽  
Nabeela Mughal ◽  
...  

The SARS-CoV-2 virus, which causes COVID-19, has had an unprecedented impact on healthcare, requiring multidisciplinary innovation and novel thinking to minimize impact and improve outcomes. Wide-ranging disciplines have collaborated, including diverse clinicians (radiology, microbiology, and critical care) working increasingly closely with data science. This has been leveraged through the democratization of data science, with the increasing availability of easy-to-access open datasets, tutorials, programming languages, and hardware, which makes it significantly easier to create mathematical models. To address the COVID-19 pandemic, such data science has enabled modeling of the impact of the virus on populations and individuals for diagnostic, prognostic, and epidemiological ends. This has led to two large systematic reviews on this topic, which highlight the two different ways in which this feat has been attempted: one using classical statistics and the other using more novel machine learning techniques. In this review, we debate the relative strengths and weaknesses of each method toward the specific task of predicting COVID-19 outcomes.
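
As a purely illustrative contrast between the two families of methods discussed in the review, the scikit-learn sketch below cross-validates a classical statistical model (logistic regression) against a machine learning model (gradient boosting) on synthetic binary-outcome data. The data, models, and metric are placeholders, not the cohorts or models analysed in the cited systematic reviews.

```python
# Illustrative only: classical statistics vs. machine learning for a binary
# outcome, on synthetic placeholder data (not COVID-19 patient records).
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           random_state=0)
for name, model in [("logistic regression", LogisticRegression(max_iter=1000)),
                    ("gradient boosting", GradientBoostingClassifier())]:
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean cross-validated AUC = {auc:.3f}")
```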


Author(s):  
R. Jisha Raj ◽  
Smitha Dharan ◽  
T. T. Sunil

Cultural dances are practiced all over the world. The study of the performer's various gestures using computer vision techniques can help in better understanding these dance forms and in annotating them. Bharatanatyam is a classical dance that originated in South India. A Bharatanatyam performer uses hand gestures (mudras), facial expressions, and body movements to communicate the intended meaning to the audience. According to the Natyashastra, a classical text on Indian dance, there are 28 Asamyukta Hastas (single-hand gestures) and 23 Samyukta Hastas (double-hand gestures) in Bharatanatyam. Open datasets on Bharatanatyam dance gestures are not presently available, so an exhaustive open dataset comprising the various mudras in Bharatanatyam was created. The dataset consists of 15,396 distinct single-hand mudra images and 13,035 distinct double-hand mudra images. In this paper, we explore the dataset using various multidimensional visualization techniques: PCA, Kernel PCA, Local Linear Embedding, Multidimensional Scaling, Isomap, t-SNE, and a PCA–t-SNE combination are investigated. The best visualization for exploration of the dataset is obtained using the PCA–t-SNE combination.
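
A minimal sketch of the PCA–t-SNE combination in scikit-learn is shown below; the feature dimensionality, number of PCA components, and perplexity are assumptions for illustration, not the settings used on the mudra dataset.

```python
# Sketch of the PCA -> t-SNE pipeline on placeholder image features.
# 50 PCA components and perplexity 30 are assumed values.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

features = np.random.rand(2000, 4096)                    # stand-in for mudra image descriptors
reduced = PCA(n_components=50).fit_transform(features)   # compress / denoise first
embedding = TSNE(n_components=2, perplexity=30, init="pca",
                 random_state=0).fit_transform(reduced)
print(embedding.shape)                                   # (2000, 2) points for a scatter plot
```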


2021 ◽  
Author(s):  
Kyle Aitken ◽  
Marina Garrett ◽  
Shawn Olsen ◽  
Stefan Mihalas

Neurons in sensory areas encode and represent stimuli. Surprisingly, recent studies have suggested that, even when performance remains stable, these representations change over the course of days and weeks. We examine stimulus representations from fluorescence recordings across hundreds of neurons in the visual cortex using in vivo two-photon calcium imaging, and we corroborate previous findings that such representations change as experimental trials are repeated across days. This phenomenon has been termed "representational drift". In this study we geometrically characterize the properties of representational drift in the primary visual cortex of mice in two open datasets from the Allen Institute and propose a potential mechanism behind such drift. We observe representational drift both for passively presented stimuli and for stimuli that are behaviorally relevant. Across experiments, the drift most often occurs along the directions of greatest variance, leading to significant turnover in the neurons used for a given representation. Interestingly, despite this significant change due to drift, linear classifiers trained to distinguish neuronal representations show little to no degradation in performance across days. The features we observe in the neural data are similar to properties of artificial neural networks whose representations are updated by continual learning in the presence of dropout, i.e., a random masking of nodes or weights, but not other types of noise. We therefore conclude that representational drift in biological networks may be driven by an underlying dropout-like noise during continual learning, and that such a mechanism may be computationally advantageous for the brain in the same way it is for artificial neural networks, e.g., by preventing overfitting.
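
The toy numpy/scikit-learn sketch below illustrates the qualitative point, not the authors' analysis: when drift is confined to high-variance, low-signal dimensions, a linear decoder trained on one "day" loses little accuracy on the next. The population model, drift direction, and magnitudes are invented for the example.

```python
# Toy illustration: drift along high-variance, low-signal dimensions barely
# affects a linear stimulus decoder. All data here are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_trials, n_neurons, n_noisy = 400, 100, 5
labels = rng.integers(0, 2, n_trials)
coding_axis = rng.normal(size=n_neurons)                     # stimulus-coding direction
noise_scale = np.where(np.arange(n_neurons) < n_noisy, 5.0, 1.0)

def record_day(drift_amount):
    """Simulate one day's population responses with an optional drift offset."""
    drift = np.zeros(n_neurons)
    drift[:n_noisy] = drift_amount                           # drift along high-variance dims
    noise = rng.normal(size=(n_trials, n_neurons)) * noise_scale
    return np.outer(labels - 0.5, coding_axis) + noise + drift

day1, day2 = record_day(0.0), record_day(3.0)
clf = LogisticRegression(max_iter=2000).fit(day1, labels)
print("decoder accuracy, day 1:", clf.score(day1, labels))
print("decoder accuracy, day 2:", clf.score(day2, labels))   # expect only a small drop
```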


2021 ◽  
Vol 8 (1) ◽  
pp. 21
Author(s):  
Ritchie Heirmans ◽  
Olivier De Moor ◽  
Simon Verspeek ◽  
Sander De Vrieze ◽  
Bart Ribbens ◽  
...  

The aim of this research topic and paper is to investigate the application possibilities of vision technology in the textile industry, including RGB, active thermography, and hyperspectral imaging techniques. In the future, this approach will be supplemented by a machine learning algorithm (e.g., in Matlab or Python) to enable the detection of defects in textiles and to categorize these defects correctly. First, the various options for building such a convolutional neural network are discussed, with a focus on the models used in the literature. Based on the effectiveness of these ML models and the feasibility of building them, choices can be made to determine the most suitable models. Sufficient samples are essential to train a model properly. Because there is a shortage of open data, we also discuss how samples obtained from the textile industry were measured in the lab. At first, we limit ourselves to the five most common defects. In a later phase of the research, the results obtained with this dataset and with the open datasets are benchmarked against results from the literature.
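
As a placeholder for the kind of model under discussion, the Keras sketch below defines a small convolutional network for classifying grayscale textile patches into six classes (five defect types plus "no defect"). The architecture, 128×128 patch size, and class count are assumptions, not models taken from the literature or trained on the lab samples.

```python
# A minimal CNN sketch for textile-defect classification on assumed
# 128x128 grayscale patches with six classes. Sizes are placeholders.
from tensorflow import keras
from tensorflow.keras import layers

num_classes = 6
model = keras.Sequential([
    layers.Input(shape=(128, 128, 1)),
    layers.Conv2D(16, 3, activation="relu"), layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"), layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"), layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(train_images, train_labels, validation_split=0.2, epochs=20)
```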


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Meijing Li ◽  
Tianjie Chen ◽  
Keun Ho Ryu ◽  
Cheng Hao Jin

Semantic mining is always a challenge for big biomedical text data. Ontologies have been widely proven and used to extract semantic information. However, the process of ontology-based semantic similarity calculation is so complex that it cannot measure similarity over big text data. To solve this problem, we propose a parallelized semantic similarity measurement method based on Hadoop MapReduce for big text data. First, we preprocess the documents and extract their semantic features. Then, we calculate the document semantic similarity based on the ontology network structure under the MapReduce framework. Finally, based on the generated semantic document similarity, document clusters are generated via clustering algorithms. To validate the effectiveness, we use two kinds of open datasets. The experimental results show that the traditional methods can hardly handle more than ten thousand biomedical documents, whereas the proposed method remains efficient and accurate on big datasets and offers high parallelism and scalability.
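
The Hadoop-Streaming-style Python sketch below conveys the parallel structure of such a similarity computation: a mapper emits partial dot products for document pairs sharing a semantic feature, and a reducer sums them per pair. The input format and the reduction of ontology-based weighting to plain feature weights are simplifying assumptions, not the method proposed in the paper.

```python
# Hadoop-Streaming-style sketch: each input line holds one semantic feature and
# the documents weighted on it; mapper emits partial dot products per document
# pair, reducer sums them. Ontology-based weighting is abstracted away here.
import sys
from itertools import combinations, groupby

def mapper(lines):
    for line in lines:
        _, postings = line.rstrip("\n").split("\t")           # "feat \t d1:w1,d2:w2,..."
        docs = [p.split(":") for p in postings.split(",")]
        for (d1, w1), (d2, w2) in combinations(sorted(docs), 2):
            print(f"{d1}|{d2}\t{float(w1) * float(w2)}")

def reducer(lines):
    parsed = (line.rstrip("\n").split("\t") for line in lines)
    for pair, group in groupby(parsed, key=lambda kv: kv[0]):
        print(f"{pair}\t{sum(float(v) for _, v in group)}")   # dot product for the pair

if __name__ == "__main__":
    # e.g. cat features.txt | python sim.py map | sort | python sim.py reduce
    (mapper if sys.argv[1] == "map" else reducer)(sys.stdin)
```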


2021 ◽  
Vol 9 ◽  
Author(s):  
Alessandro Campanaro ◽  
Francesco Parisi

We present six datasets of saproxylic beetles collected between 2012 and 2018 in Central and Southern Italian forests. Saproxylic beetles represent one of the main components of forest ecosystems in terms of diversity, species richness, and functional traits, and for this reason they are an important target group for studying the modification of forests over time. The datasets consist of annotated checklists and were published in the Zenodo repository. Overall, 1,171 records are published, corresponding to 918 taxa (taxonomy at the species or subspecies level). The taxa are scarcely shared amongst the areas; 80.2% of them are exclusive, indicating that the beetle communities are substantially different. In consideration of the ongoing biodiversity crisis, which is especially dramatic for insects, we want to promote collaboration amongst researchers in making datasets available in open repositories. This will improve the ability of researchers and forest managers to analyse the state of species distributions, which could serve long-term studies on the variation of insect communities. We encourage repeating species assessments in the same localities in order to evaluate trends in insect communities over time and space.


2021 ◽  
pp. 1-16
Author(s):  
Hiromi Nakagawa ◽  
Yusuke Iwasawa ◽  
Yutaka Matsuo

Recent advancements in computer-assisted learning systems have led to increased research on knowledge tracing, in which student performance is predicted over time. Student coursework can potentially be structured as a graph. Incorporating this graph-structured nature into a knowledge tracing model as a relational inductive bias can improve its performance; however, previous methods, such as deep knowledge tracing, did not consider such a latent graph structure. Inspired by the recent successes of graph neural networks (GNNs), we herein propose a GNN-based knowledge tracing method, i.e., graph-based knowledge tracing. Casting the knowledge structure as a graph enabled us to reformulate the knowledge tracing task as a time-series node-level classification problem in the GNN. As the knowledge graph structure is not explicitly provided in most cases, we propose various implementations of the graph structure. Empirical validations on two open datasets indicated that our method could potentially improve the prediction of student performance and demonstrated more interpretable predictions compared to those of the previous methods, without requiring any additional information.
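
The PyTorch sketch below shows the core graph-convolution idea on which such a model can be built, i.e., node-level classification with a symmetrically normalized adjacency matrix. The random graph, features, labels, and two-layer architecture are placeholders rather than the graph-based knowledge tracing model itself.

```python
# Minimal node-level classification with a simple graph convolution
# (normalized adjacency). Graph, features, and labels are random placeholders.
import torch
import torch.nn as nn

n_nodes, n_feats, n_classes = 30, 8, 2
adj = (torch.rand(n_nodes, n_nodes) < 0.1).float()
adj = torch.clamp(adj + adj.T + torch.eye(n_nodes), max=1.0)  # symmetric, with self-loops
deg_inv_sqrt = adj.sum(1).pow(-0.5)
adj_norm = deg_inv_sqrt.unsqueeze(1) * adj * deg_inv_sqrt.unsqueeze(0)  # D^-1/2 A D^-1/2

class GCN(nn.Module):
    def __init__(self):
        super().__init__()
        self.w1 = nn.Linear(n_feats, 16)
        self.w2 = nn.Linear(16, n_classes)
    def forward(self, x, a):
        x = torch.relu(a @ self.w1(x))   # aggregate neighbours, then transform
        return a @ self.w2(x)            # per-node class logits

x = torch.randn(n_nodes, n_feats)
y = torch.randint(0, n_classes, (n_nodes,))
model = GCN()
opt = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(model(x, adj_norm), y)
    loss.backward()
    opt.step()
print("final training loss:", loss.item())
```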


2021 ◽  
Author(s):  
honglin wen ◽  
Pierre Pinson ◽  
jinghuan ma ◽  
jie gu ◽  
Zhijiang Jin

We present a data-driven approach for probabilistic wind power forecasting based on conditional normalizing flows (CNFs). In contrast with existing approaches, it is distribution-free (as are non-parametric and quantile-based approaches) and can directly yield continuous probability densities, hence avoiding quantile crossing. It relies on a base distribution and a set of bijective mappings; both the shape parameters of the base distribution and the bijective mappings are approximated with neural networks. A spline-based conditional normalizing flow is considered owing to its universal approximation capability. During the training phase, the model sequentially maps input examples onto samples of the base distribution, and parameters are estimated through maximum likelihood. To issue probabilistic forecasts, samples of the base distribution are eventually mapped into samples of the desired distribution. Case studies based on open datasets validate the effectiveness of the proposed model and allow us to discuss its advantages and caveats with respect to the state of the art. Code will be released upon publication.
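
For intuition, the PyTorch sketch below trains a single conditional affine flow by maximum likelihood on synthetic data: a small network maps the conditioning features to the parameters of a bijection applied to a standard normal base distribution. The affine (rather than spline-based) transform, the network sizes, and the synthetic "weather to power" data are simplifying assumptions, not the proposed model.

```python
# Stripped-down conditional flow: a network outputs (mu, log_sigma) of an
# affine bijection over a standard normal base; trained by maximum likelihood.
# Data are synthetic stand-ins for (weather features -> wind power).
import math
import torch
import torch.nn as nn

class AffineConditionalFlow(nn.Module):
    def __init__(self, cond_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(cond_dim, 32), nn.ReLU(), nn.Linear(32, 2))
    def log_prob(self, y, c):
        mu, log_sigma = self.net(c).chunk(2, dim=-1)
        z = (y - mu) * torch.exp(-log_sigma)                  # inverse map y -> z
        log_base = -0.5 * z.pow(2) - 0.5 * math.log(2 * math.pi)
        return (log_base - log_sigma).squeeze(-1)             # change-of-variables term
    def sample(self, c, n):
        mu, log_sigma = self.net(c).chunk(2, dim=-1)
        z = torch.randn(n, *mu.shape)                         # base-distribution samples
        return mu + torch.exp(log_sigma) * z                  # forward map z -> y

c = torch.rand(2048, 3)                                       # conditioning features
y = c[:, :1] ** 2 + 0.1 * torch.randn(2048, 1)                # synthetic target

flow = AffineConditionalFlow(cond_dim=3)
opt = torch.optim.Adam(flow.parameters(), lr=1e-2)
for _ in range(300):
    opt.zero_grad()
    loss = -flow.log_prob(y, c).mean()                        # negative log-likelihood
    loss.backward()
    opt.step()
print("final NLL:", loss.item())
```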

