Multidimensional Datasets: Recently Published Documents

TOTAL DOCUMENTS: 65 (five years: 14)
H-INDEX: 8 (five years: 3)

2021 ◽ Vol 12 (1) ◽ pp. 331
Author(s): Szymon Bobek ◽ Sławomir K. Tadeja ◽ Łukasz Struski ◽ Przemysław Stachura ◽ Timoleon Kipouros ◽ ...

We present a refinement of the Immersive Parallel Coordinates Plots (IPCP) system for Virtual Reality (VR). The evolved system provides data-science analytics built around a well-known method for visualizing multidimensional datasets in VR. The analytics enhancements consist of importance analysis and a number of clustering algorithms, including a novel SuMC (Subspace Memory Clustering) solution. These analytical methods were applied to both the main visualizations and the supporting cross-dimensional scatter plots, automating part of the analytical work that in the previous version of IPCP had to be done by an expert. We test the refined system with two sample datasets representing the optimum solutions of two different multi-objective optimization studies in turbomachinery: the first (DS1) describes 54 data items with 29 dimensions, and the second (DS2) 166 data items with 39 dimensions. We include the details of these methods as well as the reasoning behind selecting some methods over others.
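As an illustration of the clustering step described in this abstract, the sketch below runs plain k-means (Lloyd's algorithm) on a synthetic 29-dimensional dataset, echoing the dimensionality of DS1. This is a generic stand-in for illustration only: IPCP offers several algorithms, including the novel SuMC, which is not reproduced here, and the synthetic data are an assumption.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's k-means, for illustration only; IPCP provides
    several clustering algorithms including SuMC, not shown here."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each sample to its nearest center.
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        # Recompute centers; keep the old center if a cluster empties out.
        centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
    return labels

# Two well-separated synthetic blobs in a 29-dimensional space (cf. DS1).
rng = np.random.default_rng(1)
a = rng.normal(loc=0.0, scale=0.2, size=(30, 29))
b = rng.normal(loc=5.0, scale=0.2, size=(30, 29))
labels = kmeans(np.vstack([a, b]), k=2)
```

In a tool like IPCP, the resulting cluster labels would drive the coloring of the parallel-coordinates polylines and the cross-dimensional scatter plots.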


2021 ◽ Vol 11 (17) ◽ pp. 8240
Author(s): Cid Mathew Santiago Adolfo ◽ Hassan Chizari ◽ Thu Yein Win ◽ Salah Al-Majeed

Extensive data analysis is a vital part of biomedical applications and of medical practitioners' interpretations, as it ensures the integrity of multidimensional datasets and improves classification accuracy. In machine learning, however, that integrity is compromised when the acquired data contain noisy and biased samples, which pose a significant threat when such information is used for diagnosis and analysis. Removing noisy samples from dirty datasets is therefore crucial in biomedical applications such as classification and prediction problems that use artificial neural networks (ANNs) for physiological signal analysis. In this study, we developed a methodology to identify and remove noisy data from a dataset before addressing an ANN classification problem, proposing the principal component analysis–sample reduction process (PCA–SRP) as a data-cleaning agent. We first discuss the theoretical background of this data-cleansing methodology for ANN classification. Then, we discuss how PCA is used for data cleansing through a sample reduction process (SRP), using various publicly available biomedical datasets with different sample and feature sizes. Lastly, the cleaned datasets were tested through the following: PCA–SRP ANN accuracy comparison testing, sensitivity vs. specificity testing, receiver operating characteristic (ROC) curve testing, and accuracy vs. additional random sample testing. The results show a significant improvement in ANN classification using the developed methodology and suggest a recommended range of selectivity (Sc) factors for typical cleaning and ANN applications.
Our approach, implemented in Python, successfully cleaned the noisy biomedical multidimensional datasets and yielded up to an 8% increase in accuracy.
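A minimal sketch of the PCA-based sample reduction idea is given below. It is not the authors' implementation: the paper's exact reduction criterion is not reproduced here, so this sketch simply flags samples whose PCA reconstruction error exceeds mean + Sc standard deviations, with Sc = 2 as an assumed selectivity value.

```python
import numpy as np

def pca_sample_reduction(X, n_components, sc=2.0):
    """Flag noisy samples by their reconstruction error in a PCA subspace.

    Returns a boolean mask of samples to KEEP. The threshold
    mean + sc * std is an illustrative choice, not the paper's rule.
    """
    Xc = X - X.mean(axis=0)                      # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Vk = Vt[:n_components]                       # top principal directions
    X_hat = Xc @ Vk.T @ Vk                       # project and reconstruct
    err = np.linalg.norm(Xc - X_hat, axis=1)     # per-sample residual
    return err <= err.mean() + sc * err.std()

# Toy data: 100 samples near a 2D subspace of a 5D space,
# plus 5 outliers with large components off that subspace.
rng = np.random.default_rng(0)
inliers = np.zeros((100, 5))
inliers[:, :2] = rng.normal(scale=3.0, size=(100, 2))
inliers += rng.normal(scale=0.01, size=inliers.shape)
outliers = np.zeros((5, 5))
outliers[:, :2] = rng.normal(scale=3.0, size=(5, 2))
outliers[:, 2:] = rng.choice([-1.0, 1.0], size=(5, 3)) * rng.uniform(6, 10, size=(5, 3))
X = np.vstack([inliers, outliers])

keep = pca_sample_reduction(X, n_components=2)
```

After cleaning, only the retained samples (`X[keep]`) would be passed on to ANN training.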


eLife ◽ 2021 ◽ Vol 10
Author(s): Federico Claudi ◽ Adam L Tyson ◽ Luigi Petrucco ◽ Troy W Margrie ◽ Ruben Portugues ◽ ...

Three-dimensional (3D) digital brain atlases and high-throughput brain-wide imaging techniques generate large multidimensional datasets that can be registered to a common reference frame. Generating insights from such datasets depends critically on visualization and interactive data exploration, but this is a challenging task. Currently available software is dedicated to single atlases, model species, or data types, and generating 3D renderings that merge anatomically registered data from diverse sources requires extensive development and programming skills. Here, we present brainrender: an open-source Python package for interactive visualization of multidimensional datasets registered to brain atlases. Brainrender facilitates the creation of complex renderings with different data types in the same visualization and enables seamless use of different atlas sources. High-quality visualizations can be used interactively and exported as high-resolution figures and animated videos. By facilitating the visualization of anatomically registered data, brainrender should accelerate the analysis, interpretation, and dissemination of brain-wide multidimensional data.


2020 ◽ Vol 5 ◽ pp. 96
Author(s): Johnny Hay ◽ Eilidh Troup ◽ Ivan Clark ◽ Julian Pietsch ◽ Tomasz Zieliński ◽ ...

Tools and software that automate repetitive tasks, such as metadata extraction and deposition to data repositories, are essential for researchers to share Open Data routinely. For research that generates microscopy image data, OMERO is an ideal platform for storage, annotation, and publication according to open research principles. We present PyOmeroUpload, a Python toolkit for automatically extracting metadata from experiment logs and text files, processing images, and uploading these payloads to OMERO servers to create fully annotated, multidimensional datasets. The toolkit comes packaged in portable, platform-independent Docker images that enable users to deploy and run the utilities easily, regardless of operating system constraints. A selection of use cases is provided, illustrating the primary capabilities and flexibility of the toolkit, along with a discussion of limitations and potential future extensions. PyOmeroUpload is available from: https://github.com/SynthSys/pyOmeroUpload.
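The metadata-extraction step can be sketched as below. The log format and field names here are hypothetical, and this is not PyOmeroUpload's own parser (which targets its specific log layouts); the sketch just shows the general shape of turning a plain-text experiment log into key-value pairs of the kind that can be attached to an OMERO dataset as annotations.

```python
import re

# Hypothetical experiment log, for illustration only.
LOG = """\
Experiment: osmotic stress time course
Date: 2020-03-12
Strain: BY4741 pHluorin
Channels: GFP, Brightfield
Interval_s: 150
"""

def extract_metadata(text):
    """Parse 'key: value' lines from an experiment log into a dict.

    Assumed log format; real parsers would handle sections,
    multi-line values, and validation.
    """
    pairs = {}
    for line in text.splitlines():
        match = re.match(r"^\s*([\w ]+?)\s*:\s*(.+?)\s*$", line)
        if match:
            pairs[match.group(1)] = match.group(2)
    return pairs

metadata = extract_metadata(LOG)
```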


2020 ◽ Vol 12 (8) ◽ pp. 1270
Author(s): Peiqing Lou ◽ Bolin Fu ◽ Hongchang He ◽ Ying Li ◽ Tingyuan Tang ◽ ...

Discriminating marsh vegetation is critical for the rapid assessment and management of wetlands. The study area, Honghe National Nature Reserve (HNNR), a typical freshwater wetland, is located in Northeast China. This study optimized the parameters (mtry and ntrees) of an object-based random forest (RF) algorithm to improve the applicability of marsh vegetation classification. Multidimensional datasets were used as the input variables for model training, and variable selection was then performed to eliminate redundancy, which improved classification efficiency and overall accuracy. Finally, the performance of a new generation of Chinese high-spatial-resolution Gaofen-1 (GF-1) and Ziyuan-3 (ZY-3) satellite images for marsh vegetation classification was evaluated using the improved object-based RF algorithm with accuracy assessment. The specific conclusions of this study are as follows: (1) Optimized object-based RF classifications consistently produced more than 70.26% overall accuracy for all scenarios of GF-1 and ZY-3 at the 95% confidence interval. The performance of ZY-3 imagery for marsh vegetation mapping is lower than that of GF-1 imagery due to its coarser spatial resolution. (2) Parameter optimization of the object-based RF algorithm effectively improved the stability and classification accuracy of the algorithm. After parameter adjustment, scenario 3 for GF-1 data had the highest classification accuracy, 84% (74.72% for ZY-3), at the 95% confidence interval. (3) The introduction of multidimensional datasets improved the overall accuracy of marsh vegetation mapping but introduced many redundant variables. Using three variable selection algorithms to remove redundant variables from the multidimensional datasets effectively improved classification efficiency and overall accuracy; the recursive feature elimination (RFE)-based variable selection algorithm performed best.
(4) Optical spectral bands, spectral indices, mean value of green and NIR bands in textural information, DEM, TWI, compactness, max difference, and shape index are valuable variables for marsh vegetation mapping. (5) GF-1 and ZY-3 images had higher classification accuracy for forest, cropland, shrubs, and open water.
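The variable-selection idea can be illustrated with a much simpler redundancy filter than the methods the study compared (where RFE performed best): greedily dropping any feature that is highly correlated with one already kept. The feature names and synthetic data below are assumptions for illustration, not the study's variables.

```python
import numpy as np

def drop_redundant(X, names, threshold=0.95):
    """Greedily drop features highly correlated with an earlier-kept one.

    A simple redundancy filter for illustration; the study compared
    three variable-selection algorithms, of which RFE performed best.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return [names[j] for j in keep]

# Synthetic stand-ins for image-derived variables, one a near-duplicate.
rng = np.random.default_rng(1)
n = 200
red = rng.normal(size=n)
ndvi = rng.normal(size=n)
red_dup = red + rng.normal(scale=0.01, size=n)   # nearly identical to `red`
X = np.column_stack([red, ndvi, red_dup])
kept = drop_redundant(X, ["red", "ndvi", "red_dup"])
```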


Author(s): Federico Claudi ◽ Adam L. Tyson ◽ Luigi Petrucco ◽ Troy W. Margrie ◽ Ruben Portugues ◽ ...

The recent development of high-resolution three-dimensional (3D) digital brain atlases and high-throughput brain-wide imaging techniques has fueled the generation of large datasets that can be registered to a common reference frame. This registration facilitates integrating data from different sources and resolutions to assemble rich multidimensional datasets. Generating insights from these new types of datasets depends critically on the ability to easily visualize and explore the data in an interactive manner. This is, however, a challenging task. Currently available software is dedicated to single atlases, model species, or data types, and generating 3D renderings that merge anatomically registered data from diverse sources requires extensive development and programming skills. To address this challenge, we have developed brainrender: a generic, open-source Python package for simultaneous and interactive visualization of multidimensional datasets registered to brain atlases. Brainrender has been designed to facilitate the creation of complex custom renderings and can be used programmatically or through a graphical user interface. It can easily render different data types in the same visualization, including user-generated data, and enables seamless use of different brain atlases using the same code base. In addition, brainrender generates high-quality visualizations that can be used interactively and exported as high-resolution figures and animated videos. By facilitating the visualization of anatomically registered data, brainrender should accelerate the analysis, interpretation, and dissemination of brain-wide multidimensional data.


2019 ◽ Vol 19 (2)
Author(s): Violette Abergel ◽ Kévin Jacquot ◽ Livio De Luca ◽ Philippe Veron

The digital technologies developed in recent decades have considerably enriched survey and documentation practices in the field of cultural heritage. They now raise new issues and challenges, particularly in the management of multidimensional datasets, which requires new methods for the analysis, interpretation, and sharing of heterogeneous data. Rock art sites add further challenges to this context because of their nature and fragility, and in many cases digital data alone cannot meet contextualization, analysis, or traceability needs. We therefore propose an application dedicated to rock art survey that allows 3D annotation in augmented reality. This work is part of an ongoing project on an information system dedicated to cultural heritage documentation. For this purpose, we propose a registration method based on a spatial resection calculation. We also discuss the perspectives this opens up for heritage survey and documentation, in particular for enhanced visualization.
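The core computation behind a spatial resection is estimating the camera's projection from known 2D-3D correspondences. The abstract does not give the authors' formulation, so the sketch below shows a textbook Direct Linear Transform (DLT) recovering a 3x4 projection matrix from six or more correspondences; a production pipeline would add coordinate normalization and nonlinear refinement. The synthetic camera parameters are assumptions.

```python
import numpy as np

def dlt_resection(points_3d, points_2d):
    """Estimate a 3x4 projection matrix P from 2D-3D correspondences
    via the Direct Linear Transform (needs >= 6 points).

    Textbook sketch of the computation underlying spatial resection.
    """
    rows = []
    for (X, Y, Z), (u, v) in zip(points_3d, points_2d):
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u*X, -u*Y, -u*Z, -u])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v*X, -v*Y, -v*Z, -v])
    _, _, Vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return Vt[-1].reshape(3, 4)   # null-space vector, defined up to scale

def project(P, points_3d):
    """Project 3D points through P and dehomogenize to pixel coordinates."""
    Xh = np.hstack([points_3d, np.ones((len(points_3d), 1))])
    uvw = Xh @ P.T
    return uvw[:, :2] / uvw[:, 2:3]

# Synthetic camera: 800 px focal length, principal point (320, 240),
# identity rotation, points 5 units in front of the camera.
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
P_true = K @ np.hstack([np.eye(3), [[0.0], [0.0], [5.0]]])
rng = np.random.default_rng(0)
pts3d = rng.uniform(-1, 1, size=(8, 3))
pts2d = project(P_true, pts3d)

P_est = dlt_resection(pts3d, pts2d)
err = np.abs(project(P_est, pts3d) - pts2d).max()  # reprojection error
```

With exact correspondences the recovered matrix reprojects the points essentially perfectly; in an AR annotation setting the correspondences would come from matched survey targets or features.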


2019
Author(s): David Haan ◽ Ruikang Tao ◽ Verena Friedl ◽ Ioannis Nikolaos Anastopoulos ◽ Christopher K Wong ◽ ...

Cancer genome projects have produced multidimensional datasets on thousands of samples. Yet, depending on the tumor type, 5-50% of samples have no known driving event. We introduce a semi-supervised method called Learning UnRealized Events (LURE), which uses a progressive label learning framework and minimum spanning analysis to predict cancer drivers from the fact that their altered samples share a gene expression signature with the samples of a known event. We demonstrate the utility of the method on the TCGA dataset, for which it produced a high-confidence result relating 53 new mutation events to 18 known ones, including alterations in the same gene, family, and pathway. We give examples of predicted drivers involved in TP53, telomere maintenance, and MAPK/RTK signaling pathways. LURE identifies connections between genes with no known prior relationship, some of which may offer clues for targeting specific forms of cancer. Code and Supplemental Material are available on the LURE website: https://sysbiowiki.soe.ucsc.edu/lure.
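The graph primitive that a minimum spanning analysis builds on can be sketched as follows. This is not LURE's method (which the paper details): it is plain Prim's algorithm computing a minimum spanning tree over a small synthetic distance matrix, of the kind one might derive from pairwise dissimilarities between expression profiles.

```python
import numpy as np

def prim_mst(dist):
    """Return the edges (i, j, weight) of a minimum spanning tree of a
    complete graph, given its symmetric distance matrix (Prim's algorithm)."""
    n = len(dist)
    in_tree = [0]                      # grow the tree from node 0
    edges = []
    while len(in_tree) < n:
        best = None
        for i in in_tree:
            for j in range(n):
                if j not in in_tree and (best is None or dist[i][j] < best[2]):
                    best = (i, j, dist[i][j])
        edges.append(best)
        in_tree.append(best[1])
    return edges

# Toy pairwise-distance matrix between four "samples" (synthetic values).
D = np.array([
    [0, 1, 4, 3],
    [1, 0, 2, 5],
    [4, 2, 0, 6],
    [3, 5, 6, 0],
], dtype=float)
edges = prim_mst(D)
total = sum(w for _, _, w in edges)
```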

