Clustering of Biological Datasets in the Era of Big Data

2016 ◽  
Vol 13 (1) ◽  
pp. 52-81 ◽  
Author(s):  
Richard Röttger

Summary: Clustering is a long-standing problem in computer science and is applied in virtually every scientific field to explore the inherent structure of datasets. In biomedical research, clustering tools have been used in many areas, including expression analysis, disease subtyping, and protein research. A plethora of approaches has been developed, but there is little guidance on which approach is optimal in which situation. Furthermore, a typical cluster analysis is an entire process with several highly interconnected steps: preprocessing, proximity calculation, the actual clustering, evaluation, and optimization. An optimal result can be achieved only when all steps work together seamlessly. This renders a cluster analysis tiresome and error-prone, especially for non-experts. A mere trial-and-error approach becomes increasingly infeasible given the tremendous growth of available datasets; thus, a strategic and thoughtful course of action is crucial for a cluster analysis. This manuscript provides an overview of the crucial steps and the most common techniques involved in conducting a state-of-the-art cluster analysis of biomedical datasets.
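The steps named in this summary (preprocessing, proximity calculation, clustering, evaluation, optimization) can be illustrated with a minimal sketch. It is not taken from the paper: the z-score preprocessing, Euclidean distance, distance-threshold clustering, silhouette score, the function names, and the toy data are all assumptions chosen only to make each step concrete.

```python
import math

def standardize(rows):
    """Preprocessing: z-score every feature (column) so scales are comparable."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def distance_matrix(rows):
    """Proximity calculation: pairwise Euclidean distances."""
    return [[math.dist(a, b) for b in rows] for a in rows]

def threshold_clustering(dist, threshold):
    """Clustering: connected components of the graph that links every
    pair of points whose distance is at most `threshold`."""
    n = len(dist)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and dist[i][j] <= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

def silhouette(dist, labels):
    """Evaluation: mean silhouette width (0.0 if there is only one cluster)."""
    n = len(labels)
    clusters = set(labels)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for i in range(n):
        own = [dist[i][j] for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            continue  # singleton clusters contribute 0 by convention
        a = sum(own) / len(own)
        b = min(sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
                for c in clusters if c != labels[i])
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n

def best_clustering(rows, thresholds):
    """Optimization: try several thresholds and keep the best-scoring result."""
    dist = distance_matrix(standardize(rows))
    results = []
    for t in thresholds:
        labels = threshold_clustering(dist, t)
        results.append((silhouette(dist, labels), t, labels))
    return max(results, key=lambda r: r[0])

# Hypothetical toy data: two features on very different scales.
data = [[1.0, 200.0], [1.2, 210.0], [0.9, 190.0],
        [5.0, 900.0], [5.2, 950.0], [4.8, 920.0]]
score, threshold, labels = best_clustering(data, [0.1, 0.5, 1.0, 3.0])
print(labels, round(score, 2))
```

Each function depends on the output of the previous one, which is why a poor choice at any step (for example, skipping standardization when the features differ in scale) affects the final result.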

Author(s):  
Xabier Rodríguez-Martínez ◽  
Enrique Pascual-San-José ◽  
Mariano Campoy-Quiles

This review article presents the state-of-the-art in high-throughput computational and experimental screening routines with application in organic solar cells, including materials discovery, device optimization and machine-learning algorithms.


2021 ◽  
pp. 088541222199424
Author(s):  
Mauro Francini ◽  
Lucia Chieffallo ◽  
Annunziata Palermo ◽  
Maria Francesca Viapiana

This work aims to reorganize theoretical and empirical research on smart mobility through a systematic literature review. The research goal is to reach an extended and shared definition of smart mobility using cluster analysis. The article provides a summary of the state of the art that can have broader impacts in determining new angles for approaching research. In particular, the results will serve as a reference for future quantitative developments by the authors, who are working on the construction of a territorial measurement model of the degree of smartness, helping them identify performance indicators consistent with the proposed definition.


2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Tian J. Ma ◽  
Rudy J. Garcia ◽  
Forest Danford ◽  
Laura Patrizi ◽  
Jennifer Galasso ◽  
...  

Abstract: The amount of data produced by sensors, social and digital media, and the Internet of Things (IoT) is rapidly increasing each day. Decision makers often need to sift through a sea of Big Data to utilize information from a variety of sources in order to determine a course of action. This can be a very difficult and time-consuming task. For each data source encountered, the information can be redundant, conflicting, and/or incomplete. For near-real-time applications, there is insufficient time for a human to interpret all the information from different sources. In this project, we have developed a near-real-time, data-agnostic software architecture that is capable of using several disparate sources to autonomously generate Actionable Intelligence with a human in the loop. We demonstrated our solution through a traffic prediction exemplar problem.


2021 ◽  
Vol 7 (6) ◽  
pp. 96
Author(s):  
Alessandro Rossi ◽  
Marco Barbiero ◽  
Paolo Scremin ◽  
Ruggero Carli

Industrial 3D models are usually characterized by a large number of hidden faces, and it is very important to simplify them. Visible-surface determination methods provide one of the most common solutions to the visibility problem. This study presents a robust technique to address the global visibility problem in object space that guarantees theoretical convergence to the optimal result. More specifically, we propose a strategy that, in a finite number of steps, determines whether each face of the mesh is globally visible or not. The proposed method is based on Plücker coordinates, which provide an efficient way to determine the intersection between a ray and a triangle. The algorithm does not require pre-calculations such as estimating the normal at each face, which makes it resilient to the orientation of the normals. We compared the performance of the proposed algorithm against a state-of-the-art technique. Results showed that our approach is more robust in terms of convergence to the maximum lossless compression.
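For readers unfamiliar with the ray-triangle test mentioned in this abstract, the following is a minimal sketch of the textbook Plücker-coordinate formulation. It is not the authors' implementation; the function names, the tuple-based vectors, and the choice to count edge-grazing rays as hits are assumptions made for illustration.

```python
def sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def plucker(p, q):
    """Plücker coordinates (direction, moment) of the directed line p -> q."""
    return sub(q, p), cross(p, q)

def side(l1, l2):
    """Permuted inner product of two lines; its sign encodes their
    relative orientation, and it is zero when the lines are coplanar."""
    return dot(l1[0], l2[1]) + dot(l2[0], l1[1])

def ray_hits_triangle(origin, direction, a, b, c):
    """Return True if the ray origin + t * direction (t >= 0) meets triangle abc."""
    ray = (direction, cross(origin, direction))
    s_ab = side(ray, plucker(a, b))
    s_bc = side(ray, plucker(b, c))
    s_ca = side(ray, plucker(c, a))
    # The supporting line crosses the triangle only if it passes all three
    # edges with the same orientation. No face normal is needed for this.
    same_sign = ((s_ab >= 0 and s_bc >= 0 and s_ca >= 0) or
                 (s_ab <= 0 and s_bc <= 0 and s_ca <= 0))
    total = s_ab + s_bc + s_ca
    if not same_sign or total == 0:
        return False  # miss, or line coplanar with the triangle
    # The side values are unnormalized barycentric weights of the vertex
    # opposite each edge, which gives the intersection point directly.
    wa, wb, wc = s_bc / total, s_ca / total, s_ab / total
    point = tuple(wa * a[i] + wb * b[i] + wc * c[i] for i in range(3))
    # Reject intersections that lie behind the ray origin.
    t = dot(sub(point, origin), direction) / dot(direction, direction)
    return t >= 0

# Hypothetical example: a unit right triangle in the plane z = 0.
A, B, C = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
print(ray_hits_triangle((0.25, 0.25, 1.0), (0.0, 0.0, -1.0), A, B, C))
print(ray_hits_triangle((2.0, 2.0, 1.0), (0.0, 0.0, -1.0), A, B, C))
```

Because the test depends only on the signs of the three edge products, it gives the same answer whichever way the triangle is wound, which is consistent with the abstract's remark that no per-face normal estimation is required.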


Author(s):  
BIN XU ◽  
YUAN YAN TANG ◽  
BIN FANG ◽  
ZHAO WEI SHANG

In this paper, a novel approach derived from the image gradient domain, called multi-scale gradient faces (MGF), is proposed to abstract a multi-scale illumination-insensitive measure for face recognition. MGF applies multi-scale analysis to image gradient information, which can discover the underlying inherent structure in images and preserve as much detail as possible while removing varying lighting. The proposed approach provides state-of-the-art performance on Extended YaleB and PIE: recognition rates of 99.11% on the PIE database and 99.38% on YaleB, outperforming most existing approaches. Furthermore, the experimental results on noisy Yale-B validate that MGF is more robust to image noise.


Author(s):  
Diana Martinez-Mosquera ◽  
Sergio Luján-Mora ◽  
Luis H. Montoya L. ◽  
Rolando P. Reyes Ch. ◽  
Manolo Paredes Calderón

2014 ◽  
Vol 2014 ◽  
pp. 1-19 ◽  
Author(s):  
Mark J. van der Laan ◽  
Richard J. C. M. Starmans

This outlook paper reviews the research of van der Laan’s group on Targeted Learning, a subfield of statistics that is concerned with the construction of data adaptive estimators of user-supplied target parameters of the probability distribution of the data and corresponding confidence intervals, aiming at only relying on realistic statistical assumptions. Targeted Learning fully utilizes the state of the art in machine learning tools, while still preserving the important identity of statistics as a field that is concerned with both accurate estimation of the true target parameter value and assessment of uncertainty in order to make sound statistical conclusions. We also provide a philosophical historical perspective on Targeted Learning, also relating it to the new developments in Big Data. We conclude with some remarks explaining the immediate relevance of Targeted Learning to the current Big Data movement.


Author(s):  
Nurshazwani Muhamad Mahfuz ◽  
Marina Yusoff ◽  
Zakiah Ahmad

Clustering plays an important role as an unsupervised learning method in data analytics, assisting with many real-world problems such as image segmentation, object recognition, and information retrieval. Traditional clustering techniques often struggle because outliers and noisy data lead to non-optimal results. This review paper surveys single clustering methods that have been applied in various domains. The aim is to identify potentially suitable applications and aspects of the methods that can be improved. Three categories of single clustering methods are suggested, which should help researchers see the clustering aspects and determine the requirements of a clustering method for a given application, based on the state of the art in previous research findings.

