bin3C: Exploiting Hi-C sequencing data to accurately resolve metagenome-assembled genomes (MAGs)

2018 ◽  
Author(s):  
Matthew Z. DeMaere ◽  
Aaron E. Darling

Abstract. Most microbes inhabiting the planet cannot be easily grown in the lab. Metagenomic techniques provide a means to study these organisms, and recent advances in the field have enabled the resolution of individual genomes from metagenomes, so-called Metagenome Assembled Genomes (MAGs). In addition to expanding the catalog of known microbial diversity, the systematic retrieval of MAGs stands as a tenable divide-and-conquer reduction of metagenome analysis to the simpler problem of single genome analysis. Many leading approaches to MAG retrieval depend upon time-series or transect data, whose effectiveness is a function of community complexity, target abundance and depth of sequencing. Without the need for time-series data, promising alternative methods are based upon the high-throughput sequencing technique called Hi-C. The Hi-C technique produces read-pairs which capture in-vivo DNA-DNA proximity interactions (contacts). The physical structure of the community modulates the signal derived from these interactions and a hierarchy of interaction rates exists (intra-chromosomal > inter-chromosomal > inter-cellular). We describe an unsupervised method that exploits the hierarchical nature of Hi-C interaction rates to resolve MAGs from a single time-point. Next, as a quantitative demonstration, we validate the method against the ground truth of a simulated human faecal microbiome. Lastly, we directly compare our method against a recently announced proprietary service, ProxiMeta, which also performs MAG retrieval using Hi-C data. bin3C has been implemented as a simple open-source pipeline and makes use of the unsupervised community detection algorithm Infomap (https://github.com/cerebis/bin3C).

2021 ◽  
Vol 83 (3) ◽  
Author(s):  
Maria-Veronica Ciocanel ◽  
Riley Juenemann ◽  
Adriana T. Dawes ◽  
Scott A. McKinley

Abstract. In developmental biology as well as in other biological systems, emerging structure and organization can be captured using time-series data of protein locations. In analyzing this time-dependent data, it is a common challenge not only to determine whether topological features emerge, but also to identify the timing of their formation. For instance, in most cells, actin filaments interact with myosin motor proteins and organize into polymer networks and higher-order structures. Ring channels are examples of such structures that maintain constant diameters over time and play key roles in processes such as cell division, development, and wound healing. Given the limitations in studying interactions of actin with myosin in vivo, we generate time-series data of protein polymer interactions in cells using complex agent-based models. Since the data has a filamentous structure, we propose sampling along the actin filaments and analyzing the topological structure of the resulting point cloud at each time. Building on existing tools from persistent homology, we develop a topological data analysis (TDA) method that assesses effective ring generation in this dynamic data. This method connects topological features through time in a path that corresponds to emergence of organization in the data. In this work, we also propose methods for assessing whether the topological features of interest are significant and thus whether they contribute to the formation of an emerging hole (ring channel) in the simulated protein interactions. In particular, we use the MEDYAN simulation platform to show that this technique can distinguish between the actin cytoskeleton organization resulting from distinct motor protein binding parameters.


2017 ◽  
Vol 2017 ◽  
pp. 1-10
Author(s):  
Zhihua Li ◽  
Ziyuan Li ◽  
Ning Yu ◽  
Steven Wen

Physiological theories indicate that the extreme values of a time series leave the deepest impression on the human visual system. Building on this principle, the authors study extreme-point-based hierarchical segmentation, a hierarchy-segmentation-based data extraction method for time series, and the idea of local outliers, and propose a novel outlier detection model and method for time series. The algorithm assigns an outlier factor to each subsequence of the time series, which makes visual outlier detection relatively direct. Experimental results show that the method outperforms the compared methods on average and reduces time series data efficiently, which indicates promising performance and practical application value.
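To make the idea of extreme-point-based segmentation concrete, the following is a minimal sketch, not the authors' algorithm. It assumes that an extreme point is any interior sample where the slope changes sign, and it cuts the series at those points. The function names `extreme_points` and `segment_by_extremes` are hypothetical, and the sketch does not build the hierarchy or compute the outlier factor described in the abstract.

```python
def extreme_points(series):
    """Return indices of local minima and maxima, plus both endpoints.

    Assumption: an extreme point is an interior sample where the slope
    changes sign. Flat plateaus (zero slope) are ignored in this sketch.
    """
    if not series:
        return []
    indices = [0]
    for i in range(1, len(series) - 1):
        left_slope = series[i] - series[i - 1]
        right_slope = series[i + 1] - series[i]
        if left_slope * right_slope < 0:
            indices.append(i)
    if len(series) > 1:
        indices.append(len(series) - 1)
    return indices


def segment_by_extremes(series):
    """Split the series into (start, end) index pairs between consecutive extremes."""
    indices = extreme_points(series)
    return list(zip(indices, indices[1:]))


if __name__ == "__main__":
    values = [1, 3, 7, 4, 2, 5, 9, 8]  # made-up data for illustration
    print(extreme_points(values))
    print(segment_by_extremes(values))
```

Each resulting segment is monotonic, so a series can be summarized by its extreme points alone, which is one plausible reading of the data reduction the abstract mentions.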


2014 ◽  
Vol 2014 ◽  
pp. 1-8 ◽  
Author(s):  
Nhat-Duc Hoang ◽  
Anh-Duc Pham ◽  
Minh-Tu Cao

This research aims at establishing a novel hybrid artificial intelligence (AI) approach, named firefly-tuned least squares support vector regression for time series prediction (FLSVRTSP). The proposed model utilizes least squares support vector regression (LS-SVR) as a supervised learning technique to generalize the mapping function between the input and output of time series data. In order to optimize the LS-SVR's tuning parameters, the FLSVRTSP incorporates the firefly algorithm (FA) as the search engine. Consequently, the newly constructed model can learn from historical data and carry out prediction autonomously without any prior knowledge of parameter settings. Experimental results and comparisons have demonstrated that the FLSVRTSP achieves a significant improvement in forecasting accuracy when predicting both artificial and real-world time series data. Hence, the proposed hybrid approach is a promising alternative for assisting decision-makers to better cope with time series prediction.


2017 ◽  
Vol 29 (2) ◽  
pp. 353-363 ◽  
Author(s):  
Yoshimi Ui ◽  
Yutaka Akiba ◽  
Shohei Sugano ◽  
Ryosuke Imai ◽  
...  

[Figure: Standard Lifilm configuration]
In this study, we propose an excretion detection system, Lifi, which does not require sensors inside diapers, and we verify its capabilities. It consists of a sheet with strategically placed air intakes, a set of gas sensors, and a processing unit with a newly developed excretion detection algorithm. The gas sensors detect odorous chemicals in the excrement, such as hydrogen sulfide and urea. The time-series data from the gas sensors were used to detect not only excretion, but also the presence or absence of the cared-for person on the bed. We examined two algorithms, one with a simple threshold and another based on clustering the sensor data with the k-means method. The results from both algorithms were satisfactory and similar once the algorithms were customized for each cared-for person. However, we adopted the clustering algorithm because it offers a higher level of flexibility that can be explored and exploited. Lifi was conceived in response to caretakers' strong desire to detect excretion by bedridden cared-for persons without opening their diapers. We believe that Lifi, along with the clustering algorithm, can help caretakers in this regard.
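The two detection strategies compared in this abstract can be illustrated with a small sketch. This is not the Lifi implementation: the readings are made-up numbers, the threshold value is arbitrary, and `threshold_detect` and `kmeans_1d` are hypothetical helper names. The k-means variant here is a plain one-dimensional Lloyd iteration with centers initialized evenly between the minimum and maximum reading.

```python
def threshold_detect(readings, threshold):
    """Flag every reading above a fixed threshold (hypothetical units)."""
    return [r > threshold for r in readings]


def kmeans_1d(readings, k=2, iterations=20):
    """Cluster scalar readings with a basic k-means (Lloyd) iteration.

    Assumption: k >= 2 and readings is non-empty. Centers start evenly
    spaced between the smallest and largest reading.
    """
    lowest, highest = min(readings), max(readings)
    centers = [lowest + (highest - lowest) * i / (k - 1) for i in range(k)]
    labels = [0] * len(readings)
    for _ in range(iterations):
        # Assignment step: nearest center for each reading.
        labels = [
            min(range(k), key=lambda c: abs(r - centers[c])) for r in readings
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [r for r, label in zip(readings, labels) if label == c]
            if members:
                centers[c] = sum(members) / len(members)
    return centers, labels


if __name__ == "__main__":
    gas = [0.10, 0.12, 0.11, 0.90, 0.95, 0.13, 0.88]  # made-up sensor values
    print(threshold_detect(gas, 0.5))
    print(kmeans_1d(gas))
```

With well-separated readings the two approaches agree, which is consistent with the abstract's remark that their results were similar. The clustering variant does not need a hand-picked threshold, which may be part of the flexibility the authors refer to.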


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Taylor Chomiak ◽  
Neilen P. Rasiah ◽  
Leonardo A. Molina ◽  
Bin Hu ◽  
Jaideep S. Bains ◽  
...  

Abstract. Here we introduce Local Topological Recurrence Analysis (LoTRA), a simple computational approach for analyzing time-series data. Its versatility is elucidated using simulated data, Parkinsonian gait, and in vivo brain dynamics. We also show that this algorithm can be used to build a remarkably simple machine-learning model capable of outperforming deep-learning models in detecting Parkinson's disease from a single digital handwriting test.


2016 ◽  
Vol 4 (4) ◽  
pp. 485
Author(s):  
Haviluddin Haviluddin ◽  
Zainal Arifin ◽  
Awang Harsa Kridalaksana ◽  
Dedy Cahyadi

This paper explores a backpropagation neural network (BPNN) method for time series data. The BPNN method was applied to predict foreign tourist arrivals to Indonesia, using datasets taken from the Central Bureau of Statistics of Indonesia (Badan Pusat Statistik, BPS). The experimental results showed that a BPNN with two hidden layers was able to model and forecast foreign tourist arrivals to Indonesia, with forecasting accuracy indicated by the mean square error (MSE). The study recommends the BPNN method as an alternative for predicting time series datasets because it is effective, easy to use, and capable of producing good forecasting accuracy.
Keywords: BPNN; foreign tourists; BPS; MSE
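For reference, the mean square error used as the accuracy measure is the average of the squared differences between actual and forecast values. The sketch below is a generic illustration: the function name `mean_square_error` is hypothetical and the numbers in the usage example are invented, not tourist-arrival data from the paper.

```python
def mean_square_error(actual, predicted):
    """Average squared difference between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Invented values for illustration only.
    print(mean_square_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

A lower value means the forecasts track the observed series more closely. Because the errors are squared, a few large misses weigh more heavily than many small ones.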


2016 ◽  
Vol 16 (12) ◽  
pp. 2603-2622
Author(s):  
Jun-Whan Lee ◽  
Sun-Cheon Park ◽  
Duk Kee Lee ◽  
Jong Ho Lee

Abstract. Timely detection of tsunamis with water level records is a critical but logistically challenging task because of outliers and gaps. Since tsunami detection algorithms require several hours of past data, outliers could cause false alarms, and gaps can stop the tsunami detection algorithm even after the recording is restarted. In order to avoid such false alarms and time delays, we propose the Tsunami Arrival time Detection System (TADS), which can be applied to discontinuous time series data with outliers. TADS consists of three algorithms, outlier removal, gap filling, and tsunami detection, which are designed to update whenever new data are acquired. After calibrating the thresholds and parameters for the Ulleung-do surge gauge located in the East Sea (Sea of Japan), Korea, the performance of TADS was discussed based on a 1-year dataset with historical tsunamis and synthetic tsunamis. The results show that the overall performance of TADS is effective in detecting a tsunami signal superimposed on both outliers and gaps.
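The three-stage structure described above (outlier removal, gap filling, detection) can be sketched as a small pipeline. This is an illustrative stand-in under stated assumptions, not the TADS algorithms: the median filter, linear interpolation, and baseline-departure rule are swapped-in generic techniques, all thresholds and window sizes are arbitrary, and the function names are hypothetical. Missing samples are represented by `None`, and the sketch processes a whole record at once rather than updating as new data arrive.

```python
from statistics import median


def remove_outliers(levels, window=5, max_dev=0.5):
    """Replace samples far from the local median with None (assumed rule)."""
    cleaned = []
    half = window // 2
    for i, x in enumerate(levels):
        if x is None:
            cleaned.append(None)
            continue
        neighbours = [
            v for v in levels[max(0, i - half): i + half + 1] if v is not None
        ]
        cleaned.append(x if abs(x - median(neighbours)) <= max_dev else None)
    return cleaned


def fill_gaps(levels):
    """Linearly interpolate None samples; edge gaps copy the nearest valid value."""
    valid = [i for i, v in enumerate(levels) if v is not None]
    filled = list(levels)
    if not valid:
        return filled
    for i, v in enumerate(levels):
        if v is not None:
            continue
        prev = max((j for j in valid if j < i), default=None)
        nxt = min((j for j in valid if j > i), default=None)
        if prev is None:
            filled[i] = levels[nxt]
        elif nxt is None:
            filled[i] = levels[prev]
        else:
            t = (i - prev) / (nxt - prev)
            filled[i] = levels[prev] + t * (levels[nxt] - levels[prev])
    return filled


def detect_arrival(levels, baseline_len=4, threshold=0.3):
    """Return the first index that departs from the preceding baseline mean."""
    for i in range(baseline_len, len(levels)):
        baseline = sum(levels[i - baseline_len: i]) / baseline_len
        if abs(levels[i] - baseline) > threshold:
            return i
    return None


if __name__ == "__main__":
    raw = [0.0, 0.1, 5.0, 0.0, None, 0.1, 0.0, 0.1, 0.6, 0.9]  # invented record
    print(detect_arrival(fill_gaps(remove_outliers(raw))))
```

Running the stages in this order matters: the spike is discarded before interpolation so it cannot contaminate the filled values, and the detector only ever sees a continuous record.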


Author(s):  
Bin Zhou ◽  
Shenghua Liu ◽  
Bryan Hooi ◽  
Xueqi Cheng ◽  
Jing Ye

Given a large-scale rhythmic time series containing mostly normal data segments (or `beats'), can we learn how to detect anomalous beats in an effective yet efficient way? For example, how can we detect anomalous beats from electrocardiogram (ECG) readings? Existing approaches either require excessively high amounts of labeled and balanced data for classification, or rely on less regularized reconstructions, resulting in lower accuracy in anomaly detection. Therefore, we propose BeatGAN, an unsupervised anomaly detection algorithm for time series data. BeatGAN outputs explainable results to pinpoint the anomalous time ticks of an input beat, by comparing them to adversarially generated beats. Its robustness is guaranteed by its regularization of reconstruction error using an adversarial generation approach, as well as data augmentation using time series warping. Experiments show that BeatGAN accurately and efficiently detects anomalous beats in ECG time series, and routes doctors' attention to anomalous time ticks, achieving accuracy of nearly 0.95 AUC, and very fast inference (2.6 ms per beat). In addition, we show that BeatGAN accurately detects unusual motions from multivariate motion-capture time series data, illustrating its generality.


2018 ◽  
Author(s):  
Tal Zinger ◽  
Pleuni S. Pennings ◽  
Adi Stern

Abstract. With the advent of deep sequencing techniques, it is now possible to track the evolution of viruses with ever-increasing detail. Here we present FITS (Flexible Inference from Time-Series), a computational framework that allows inference of either the fitness of a mutation, the mutation rate or the population size from genomic time-series sequencing data. FITS was designed first and foremost for analysis of either short-term Evolve & Resequence (E&R) experiments, or for rapidly recombining populations of viruses. We thoroughly explore the performance of FITS on noisy simulated data, and highlight its ability to infer meaningful information even in those circumstances. In particular, FITS is able to categorize a mutation as Advantageous, Neutral or Deleterious. We next apply FITS to empirical data from an E&R experiment on poliovirus where parameters were determined experimentally and demonstrate extremely high accuracy in inference. We highlight the ease of use of FITS for step-wise or iterative inference of mutation rates, population size, and fitness values for each mutation sequenced, when deep sequencing data is available at multiple time-points. Availability: FITS is written in C++ and is available both with a user-friendly graphical user interface and as a command-line program that allows parallel high-throughput analyses. Source code, binaries (Windows and Mac) and complementary scripts are available from GitHub at https://github.com/SternLabTAU/[email protected]
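As a rough illustration of what inferring fitness from frequency time series involves, the sketch below uses a deterministic haploid selection model and a least-squares grid search. It is not FITS and does not reproduce its inference procedure: the update rule, the grid of candidate fitness values, the tolerance used for categorization, and all function names are assumptions made for this example. Only the three category labels come from the abstract.

```python
def next_frequency(p, fitness):
    """One generation of deterministic haploid selection (assumed model).

    The mutant has relative fitness `fitness`; the wild type has fitness 1.
    """
    return fitness * p / (fitness * p + (1.0 - p))


def trajectory(p0, fitness, generations):
    """Mutant frequency at generations 0..generations."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], fitness))
    return freqs


def fit_fitness(observed, candidates):
    """Pick the candidate fitness whose trajectory best matches the data."""
    def squared_error(w):
        simulated = trajectory(observed[0], w, len(observed) - 1)
        return sum((s - o) ** 2 for s, o in zip(simulated, observed))
    return min(candidates, key=squared_error)


def categorize(fitness, tolerance=0.05):
    """Map a relative fitness to a label; the tolerance is hypothetical."""
    if fitness > 1.0 + tolerance:
        return "Advantageous"
    if fitness < 1.0 - tolerance:
        return "Deleterious"
    return "Neutral"


if __name__ == "__main__":
    observed = trajectory(0.1, 1.5, 5)          # synthetic, noise-free data
    grid = [i / 10 for i in range(5, 21)]       # candidate fitness 0.5 to 2.0
    estimate = fit_fitness(observed, grid)
    print(estimate, categorize(estimate))
```

Real sequencing data are noisy and populations are finite, so a deterministic fit like this one ignores genetic drift and sampling error, both of which a full inference framework has to account for.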


2019 ◽  
Author(s):  
Thinh N. Tran ◽  
Gary D. Bader

Abstract. Single-cell RNA sequencing (scRNAseq) can map cell types, states and transitions during dynamic biological processes such as development and regeneration. Many trajectory inference methods have been developed to order cells by their progression through a dynamic process. However, when time series data is available, these methods do not consider the available time information when ordering cells and are instead designed to work only on a single scRNAseq data snapshot. We present Tempora, a novel cell trajectory inference method that orders cells using time information from time-series scRNAseq data. In performance comparison tests, Tempora accurately inferred developmental lineages in human skeletal myoblast differentiation and murine cerebral cortex development, beating state-of-the-art methods. Tempora uses biological pathway information to help identify cell type relationships and can identify important time-dependent pathways to help interpret the inferred trajectory. Our results demonstrate the utility of time information to supervise trajectory inference for scRNA-seq based analysis.

