Reverse engineering large-scale genetic networks: synthetic versus real data

2010 ◽  
Vol 89 (1) ◽  
pp. 73-80 ◽  
Author(s):  
Luwen Zhang ◽  
Mei Xiao ◽  
Yong Wang ◽  
Wu Zhang


Genetics ◽
2003 ◽  
Vol 165 (4) ◽  
pp. 2269-2282
Author(s):  
D Mester ◽  
Y Ronin ◽  
D Minkov ◽  
E Nevo ◽  
A Korol

Abstract: This article is devoted to the problem of ordering in linkage groups with many dozens or even hundreds of markers. The ordering problem belongs to the field of discrete optimization on the set of all possible orders, amounting to n!/2 for n loci; hence it is considered an NP-hard problem. Several authors have attempted to employ methods developed for the well-known traveling salesman problem (TSP) for multilocus ordering, using the assumption that for a set of linked loci the true order will be the one that minimizes the total length of the linkage group. A novel, fast, and reliable algorithm developed for the TSP and based on evolution-strategy discrete optimization was applied in this study to multilocus ordering on the basis of pairwise recombination frequencies. The quality of the derived maps under various complications (dominant vs. codominant markers, marker misclassification, negative and positive interference, and missing data) was analyzed using simulated data with ∼50-400 markers. The high performance of the employed algorithm allows systematic treatment of the problem of verifying the obtained multilocus orders on the basis of computationally intensive bootstrap and/or jackknife approaches for detecting and removing questionable marker scores, thereby stabilizing the resulting maps. Parallel computation can easily be adopted to further accelerate the proposed algorithm. A real-data analysis (on maize chromosome 1 with 230 markers) is provided to illustrate the proposed methodology.
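To make the TSP analogy concrete, here is a minimal sketch (not the authors' implementation) of evolution-strategy ordering: a candidate marker order is scored by its total map length, i.e., the sum of recombination fractions between adjacent markers, and a (1+1) evolution strategy improves it with segment-reversal (2-opt-style) mutations. The recombination matrix `rf` is assumed to be a symmetric n-by-n array of pairwise recombination frequencies.

```python
import random

def map_length(order, rf):
    """Total linkage-group length: sum of pairwise recombination
    fractions between adjacent markers in the candidate order."""
    return sum(rf[order[i]][order[i + 1]] for i in range(len(order) - 1))

def es_order(rf, iters=20000, seed=0):
    """(1+1) evolution strategy with segment-reversal (2-opt) mutations:
    keep the mutant whenever it does not lengthen the map."""
    rng = random.Random(seed)
    n = len(rf)
    best = list(range(n))
    rng.shuffle(best)
    best_len = map_length(best, rf)
    for _ in range(iters):
        i, j = sorted(rng.sample(range(n), 2))
        cand = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
        cand_len = map_length(cand, rf)
        if cand_len <= best_len:
            best, best_len = cand, cand_len
    return best, best_len
```

A bootstrap verification in the spirit of the abstract would resample the mapping population, rerun `es_order` on each resampled recombination matrix, and flag markers whose neighborhoods vary across replicates.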


Author(s):  
Andrew Jacobsen ◽  
Matthew Schlegel ◽  
Cameron Linke ◽  
Thomas Degris ◽  
Adam White ◽  
...  

This paper investigates different vector step-size adaptation approaches for non-stationary, online, continual prediction problems. Vanilla stochastic gradient descent can be considerably improved by scaling the update with a vector of appropriately chosen step-sizes. Many methods, including AdaGrad, RMSProp, and AMSGrad, keep statistics about the learning process to approximate a second-order update: a vector approximation of the inverse Hessian. Another family of approaches uses meta-gradient descent to adapt the step-size parameters to minimize prediction error. These meta-descent strategies are promising for non-stationary problems, but have not been as extensively explored as quasi-second-order methods. We first derive a general, incremental meta-descent algorithm, called AdaGain, designed to be applicable to a much broader range of algorithms, including those with semi-gradient updates or even accelerations, such as RMSProp. We provide an empirical comparison of methods from both families. We conclude that methods from both families can perform well, but that meta-descent methods exhibit advantages on non-stationary prediction problems. Our method is particularly robust across several prediction problems, and is competitive with the state-of-the-art method on a large-scale time-series prediction problem using real data from a mobile robot.
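The AdaGain update itself is specified in the paper; as an illustration of the meta-descent family it belongs to, the following is a minimal sketch of IDBD (Sutton, 1992) for linear prediction, in which each weight carries its own log step-size that is itself adapted by gradient descent on the squared prediction error. All variable names here are ours.

```python
import numpy as np

def idbd(xs, ys, theta=0.01, beta0=np.log(0.05)):
    """Incremental Delta-Bar-Delta (Sutton, 1992): a meta-descent method
    that adapts one log step-size per weight for linear prediction.
    xs: (T, n) inputs; ys: (T,) targets; theta: meta step-size."""
    n = xs.shape[1]
    w = np.zeros(n)           # prediction weights
    beta = np.full(n, beta0)  # per-weight log step-sizes
    h = np.zeros(n)           # trace of recent weight updates
    errors = []
    for x, y in zip(xs, ys):
        delta = y - w @ x                  # prediction error
        beta += theta * delta * x * h      # meta-gradient step on log step-sizes
        alpha = np.exp(beta)               # per-weight step-sizes
        w += alpha * delta * x             # vector-step-size SGD update
        h = h * np.maximum(0.0, 1.0 - alpha * x * x) + alpha * delta * x
        errors.append(delta * delta)
    return w, np.asarray(errors)
```

On a non-stationary stream, the per-weight step-sizes grow for inputs whose relationship to the target keeps drifting and shrink for stable ones, which is the property the meta-descent family exploits.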


Complexity ◽  
2018 ◽  
Vol 2018 ◽  
pp. 1-16 ◽  
Author(s):  
Yiwen Zhang ◽  
Yuanyuan Zhou ◽  
Xing Guo ◽  
Jintao Wu ◽  
Qiang He ◽  
...  

The K-means algorithm is one of the ten classic algorithms in data mining and has long been studied by researchers in numerous fields. However, the number of clusters k in the K-means algorithm is not always easy to determine, and the selection of the initial centers is vulnerable to outliers. This paper proposes an improved K-means clustering algorithm called the covering K-means algorithm (C-K-means). The C-K-means algorithm not only acquires efficient and accurate clustering results but also self-adaptively provides a reasonable number of clusters based on the data features. It comprises two phases: the initialization of the covering algorithm (CA) and the Lloyd iteration of K-means. The first phase executes the CA, which self-organizes and recognizes the number of clusters k based on the similarities in the data; it requires neither the number of clusters to be prespecified nor the initial centers to be manually selected. It therefore has a "blind" feature, that is, k is not preselected. The second phase performs the Lloyd iteration based on the results of the first phase. The C-K-means algorithm combines the advantages of CA and K-means. Experiments carried out on the Spark platform verify the good scalability of the C-K-means algorithm, which can effectively solve the problem of large-scale data clustering. Extensive experiments on real data sets show that the C-K-means algorithm outperforms existing algorithms in both accuracy and efficiency under both sequential and parallel conditions.
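The exact covering algorithm is defined in the paper; the sketch below only illustrates the two-phase structure under a simplifying assumption: phase 1 discovers k with a radius-r covering pass (a point within distance r of an existing center joins its sphere, otherwise it seeds a new center), and phase 2 runs standard Lloyd iterations from the resulting centers. The radius r is a hypothetical stand-in for the spheres that CA constructs from the data.

```python
import numpy as np

def covering_init(X, r):
    """Phase 1 (sketch): radius-r covering pass. Each point either falls
    inside an existing sphere or seeds a new center, so k emerges from
    the data instead of being prespecified."""
    centers = [X[0]]
    for x in X[1:]:
        if min(np.linalg.norm(x - c) for c in centers) > r:
            centers.append(x)
    return np.array(centers)

def lloyd(X, centers, iters=100):
    """Phase 2: standard Lloyd iterations from the covering centers."""
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels

# Usage sketch: centers, labels = lloyd(X, covering_init(X, r=1.0))
```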


2003 ◽  
Vol 31 (6) ◽  
pp. 1516-1518 ◽  
Author(s):  
D. Husmeier

This paper provides a brief introduction to learning Bayesian networks from gene-expression data. The method is contrasted with other approaches to the reverse engineering of biochemical networks, and the Bayesian learning paradigm is briefly described. The article demonstrates an application to a simple synthetic toy problem and evaluates the inference performance in terms of ROC (receiver operating characteristic) curves.
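As a reminder of how ROC evaluation applies to network inference, the sketch below (our illustration, not the paper's code) ranks all candidate edges by an inferred confidence score and sweeps a threshold, tracing the true-positive rate against the false-positive rate with respect to the known synthetic network.

```python
import numpy as np

def roc_curve(scores, truth):
    """Sweep a threshold over edge confidence scores and return
    (false-positive rate, true-positive rate) points.
    scores, truth: flat arrays over all candidate edges;
    truth is 1 for edges present in the known synthetic network."""
    order = np.argsort(-scores)   # most confident edges first
    truth = truth[order]
    tp = np.cumsum(truth)         # true positives at each cutoff
    fp = np.cumsum(1 - truth)     # false positives at each cutoff
    return fp / fp[-1], tp / tp[-1]

def auc(fpr, tpr):
    """Area under the ROC curve by trapezoidal integration."""
    return float(np.trapz(tpr, fpr))
```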


2021 ◽  
Author(s):  
Cemanur Aydinalp ◽  
Sulayman Joof ◽  
Mehmet Nuri Akinci ◽  
Ibrahim Akduman ◽  
Tuba Yilmaz

In this manuscript, we propose a new technique for determining Debye parameters, which represent the dielectric properties of materials, from the reflection coefficient response of open-ended coaxial probes. The method retrieves the Debye parameters using a deep learning model designed with numerically generated data. Unlike real data, synthetically generated input and output training data represent a wide variety of materials and can be generated rapidly. Furthermore, the proposed method provides design flexibility and can be applied to any desired probe with the intended dimensions and material. We then experimentally verified the designed deep learning model using reflection coefficients measured when the probe was terminated with five different standard liquids, four mixtures, and a gel-like material, and compared the results with the literature. The obtained mean percent relative error ranged from 1.21±0.06 to 10.89±0.08. Our work also presents a large-scale statistical verification of the proposed dielectric property retrieval technique.
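For reference, the single-pole Debye relaxation model underlying the retrieved parameters can be written as ε(ω) = ε∞ + Δε/(1 + jωτ) + σs/(jωε0). The sketch below evaluates it with NumPy; the example parameter values are illustrative (roughly water-like) rather than taken from the paper.

```python
import numpy as np

EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m

def debye_permittivity(f_hz, eps_inf, delta_eps, tau, sigma_s=0.0):
    """Single-pole Debye model of complex relative permittivity:
    eps(w) = eps_inf + delta_eps / (1 + j*w*tau) + sigma_s / (j*w*EPS0).
    f_hz: frequencies in Hz; tau: relaxation time in s;
    sigma_s: static ionic conductivity in S/m."""
    w = 2.0 * np.pi * np.asarray(f_hz)
    return eps_inf + delta_eps / (1.0 + 1j * w * tau) + sigma_s / (1j * w * EPS0)

# Illustrative water-like parameters over 0.5-6 GHz (assumed values)
f = np.linspace(0.5e9, 6e9, 200)
eps = debye_permittivity(f, eps_inf=5.0, delta_eps=73.0, tau=8.3e-12, sigma_s=0.1)
```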


2021 ◽  
Author(s):  
Yi Luan ◽  
Rui Ding ◽  
Wenshen Gu ◽  
Xiaofan Zhang ◽  
Xinliang Chen ◽  
...  

Abstract: Since the end of 2019, the COVID-19 epidemic has swept the world. With the widespread transmission of COVID-19 and the continuous emergence of mutated strains, the situation for epidemic prevention and control remains severe. On May 21, 2021, Guangzhou City, Guangdong Province, reported a new locally confirmed case, and Guangzhou became the first city in mainland China to contend with the Delta variant. As a local hospital with strong nucleic acid detection capabilities, Sun Yat-sen Memorial Hospital of Sun Yat-sen University took the lead in launching the construction and deployment of mobile shelter laboratories and large-scale screening work in Foshan and Zhanjiang, Guangdong Province. By summarizing practical experience and analyzing observational and comparative data, we use real data to demonstrate a feasible solution for rapidly expanding detection capacity in a short period of time. We hope that these experiences will serve as a useful reference for other countries or regions, especially areas with underdeveloped medical and health care.


2020 ◽  
Vol 20 (2) ◽  
pp. e07
Author(s):  
Luis Veas Castillo ◽  
Gabriel Ovando-Leon ◽  
Gabriel Astudillo ◽  
Veronica Gil-Costa ◽  
Mauricio Marín

Computational simulation is a powerful tool for the performance evaluation of computational systems. It is useful for capacity planning of data center clusters, for obtaining profiling reports of software applications, and for detecting bottlenecks. It has been used in research areas as diverse as large-scale Web search engines, natural disaster evacuations, computational biology, and human behavior and tendencies. However, properly tuning the parameters of the simulators, defining the scenarios to be simulated, and collecting the data traces is not an easy task. It is an incremental process that requires constantly comparing the estimated metrics and the flow of simulated actions against real data. In this work, we present an experimental framework designed for the development of large-scale simulations of two applications used when a natural disaster strikes. The first is a social application for registering volunteers and managing emergency campaigns and tasks. The second is a benchmark application for a data repository, MongoDB. The applications are deployed on a distributed platform that combines several technologies: a proxy, a container orchestrator, containers, and a NoSQL database. We simulate both applications and the architecture platform, and we validate our simulators using real traces collected during emergency drills.


Geophysics ◽  
1990 ◽  
Vol 55 (9) ◽  
pp. 1166-1182 ◽  
Author(s):  
Irshad R. Mufti

Finite-difference seismic models are commonly set up in 2-D space. Such models must be excited by a line source, which leads to different amplitudes than those in real data, commonly generated from a point source. Moreover, there is no provision for any out-of-plane events. These problems can be eliminated by using 3-D finite-difference models. The fundamental strategy in designing efficient 3-D models is to minimize computational work without sacrificing accuracy. This was accomplished by using a (4,2) differencing operator, which achieves the accuracy of much larger operators but requires far fewer numerical operations as well as significantly reduced manipulation of data in computer memory. Such a choice also simplifies the problem of evaluating the wave field near the subsurface boundaries of the model, where large operators cannot be used. We also exploited the fact that, unlike real data, synthetic data are free from ambient noise; consequently, one can retain sufficient resolution in the results by optimizing the frequency content of the source signal. Further computational efficiency was achieved by using the concept of the exploding reflector, which yields zero-offset seismic sections without the need to evaluate the wave field for individual shot locations. These considerations opened up the possibility of carrying out a complete synthetic 3-D survey on a supercomputer to investigate the seismic response of a large-scale structure located in Oklahoma. The analysis of the results on a geophysical workstation provides new insight into the role of interference and diffraction in the interpretation of seismic data.
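As a sketch of what a (4,2) operator looks like (fourth-order accurate in space, second-order in time), the following snippet advances the constant-density acoustic wave equation by one time step; it is written in 2-D for brevity, and the 3-D version applies the same five-point stencil along the third axis as well. The time step dt is assumed to satisfy the CFL stability condition for this stencil.

```python
import numpy as np

def step_42(p, p_prev, c, dt, h):
    """One time step of a (4,2) finite-difference scheme for the 2-D
    acoustic wave equation p_tt = c^2 (p_xx + p_yy): fourth-order
    central differences in space, second-order leapfrog in time.
    p, p_prev: current and previous wavefields; c: velocity model;
    h: grid spacing."""
    lap = np.zeros_like(p)
    # fourth-order Laplacian on the interior (two-cell halo)
    lap[2:-2, 2:-2] = (
        -p[:-4, 2:-2] + 16*p[1:-3, 2:-2] - 30*p[2:-2, 2:-2]
        + 16*p[3:-1, 2:-2] - p[4:, 2:-2]
        - p[2:-2, :-4] + 16*p[2:-2, 1:-3] - 30*p[2:-2, 2:-2]
        + 16*p[2:-2, 3:-1] - p[2:-2, 4:]
    ) / (12.0 * h * h)
    return 2.0*p - p_prev + (dt * c)**2 * lap
```

An exploding-reflector run, in the spirit of the abstract, would initialize p with the reflectivity model (at half the medium velocity) and iterate this step to obtain a zero-offset section in a single simulation.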


2021 ◽  
Vol 11 (21) ◽  
pp. 10366
Author(s):  
César Córcoles ◽  
Germán Cobo ◽  
Ana-Elena Guerrero-Roldán

A variety of tools are available to collect, process and analyse learning data obtained from the clickstream generated by students watching learning resources in video format. There is also some literature on the uses of such data to better understand and improve the teaching-learning process. Most of the literature focuses on large-scale learning scenarios, such as MOOCs, where videos are watched hundreds or thousands of times. We have developed a solution for collecting clickstream analytics data in smaller scenarios, much more common in primary, secondary and higher education, where videos are watched tens or hundreds of times, and for analysing whether the solution helps teachers improve the learning process. We have deployed it in a real scenario and collected real data. Furthermore, we have processed the data and presented it visually to teachers for those scenarios, and we have collected and analysed their perception of its usefulness. We conclude that teachers perceive the collected data as useful for improving the teaching and learning process.
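As an illustration of the kind of aggregation such a solution performs, the sketch below folds watched intervals, reconstructed from play/pause/seek events, into per-second view counts so a teacher can see which parts of a video are rewatched or skipped. The interval format is a hypothetical intermediate representation, not the authors' data model.

```python
from collections import Counter

def view_heatmap(sessions, duration):
    """Fold watched intervals into per-second view counts.
    sessions: list of (start_s, end_s) intervals reconstructed from
    play/pause/seek events (a hypothetical intermediate format);
    duration: video length in seconds."""
    counts = Counter()
    for start, end in sessions:
        for s in range(int(start), min(int(end), duration)):
            counts[s] += 1
    return [counts[s] for s in range(duration)]

# Example: two viewers, one of whom rewatches seconds 30-60
heat = view_heatmap([(0, 120), (0, 60), (30, 60)], duration=120)
```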

