Data-driven symbol detection via model-based machine learning

2020 ◽  
Vol 20 (3) ◽  
pp. 283-317
Author(s):  
Nariman Farsad ◽  
Nir Shlezinger ◽  
Andrea J. Goldsmith ◽  
Yonina C. Eldar
2021 ◽  
Author(s):  
Nariman Farsad ◽  
Nir Shlezinger ◽  
Andrea J. Goldsmith ◽  
Yonina C. Eldar

2019 ◽  
Author(s):  
Giulio Caravagna ◽  
Timon Heide ◽  
Marc Williams ◽  
Luis Zapata ◽  
Daniel Nichol ◽  
...  

Abstract The vast majority of cancer next-generation sequencing data consist of bulk samples composed of mixtures of cancer and normal cells. To study tumor evolution, subclonal reconstruction approaches based on machine learning are used to separate subpopulations of cancer cells and reconstruct their ancestral relationships. However, current approaches are entirely data-driven and agnostic to evolutionary theory. We demonstrate that systematic errors occur in subclonal reconstruction if tumor evolution is not accounted for, and that those errors increase when multiple samples are taken from the same tumor. To address this issue, we present a novel approach for model-based subclonal reconstruction that combines data-driven machine learning with evolutionary theory. Using public, synthetic, and newly generated data, we show the method is more robust and accurate than current techniques in both single-sample and multi-region sequencing data. With careful data curation and interpretation, we show how the method minimizes the confounding factors that affect non-evolutionary methods, leading to a more accurate recovery of the evolutionary history of human tumors.


2021 ◽  
Author(s):  
Xupeng He ◽  
Weiwei Zhu ◽  
Ryan Santoso ◽  
Marwa Alsinan ◽  
Hyung Kwak ◽  
...  

Abstract The permeability of fractures, both natural and hydraulic, is an essential parameter for modeling fluid flow in conventional and unconventional fractured reservoirs. However, traditional analytical cubic law (CL-based) models used to estimate fracture permeability perform unsatisfactorily when dealing with the different dynamic complexities of fractures. This work presents a data-driven, physics-included model based on machine learning as an alternative to traditional methods. The workflow for developing the data-driven model includes four steps. Step 1: Identify uncertain parameters and perform Latin Hypercube Sampling (LHS). We first identify the uncertain parameters that affect the fracture permeability. We then generate training samples using LHS. Step 2: Perform training simulations and collect inputs and outputs. In this step, high-resolution simulations of the Navier-Stokes equations (NSEs) are run with parallel computing for each of the training samples. We then collect the inputs and outputs from the simulations. Step 3: Construct an optimized data-driven surrogate model. A data-driven model based on machine learning is then built to model the nonlinear mapping between the inputs and outputs collected in Step 2. Herein, an Artificial Neural Network (ANN) coupled with a Bayesian optimization algorithm is implemented to obtain the optimized surrogate model. Step 4: Validate the proposed data-driven model. In this step, we conduct blind validation of the proposed model against high-fidelity simulations. We further test the developed surrogate model on newly generated fracture cases with a broad range of roughness and tortuosity under different Reynolds numbers. We then compare its performance to the reference NSE solutions. Results show that the developed data-driven model delivers good accuracy, exceeding 90% for all training, validation, and test samples.
This work introduces an integrated workflow for developing a data-driven, physics-included model using machine learning to estimate fracture permeability under complex physics (e.g., inertial effect). To our knowledge, this technique is introduced for the first time for the upscaling of rock fractures. The proposed model offers an efficient and accurate alternative to the traditional upscaling methods that can be readily implemented in reservoir characterization and modeling workflows.
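
Step 1 of the workflow above (Latin Hypercube Sampling) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `latin_hypercube`, the seed, and the three parameter ranges are hypothetical, chosen only to show how LHS stratifies each input dimension so that every stratum is sampled exactly once.

```python
import random


def latin_hypercube(n_samples, bounds, seed=0):
    """Draw n_samples points inside the box given by bounds.

    Each dimension is split into n_samples equal-width strata, and every
    stratum receives exactly one point.
    """
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One uniformly random point inside each stratum of [0, 1).
        unit = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so the stratum order differs from dimension to dimension.
        rng.shuffle(unit)
        columns.append([low + u * (high - low) for u in unit])
    # Transpose per-dimension columns into a list of sample tuples.
    return list(zip(*columns))


# Hypothetical ranges (assumptions, not from the paper):
# aperture, roughness, Reynolds number.
bounds = [(0.1, 1.0), (0.0, 0.5), (1.0, 100.0)]
samples = latin_hypercube(10, bounds)
```

Each tuple in `samples` would then serve as the input to one training simulation in Step 2.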


Author(s):  
Tong Lin ◽  
Leiming Hu ◽  
Shawn Litster ◽  
Levent Burak Kara

Abstract This paper presents a set of data-driven methods for predicting nitrogen concentration in proton exchange membrane fuel cells (PEMFCs). The nitrogen that accumulates in the anode channel is a critical factor giving rise to significant inefficiency in fuel cells. While periodically purging the gases in the anode channel is a common strategy to combat nitrogen accumulation, such open-loop strategies also lead to sub-optimal purging decisions. Instead, an accurate prediction of nitrogen concentration can help devise optimal purging strategies. However, model-based approaches such as CFD simulations for nitrogen prediction are often unavailable for long-stack fuel cells due to the complexity of the chemical environment, or are inherently slow, preventing them from being used for real-time nitrogen prediction on deployed fuel cells. As one step toward addressing this challenge, we explore a set of data-driven techniques for learning a regression model from the input parameters to the nitrogen build-up, using a model-based fuel cell simulator as an offline data generator. This allows the trained machine learning system to make fast predictions of nitrogen concentration during deployment based on other parameters that can be obtained through sensors. We describe the various methods we explore, compare the outcomes, and provide future directions for utilizing machine learning in fuel cell physics modeling in general.


Sensors ◽  
2021 ◽  
Vol 21 (6) ◽  
pp. 2085
Author(s):  
Xue-Bo Jin ◽  
Ruben Jonhson Robert Jeremiah ◽  
Ting-Li Su ◽  
Yu-Ting Bai ◽  
Jian-Lei Kong

State estimation is widely used in various automated systems, including IoT systems, unmanned systems, and robots. In traditional state estimation, measurement data are instantaneous and processed in real time. As modern systems develop, sensors can obtain and store more and more signals. How to use this large volume of measurement data to improve the performance of state estimation has therefore become an active research topic in this field. This paper reviews the development of state estimation and its future trends. First, we review the model-based state estimation methods, including the Kalman filter and its variants, such as the extended Kalman filter (EKF), unscented Kalman filter (UKF), and cubature Kalman filter (CKF). Particle filters and Gaussian mixture filters, which can handle mixed Gaussian noise, are also discussed. These methods place high requirements on models, while accurate system models are not easy to obtain in practice. The emergence of robust filters, the interacting multiple model (IMM), and adaptive filters is also covered. Second, the current research status of data-driven state estimation methods based on network learning is introduced. Finally, the main research results obtained in recent years for hybrid filters, which combine model-based and data-driven methods, are summarized and discussed. This paper is based on state estimation research results and provides a more detailed overview of model-driven, data-driven, and hybrid-driven approaches. The main algorithm of each method is provided so that beginners can gain a clearer understanding. Additionally, it discusses future development trends for researchers in state estimation.
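
As a concrete instance of the model-based methods mentioned above, the following is a minimal sketch of a scalar (one-dimensional) Kalman filter. It is an illustration under stated assumptions, not code from the review: the state is assumed to follow a random walk, and the function name `kalman_1d` and the values of the process noise `q`, measurement noise `r`, initial state `x0`, and initial variance `p0` are hypothetical.

```python
def kalman_1d(measurements, q=1e-3, r=0.1, x0=0.0, p0=1.0):
    """Scalar Kalman filter for an assumed random-walk state model.

    q: process-noise variance, r: measurement-noise variance,
    x0, p0: initial state estimate and its variance (all illustrative).
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is assumed unchanged, so only uncertainty grows.
        p = p + q
        # Update: blend prediction and measurement using the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates


# Feeding a constant reading shows the estimate settling near that value.
estimates = kalman_1d([5.0] * 50)
```

The EKF, UKF, and CKF named in the abstract extend this predict/update loop to nonlinear models; the sketch covers only the linear scalar case.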


Agronomy ◽  
2020 ◽  
Vol 11 (1) ◽  
pp. 35
Author(s):  
Xiaodong Huang ◽  
Beth Ziniti ◽  
Michael H. Cosh ◽  
Michele Reba ◽  
Jinfei Wang ◽  
...  

Soil moisture is a key indicator for assessing cropland drought and irrigation status as well as forecasting production. Compared with optical data, which are obscured by the crop canopy cover, Synthetic Aperture Radar (SAR) is an efficient tool for detecting surface soil moisture under vegetation cover owing to its strong penetration capability. This paper studies soil moisture retrieval using L-band polarimetric Phased Array-type L-band SAR 2 (PALSAR-2) data acquired over a study region in Arkansas in the United States. Both two-component model-based decomposition (SAR data alone) and machine learning (SAR + optical indices) methods are tested and compared. Validation using independent ground measurements shows that both methods achieved a Root Mean Square Error (RMSE) of less than 10 vol.%, while the machine learning methods outperform the model-based decomposition, achieving an RMSE of 7.70 vol.% and R² of 0.60.
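
The two validation metrics quoted above can be computed as in the sketch below. The abstract does not say which definition of R² was used; this sketch assumes the coefficient of determination, and the observed and predicted values are made-up numbers for illustration, not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between paired observations and predictions."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up soil moisture values in vol.% (illustrative only).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
error = rmse(observed, predicted)
fit = r_squared(observed, predicted)
```

RMSE carries the units of the measured quantity (here vol.%), whereas R² is dimensionless, which is why the abstract reports the two side by side.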


2021 ◽  
Author(s):  
Junjie Shi ◽  
Jiang Bian ◽  
Jakob Richter ◽  
Kuan-Hsun Chen ◽  
Jörg Rahnenführer ◽  
...  

Abstract The predictive performance of a machine learning model depends heavily on the corresponding hyper-parameter setting. Hence, hyper-parameter tuning is often indispensable. Normally such tuning requires the dedicated machine learning model to be trained and evaluated on centralized data to obtain a performance estimate. However, in a distributed machine learning scenario, it is not always possible to collect all the data from all nodes due to privacy concerns or storage limitations. Moreover, if data has to be transferred through low-bandwidth connections, it reduces the time available for tuning. Model-Based Optimization (MBO) is one state-of-the-art method for tuning hyper-parameters, but its application to distributed machine learning models or federated learning has received little research attention. This work proposes a framework, MODES, that allows MBO to be deployed on resource-constrained distributed embedded systems. Each node trains an individual model based on its local data. The goal is to optimize the combined prediction accuracy. The presented framework offers two optimization modes: (1) MODES-B considers the whole ensemble as a single black box and optimizes the hyper-parameters of each individual model jointly, and (2) MODES-I considers all models as clones of the same black box, which allows it to efficiently parallelize the optimization in a distributed setting. We evaluate MODES by conducting experiments on the optimization of the hyper-parameters of a random forest and a multi-layer perceptron. The experimental results demonstrate that, with an improvement in terms of mean accuracy (MODES-B), run-time efficiency (MODES-I), and statistical stability for both modes, MODES outperforms the baseline, i.e., carrying out tuning with MBO on each node individually with its local sub-data set.

