Foam-like phantoms for comparing tomography algorithms

2022 ◽  
Vol 29 (1) ◽  
Author(s):  
Daniël M. Pelt ◽  
Allard A. Hendriksen ◽  
Kees Joost Batenburg

Tomographic algorithms are often compared by evaluating them on certain benchmark datasets. For fair comparison, these datasets should ideally (i) be challenging to reconstruct, (ii) be representative of typical tomographic experiments, (iii) be flexible to allow for different acquisition modes, and (iv) include enough samples to allow for comparison of data-driven algorithms. Current approaches often satisfy only some of these requirements, but not all. For example, real-world datasets are typically challenging and representative of a category of experimental examples, but are restricted to the acquisition mode that was used in the experiment and are often limited in the number of samples. Mathematical phantoms are often flexible and can sometimes produce enough samples for data-driven approaches, but can be relatively easy to reconstruct and are often not representative of typical scanned objects. In this paper, we present a family of foam-like mathematical phantoms that aims to satisfy all four requirements simultaneously. The phantoms consist of foam-like structures with more than 100000 features, making them challenging to reconstruct and representative of common tomography samples. Because the phantoms are computer-generated, varying acquisition modes and experimental conditions can be simulated. An effectively unlimited number of random variations of the phantoms can be generated, making them suitable for data-driven approaches. We give a formal mathematical definition of the foam-like phantoms, and explain how they can be generated and used in virtual tomographic experiments in a computationally efficient way. In addition, several 4D extensions of the 3D phantoms are given, enabling comparisons of algorithms for dynamic tomography. Finally, example phantoms and tomographic datasets are given, showing that the phantoms can be effectively used to make fair and informative comparisons between tomography algorithms.
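
The construction lends itself to a compact generative procedure. Below is a minimal, independent sketch of the basic idea: randomly pack non-overlapping spherical voids into a solid cylinder, then voxelize the result. The function names, size ranges, and grid resolution are our own illustrative choices, not the authors' reference implementation.

```python
import numpy as np

def generate_foam_spec(n_bubbles=300, seed=0):
    """Randomly pack non-overlapping spherical voids ("bubbles") into a
    solid cylinder of radius 1, using simple rejection sampling."""
    rng = np.random.default_rng(seed)
    bubbles = []  # (x, y, z, r) tuples
    while len(bubbles) < n_bubbles:
        x, y = rng.uniform(-1, 1, size=2)
        z = rng.uniform(-1.5, 1.5)
        r = rng.uniform(0.005, 0.1)
        if x * x + y * y > (1.0 - r) ** 2:      # bubble must stay inside the cylinder
            continue
        if all((x - bx) ** 2 + (y - by) ** 2 + (z - bz) ** 2 >= (r + br) ** 2
               for bx, by, bz, br in bubbles):  # no overlap with accepted bubbles
            bubbles.append((x, y, z, r))
    return np.array(bubbles)

def voxelize(bubbles, n=128):
    """Rasterize the phantom on an n^3 grid: 1 = material, 0 = void/air."""
    ax = np.linspace(-1, 1, n)
    x, y, z = np.meshgrid(ax, ax, 1.5 * ax, indexing="ij")
    vol = ((x ** 2 + y ** 2) <= 1.0).astype(np.float32)           # solid cylinder
    for bx, by, bz, r in bubbles:
        vol[(x - bx) ** 2 + (y - by) ** 2 + (z - bz) ** 2 <= r * r] = 0.0  # carve bubble
    return vol

phantom = voxelize(generate_foam_spec())
```

Because the phantom is defined by a list of sphere parameters rather than a fixed voxel grid, it can be re-rasterized at any resolution and re-sampled with a new seed, which is what makes the family suitable for data-driven benchmarking.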

2019 ◽  
Vol 6 (1) ◽  
pp. 189-197 ◽  
Author(s):  
Cheng He ◽  
Ye Tian ◽  
Handing Wang ◽  
Yaochu Jin

Abstract Many real-world optimization applications have more than one objective and are modeled as multiobjective optimization problems. Generally, their objective functions cannot be expressed as cheap analytic functions and must instead be evaluated by expensive simulations; such problems are formulated as data-driven multiobjective optimization problems. Their high computational cost poses great challenges to existing evolutionary multiobjective optimization algorithms, yet no benchmark problems have reflected these challenges so far. We therefore carefully select seven benchmark multiobjective optimization problems from various real-world applications, aiming to promote research on data-driven evolutionary multiobjective optimization.
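
The defining difficulty these benchmarks capture is that each objective value comes from an expensive simulation, so algorithms typically learn cheap surrogate models and spend real evaluations sparingly. The sketch below illustrates that surrogate-assisted pattern on a toy stand-in problem; the two-objective function, sample sizes, and names are hypothetical and not part of the benchmark suite.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def expensive_simulation(x):
    """Stand-in for a costly simulator; in the benchmarks the true
    objectives have no cheap analytic form."""
    return np.array([np.sum((x - 0.3) ** 2), np.sum((x + 0.3) ** 2)])

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(20, 5))                 # small initial design
Y = np.array([expensive_simulation(x) for x in X])   # 20 expensive calls

# One cheap surrogate per objective, consulted in place of the simulator.
surrogates = [GaussianProcessRegressor().fit(X, Y[:, m]) for m in range(2)]

candidates = rng.uniform(-1, 1, size=(200, 5))       # screened for free
pred = np.column_stack([s.predict(candidates) for s in surrogates])

# Spend the real evaluation budget only on predicted non-dominated points.
nondom = [i for i in range(len(pred))
          if not any(np.all(pred[j] <= pred[i]) and np.any(pred[j] < pred[i])
                     for j in range(len(pred)) if j != i)]
Y_new = np.array([expensive_simulation(candidates[i]) for i in nondom])
```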


2021 ◽  
Author(s):  
Thibaut Pressat-Laffouilhère ◽  
Clément Massonnaud ◽  
Hélène Bréard ◽  
Margaux Lefebvre ◽  
Bruno Falissard ◽  
...  

Abstract Background: Although data-driven methods for selecting covariates in multivariable models ignore confounding, mediation and collision, they are still used in causal inference. This study, through three real-world datasets, shows the impact of data-driven methods on causal inference. Methods: For each of three real-world datasets, a research question requiring a multivariable model was raised. Three covariate selection methods were compared on their ability to answer the question correctly: Augmented Backward Elimination with the BIC criterion and a "change-in-estimate" threshold set at 0.05, Backward Elimination with the BIC criterion, and a knowledge-based method relying on causal diagrams. The covariates were classified as indispensable, prohibited or optional, according to the potential bias they could introduce into the estimate. For each dataset and sample size (N=75, 300 and 3,000), 10,000 Monte Carlo samples were drawn. The percentage of inclusion of each covariate in the models was computed. Coverages of Wald's 95% confidence interval of the exposure effect were computed against two different theoretical values (that of the analysed method and that of the knowledge-based method). Results: Even with the largest sample size (n=3,000), the data-driven methods were not reproducible: 8.6% to 53% of covariates were included in 20% to 80% of the experiments. Prohibited covariates could be included, and indispensable covariates missed, in more than 80% of experiments even with n=3,000. With the largest sample size, coverage of the knowledge-based theoretical value by the data-driven methods ranged from 0% to 83.7%; coverage of a data-driven method's own theoretical value ranged from 73.2% to 91.1% and was asymmetrical. Conclusion: Data-driven methods should not be used for covariate selection in causal inference.
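
For concreteness, the Backward Elimination variant studied here can be sketched as follows, assuming statsmodels and a pandas DataFrame with hypothetical column names. Note that the procedure never consults the causal role of a covariate, which is precisely the pitfall the study measures.

```python
import statsmodels.api as sm

def backward_elimination_bic(df, outcome, exposure, covariates):
    """Backward elimination on BIC: repeatedly drop the covariate whose
    removal lowers BIC most, but never drop the exposure of interest.
    The criterion is purely statistical; confounders, mediators and
    colliders are all treated alike."""
    selected = list(covariates)

    def fit(cols):
        X = sm.add_constant(df[[exposure] + cols])
        return sm.OLS(df[outcome], X).fit()

    best = fit(selected)
    while selected:
        trials = [(fit([c for c in selected if c != drop]), drop)
                  for drop in selected]
        cand, drop = min(trials, key=lambda t: t[0].bic)
        if cand.bic >= best.bic:
            break                       # no single removal improves BIC
        best, selected = cand, [c for c in selected if c != drop]
    return best, selected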


2019 ◽  
Vol 116 (51) ◽  
pp. 25405-25411 ◽  
Author(s):  
Niklas Pfister ◽  
Stefan Bauer ◽  
Jonas Peters

Learning kinetic systems from data is one of the core challenges in many fields. Identifying stable models is essential for the generalization capabilities of data-driven inference. We introduce a computationally efficient framework, called CausalKinetiX, that identifies structure from discrete-time, noisy observations generated by heterogeneous experiments. The algorithm assumes the existence of an underlying, invariant kinetic model, a key criterion for reproducible research. Results on both simulated and real-world examples suggest that learning the structure of kinetic systems benefits from a causal perspective. The identified variables and models allow for a concise description of the dynamics across multiple experimental settings and can be used for prediction in unseen experiments. We observe significant improvements over well-established approaches that focus solely on predictive performance, especially in out-of-sample generalization.
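
The key ingredient is the invariance requirement: a good candidate model should be explained by a single parameter vector across all experiments. A toy sketch of that scoring idea follows; it is our simplification, not the authors' exact score or implementation.

```python
import numpy as np

def invariance_score(candidate_features, experiments):
    """Score a candidate kinetic model by how well ONE pooled parameter
    vector explains the observed derivatives in ALL experiments;
    candidates whose fit holds only in some environments score badly.

    experiments: list of (X, dy) pairs, where X holds the candidate's
    feature values (e.g. products of concentrations in a mass-action
    model) and dy the estimated time derivatives of the target variable.
    """
    X_all = np.vstack([X[:, candidate_features] for X, _ in experiments])
    dy_all = np.concatenate([dy for _, dy in experiments])
    theta, *_ = np.linalg.lstsq(X_all, dy_all, rcond=None)   # pooled fit
    # Worst-case residual over experiments: an invariant model fits them all.
    return max(np.mean((dy - X[:, candidate_features] @ theta) ** 2)
               for X, dy in experiments)
```

Ranking candidate models by this worst-case score (lower is better) prefers structures that transfer to unseen experimental settings, which is the behavior the paper reports.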


2020 ◽  
Author(s):  
Adrien Foucart ◽  
Olivier Debeir ◽  
Christine Decaestecker

Abstract In digital pathology, image segmentation algorithms are usually ranked on clean benchmark datasets. However, annotation in digital pathology is hard, time-consuming and by nature imperfect. We expand on the SNOW (Semi-, Noisy and/or Weak) supervision concept introduced in earlier work to characterize such imperfections in data supervision. We analyse the effects of SNOW supervision on typical DCNNs and explore learning strategies to counteract those effects. We apply those lessons to the real-world task of artefact detection in whole-slide imaging. Our results show that SNOW supervision has an important impact on the performance of DCNNs and that relying on benchmark and challenge datasets may not always be relevant for assessing algorithm performance. We show that a learning strategy adapted to SNOW supervision, such as "Generative Annotations", can greatly improve the results of DCNNs on real-world datasets.
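
To study such effects in a controlled way, clean annotations can be degraded into SNOW-style supervision before training. A hypothetical sketch of such a corruption step for binary masks is given below; the corruption rates and function names are our own, not the paper's protocol.

```python
import numpy as np

def snowify(mask, rng, p_missing=0.5, weak=True):
    """Degrade a clean binary segmentation mask into SNOW-style supervision.

    - Semi-supervision: with probability p_missing the annotation is
      dropped entirely (the object is present but unlabelled).
    - Weak supervision: the precise mask is replaced by its bounding box.
    Noisy boundaries could additionally be simulated with morphological
    perturbations of the mask edge.
    """
    if rng.random() < p_missing:
        return np.zeros_like(mask)           # object left unannotated
    if weak and mask.any():
        ys, xs = np.nonzero(mask)
        box = np.zeros_like(mask)
        box[ys.min():ys.max() + 1, xs.min():xs.max() + 1] = 1  # bounding box
        return box
    return mask
```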


Author(s):  
Maxime Peyrard ◽  
Robert West

Causal discovery, the task of automatically constructing a causal model from data, is of major significance across the sciences. Evaluating the performance of causal discovery algorithms should ideally involve comparing the inferred models to ground-truth models available for benchmark datasets, which in turn requires a notion of distance between causal models. While such distances have been proposed previously, they are limited by focusing on graphical properties of the causal models being compared. Here, we overcome this limitation by defining distances derived from the causal distributions induced by the models, rather than exclusively from their graphical structure. Pearl and Mackenzie [2018] have arranged the properties of causal models in a hierarchy called the "ladder of causation" spanning three rungs: observational, interventional, and counterfactual. Following this organization, we introduce a hierarchy of three distances, one for each rung of the ladder. Our definitions are intuitively appealing as well as efficient to compute approximately. We put our causal distances to use by benchmarking standard causal discovery systems on both synthetic and real-world datasets for which ground-truth causal models are available.
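
As a rough illustration of the lowest rung, an observational distance can be approximated by sampling both models and comparing the induced distributions. The sketch below assumes linear-Gaussian structural causal models and a crude per-variable total-variation estimate; both simplifications are ours, not the paper's definitions.

```python
import numpy as np

def sample_linear_scm(W, n, rng):
    """Ancestral sampling from a linear-Gaussian SCM with weight matrix W
    (W[i, j] != 0 means j -> i; variables assumed topologically ordered,
    i.e. W is strictly lower-triangular)."""
    d = W.shape[0]
    X = np.zeros((n, d))
    for i in range(d):
        X[:, i] = X @ W[i] + rng.standard_normal(n)
    return X

def observational_distance(W1, W2, n=5000, bins=20, seed=0):
    """Rung-1 distance: compare the observational distributions induced
    by two SCMs via per-variable histogram total variation, averaged."""
    rng = np.random.default_rng(seed)
    X1, X2 = sample_linear_scm(W1, n, rng), sample_linear_scm(W2, n, rng)
    tv = 0.0
    for i in range(W1.shape[0]):
        lo = min(X1[:, i].min(), X2[:, i].min())
        hi = max(X1[:, i].max(), X2[:, i].max())
        p1, _ = np.histogram(X1[:, i], bins=bins, range=(lo, hi))
        p2, _ = np.histogram(X2[:, i], bins=bins, range=(lo, hi))
        tv += 0.5 * np.abs(p1 / n - p2 / n).sum()
    return tv / W1.shape[0]
```

Interventional and counterfactual analogues would repeat the same comparison on samples drawn after do-interventions and on counterfactual queries, respectively, moving up the ladder one rung at a time.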


2021 ◽  
Vol 21 (3) ◽  
pp. 1-17
Author(s):  
Wu Chen ◽  
Yong Yu ◽  
Keke Gai ◽  
Jiamou Liu ◽  
Kim-Kwang Raymond Choo

In existing ensemble learning algorithms (e.g., random forest), each base learner needs the entire dataset for sampling and training. This may not be practical in many real-world applications, and it incurs additional computational costs. To achieve better efficiency, we propose a decentralized framework, Multi-Agent Ensemble, which leverages edge computing to support ensemble learning while balancing access restrictions (each learner holds only a small sub-dataset) against accuracy. Specifically, network edge nodes act as learners that perform classification and prediction in our framework. Data is distributed across multiple base learners, which exchange data via an interaction mechanism to improve prediction. The proposed approach thus relies on distributed training rather than conventional centralized learning. Findings from experimental evaluations on 20 real-world datasets suggest that Multi-Agent Ensemble outperforms other ensemble approaches in accuracy even though each base learner requires fewer samples (i.e., a significant reduction in computational cost).
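
The contrast with a conventional centralized ensemble can be sketched in a few lines: each edge learner trains on its own small shard, and predictions are pooled by majority vote. The interaction mechanism by which learners exchange data is elided here; shard sizes and names are illustrative, and integer class labels are assumed.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_edge_learners(X, y, n_agents=10, seed=0):
    """Each agent trains on its own small disjoint shard; no node ever
    holds the full dataset, unlike conventional ensemble sampling."""
    rng = np.random.default_rng(seed)
    shards = np.array_split(rng.permutation(len(X)), n_agents)
    return [DecisionTreeClassifier(random_state=seed).fit(X[idx], y[idx])
            for idx in shards]

def ensemble_predict(learners, X):
    """Aggregate per-agent predictions by majority vote
    (assumes non-negative integer class labels)."""
    votes = np.stack([m.predict(X) for m in learners])
    return np.apply_along_axis(
        lambda col: np.bincount(col.astype(int)).argmax(), 0, votes)
```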


Data ◽  
2020 ◽  
Vol 6 (1) ◽  
pp. 1
Author(s):  
Ahmed Elmogy ◽  
Hamada Rizk ◽  
Amany M. Sarhan

In data mining, outlier detection is a major challenge, playing an important role in applications such as medical data analysis, image processing, fraud detection and intrusion detection. An extensive variety of clustering-based approaches have been developed to detect outliers. However, they are by nature time-consuming, which restricts their use in real-time applications. Furthermore, outlier detection requests are handled one at a time: each request is initiated individually with a particular set of parameters. In this paper, the first on-the-fly clustering-based outlier detection framework, OFCOD (On the Fly Clustering Based Outlier Detection), is presented. OFCOD enables analysts to detect outliers promptly on request, even within huge datasets. The proposed framework has been tested and evaluated using two real-world datasets with different features and applications: one with 699 records and another with five million records. The experimental results show that the proposed framework outperforms other existing approaches across several evaluation metrics.
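
The generic pattern that clustering-based detectors build on can be sketched with k-means: flag points unusually far from their nearest cluster centre, plus all members of very small clusters. This is a simplified stand-in for illustration, not the OFCOD algorithm itself; the thresholds are arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_outliers(X, k=8, dist_quantile=0.98, min_cluster_frac=0.02, seed=0):
    """Flag as outliers: (a) points unusually far from their assigned
    cluster centre, and (b) all members of very small clusters."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
    d = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
    far = d > np.quantile(d, dist_quantile)            # criterion (a)
    sizes = np.bincount(km.labels_, minlength=k)
    tiny = sizes[km.labels_] < min_cluster_frac * len(X)  # criterion (b)
    return far | tiny
```

The time cost that motivates OFCOD sits in the clustering step itself; an on-the-fly design amortizes it across requests instead of re-clustering per request with fresh parameters.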


2021 ◽  
pp. 016555152199980
Author(s):  
Yuanyuan Lin ◽  
Chao Huang ◽  
Wei Yao ◽  
Yifei Shao

Attraction recommendation plays an important role in tourism, mitigating information overload by recommending suitable attractions to users. Currently, most recommendation methods are dedicated to improving the accuracy of recommendations. However, methods that focus only on accuracy tend to recommend popular items that users often purchase anyway, resulting in a lack of diversity and low visibility for non-popular items. Hence, many studies have stressed the importance of recommendation diversity and proposed improved methods, but there is room for improvement. First, the definition of diversity for different items requires consideration of domain characteristics. Second, existing algorithms for improving diversity sacrifice recommendation accuracy. This article therefore uses the topic features of attractions to define how recommendation diversity is calculated. We developed a two-stage optimisation model to enhance recommendation diversity while maintaining accuracy. In the first stage, an optimisation model considering topic diversity is proposed to increase recommendation diversity and generate candidate attractions. In the second stage, we propose a misclassification-cost-minimisation optimisation model to balance recommendation diversity and accuracy. To assess the performance of the proposed method, experiments are conducted with real-world travel data. The results indicate that the proposed two-stage optimisation model can significantly improve both the diversity and the accuracy of recommendations.
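
A greatly simplified sketch of the two-stage idea follows, with a greedy topic-diversity re-ranker standing in for the paper's optimisation models; the scores, topic vectors (assumed L2-normalized), and parameter names are hypothetical.

```python
import numpy as np

def rerank_diverse(scores, topics, n_candidates=50, n_final=10, lam=0.7):
    """Stage 1: keep the top-scoring candidate attractions (accuracy).
    Stage 2: greedily pick items that trade off predicted relevance
    against topic similarity to what is already chosen (diversity)."""
    cand = np.argsort(scores)[::-1][:n_candidates]   # stage 1: candidates
    chosen = [cand[0]]
    while len(chosen) < n_final:                     # stage 2: re-rank
        def gain(i):
            sim = max(topics[i] @ topics[j] for j in chosen)
            return lam * scores[i] - (1 - lam) * sim
        rest = [i for i in cand if i not in chosen]
        chosen.append(max(rest, key=gain))
    return chosen
```

Here lam plays the role of the accuracy/diversity balance that the paper's second-stage cost model optimises explicitly.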


2021 ◽  
Vol 13 (7) ◽  
pp. 168781402110277
Author(s):  
Yankai Hou ◽  
Zhaosheng Zhang ◽  
Peng Liu ◽  
Chunbao Song ◽  
Zhenpo Wang

Accurate estimation of the degree of battery aging is essential to ensure safe operation of electric vehicles. In this paper, using real-world vehicles and their operational data, a battery aging estimation method is proposed based on a dual-polarization equivalent circuit (DPEC) model and multiple data-driven models. The DPEC model and the forgetting-factor recursive least-squares method are used to determine the battery system's ohmic internal resistance, with outliers being filtered using boxplots. Furthermore, eight common data-driven models are used to describe the relationship between battery degradation and the factors influencing it, and these models are analyzed and compared in terms of both estimation accuracy and computational requirements. The results show that the gradient boosting tree regression, XGBoost regression, and LightGBM regression models are more accurate than the other methods, with root mean square errors of less than 6.9 mΩ. The AdaBoost and random forest regression models are regarded as alternatives because of their relative instability. The linear regression, support vector machine regression, and k-nearest-neighbor regression models are not recommended because of poor accuracy or excessively high computational requirements. This work can serve as a reference for subsequent battery degradation studies based on real-time operational data.
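
The forgetting-factor recursive least-squares update used to track parameters online is a standard algorithm and can be sketched generically; the sketch below is not the paper's full DPEC identification pipeline, and the regressor layout is illustrative.

```python
import numpy as np

def ffrls_step(theta, P, phi, y, lam=0.98):
    """One forgetting-factor recursive least-squares (FFRLS) update.

    theta: current parameter estimate (e.g. DPEC model coefficients,
           from which the ohmic internal resistance is read off)
    P:     parameter covariance matrix
    phi:   regressor vector (e.g. past voltages and currents)
    y:     measured terminal voltage
    lam:   forgetting factor < 1, discounting old samples so that
           slow parameter drift (aging) can be tracked.
    """
    K = P @ phi / (lam + phi @ P @ phi)      # gain vector
    theta = theta + K * (y - phi @ theta)    # correct with prediction error
    P = (P - np.outer(K, phi) @ P) / lam     # covariance update
    return theta, P
```

Applied sample by sample over a driving record, the tracked resistance trace (after boxplot filtering of outliers) is what the eight data-driven models are then regressed against.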


Entropy ◽  
2021 ◽  
Vol 23 (6) ◽  
pp. 680
Author(s):  
Hanyang Lin ◽  
Yongzhao Zhan ◽  
Zizheng Zhao ◽  
Yuzhong Chen ◽  
Chen Dong

There is a wealth of information in real-world social networks. In addition to the topology, the vertices or edges of a social network often carry attributes, and many vertices belong to several communities simultaneously. It is challenging to fully utilize this additional attribute information to detect overlapping communities. In this paper, we first propose an overlapping community detection algorithm based on an augmented attribute graph. An improved weight adjustment strategy for attributes is embedded in the algorithm to help detect overlapping communities more accurately. Second, we extend the algorithm to determine the number of communities automatically via a node-density-based fuzzy k-medoids process. Extensive experiments on both synthetic and real-world datasets demonstrate that the proposed algorithms can effectively detect overlapping communities with fewer parameters than the baseline methods.
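
The overlap step can be illustrated with a fuzzy-membership rule: after a fuzzy k-medoids pass, a node joins every community in which its membership exceeds a threshold. The sketch below uses the standard fuzzy membership formula and omits the paper's node-density seeding and attribute-weight adjustment; the threshold tau is our own illustrative parameter.

```python
import numpy as np

def fuzzy_memberships(D, medoids, m=2.0):
    """Fuzzy k-medoids memberships from a node-to-node distance matrix D
    (e.g. distances on the augmented attribute graph); m is the usual
    fuzzifier, m > 1."""
    d = D[:, medoids] + 1e-12                    # distances to each medoid
    u = 1.0 / (d ** (2.0 / (m - 1.0)))
    return u / u.sum(axis=1, keepdims=True)      # rows sum to 1

def overlapping_communities(D, medoids, tau=0.3):
    """A node joins every community whose membership exceeds tau, so
    nodes roughly equidistant from several medoids end up in several
    communities, which is exactly the overlapping behavior sought."""
    U = fuzzy_memberships(D, medoids)
    return [set(np.nonzero(U[:, k] >= tau)[0]) for k in range(len(medoids))]
```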

