Semi-Automated Machine Learning Video Annotation for Gastroenterologists

A semi-automatic tool for fast and accurate annotation of endoscopic videos utilizing trained object detection models is presented. A novel workflow is implemented and the preliminary results suggest that the annotation process is nearly twice as fast with our novel tool compared to the current state of the art.


Transcription Alignment of Historical Vietnamese Manuscripts without Human-Annotated Learning Samples

Applied Sciences ◽

10.3390/app11114894 ◽

2021 ◽

Vol 11 (11) ◽

pp. 4894

Author(s):

Anna Scius-Bertrand ◽

Michael Jungo ◽

Beat Wolf ◽

Andreas Fischer ◽

Marc Bui

Keyword(s):

Object Detection ◽

State Of The Art ◽

Positive Impact ◽

Detection System ◽

Training Data ◽

Detection Accuracy ◽

Current State ◽

Alignment Task ◽

Scanned Image ◽

Automatic Transcription

The current state of the art for automatic transcription of historical manuscripts is typically limited by the requirement of human-annotated learning samples, which are necessary to train specific machine learning models for specific languages and scripts. Transcription alignment is a simpler task that aims to find a correspondence between text in the scanned image and its existing Unicode counterpart, a correspondence which can then be used as training data. The alignment task can be approached with heuristic methods dedicated to certain types of manuscripts, or with weakly trained systems reducing the required amount of annotations. In this article, we propose a novel learning-based alignment method based on fully convolutional object detection that does not require any human annotation at all. Instead, the object detection system is initially trained on synthetic printed pages using a font and then adapted to the real manuscripts by means of self-training. On a dataset of historical Vietnamese handwriting, we demonstrate the feasibility of annotation-free alignment as well as the positive impact of self-training on the character detection accuracy, reaching a detection accuracy of 96.4% with a YOLOv5m model without using any human annotation.
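The self-training idea in this abstract (train on synthetic data, then adapt on unlabeled real data) can be illustrated with a toy sketch. This is not the authors' pipeline: the abstract uses a YOLOv5m object detector, whereas the example below assumes a one-dimensional nearest-centroid classifier as a stand-in, and the names `fit_centroids`, `predict`, `self_train`, `min_margin` and `rounds` are hypothetical.

```python
from typing import Dict, List, Tuple

Sample = Tuple[float, str]  # (feature value, label)


def fit_centroids(samples: List[Sample]) -> Dict[str, float]:
    """Toy 'training': the model is the mean feature value per label."""
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for x, y in samples:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids: Dict[str, float], x: float) -> Tuple[str, float]:
    """Return (nearest label, margin); assumes at least two labels.

    The margin is the distance to the second-nearest centroid minus the
    distance to the nearest one, used here as a crude confidence score.
    """
    ranked = sorted(centroids, key=lambda y: abs(x - centroids[y]))
    margin = abs(x - centroids[ranked[1]]) - abs(x - centroids[ranked[0]])
    return ranked[0], margin


def self_train(synthetic: List[Sample], unlabeled: List[float],
               min_margin: float, rounds: int) -> Dict[str, float]:
    """Start from synthetic labels, then retrain on confident pseudo-labels."""
    centroids = fit_centroids(synthetic)
    for _ in range(rounds):
        pseudo: List[Sample] = []
        for x in unlabeled:
            label, margin = predict(centroids, x)
            if margin >= min_margin:  # keep only confident predictions
                pseudo.append((x, label))
        centroids = fit_centroids(synthetic + pseudo)
    return centroids


# Hypothetical data: synthetic samples are clean, real samples are shifted.
synthetic = [(0.0, "a"), (1.0, "a"), (9.0, "b"), (10.0, "b")]
real_unlabeled = [2.0, 3.0, 4.0, 7.0, 8.0]
adapted = self_train(synthetic, real_unlabeled, min_margin=3.0, rounds=2)
```

In this toy run the ambiguous sample (4.0) never passes the margin filter, so only confident pseudo-labels pull the centroids toward the real data.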


Evolutionary Machine Learning for Multi-Objective Class Solutions in Medical Deformable Image Registration

Algorithms ◽

10.3390/a12050099 ◽

2019 ◽

Vol 12 (5) ◽

pp. 99 ◽

Cited By ~ 2

Author(s):

Kleopatra Pirpinia ◽

Peter A. N. Bosman ◽

Jan-Jakob Sonke ◽

Marcel van Herk ◽

Tanja Alderliesten

Keyword(s):

Machine Learning ◽

Image Registration ◽

State Of The Art ◽

Deformable Image Registration ◽

Optimization Approach ◽

High Quality ◽

Trade Off ◽

Multi Objective ◽

Current State ◽

Image Artefacts

Current state-of-the-art medical deformable image registration (DIR) methods optimize a weighted sum of key objectives of interest. Having a pre-determined weight combination that leads to high-quality results for any instance of a specific DIR problem (i.e., a class solution) would facilitate clinical application of DIR. However, such a combination can vary widely for each instance and is currently often manually determined. A multi-objective optimization approach for DIR removes the need for manual tuning, providing a set of high-quality trade-off solutions. Here, we investigate machine learning for a multi-objective class solution, i.e., not a single weight combination, but a set thereof, that, when used on any instance of a specific DIR problem, approximates such a set of trade-off solutions. To this end, we employed a multi-objective evolutionary algorithm to learn sets of weight combinations for three breast DIR problems of increasing difficulty: 10 prone-prone cases, 4 prone-supine cases with limited deformations and 6 prone-supine cases with larger deformations and image artefacts. Clinically acceptable results were obtained for the first two problems. Therefore, for DIR problems with limited deformations, a multi-objective class solution can be machine learned and used to straightforwardly compute multiple high-quality DIR outcomes, potentially leading to more efficient use of DIR in clinical practice.
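The contrast drawn here between a single weighted sum and a set of trade-off solutions can be made concrete with a small sketch. It assumes two objectives that are both minimised; the candidate values and the function names (`dominates`, `pareto_front`, `best_weighted_sum`) are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (objective_a, objective_b), both minimised


def dominates(p: Point, q: Point) -> bool:
    """True if p is no worse than q in both objectives and better in one."""
    return p[0] <= q[0] and p[1] <= q[1] and (p[0] < q[0] or p[1] < q[1])


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def best_weighted_sum(points: List[Point], w: float) -> Point:
    """Pick the single point minimising w * a + (1 - w) * b."""
    return min(points, key=lambda p: w * p[0] + (1 - w) * p[1])


# Hypothetical registration outcomes:
# (image dissimilarity, deformation magnitude).
candidates = [(1.0, 9.0), (2.0, 5.0), (4.0, 4.5), (5.0, 2.0), (6.0, 6.0)]
front = pareto_front(candidates)
```

Each weight `w` selects one point, and different weights select different points, which is why a fixed weight combination may not suit every instance; the front exposes the whole set of trade-offs at once.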


A Survey of Graphical Page Object Detection with Deep Neural Networks

10.20944/preprints202104.0739.v1 ◽

2021 ◽

Author(s):

Jwalin Bhatt ◽

Khurram Azeem Hashmi ◽

Muhammad Zeshan Afzal ◽

Didier Stricker

Keyword(s):

Deep Learning ◽

Object Detection ◽

Conceptual Understanding ◽

Deep Neural Networks ◽

State Of The Art ◽

Learning Approaches ◽

Document Images ◽

Essential Information ◽

Current State ◽

High Level

In any document, graphical elements like tables, figures, and formulas contain essential information. The processing and interpretation of such information require specialized algorithms. Off-the-shelf OCR components cannot process this information reliably. Therefore, an essential step in document analysis pipelines is to detect these graphical components. This detection leads to a high-level conceptual understanding of the documents that makes their digitization viable. Since the advent of deep learning, the performance of deep learning-based object detection has improved manyfold. In this work, we outline and summarize the deep learning approaches for detecting graphical page objects in document images. We discuss the most relevant deep learning-based approaches and the state of the art in graphical page object detection, providing a comprehensive understanding of the current state of the art and related challenges. Furthermore, we discuss leading datasets along with the quantitative evaluation, and briefly discuss promising directions for further improvement.


A Graph-based Evolutionary Algorithm for Automated Machine Learning

10.37686/ser.v1i2.77 ◽

2020 ◽

Author(s):

Fei Qi ◽

Zhaohui Xia ◽

Gaoyang Tang ◽

Hang Yang ◽

Yu Song ◽

...

Keyword(s):

Machine Learning ◽

Evolutionary Algorithm ◽

Parameter Optimization ◽

State Of The Art ◽

The State ◽

Complex Structures ◽

Architecture Evolution ◽

Automated Machine Learning ◽

Art Performance

As an emerging field, Automated Machine Learning (AutoML) aims to reduce or eliminate manual operations that require expertise in machine learning. In this paper, a graph-based architecture is employed to represent flexible combinations of ML models, which provides a larger search space than tree-based and stacking-based architectures. Based on this, an evolutionary algorithm is proposed to search for the best architecture, where the mutation and heredity operators are the key to architecture evolution. With Bayesian hyper-parameter optimization, the proposed approach can automate the workflow of machine learning. On the PMLB dataset, the proposed approach shows state-of-the-art performance compared with TPOT, Autostacker, and auto-sklearn. Some of the optimized models have complex structures that are difficult to obtain by manual design.


Detecting Emotions in English and Arabic Tweets

Information ◽

10.3390/info10030098 ◽

2019 ◽

Vol 10 (3) ◽

pp. 98 ◽

Cited By ~ 4

Author(s):

Tariq Ahmad ◽

Allan Ramsay ◽

Hanady Ahmed

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Deep Neural Networks ◽

State Of The Art ◽

Learning Algorithms ◽

General Purpose ◽

Machine Learning Algorithms ◽

Current State ◽

Optimal Thresholds ◽

Alternative Approach

Assigning sentiment labels to documents is, at first sight, a standard multi-label classification task. Many approaches have been used for this task, but the current state-of-the-art solutions use deep neural networks (DNNs). As such, it seems likely that standard machine learning algorithms, such as these, will provide an effective approach. We describe an alternative approach, involving the use of probabilities to construct a weighted lexicon of sentiment terms, then modifying the lexicon and calculating optimal thresholds for each class. We show that this approach outperforms the use of DNNs and other standard algorithms. We believe that DNNs are not a universal panacea and that paying attention to the nature of the data that you are trying to learn from can be more important than trying out ever more powerful general purpose machine learning algorithms.
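As a rough illustration of the lexicon-plus-thresholds idea described in this abstract, the sketch below builds per-label word weights from document frequencies and then picks, for each label, the score threshold that maximises F1 on the training data. The abstract does not give the authors' formulas, so the weighting scheme, the mean-score rule and every name here (`build_lexicon`, `score`, `best_threshold`, `predict`) are assumptions made for illustration only.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Doc = Tuple[List[str], Set[str]]  # (tokens, gold labels)


def build_lexicon(docs: List[Doc]) -> Dict[str, Dict[str, float]]:
    """Weight of (label, word): share of docs containing word that carry label."""
    word_docs: Counter = Counter()
    word_label_docs: Dict[str, Counter] = defaultdict(Counter)
    for tokens, labels in docs:
        for word in set(tokens):
            word_docs[word] += 1
            for label in labels:
                word_label_docs[label][word] += 1
    return {label: {w: c / word_docs[w] for w, c in counts.items()}
            for label, counts in word_label_docs.items()}


def score(tokens: List[str], weights: Dict[str, float]) -> float:
    """Mean lexicon weight over the tokens of one document."""
    if not tokens:
        return 0.0
    return sum(weights.get(t, 0.0) for t in tokens) / len(tokens)


def f1(predicted: List[bool], gold: List[bool]) -> float:
    """F1 score of boolean predictions against boolean gold labels."""
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(g and not p for p, g in zip(predicted, gold))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def best_threshold(docs: List[Doc], label: str,
                   weights: Dict[str, float]) -> float:
    """Try every observed score as a cut-off and keep the best by F1."""
    scores = [score(tokens, weights) for tokens, _ in docs]
    gold = [label in labels for _, labels in docs]
    return max(sorted(set(scores)),
               key=lambda t: f1([s >= t for s in scores], gold))


def predict(tokens: List[str], lexicon: Dict[str, Dict[str, float]],
            thresholds: Dict[str, float]) -> Set[str]:
    """Assign every label whose score reaches that label's threshold."""
    return {label for label in lexicon
            if score(tokens, lexicon[label]) >= thresholds[label]}


# Hypothetical toy corpus.
train = [(["happy", "great", "day"], {"joy"}),
         (["great", "win"], {"joy"}),
         (["sad", "bad", "day"], {"sadness"}),
         (["bad", "loss"], {"sadness"})]
lexicon = build_lexicon(train)
thresholds = {label: best_threshold(train, label, lexicon[label])
              for label in lexicon}
```

Because each label gets its own threshold, a document can receive zero, one or several labels, which matches the multi-label framing in the abstract.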


30. Radial Velocities (Vitesses Radiales)

Transactions of the International Astronomical Union ◽

10.1017/s0251107x00007227 ◽

1988 ◽

Vol 20 (01) ◽

pp. 355-362 ◽

Cited By ~ 1

Author(s):

J. Andersen ◽

D. W. Latham ◽

A. Florsch ◽

E. Maurice ◽

M. Mayor ◽

...

Keyword(s):

Cross Correlation ◽

State Of The Art ◽

Present Report ◽

Radial Velocities ◽

Developmental Phase ◽

Preliminary Results ◽

Reduction Techniques ◽

Current State ◽

Correlation Techniques ◽

General Adoption

The present report on the activities of IAU Commission 30, covering the triennium June 1, 1984 through June 1, 1987, will be somewhat different from its recent predecessors in both content and style. Over the preceding decade or so, the reports mainly emphasized the dramatic improvements in observing efficiency, achieved primarily through the general adoption of cross-correlation techniques, combined with modern detectors attached to either specialized spectrometers or to existing, more conventional instruments. A great surge of observational activity followed, directed towards a variety of astrophysical problems, some of which are of a more classical nature, but many of which are in entirely new classes of research. At the time of the previous reports, most of the major observational projects were still underway, even if some preliminary results were emerging. The proceedings of IAU Colloquium No. 88, Stellar Radial Velocities (L. Davis Press, 1985) contains a collection of papers on instrumentation and reduction techniques as well as on ongoing observing programs which remains a very useful source of references to this developmental phase as well as to the current state of the art.


Developing the platform model for problem solving of automated machine learning

Journal of Physics Conference Series ◽

10.1088/1742-6596/2094/3/032049 ◽

2021 ◽

Vol 2094 (3) ◽

pp. 032049

Author(s):

V A Chastikova ◽

S A Zherlitsyn

Keyword(s):

Machine Learning ◽

Information Exchange ◽

Complete Solution ◽

Learning Systems ◽

Distribution Model ◽

System Operation ◽

Current State ◽

Automated Machine Learning ◽

Testing Module ◽

The Given

The article discusses the current state of technologies for automated machine learning. The development trends and the nature of the distribution model, MLaaS, are defined. A number of problems in automating the machine learning process are highlighted, such as excessive simplification and specialization of tools, vagueness of implemented processes, lack of flexibility in the infrastructure hardware, and the use of closed algorithms. As a partial or complete solution to them, we propose an architecture consisting of separate modules: models, hybridizer, learning algorithms module, testing module, user support module, and a theoretical framework. The main features of the given architecture are its modularity, transparency and encapsulation of components. Each module is described as a separate element, implemented as an independent microservice. The paper describes the benefits of applying the given approach to the implementation of automated machine learning systems and the need to implement the given or similar standards. For each of the modules, its purposes, the tasks it solves and the implemented functionality, as well as the data necessary for its functioning and their sources, are described. A general diagram showing the flows of information exchange between modules is presented. The main scenarios for the resulting system operation, as well as ways of interacting with it and the result of its operation (the generated model), are described.


Population-based Ensemble Learning with Tree Structures for Classification

10.26686/wgtn.17136296 ◽

2021 ◽

Author(s):

Benjamin Evans

Keyword(s):

Machine Learning ◽

Ensemble Learning ◽

State Of The Art ◽

Population Based ◽

Black Box ◽

Tree Structures ◽

Great Performance ◽

Automated Machine Learning ◽

Population Based Search ◽

Machine Learning Models

Ensemble learning is one of the most powerful extensions for improving upon individual machine learning models. Rather than a single model being used, several models are trained and the predictions combined to make a more informed decision. Such combinations will ideally overcome the shortcomings of any individual member of the ensemble. Most machine learning competition winners feature an ensemble of some sort, and there is also sound theoretical proof of the performance of certain ensembling schemes. The benefits of ensembling are clear in both theory and practice.

Despite the great performance, ensemble learning is not a trivial task. One of the main difficulties is designing appropriate ensembles. For example, how large should an ensemble be? What members should be included in an ensemble? How should these members be weighted? Our first contribution addresses these concerns using a strongly-typed population-based search (genetic programming) to construct well-performing ensembles, where the entire ensemble (members, hyperparameters, structure) is automatically learnt. The proposed method was found, in general, to be significantly better than all base members and commonly used comparison methods trialled.

With automatically designed ensembles, there is a range of applications, such as competition entries, forecasting and state-of-the-art predictions. However, often these applications also require additional preprocessing of the input data. Above, the ensemble considers only the original training data; however, in many machine learning scenarios a pipeline is required (for example, performing feature selection before classification). For the second contribution, a novel automated machine learning method is proposed based on ensemble learning. This method uses a random population-based search of appropriate tree structures, and as such is embarrassingly parallel, an important consideration for automated machine learning. The proposed method is able to achieve equivalent or improved results over the current state-of-the-art methods and does so in a fraction of the time (six times as fast).

Finally, while complex ensembles offer great performance, one large limitation is the interpretability of such ensembles. For example, why does a forest of 500 trees predict a particular class for a given instance? In an effort to explain the behaviour of complex models (such as ensembles), several methods have been proposed. However, these approaches tend to suffer at least one of the following limitations: overly complex in the representation, local in their application, limited to particular feature types (i.e. categorical only), or limited to particular algorithms. For our third contribution, a novel model-agnostic method for interpreting complex black-box machine learning models is proposed. The method is based on strongly-typed genetic programming and overcomes the aforementioned limitations. Multi-objective optimisation is used to generate a Pareto frontier of simple and explainable models which approximate the behaviour of much more complex methods. We found the resulting representations are far simpler than existing approaches (an important consideration for interpretability) while providing equivalent reconstruction performance.

Overall, this thesis addresses two of the major limitations of existing ensemble learning, i.e. the complex construction process and the black-box models that are often difficult to interpret. A novel application of ensemble learning in the field of automated machine learning is also proposed. All three methods have shown equivalent or improved performance compared with existing methods.
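The question of how ensemble members should be weighted can be illustrated with a minimal weighted-vote sketch. It only shows how weights change a combined prediction; it is not the genetic-programming search described in the thesis, and the three rule-based members and the name `weighted_vote` are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Model = Callable[[Sequence[float]], str]  # maps a feature vector to a label


def weighted_vote(members: List[Tuple[Model, float]],
                  x: Sequence[float]) -> str:
    """Sum each member's weight behind its predicted label; return the top label."""
    totals: Dict[str, float] = defaultdict(float)
    for model, weight in members:
        totals[model(x)] += weight
    return max(totals, key=lambda label: totals[label])


# Three hypothetical, hand-written members standing in for trained models.
def first_feature_rule(x: Sequence[float]) -> str:
    return "pos" if x[0] > 0.5 else "neg"


def second_feature_rule(x: Sequence[float]) -> str:
    return "pos" if x[1] > 0.5 else "neg"


def sum_rule(x: Sequence[float]) -> str:
    return "pos" if sum(x) > 1.0 else "neg"


rules = [first_feature_rule, second_feature_rule, sum_rule]
sample = [0.6, 0.2]
equal = weighted_vote([(m, 1.0) for m in rules], sample)
skewed = weighted_vote([(first_feature_rule, 3.0),
                        (second_feature_rule, 1.0),
                        (sum_rule, 1.0)], sample)
```

With equal weights the two "neg" voters win; giving the first member a weight of 3.0 flips the result, which is the kind of design decision the thesis proposes to learn automatically rather than set by hand.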


A large-scale comparative study on peptide encodings for biomedical classification

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqab039 ◽

2021 ◽

Vol 3 (2) ◽

Author(s):

Sebastian Spänig ◽

Siba Mohsen ◽

Georges Hattab ◽

Anne-Christin Hauschild ◽

Dominik Heider

Keyword(s):

Machine Learning ◽

Large Scale ◽

State Of The Art ◽

Multiple Datasets ◽

Wide Range ◽

Fixed Parameter ◽

Additional Sequence ◽

Automated Machine Learning ◽

Distinct Peptide ◽

Comprehensive Study

Owing to the great variety of distinct peptide encodings, working on a biomedical classification task at hand is challenging. Researchers have to determine encodings capable of representing underlying patterns as numerical input for the subsequent machine learning. A general guideline is lacking in the literature; thus, we present here the first large-scale comprehensive study to investigate the performance of a wide range of encodings on multiple datasets from different biomedical domains. For the sake of completeness, we added additional sequence- and structure-based encodings. In particular, we collected 50 biomedical datasets and defined a fixed parameter space for 48 encoding groups, leading to a total of 397 700 encoded datasets. Our results demonstrate that none of the encodings are superior for all biomedical domains. Nevertheless, some encodings often outperform others, thus reducing the initial encoding selection substantially. Our work allows researchers to objectively compare novel encodings to the state of the art. Our findings pave the way for a more sophisticated encoding optimization, for example, as part of automated machine learning pipelines. The work presented here is implemented as a large-scale, end-to-end workflow designed for easy reproducibility and extensibility. All standardized datasets and results are available for download to comply with FAIR standards.


Solution of Levinthal’s Paradox and a Physical Theory of Protein Folding Times

Biomolecules ◽

10.3390/biom10020250 ◽

2020 ◽

Vol 10 (2) ◽

pp. 250

Author(s):

Dmitry N. Ivankov ◽

Alexei V. Finkelstein

Keyword(s):

Machine Learning ◽

Free Energy ◽

Protein Folding ◽

Physical Theory ◽

State Of The Art ◽

Energy Minimum ◽

Folding Kinetics ◽

Reasonable Time ◽

Current State ◽

Conceptual Aspect

“How do proteins fold?” Researchers have been studying different aspects of this question for more than 50 years. The most conceptual aspect of the problem is how a protein can find the global free energy minimum in a biologically reasonable time, without exhaustive enumeration of all possible conformations, the so-called “Levinthal’s paradox.” Less conceptual but still critical are questions about the factors that define the folding times of particular proteins and about the prospects of machine learning for their prediction. We will discuss in this review the key ideas and discoveries leading to the current understanding of folding kinetics, including the solution of Levinthal’s paradox, as well as the current state of the art in the prediction of protein folding times.
