Automatic classification of canine thoracic radiographs using deep learning

Abstract The interpretation of thoracic radiographs is a challenging and error-prone task for veterinarians. Despite recent advances in machine learning and computer vision, the development of computer-aided diagnostic systems for radiographs remains an unsolved problem, particularly in veterinary medicine. In this study, a novel method based on a multi-label deep convolutional neural network (CNN) was developed for the classification of thoracic radiographs in dogs. All thoracic radiographs of dogs performed between 2010 and 2020 at the institution were retrospectively collected. Radiographs were taken with two different acquisition systems and were divided into two data sets accordingly. One data set (Data Set 1) was used for training and testing, and the other (Data Set 2) was used to test the generalization ability of the CNNs. The radiographic findings used as non-mutually exclusive labels to train the CNNs were: unremarkable, cardiomegaly, alveolar pattern, bronchial pattern, interstitial pattern, mass, pleural effusion, pneumothorax, and megaesophagus. Two CNNs, based on the ResNet-50 and DenseNet-121 architectures respectively, were developed and tested. The CNN based on ResNet-50 had an area under the receiver operating characteristic curve (AUC) above 0.8 for all included radiographic findings except bronchial and interstitial patterns, on both Data Set 1 and Data Set 2. The CNN based on DenseNet-121 had lower overall performance. Statistically significant differences in generalization ability between the two CNNs were evident, with the CNN based on ResNet-50 performing better for alveolar pattern, interstitial pattern, megaesophagus, and pneumothorax.
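
As a rough illustration of how a per-finding AUC can be obtained for a multi-label classifier, the sketch below scores each label independently using the rank-based (Mann-Whitney) formulation of AUC. The two label names are taken from the abstract; the ground-truth values and scores are made-up toy data, and `auc` is a hypothetical helper written for this example, not the authors' evaluation code.

```python
def auc(y_true, y_score):
    """Rank-based AUC: chance that a random positive scores above a random negative."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data (assumed): each row is one radiograph, each column one finding.
labels = ["cardiomegaly", "pleural effusion"]
y_true = [[1, 0], [0, 0], [1, 1], [0, 1]]
y_score = [[0.90, 0.2], [0.30, 0.1], [0.35, 0.8], [0.40, 0.7]]

# Labels are not mutually exclusive, so each column is evaluated on its own.
per_label_auc = {
    name: auc([row[j] for row in y_true], [row[j] for row in y_score])
    for j, name in enumerate(labels)
}
```

Because the labels are not mutually exclusive, one AUC per finding is reported rather than a single multi-class score.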

Classification of jujube defects in small data sets based on transfer learning

Neural Computing and Applications ◽

10.1007/s00521-021-05715-2 ◽

2021 ◽

Author(s):

Jianping Ju ◽

Hong Zheng ◽

Xiaohang Xu ◽

Zhongyuan Guo ◽

Zhaohui Zheng ◽

...

Keyword(s):

Transfer Learning ◽

Loss Function ◽

Training Model ◽

Parameter Distribution ◽

Test Accuracy ◽

Small Data ◽

Data Sets ◽

Data Set ◽

Small Data Sets

Abstract Although convolutional neural networks have achieved success in image classification, challenges remain in agricultural product quality sorting, such as machine vision-based jujube defect detection. The performance of jujube defect detection depends mainly on the feature extraction and the classifier used. Because of the diversity of jujube materials and the variability of the testing environment, the traditional method of manually extracting features often fails to meet the requirements of practical application. In this paper, a jujube sorting model for small data sets, based on a convolutional neural network and transfer learning, is proposed to meet the practical demands of jujube defect detection. First, the original images collected from an actual jujube sorting production line were pre-processed, and the data were augmented to establish a data set of five categories of jujube defects. The original CNN model was then improved by embedding the SE module and replacing the softmax loss function with the triplet loss function and the center loss function. Finally, a deep model pre-trained on the ImageNet image data set was trained on the jujube defects data set, so that the parameters of the pre-trained model could fit the parameter distribution of the jujube defect images; this completed the transfer of the model and enabled the detection and classification of jujube defects. The classification results are visualized with heatmaps and analyzed in terms of classification accuracy and confusion matrices against the comparison models. The experimental results show that the SE-ResNet50-CL model improves the fine-grained classification of jujube defects, with a test accuracy of 94.15%. The model has good stability and high recognition accuracy in complex environments.
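
The abstract names the center loss as one of the replacements for the softmax loss. A minimal sketch of the commonly used form, half the summed squared distance between each feature vector and the center of its class, is given below. The abstract does not state the exact formulation or weighting used, so the function, the two-dimensional features, and the class centers here are assumptions for illustration only.

```python
def center_loss(features, labels, centers):
    """Half the summed squared Euclidean distance from each feature to its class center."""
    total = 0.0
    for x, y in zip(features, labels):
        c = centers[y]
        total += sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return 0.5 * total


# Toy batch (assumed): two samples, two classes, 2-D features.
features = [[1.0, 0.0], [0.0, 2.0]]
labels = [0, 1]
centers = {0: [0.0, 0.0], 1: [0.0, 0.0]}

loss = center_loss(features, labels, centers)  # 0.5 * (1 + 4)
```

Minimizing this term pulls samples of the same class toward a shared center, which is why it is often paired with a second loss that keeps different classes apart.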

Sequential Sampling for Estimation and Classification of the Incidence of Hop Powdery Mildew II: Cone Sampling

Plant Disease ◽

10.1094/pdis-91-8-1013 ◽

2007 ◽

Vol 91 (8) ◽

pp. 1013-1020 ◽

Cited By ~ 8

Author(s):

David H. Gent ◽

William W. Turechek ◽

Walter F. Mahaffee

Keyword(s):

Powdery Mildew ◽

Binomial Distribution ◽

Disease Incidence ◽

Sequential Sampling ◽

Model Construction ◽

Data Sets ◽

Data Set ◽

Sampling Plans ◽

Simulated Sampling

Sequential sampling models for estimation and classification of the incidence of powdery mildew (caused by Podosphaera macularis) on hop (Humulus lupulus) cones were developed using parameter estimates of the binary power law derived from the analysis of 221 transect data sets (model construction data set) collected from 41 hop yards sampled in Oregon and Washington from 2000 to 2005. Stop lines, models that determine when sufficient information has been collected to estimate mean disease incidence and stop sampling, for sequential estimation were validated by bootstrap simulation using a subset of 21 model construction data sets and simulated sampling of an additional 13 model construction data sets. Achieved coefficient of variation (C) approached the prespecified C as the estimated disease incidence, [Formula: see text], increased, although achieving a C of 0.1 was not possible for data sets in which [Formula: see text] < 0.03 with the number of sampling units evaluated in this study. The 95% confidence interval of the median difference between [Formula: see text] of each yard (achieved by sequential sampling) and the true p of the original data set included 0 for all 21 data sets evaluated at levels of C of 0.1 and 0.2. For sequential classification, operating characteristic (OC) and average sample number (ASN) curves of the sequential sampling plans obtained by bootstrap analysis and simulated sampling were similar to the OC and ASN values determined by Monte Carlo simulation. Correct decisions of whether disease incidence was above or below prespecified thresholds (pt) were made for 84.6 or 100% of the data sets during simulated sampling when stop lines were determined assuming a binomial or beta-binomial distribution of disease incidence, respectively. However, the higher proportion of correct decisions obtained by assuming a beta-binomial distribution of disease incidence required, on average, sampling 3.9 more plants per sampling round to classify disease incidence compared with the binomial distribution. Use of these sequential sampling plans may aid growers in deciding the order in which to harvest hop yards to minimize the risk of a condition called “cone early maturity” caused by late-season infection of cones by P. macularis. Also, sequential sampling could aid in research efforts, such as efficacy trials, where many hop cones are assessed to determine disease incidence.
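
To give a feel for why a coefficient of variation of 0.1 is hard to reach at low incidence, the sketch below uses the simplest possible model: independent binomial observations, for which C = sqrt((1 - p) / (n p)). This is an assumption made purely for illustration. The study itself relies on the binary power law, which also accounts for aggregation of disease, so the numbers here are not the paper's stop lines.

```python
import math


def cv_binomial(p, n):
    """Coefficient of variation of an incidence estimate from n independent observations."""
    return math.sqrt((1.0 - p) / (n * p))


def units_needed(p, target_cv):
    """Smallest n for which the binomial coefficient of variation is at most target_cv."""
    return math.ceil((1.0 - p) / (p * target_cv ** 2))


n_low = units_needed(0.03, 0.1)   # low incidence, tight precision
n_high = units_needed(0.30, 0.1)  # higher incidence, same precision
```

Under this simplified model the required sample size grows roughly in proportion to 1/p as incidence falls, which is consistent with the abstract's remark that the prespecified C was approached only as estimated incidence increased.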

Variational inference using approximate likelihood under the coalescent with recombination

Genome Research ◽

10.1101/gr.273631.120 ◽

2021 ◽

pp. gr.273631.120

Author(s):

Xinhao Liu ◽

Huw A Ogilvie ◽

Luay Nakhleh

Keyword(s):

Simulated Data ◽

Variational Inference ◽

Divide And Conquer ◽

Data Sets ◽

Transition Rates ◽

Data Set ◽

Population Sizes ◽

Novel Method ◽

Approximate Likelihood ◽

Promising Avenue

Coalescent methods are proven and powerful tools for population genetics, phylogenetics, epidemiology, and other fields. A promising avenue for the analysis of large genomic alignments, which are increasingly common, is the use of coalescent hidden Markov model (coalHMM) methods, but these methods have lacked general usability and flexibility. We introduce a novel method for automatically learning a coalHMM and inferring the posterior distributions of evolutionary parameters using black-box variational inference, with the transition rates between local genealogies derived empirically by simulation. This derivation enables our method to work directly with three or four taxa and, through a divide-and-conquer approach, with more taxa. Using a simulated data set resembling a human-chimp-gorilla scenario, we show that our method has accuracy comparable to or better than that of previous coalHMM methods. Both species divergence times and population sizes were accurately inferred. The method also infers local genealogies and we report on their accuracy. Furthermore, we discuss a potential direction for scaling the method to larger data sets through a divide-and-conquer approach. This accuracy means our method is useful now, and because it derives transition rates by simulation, it is flexible enough to enable future implementations of all kinds of population models.

Massive Data Classification of Neural Responses

Advances in Medical Technologies and Clinical Practice - Biomedical Diagnostics and Clinical Technologies ◽

10.4018/978-1-60566-280-0.ch009 ◽

2010 ◽

pp. 278-298

Author(s):

Pedro Tomás ◽

IST TU Lisbon ◽

Aleksandar Ilic ◽

Leonel Sousa

Keyword(s):

Execution Time ◽

Data Parallelism ◽

Data Sets ◽

Neural Responses ◽

Neuronal Responses ◽

Data Set ◽

Web Interfaces ◽

Mass Classification ◽

Neuronal Code

When analyzing the neuronal code, neuroscientists usually perform extracellular recordings of neuronal responses (spikes). Since the microelectrodes used to perform these recordings are much larger than the cells, each microelectrode records responses from multiple neurons. The recorded response must therefore be classified and evaluated in order to identify how many neurons were recorded and to assess which neuron generated each spike. This chapter proposes a platform for the mass classification of neuronal responses that employs data parallelism to speed up the classification. The platform is built in a modular way, supporting multiple web interfaces, different back-end environments for parallel computing, and different algorithms for spike classification. Experimental results on the proposed platform show that, even for an unbalanced data set of neuronal responses, the execution time was reduced by about 45%. For balanced data sets, the execution time may fall to a fraction equal to the inverse of the number of back-end computational elements.
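
The difference between balanced and unbalanced workloads can be sketched with a simple scheduling model: the parallel execution time is set by the most heavily loaded worker. The greedy assignment below (largest chunk first, always to the least-loaded worker) and the chunk costs are assumptions chosen for illustration; they do not describe the platform's actual scheduler.

```python
import heapq


def makespan(chunk_costs, n_workers):
    """Greedy largest-first assignment; returns the load of the busiest worker."""
    loads = [0.0] * n_workers
    heapq.heapify(loads)
    for cost in sorted(chunk_costs, reverse=True):
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + cost)
    return max(loads)


balanced = [10.0] * 8                  # eight equal chunks
unbalanced = [50.0, 10.0, 10.0, 10.0]  # one chunk dominates

# Parallel time as a fraction of the serial time, on four workers.
balanced_ratio = makespan(balanced, 4) / sum(balanced)
unbalanced_ratio = makespan(unbalanced, 4) / sum(unbalanced)
```

With equal chunks the ratio is exactly 1/4 on four workers, the inverse of the worker count, while the single large chunk in the unbalanced case caps the achievable reduction.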

Bagging Approach for Medical Plants Recognition Based on Their DNA Sequences

International Journal of Social Ecology and Sustainable Development ◽

10.4018/ijsesd.2018100103 ◽

2018 ◽

Vol 9 (4) ◽

pp. 45-60

Author(s):

Mohamed Elhadi Rahmani ◽

Abdelmalek Amine ◽

Reda Mohamed Hamou

Keyword(s):

DNA Sequences ◽

Majority Vote ◽

Data Sets ◽

Data Set ◽

Drug Production ◽

Medical Plants

Many drugs in modern medicine originate from plants, and the first step in drug production is the recognition of the plants needed for this purpose. This article presents a bagging approach for medical plants recognition based on their DNA sequences. The authors developed a system that recognizes the DNA sequences of 14 medical plants. They first divided the 14-class data set into two-class sub-data sets; then, instead of using an algorithm to classify the 14-class data set directly, they used the same algorithm to classify the sub-data sets. In this way, the problem of classifying 14 plants is reduced to a set of two-class classification sub-problems. To construct the subsets, the authors extracted all possible pairs of the 14 classes, which gives each class more chances to be predicted correctly. This approach also allows the similarity between the DNA sequences of one plant and those of each other plant to be studied. A new sequence is classified by majority vote. The authors report very good results, with accuracy nearly doubling (from 45% to almost 80%).
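
The pairwise decomposition and majority vote described above can be sketched as follows. The number of classes comes from the abstract; `toy_pairwise_predict` is a hypothetical stand-in for a trained two-class classifier and simply prefers the class index closest to a numeric sample, so only the voting structure is meaningful here.

```python
from collections import Counter
from itertools import combinations

N_CLASSES = 14

# Every unordered pair of classes defines one two-class sub-problem.
pairs = list(combinations(range(N_CLASSES), 2))


def majority_vote(pairwise_predict, sample):
    """Ask every pairwise classifier for a winner and return the most voted class."""
    votes = Counter(pairwise_predict(a, b, sample) for a, b in pairs)
    return votes.most_common(1)[0][0]


def toy_pairwise_predict(a, b, sample):
    """Hypothetical binary classifier: picks whichever class index is closer to the sample."""
    return a if abs(a - sample) <= abs(b - sample) else b


predicted = majority_vote(toy_pairwise_predict, 5)
```

Each class takes part in 13 of the 91 pairwise sub-problems, so a class can collect up to 13 votes for a given sequence.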

Classification with Local Clustering in Imbalanced Data Sets

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.219-220.151 ◽

2011 ◽

Vol 219-220 ◽

pp. 151-155 ◽

Cited By ~ 2

Author(s):

Hua Ji ◽

Hua Xiang Zhang

Keyword(s):

Data Distribution ◽

Imbalanced Data ◽

Support Vector ◽

Data Sets ◽

Data Set ◽

Imbalanced Data Sets ◽

Local Clustering ◽

Rare Class ◽

Novel Method ◽

The Cost

In many real-world domains, learning from imbalanced data sets is a common problem. Because a skewed class distribution challenges traditional classifiers, which achieve much lower classification accuracy on rare classes, we propose a novel classification method that uses local clustering based on the data distribution of the imbalanced data set. First, we divide the whole data set into several data groups based on the data distribution. Then we perform local clustering within each group, on both the normal class and the disjoint rare class. For the rare class, over-sampling is subsequently applied at different rates. Finally, we apply support vector machines (SVMs) for classification, using the traditional cost-matrix approach to enhance classification accuracy. Experimental results on several UCI data sets show that this method can produce much higher prediction accuracy on the rare class than state-of-the-art methods.

A Novel Method to Detect Bias in Short Read NGS Data

Journal of Integrative Bioinformatics ◽

10.1515/jib-2017-0025 ◽

2017 ◽

Vol 14 (3) ◽

Cited By ~ 1

Author(s):

Jamie Alnasir ◽

Hugh P. Shanahan

Keyword(s):

Biological Significance ◽

GC Content ◽

Next Generation Sequencing Data ◽

Data Sets ◽

Sequencing Data ◽

Data Set ◽

Short Read ◽

Novel Method ◽

Type Data ◽

NGS Data

Abstract Detecting sources of bias in transcriptomic data is essential to determine signals of biological significance. We outline a novel method to detect sequence-specific bias in short read Next Generation Sequencing data. The method is based on determining intra-exon correlations between specific motifs. It requires the mild assumption that short reads sampled from specific regions of the same exon will be correlated with each other. The method has been implemented on Apache Spark and used to analyse two D. melanogaster eye-antennal disc data sets generated at the same laboratory. The wild type Drosophila data set indicates a variation due to motif GC content that is more significant than that due to exon GC content. The software is available online and could be applied to cross-experiment transcriptome data analysis in eukaryotes.
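
For readers unfamiliar with the quantity, GC content is the fraction of bases in a sequence that are guanine or cytosine. The helper below is a minimal sketch of that calculation for a single motif; the function name and example sequences are illustrative assumptions and are not taken from the authors' software.

```python
def gc_content(seq):
    """Fraction of bases in seq that are G or C (case-insensitive)."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


motif_gc = gc_content("ATGC")      # two of the four bases are G or C
at_rich_gc = gc_content("gattaca")  # lower-case input is accepted
```

The same function applies to a short motif or a whole exon; only the input sequence changes.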

Modified Deep Neural Networks for Dog Breeds Identification

10.20944/preprints201812.0232.v1 ◽

2018 ◽

Cited By ~ 1

Author(s):

Aydin Ayanzadeh ◽

Sahand Vahidnia

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

State Of The Art ◽

The State ◽

Fine Tuning ◽

Test Accuracy ◽

Data Sets ◽

Data Set

In this paper, we leverage state-of-the-art models trained on ImageNet data sets. We use the pre-trained models and learned weights to extract features from the dog breed identification data set. Afterwards, we apply fine-tuning and data augmentation to increase test accuracy in the classification of dog breeds. The performance of the proposed approaches is compared with state-of-the-art ImageNet models such as ResNet-50, DenseNet-121, DenseNet-169 and GoogleNet. We achieved 89.66%, 85.37%, 84.01% and 82.08% test accuracy respectively, which shows the superior performance of the proposed method over previous work on the Stanford dog breeds data set.

Supernova Host Galaxy Association and Photometric Classification of over 10,000 Light Curves from the Zwicky Transient Facility

Research Notes of the AAS ◽

10.3847/2515-5172/ac416e ◽

2021 ◽

Vol 5 (12) ◽

pp. 283

Author(s):

Braden Garretson ◽

Dan Milisavljevic ◽

Jack Reynolds ◽

Kathryn E. Weil ◽

Bhagya Subrayan ◽

...

Keyword(s):

Value Added ◽

Light Curves ◽

Host Galaxy ◽

Massive Data ◽

Data Sets ◽

Data Set ◽

Scale Modeling ◽

Final Data ◽

Type Ia

Abstract Here we present a catalog of 12,993 photometrically-classified supernova-like light curves from the Zwicky Transient Facility, along with candidate host galaxy associations. By training a random forest classifier on spectroscopically classified supernovae from the Bright Transient Survey, we achieve an accuracy of 80% across four supernova classes resulting in a final data set of 8208 Type Ia, 2080 Type II, 1985 Type Ib/c, and 720 SLSN. Our work represents a pathfinder effort to supply massive data sets of supernova light curves with value-added information that can be used to enable population-scale modeling of explosion parameters and investigate host galaxy environments.

Classification of Microchannel Flame Regimes Based on Convolutional Neural Networks

10.1115/power2021-64437 ◽

2021 ◽

Author(s):

Seyed Navid Roohani Isfahani ◽

Vinicius M. Sauer ◽

Ingmar Schoegl

Keyword(s):

Data Augmentation ◽

Dynamic Range ◽

High Dynamic Range ◽

Data Sets ◽

Data Set ◽

Testing Data ◽

Experimental Approaches ◽

Transition Points ◽

Combustion Regimes

Abstract Micro-combustion has shown significant potential to study and characterize the combustion behavior of hydrocarbon fuels. Among several experimental approaches based on this method, the most prominent one employs an externally heated micro-channel. Three distinct combustion regimes are reported for this device, namely weak flames, flames with repetitive extinction and ignition (FREI), and normal flames, which are formed at low, moderate, and high flow rate ranges, respectively. Within each flame regime, noticeable differences exist in both shape and luminosity, where transition points can be used to obtain insights into fuel characteristics. In this study, flame images are obtained using a monochrome camera equipped with a 430 nm bandpass filter to capture the chemiluminescence signal emitted by the flame. Sequences of conventional flame photographs are taken during the experiment and computationally merged to generate high dynamic range (HDR) images. In a highly diluted fuel/oxidizer mixture, it is observed that FREI disappear and are replaced by a gradual and direct transition between weak and normal flames, which makes it hard to identify the different combustion regimes. To resolve this issue, a convolutional neural network (CNN) is introduced to classify the flame regime. The accuracy of the model is calculated to be 99.34, 99.66, and 99.83% for the “training”, “validation”, and “testing” data sets, respectively. This level of accuracy is achieved by conducting a grid search to obtain optimized parameters for the CNN. Furthermore, a data augmentation technique based on different experimental scenarios is used to generate flame images and increase the size of the data set.
