Keeping it classy: classification of live fish and ghost PIT tags detected with a mobile PIT tag interrogation system using an innovative analytical approach

The ability of passive integrated transponder (PIT) tag data to improve demographic parameter estimates has led to the rapid advancement of PIT tag systems. However, ghost tags create uncertainty about detected tag status (i.e., live fish or ghost tag) when using mobile interrogation systems. We developed a method to differentiate between live fish and ghost tags using a random forest classification model with a novel data input structure based on known-fate PIT tag detections in the San Juan River (New Mexico, Colorado, and Utah, USA). We used our model to classify detected tags with an overall error rate of 6.8% (1.6% ghost tag error rate and 21.8% live fish error rate). The important variables for classification were related to distance moved and response to monsoonal flood flows; however, habitat variables did not appear to influence model accuracy. Our results and approach allow the use of mobile detection data with confidence and allow for greater accuracy in movement, distribution, and habitat use studies, potentially helping identify influential management actions that would improve our ability to conserve and recover endangered fish.
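The relationship between the overall and per-class error rates quoted above can be sketched as follows. This is a minimal illustration, not the authors' analysis: the tag counts are hypothetical, the function names are invented here, and the second function assumes that all three reported rates were computed on the same set of tags, so that the overall rate is the class-weighted mean of the two class rates.

```python
def error_rates(ghost_total, ghost_wrong, live_total, live_wrong):
    """Return (overall, ghost, live) misclassification rates as fractions."""
    ghost_rate = ghost_wrong / ghost_total
    live_rate = live_wrong / live_total
    overall = (ghost_wrong + live_wrong) / (ghost_total + live_total)
    return overall, ghost_rate, live_rate


def implied_live_share(overall, ghost_rate, live_rate):
    """Share of live fish implied if overall is the class-weighted mean."""
    return (overall - ghost_rate) / (live_rate - ghost_rate)


# Hypothetical counts, chosen only to illustrate the arithmetic.
overall, ghost, live = error_rates(ghost_total=750, ghost_wrong=12,
                                   live_total=250, live_wrong=55)
print(f"overall={overall:.3f} ghost={ghost:.3f} live={live:.3f}")

# Rates quoted in the abstract: 6.8% overall, 1.6% ghost, 21.8% live.
share = implied_live_share(0.068, 0.016, 0.218)
print(f"implied live-fish share: {share:.3f}")
```

Under that assumption, a low overall error rate can coexist with a much higher live fish error rate simply because ghost tags make up most of the classified detections.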


Fully Automated Detection of Paramagnetic Rims in Multiple Sclerosis Lesions on 3T Susceptibility-Based MR Imaging

10.1101/2020.08.31.276238 ◽

2020 ◽

Author(s):

Carolyn Lou ◽

Pascal Sati ◽

Martina Absinta ◽

Kelly Clark ◽

Jordan D. Dworkin ◽

...

Keyword(s):

Multiple Sclerosis ◽

Severe Disease ◽

Area Under The Curve ◽

Machine Learning Algorithms ◽

Classification Model ◽

List Type ◽

Training Set ◽

Random Forest Classification ◽

Automated Method ◽

Forest Classification

Background and Purpose: The presence of a paramagnetic rim around a white matter lesion has recently been shown to be a hallmark of a particular pathological type of multiple sclerosis (MS) lesion. Increased prevalence of these paramagnetic rim lesions (PRLs) is associated with a more severe disease course in MS. The identification of these lesions is time-consuming to perform manually. We present a method to automatically detect PRLs on 3T T2*-phase images.

Methods: T1-weighted, T2-FLAIR, and T2*-phase MRI of the brain were collected at 3T for 19 subjects with MS. The images were then processed with lesion segmentation, lesion center detection, lesion labelling, and lesion-level radiomic feature extraction. A total of 877 lesions were identified, 118 (13%) of which contained a paramagnetic rim. We divided our data into a training set (15 patients, 673 lesions) and a testing set (4 patients, 204 lesions). We fit a random forest classification model on the training set and assessed our ability to classify lesions as PRL on the test set.

Results: The number of PRLs per subject identified via our automated lesion labelling method was highly correlated with the gold standard count of PRLs per subject, r = 0.91 (95% CI [0.79, 0.97]). The classification algorithm using radiomic features can classify a lesion as PRL or not with an area under the curve of 0.80 (95% CI [0.67, 0.86]).

Conclusion: This study develops a fully automated technique for the detection of paramagnetic rim lesions using standard T1 and FLAIR sequences and a T2*-phase sequence obtained on 3T MR images.

Highlights: A fully automated method for both the identification and classification of paramagnetic rim lesions is proposed. Radiomic features in conjunction with machine learning algorithms can accurately classify paramagnetic rim lesions. Challenges for classification are largely driven by heterogeneity between lesions, including equivocal rim signatures and lesion location.
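The patient-level train/test division and the area-under-the-curve evaluation described in this abstract can be sketched as follows. This is an illustrative sketch only: the record layout, the function names, and the toy scores are assumptions made here, and the abstract does not say how patients were assigned to each set (a seeded random draw is used purely for illustration).

```python
import random


def split_by_patient(lesions, n_test_patients, seed=0):
    """Split lesion records so that no patient appears in both sets."""
    patients = sorted({rec["patient"] for rec in lesions})
    rng = random.Random(seed)
    test_patients = set(rng.sample(patients, n_test_patients))
    train = [rec for rec in lesions if rec["patient"] not in test_patients]
    test = [rec for rec in lesions if rec["patient"] in test_patients]
    return train, test


def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy records: 19 hypothetical patients with 3 lesions each.
lesions = [{"patient": p, "lesion": i} for p in range(19) for i in range(3)]
train, test = split_by_patient(lesions, n_test_patients=4)

# Toy classifier scores (higher means "more likely PRL") and true labels.
toy_auc = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(len(train), len(test), toy_auc)
```

Splitting by patient rather than by lesion keeps lesions from the same subject out of both sets, so the test score is not inflated by within-subject similarity.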


Epigenetic Analyses of Alcohol Consumption in Combustible and Non-Combustible Nicotine Product Users

Epigenomes ◽

10.3390/epigenomes5030018 ◽

2021 ◽

Vol 5 (3) ◽

pp. 18

Author(s):

Kelsey Dawes ◽

Luke Sampson ◽

Rachel Reimer ◽

Shelly Miller ◽

Robert Philibert ◽

...

Keyword(s):

Alcohol Consumption ◽

Self Report ◽

Classification Model ◽

Public And Private ◽

Random Forest Classification ◽

Epigenetic Biomarkers ◽

Forest Classification ◽

Heavy Alcohol Consumption ◽

Relationship Of ◽

The Relationship

Alcohol and tobacco use are highly comorbid and exacerbate the associated morbidity and mortality of either substance alone. However, the relationship of alcohol consumption to the various forms of nicotine-containing products is not well understood. To improve this understanding, we examined the relationship of alcohol consumption to nicotine product use using self-report, cotinine, and two epigenetic biomarkers specific for smoking (cg05575921) and drinking (Alcohol T Scores (ATS)) in n = 424 subjects. Cigarette users had significantly higher ATS values than the other groups (p < 2.2 × 10^−16). Using the objective biomarkers, the intensity of nicotine and alcohol consumption was correlated in both the cigarette and smokeless users (R = −0.66, p = 3.1 × 10^−14; R2 = 0.61, p = 1.97 × 10^−4). Building upon this idea, we used the objective nicotine biomarkers and age to build and test a Balanced Random Forest classification model for heavy alcohol consumption (ATS > 2.35). The model performed well with an AUC of 0.962, 89.3% sensitivity, and 85% specificity. We conclude that those who use non-combustible nicotine products drink significantly less than smokers, and cigarette and smokeless users drink more with heavier nicotine use. These findings further highlight the lack of informativeness of self-reported alcohol consumption and suggest that, given the public and private health burden of alcoholism, further research should consider whether non-combustible nicotine products could serve as a mode of treatment for dual users.


A Random Forest Classification Model for Transmission Line Image Processing

2020 15th International Conference on Computer Science & Education (ICCSE) ◽

10.1109/iccse49874.2020.9201900 ◽

2020 ◽

Author(s):

Zhang Bingzhen ◽

Qiao Xiaoming ◽

Yang Hemeng ◽

Zhou Zhubo

Keyword(s):

Image Processing ◽

Random Forest ◽

Transmission Line ◽

Classification Model ◽

Random Forest Classification ◽

Line Image ◽

Forest Classification


Intelligent identification of effective reservoirs based on the random forest classification model

Journal of Hydrology ◽

10.1016/j.jhydrol.2020.125324 ◽

2020 ◽

Vol 591 ◽

pp. 125324 ◽

Cited By ~ 1

Author(s):

Jieyu Li ◽

Ping-an Zhong ◽

Minzhi Yang ◽

Feilin Zhu ◽

Juan Chen ◽

...

Keyword(s):

Random Forest ◽

Classification Model ◽

Random Forest Classification ◽

Forest Classification


Nutrient Diagnosis of Eucalyptus at the Factor-Specific Level Using Machine Learning and Compositional Methods

Plants ◽

10.3390/plants9081049 ◽

2020 ◽

Vol 9 (8) ◽

pp. 1049 ◽

Cited By ~ 3

Author(s):

Betania Vahl de Paula ◽

Wagner Squizani Arruda ◽

Léon Etienne Parent ◽

Elias Frank de Araujo ◽

Gustavo Brunetto

Keyword(s):

Area Under The Curve ◽

Local Scale ◽

Classification Model ◽

High Yield ◽

Specific Level ◽

Random Forest Classification ◽

Yield Level ◽

Forest Classification ◽

Factor Interactions ◽

Log Ratio

Brazil is home to 30% of the world’s Eucalyptus trees. The seedlings are fertilized at plantation to support biomass production until canopy closure. Thereafter, fertilization is guided by state standards that may not apply at the local scale where myriads of growth factors interact. Our objective was to customize the nutrient diagnosis of young Eucalyptus trees down to factor-specific levels. We collected 1861 observations across eight clones, 48 soil types, and 148 locations in southern Brazil. Cutoff diameter between low- and high-yielding specimens at breast height was set at 4.3 cm. The random forest classification model returned a relatively uninformative area under the curve (AUC) of 0.63 using tissue compositions only, and an informative AUC of 0.78 after adding local features. Compared to nutrient levels from quartile compatibility intervals of nutritionally balanced specimens at high-yield level, state guidelines appeared to be too high for Mg, B, Mn, and Fe and too low for Cu and Zn. Moreover, diagnosis using concentration ranges collapsed in the multivariate Euclidean hyper-space by denying nutrient interactions. Factor-specific diagnosis detected nutrient imbalance by computing the Euclidean distance between centered log-ratio transformed compositions of defective and successful neighbors at a local scale. Downscaling regional nutrient standards may thus fail to account for factor interactions at a local scale. Documenting factors at a local scale requires large datasets through close collaboration between stakeholders.


Humboldtian Diagnosis of Peach Tree (Prunus persica) Nutrition Using Machine-Learning and Compositional Methods

Agronomy ◽

10.3390/agronomy10060900 ◽

2020 ◽

Vol 10 (6) ◽

pp. 900 ◽

Cited By ~ 3

Author(s):

Debora Leitzke Betemps ◽

Betania Vahl de Paula ◽

Serge-Étienne Parent ◽

Simone P. Galarça ◽

Newton A. Mayer ◽

...

Keyword(s):

Machine Learning ◽

Euclidean Distance ◽

Prunus Persica ◽

Nutrient Status ◽

Classification Model ◽

Data Sets ◽

Local Conditions ◽

Random Forest Classification ◽

Forest Classification ◽

Euclidean Distances

Regional nutrient ranges are commonly used to diagnose plant nutrient status. In contrast, local diagnosis confronts unhealthy to healthy compositional entities in comparable surroundings. Robust local diagnosis requires well-documented data sets processed by machine learning and compositional methods. Our objective was to customize nutrient diagnosis of peach (Prunus persica) trees at local scale. We collected 472 observations from commercial orchards and fertilizer trials across eleven cultivars of Prunus persica and six rootstocks in the state of Rio Grande do Sul (RS), Brazil. The random forest classification model returned an area under the curve exceeding 0.80 and a classification accuracy of 80% about a yield cutoff of 16 Mg ha^−1. Centered log ratios (clr) of foliar defective compositions have appropriate geometry to compute Euclidean distances from closest successful compositions in “enchanting islands”. Successful specimens closest to defective specimens as shown by Euclidean distance allowed reaching trustful fruit yields using site-specific corrective measures. Comparing tissue composition of low-yielding orchards to that of the closest successful neighbors in two major Brazilian peach-producing regions, regional diagnosis differed from local diagnosis, indicating that regional standards may fail to fit local conditions. Local diagnosis requires well-documented Humboldtian data sets that can be acquired through ethical collaboration between researchers and stakeholders.
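The centered log-ratio (clr) transform and the nearest-successful-neighbor search mentioned in this abstract can be sketched as follows. This is a minimal sketch under stated assumptions: the compositions are hypothetical, the function names are invented here, and the authors' actual preprocessing (choice of parts, filling value, neighborhood definition) is not given in the abstract.

```python
import math


def clr(composition):
    """Centered log-ratio transform of strictly positive parts."""
    if any(part <= 0 for part in composition):
        raise ValueError("clr requires strictly positive parts")
    logs = [math.log(part) for part in composition]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [value - mean_log for value in logs]


def clr_distance(a, b):
    """Euclidean distance between two compositions in clr space."""
    return math.dist(clr(a), clr(b))


def closest_successful(defective, successful):
    """Return the successful composition nearest to the defective one."""
    return min(successful, key=lambda comp: clr_distance(defective, comp))


# Hypothetical foliar compositions (three nutrients plus a filling value).
low_yield = [2.0, 0.15, 1.2, 96.65]
high_yield = [[3.0, 0.25, 2.0, 94.75], [2.2, 0.16, 1.3, 96.34]]
nearest = closest_successful(low_yield, high_yield)
print(nearest, round(clr_distance(low_yield, nearest), 4))
```

Because clr values are differences from the log geometric mean, they sum to zero and are unchanged when a composition is rescaled, which is why distances are computed in clr space rather than on raw concentrations.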


Random Forest classification model of basal stem rot disease caused by Ganoderma boninense in oil palm plantations

International Journal of Remote Sensing ◽

10.1080/01431161.2017.1331474 ◽

2017 ◽

Vol 38 (16) ◽

pp. 4683-4699 ◽

Cited By ~ 12

Author(s):

Heri Santoso ◽

Hiroshi Tani ◽

Xiufeng Wang

Keyword(s):

Random Forest ◽

Oil Palm ◽

Classification Model ◽

Stem Rot ◽

Ganoderma Boninense ◽

Basal Stem Rot ◽

Random Forest Classification ◽

Forest Classification ◽

Rot Disease


Analysis of Frequency Bands of Uterine Electromyography Signals for the Detection of Preterm Birth

Studies in Health Technology and Informatics - Public Health and Informatics ◽

10.3233/shti210165 ◽

2021 ◽

Author(s):

Vinothini Selvaraju ◽

P.A. Karthick ◽

Ramakrishnan Swaminathan

Keyword(s):

Preterm Birth ◽

Random Forest ◽

Nearest Neighbor ◽

Peak Frequency ◽

Classification Model ◽

K Nearest Neighbor ◽

Frequency Bands ◽

Random Forest Classification ◽

Forest Classification ◽

Uterine Electromyography

In this work, an attempt has been made to analyze the influence of the frequency bands in uterine electromyography (uEMG) signals on the detection of preterm birth. Signals recorded from women’s abdomens during pregnancy are considered in this study. The signals are preprocessed using a digital bandpass Butterworth filter and decomposed into three frequency bands, namely 0.3-1.0 Hz (F1), 1.0-2.0 Hz (F2), and 2.0-3.0 Hz (F3). Four spectral features, namely peak magnitude, peak frequency, mean frequency, and median frequency, are extracted from the power spectrum. Three classification models, namely k-nearest neighbor, support vector machine, and random forest, are employed to distinguish term and preterm conditions. The results show that the features extracted from these frequency bands are able to differentiate term and preterm conditions. In particular, frequency band F3 performs better than the other frequency bands. The features associated with these frequencies, together with the random forest classification model, achieve a maximum accuracy of 75.2%. Thus, these measures could be used to accurately detect preterm birth well in advance.
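The four spectral features named in this abstract can be sketched as follows. The abstract does not define them, so this sketch assumes commonly used definitions: mean frequency as the power-weighted average frequency, and median frequency as the first bin at which cumulative power reaches half of the total. The function name and the toy spectrum are hypothetical.

```python
def spectral_features(freqs, power):
    """Peak magnitude, peak, mean and median frequency of a power spectrum."""
    if len(freqs) != len(power) or not freqs:
        raise ValueError("freqs and power must be non-empty and equal length")
    total = sum(power)
    if total <= 0:
        raise ValueError("total power must be positive")
    peak_index = max(range(len(power)), key=power.__getitem__)
    mean_freq = sum(f * p for f, p in zip(freqs, power)) / total
    # Median frequency: first bin where cumulative power reaches half.
    cumulative = 0.0
    median_freq = freqs[-1]
    for f, p in zip(freqs, power):
        cumulative += p
        if cumulative >= total / 2:
            median_freq = f
            break
    return {
        "peak_magnitude": power[peak_index],
        "peak_frequency": freqs[peak_index],
        "mean_frequency": mean_freq,
        "median_frequency": median_freq,
    }


# Toy spectrum restricted to the 2.0-3.0 Hz band (F3 in the abstract).
freqs = [2.0, 2.25, 2.5, 2.75, 3.0]
power = [1.0, 2.0, 4.0, 2.0, 1.0]
features = spectral_features(freqs, power)
print(features)
```

In practice the power spectrum would come from the band-filtered signal; only the feature step is shown here.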


A Pinnacle Technique for Detection of COVID-19 Fake News in Social Media

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.a8176.1110120 ◽

2020 ◽

Vol 10 (1) ◽

pp. 256-261

Keyword(s):

Machine Learning ◽

Social Media ◽

Random Forest ◽

General Public ◽

Classification Model ◽

Fake News ◽

Random Forest Classification ◽

Forest Classification ◽

The World ◽

The Impact

Today the world is gripped by fear of a highly infectious disease caused by a newly discovered coronavirus and termed COVID-19. Coronaviruses are a large group of viruses that can severely affect humans. The world bears testimony to the contagious nature of the disease and the rapidity of its spread. 50l people were infected and 30l people died due to this pandemic around the world, which led people to fear the epidemic around them. The death rate is higher among males than among females. News of the pandemic has caught the attention of the world and gained momentum on almost all media platforms. Both true and fake news about COVID-19 has been created and spread on social media, which has become a major concern for the general public who access it. Spreading such news on social media has become a new trend for gaining familiarity and a fan base. At the same time, it is undeniable that the spread of such fake news creates considerable confusion and fear among the public. To stop such rumors, detection of fake news has become of utmost importance. Machine learning classification algorithms are an appropriate way to build a model that effectively detects fake news on social media. In the context of the COVID-19 pandemic, we collected training data and trained machine learning models using several algorithms to automatically detect fake news about the coronavirus. The algorithms used in this investigation are the Naïve Bayes classifier and the random forest classification algorithm. A separate model is created for each classifier after data preparation and feature extraction. The results obtained are compared to identify the more accurate model. Our experiments on a benchmark dataset with the random forest classification model showed promising results, with an overall accuracy of 94.06%. This experimental evaluation will help the general public stay free of fear and understand the impact of fast-spreading, misleading fake news.


Indonesian Online News Topics Classification using Word2Vec and K-Nearest Neighbor

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v5i6.3547 ◽

2021 ◽

Vol 5 (6) ◽

pp. 1083-1089

Author(s):

Nur Ghaniaviyanto Ramadhan

Keyword(s):

Nearest Neighbor ◽

Online News ◽

Classification Model ◽

Support Vector ◽

The Internet ◽

K Nearest Neighbor ◽

K Value ◽

Random Forest Classification ◽

Forest Classification ◽

Survey Results

News is information disseminated by newspapers, radio, television, the internet, and other media. According to survey results, many news titles on various topics are spread on the internet, which makes it difficult for readers to find the news topic they want to read. This problem can be addressed by grouping, or classification, carried out by a computerized process. This study aims to classify news topics in the Indonesian language using the KNN classification model, with word2vec, a word embedding-based model, used to convert words into vectors to facilitate classification. The study also determines the optimal K value for KNN. The combined word2vec and KNN model achieves an accuracy of 89.2% with K=7, and is also superior to the support vector machine, logistic regression, and random forest classification models.
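The general idea of classifying text with word vectors and k-nearest neighbors can be sketched as follows. This is an illustration under several assumptions that the abstract does not confirm: word vectors are averaged into one document vector, Euclidean distance is used, and the two-dimensional embeddings, labels, and function names are all hypothetical. The toy data is too small for K=7, so k=3 is used here.

```python
import math
from collections import Counter


def doc_vector(tokens, embeddings):
    """Average the vectors of known tokens; None if no token is known."""
    vectors = [embeddings[tok] for tok in tokens if tok in embeddings]
    if not vectors:
        return None
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def knn_predict(query, examples, k):
    """Majority label among the k training vectors nearest to query."""
    nearest = sorted(examples, key=lambda ex: math.dist(query, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-d embeddings standing in for trained word2vec vectors.
embeddings = {
    "goal": [1.0, 0.0], "match": [0.9, 0.1],
    "vote": [0.0, 1.0], "party": [0.1, 0.9],
}
training_titles = [
    (["goal", "match"], "sport"),
    (["goal"], "sport"),
    (["match"], "sport"),
    (["vote", "party"], "politics"),
    (["vote"], "politics"),
]
examples = [(doc_vector(tokens, embeddings), label)
            for tokens, label in training_titles]

query = doc_vector(["match", "goal", "unknown"], embeddings)
predicted = knn_predict(query, examples, k=3)
print(predicted)
```

Choosing K, as the study describes, would amount to repeating the prediction step over held-out titles for several values of k and keeping the one with the highest accuracy.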
