Harnessing Large-Scale Herbarium Image Datasets Through Representation Learning

2022 ◽  
Vol 12 ◽  
Author(s):  
Barnaby E. Walker ◽  
Allan Tucker ◽  
Nicky Nicolson

The mobilization of large-scale datasets of specimen images and metadata through herbarium digitization provides a rich environment for the application and development of machine learning techniques. However, limited access to computational resources and uneven progress in digitization, especially for small herbaria, still present barriers to the wide adoption of these new technologies. Using deep learning to extract representations of herbarium specimens useful for a wide variety of applications, so-called “representation learning,” could help remove these barriers. Despite its recent popularity for camera trap and natural world images, representation learning is not yet as popular for herbarium specimen images. We investigated the potential of representation learning with specimen images by building three neural networks using a publicly available dataset of over 2 million specimen images spanning multiple continents and institutions. We compared the extracted representations and tested their performance in application tasks relevant to research carried out with herbarium specimens. We found that a triplet network, a type of neural network that learns distances between images, produced the representations that transferred best across all of the applications investigated. Our results demonstrate that it is possible to learn representations of specimen images useful in different applications, and we identify some further steps that we believe are necessary for representation learning to harness the rich information held in the world's herbaria.
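A triplet network of the kind described above can be sketched in a few lines. The following is a minimal illustration using PyTorch's built-in triplet margin loss; the backbone choice, embedding size, and the random tensors standing in for specimen images are assumptions for demonstration, not details taken from the paper.

```python
# Minimal triplet-network sketch (illustrative; architecture and
# hyperparameters are assumptions, not those used in the paper).
import torch
import torch.nn as nn
import torchvision.models as models

class EmbeddingNet(nn.Module):
    """Maps a specimen image to a fixed-length representation."""
    def __init__(self, dim=128):
        super().__init__()
        backbone = models.resnet18(weights=None)  # hypothetical backbone choice
        backbone.fc = nn.Linear(backbone.fc.in_features, dim)
        self.net = backbone

    def forward(self, x):
        # L2-normalize so distances are comparable across batches
        return nn.functional.normalize(self.net(x), dim=1)

model = EmbeddingNet()
loss_fn = nn.TripletMarginLoss(margin=1.0)

# anchor/positive share a label (e.g., same species); negative does not
anchor = torch.randn(8, 3, 224, 224)
positive = torch.randn(8, 3, 224, 224)
negative = torch.randn(8, 3, 224, 224)
loss = loss_fn(model(anchor), model(positive), model(negative))
loss.backward()
```

Once trained this way, the embedding can be reused for downstream tasks such as clustering, retrieval, or classification without retraining the backbone, which is what makes the representations transferable.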

2021 ◽  
pp. 1-55
Author(s):  
Emma A. H. Michie ◽  
Behzad Alaei ◽  
Alvar Braathen

Generating an accurate model of the subsurface is crucial for assessing the feasibility of a CO2 storage site. In particular, how faults are interpreted is likely to influence the predicted capacity and integrity of the reservoir, whether by identifying high-risk areas along the fault where fluid is likely to flow across it, or by assessing the reactivation potential of the fault under increased pressure, which could cause fluid to flow up the fault. New technologies such as deep learning allow users to interpret faults effortlessly and much more quickly; these techniques use neural networks to compute areas where faults are likely to occur. Although such tools may be attractive because they reduce interpretation time, it is important to understand the inherent uncertainties in their ability to predict accurate fault geometries. Here, we compare deep learning fault interpretation with manual fault interpretation and find distinct differences for faults where significant ambiguity exists due to poor seismic resolution: the interpreted surfaces are more irregular when deep learning methods are used than with conventional manual interpretation. This can result in significant differences between the resulting analyses, such as fault reactivation potential. Conversely, we observe that for well-imaged faults the resulting fault surfaces are closely similar whether deep learning or manual interpretation is employed, and hence any derived attributes and fault analyses are also closely similar.
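One way to make the irregularity contrast above concrete is to measure how far interpreted fault-surface picks deviate from a best-fit plane. The metric and synthetic point sets below are illustrative assumptions, not the attributes computed in the study.

```python
# Illustrative irregularity metric for a fault surface: RMS deviation of
# surface picks from their best-fit plane (an assumption for demonstration).
import numpy as np

def plane_rms_deviation(points):
    """points: (N, 3) array of x, y, z fault-surface picks."""
    centered = points - points.mean(axis=0)
    # The plane normal is the right singular vector with the smallest singular value
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    normal = vt[-1]
    distances = centered @ normal  # signed distance of each pick to the plane
    return np.sqrt(np.mean(distances ** 2))

rng = np.random.default_rng(0)
smooth_fault = rng.uniform(0, 1, (500, 3)) * [1000.0, 1000.0, 0.0]  # near-planar picks
rough_fault = smooth_fault + rng.normal(0.0, [0.0, 0.0, 15.0], (500, 3))  # noisy depths
print(plane_rms_deviation(smooth_fault), plane_rms_deviation(rough_fault))
```

A higher RMS deviation for the automatically interpreted surface than for the manual one would flag the kind of irregularity the comparison describes.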


Author(s):  
Prachi

This chapter describes how botnets, now among the leading cyber threats on the web, serve as the key platform for carrying out large-scale distributed attacks. Although a substantial amount of research has been devoted to botnet detection and analysis, bot-masters adopt new techniques, such as code encryption and obfuscation, to make botnets more sophisticated, destructive, and hard to detect. This chapter proposes a new model to detect botnet behavior on the basis of traffic analysis and machine learning techniques. Because traffic analysis does not depend on payload inspection, the proposed technique is immune to code encryption and other evasion techniques generally used by bot-masters. This chapter analyzes benchmark datasets as well as real-time generated traffic to determine the feasibility of botnet detection using traffic flow analysis. Experimental results clearly indicate that the proposed model is able to classify network traffic as botnet or normal traffic with high accuracy and low false-positive rates.
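A flow-based detector of this kind can be prototyped with standard tooling. The sketch below trains a classifier on per-flow features only, with no payload inspection, which is what makes the approach robust to encryption; the feature set, labels, and synthetic data are placeholders, not the chapter's actual setup.

```python
# Minimal flow-based botnet classifier (illustrative; features and data
# are placeholders, not the chapter's actual dataset or feature set).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Hypothetical per-flow features: duration, packets, bytes, mean inter-arrival
# time, distinct destination ports. Payload content is deliberately unused.
rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # placeholder: 1 = botnet, 0 = normal

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test), target_names=["normal", "botnet"]))
```

Because the features describe flow behavior rather than packet contents, an encrypted or obfuscated payload leaves them unchanged, which is the design point the chapter relies on.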


2020 ◽  
pp. 146144482093944
Author(s):  
Aimei Yang ◽  
Adam J Saffer

Social media can offer strategic communicators cost-effective opportunities to reach millions of individuals. However, in practice it can be difficult to be heard in these crowded digital spaces. This study takes a strategic network perspective and draws from recent research in network science to propose the network contingency model of public attention. This model argues that in the networked, socially mediated environment, an organization's ability to attract public attention on social media is contingent on its ability to fit its network position to the network structure of the communication context. To test the model, we combine data mining, social network analysis, and machine learning techniques to analyze a large-scale Twitter discussion network. The results of our analysis of the Twitter discussion around the refugee crisis in 2016 suggest that in high core-periphery network contexts, “star” positions were most influential, whereas in low core-periphery network contexts, a “community” strategy was crucial to attracting public attention.
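The contrast between “star” positions and “community” strategies can be made concrete with standard network metrics. The sketch below uses networkx on a toy graph; the graph generator and the metric choices (degree centrality for star-ness, greedy modularity communities for cluster embeddedness) are illustrative assumptions, not the study's method.

```python
# Illustrative contrast between "star" and "community" positions in a
# discussion network (toy graph; metric choices are assumptions).
import networkx as nx

G = nx.barabasi_albert_graph(200, 2, seed=1)  # hub-heavy, core-periphery-like

# "Star" positions: nodes that reach many others directly.
stars = sorted(nx.degree_centrality(G).items(), key=lambda kv: -kv[1])[:5]
print("top star positions:", stars)

# "Community" strategy: embeddedness within a densely connected cluster.
communities = nx.algorithms.community.greedy_modularity_communities(G)
print("number of communities:", len(communities))
print("largest community size:", len(max(communities, key=len)))
```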


2019 ◽  
Vol 20 (3) ◽  
pp. 185-193 ◽  
Author(s):  
Natalie Stephenson ◽  
Emily Shane ◽  
Jessica Chase ◽  
Jason Rowland ◽  
David Ries ◽  
...  

Background: Drug discovery, the process of discovering new candidate medications, is very important for pharmaceutical industries. At its current stage, discovering new drugs is still a very expensive and time-consuming process, requiring Phase I, II, and III clinical trials. Recently, machine learning techniques in Artificial Intelligence (AI), especially deep learning techniques that allow a computational model to learn from data through multiple layers of representation, have been widely applied and have achieved state-of-the-art performance in fields such as speech recognition, image classification, and bioinformatics. One very important application of these AI techniques is in the field of drug discovery.
Methods: We did a large-scale literature search on existing scientific websites (e.g., ScienceDirect, arXiv) and startup companies to understand the current status of machine learning techniques in drug discovery.
Results: Our survey found different keyword patterns between the machine learning and drug discovery fields. For example, keywords like prediction, brain, discovery, and treatment are common in the drug discovery field. Also, the total number of papers published in drug discovery using machine learning techniques is increasing every year.
Conclusion: The main focus of this survey is to understand the current status of machine learning techniques in the drug discovery field within both academic and industrial settings, and to discuss their potential future applications. Several interesting patterns of machine learning use in drug discovery are discussed in this survey.
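The keyword-pattern analysis the survey describes amounts to counting term frequencies across collected paper titles or abstracts and tracking them by year. The corpus and keyword list below are placeholders, shown only to make the procedure concrete.

```python
# Minimal keyword-frequency sketch over a paper corpus (the corpus and
# keyword list are placeholders, not the survey's actual data).
from collections import Counter

papers = [
    (2017, "deep learning for compound activity prediction"),
    (2018, "machine learning treatment outcome discovery"),
    (2018, "brain imaging biomarkers and drug discovery"),
]
keywords = {"prediction", "brain", "discovery", "treatment"}

counts_by_year = {}
for year, title in papers:
    hits = Counter(w for w in title.split() if w in keywords)
    counts_by_year.setdefault(year, Counter()).update(hits)

for year in sorted(counts_by_year):
    print(year, dict(counts_by_year[year]))
```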


2017 ◽  
Vol 3 (1) ◽  
Author(s):  
Giorgos Borboudakis ◽  
Taxiarchis Stergiannakos ◽  
Maria Frysali ◽  
Emmanuel Klontzas ◽  
Ioannis Tsamardinos ◽  
...  

2021 ◽  
Author(s):  
Kalum J. Ost ◽  
David W. Anderson ◽  
David W. Cadotte

With the widespread adoption of electronic health records and new technologies capable of producing data at an unprecedented scale, a shift must occur in how we practice medicine in order to utilize these resources. We are entering an era in which the capacity of even the most capable human doctor is simply insufficient. As such, realizing “personalized” or “precision” medicine requires new methods that can leverage the massive amounts of data now available. Machine learning techniques provide one important toolkit in this venture, as they are fundamentally designed to deal with (and, in fact, benefit from) massive datasets. The clinical applications of such machine learning systems are still in their infancy, however, and the field of medicine presents a unique set of design considerations. In this chapter, we walk through how we selected and adjusted the “Progressive Learning framework” to account for these considerations in the case of Degenerative Cervical Myelopathy. We additionally compare a model designed with these techniques to similar static models run in “perfect world” scenarios (free of the clinical issues addressed), and we use simulated clinical data acquisition scenarios to demonstrate the advantages of our machine learning approach in providing personalized diagnoses.
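The incremental training that a progressive framework relies on can be illustrated with scikit-learn's partial_fit API. This is a generic sketch of a model updated as clinical data arrives in batches, under placeholder features and labels; it is not an implementation of the chapter's Progressive Learning framework.

```python
# Generic incremental-learning sketch: a model updated as clinical data
# arrives in batches (illustrative; not the chapter's actual framework).
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(7)
model = SGDClassifier(loss="log_loss")  # logistic regression, updatable online
classes = np.array([0, 1])  # e.g., 0 = mild, 1 = severe (placeholder labels)

for batch in range(5):  # each batch mimics a new round of data acquisition
    X = rng.normal(size=(50, 10))             # placeholder clinical features
    y = (X[:, 0] + X[:, 1] > 0).astype(int)   # placeholder outcomes
    model.partial_fit(X, y, classes=classes)  # update without full retraining

print("coefficient snapshot:", model.coef_[0, :3])
```

The design advantage over a static model is that each new round of data refines the same model in place, rather than requiring the full historical dataset to be reassembled and retrained, which mirrors the simulated acquisition scenarios described above.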


10.6036/10007 ◽  
2021 ◽  
Vol 96 (5) ◽  
pp. 528-533
Author(s):  
XAVIER LARRIVA NOVO ◽  
MARIO VEGA BARBAS ◽  
VICTOR VILLAGRA ◽  
JULIO BERROCAL

Cybersecurity has stood out in recent years with the aim of protecting information systems. Different methods, techniques, and tools have been used to exploit the existing vulnerabilities in these systems. It is therefore essential to develop and improve new technologies, as well as intrusion detection systems that allow possible threats to be detected. However, the use of these technologies requires highly qualified cybersecurity personnel to analyze the results and reduce the large number of false positives that these technologies present. This generates the need to research and develop new high-performance cybersecurity systems that allow efficient analysis and resolution of these results. This research presents the application of machine learning techniques to classify real traffic in order to identify possible attacks. The study has been carried out using machine learning tools, applying deep learning algorithms such as the multilayer perceptron and long short-term memory (LSTM). Additionally, this document presents a comparison between the results obtained by applying the aforementioned algorithms and non-deep-learning algorithms such as random forest and decision tree. Finally, the results obtained are presented, showing that the LSTM algorithm provides the best results in terms of precision and logarithmic loss.
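A comparison like the one described, deep models against tree-based baselines scored on precision and logarithmic loss, can be set up as below. The data is synthetic and an MLP stands in for the deep models; the LSTM is omitted to keep the sketch dependency-light, so this is an illustration of the evaluation, not the paper's experiment.

```python
# Illustrative comparison of a deep model (MLP) against tree-based baselines
# on precision and log loss (synthetic data; LSTM omitted for brevity).
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, log_loss

rng = np.random.default_rng(0)
X = rng.normal(size=(3000, 20))              # placeholder traffic features
y = (X[:, :3].sum(axis=1) > 0).astype(int)   # placeholder attack labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "mlp": MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}
for name, m in models.items():
    m.fit(X_tr, y_tr)
    print(name,
          "precision:", round(precision_score(y_te, m.predict(X_te)), 3),
          "log_loss:", round(log_loss(y_te, m.predict_proba(X_te)), 3))
```

Log loss penalizes confident wrong probability estimates, which is why it complements precision when comparing classifiers whose raw accuracies are close.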


2017 ◽  
Author(s):  
Sook-Lei Liew ◽  
Julia M. Anglin ◽  
Nick W. Banks ◽  
Matt Sondag ◽  
Kaori L. Ito ◽  
...  

Stroke is the leading cause of adult disability worldwide, with up to two-thirds of individuals experiencing long-term disabilities. Large-scale neuroimaging studies have shown promise in identifying robust biomarkers (e.g., measures of brain structure) of long-term stroke recovery following rehabilitation. However, analyzing large rehabilitation-related datasets is problematic due to barriers in accurate stroke lesion segmentation. Manually-traced lesions are currently the gold standard for lesion segmentation on T1-weighted MRIs, but are labor intensive and require anatomical expertise. While algorithms have been developed to automate this process, the results often lack accuracy. Newer algorithms that employ machine-learning techniques are promising, yet these require large training datasets to optimize performance. Here we present ATLAS (Anatomical Tracings of Lesions After Stroke), an open-source dataset of 304 T1-weighted MRIs with manually segmented lesions and metadata. This large, diverse dataset can be used to train and test lesion segmentation algorithms and provides a standardized dataset for comparing the performance of different segmentation methods. We hope ATLAS release 1.1 will be a useful resource to assess and improve the accuracy of current lesion segmentation methods.
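Datasets like ATLAS are typically used to score automated segmentations against the manual traces, most often with the Dice coefficient. The following is a minimal sketch of that evaluation; the placeholder masks and the suggested nibabel loading line are assumptions for illustration, not part of the dataset release.

```python
# Minimal Dice-coefficient evaluation of a predicted lesion mask against a
# manual trace (masks and file names here are illustrative assumptions).
import numpy as np

def dice(mask_a, mask_b):
    """Dice similarity between two binary masks: 2|A & B| / (|A| + |B|)."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(a, b).sum() / denom

# Masks would normally be loaded from NIfTI volumes, e.g. with nibabel:
#   manual = nib.load("sub-001_lesion_mask.nii.gz").get_fdata() > 0
rng = np.random.default_rng(3)
manual = rng.random((64, 64, 32)) > 0.97      # sparse placeholder lesion
predicted = manual.copy()
predicted[:8] = False                         # simulate an imperfect prediction
print("Dice:", round(dice(manual, predicted), 3))
```

A Dice score of 1 indicates perfect overlap with the manual trace and 0 indicates none, which makes it a natural standardized metric for comparing segmentation methods on this dataset.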

