“Now, I Want to Teach It for Real!”: Introducing Machine Learning as a Scientific Discovery Tool for K-12 Teachers

Author(s):  
Xiaofei Zhou ◽  
Jingwan Tang ◽  
Michael Daley ◽  
Saad Ahmad ◽  
Zhen Bai
Author(s):  
Francis J Alexander ◽  
James Ang ◽  
Jenna A Bilbrey ◽  
Jan Balewski ◽  
Tiernan Casey ◽  
...  

Rapid growth in data, computational methods, and computing power is driving a remarkable revolution in what variously is termed machine learning (ML), statistical learning, computational learning, and artificial intelligence. In addition to highly visible successes in machine-based natural language translation, playing the game Go, and self-driving cars, these new technologies also have profound implications for computational and experimental science and engineering, as well as for the exascale computing systems that the Department of Energy (DOE) is developing to support those disciplines. Not only do these learning technologies open up exciting opportunities for scientific discovery on exascale systems, they also appear poised to have important implications for the design and use of exascale computers themselves, including high-performance computing (HPC) for ML and ML for HPC. The overarching goal of the ExaLearn co-design project is to provide exascale ML software for use by Exascale Computing Project (ECP) applications, other ECP co-design centers, and DOE experimental facilities and leadership class computing facilities.


BioScience ◽  
2020 ◽  
Vol 70 (7) ◽  
pp. 610-620 ◽  
Author(s):  
Katelin D Pearson ◽  
Gil Nelson ◽  
Myla F J Aronson ◽  
Pierre Bonnet ◽  
Laura Brenskelle ◽  
...  

Abstract Machine learning (ML) has great potential to drive scientific discovery by harvesting data from images of herbarium specimens—preserved plant material curated in natural history collections—but ML techniques have only recently been applied to this rich resource. ML has particularly strong prospects for the study of plant phenological events such as growth and reproduction. As a major indicator of climate change, driver of ecological processes, and critical determinant of plant fitness, plant phenology is an important frontier for the application of ML techniques for science and society. In the present article, we describe a generalized, modular ML workflow for extracting phenological data from images of herbarium specimens, and we discuss the advantages, limitations, and potential future improvements of this workflow. Strategic research and investment in specimen-based ML methods, along with the aggregation of herbarium specimen data, may give rise to a better understanding of life on Earth.


2018 ◽  
Vol 11 (4) ◽  
pp. 582-585
Author(s):  
Meghan Lowery ◽  
Joel Nadler ◽  
Dan J. Putka

The focal article (Lapierre et al., 2018) highlights many good suggestions but only briefly mentions partnering with an academically trained internal industrial and organizational (I-O) practitioner. We believe that beginning a partnership with a similarly trained ally, well versed in academic language through training and in “business speak” through experience, will yield a stronger end result. The value of an internal I-O practitioner should not be overlooked; when an academic partners with the right practitioner in the right environment, the partnership can be mutually beneficial and more rewarding than other options. For instance, we recently collaborated to set up a partnership for scientific discovery and mutual interest that involved 12 teams representing 14 different institutions spanning academe and practice to conduct a machine learning competition. This partnership gave many academics and practitioners access to a complex organizational dataset so that they could contribute to both an organization and the Society for Industrial and Organizational Psychology (SIOP) community (see Putka et al., 2018).


Author(s):  
Francesco Gagliardi

The author introduces a machine learning system for cluster analysis to take on the problem of syndrome discovery in the clinical domain. A syndrome is a set of typical clinical features (a prototype) that appear together often enough to suggest they may represent a single, unknown disease. The discovery of syndromes and the related taxonomy formation is therefore the critical early phase of the process of scientific discovery in the medical domain. The proposed system discovers syndromes following Eleanor Rosch’s prototype theory of how the human mind categorizes and forms taxonomies, with the aim of understanding how humans perform these activities and of automating or assisting the process of scientific discovery. The implemented system can be considered a scientific discovery support system, as it can discover unknown syndromes to the advantage of subsequent clinical practice and research activities.


2014 ◽  
Vol 81 (1) ◽  
pp. 17-30 ◽  
Author(s):  
Ryan A. LaCroix ◽  
Troy E. Sandberg ◽  
Edward J. O'Brien ◽  
Jose Utrilla ◽  
Ali Ebrahim ◽  
...  

ABSTRACT Adaptive laboratory evolution (ALE) has emerged as an effective tool for scientific discovery and addressing biotechnological needs. Much of ALE's utility is derived from reproducibly obtained fitness increases. Identifying causal genetic changes and their combinatorial effects is challenging and time-consuming. Understanding how these genetic changes enable increased fitness can be difficult. A series of approaches that address these challenges was developed and demonstrated using Escherichia coli K-12 MG1655 on glucose minimal media at 37°C. By keeping E. coli in constant substrate excess and exponential growth, fitness increases up to 1.6-fold were obtained compared to the wild type. These increases are comparable to previously reported maximum growth rates in similar conditions but were obtained over a shorter time frame. Across the eight replicate ALE experiments performed, causal mutations were identified using three approaches: identifying mutations in the same gene/region across replicate experiments, sequencing strains before and after computationally determined fitness jumps, and allelic replacement coupled with targeted ALE of reconstructed strains. Three genetic regions were most often mutated: the global transcription gene rpoB, an 82-bp deletion between the metabolic pyrE gene and rph, and an IS element between the DNA structural gene hns and tdk. Model-derived classification of gene expression revealed a number of processes important for increased growth that were missed using a gene classification system alone. The methods described here represent a powerful combination of technologies to increase the speed and efficiency of ALE studies. The identified mutations can be examined as genetic parts for increasing growth rate in a desired strain and for understanding rapid growth phenotypes.


Author(s):  
Bryan Wilder ◽  
Eric Horvitz ◽  
Ece Kamar

A rising vision for AI in the open world centers on the development of systems that can complement humans for perceptual, diagnostic, and reasoning tasks. To date, systems aimed at complementing the skills of people have employed models trained to be as accurate as possible in isolation. We demonstrate how an end-to-end learning strategy can be harnessed to optimize the combined performance of human-machine teams by considering the distinct abilities of people and machines. The goal is to focus machine learning on problem instances that are difficult for humans, while recognizing instances that are difficult for the machine and seeking human input on them. We demonstrate in two real-world domains (scientific discovery and medical diagnosis) that human-machine teams built via these methods outperform the individual performance of machines and people. We then analyze conditions under which this complementarity is strongest, and which training methods amplify it. Taken together, our work provides the first systematic investigation of how machine learning systems can be trained to complement human reasoning.


Author(s):  
Ben Scott ◽  
Laurence Livermore

The Natural History Museum holds over 80 million specimens and 300 million pages of scientific text. This information is a vital research tool to help solve the most important challenge humans face over the coming years – mapping a sustainable future for ourselves and the ecosystems on which we depend. Digitising these collections and providing the data in a structured, computable form is a mammoth challenge. As of 2020, less than 15% of available specimen information currently residing on specimen labels or physical registers is digitised and publicly available (Walton et al. 2020). Machine learning applications can deliver a step-change in our activities’ scope, scale, and speed (Borsch et al. 2020). As part of SYNTHESYS+, the Natural History Museum is leading on the development of a cloud-based workflow platform for natural science specimens, the Specimen Data Refinery (SDR) (Smith et al. 2019). The SDR will provide a series of Machine Learning (ML) models, ranging from semantic segmentation to identify regions of interest on labels, to natural language processing to extract locality and taxonomic text entities from the labels, and image analysis to identify specimen traits and collection quality metrics. Each ML task is atomic, with users of the SDR selecting which model would best extract data from their digitised specimen images, allowing the workflows to be used in different institutions worldwide. It also solves one of the key problems in developing ML-based applications: the rapidity at which models become obsolete. New ML models can be introduced into the workflow, with incremental changes to improve processing, without interruption or refactoring of the pipeline. Alongside specimens, digitised images of pages of scientific literature provide another vital source of data. Functional traits mediate the interactions between plant species and their environment and play roles in determining species’ range size and threatened status. 
Such information is contained within the taxonomic descriptions of species, and a natural language processing library has been developed to locate and extract plant functional traits from these texts (Hoehndorf et al. 2016). The ML models allow complex interrelationships between taxa and trait entities to be inferred based on the grammatical structure of sentences, improving the accuracy and extent of data point extraction. These two projects, like many other applications of ML in natural history collections, are focused on the extraction of visible information, for example, a piece of text or a measurable trait. Given the image of the specimen or page, a person would be able to extract the self-same information. However, ML excels in pattern matching and inferring unknown characters from an entire corpus. At the museum, we have started exploring this space with our voyagerAI project for identifying specimens collected on historical expeditions of scientific discovery (e.g., the voyages of the Beagle and Challenger). This process fills in the gaps in specimen provenance and identifies 'lost' specimens collected by some of the most famous names in biodiversity history. Developing new applications of ML to uncover scientific meaning and tell the narratives of our collections will be at the forefront of our scientific innovation in the coming years. This presentation will give an overview of these projects and our future plans for using ML to extract data at scale within the Natural History Museum.


2018 ◽  
Vol 22 (11) ◽  
pp. 5639-5656 ◽  
Author(s):  
Chaopeng Shen ◽  
Eric Laloy ◽  
Amin Elshorbagy ◽  
Adrian Albert ◽  
Jerad Bales ◽  
...  

Abstract. Recently, deep learning (DL) has emerged as a revolutionary and versatile tool transforming industry applications and generating new and improved capabilities for scientific discovery and model building. The adoption of DL in hydrology has so far been gradual, but the field is now ripe for breakthroughs. This paper suggests that DL-based methods can open up a complementary avenue toward knowledge discovery in hydrologic sciences. In the new avenue, machine-learning algorithms present competing hypotheses that are consistent with data. Interrogative methods are then invoked to interpret DL models for scientists to further evaluate. However, hydrology presents many challenges for DL methods, such as data limitations, heterogeneity and co-evolution, and the general inexperience of the hydrologic field with DL. The roadmap toward DL-powered scientific advances will require the coordinated effort from a large community involving scientists and citizens. Integrating process-based models with DL models will help alleviate data limitations. The sharing of data and baseline models will improve the efficiency of the community as a whole. Open competitions could serve as the organizing events to greatly propel growth and nurture data science education in hydrology, which demands a grassroots collaboration. The area of hydrologic DL presents numerous research opportunities that could, in turn, stimulate advances in machine learning as well.

