Citizen science, computing, and conservation: How can “Crowd AI” change the way we tackle large-scale ecological challenges?

2021 ◽  
Vol 8 (2) ◽  
pp. 54-75
Author(s):  
Meredith S. Palmer ◽  
Sarah E. Huebner ◽  
Marco Willi ◽  
Lucy Fortson ◽  
Craig Packer

Camera traps (remote cameras that capture images of passing wildlife) have become a ubiquitous tool in ecology and conservation. Systematic camera trap surveys generate ‘Big Data’ across broad spatial and temporal scales, providing valuable information on environmental and anthropogenic factors affecting vulnerable wildlife populations. However, the sheer number of images amassed can quickly outpace researchers’ ability to manually extract data from them (e.g., species identities, counts, and behaviors) within timeframes useful for making scientifically guided conservation and management decisions. Here, we present ‘Snapshot Safari’ as a case study for merging citizen science and machine learning to rapidly generate highly accurate ecological Big Data from camera trap surveys. Snapshot Safari is a collaborative cross-continental research and conservation effort with 1500+ cameras deployed at over 40 protected areas in eastern and southern Africa, generating millions of images per year. As one of the first and largest-scale camera trapping initiatives, Snapshot Safari spearheaded innovative developments in citizen science and machine learning. We highlight the advances made and discuss the issues that arose in using each of these methods to annotate camera trap data. We end by describing how we combined human and machine classification methods (‘Crowd AI’) to create an efficient integrated data pipeline. Ultimately, by using a feedback loop in which humans validate machine learning predictions and machine learning algorithms are iteratively retrained on new human classifications, we can capitalize on the strengths of both methods of classification while mitigating their weaknesses. Using Crowd AI to quickly and accurately ‘unlock’ ecological Big Data for use in science and conservation is revolutionizing the way we take on critical environmental issues in the Anthropocene era.
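
A minimal, self-contained sketch of that feedback loop follows. It is illustrative only: synthetic feature vectors stand in for camera trap images, and the 0.95 confidence threshold, the simulated volunteer accuracy, and the retraining cadence are assumptions rather than Snapshot Safari's published settings. High-confidence machine predictions are accepted automatically, low-confidence images are routed to a simulated crowd, and the model is periodically retrained on the crowd's consensus labels.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def volunteer_consensus(true_label: int, n_volunteers: int = 5) -> int:
    # Simulated volunteers, each correct ~90% of the time; the plurality
    # vote stands in for Zooniverse-style aggregation of classifications.
    votes = [true_label if rng.random() < 0.9 else int(rng.integers(0, 3))
             for _ in range(n_volunteers)]
    return int(np.bincount(votes).argmax())

# Synthetic stand-in for image feature vectors from three species.
centers = rng.normal(size=(3, 8)) * 3.0
y_true = rng.integers(0, 3, size=2000)
X = centers[y_true] + rng.normal(size=(2000, 8))

# Seed the model with a small hand-labelled set, then stream the rest.
train_X, train_y = list(X[:100]), list(y_true[:100])
model = LogisticRegression(max_iter=1000).fit(train_X, train_y)

sent_to_crowd = 0
for i in range(100, len(X)):
    probs = model.predict_proba([X[i]])[0]
    if probs.max() < 0.95:                           # low machine confidence:
        label = volunteer_consensus(int(y_true[i]))  # route to volunteers
        train_X.append(X[i]); train_y.append(label)
        sent_to_crowd += 1
        if sent_to_crowd % 200 == 0:                 # periodically retrain on
            model.fit(train_X, train_y)              # the new crowd labels

print(f"images routed to volunteers: {sent_to_crowd} of {len(X) - 100}")
```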


Author(s):  
Bradford William Hesse

The presence of large-scale data systems can be felt, consciously or not, in almost every facet of modern life, whether through the simple act of selecting travel options online, purchasing products from online retailers, or navigating through the streets of an unfamiliar neighborhood using global positioning system (GPS) mapping. These systems operate through the momentum of big data, a term introduced by data scientists to describe a data-rich environment enabled by a superconvergence of advanced computer-processing speeds and storage capacities; advanced connectivity between people and devices through the Internet; the ubiquity of smart, mobile devices and wireless sensors; and the creation of accelerated data flows among systems in the global economy. Some researchers have suggested that big data represents the so-called fourth paradigm in science, wherein the first paradigm was marked by the evolution of the experimental method, the second was brought about by the maturation of theory, the third was marked by an evolution of statistical methodology as enabled by computational technology, while the fourth extended the benefits of the first three, but also enabled the application of novel machine-learning approaches to an evidence stream that exists in high volume, high velocity, high variety, and differing levels of veracity. In public health and medicine, the emergence of big data capabilities has followed naturally from the expansion of data streams from genome sequencing, protein identification, environmental surveillance, and passive patient sensing. In 2001, the National Committee on Vital and Health Statistics published a road map for connecting these evidence streams to each other through a national health information infrastructure. Since then, the road map has spurred national investments in electronic health records (EHRs) and motivated the integration of public surveillance data into analytic platforms for health situational awareness. More recently, the boom in consumer-oriented mobile applications and wireless medical sensing devices has opened up the possibility for mining new data flows directly from altruistic patients. In the broader public communication sphere, the ability to mine the digital traces of conversation on social media presents an opportunity to apply advanced machine learning algorithms as a way of tracking the diffusion of risk communication messages. In addition to utilizing big data for improving the scientific knowledge base in risk communication, there will be a need for health communication scientists and practitioners to work as part of interdisciplinary teams to improve the interfaces to these data for professionals and the public. Too much data, presented in disorganized ways, can lead to what some have referred to as “data smog.” Much work will be needed for understanding how to turn big data into knowledge, and just as important, how to turn data-informed knowledge into action.


Author(s):  
Tom Hart ◽  
Fiona Jones ◽  
Caitlin Black ◽  
Chris Lintott ◽  
Casey Youngflesh ◽  
...  

Many of the species in decline around the world are subject to different environmental stressors across their range, so replicated large-scale monitoring programmes are necessary to disentangle the relative impacts of these threats. At the same time as funding for long-term monitoring is being cut, studies are increasingly being criticised for lacking statistical power. For those taxa or environments where a single vantage point can observe individuals or ecological processes, time-lapse cameras can provide a cost-effective way of collecting time series data, replicated at large spatial scales, that would otherwise be impossible. However, the networks of time-lapse cameras needed to cover the range of a species or process create a problem: the scale of data collection easily exceeds our ability to process the raw imagery manually. Citizen science and machine learning provide solutions for scaling up data extraction (such as locating all animals in an image). Crucially, citizen science, machine learning-derived classifiers, and the intersection between them are key to understanding how to establish monitoring systems that are sensitive to, and sufficiently powerful to detect, changes in the study system. Citizen science works relatively ‘out of the box’, and we regard it as a first step for many systems until machine learning algorithms are sufficiently trained to automate the process. Using Penguin Watch (www.penguinwatch.org) data as a case study, we discuss a complete workflow from images to parameter estimation and interpretation: the use of citizen science and computer vision for image processing, and parameter estimation and individual recognition for investigating biological questions. We discuss which techniques generalize easily to a range of questions, and where more work is needed to supplement ‘out of the box’ tools. We conclude with a horizon scan of advances in camera technology, such as on-board computer vision and decision making.
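
One concrete step in such a workflow is turning raw volunteer annotations into counts. The sketch below aggregates simulated volunteer clicks on a single image into a penguin count by clustering nearby clicks; the clustering parameters, volunteer behaviour, and synthetic coordinates are assumptions for illustration, not the published Penguin Watch method.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)
true_penguins = rng.uniform(0, 1000, size=(12, 2))   # 12 birds in frame

# Five volunteers each click near most birds, with pixel-level noise;
# roughly 10% of clicks are missed.
clicks = np.vstack([
    p + rng.normal(scale=8.0, size=2)
    for p in true_penguins for _ in range(5) if rng.random() < 0.9
])

# Clicks within ~25 px of one another are treated as the same bird; a
# cluster needs at least 3 volunteer clicks to count as a detection.
labels = DBSCAN(eps=25.0, min_samples=3).fit_predict(clicks)
estimated_count = len(set(labels) - {-1})            # -1 marks noise clicks
print(f"estimated penguins: {estimated_count} (true: {len(true_penguins)})")
```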


2020 ◽  
pp. 1-12
Author(s):  
Lejie Wang

Since the beginning of our country's reform era, and especially with the rapid economic growth of recent years, income levels have become extremely unequal, and it is difficult for the low-income poor to benefit from that growth. The most important prerequisite for the fight against poverty is the accurate identification of the causes of poverty. To date, our country has not reached the level of maturity required to accurately study the causes of poverty in individual households. However, with the rapid development of Internet and big data technology in recent years, applying large-scale data technology and data extraction algorithms to poverty reduction can identify truly poor households faster and more accurately. Compared with traditional machine learning algorithms, these approaches face no machine storage or technical constraints, can use large amounts of data, and can draw on multiple data samples.
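
The abstract stops short of a concrete method, so the following is only a hypothetical sketch of the general idea: training a classifier on household survey features to flag likely-poor households. The feature names, the synthetic data, and the poverty rule used as ground truth are all invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n = 5000
# Hypothetical household survey features.
X = np.column_stack([
    rng.gamma(2.0, 800.0, n),      # reported monthly income
    rng.integers(0, 7, n),         # number of dependents
    rng.gamma(1.5, 0.4, n),        # farmland (hectares)
    rng.integers(0, 16, n),        # years of schooling
])
# Synthetic ground truth: low income plus many dependents marks poverty.
y = ((X[:, 0] < 900) & (X[:, 1] >= 3)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(f"held-out accuracy: {model.score(X_te, y_te):.2f}")
```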


2020 ◽  
Author(s):  
Dianne Scherly Varela de Medeiros ◽  
Helio do Nascimento Cunha Neto ◽  
Martin Andreoni Lopez ◽  
Luiz Claudio Schara Magalhães ◽  
Natalia Castro Fernandes ◽  
...  

In this paper, we focus on knowledge extraction from large-scale wireless networks through stream processing. We present the primary methods for sampling, data collection, and monitoring of wireless networks and we characterize knowledge extraction as a machine learning problem on big data stream processing. We show the main trends in big data stream processing frameworks. Additionally, we explore the data preprocessing, feature engineering, and the machine learning algorithms applied to the scenario of wireless network analytics. We address challenges and present research projects in wireless network monitoring and stream processing. Finally, future perspectives, such as deep learning and reinforcement learning in stream processing, are anticipated.
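
To make the stream-processing framing concrete, the sketch below trains an incremental classifier window by window as records arrive, rather than once on data at rest. The flow features and labels are hypothetical, and the evaluate-then-update (prequential) pattern is one common choice for stream learning, not this paper's specific method.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(3)
model = SGDClassifier(loss="log_loss")
classes = np.array([0, 1])                 # e.g., normal vs anomalous flow

def next_minibatch(size=256):
    # Stand-in for one window of flow records from a stream processor.
    X = rng.normal(size=(size, 6))         # e.g., packet rates, sizes, RSSI
    y = (X[:, 0] + X[:, 3] > 1.0).astype(int)
    return X, y

for window in range(50):                   # consume 50 stream windows
    X, y = next_minibatch()
    if window > 0:                         # score on unseen data first
        print(f"window {window}: accuracy {model.score(X, y):.2f}")
    model.partial_fit(X, y, classes=classes)   # then update incrementally
```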


2021 ◽  
Vol 13 (18) ◽  
pp. 10287
Author(s):  
Matyáš Adam ◽  
Pavel Tomášek ◽  
Jiří Lehejček ◽  
Jakub Trojan ◽  
Tomáš Jůnek

Camera traps are increasingly one of the fundamental pillars of environmental monitoring and management. Even outside the scientific community, thousands of camera traps in the hands of citizens may offer valuable data on terrestrial vertebrate fauna, bycatch data in particular, when guided according to already employed standards. This provides a promising setting for Citizen Science initiatives. Here, we suggest a possible pathway for isolated observations to be aggregated into a single database that respects the existing standards (with a proposed extension). Our approach aims to offer a new perspective and to take stock of recent progress in engaging the enthusiasm of citizen scientists and in incorporating machine learning into image classification in camera trap research. This combination of machine learning and input from citizen scientists may significantly streamline the processing of camera trap data while simultaneously raising public environmental awareness. We have thus developed a conceptual framework and analytical concept for a web-based camera trap database that incorporates these aspects: it combines the roles of expert and citizen evaluations, specifies how the neural network is trained, and adds a taxon complexity index. This initiative could well serve scientists and the general public, as well as assist public authorities in efficiently setting spatially and temporally well-targeted conservation policies.
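
As one possible reading of that combination, the sketch below computes a consensus label for a single image by weighting expert votes more heavily than citizen votes, and by down-weighting citizen votes on taxa with a high complexity index. The weights, index values, and species names are invented for illustration; the framework's actual scheme may differ.

```python
from collections import defaultdict

# Hypothetical classifications for one image: (label, role of classifier).
votes = [("red fox", "citizen"), ("red fox", "citizen"),
         ("golden jackal", "citizen"), ("golden jackal", "expert")]

ROLE_WEIGHT = {"citizen": 1.0, "expert": 3.0}    # experts count for more
# Higher complexity -> taxon is easier to confuse -> citizen votes weigh less.
TAXON_COMPLEXITY = {"red fox": 0.8, "golden jackal": 0.8}

scores = defaultdict(float)
for label, role in votes:
    w = ROLE_WEIGHT[role]
    if role == "citizen":
        w *= 1.0 - 0.5 * TAXON_COMPLEXITY.get(label, 0.0)
    scores[label] += w

consensus = max(scores, key=scores.get)
print(consensus, dict(scores))   # the expert vote tips the balance here
```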


Author(s):  
Supun Kamburugamuve ◽  
Pulasthi Wickramasinghe ◽  
Saliya Ekanayake ◽  
Geoffrey C Fox

With the ever-increasing need to analyze large amounts of data for useful insights, it is essential to develop complex parallel machine learning algorithms that can scale with the data and with the number of parallel processes. These algorithms need to run on large data sets and to execute in minimal time, in order to extract useful information in a time-constrained environment. The message passing interface (MPI) is a widely used model for developing such algorithms in the high-performance computing paradigm, while Apache Spark and Apache Flink are emerging as big data platforms for large-scale parallel machine learning. Even though these big data frameworks are designed differently, they follow the data flow model for execution and user APIs. The data flow model offers fundamentally different capabilities than the MPI execution model, but the same type of parallelism can be used in applications developed in both models. This article presents three distinct machine learning algorithms implemented in MPI, Spark, and Flink, compares their performance, and identifies strengths and weaknesses of each platform.
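
To illustrate the MPI side of this comparison, the sketch below implements data-parallel gradient descent with mpi4py: each rank computes a gradient on its own data shard, and an Allreduce sums the gradients across ranks. The quadratic least-squares loss is a stand-in for the article's three algorithms, which are not specified here.

```python
# Run with, e.g.: mpiexec -n 4 python sketch.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

rng = np.random.default_rng(rank)          # each rank owns its own shard
X = rng.normal(size=(1000, 10))
y = X @ np.arange(10.0) + rng.normal(scale=0.1, size=1000)

w = np.zeros(10)
for step in range(100):
    # Local least-squares gradient on this rank's shard, scaled so the
    # Allreduce sum equals the gradient of the global mean loss.
    local_grad = 2.0 * X.T @ (X @ w - y) / (len(y) * size)
    grad = np.empty_like(local_grad)
    comm.Allreduce(local_grad, grad, op=MPI.SUM)   # combine across ranks
    w -= 0.1 * grad

if rank == 0:
    print("learned weights:", np.round(w, 2))
```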


Big Data ◽  
2016 ◽  
pp. 887-898
Author(s):  
Manjunath Thimmasandra Narayanapppa ◽  
T. P. Puneeth Kumar ◽  
Ravindra S. Hegadi

Recent technological advancements have led to the generation of huge volumes of data from distinct domains (scientific sensors, health care, user-generated data, financial companies, and internet and supply-chain systems) over the past decade. The term big data was coined to capture the meaning of this emerging trend. In addition to its huge volume, big data exhibits several unique characteristics compared with traditional data. For instance, big data is generally unstructured and requires more real-time analysis. This development calls for new system platforms for data acquisition, storage, and transmission, and for large-scale data processing mechanisms. In recent years, the analytics industry's interest has been expanding toward big data analytics to uncover the potential concealed in big data, such as hidden patterns or unknown correlations. The main goal of this chapter is to explore the importance of machine learning algorithms and of the computational environment, including the hardware and software, required to perform analytics on big data.


2021 ◽  
Author(s):  
Karyna Rodriguez ◽  
Neil Hodgson

Seismic data has been, and continues to be, the main tool for hydrocarbon exploration. Storing very large quantities of seismic data, and making it easily accessible with machine learning functionality, is the way forward to gain regional and local understanding of petroleum systems. Seismic data has been made available as a streamed service through a web-based platform, allowing seismic data access on the spot from large datasets stored in the cloud. A data lake can be defined as transformed data used for tasks such as reporting, visualization, advanced analytics, and machine learning. The global library of data has been deconstructed from the rigid flat-file format traditionally associated with seismic and transformed into a distributed, scalable, big data store. This allows for rapid access, complex queries, and efficient use of computing power: fundamental criteria for enabling Big Data technologies such as deep learning.

This data lake concept is already changing the way we access seismic data, enhancing the efficiency of gaining insights into any hydrocarbon basin. Examples include the identification of potentially prolific mixed turbidite/contourite systems in the Trujillo Basin offshore Peru, together with important implications of BSR-derived geothermal gradients, which are much higher than expected in a fore-arc setting, opening new exploration opportunities. Another example is the de-risking and ranking of offshore Malvinas Basin blocks by gaining new insights into areas until very recently considered non-prospective. Further de-risking was achieved by carrying out an in-depth source rock analysis in the Malvinas and conjugate southern South Africa Basins. Additionally, the data lake enabled the development of machine learning algorithms for channel recognition, which were successfully applied to data offshore Australia and Norway.

“On demand” access to regional seismic datasets is proving invaluable in our efforts to make hydrocarbon exploration more efficient and successful. Machine learning algorithms are helping to automate the more mechanical tasks, leaving time for the more valuable task of analysing the results. The geological insights gained by combining these two aspects confirm the value of seismic data lakes.
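
As a rough illustration of what “on demand” access to a cloud-hosted store enables, the sketch below uses an HTTP ranged read to fetch a single trace's samples instead of downloading the whole file. The bucket, key, trace layout, and float32 encoding are hypothetical assumptions about the store, not the platform's actual API.

```python
import boto3
import numpy as np

s3 = boto3.client("s3")
TRACE_BYTES = 1500 * 4                     # assume 1500 float32 samples/trace

def read_trace(bucket: str, key: str, trace_index: int) -> np.ndarray:
    # Fetch only the byte range holding one trace; the object itself
    # may be many gigabytes, but this request moves a few kilobytes.
    start = trace_index * TRACE_BYTES
    resp = s3.get_object(
        Bucket=bucket, Key=key,
        Range=f"bytes={start}-{start + TRACE_BYTES - 1}",
    )
    return np.frombuffer(resp["Body"].read(), dtype=np.float32)

# Hypothetical usage:
# trace = read_trace("example-seismic-lake", "surveys/demo/volume.f32", 42)
```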

