From village to globe: A dynamic real-time map of African fields through PlantVillage

2019 ◽  
Author(s):  
Annalyse Kehs ◽  
Peter McCloskey ◽  
John Chelal ◽  
Derek Morr ◽  
Stellah Amakove ◽  
...  

Abstract
A major bottleneck to the application of machine learning tools to satellite data of African farms is the lack of high-quality ground truth data. Here we describe a high-throughput method, using youth in Kenya, that yields high-quality data cost-effectively and in near real time. This data is presented to the global community, as a public good, on the day it is collected, and is linked to other data sources that will inform our understanding of crop stress, particularly in the context of climate change.


2018 ◽  
Author(s):  
Naihui Zhou ◽  
Zachary D Siegel ◽  
Scott Zarecor ◽  
Nigel Lee ◽  
Darwin A Campbell ◽  
...  

Abstract
The accuracy of machine learning tasks critically depends on high-quality ground truth data. Producing such data typically involves trained professionals, which can be costly in time, effort, and money. Here we explore the use of crowdsourcing to generate a large volume of good-quality training data. We examine an image analysis task involving the segmentation of corn tassels from images taken in a field setting, and investigate the accuracy, speed, and other quality metrics when this task is performed by students for academic credit, Amazon MTurk workers, and Master Amazon MTurk workers. We find that the Amazon MTurk and Master MTurk workers perform significantly better than the for-credit students, with no significant difference between the two MTurk worker types. Furthermore, the quality of the segmentation produced by Amazon MTurk workers rivals that of an expert worker. We provide best practices for assessing the quality of ground truth data and for comparing data quality produced by different sources, along with several metrics for assessing the quality of the generated datasets. We conclude that properly managed crowdsourcing can establish large volumes of viable ground truth data at low cost and high quality, especially in the context of high-throughput plant phenotyping.

Author Summary
Food security is a growing global concern. Farmers, plant breeders, and geneticists are hastening to address the challenges presented to agriculture by climate change, dwindling arable land, and population growth. Scientists in the field of plant phenomics are using satellite and drone images to understand how crops respond to a changing environment and to combine genetics and environmental measures to maximize crop growth efficiency. However, the terabytes of image data require new computational methods to extract useful information. Machine learning algorithms are effective in recognizing select parts of images, but they require high-quality data curated by people to train them, a process that can be laborious and costly. We examined how well crowdsourcing works in providing training data for plant phenomics, specifically segmenting a corn tassel – the male flower of the corn plant – from the often-cluttered images of a cornfield. We provided images to students, and to Amazon MTurkers, the latter being an on-demand workforce brokered by Amazon.com and paid on a task-by-task basis. We report on best practices in crowdsourcing image labeling for phenomics, and compare the different groups on measures such as fatigue and accuracy over time. We find that crowdsourcing is a good way of generating quality labeled data, rivaling that of experts.
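One standard way to score a crowdsourced segmentation against an expert mask is intersection-over-union; this is a generic illustration of such a comparison, not necessarily the metric used in the study above:

```python
import numpy as np

def iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Intersection-over-union of two boolean segmentation masks."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return np.logical_and(a, b).sum() / union

# Toy 4x4 masks: a crowd worker's tassel mask vs. an expert's
expert = np.zeros((4, 4), dtype=bool); expert[1:3, 1:3] = True
worker = np.zeros((4, 4), dtype=bool); worker[1:3, 1:4] = True
print(round(iou(worker, expert), 3))  # 4 overlap / 6 union = 0.667
```

A score of 1.0 means pixel-perfect agreement with the expert; averaging this over many images gives one of the simple per-worker quality measures.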


2019 ◽  
Vol 7 (45) ◽  
pp. 7218-7227 ◽  
Author(s):  
Kapil Kumar ◽  
Sandeep Kaur ◽  
Satwinderjeet Kaur ◽  
Gaurav Bhargava ◽  
Subodh Kumar ◽  
...  

The EA-PDI∩Cu2+ complex can be established as a cost-effective diagnostic kit for point-of-care testing (POCT) of spermine, enabling real-time detection of spermine in both vapor and solution form as released from fermented food samples.


Agronomy ◽  
2021 ◽  
Vol 11 (10) ◽  
pp. 1951
Author(s):  
Brianna B. Posadas ◽  
Mamatha Hanumappa ◽  
Kim Niewolny ◽  
Juan E. Gilbert

Precision agriculture is highly dependent on the collection of high quality ground truth data to validate the algorithms used in prescription maps. However, the process of collecting ground truth data is labor-intensive and costly. One solution to increasing the collection of ground truth data is by recruiting citizen scientists through a crowdsourcing platform. In this study, a crowdsourcing platform application was built using a human-centered design process. The primary goals were to gauge users’ perceptions of the platform, evaluate how well the system satisfies their needs, and observe whether the classification rate of lambsquarters by the users would match that of an expert. Previous work demonstrated a need for ground truth data on lambsquarters in the D.C., Maryland, Virginia (DMV) area. Previous social interviews revealed users who would want a citizen science platform to expand their skills and give them access to educational resources. Using a human-centered design protocol, design iterations of a mobile application were created in Kinvey Studio. The application, Mission LQ, taught people how to classify certain characteristics of lambsquarters in the DMV and allowed them to submit ground truth data. The final design of Mission LQ received a median system usability scale (SUS) score of 80.13, which indicates a good design. The classification rate of lambsquarters was 72%, which is comparable to expert classification. This demonstrates that a crowdsourcing mobile application can be used to collect high quality ground truth data for use in precision agriculture.
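For context, the SUS score reported above follows the standard Brooke scoring rule: ten items rated 1–5, odd-numbered (positively worded) items contribute (rating − 1), even-numbered (negatively worded) items contribute (5 − rating), and the sum is scaled by 2.5 onto a 0–100 range. A minimal sketch (the sample responses below are hypothetical, not data from the study):

```python
def sus_score(responses):
    """System Usability Scale: 10 items rated 1-5.
    Odd-numbered items are positively worded, even-numbered negatively."""
    assert len(responses) == 10
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5

# One hypothetical respondent's ratings
print(sus_score([5, 2, 4, 1, 5, 2, 4, 2, 5, 1]))  # 87.5
```

Scores around 68 are conventionally taken as average usability, which is why the study's median of 80.13 indicates a good design.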


2021 ◽  
Author(s):  
Elke Rustemeier ◽  
Udo Schneider ◽  
Markus Ziese ◽  
Peter Finger ◽  
Andreas Becker

Since its founding in 1989, the Global Precipitation Climatology Centre (GPCC) has been producing global precipitation analyses based on land-surface in-situ measurements. Over more than 30 years the underlying database has been continuously expanded and now offers a high station density and large temporal coverage. Thanks to the semi-automatic quality control routinely performed on incoming station data, the GPCC database is of very high quality. Today, the GPCC holds data from 123,000 stations, about three quarters of them with long time series.

The core of the analyses is formed by data from the global meteorological and hydrological services, which provided their records to the GPCC, as well as by global and regional data collections. In addition, the GPCC receives SYNOP and CLIMAT reports via the WMO-GTS; these supplement the high-quality precipitation analyses and form the basis for the near-real-time evaluations.

Quality control activities include cross-referencing stations from different sources, flagging data errors, and correcting temporally or spatially offset data. These data then form the basis for the subsequent interpolation and product generation.

In near real time, the 'First Guess Monthly', 'First Guess Daily', 'Monitoring Product', 'Provisional Daily Precipitation Analysis' and the 'GPCC Drought Index' are generated. These are based on WMO-GTS data and on monthly data generated by the CPC (NOAA).

With a 2-3 year update cycle, the high-quality data products are generated with intensive quality control, built on the entire GPCC database. These non-real-time products consist of the 'Full Data Monthly', 'Full Data Daily', 'Climatology', and 'HOMPRA-Europe', and are now available in the 2020 version.

All gridded datasets presented in this paper are freely available in netCDF format on the GPCC website https://gpcc.dwd.de and are referenced by a digital object identifier (DOI). The site also provides an overview of all datasets, as well as a detailed description and further references for each dataset.
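When averaging a regular latitude-longitude grid such as these products over a region, rows must be weighted by cell area, since cells shrink toward the poles. A minimal numpy sketch of cosine-latitude weighting (the grid below is synthetic, not GPCC data):

```python
import numpy as np

# Hypothetical 1-degree lat-lon grid of monthly precipitation (mm)
lats = np.arange(-89.5, 90.0, 1.0)              # 180 cell centres
lons = np.arange(0.5, 360.0, 1.0)               # 360 cell centres
precip = np.full((lats.size, lons.size), 80.0)  # uniform field for the demo

# A plain mean over-weights high latitudes; weight each row by
# cos(latitude), proportional to cell area on the sphere.
weights = np.cos(np.deg2rad(lats))
global_mean = np.average(precip, axis=0, weights=weights).mean()
print(round(global_mean, 1))  # 80.0 for a uniform field
```

For a uniform field the weighted and unweighted means agree; for realistic fields with strong tropical precipitation the difference can be substantial.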


Processes ◽  
2020 ◽  
Vol 8 (6) ◽  
pp. 649
Author(s):  
Yifeng Liu ◽  
Wei Zhang ◽  
Wenhao Du

Deep learning based on large volumes of high-quality data plays an important role in many industries. However, deep learning is hard to embed directly in real-time systems: the data such a system accumulates depend on real-time acquisition, while its analysis tasks must be carried out in real time, so the analysis cannot wait for data to accumulate over a long period. To address the problems of high-quality data accumulation, the high timeliness required of data analysis, and the difficulty of embedding deep-learning algorithms directly in real-time systems, this paper proposes a new progressive deep-learning framework and conducts experiments on image recognition. The experimental results show that the proposed framework is effective, performs well, and can reach conclusions similar to a deep-learning framework trained on large-scale data.
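The paper's framework is not detailed here, but the underlying idea of updating a model progressively as samples arrive, instead of training once on a long-accumulated batch, can be illustrated with a tiny online logistic-regression learner (everything below is an illustrative stand-in, not the proposed framework):

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(2)  # weights of a tiny online linear classifier

def sgd_step(w, x, y, lr=0.1):
    """One logistic-regression SGD update on a single new sample."""
    p = 1.0 / (1.0 + np.exp(-w @ x))
    return w + lr * (y - p) * x

# Samples "arrive" one at a time, as in a real-time system: class 0
# clusters near (-1, -1), class 1 near (+1, +1).
for _ in range(500):
    y = rng.integers(0, 2)
    x = rng.normal(loc=(2 * y - 1), scale=0.5, size=2)
    w = sgd_step(w, x, y)

print((w > 0).all())  # the learned boundary separates the clusters
```

The model is usable after every update, so analysis never has to wait for a full training set to accumulate, which is the constraint the abstract describes.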


Author(s):  
Zhongxiang Wang ◽  
Masoud Hamedi ◽  
Stanley Young

Crowdsourced GPS probe data have been gaining popularity in recent years as a source of real-time traffic information, feeding applications such as travel times on changeable-message signs and incident detection, in support of driver operations and transportation systems management and operations. Efforts have been made to evaluate the quality of such data from different perspectives. Although such crowdsourced data are already in widespread use in many states, particularly in the high-traffic areas on the Eastern seaboard, concerns about latency – the time between traffic being perturbed as a result of an incident and the reflection of the disturbance in the outsourced data feed – have escalated in importance. Latency is critical for the accuracy of real-time operations, emergency response, and traveler information systems. This paper offers a methodology for measuring probe data latency relative to a selected reference source. Although Bluetooth reidentification data are used as the reference source, the methodology can be applied to any other ground truth data source of choice. The core of the methodology is a maximum pattern matching algorithm that works with three fitness objectives. To test the methodology, sample field reference data were collected on multiple freeway segments over a two-week period using portable Bluetooth sensors as ground truth. Equivalent GPS probe data were obtained from a private vendor, and their latency was evaluated. Latency at different times of day, the impact of the road segmentation scheme on latency, and the sensitivity of the latency to both speed-slowdown and recovery-from-slowdown episodes are also discussed.
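As a simplified illustration of the idea (not the paper's three-objective pattern matching), the best-fit lag between a reference speed series and a probe feed can be found by scanning candidate shifts for the minimum mean-squared error:

```python
import numpy as np

def estimate_latency(reference, probe, max_lag=20):
    """Lag (in samples) that best aligns the probe series with the
    reference series -- a simplified stand-in for the paper's
    multi-objective maximum pattern matching."""
    best_lag, best_err = 0, np.inf
    for lag in range(max_lag + 1):
        n = len(reference) - lag
        err = np.mean((reference[:n] - probe[lag:lag + n]) ** 2)
        if err < best_err:
            best_lag, best_err = lag, err
    return best_lag

# Synthetic ground truth: a slowdown episode (speed dip); the probe
# feed reports the same pattern 4 samples later.
t = np.arange(200)
reference = 60 - 25 * np.exp(-((t - 100) ** 2) / 200.0)
probe = np.roll(reference, 4)
print(estimate_latency(reference, probe))  # 4
```

In practice, as the abstract notes, slowdown and recovery episodes can exhibit different latencies, so the matching would be applied per episode rather than over the whole series.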


2020 ◽  
Author(s):  
Zheng Zhang ◽  
Timothy G. Constandinou

Abstract
This paper describes preliminary work towards an automated algorithm for labelling Neuropixel data that exploits the fact that adjacent recording sites are spatially oversampled. This is achieved by combining classical single-channel spike sorting with spatial spike grouping, resulting in an improvement in both accuracy and robustness. It is complemented by an automated channel-selection method that determines which channels contain high-quality data. The algorithm has been applied to a freely accessible dataset produced by Cortex Lab, UCL, and evaluated to have an accuracy of over 77% compared to a manually curated ground truth.
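A common style of automated channel selection (shown here as a generic illustration, not the paper's exact criterion) keeps channels whose peak amplitude stands well above a robust per-channel noise estimate:

```python
import numpy as np

def select_channels(recording, snr_threshold=5.0):
    """Keep channels whose peak amplitude exceeds snr_threshold times
    the noise floor, estimated per channel via the median absolute
    deviation (MAD / 0.6745 approximates sigma for Gaussian noise)."""
    noise = np.median(np.abs(recording), axis=1) / 0.6745
    peaks = np.max(np.abs(recording), axis=1)
    return np.where(peaks / noise > snr_threshold)[0]

rng = np.random.default_rng(1)
data = rng.normal(0, 1, size=(4, 1000))  # 4 channels of pure noise
data[2, 500] = 40.0                      # channel 2 carries a spike
print(select_channels(data))             # [2]
```

Restricting sorting to such channels avoids wasting compute on sites that record no resolvable units.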

