A Dataset for Comparing Mirrored and Non-Mirrored Male Bust Images for Facial Recognition

Facial recognition, as well as other types of human recognition, have found uses in identification, security, and learning about behavior, among other uses. Because of the high cost of data collection for training purposes, logistical challenges and other impediments, mirroring images has frequently been used to increase the size of data sets. However, while these larger data sets have shown to be beneficial, their comparative level of benefit to the data collection of similar data has not been assessed. This paper presented a data set collected and prepared for this and related research purposes. The data set included both non-occluded and occluded data for mirroring assessment.

Download Full-text

Getting Started Creating Data Dictionaries: How to Create a Shareable Data Set

Advances in Methods and Practices in Psychological Science ◽

10.1177/2515245920928007 ◽

2021 ◽

Vol 4 (1) ◽

pp. 251524592092800

Author(s):

Erin M. Buchanan ◽

Sarah E. Crain ◽

Ari L. Cunningham ◽

Hannah R. Johnson ◽

Hannah Stash ◽

...

Keyword(s):

Data Collection ◽

Data Sharing ◽

Search Engine ◽

Web Applications ◽

Data Sets ◽

Data Dictionary ◽

Data Set ◽

Entire Process ◽

Shared Data ◽

Source Data

As researchers embrace open and transparent data sharing, they will need to provide information about their data that effectively helps others understand their data sets’ contents. Without proper documentation, data stored in online repositories such as OSF will often be rendered unfindable and unreadable by other researchers and indexing search engines. Data dictionaries and codebooks provide a wealth of information about variables, data collection, and other important facets of a data set. This information, called metadata, provides key insights into how the data might be further used in research and facilitates search-engine indexing to reach a broader audience of interested parties. This Tutorial first explains terminology and standards relevant to data dictionaries and codebooks. Accompanying information on OSF presents a guided workflow of the entire process from source data (e.g., survey answers on Qualtrics) to an openly shared data set accompanied by a data dictionary or codebook that follows an agreed-upon standard. Finally, we discuss freely available Web applications to assist this process of ensuring that psychology data are findable, accessible, interoperable, and reusable.

Download Full-text

From pole to pole: 33 years of physical oceanography onboard R/V <i>Polarstern</i>

Earth System Science Data ◽

10.5194/essd-9-211-2017 ◽

2017 ◽

Vol 9 (1) ◽

pp. 211-220 ◽

Cited By ~ 7

Author(s):

Amelie Driemel ◽

Eberhard Fahrbach ◽

Gerd Rohardt ◽

Agnieszka Beszczynska-Möller ◽

Antje Boetius ◽

...

Keyword(s):

Data Collection ◽

Water Cycle ◽

Heat Budget ◽

Ocean Dynamics ◽

The Arctic ◽

Sensor Calibration ◽

Data Sets ◽

Calibration Data ◽

Data Set ◽

Different Characteristics

Abstract. Measuring temperature and salinity profiles in the world's oceans is crucial to understanding ocean dynamics and its influence on the heat budget, the water cycle, the marine environment and on our climate. Since 1983 the German research vessel and icebreaker Polarstern has been the platform of numerous CTD (conductivity, temperature, depth instrument) deployments in the Arctic and the Antarctic. We report on a unique data collection spanning 33 years of polar CTD data. In total 131 data sets (1 data set per cruise leg) containing data from 10 063 CTD casts are now freely available at doi:10.1594/PANGAEA.860066. During this long period five CTD types with different characteristics and accuracies have been used. Therefore the instruments and processing procedures (sensor calibration, data validation, etc.) are described in detail. This compilation is special not only with regard to the quantity but also the quality of the data – the latter indicated for each data set using defined quality codes. The complete data collection includes a number of repeated sections for which the quality code can be used to investigate and evaluate long-term changes. Beginning with 2010, the salinity measurements presented here are of the highest quality possible in this field owing to the introduction of the OPTIMARE Precision Salinometer.

Download Full-text

Open-Source Data Collection and Data Sets for Activity Recognition in Smart Homes

Sensors ◽

10.3390/s20030879 ◽

2020 ◽

Vol 20 (3) ◽

pp. 879 ◽

Cited By ~ 2

Author(s):

Uwe Köckemann ◽

Marjan Alirezaie ◽

Jennifer Renoux ◽

Nicolas Tsiftes ◽

Mobyen Uddin Ahmed ◽

...

Keyword(s):

Data Collection ◽

Activity Recognition ◽

Care Home ◽

Open Data ◽

Ground Truth ◽

Smart Homes ◽

Sensor Data ◽

Data Sets ◽

Data Set ◽

Home Setting

As research in smart homes and activity recognition is increasing, it is of ever increasing importance to have benchmarks systems and data upon which researchers can compare methods. While synthetic data can be useful for certain method developments, real data sets that are open and shared are equally as important. This paper presents the E-care@home system, its installation in a real home setting, and a series of data sets that were collected using the E-care@home system. Our first contribution, the E-care@home system, is a collection of software modules for data collection, labeling, and various reasoning tasks such as activity recognition, person counting, and configuration planning. It supports a heterogeneous set of sensors that can be extended easily and connects collected sensor data to higher-level Artificial Intelligence (AI) reasoning modules. Our second contribution is a series of open data sets which can be used to recognize activities of daily living. In addition to these data sets, we describe the technical infrastructure that we have developed to collect the data and the physical environment. Each data set is annotated with ground-truth information, making it relevant for researchers interested in benchmarking different algorithms for activity recognition.

Download Full-text

A multi-decade record of high-quality fCO<sub>2</sub> data in version 3 of the Surface Ocean CO<sub>2</sub> Atlas (SOCAT)

10.5194/essd-2016-15 ◽

2016 ◽

Cited By ~ 6

Author(s):

Dorothee C. E. Bakker ◽

Benjamin Pfeil ◽

Camilla S. Landa ◽

Nicolas Metzl ◽

Kevin M. O'Brien ◽

...

Keyword(s):

Carbon Dioxide ◽

Surface Water ◽

Data Collection ◽

Data Availability ◽

Data Sets ◽

Science Data ◽

High Quality ◽

Data Set ◽

Surface Ocean ◽

Biogeochemical Models

Abstract. The Surface Ocean CO2 Atlas (SOCAT) is a synthesis of quality-controlled fCO2 (fugacity of carbon dioxide) values for the global surface oceans and coastal seas with regular updates. Version 3 of SOCAT has 14.5 million fCO2 values from 3646 data sets covering the years 1957 to 2014. This latest version has an additional 4.4 million fCO2 values relative to version 2 and extends the record from 2011 to 2014. Version 3 also significantly increases the data availability for 2005 to 2013. SOCAT has an average of approximately 1.2 million surface water fCO2 values per year for the years 2006 to 2012. Quality and documentation of the data has improved. A new feature is the data set quality control (QC) flag of E for data from alternative sensors and platforms. The accuracy of surface water fCO2 has been defined for all data set QC flags. Automated range checking has been carried out for all data sets during their upload into SOCAT. The upgrade of the interactive Data Set Viewer (previously known as the Cruise Data Viewer) allows better interrogation of the SOCAT data collection and rapid creation of high-quality figures for scientific presentations. Automated data upload has been launched for version 4 and will enable more frequent SOCAT releases in the future. High-profile scientific applications of SOCAT include quantification of the ocean sink for atmospheric carbon dioxide and its long-term variation, detection of ocean acidification, as well as evaluation of coupled-climate and ocean-only biogeochemical models. Users of SOCAT data products are urged to acknowledge the contribution of data providers, as stated in the SOCAT Fair Data Use Statement. This ESSD (Earth System Science Data) "Living Data" publication documents the methods and data sets used for the assembly of this new version of the SOCAT data collection and compares these with those used for earlier versions of the data collection (Pfeil et al., 2013; Sabine et al., 2013; Bakker et al., 2014).

Download Full-text

Locality-Sensitive Hashing for Information Retrieval System on Multiple GPGPU Devices

Applied Sciences ◽

10.3390/app10072539 ◽

2020 ◽

Vol 10 (7) ◽

pp. 2539 ◽

Cited By ~ 1

Author(s):

Toan Nguyen Mau ◽

Yasushi Inoguchi

Keyword(s):

Big Data ◽

Information Retrieval ◽

Retrieval System ◽

Hash Table ◽

Information Retrieval System ◽

Main Memory ◽

Locality Sensitive Hashing ◽

Data Sets ◽

Similar Data ◽

Data Set

It is challenging to build a real-time information retrieval system, especially for systems with high-dimensional big data. To structure big data, many hashing algorithms that map similar data items to the same bucket to advance the search have been proposed. Locality-Sensitive Hashing (LSH) is a common approach for reducing the number of dimensions of a data set, by using a family of hash functions and a hash table. The LSH hash table is an additional component that supports the indexing of hash values (keys) for the corresponding data/items. We previously proposed the Dynamic Locality-Sensitive Hashing (DLSH) algorithm with a dynamically structured hash table, optimized for storage in the main memory and General-Purpose computation on Graphics Processing Units (GPGPU) memory. This supports the handling of constantly updated data sets, such as songs, images, or text databases. The DLSH algorithm works effectively with data sets that are updated with high frequency and is compatible with parallel processing. However, the use of a single GPGPU device for processing big data is inadequate, due to the small memory capacity of GPGPU devices. When using multiple GPGPU devices for searching, we need an effective search algorithm to balance the jobs. In this paper, we propose an extension of DLSH for big data sets using multiple GPGPUs, in order to increase the capacity and performance of the information retrieval system. Different search strategies on multiple DLSH clusters are also proposed to adapt our parallelized system. With significant results in terms of performance and accuracy, we show that DLSH can be applied to real-life dynamic database systems.

Download Full-text

Students’ General Knowledge of the Learning Process: A Mixed Methods Study Illustrating Integrated Data Collection and Data Consolidation

Journal of Mixed Methods Research ◽

10.1177/1558689816651792 ◽

2016 ◽

Vol 12 (2) ◽

pp. 182-203 ◽

Cited By ~ 7

Author(s):

Joke H. van Velzen

Keyword(s):

Mixed Methods ◽

Data Collection ◽

Learning Process ◽

Mixed Methods Research ◽

Mixed Methods Study ◽

Data Sets ◽

General Knowledge ◽

Data Set ◽

Data Consolidation ◽

New Research

There were two purposes for this mixed methods study: to investigate (a) the realistic meaning of awareness and understanding as the underlying constructs of general knowledge of the learning process and (b) a procedure for data consolidation. The participants were 11th-grade high school and first-year university students. Integrated data collection and data transformation provided for positive but small correlations between awareness and understanding. A comparison of the created combined and integrated new data sets showed that the integrated data set provided for an expected statistically significant outcome, which was in line with the participants’ developmental difference. This study can contribute to the mixed methods research because it proposes a procedure for data consolidation and a new research design.

Download Full-text

Functional Morphology at the Mall

The American Biology Teacher ◽

10.1525/abt.2012.74.6.8 ◽

2012 ◽

Vol 74 (6) ◽

pp. 401-408

Author(s):

Scott P. Hippensteel

Keyword(s):

Functional Morphology ◽

Undergraduate Students ◽

Sample Selection ◽

The United States ◽

Data Sets ◽

Similar Data ◽

Data Set ◽

Fossil Resource ◽

Upper Level ◽

Fossil Data

The primary decorative flooring tile in the Southpark Mall in Charlotte, North Carolina, is fossiliferous limestone that contains Jurassic ammonoids and belemnoids. Visible in these tiles are more than 500 ammonoids, many of which have been cross sectioned equatorially perpendicular to the plane of coiling. Upper-level undergraduate students from UNC Charlotte used this data set to measure ammonoid coiling geometry and, thus, coiling strategy, and their findings were compared with earlier reported research presented in highly respected paleobiology journals. This example of urban paleobiology utilized a large, easily accessible, and readily available fossil data set to introduce functional morphology of coiled cephalopods. Similar data sets are available in public buildings around the United States, providing a valuable fossil resource at a time when shrinking academic budgets would prohibit purchasing such a collection (and many collections have not been updated in decades). As students compared their results with those previously published by professional paleontologists, they were exposed to the methods and limits of the scientific method in the historical sciences, as well as the dangers of poor sample selection.

Download Full-text

Experiences from Recent Geo-Wiki Citizen Science Campaigns in the Creation and Sharing of New Reference Data Sets on Land Cover and Land Use

10.5194/egusphere-egu21-10871 ◽

2021 ◽

Author(s):

Juan Carlos Laso Bayas ◽

Linda See ◽

Myroslava Lesiv ◽

Martina Dürauer ◽

Ivelina Georgieva ◽

...

Keyword(s):

Land Use ◽

Data Collection ◽

Citizen Science ◽

Satellite Imagery ◽

Reference Data ◽

Scientific Publication ◽

Agricultural Fields ◽

Data Sets ◽

Visual Interpretation ◽

Data Set

<div> <p>Geo-Wiki is an online platform for involving citizens in the visual interpretation of very high-resolution satellite imagery to collect reference data on land cover and land use. Instead of being an ongoing citizen science project, short intensive campaigns are organized in which citizens participate. The advantage of this approach is that large amounts of data are collected in a short amount of time with a clearly defined data collection target to reach. Participants can also schedule their time accordingly, with their past feedback indicating that this intensive approach was preferred. The reference data are then used in further scientific research to answer a range of questions such as: How much of the land&#8217;s surface is wild or impacted by humans?&#160; What is the size of agricultural fields globally? The campaigns are organized as competitions with prizes that include Amazon vouchers and co-authorship on a scientific publication. The scientific publication is the mechanism by which the data are openly shared so that other researchers can use this reference data set in other applications. The publication is usually in the form of a data paper, which explains the campaign in detail along with the data set collected. The data are uploaded to a repository such as Pangaea, ZENODO or IIASA&#8217;s own data repository, DARE.&#160; This approach from data collection, to opening up the data, to documentation via a scientific data paper also ensures transparency in the data collection process. There have been several Geo-Wiki citizen science campaigns that have been run over the last decade. Here we provide examples of experiences from five recent campaigns: (i) the Global Cropland mapping campaign to build a cropland validation data set; (ii) the Global Field Size campaign to characterize the size of agricultural fields around the world; (iii) the Human Impact on Forests campaign to produce the first global map of forest management; (iv) the Global Built-up Surface Validation campaign to collect data on built-up surfaces for validation of global built-up products such as the Global Human Settlement Layer (https://ghsl.jrc.ec.europa.eu/); and (v) the Drivers of Tropical Forest Loss campaign, which collected data on the main causes of deforestation in the tropics. In addition to outlining the campaign, the data sets collected and the sharing of the data online, we provide lessons learned from these campaigns, which have built upon experiences collected over the last decade. These include insights related to the quality and consistency of the classifications of the volunteers including different volunteer behaviors; best practices in creating control points for use in the gamification and quality assurance of the campaigns; different methods for training the volunteers in visual interpretation; difficulties in the interpretation of some features, which may need expert input instead as well as the inability of some features to be recognized from satellite imagery; and limitations in the approach regarding change detection due to temporal availability of open satellite imagery, among several others.&#160;</p> </div>

Download Full-text

Perdeuteration, crystallization, data collection and comparison of five neutron diffraction data sets of complexes of human galectin-3C

Acta Crystallographica Section D Structural Biology ◽

10.1107/s2059798316015540 ◽

2016 ◽

Vol 72 (11) ◽

pp. 1194-1202 ◽

Cited By ~ 13

Author(s):

Francesco Manzoni ◽

Kadhirvel Saraboji ◽

Janina Sprenger ◽

Rohit Kumar ◽

Ann-Louise Noresson ◽

...

Keyword(s):

Data Collection ◽

Neutron Diffraction ◽

Complete Removal ◽

Carbohydrate Binding ◽

Data Sets ◽

Data Set ◽

Galectin 3 ◽

Reduced Data ◽

Collection Strategies ◽

Carbohydrate Binding Site

Galectin-3 is an important protein in molecular signalling events involving carbohydrate recognition, and an understanding of the hydrogen-bonding patterns in the carbohydrate-binding site of its C-terminal domain (galectin-3C) is important for the development of new potent inhibitors. The authors are studying these patterns using neutron crystallography. Here, the production of perdeuterated human galectin-3C and successive improvement in crystal size by the development of a crystal-growth protocol involving feeding of the crystallization drops are described. The larger crystals resulted in improved data quality and reduced data-collection times. Furthermore, protocols for complete removal of the lactose that is necessary for the production of large crystals of apo galectin-3C suitable for neutron diffraction are described. Five data sets have been collected at three different neutron sources from galectin-3C crystals of various volumes. It was possible to merge two of these to generate an almost complete neutron data set for the galectin-3C–lactose complex. These data sets provide insights into the crystal volumes and data-collection times necessary for the same system at sources with different technologies and data-collection strategies, and these insights are applicable to other systems.

Download Full-text

A Study on the Laws Governing Facial Recognition Technology and Data Privacy in Malaysia

Malaysian Journal of Social Sciences and Humanities (MJSSH) ◽

10.47405/mjssh.v6i10.1086 ◽

2021 ◽

Vol 6 (10) ◽

pp. 480-487

Author(s):

Muhammad Ashraf Bin Mohd Nor ◽

Mohammad Asyraf Bin Mohd Tasrib ◽

Bryan Francis ◽

Nurul Izzah Binti Hesham ◽

Mohd Bahrin Bin Othman

Keyword(s):

Data Collection ◽

Data Privacy ◽

Facial Recognition ◽

Data Sets ◽

The Past ◽

Face Detection And Recognition ◽

Complex Technology ◽

Laws And Policies ◽

Facial Images ◽

Detection And Recognition

The advancement of technology in the past decade has led humans to achieve many great things. Among that is facial recognition technology that uses a combination of two techniques which is face detection and recognition that is capable of converting facial images of a person into readable data and connecting it with other data sets which enable it to identify, track or compare it. This study delves into the usage of facial recognition technology in Malaysia where its regulation is almost non-existent. As its usage increases, the invasive features of this technology to collect and connect its data posed a threat to the data privacy of Malaysian citizens. Due to this issue, other countries' laws and policies regarding this technology are examined and compared with Malaysia. This enables the loopholes of the current law and policies to be identified and restructured, which create a clear path on the proper regulations and changes that need to be made. Thus, this study aims to analyse the limitation of law governing data privacy and its concept in Malaysia along with changes that need to be made. This study’s finding shows the shortcoming of Malaysia’s law in governing data privacy especially when it involves complex technology that has great data collection capability like facial recognition.

Download Full-text