A Metropolis Sampling Method for Drawing Representative Samples from Large Databases

Abstract. Diatoms (microscopic, single-celled algae) are present in almost all habitats containing water (e.g. streams, lakes, soil, rocks) and form one of the most common and diverse algal groups in both freshwater and marine ecosystems. In the terrestrial environment, their diversified species distributions are mainly controlled by physiographical factors and anthropic disturbances. This makes them useful tracers in catchment hydrology. In their use as a hydrological tracer, diatoms are generally sampled in streams by means of an automated sampling method, and as a result many samples are collected to cover a whole storm run-off event. As diatom analysis is labour intensive, a trade-off has to be made between the number of sites and the number of samples per site. A potential way to reduce this number is by using a time-integrated mass-flux sampler. Here, we explored the potential for the Phillips sampler to provide a representative sample of the diatom assemblage of a whole storm run-off event. We addressed this by comparing the diatom community composition of the Phillips sampler to the composite community collected by the automatic samplers for three events. Our results indicate that during two events the Phillips sampler collected representative samples, whereas significantly different communities were collected during the third event. However, the sediment data for this event, which was sampled with automatic samplers, were very noisy, so we could not verify whether the Phillips sampler collected representative communities. Nevertheless, we believe that this sampler could not only be applied in hydrological tracing using terrestrial diatoms, but may also be a useful tool in water quality assessment.


Studying Online Activism: The Effects of Sampling Design on Findings

Mobilization An International Quarterly ◽

10.17813/maiq.18.4.54261246r8w05865 ◽

2013 ◽

Vol 18 (4) ◽

pp. 389-406 ◽

Cited By ~ 8

Author(s):

Jennifer Earl

Keyword(s):

Best Practices ◽

Social Movement ◽

Sampling Method ◽

Sampling Design ◽

Robust Methods ◽

Sampling Frame ◽

The Third ◽

Internet Activism ◽

Online Activism ◽

Representative Samples

Social movement scholars are increasingly interested in Internet activism but have struggled to find robust methods for identifying cases, particularly representative samples of online protest content, given that no population list exists. This article reviews early approaches to this problem, focusing on three recent case sampling designs that attempt to address it. The first approach purposively samples from an organizationally based sampling frame. The second approach randomly samples from an SMO-based sampling frame. The third approach mimics user routines to identify populations of "reachable" websites on a given topic, which are then randomly sampled. For each approach, I examine the sampling frame and sampling method to understand how cases were selected, outline the assumptions built into the overall sampling design, and discuss an exemplary research project employing each design. Comparisons of findings from these exemplar studies indicate that sampling designs are extremely consequential. I close by recommending best practices.


A field sampling method for obtaining representative samples of composite fluvial suspended sediment particles for SEM analysis

Journal of Sedimentary Research ◽

10.1306/d42679ce-2b26-11d7-8648000102c1865d ◽

1992 ◽

Vol 62 (4) ◽

pp. 742-744 ◽

Cited By ~ 15

Author(s):

J. C. Woodward ◽

D. E. Walling

Keyword(s):

Suspended Sediment ◽

Sampling Method ◽

Sem Analysis ◽

Field Sampling ◽

Sediment Particles ◽

Representative Samples


Drawing Representative Samples from Large Databases

Encyclopedia of Data Warehousing and Mining ◽

10.4018/978-1-59140-557-3.ch079 ◽

2011 ◽

pp. 413-420

Author(s):

Wen-Chi Hou ◽

Hong Guo ◽

Feng Yan ◽

Qiang Zhu

Keyword(s):

Data Mining ◽

Importance Sampling ◽

Spatial Data ◽

Database Systems ◽

Spatial Data Mining ◽

Selectivity Estimation ◽

Large Databases ◽

Representative Samples

Sampling has been used in areas such as selectivity estimation (Hou & Ozsoyoglu, 1991; Haas & Swami, 1992; Jermaine, 2003; Lipton, Naughton, & Schneider, 1990; Wu, Agrawal, & Abbadi, 2001), OLAP (Acharya, Gibbons, & Poosala, 2000), clustering (Agrawal, Gehrke, Gunopulos, & Raghavan, 1998; Palmer & Faloutsos, 2000), and spatial data mining (Xu, Ester, Kriegel, & Sander, 1998). Due to its importance, sampling has been incorporated into modern database systems.


ANALISIS FAKTOR-FAKTOR YANG MEMPENGARUHI PENDAPATAN DAN KELAYAKAN USAHA AGROWISATA STRAWBERRY (Fragaria choiloensis L.) PETIK SENDIRI (Studi Kasus : Kabupaten Karo)

Jurnal Agriuma ◽

10.31289/agr.v1i2.2873 ◽

2019 ◽

Vol 1 (2) ◽

pp. 24

Author(s):

Muhammad Jufriansyah ◽

Gustami Harahap ◽

Mitra Musika Lubis

Keyword(s):

Linear Regression ◽

Multiple Linear Regression ◽

Sampling Method ◽

Linear Regression Analysis ◽

Secondary Data ◽

Multiple Linear Regression Analysis ◽

Selling Price ◽

North Sumatra ◽

Positive Effect ◽

Representative Samples

In North Sumatra, strawberries are grown as a horticultural crop, especially in Karo Regency. The purpose of this study was to identify the factors that influence the income of pick-your-own strawberry agrotourism farmers, to determine the price for the pick-your-own strawberry agrotourism business, and to assess whether the business is feasible. Sampling was based on the central limit theorem: of the 60 strawberry farmers in Karo Regency, 30 were taken as a representative sample of the entire population. Both primary and secondary data were collected. The analysis used multiple linear regression with SPSS 21, break-even point (BEP) analysis, and business feasibility analysis using the R/C ratio. (1) Multiple linear regression analysis showed that sales volume and RT expenditure have a positive effect on strawberry farmers' income. (2) BEP analysis showed that at a sales volume of 478.80 kg and a selling price of Rp 52,760/kg, the sales result is Rp 38,304,239, at which point the pick-your-own strawberry agrotourism business breaks even. (3) Feasibility analysis of the pick-your-own strawberry agrotourism business in Karo Regency yielded R/C > 1, so the business is economically feasible.


Practical selection of representative sets of RNA-seq samples using a hierarchical approach

10.1101/2021.02.04.429817 ◽

2021 ◽

Author(s):

Laura H. Tung ◽

Carl Kingsford

Keyword(s):

Computational Method ◽

Divide And Conquer ◽

Similarity Matrix ◽

Rna Seq ◽

Counting Approach ◽

Representative Subset ◽

Large Databases ◽

Memory Reduction ◽

Representative Samples ◽

Selection Of

Abstract. Despite numerous RNA-seq samples available in large databases, most RNA-seq analysis tools are evaluated on a limited number of RNA-seq samples. This drives a need for methods to select a representative subset from all available RNA-seq samples to facilitate comprehensive, unbiased evaluation of bioinformatics tools. In sequence-based approaches for representative set selection (e.g. a k-mer counting approach that selects a subset based on k-mer similarities between RNA-seq samples), because of the huge number of available RNA-seq samples and the large number of k-mers/sequences in each sample, computing the full similarity matrix between all samples using k-mers/sequences for the entire set of RNA-seq samples in a large database (e.g. the SRA) has memory and runtime challenges, making direct representative set selection infeasible with limited computing resources. Therefore, we developed a novel computational method called “hierarchical representative set selection” to handle this challenge. Hierarchical representative set selection is a divide-and-conquer-like algorithm that breaks the representative set selection into sub-selections and hierarchically selects representative samples through multiple levels. We demonstrate that hierarchical representative set selection can achieve performance close to that of direct representative set selection, while largely reducing the runtime and memory requirements of computing the full similarity matrix (up to 8.4X runtime reduction and 4.7X memory reduction for 10000 samples that could be practically run with direct subset selection). We show that hierarchical representative set selection substantially outperforms random sampling on the entire SRA set of RNA-seq samples, making it a practical solution to representative set selection on large databases such as the SRA.
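The divide-and-conquer idea in this abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes samples are plain numeric vectors compared by Euclidean distance, and it swaps in a greedy farthest-point rule as the per-chunk selector. The names `farthest_point_select`, `hierarchical_select`, and `chunk_size` are hypothetical.

```python
import math
import random


def farthest_point_select(points, k):
    """Greedily pick k points, each the farthest from those already chosen.

    Stand-in selector (an assumption, not the method used in the paper).
    """
    if k >= len(points):
        return list(points)
    chosen = [points[0]]
    # Distance from every point to its nearest chosen point so far.
    nearest = [math.dist(p, chosen[0]) for p in points]
    while len(chosen) < k:
        idx = max(range(len(points)), key=nearest.__getitem__)
        chosen.append(points[idx])
        nearest = [min(d, math.dist(p, points[idx]))
                   for d, p in zip(nearest, points)]
    return chosen


def hierarchical_select(points, k, chunk_size):
    """Select k representatives while only comparing points within a chunk."""
    if chunk_size <= k:
        raise ValueError("chunk_size must exceed k so each level shrinks")
    current = list(points)
    # Each level: split into chunks, keep k per chunk, merge the survivors.
    while len(current) > chunk_size:
        merged = []
        for start in range(0, len(current), chunk_size):
            merged.extend(farthest_point_select(current[start:start + chunk_size], k))
        current = merged
    # Final level: one direct selection over the small merged set.
    return farthest_point_select(current, k)


random.seed(0)
pts = [(random.random(), random.random()) for _ in range(200)]
reps = hierarchical_select(pts, 5, 20)
print(len(reps))
```

Because distances are only computed inside chunks of at most `chunk_size` points, no level ever needs the full pairwise similarity matrix, which is the source of the memory and runtime savings the abstract describes.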


Applying Generalizability Theory to Optimize Analysis of Spontaneous Teacher Talk in Elementary Classrooms

Journal of Speech Language and Hearing Research ◽

10.1044/2020_jslhr-19-00118 ◽

2020 ◽

Vol 63 (6) ◽

pp. 1947-1957

Author(s):

Alexandra Hollo ◽

Johanna L. Staubitz ◽

Jason C. Chow

Keyword(s):

Special Education Teachers ◽

Generalizability Theory ◽

Child Outcomes ◽

Elementary Classrooms ◽

Teacher Talk ◽

Group Instruction ◽

Data Set ◽

School Year ◽

Minimum Number ◽

Representative Samples

Purpose: Although sampling teachers' child-directed speech in school settings is needed to understand the influence of linguistic input on child outcomes, empirical guidance for measurement procedures needed to obtain representative samples is lacking. To optimize resources needed to transcribe, code, and analyze classroom samples, this exploratory study assessed the minimum number and duration of samples needed for a reliable analysis of conventional and researcher-developed measures of teacher talk in elementary classrooms. Method: This study applied fully crossed, Person (teacher) × Session (samples obtained on 3 separate occasions) generalizability studies to analyze an extant data set of three 10-min language samples provided by 28 general and special education teachers recorded during large-group instruction across the school year. Subsequently, a series of decision studies estimated the number and duration of sessions needed to obtain the criterion g coefficient (g > .70). Results: The most stable variables were total number of words and mazes, requiring only a single 10-min sample, two 6-min samples, or three 3-min samples to reach criterion. No measured variables related to content or complexity were adequately stable regardless of number and duration of samples. Conclusions: Generalizability studies confirmed that a large proportion of variance was attributable to individuals rather than the sampling occasion when analyzing the amount and fluency of spontaneous teacher talk. In general, conventionally reported outcomes were more stable than researcher-developed codes, which suggests some categories of teacher talk are more context dependent than others and thus require more intensive data collection to measure reliably.
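For readers unfamiliar with decision studies, the standard relative g coefficient for a fully crossed Person × Session design is sketched below. The abstract does not state which coefficient was used, so the relative (norm-referenced) form is an assumption here, and the variance components in the worked example are hypothetical rather than taken from the study.

```latex
% Relative g coefficient for a p x s design with n_s sessions
E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_{ps,e} / n_s}

% Hypothetical example: \sigma^2_p = 6, \sigma^2_{ps,e} = 4
% n_s = 1:  6 / (6 + 4)   = 0.60  (below the .70 criterion)
% n_s = 2:  6 / (6 + 4/2) = 0.75  (meets the .70 criterion)
```

Here $\sigma^2_p$ is the variance attributable to persons (teachers) and $\sigma^2_{ps,e}$ is the Person × Session interaction confounded with residual error. Increasing the number of sessions $n_s$ shrinks the error term, which is how a decision study finds the minimum number of samples that reaches the criterion.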


Prospects of “Topologically unquenched QCD” from a study of the analogous importance sampling method in the massive Schwinger model

Nuclear Physics B - Proceedings Supplements ◽

10.1016/s0920-5632(00)00351-0 ◽

2000 ◽

Vol 83-84 (1-3) ◽

pp. 443-445

Author(s):

S Dürr

Keyword(s):

Importance Sampling ◽

Sampling Method ◽

Schwinger Model ◽

Importance Sampling Method


An Empirical Analysis of the Obtrusiveness of and Participants' Compliance with the Electronically Activated Recorder (EAR)

European Journal of Psychological Assessment ◽

10.1027/1015-5759.23.4.248 ◽

2007 ◽

Vol 23 (4) ◽

pp. 248-257 ◽

Cited By ~ 58

Author(s):

Matthias R. Mehl ◽

Shannon E. Holleran

Keyword(s):

Empirical Analysis ◽

Sampling Method ◽

Daily Life ◽

Assessment Method ◽

Data Confidentiality ◽

Short Term ◽

Naturally Occurring ◽

Two Samples ◽

Audio Recorder ◽

Term Monitoring

Abstract. In this article, the authors provide an empirical analysis of the obtrusiveness of and participants' compliance with a relatively new psychological ambulatory assessment method, called the electronically activated recorder or EAR. The EAR is a modified portable audio-recorder that periodically records snippets of ambient sounds from participants' daily environments. In tracking moment-to-moment ambient sounds, the EAR yields an acoustic log of a person's day as it unfolds. As a naturalistic observation sampling method, it provides an observer's account of daily life and is optimized for the assessment of audible aspects of participants' naturally-occurring social behaviors and interactions. Measures of self-reported and behaviorally-assessed EAR obtrusiveness and compliance were analyzed in two samples. After an initial 2-h period of relative obtrusiveness, participants habituated to wearing the EAR and perceived it as fairly unobtrusive both in a short-term (2 days, N = 96) and a longer-term (10-11 days, N = 11) monitoring. Compliance with the method was high both during the short-term and longer-term monitoring. Somewhat reduced compliance was identified over the weekend; this effect appears to be specific to student populations. Important privacy and data confidentiality considerations around the EAR method are discussed.


A Metropolis Sampling Method for Drawing Representative Samples from Large Databases

A Monte Carlo sampling method for drawing representative samples from large databases

Technical note: A time-integrated sediment trap to sample diatoms for hydrological tracing

Studying Online Activism: The Effects of Sampling Design on Findings

A field sampling method for obtaining representative samples of composite fluvial suspended sediment particles for SEM analysis

Drawing Representative Samples from Large Databases

ANALISIS FAKTOR-FAKTOR YANG MEMPENGARUHI PENDAPATAN DAN KELAYAKAN USAHA AGROWISATA STRAWBERRY (Fragaria choiloensis L.) PETIK SENDIRI (Studi Kasus : Kabupaten Karo)

Practical selection of representative sets of RNA-seq samples using a hierarchical approach

Applying Generalizability Theory to Optimize Analysis of Spontaneous Teacher Talk in Elementary Classrooms

Prospects of “Topologically unquenched QCD” from a study of the analogous importance sampling method in the massive Schwinger model

An Empirical Analysis of the Obtrusiveness of and Participants' Compliance with the Electronically Activated Recorder (EAR)
