uneven sampling
Recently Published Documents


TOTAL DOCUMENTS

28
(FIVE YEARS 9)

H-INDEX

7
(FIVE YEARS 2)

2021 ◽  
Vol 7 (9) ◽  
Author(s):  
Gal Horesh ◽  
Alyce Taylor-Brown ◽  
Stephanie McGimpsey ◽  
Florent Lassalle ◽  
Jukka Corander ◽  
...  

The pan-genome is defined as the combined set of all genes in the gene pool of a species. Pan-genome analyses have been very useful in helping to understand different evolutionary dynamics of bacterial species: an open pan-genome often indicates a free-living lifestyle with metabolic versatility, while closed pan-genomes are linked to host-restricted, ecologically specialized bacteria. A detailed understanding of the species pan-genome has also been instrumental in tracking the phylodynamics of emerging drug resistance mechanisms and drug-resistant pathogens. However, current approaches to analyse a species’ pan-genome do not take the species population structure into account, nor do they account for the uneven sampling of different lineages, as is commonplace due to over-sampling of clinically relevant representatives. Here we present the application of a population structure-aware approach for classifying genes in a pan-genome based on within-species distribution. We demonstrate our approach on a collection of 7500 Escherichia coli genomes, one of the most-studied bacterial species and used as a model for an open pan-genome. We reveal clearly distinct groups of genes, clustered by different underlying evolutionary dynamics, and provide a more biologically informed and accurate description of the species’ pan-genome.


2021 ◽  
Author(s):  
Oscar J Charles ◽  
Joeseph Roberts ◽  
Judith Breuer ◽  
Richard A Goldstein

Sequence-weighting methods are commonly employed to account for biases in sequence datasets. We use a weighting scheme which considers the observed distinctiveness of sequences and apply it to calculations of linkage disequilibrium. Each sequence now contributes a weighted score to linkage disequilibrium measurements of pairwise loci. We demonstrate that this reduces the effect of uneven sampling, as underrepresented groups of sequences will each contribute more individually than redundant, similar sequences.
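The abstract does not spell out the weighting scheme, so the following is only a minimal sketch under stated assumptions: each sequence is weighted by the inverse of the number of sequences in the dataset that lie within `max_diff` mismatches of it (a hypothetical stand-in for "observed distinctiveness"), and those weights replace raw counts when computing the linkage disequilibrium coefficient D and r² for a pair of loci. The function names and the toy data are illustrative and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def distinctiveness_weights(seqs: Sequence[str], max_diff: int = 0) -> List[float]:
    """Assumed scheme: weight = 1 / (number of sequences within max_diff
    mismatches, counting the sequence itself). Redundant sequences share weight."""
    return [1.0 / sum(hamming(s, t) <= max_diff for t in seqs) for s in seqs]


def weighted_ld(
    seqs: Sequence[str],
    i: int,
    j: int,
    allele_i: str,
    allele_j: str,
    weights: Sequence[float],
) -> Tuple[float, float]:
    """Return (D, r^2) for allele_i at site i and allele_j at site j,
    using weighted allele and haplotype frequencies."""
    total = sum(weights)
    p_a = sum(w for s, w in zip(seqs, weights) if s[i] == allele_i) / total
    p_b = sum(w for s, w in zip(seqs, weights) if s[j] == allele_j) / total
    p_ab = sum(
        w for s, w in zip(seqs, weights) if s[i] == allele_i and s[j] == allele_j
    ) / total
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, r2


# Toy data: the haplotype "AA" is over-sampled four times.
seqs = ["AA", "AA", "AA", "AA", "AT", "TA", "TT"]
uniform = [1.0] * len(seqs)
weighted = distinctiveness_weights(seqs)

d_unweighted, _ = weighted_ld(seqs, 0, 1, "A", "A", uniform)
d_weighted, _ = weighted_ld(seqs, 0, 1, "A", "A", weighted)
```

In this toy dataset the four distinct haplotypes are in equilibrium once the duplicated "AA" copies share a single unit of weight, so the weighted D is zero, whereas the unweighted D is positive purely because of the over-sampling.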


2021 ◽  
Vol 39 (3) ◽  
pp. 1-24
Author(s):  
Jiawei Chen ◽  
Chengquan Jiang ◽  
Can Wang ◽  
Sheng Zhou ◽  
Yan Feng ◽  
...  

Sampling strategies have been widely applied in many recommendation systems to accelerate model learning from implicit feedback data. A typical strategy is to draw negative instances with uniform distribution, which, however, will severely affect a model’s convergence, stability, and even recommendation accuracy. A promising solution for this problem is to over-sample the “difficult” (a.k.a. informative) instances that contribute more to training. But this increases the risk of biasing the model and leading to non-optimal results. Moreover, existing samplers are either heuristic, which require domain knowledge and often fail to capture real “difficult” instances, or rely on a sampler model that suffers from low efficiency. To deal with these problems, we propose CoSam, an efficient and effective collaborative sampling method that consists of (1) a collaborative sampler model that explicitly leverages user-item interaction information in sampling probability and exhibits good properties of normalization, adaption, interaction information awareness, and sampling efficiency, and (2) an integrated sampler-recommender framework, leveraging the sampler model in prediction to offset the bias caused by uneven sampling. Correspondingly, we derive a fast reinforced training algorithm of our framework to boost the sampler performance and sampler-recommender collaboration. Extensive experiments on four real-world datasets demonstrate the superiority of the proposed collaborative sampler model and integrated sampler-recommender framework.


2021 ◽  
Author(s):  
Gal Horesh ◽  
Alyce Taylor-Brown ◽  
Stephanie McGimpsey ◽  
Florent Lassalle ◽  
Jukka Corander ◽  
...  

The pan-genome is defined as the combined set of all genes in the gene pool of a species. Pan-genome analyses have been very useful in helping to understand different evolutionary dynamics of bacterial species: an open pan-genome often indicates a free-living lifestyle with metabolic versatility, while closed pan-genomes are linked to host-restricted, ecologically specialised bacteria. A detailed understanding of the species pan-genome has also been instrumental in tracking the phylodynamics of emerging drug resistance mechanisms and drug-resistant pathogens. However, current approaches to analyse a species’ pan-genome do not take the species population structure into account, nor do they account for the uneven sampling of different lineages, as is commonplace due to over-sampling of clinically relevant representatives. Here we present the application of a population structure-aware approach for classifying genes in a pan-genome based on within-species distribution. We demonstrate our approach on a collection of 7,500 E. coli genomes, one of the most-studied bacterial species used as a model for an open pan-genome. We reveal clearly distinct groups of genes, clustered by different underlying evolutionary dynamics, and provide a more biologically informed and accurate description of the species’ pan-genome.


Author(s):  
Дмитрий Сергеевич Викторов ◽  
Екатерина Владимировна Пластинина ◽  
Елена Валерьевна Самоволина

The paper substantiates requirements for the distortion level of radar stations with pulsed and quasi-continuous radiation built on digital signal synthesizers of four types: voltage-sample and phase-sample digital synthesizers with uniform sampling, and voltage-sample and phase-sample digital synthesizers with uneven sampling. When designing a radar master device, the type of digital signal synthesizer must be chosen. The main initial criteria are the maximum operating range of the digital synthesizer and the level of in-band distortion. Many factors must be taken into account in this choice, chief among them the implementation complexity of the digital sample generator and whether it can be realized with the required speed and number of bits [1, 2]. Requirements for the total distortion level are set using the criterion of an acceptable reduction in the probability of correct detection relative to its potential value at a fixed false-alarm probability. Based on this criterion, in pulsed radars the maximum relative RMS distortion of the cross-correlation function of the angle-modulated signal generated by the digital synthesizer should satisfy $D_{\delta x} \le -(51 \ldots 67)$ dB. In radars with quasi-continuous radiation, the maximum relative RMS distortion of the autocorrelation function of the angle-modulated signal should satisfy $D_{\delta} \le -(80 \ldots 120)$ dB. The number of quantization bits for phase, voltage, and time-delay compensation in digital signal synthesizers depends not only on the maximum relative RMS distortion of the cross-correlation function but also on the number of samples of the angle-modulated signal. The reference frequency of the digital signal synthesizer must therefore be selected first, given the modulation type and the effective spectrum width of the angle-modulated signal as determined by the tactical and technical characteristics of the radar.


2020 ◽  
Author(s):  
Ana Carolina Petisco-Souza ◽  
Fernanda Thiesen Brum ◽  
Vinícius Marcilio-Silva ◽  
Victor P. Zwiener ◽  
Andressa Zanella ◽  
...  

Biodiversity shortfalls are knowledge gaps that may result from uneven sampling through time and space and from human interest biases. Gaps in data on species' functional traits may add uncertainty to measures of functional diversity and structure and hinder inference on ecosystem functioning and ecosystem services, with negative implications for conservation and restoration practices, such as in the Atlantic Forest hotspot. Here we investigate the potential drivers of trait data gaps and where they are located geographically in the Atlantic Forest. We quantified trait gaps for four key plant functional traits of 2335 tree species, and evaluated which factors drive trait gaps at the species level and at the geographical level. At the species level, we found larger trait gaps for small-ranged species and for species with no economic use. At the geographical level, we found larger gaps on the east coast of the Atlantic Forest. Trait gaps were higher away from urban areas and among species with smaller mean range size and smaller mean economic use of wood, and smaller near protected areas. Efforts to reduce trait gaps of small-ranged species and of species with economic use of wood can further advance theory-driven studies and improve knowledge coverage.


Sensors ◽  
2020 ◽  
Vol 20 (9) ◽  
pp. 2700 ◽  
Author(s):  
Yihang Jiang ◽  
Yuankai Qi ◽  
Will Ke Wang ◽  
Brinnae Bent ◽  
Robert Avram ◽  
...  

The dynamic time warping (DTW) algorithm is widely used in pattern matching and sequence alignment tasks, including speech recognition and time series clustering. However, DTW algorithms perform poorly when aligning sequences of uneven sampling frequencies. This makes it difficult to apply DTW to practical problems, such as aligning signals that are recorded simultaneously by sensors with different, uneven, and dynamic sampling frequencies. As multi-modal sensing technologies become increasingly popular, it is necessary to develop methods for high quality alignment of such signals. Here we propose a DTW algorithm called EventDTW which uses information propagated from defined events as basis for path matching and hence sequence alignment. We have developed two metrics, the error rate (ER) and the singularity score (SS), to define and evaluate alignment quality and to enable comparison of performance across DTW algorithms. We demonstrate the utility of these metrics on 84 publicly-available signals in addition to our own multi-modal biomedical signals. EventDTW outperformed existing DTW algorithms for optimal alignment of signals with different sampling frequencies in 37% of artificial signal alignment tasks and 76% of real-world signal alignment tasks.
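For readers unfamiliar with the baseline, the following is a minimal sketch of textbook DTW with path backtracking. It is not the EventDTW algorithm proposed in the abstract, and the function name `dtw` and the toy signals are illustrative assumptions. It shows the cumulative-cost recurrence that DTW variants build on, applied to one signal recorded at two different sampling rates.

```python
from typing import List, Sequence, Tuple


def dtw(x: Sequence[float], y: Sequence[float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Classic DTW with absolute-difference local cost.

    Returns the total alignment cost and the warping path as (i, j) index
    pairs into x and y. Both inputs are assumed to be non-empty.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # x advances, y repeats
                cost[i][j - 1],      # y advances, x repeats
                cost[i - 1][j - 1],  # both advance
            )

    # Backtrack from the end, always moving to the cheapest predecessor.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


# The same ramp recorded at two sampling rates: y has twice as many samples.
slow = [0.0, 1.0, 2.0, 3.0]
fast = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
distance, path = dtw(slow, fast)
```

In this idealized case each sample of `slow` is matched to two samples of `fast` at zero cost. The abstract's concern is with signals whose sampling rates are uneven or change over time, where this plain recurrence is reported to perform poorly.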


2019 ◽  
Vol 286 (1917) ◽  
pp. 20192054 ◽  
Author(s):  
Sandra R. Schachat ◽  
Conrad C. Labandeira ◽  
Matthew E. Clapham ◽  
Jonathan L. Payne

The history of insects’ taxonomic diversity is poorly understood. The two most common methods for estimating taxonomic diversity in deep time yield conflicting results: the ‘range through’ method suggests a steady, nearly monotonic increase in family-level diversity, whereas ‘shareholder quorum subsampling’ suggests a highly volatile taxonomic history with family-level mass extinctions occurring repeatedly, even at the midpoints of geological periods. The only feature shared by these two diversity curves is a steep increase in standing diversity during the Early Cretaceous. This apparent diversification event occurs primarily during the Aptian, the pre-Cenozoic interval with the most described insect occurrences, raising the possibility that this feature of the diversity curves reflects preservation and sampling biases rather than insect evolution and extinction. Here, the capture–mark–recapture (CMR) approach is used to estimate insects’ family-level diversity. This method accounts for the incompleteness of the insect fossil record as well as uneven sampling among time intervals. The CMR diversity curve shows extinctions at the Permian/Triassic and Cretaceous/Palaeogene boundaries but does not contain any mass extinctions within geological periods. This curve also includes a steep increase in diversity during the Aptian, which appears not to be an artefact of sampling or preservation bias because this increase still appears when time bins are standardized by the number of occurrences they contain rather than by the amount of time that they span. The Early Cretaceous increase in family-level diversity predates the rise of angiosperms by many millions of years and can be better attributed to the diversification of parasitic and especially parasitoid insect lineages.


2019 ◽  
Vol 286 (1897) ◽  
pp. 20190091 ◽  
Author(s):  
Joseph T. Flannery Sutherland ◽  
Benjamin C. Moon ◽  
Thomas L. Stubbs ◽  
Michael J. Benton

How much of evolutionary history is lost because of the unevenness of the fossil record? Lagerstätten, sites which have historically yielded exceptionally preserved fossils, provide remarkable, yet distorting insights into past life. When examining macroevolutionary trends in the fossil record, they can generate an uneven sampling signal for taxonomic diversity; by comparison, their effect on morphological variety (disparity) is poorly understood. We show here that lagerstätten impact the disparity of ichthyosaurs, Mesozoic marine reptiles, by preserving higher diversity and more complete specimens. Elsewhere in the fossil record, undersampled diversity and more fragmentary specimens produce spurious results. We identify a novel effect, that a taxon moves towards the centroid of a Generalized Euclidean dataset as its proportion of missing data increases. We term this effect ‘centroid slippage’, as a disparity-based analogue of phylogenetic stemward slippage. Our results suggest that uneven sampling presents issues for our view of disparity in the fossil record, but that this is also dependent on the methodology used, especially true with widely used Generalized Euclidean distances. Mitigation of missing cladistic data is possible by phylogenetic gap filling, and heterogeneous effects of lagerstätten on disparity may be accounted for by understanding the factors affecting their spatio-temporal distribution.


2018 ◽  
Vol 2018 ◽  
pp. 1-14 ◽  
Author(s):  
Pablo S. Padrón ◽  
David W. Roubik ◽  
Ruben P. Picón

A checklist of Euglossini in Ecuador is given, including all currently described, valid species collected until 2018. The list has been assembled from museum records, fieldwork cited herein, and literature. The former species lists are nearly doubled here, with 1 Aglae, 23 Eufriesea, 68 Euglossa, 18 Eulaema, and 5 Exaerete, 115 in total with >50 new records for the country. Distribution and collection data are included, and some doubtful species are discussed. The Amazon region is the most species-rich area, but this is not necessarily a natural pattern and may instead reflect uneven sampling effort across the country. Southern Ecuador is relatively little sampled.

