Read-Based Phasing of Related Individuals

Mapping Intimacies ◽

10.1101/037101 ◽

2016 ◽

Author(s):

Shilpa Garg ◽

Marcel Martin ◽

Tobias Marschall

Keyword(s):

Optimal Solution ◽

Real Data ◽

Theoretical Framework ◽

Mendelian Inheritance ◽

Sources Of Information ◽

Fixed Parameter ◽

Related Individuals ◽

Fixed Parameter Algorithm ◽

Multiple Variants ◽

Better Than

Motivation: Read-based phasing deduces the haplotypes of an individual from sequencing reads that cover multiple variants, while genetic phasing takes only genotypes as input and applies the rules of Mendelian inheritance to infer haplotypes within a pedigree of individuals. Combining both into an approach that uses these two independent sources of information -- reads and pedigree -- has the potential to deliver results better than each individually. Results: We provide a theoretical framework combining read-based phasing with genetic haplotyping, and describe a fixed-parameter algorithm and its implementation for finding an optimal solution. We show that leveraging reads of related individuals jointly in this way yields more phased variants and at a higher accuracy than when phased separately, both in simulated and real data. Coverages as low as 2x for each member of a trio yield haplotypes that are as accurate as when analyzed separately at 15x coverage per individual.
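The genetic-phasing half of the abstract above can be illustrated with a minimal sketch of Mendelian inheritance in a mother-father-child trio at a single biallelic variant. This is not the authors' fixed-parameter algorithm; the function names and the genotype encoding (count of alternative alleles: 0, 1 or 2) are assumptions made for illustration.

```python
from itertools import product

def transmittable(genotype):
    """Alleles (0 = reference, 1 = alternative) that a parent can pass on.

    The genotype is encoded as the number of alternative alleles:
    0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternative.
    """
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]

def inheritance_options(mother, father, child):
    """All (maternal allele, paternal allele) pairs that explain the child."""
    return sorted(
        (m, f)
        for m, f in product(transmittable(mother), transmittable(father))
        if m + f == child
    )

def mendelian_consistent(mother, father, child):
    """True if the child's genotype can arise from the parents' genotypes."""
    return bool(inheritance_options(mother, father, child))

def child_phase(mother, father, child):
    """Return the child's (maternal, paternal) alleles if genotypes alone
    determine them uniquely, otherwise None."""
    options = inheritance_options(mother, father, child)
    return options[0] if len(options) == 1 else None
```

When all three individuals are heterozygous, `child_phase(1, 1, 1)` returns `None`: genotypes alone cannot decide which parent contributed the alternative allele. Sites like this are where reads covering neighbouring variants add information that the pedigree cannot.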

Are More Profiles Better Than Fewer?: Searching for Parsimony and Relevance in Stated Choice Experiments

Transportation Research Record Journal of the Transportation Research Board ◽

10.3141/1719-22 ◽

2000 ◽

Vol 1719 (1) ◽

pp. 165-174 ◽

Cited By ~ 13

Author(s):

Peter R. Stopher ◽

David A. Hensher

Keyword(s):

New Zealand ◽

Degrees Of Freedom ◽

Choice Model ◽

Fractional Factorial ◽

Sources Of Information ◽

Stated Choice ◽

Complex Design ◽

Data Collection Instrument ◽

Main Effects ◽

Better Than

Transportation planners increasingly include a stated choice (SC) experiment as part of the armory of empirical sources of information on how individuals respond to current and potential travel contexts. The accumulated experience with SC data has been heavily conditioned on analyst prejudices about the acceptable complexity of the data collection instrument, especially the number of profiles (or treatments) given to each sampled individual (and the number of attributes and alternatives to be processed). It is not uncommon for transport demand modelers to impose stringent limitations on the complexity of an SC experiment. A review of the marketing and transport literature suggests that little is known about the basis for rejecting complex designs or accepting simple designs. Although more complex designs provide the analyst with increasing degrees of freedom in the estimation of models, facilitating nonlinearity in main effects and independent two-way interactions, it is not clear what the overall behavioral gains are in increasing the number of treatments. A complex design is developed as the basis for a stated choice study, producing a fractional factorial of 32 rows. The fraction is then truncated by administering 4, 8, 16, 24, and 32 profiles to a sample of 166 individuals (producing 1,016 treatments) in Australia and New Zealand faced with the decision to fly (or not to fly) between Australia and New Zealand by either Qantas or Ansett under alternative fare regimes. Statistical comparisons of elasticities (an appropriate behavioral basis for comparisons) suggest that the empirical gains within the context of a linear specification of the utility expression associated with each alternative in a discrete choice model may be quite marginal.

Evaluation for estimating of the PDF and the CDF of Generalized Inverted Exponential Distribution with Application in Industry

Advances in Mathematics: Scientific Journal ◽

10.37418/amsj.9.1.39 ◽

2020 ◽

pp. 507-522

Author(s):

Parisa Torkaman

Keyword(s):

Least Squares ◽

Exponential Distribution ◽

Mean Squared Error ◽

Weighted Least Squares ◽

Real Data ◽

Minimum Variance ◽

Cumulative Distribution ◽

Estimation Methods ◽

Data Set ◽

Better Than

The generalized inverted exponential distribution is introduced as a lifetime model with good statistical properties. This paper considers the estimation of its probability density function and cumulative distribution function with five different estimation methods: uniformly minimum variance unbiased (UMVU), maximum likelihood (ML), least squares (LS), weighted least squares (WLS) and percentile (PC) estimators. The performance of these estimation procedures is compared by numerical simulation, based on the mean squared error (MSE). Simulation studies show that the UMVU estimator performs better than the others and that, when the sample size is large enough, the ML and UMVU estimators are almost equivalent and more efficient than LS, WLS and PC. Finally, a real data set is analyzed.
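The abstract above does not state the density itself. As a minimal sketch, assuming the commonly used two-parameter form of the generalized inverted exponential distribution, F(x) = 1 - (1 - exp(-lam/x))^alpha for x > 0, the PDF, CDF and quantile function can be written as follows. The parameter names `alpha` (shape) and `lam` (scale) are assumptions, and none of the five estimators from the paper is reproduced here.

```python
import math

def gied_cdf(x, alpha, lam):
    """CDF: F(x) = 1 - (1 - exp(-lam / x)) ** alpha for x > 0, else 0."""
    if x <= 0:
        return 0.0
    return 1.0 - (1.0 - math.exp(-lam / x)) ** alpha

def gied_pdf(x, alpha, lam):
    """PDF: derivative of the CDF above with respect to x."""
    if x <= 0:
        return 0.0
    e = math.exp(-lam / x)
    return alpha * lam / x ** 2 * e * (1.0 - e) ** (alpha - 1.0)

def gied_quantile(u, alpha, lam):
    """Inverse CDF for 0 < u < 1, usable for inverse-transform sampling."""
    return -lam / math.log(1.0 - (1.0 - u) ** (1.0 / alpha))
```

Plugging estimated parameters into `gied_pdf` and `gied_cdf` gives plug-in estimates of the two functions, which is the kind of quantity whose mean squared error the study compares.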

Machine Learning for the Dynamic Positioning of UAVs for Extended Connectivity

Sensors ◽

10.3390/s21134618 ◽

2021 ◽

Vol 21 (13) ◽

pp. 4618

Author(s):

Francisco Oliveira ◽

Miguel Luís ◽

Susana Sargento

Keyword(s):

Machine Learning ◽

Cellular Networks ◽

Real Data ◽

Emerging Technology ◽

Machine Learning Algorithms ◽

Base Stations ◽

Aerial Vehicle ◽

Positioning Algorithm ◽

The Military ◽

Better Than

Unmanned Aerial Vehicle (UAV) networks are an emerging technology, useful not only for the military, but also for public and civil purposes. Their versatility provides advantages in situations where an existing network cannot support all requirements of its users, either because of an exceptionally large number of users, or because of the failure of one or more ground base stations. Networks of UAVs can reinforce these cellular networks where needed, redirecting the traffic to available ground stations. Using machine learning algorithms to predict overloaded traffic areas, we propose a UAV positioning algorithm responsible for determining suitable positions for the UAVs, with the objective of a more balanced redistribution of traffic, to avoid saturated base stations and decrease the number of users without a connection. The tests performed with real data of user connections through base stations show that, in less restrictive network conditions, the algorithm to dynamically place the UAVs performs significantly better than in more restrictive conditions, significantly reducing the number of users without a connection. We also conclude that the accuracy of the prediction is a very important factor, not only in the reduction of users without a connection, but also in the number of UAVs deployed.

Monitoring Persistence Change in Heavy-Tailed Observations

Symmetry ◽

10.3390/sym13060936 ◽

2021 ◽

Vol 13 (6) ◽

pp. 936

Author(s):

Dan Wang

Keyword(s):

Kernel Method ◽

Alternative Hypothesis ◽

Null Distribution ◽

Real Data ◽

Ratio Test ◽

Finite Sample ◽

Test Statistic ◽

Bootstrap Approximation ◽

Heavy Tailed ◽

Better Than

In this paper, a ratio test based on bootstrap approximation is proposed to detect the persistence change in heavy-tailed observations. This paper focuses on the symmetry testing problems of I(1)-to-I(0) and I(0)-to-I(1). On the basis of residual CUSUM, the test statistic is constructed in a ratio form. I prove the null distribution of the test statistic. The consistency under alternative hypothesis is also discussed. However, the null distribution of the test statistic contains an unknown tail index. To address this challenge, I present a bootstrap approximation method for determining the rejection region of this test. Simulation studies of artificial data are conducted to assess the finite sample performance, which shows that our method is better than the kernel method in all listed cases. The analysis of real data also demonstrates the excellent performance of this method.

Design Optimization of a 3D Parameterized Vane Cascade With Non-Axisymmetric Endwall Based on a Modified EGO Algorithm and Data Mining Techniques

Volume 2C: Turbomachinery ◽

10.1115/gt2017-63738 ◽

2017 ◽

Author(s):

Chenxi Li ◽

Zhendong Guo ◽

Liming Song ◽

Jun Li ◽

Zhenping Feng

Keyword(s):

Data Mining ◽

Global Optimization ◽

Design Space ◽

Pressure Coefficient ◽

Differential Evolution Algorithm ◽

Optimal Solution ◽

Efficient Global Optimization ◽

Data Mining Techniques ◽

Parallel Axis ◽

Better Than

The design of turbomachinery cascades is a typical high-dimensional and computationally expensive problem. A metamodel-based global optimization and data mining method is proposed to solve it. A modified Efficient Global Optimization (EGO) algorithm named Multi-Point Search based Efficient Global Optimization (MSEGO) is proposed, which is characterized by adding multiple samples per iteration. In tests on typical mathematical functions, MSEGO outperforms EGO in accuracy and convergence rate. MSEGO is used for the optimization of a turbine vane with non-axisymmetric endwall contouring (NEC); the total pressure coefficient of the optimal vane is increased by 0.499%. Under the same settings, another two optimization processes are conducted using EGO and an Adaptive Range Differential Evolution algorithm (ARDE), respectively. The optimal solution of MSEGO is far better than that of EGO. While achieving similar optimal solutions, the cost of MSEGO is only 3% of that of ARDE. Further, data mining techniques are used to extract information about the design space and analyze the influence of variables on design performance. Through the analysis of variance (ANOVA), the section profile variables are found to have the most significant effects on cascade loss performance. However, the NEC seems not so important according to the ANOVA. This is due to the fact that the performance difference between different NEC designs is very small in our prescribed space. However, the designs with NEC are always much better than the reference design as shown by parallel axis, i.e., the NEC would significantly influence the cascade performance. Further, this indicates that ensemble learning by combining the results of ANOVA and parallel axis is very useful to gain full knowledge of the design space.
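For readers unfamiliar with EGO, the baseline algorithm that the abstract above modifies, the sketch below shows the standard expected-improvement criterion used to pick the next sample for a minimization problem. It is not the MSEGO multi-point search, whose details the abstract does not give. The surrogate prediction `mu`, its standard deviation `sigma`, and the best observed value `f_best` are assumed inputs.

```python
import math

def expected_improvement(mu, sigma, f_best):
    """Expected improvement over f_best for a minimization problem.

    mu and sigma are the surrogate model's predicted mean and standard
    deviation at a candidate point; f_best is the lowest value observed so far.
    """
    if sigma <= 0.0:
        # No predictive uncertainty: improvement is deterministic.
        return max(f_best - mu, 0.0)
    z = (f_best - mu) / sigma
    normal_pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    normal_cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (f_best - mu) * normal_cdf + sigma * normal_pdf
```

Standard EGO evaluates the single candidate that maximizes this value in each iteration. Adding several samples per iteration, as MSEGO is described as doing, changes how many candidates are taken from each round.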

Improved Tercom Based on Fading Factor

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.143-144.770 ◽

2011 ◽

Vol 143-144 ◽

pp. 770-774 ◽

Cited By ~ 3

Author(s):

Shou Lei Lu ◽

Long Zhao ◽

Chang Yun Zhang

Keyword(s):

Correlation Function ◽

Real Data ◽

Angle Error ◽

The Real ◽

Velocity Error ◽

Speed Error ◽

Fading Factor ◽

Yaw Angle ◽

Better Than

In order to solve the problem of the traditional Tercom, which is sensitive to speed error and yaw angle error, an improved Tercom approach using a fading factor is introduced. The basic idea of this approach is to estimate the navigation position by a novel correlation function. The correlation function is calculated from weighted historical measurements. Experimental results with real data show that this approach performs better than the traditional Tercom with regard to overcoming velocity error and yaw angle error.
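The abstract above does not give the form of the new correlation function. The following is only a sketch of the general idea of weighting historical measurements with a fading factor, applied here to a mean square difference between measured and reference terrain heights. The geometric weighting, the function names and the default factor of 0.9 are all assumptions, not the authors' formulation.

```python
def fading_weights(n, fading):
    """Normalized weights for n measurements, oldest first.

    The newest measurement gets the largest weight; each step back in time
    is discounted by the fading factor (0 < fading <= 1).
    """
    raw = [fading ** (n - 1 - k) for k in range(n)]
    total = sum(raw)
    return [w / total for w in raw]

def weighted_msd(measured, reference, fading=0.9):
    """Fading-factor-weighted mean square difference between two profiles."""
    if not measured or len(measured) != len(reference):
        raise ValueError("profiles must be non-empty and of equal length")
    weights = fading_weights(len(measured), fading)
    return sum(w * (m - r) ** 2 for w, m, r in zip(weights, measured, reference))

def best_match(measured, candidates, fading=0.9):
    """Index of the candidate reference profile with the smallest weighted MSD."""
    return min(
        range(len(candidates)),
        key=lambda i: weighted_msd(measured, candidates[i], fading),
    )
```

With `fading=1.0` every measurement counts equally and the score reduces to the plain mean square difference. Smaller values let old measurements, which have accumulated more drift from velocity and heading errors, contribute less.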

The Use of Official Statistics in Self-Selection Bias Modeling

Journal of Official Statistics ◽

10.1515/jos-2016-0046 ◽

2016 ◽

Vol 32 (4) ◽

pp. 887-905 ◽

Cited By ~ 4

Author(s):

Luciana Dalla Valle

Keyword(s):

Selection Bias ◽

Small Businesses ◽

Real Data ◽

Sources Of Information ◽

Quality Of Data ◽

Official Statistics ◽

Step Method ◽

Development Education ◽

Self Selection ◽

Available Information

Official statistics are a fundamental source of publicly available information that periodically provides a great amount of data on all major areas of citizens’ lives, such as economics, social development, education, and the environment. However, these extraordinary sources of information are often neglected, especially by business and industrial statisticians. In particular, data collected from small businesses, like small and medium-sized enterprises (SMEs), are rarely integrated with official statistics data. In official statistics data integration, the quality of data is essential to guarantee reliable results. Considering the analysis of surveys on SMEs, one of the most common issues related to data quality is the high proportion of nonresponses that leads to self-selection bias. This work illustrates a flexible methodology to deal with self-selection bias, based on the generalization of Heckman’s two-step method with the introduction of copulas. This approach allows us to assume different distributions for the marginals and to express various dependence structures. The methodology is illustrated through a real data application, where the parameters are estimated according to the Bayesian approach and official statistics data are incorporated into the model via informative priors.

ANALISIS PERBANDINGAN KINERJA PERUSAHAAN DOMESTIK DAN ASING DENGAN MENGGUNAKAN ANALISIS RASIO MODAL SAHAM

Jurnal Riset Keuangan Dan Akuntansi ◽

10.25134/jrka.v5i1.1918 ◽

2019 ◽

Vol 5 (1) ◽

Author(s):

Syahrul Syarifudin

Keyword(s):

Information Support ◽

Price Ratio ◽

Sources Of Information ◽

Return On Equity ◽

Foreign Companies ◽

Significant Difference ◽

Equity Ratio ◽

Capitalization Rate ◽

Domestic Companies ◽

Better Than

Some companies operating in Indonesia are foreign-owned. People tend to judge that foreign companies perform better than domestic companies. This is due to the assumption that foreign companies have relatively larger capital and better technology and expertise than domestic companies. Another presumption is that before, during, and after the crisis, foreign-owned companies performed better than domestic companies. A stock capital ratio analysis can be used to assess how well or badly a company performs. This analysis shows the rate of return on equity, the earnings per share ratio, the profit price ratio, the capitalization rate, and dividend income, so it can support investors and potential investors as a source of information when investing in a company. The data analysis using the t-test (difference test) found no significant difference in the return on equity ratio, the earnings per share ratio, the profit price ratio, the capitalization rate, or dividend income. Thus the performance of domestic companies is not significantly different from the performance of foreign companies. Keywords: earnings per share, profit ratio, capitalization ratio

Fast effect size shrinkage software for beta-binomial models of allelic imbalance

F1000Research ◽

10.12688/f1000research.20916.2 ◽

2020 ◽

Vol 8 ◽

pp. 2024

Author(s):

Joshua P. Zitovsky ◽

Michael I. Love

Keyword(s):

Allelic Imbalance ◽

Real Data ◽

Shrinkage Estimators ◽

Data Set ◽

Bayesian Shrinkage ◽

In Cis ◽

Posterior Estimation ◽

Binomial Models ◽

Better Than ◽

Diploid Organism

Allelic imbalance occurs when the two alleles of a gene are differentially expressed within a diploid organism and can indicate important differences in cis-regulation and epigenetic state across the two chromosomes. Because of this, the ability to accurately quantify the proportion at which each allele of a gene is expressed is of great interest to researchers. This becomes challenging in the presence of small read counts and/or sample sizes, which can cause estimators for allelic expression proportions to have high variance. Investigators have traditionally dealt with this problem by filtering out genes with small counts and samples. However, this may inadvertently remove important genes that have truly large allelic imbalances. Another option is to use pseudocounts or Bayesian estimators to reduce the variance. To this end, we evaluated the accuracy of four different estimators, the latter two of which are Bayesian shrinkage estimators: maximum likelihood, adding a pseudocount to each allele, approximate posterior estimation of GLM coefficients (apeglm) and adaptive shrinkage (ash). We also wrote C++ code to quickly calculate ML and apeglm estimates and integrated it into the apeglm package. The four methods were evaluated on two simulations and one real data set. Apeglm consistently performed better than ML according to a variety of criteria, and generally outperformed use of pseudocounts as well. Ash also performed better than ML in one of the simulations, but in the other performance was more mixed. Finally, when compared to five other packages that also fit beta-binomial models, the apeglm package was substantially faster and more numerically reliable, making our package useful for quick and reliable analyses of allelic imbalance. Apeglm is available as an R/Bioconductor package at http://bioconductor.org/packages/apeglm.
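The two non-Bayesian estimators named in the abstract above can be sketched in a few lines for a single gene in a single sample. The function names and the default pseudocount of 1 per allele are assumptions; the abstract does not state the pseudocount size, and the apeglm and ash shrinkage estimators are not reproduced here.

```python
def ml_proportion(alt_count, ref_count):
    """Maximum likelihood estimate of the alternative-allele proportion."""
    total = alt_count + ref_count
    if total == 0:
        raise ValueError("no reads cover this gene")
    return alt_count / total

def pseudocount_proportion(alt_count, ref_count, pseudo=1.0):
    """Estimate after adding the same pseudocount to each allele.

    This pulls the estimate toward 0.5, most strongly when counts are small.
    """
    return (alt_count + pseudo) / (alt_count + ref_count + 2.0 * pseudo)
```

For two reads that both carry the alternative allele, `ml_proportion(2, 0)` returns 1.0, while `pseudocount_proportion(2, 0)` returns 0.75. This illustrates the high variance of the ML estimate at small read counts that motivates shrinkage.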

Nonpreemptive Goal Programing Method in Optimization Nurse Scheduling by Considering Education Level

Jurnal ILMU DASAR ◽

10.19184/jid.v22i2.16939 ◽

2021 ◽

Vol 22 (2) ◽

pp. 85

Author(s):

Fitriani Utina ◽

Lailany Yahya ◽

Nurwan Nurwan

Keyword(s):

Goal Programming ◽

Undergraduate Education ◽

Optimal Solution ◽

Education Level ◽

Programming Method ◽

Nurse Scheduling ◽

Optimal Service ◽

Schedule Design ◽

Good Nurse ◽

Better Than

Nurse scheduling is one of the problems that often arise in hospital management systems. The head of the ICU room and the nurses cooperate in making a good nurse schedule so that optimal service can be provided. In this paper, we study a hospital nurse schedule design that considers the nurses' education level and the provision of holidays. Nurses with undergraduate education (S1) act as leaders on every shift and are accompanied by nurses with diploma education (D3). The scheduling model in this study uses the nonpreemptive goal programming method and LINGO 11.0 software. Preparing the nurse schedule with this method can optimize the number of nurses needed per shift based on education level. The data were obtained by collecting administrative data at Aloei Saboe Gorontalo hospital; the data used are the schedule published by the head of the ICU room. In making a nurse schedule, there are limitations to consider, such as hospital regulations. The study obtained an optimal solution that satisfies all the desired constraints. Computational results show that nurse scheduling using the nonpreemptive goal programming method and LINGO 11.0 software is better than the schedule created manually. Every shift has a maximum of one leader with an undergraduate education (S1) background, accompanied by a nurse with a diploma education (D3) background. Keywords: scheduling, goal programming, nonpreemptive goal programming.