L1EM: A tool for accurate locus specific LINE-1 RNA quantification

bioRxiv

10.1101/714014

2019

Author(s):

Wilson McKerrow

David Fenyö

Keyword(s):

Expectation Maximization

Expectation Maximization Algorithm

Simulated Data

Cellular Damage

Genomic Locus

Protein Coding

Disease States

RNA Quantification

Long Read

Specific Line

Abstract

Motivation: LINE-1 elements are retrotransposons that are capable of copying their sequence to new genomic loci. LINE-1 derepression is associated with a number of disease states and has the potential to cause significant cellular damage. Because LINE-1 elements are repetitive, it is difficult to quantify RNA at specific LINE-1 loci and to separate transcripts with protein-coding capability from other sources of LINE-1 RNA.

Results: We provide a tool, L1-EM, that uses the expectation maximization algorithm to quantify LINE-1 RNA at each genomic locus, separating transcripts that are capable of generating retrotransposition from those that are not. We show the accuracy of L1-EM on simulated data and against long-read sequencing from HEK cells.

Availability: L1-EM is written in Python. The source code, along with the necessary annotations, is available at https://github.com/FenyoLab/L1EM and distributed under GPLv3.

L1EM: a tool for accurate locus specific LINE-1 RNA quantification

Bioinformatics

10.1093/bioinformatics/btz724

2019

Cited By ~ 3

Author(s):

Wilson McKerrow

David Fenyö

Keyword(s):

Expectation Maximization Algorithm

Simulated Data

Cellular Damage

Supplementary Information

Genomic Locus

Protein Coding

Disease States

RNA Quantification

Long Read

Specific Line

Abstract

Motivation: LINE-1 elements are retrotransposons that are capable of copying their sequence to new genomic loci. LINE-1 derepression is associated with a number of disease states and has the potential to cause significant cellular damage. Because LINE-1 elements are repetitive, it is difficult to quantify LINE-1 RNA at specific loci and to separate transcripts with protein-coding capability from other sources of LINE-1 RNA.

Results: We provide a tool, L1EM, that uses the expectation maximization algorithm to quantify LINE-1 RNA at each genomic locus, separating transcripts that are capable of generating retrotransposition from those that are not. We show the accuracy of L1EM on simulated data and against long-read sequencing from HEK cells.

Availability and implementation: L1EM is written in Python. The source code, along with the necessary annotations, is available at https://github.com/FenyoLab/L1EM and distributed under GPLv3.

Supplementary information: Supplementary data are available at Bioinformatics online.
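The abstract names the expectation maximization algorithm but does not give L1EM's model. As a rough illustration of the general technique only, the sketch below uses EM to split multi-mapping reads among candidate loci, assuming each read carries a likelihood P(read | locus) for every locus it aligns to. It does not model L1EM's separation of retrotransposition-competent transcripts from other LINE-1 RNA, and the function name `em_locus_abundance` and the locus labels are hypothetical.

```python
from typing import Dict, List


def em_locus_abundance(
    reads: List[Dict[str, float]],
    n_iter: int = 500,
    tol: float = 1e-12,
) -> Dict[str, float]:
    """Estimate the fraction of reads originating from each locus via EM.

    Hypothetical helper: each read is a dict mapping a candidate locus
    to P(read | locus). This is a generic multi-mapping EM, not L1EM itself.
    """
    loci = sorted({locus for read in reads for locus in read})
    # Start from a uniform abundance over every candidate locus.
    theta = {locus: 1.0 / len(loci) for locus in loci}
    for _ in range(n_iter):
        expected = {locus: 0.0 for locus in loci}
        for read in reads:
            # E-step: posterior probability that this read came from each locus.
            weights = {locus: theta[locus] * lik for locus, lik in read.items()}
            total = sum(weights.values())
            if total == 0.0:
                continue  # read has zero likelihood under current estimates
            for locus, w in weights.items():
                expected[locus] += w / total
        n_assigned = sum(expected.values())
        if n_assigned == 0.0:
            raise ValueError("no read has positive likelihood at any locus")
        # M-step: new abundance = expected share of reads assigned to each locus.
        new_theta = {locus: c / n_assigned for locus, c in expected.items()}
        delta = max(abs(new_theta[locus] - theta[locus]) for locus in loci)
        theta = new_theta
        if delta < tol:
            break
    return theta


reads = (
    [{"locus_A": 1.0}] * 6
    + [{"locus_B": 1.0}] * 2
    + [{"locus_A": 1.0, "locus_B": 1.0}] * 4
)
print(em_locus_abundance(reads))  # abundances close to 0.75 and 0.25
```

With six reads unique to one locus, two unique to another, and four shared equally, the shared reads end up split roughly 3:1 in proportion to the unique evidence, which is the behavior that lets EM assign ambiguous reads to individual loci rather than discarding them.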

Multiple scaled contaminated normal distribution and its application in clustering

Statistical Modelling

10.1177/1471082x19890935

2019

pp. 1471082X1989093

Cited By ~ 1

Author(s):

Antonio Punzo

Cristina Tortora

Keyword(s):

Expectation Maximization

Heavy Tails

Expectation Maximization Algorithm

Simulated Data

Principal Component

Multivariate Normal

Degree Of Contamination

Robust Estimates

Directional Detection

Heavy Tailed

The multivariate contaminated normal (MCN) distribution is a simple heavy-tailed generalization of the multivariate normal (MN) distribution for modeling elliptically contoured scatters in the presence of mild outliers (also referred to as ‘bad’ points herein) and for automatically detecting bad points. The price of these advantages is two additional parameters: the proportion of good observations and the degree of contamination. However, in a multivariate setting, having only one proportion of good observations and only one degree of contamination may be limiting. To overcome this limitation, we propose a multiple scaled contaminated normal (MSCN) distribution. Among its parameters is an orthogonal matrix Γ. In the space spanned by the vectors (principal components) of Γ, there is a proportion of good observations and a degree of contamination for each component. Moreover, each observation has a posterior probability of being good with respect to each principal component. Using this probability, the method provides directional robust estimates of the parameters of the nested MN and automatic directional detection of bad points. The term ‘directional’ indicates that the method works separately for each principal component. Mixtures of MSCN distributions are also proposed, and an expectation-maximization algorithm is used for parameter estimation. Real and simulated data are considered to show the usefulness of our mixture with respect to well-established mixtures of symmetric distributions with heavy tails.
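To make the per-component parameters concrete, here is a sketch of an MSCN-style density and the per-direction posterior probability of being a good point. It assumes the density factorizes, in the rotated coordinates y_h = γ_hᵀ(x − μ), into one univariate contaminated normal per principal direction, with a good part N(0, λ_h) weighted by α_h and a bad part N(0, η_h λ_h) weighted by 1 − α_h; check the paper for the exact parameterization. The function names and the convention of passing Γ as a list of its column vectors are illustrative choices.

```python
import math
from typing import List, Sequence, Tuple


def _normal_pdf(y: float, var: float) -> float:
    """Density of a univariate N(0, var) at y."""
    return math.exp(-0.5 * y * y / var) / math.sqrt(2.0 * math.pi * var)


def mscn_density(
    x: Sequence[float],
    mu: Sequence[float],
    directions: Sequence[Sequence[float]],  # columns of Gamma, assumed orthonormal
    lam: Sequence[float],    # variance along each principal direction
    alpha: Sequence[float],  # proportion of good observations per direction
    eta: Sequence[float],    # degree of contamination per direction (> 1)
) -> Tuple[float, List[float]]:
    """Return the assumed MSCN density at x and P(good) for each direction."""
    centered = [xi - mi for xi, mi in zip(x, mu)]
    density = 1.0
    p_good = []
    for g_h, lam_h, alpha_h, eta_h in zip(directions, lam, alpha, eta):
        # Coordinate of the centered point along the h-th principal direction.
        y_h = sum(g * c for g, c in zip(g_h, centered))
        good = alpha_h * _normal_pdf(y_h, lam_h)
        bad = (1.0 - alpha_h) * _normal_pdf(y_h, eta_h * lam_h)
        density *= good + bad
        # Posterior probability that x is a good point along this direction.
        p_good.append(good / (good + bad))
    return density, p_good


# A point far out along the first axis is flagged as bad only in that direction.
density, p_good = mscn_density(
    x=[6.0, 0.0], mu=[0.0, 0.0],
    directions=[[1.0, 0.0], [0.0, 1.0]],
    lam=[1.0, 1.0], alpha=[0.9, 0.9], eta=[10.0, 10.0],
)
print(p_good)
```

Because Γ is orthogonal, the change of coordinates has unit Jacobian, so the product of per-direction densities is a valid density in the original space; the per-direction posteriors are what make the bad-point detection "directional".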

Evaluation of unconstraining methods in airlines’ revenue management systems

EMC Review - Časopis za ekonomiju - APEIRON

10.7251/emc1902368b

2020

Vol 18 (2)

Author(s):

Ružica Škurla Babić

Maja Ozmec-Ban

Jasmin Bajić

Keyword(s):

Revenue Management

Censored Data

Expectation Maximization

Expectation Maximization Algorithm

Simulated Data

Management Systems

Data Sets

Booking Limits

Censored Observations

Simulated Data Sets

Airline revenue management systems are used to calculate booking limits on each fare class to maximize expected revenue for all future flight departures. Their performance depends critically on the forecasting module, which uses historical data to project future demand. Those data are censored (constrained) by the imposed booking limits and do not represent true demand, since rejected requests are not recorded. Eight unconstraining methods, which transform the censored data into more accurate estimates of actual historical demand, are analyzed and their accuracy is compared; they range from naive methods, such as discarding all censored observations, to complex ones, such as the Expectation Maximization Algorithm and the Projection Detruncation Algorithm. The methods are evaluated and tested on simulated data sets generated by the ICE V2.0 software: first, data sets representing true demand were produced; then the aircraft capacity was reduced and EMSRb booking limits were calculated for every booking class. These limits constrained the original demand data at various points of the booking process, yielding the corresponding censored data sets. The unconstraining methods were then applied to the censored observations, and the resulting unconstrained data were compared with the actual demand data to evaluate each method's performance.
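The abstract does not spell out the EM unconstraining procedure. A minimal sketch, assuming normally distributed demand and that a constrained observation only tells us true demand was at least the recorded bookings, alternates an E-step that replaces each constrained value with the conditional moments of a truncated normal and an M-step that re-estimates the mean and variance. The function name `em_unconstrain` and the normal-demand assumption are illustrative and may differ from the variant evaluated in the paper.

```python
import math
from typing import List, Tuple


def _std_normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _std_normal_sf(z: float) -> float:
    # Survival function 1 - Phi(z), computed via erfc for accuracy in the tail.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def em_unconstrain(
    bookings: List[float],
    constrained: List[bool],
    n_iter: int = 1000,
    tol: float = 1e-10,
) -> Tuple[float, float]:
    """Estimate (mean, std) of assumed-normal demand from partly censored bookings.

    constrained[i] is True when the booking limit was hit, so true demand for
    that observation is only known to be >= bookings[i]. Assumes the bookings
    are not all identical (the starting variance must be positive).
    """
    n = len(bookings)
    # Start from the naive estimates that treat every booking as true demand.
    mu = sum(bookings) / n
    var = sum((b - mu) ** 2 for b in bookings) / n
    for _ in range(n_iter):
        sigma = math.sqrt(var)
        s1 = s2 = 0.0
        for b, cens in zip(bookings, constrained):
            if not cens:
                s1 += b
                s2 += b * b
                continue
            # E-step: moments of demand conditional on demand >= b.
            z = (b - mu) / sigma
            mills = _std_normal_pdf(z) / max(_std_normal_sf(z), 1e-300)
            m = mu + sigma * mills                    # E[D | D >= b]
            v = var * (1.0 + z * mills - mills ** 2)  # Var[D | D >= b]
            s1 += m
            s2 += v + m * m
        # M-step: maximum-likelihood normal parameters from expected moments.
        new_mu = s1 / n
        new_var = s2 / n - new_mu ** 2
        done = abs(new_mu - mu) < tol and abs(new_var - var) < tol
        mu, var = new_mu, new_var
        if done:
            break
    return mu, math.sqrt(var)


bookings = [8.0, 10.0, 12.0, 12.0, 12.0]
constrained = [False, False, False, True, True]
mu, sd = em_unconstrain(bookings, constrained)
print(mu, sd)
```

Because each constrained observation is replaced by its expected value above the booking limit, the estimated mean always exceeds the naive mean of the recorded bookings; with no constrained observations the procedure reduces to the ordinary sample mean and (population) standard deviation.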
