The influence of the cardinality of the alphabet on the quality of reconstruction of a symbolic periodic sequence from a sequence with noise

Author(s):  
Галина Николаевна Жукова ◽  
Михаил Васильевич Ульянов

The relevance of this study stems from the wide range of applied problems in real-world data processing and analysis. In such problems it is natural to encode information using symbols from a finite alphabet. By varying the cardinality of the alphabet, the symbolic representation can provide a level of detail in the description of the process that is sufficient for real-world data analysis. However, in a number of subject areas where the trajectories of the processes under study can be encoded symbolically, researchers face distortions, noise, and fragmented information. This occurs in bioinformatics, medicine, the digital economy, time series forecasting, and business process analysis. Periodic processes are widely represented in these subject areas. In the absence of noise, such processes correspond to periodic symbolic sequences, i.e., words over a finite alphabet. Instead of the expected periodic symbolic sequence, however, a researcher often obtains as experimental data a sequence distorted by noise of various origins.
Under these conditions, identifying periodicity, which includes determining both the periodically repeating symbolic fragment and its length (hereinafter called the period), requires reducing the effect of noise on the experimental results. The article addresses the problem of recovering symbolic periodic sequences distorted by insertion noise as well as by substitution and deletion of symbols. Since the level of detail of a symbolic description of a process is determined by the cardinality of the alphabet, it is of interest to study how this level of detail affects the possibility of recovering complete information about the original periodic sequence. The article presents an experimental study of how the quality characteristics of the period recovery method proposed by the authors depend on the cardinality of the alphabet. For alphabets of different cardinalities, the proportion of sequences with a satisfactorily reconstructed period and the relative error in determining the period length are reported. The quality of reconstruction of the periodically repeating fragment is estimated by the ratio of the edit distance from the reconstructed periodic sequence to the original strictly periodic sequence.
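The experimental setup described above can be sketched in Python as follows. This is a minimal illustration under assumptions the abstract does not state: an i.i.d. noise model (each symbol may be preceded by a random insertion, then deleted or substituted with fixed probabilities), the unit-cost Levenshtein edit distance, normalization of that distance by the sequence length, and a relative period error of |found − true| / true. All function names (`periodic_sequence`, `add_noise`, `edit_distance`, `quality`) are hypothetical. The authors' period recovery method itself is not reproduced here; `quality` only scores a candidate fragment against the true one.

```python
import random

def periodic_sequence(fragment: str, length: int) -> str:
    """Strictly periodic sequence: `fragment` repeated and cut to `length` symbols."""
    reps = length // len(fragment) + 1
    return (fragment * reps)[:length]

def add_noise(seq: str, alphabet: str, p_ins: float, p_sub: float,
              p_del: float, rng: random.Random) -> str:
    """Distort `seq` with random insertions, substitutions and deletions.

    Assumed (hypothetical) noise model: before each symbol a random symbol is
    inserted with probability p_ins; the symbol itself is then deleted with
    probability p_del or replaced by a different symbol with probability p_sub.
    """
    out = []
    for ch in seq:
        if rng.random() < p_ins:
            out.append(rng.choice(alphabet))          # insertion noise
        r = rng.random()
        others = [a for a in alphabet if a != ch]
        if r < p_del:
            continue                                  # deletion noise
        if r < p_del + p_sub and others:
            out.append(rng.choice(others))            # substitution noise
        else:
            out.append(ch)                            # symbol kept intact
    return "".join(out)

def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance computed with a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]

def quality(found_fragment: str, true_fragment: str, length: int) -> tuple[float, float]:
    """Score a recovered period against the true one.

    Returns (edit distance between the two periodic sequences divided by
    `length`, relative error of the period length). Dividing by `length`
    is an assumption, not taken from the source.
    """
    original = periodic_sequence(true_fragment, length)
    rebuilt = periodic_sequence(found_fragment, length)
    dist = edit_distance(rebuilt, original) / length
    rel_err = abs(len(found_fragment) - len(true_fragment)) / len(true_fragment)
    return dist, rel_err

if __name__ == "__main__":
    rng = random.Random(0)
    for alphabet in ("ab", "abcd", "abcdefgh"):   # alphabets of growing cardinality
        original = periodic_sequence(alphabet, 60)
        noisy = add_noise(original, alphabet, 0.05, 0.05, 0.05, rng)
        print(len(alphabet), edit_distance(noisy, original) / len(original))
    print(quality("abcab", "abcab", 50))          # (0.0, 0.0)
```

Varying the `alphabet` argument mirrors the study's variation of alphabet cardinality. Passing the output of an actual period recovery method as `found_fragment` would yield a normalized edit distance and a relative period-length error in the spirit of the measures the abstract reports, under the assumptions stated above.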

2020 ◽  
Vol 13 (11) ◽  
pp. 371
Author(s):  
Maximilian J. Hochmair ◽  
Hannah Fabikan ◽  
Oliver Illini ◽  
Christoph Weinlinger ◽  
Ulrike Setinek ◽  
...  

In clinical practice, patients with anaplastic lymphoma kinase (ALK)-rearrangement–positive non–small-cell lung cancer commonly receive sequential treatment with ALK tyrosine kinase inhibitors. The third-generation agent lorlatinib has been shown to inhibit a wide range of ALK resistance mutations and thus offers potential benefit in later lines, although real-world data are lacking. This multicenter study retrospectively investigated later-line, real-world use of lorlatinib in patients with advanced ALK- or ROS1-positive lung cancer. Fifty-one patients registered in a compassionate-use program in Austria who received second- or later-line lorlatinib between January 2016 and May 2020 were included in this retrospective real-world data analysis. Median follow-up was 25.3 months. Median duration of lorlatinib treatment was 4.4 months for ALK-positive and 12.2 months for ROS1-positive patients. ALK-positive patients showed a response rate of 43.2%, while 85.7% of the ROS1-positive patients were considered responders. Median overall survival from lorlatinib initiation was 10.2 and 20.0 months for the ALK- and ROS1-positive groups, respectively. In the ALK-positive group, lorlatinib demonstrated efficacy after both brigatinib and alectinib. Lorlatinib treatment was well tolerated. Later-line lorlatinib treatment can induce sustained responses in patients with advanced ALK- and ROS1-positive lung cancer.


2020 ◽  
Author(s):  
Chethan Sarabu ◽  
Sandra Steyaert ◽  
Nirav Shah

Environmental allergies cause significant morbidity across a wide range of demographic groups. This morbidity could be mitigated through individualized predictive models capable of guiding personalized preventive measures. We developed a predictive model by integrating smartphone sensor data with symptom diaries maintained by patients. The machine learning model was found to be highly predictive, with an accuracy of 0.801. Such models based on real-world data can guide clinical care for patients and providers, reduce the economic burden of uncontrolled allergies, and set the stage for subsequent research pursuing allergy prediction and prevention. Moreover, this study offers proof of principle regarding the feasibility of building clinically useful predictive models from 'messy,' participant-derived real-world data.


Author(s):  
Giovanni Corrao ◽  
Giovanni Alquati ◽  
Giovanni Apolone ◽  
Andrea Ardizzoni ◽  
Giuliano Buzzetti ◽  
...  

The current COVID-19 pandemic crisis has made it even clearer that answering several questions public health must face requires access to good-quality data. This paper discusses the value and potential of health data and the critical issues that currently hinder access to them. In particular, the paper (i) focuses on the definition of “real-world data”; (ii) reviews the availability of real-world data in our country; (iii) discusses their potential, with particular focus on the possibility of improving knowledge of the quality of care provided by the health system; (iv) emphasizes that the availability of data alone is not sufficient to increase our knowledge, underlining that innovative analysis methods (e.g., artificial intelligence techniques) must be framed in the paradigm of clinical research; and (v) addresses some ethical issues related to their use. The paper proposes an alliance between organizations interested in promoting research aimed at collecting scientifically solid evidence to support the clinical governance of public health.


2019 ◽  
Vol 30 ◽  
pp. v744-v745
Author(s):  
T. Kosmidis ◽  
B. Athanasakou ◽  
P.A. Kosmidis

2013 ◽  
Vol 16 (7) ◽  
pp. A511
Author(s):  
S. Purwins ◽  
C. Spehr ◽  
M. Augustin ◽  
M.A. Radtke ◽  
K. Reich ◽  
...  

2019 ◽  
Vol 37 (15_suppl) ◽  
pp. e18061-e18061
Author(s):  
Hui-Li Wong ◽  
Koen Degeling ◽  
Azim Jalali ◽  
Jeremy David Shapiro ◽  
Suzanne Kosmider ◽  
...  

Background: The wide range of possible combinations and sequences available for metastatic colorectal cancer (mCRC) treatment presents a major challenge to clinicians, who need to determine the optimal approach for an individual patient or patient subset. In the absence of clinical trial evidence, real-world data are an increasingly valuable resource that can be used not only to understand treatment patterns and outcomes in routine practice but also to define an optimal treatment strategy for individual patients across multiple lines of therapy. Methods: Real-world data from an Australian mCRC registry were used to develop an interactive data visualization tool that displays treatment variation and can be customized to different levels of detail and to specific patient subsets based on patient and disease characteristics. Next, a discrete event simulation model was developed to predict progression-free survival (PFS) and overall survival (OS) for first-line palliative treatment with doublet chemotherapy alone or with bevacizumab, based on data from 867 patients treated accordingly. Results: Of 2694 Australian patients enrolled, 2057 (76%) started first-line treatment with chemotherapy and/or a biologic agent; 1087 (40%) and 428 (16%) received second- and third-line therapy, respectively. Combined, these three lines of treatment accounted for 733 unique sequences. After recoding treatment to the most intensive chemotherapy and the first biologic agent received, 472 unique sequences remained. In exploratory analyses, the simulation model estimated that median first-line PFS (95% CI) for 219 (25%) patients could be improved from 175 (156, 199) to 269 (247, 293) days by targeting a different treatment. Conclusions: This was an initial exploration of the potential for data visualization and simulation modeling to inform understanding of practice in mCRC and to guide clinical decision making. Such tools allow clinicians and health system providers to define variation in practice patterns and to identify opportunities to improve care and outcomes. Ultimately, the aim is to improve the delivery of personalized cancer care, where other applications such as conditional survival and cost-effectiveness analyses may be useful.


Oncology ◽  
2021 ◽  
Vol 99 (Suppl. 1) ◽  
pp. 3-7
Author(s):  
George D. Demetri ◽  
Silvia Stacchiotti

Real-world data are defined as data relating to any aspect of a patient’s health status collected in the context of routine health surveillance and medical care delivery. Sources range from insurance billing claims to electronic surveillance data (e.g., activity trackers). Real-world data derive from large populations in diverse clinical settings and can therefore be extrapolated more readily than clinical trial data to patients in different clinical settings or with a variety of comorbidities. Real-world data are used to generate real-world evidence, which might be regarded as a “meta-analysis” of accumulated real-world data. Regulatory authorities increasingly recognize the value of real-world data and real-world evidence, especially for rare diseases, for which it may be practically infeasible to conduct randomized controlled trials. However, the quality of real-world evidence depends on the quality of the data collected, which in turn depends on a correct pathological diagnosis and the homogeneous behaviour of a reliably defined and consistent disease entity. Because each of the more than 80 soft tissue sarcoma (STS) types represents a distinct disease entity, the situation is exceedingly complicated. Discordant diagnoses, which affect data quality, present a major challenge for the use of real-world data. Because real-world data are difficult to collect, collaboration across sarcoma reference institutions and sophisticated information technology solutions are required before the potential of real-world evidence to inform decision-making in the management of STS can be fully exploited.

