Improving clustering by imposing network information

2015 ◽  
Vol 1 (7) ◽  
pp. e1500163 ◽  
Author(s):  
Susanne Gerber ◽  
Illia Horenko

Cluster analysis is one of the most popular data analysis tools in a wide range of applied disciplines. We propose and justify a computationally efficient and straightforward-to-implement way of imposing the available information from networks/graphs (a priori available in many application areas) on a broad family of clustering methods. The introduced approach is illustrated on the problem of noninvasive unsupervised brain signal classification. This task faces several difficulties, such as nonstationary noisy signals and a small sample size, combined with a high-dimensional feature space and huge noise-to-signal ratios. Applying this approach results in an exact unsupervised classification of very short signals, opening new possibilities for clustering methods in the area of noninvasive brain-computer interfaces.

2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Maria El Abbassi ◽  
Jan Overbeck ◽  
Oliver Braun ◽  
Michel Calame ◽  
Herre S. J. van der Zant ◽  
...  

Abstract Unsupervised machine learning, and in particular data clustering, is a powerful approach for the analysis of datasets and identification of characteristic features occurring throughout a dataset. It is gaining popularity across scientific disciplines and is particularly useful for applications without a priori knowledge of the data structure. Here, we introduce an approach for unsupervised data classification of any dataset consisting of a series of univariate measurements. It is therefore ideally suited for a wide range of measurement types. We apply it to the field of nanoelectronics and spectroscopy to identify meaningful structures in datasets. We also provide guidelines for the estimation of the optimum number of clusters. In addition, we have performed an extensive benchmark of novel and existing machine learning approaches and observe significant performance differences. Careful selection of the feature space construction method and clustering algorithms for a specific measurement type can therefore greatly improve classification accuracies.


2019 ◽  
pp. 40-46 ◽  
Author(s):  
V.V. Savchenko ◽  
A.V. Savchenko

We consider the task of automated quality control of sound recordings containing voice samples of individuals. It is shown that the most acute problem in this task is the small sample size. To overcome this problem, we propose a novel method of acoustic measurements based on the relative stability of the pitch frequency within a voice sample of short duration. An example of its practical implementation using an inter-periodic accumulation of a speech signal is considered. An experimental study with specially developed software provides statistical estimates of the effectiveness of the proposed method in noisy environments. It is shown that this method rejects an audio recording as unsuitable for voice biometric identification with a probability of 0.95 or more for a signal-to-noise ratio below 15 dB. The obtained results are intended for use in developing new, and modifying existing, systems for the collection and automated quality control of biometric personal data. The article is intended for a wide range of specialists in the field of acoustic measurements and digital processing of speech signals, as well as for practitioners who organize the work of authorized organizations in preparing samples of biometric personal data for registration.


Author(s):  
Marianna Rita Stancampiano ◽  
Kentaro Suzuki ◽  
Stuart O’Toole ◽  
Gianni Russo ◽  
Gen Yamada ◽  
...  

Abstract In the newborn, penile length is determined by a number of androgen-dependent and androgen-independent factors. The current literature suggests that there are inter-racial differences in stretched penile length in the newborn. Although congenital micropenis should be defined as a stretched penile length more than 2.5 SDS below the mean for the corresponding population and gestation, a pragmatic approach would be to evaluate all boys with a stretched penile length below 2 cm, as congenital micropenis can be a marker for a wide range of endocrine conditions. However, it remains unclear whether the state of micropenis itself is associated with any long-term consequences. There is a lack of systematic studies comparing the impact of different therapeutic options on long-term outcomes in terms of genital appearance, quality of life and sexual satisfaction. To date, research has been hampered by small sample sizes and the inclusion of a wide range of heterogeneous diagnoses; for these reasons, condition-specific outcomes have been difficult to compare between studies. Lastly, there is a need for a greater collaborative effort in collecting standardized data so that all real-world or experimental interventions performed at an early age can be studied systematically into adulthood.


Water ◽  
2019 ◽  
Vol 11 (9) ◽  
pp. 1861 ◽  
Author(s):  
Hossein Tabari

Analysis of hydrological extremes is challenging because of their rarity, small sample sizes, and the interconnections between different types of extremes. It is further complicated by the untrustworthy representation of meso-scale processes involved in extreme events in models with coarse spatial and temporal scales, and by biased or missing observations caused by technical difficulties during extreme conditions. The special issue “Statistical Analysis and Stochastic Modelling of Hydrological Extremes”, motivated by the need to apply and develop innovative stochastic and statistical approaches to analyze hydrological extremes under current and future climate conditions, encompasses 13 research papers. Case studies presented in the papers exploit a wide range of innovative techniques for the analysis of hydrological extremes. The papers focus on six topics: historical changes in hydrological extremes, projected changes in hydrological extremes, downscaling of hydrological extremes, early warning and forecasting systems for drought and flood, interconnections of hydrological extremes, and applicability of satellite data for hydrological studies. This Editorial provides an overview of the covered topics and reviews the case studies relevant to each topic.


2016 ◽  
Vol 13 (1) ◽  
pp. 713 ◽  
Author(s):  
Gülşah Başol ◽  
Mehmet Fatih Doguyurt ◽  
Seda Demir

<p>This content analysis study aims to methodologically evaluate the appropriateness of meta-analyses conducted on Turkish samples on a variety of topics. Through an exhaustive literature review, 80 meta-analyses were gathered and coded using a detailed Meta-Analysis Evaluation Form. The form consisted of 59 items (1 = Not Present, 2 = Present and 3 = Not Mentioned) regarding both study and substantive characteristics. Two researchers coded the studies, and a reliability check on five studies indicated no problems with coding consistency (Kappa = .90). According to the results, the problem encountered most often in the meta-analyses was reporting both the fixed and random effects analyses without making an a priori decision about the model choice. It was found that 60.0% of the meta-analyses investigated in the current study excluded studies conducted abroad, which resulted in underrepresentation of the literature. Furthermore, the studies suffered from small sample size issues. The methodology (how the studies were selected, the coding form, the reliability of the codings, etc.) was not explained clearly in more than a quarter of the studies. Therefore, it would be hard to claim that they have a sufficient level of internal and external validity. It is hoped that researchers will benefit from the results of the current study to conduct better-quality meta-analyses in the future.</p><p> </p><p><strong>Summary</strong></p><p>The aim of this content analysis study is to methodologically evaluate the meta-analysis studies conducted in Turkey. Eighty meta-analysis studies from the Turkish literature were coded using the Meta-Analysis Evaluation Form. The evaluation form comprises 59 items (answerable as Yes, No, or Not Mentioned) covering the bibliographic details of the studies and the variation in how the meta-analysis method was used. 
Two researchers carried out the coding; beforehand, the agreement between their codings was calculated on a pilot of five studies, and the Kappa coefficient (Kappa = .90) was found to be adequate. According to the results, the most prominent problem in the meta-analysis studies was reporting the fixed and random effects models together without choosing between them. In 60% of the studies, the meta-analysis used only studies on Turkish samples, without including studies conducted abroad. In the meta-analyses that did include studies from abroad, their number was very low, so the representativeness of the sample was low. Sample sizes in the meta-analyses were found to be very inadequate or missing altogether. In more than a quarter of the studies, the methodology section did not explain matters such as how the studies were collected, the coding form, and the reliability of the codings. This lowers the reliability and validity of the meta-analysis studies concerned. The present evaluation study is expected to help researchers who work on meta-analysis in the future to produce methodologically higher-quality research.</p>
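The inter-coder agreement statistic reported above (Kappa = .90) can be illustrated with a minimal sketch of Cohen's kappa for two raters. The function name `cohen_kappa` and the example codes are hypothetical and are not taken from the study; the abstract does not say which kappa variant was used, so the standard two-rater form is assumed here.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who coded the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's
    marginal category frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty item list")
    n = len(rater_a)
    # Observed proportion of items on which the two raters agree.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from the product of marginal frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1.0:
        # Both raters used a single identical category; agreement is trivial.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codes on the form's 1/2/3 scale (illustration only).
coder_1 = [1, 1, 2, 2, 3, 3, 1, 2, 3, 1]
coder_2 = [1, 1, 2, 2, 3, 3, 1, 2, 3, 2]
kappa = cohen_kappa(coder_1, coder_2)
```

Because kappa corrects for chance agreement, it is lower than the raw proportion of matching codes whenever the raters disagree on at least one item.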


2020 ◽  
Vol 57 (2) ◽  
pp. 237-251
Author(s):  
Achilleas Anastasiou ◽  
Alex Karagrigoriou ◽  
Anastasios Katsileros

Summary The normal distribution is considered to be one of the most important distributions, with numerous applications in various fields, including the agricultural sciences. The purpose of this study is to evaluate the most popular normality tests, comparing their performance in terms of size (type I error) and power against a large spectrum of distributions, using simulations for various sample sizes and significance levels as well as empirical data from agricultural experiments. The simulation results show that the power of all normality tests is low for small sample sizes, but as the sample size increases, the power increases as well. The results also show that the Shapiro–Wilk test is powerful over a wide range of alternative distributions and sample sizes, especially for asymmetric distributions. Moreover, the D’Agostino–Pearson Omnibus test is powerful for small sample sizes against symmetric alternative distributions, while the same is true of the Kurtosis test for moderate and large sample sizes.


2000 ◽  
Vol 2 (3) ◽  
pp. 29-39 ◽  
Author(s):  
Judy Wollin ◽  
Helen Dale ◽  
Nancy Spenser ◽  
Anne Walsh

Abstract The aim of this retrospective study was to determine from people with multiple sclerosis (MS) and their families what information would assist a person with newly diagnosed MS — in which format, when, and from whom it should be delivered. Thirty-four Queensland, Australia, residents with MS and 18 family members and friends participated in the main study. Participants were self-selected for this purposive, statewide, cross-sectional study. Nine of the respondents answered open-ended questions in addition to the standard questionnaires, and seven respondents gave in-depth interviews. The respondents recommended that people with a recent MS diagnosis and their families be given a wide range of information reflective of their personal needs. The information should be provided in person (in both group and individual sessions). They preferred to receive the information from their physicians and the staff of the Multiple Sclerosis Society. Research aimed at cures and therapies, as well as counseling and support services, should be discussed early in the course of the disease. Because of the small sample size and retrospective design, additional studies with larger populations are suggested to confirm these results and their cross-cultural applicability.


Author(s):  
Elaine Husni ◽  
Madonna Michael

The epidemiological study of psoriatic arthritis (PsA) is quite challenging, as our understanding of the disease is evolving. A wide range of incidence and prevalence is reported among different countries, suggesting that genetic and environmental factors influence the epidemiology of PsA. Other factors contributing to the wide range and variation of PsA epidemiology include age and gender variations, ethnicity, lack of a precise case definition, and small sample size. A high level of suspicion in patients with pre-existing psoriasis, and collaborative efforts shared between primary care physicians (PCPs), dermatologists, and rheumatologists, will enhance early detection and management of PsA, subsequently improving overall patient outcomes and quality of life.


2020 ◽  
Vol 38 (15_suppl) ◽  
pp. e23577-e23577
Author(s):  
Heide Stirnadel-Farrant ◽  
Anadi Mahajan ◽  
Navdeep Dhillon ◽  
Nashita Patel ◽  
Shibani Pokras

Background: Soft tissue sarcoma (STS) is a rare malignancy with an annual incidence rate of <5 cases per 100,000 persons; outcomes for metastatic STS (mSTS) are poor. A targeted literature review was conducted to quantify the efficacy/effectiveness of current mSTS therapies. Methods: A structured search based on the population, intervention, comparator, outcome, study type (PICOS) framework was performed on articles (2009–2019) in MEDLINE, Embase, and Cochrane Central. Limited congress searches (ESMO, AACR, ASCO 2016–2018/2019) were also conducted. Clinical trials (CT) and observational studies (Obs) involving patients (pts, any age) with advanced mSTS receiving any pharmacological intervention were included. After screening, selected efficacy (CT) or effectiveness (Obs) endpoints (including progression-free survival [PFS], overall survival [OS], overall response rate, and duration of response) stratified by line of treatment (LOT, if available) were extracted. Results: Overall, 85 studies (56 CT, 29 Obs) met inclusion criteria; study size was 20–4,274 pts. PFS and OS (from 70 studies) were reported for pts with mSTS treated with a wide range of interventions including doxorubicin, trabectedin, pazopanib, and gemcitabine. Across any LOT, median PFS ranged from 1.5 to 9.3 months in CT and from 2.1 to 11.0 months in Obs; the corresponding ranges for OS were 5.7–28.8 months and 7.0–38.6 months. Median PFS and OS were generally lower with later (vs initial) LOT; few studies assessed ≥4 LOT (Table). Outcome data (any LOT) for trabectedin and pazopanib (the only approved targeted mSTS treatments) are shown in the Table. Conclusions: This review of the efficacy/effectiveness of current treatments highlights the unmet clinical need for therapies that improve survival outcomes in pts with mSTS. Results may be influenced by small sample size, pt population, and care improvements over the period studied. [Table: see text]


Circulation ◽  
2017 ◽  
Vol 135 (suppl_1) ◽  
Author(s):  
Ann Von Holle ◽  
Anne Justice ◽  
Kari E North ◽  
Bárbara Angel ◽  
Estela Blanco ◽  
...  

Dyslipidemia is an important risk factor for chronic cardiometabolic diseases. Lipid traits are highly heritable and there are currently >185 established loci influencing lipid levels in adults. Recent studies have confirmed that variants associated with lipids influence lipid levels across the life course and in ancestrally diverse populations. Given that Hispanics/Latinos (HL) shoulder much of the cardiometabolic burden in the United States, it is important to identify genetic variants that contribute the greatest risk for elevated lipid levels across life stages. Thus, our primary aim is to examine the association of known lipid variants with lipid traits identified in a large study of adult participants from a Chilean infancy cohort of primarily European descent. The sample assessed from 2008 to 2013 (n=546) had genotyping and well-measured lipid phenotypes (median age: 16.8 years, interquartile range: 16.6, 16.9). We assessed single variant associations using linear regression for high density lipoprotein cholesterol (HDL-C), low density lipoprotein cholesterol (LDL-C) and triglycerides (TG), assuming an additive genetic model, adjusted for sex. Additionally, we regressed phenotypes onto weighted trait-specific polygenic risk scores (PRS). Only six variants from the Chilean sample met the a priori threshold of power > 0.8. We found statistically significant effect sizes (mmol/l (se)) for four of the six variants: rs3764261 (0.16 (0.04)) and rs1532085 (0.05 (0.04)) for HDL, and rs1260326 (0.34 (0.15)) and rs964184 (0.33 (0.15)) for TG. For each significant variant, the direction of effect matched the multiethnic adult GWAS from which SNPs were selected. We compared our findings to a previous study in Finnish children at age 18 years (n=1,216) and found an opposite direction of effect for our significant HDL variants. Likewise, when comparing coefficients for the PRS between the Chilean and Finnish youth samples, we found the association to be stronger in the Chilean sample for every trait and gender group with the exception of LDL for males. The lipid loci explained the least total variance for LDL (males=4% and females=5%) and the most variance for HDL (males=20% and females=14%). In conclusion, there is evidence that lipid loci from an HL sample of adolescents show associations similar to those from European children and adults. Despite the small sample size and the possibility of bias with different ancestral groups, we found meaningful and statistically significant associations relating lipid loci in an HL cohort of Chilean adolescents with those found in European ancestral groups. These associations emphasize the importance of adolescence as a time for disease prevention, given studies demonstrating both the persistence of associations between PRS and lipids over the life course and the increasing role PRS plays in predicting disease.
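The weighted polygenic risk score mentioned above can be sketched as follows, assuming the common construction in which each variant's effect-allele dosage (0, 1 or 2 under an additive model) is multiplied by a per-allele weight and the products are summed. The abstract does not give the authors' exact construction, and the function name `weighted_prs`, the variant labels, and the weights below are hypothetical placeholders rather than values from the study.

```python
def weighted_prs(dosages, weights):
    """Weighted polygenic risk score for one participant.

    dosages: dict mapping variant label -> count of effect alleles (0, 1, 2)
    weights: dict mapping variant label -> per-allele effect size
             (assumed to come from an external reference study)
    """
    missing = set(weights) - set(dosages)
    if missing:
        raise KeyError(f"no genotype for variants: {sorted(missing)}")
    for variant in weights:
        if dosages[variant] not in (0, 1, 2):
            raise ValueError(f"dosage for {variant} must be 0, 1 or 2")
    # Additive model: each copy of the effect allele adds its weight.
    return sum(weights[v] * dosages[v] for v in weights)


# Hypothetical weights and genotypes (illustration only, not study data).
example_weights = {"snp_a": 0.20, "snp_b": 0.05}
example_dosages = {"snp_a": 2, "snp_b": 1}
score = weighted_prs(example_dosages, example_weights)
```

A trait-specific score of this kind would then serve as the single predictor in a regression of the lipid phenotype, alongside covariates such as sex.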

