Entropic Ranks: A Methodology for Enhanced, Threshold-Free, Information-Rich Data Partition and Interpretation

2020 ◽  
Vol 10 (20) ◽  
pp. 7077
Author(s):  
Hector-Xavier de Lastic ◽  
Irene Liampa ◽  
Alexandros G. Georgakilas ◽  
Michalis Zervakis ◽  
Aristotelis Chatziioannou

Background: Here, we propose a threshold-free selection method for the identification of differentially expressed features based on robust, non-parametric statistics, ensuring independence from the statistical distribution properties and broad applicability. Such methods can adapt to different initial data distributions, in contrast to statistical techniques based on fixed thresholds. This work aims to propose a methodology that automates and standardizes statistical selection through established measures such as entropy, which is already used in information retrieval from large biomedical datasets. It thus departs from classical fixed-threshold methods, which rely on arbitrary p-value and fold-change cutoffs as selection criteria and whose efficacy also depends on the degree of conformity to parametric distributions. Methods: Our work extends the rank product (RP) methodology with a neutral selection method of high information-extraction capacity. We introduce the calculation of the entropy of the RP distribution to isolate the features of interest by their contribution to its information content. The goal is a methodology for threshold-free identification of the differentially expressed features that are highly informative about the phenomenon under study. Conclusions: Applying the proposed method to microarray (transcriptomic and DNA methylation) and RNAseq count data of varying sizes and noise levels, we observe robust convergence of the different parameterizations to stable cutoff points. Functional analysis through BioInfoMiner and EnrichR was used to evaluate the information potency of the resulting feature lists. Overall, the derived functional terms provide a systemic description highly compatible with the results of traditional statistical hypothesis testing techniques. The methodology behaves consistently across different data types.
The feature lists are compact and rich in information, indicating phenotypic aspects specific to the tissue and biological phenomenon investigated. Selection by information content measures efficiently addresses problems emerging from arbitrary thresholding, thus facilitating the full automation of the analysis.
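As a rough illustration of the two ingredients named in the Methods, the sketch below computes rank products for a toy dataset and the Shannon entropy of a set of bin counts. It is a minimal sketch under stated assumptions: the function names, the toy fold-change values, and the tie-breaking rule are all hypothetical, and the entropy-based cutoff procedure of the paper is not reproduced here because the abstract does not specify it.

```python
import math


def rank_products(replicates):
    """Geometric mean of per-replicate ranks for each feature.

    `replicates` is a list of equal-length lists (one value per feature).
    Rank 1 goes to the largest value in each replicate. Ties are broken
    by input order, which is a simplification made for this sketch.
    """
    n_features = len(replicates[0])
    log_rank_sums = [0.0] * n_features
    for values in replicates:
        # Feature indices ordered from largest to smallest value.
        order = sorted(range(n_features), key=lambda i: values[i], reverse=True)
        for rank, feature in enumerate(order, start=1):
            log_rank_sums[feature] += math.log(rank)
    # exp(mean(log(rank))) is the geometric mean of the ranks.
    return [math.exp(s / len(replicates)) for s in log_rank_sums]


def shannon_entropy_bits(counts):
    """Shannon entropy (in bits) of a histogram given as bin counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


# Hypothetical toy data: two replicates, three features.
fold_changes = [[3.0, 1.0, 2.0], [2.5, 0.5, 1.0]]
rp = rank_products(fold_changes)
print(rp)
print(shannon_entropy_bits([2, 2]))
```

A small rank product means a feature ranks consistently near the top across replicates. How the entropy of the RP distribution is turned into a cutoff is left to the paper itself.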

Author(s):  
Hector-Xavier de Lastic ◽  
Irene Liampa ◽  
Alexandros G. Georgakilas ◽  
Michalis Zervakis ◽  
Aristotelis Chatziioannou

Background: Traditional omic analysis relies on p-value and fold change as selection criteria. There is an ongoing debate on their effectiveness in delivering systemic and robust interpretation, due to their dependence on assumptions of conformity with various parametric distributions. Here, we propose a threshold-free selection method based on robust, non-parametric statistics, ensuring independence from the statistical distribution properties and broad applicability. Such methods could adapt to different initial data distributions, contrary to statistical techniques based on fixed thresholds. Methods: Our work extends the Rank Products methodology with a neutral selection method of high information-extraction capacity. We introduce the calculation of the RP distribution’s entropy to isolate the features of interest by their contribution to the distribution’s information content. The aim is a methodology performing threshold-free identification of the differentially expressed features that are highly informative about the phenomenon under scrutiny. Conclusions: Applying the proposed method to microarray (transcriptomic and DNA methylation) and RNAseq count data of varying sizes and noise levels, we observe robust convergence of the different parameterisations to stable cutoff points. Functional analysis through BioInfoMiner and EnrichR was used to evaluate the information potency of the resulting feature lists. Overall, the derived functional terms provide a systemic description highly compatible with the results of traditional statistical hypothesis testing techniques. The methodology behaves consistently across different data types. The feature lists are compact and information-rich, indicating phenotypic aspects specific to the tissue and biological phenomenon investigated. Selection by information content measures efficiently addresses problems emerging from arbitrary thresholding, thus facilitating the full automation of the analysis.


2021 ◽  
Author(s):  
Lingfei Wang

Single-cell RNA sequencing (scRNA-seq) provides unprecedented technical and statistical potential to study gene regulation but is subject to technical variations and sparsity. Here we present Normalisr, a linear-model-based normalization and statistical hypothesis testing framework that unifies single-cell differential expression, co-expression, and CRISPR scRNA-seq screen analyses. By systematically detecting and removing nonlinear confounding from library size, Normalisr achieves high sensitivity, specificity, speed, and generalizability across multiple scRNA-seq protocols and experimental conditions with unbiased P-value estimation. We use Normalisr to reconstruct robust gene regulatory networks from trans-effects of gRNAs in large-scale CRISPRi scRNA-seq screens and gene-level co-expression networks from conventional scRNA-seq.
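To make the general idea of removing a confounder with a linear model concrete, the sketch below regresses a response on a single covariate by ordinary least squares and returns the residuals. This is not Normalisr's algorithm, which the abstract describes as handling nonlinear confounding from library size; it is only the simplest single-covariate case, and the function name and toy values are hypothetical.

```python
def residualize(y, x):
    """Remove the linear effect of covariate x from response y.

    Fits y = intercept + slope * x by ordinary least squares and
    returns the residuals, which are uncorrelated with x by construction.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical toy values: a covariate standing in for log library size,
# and a response standing in for log expression of one gene.
log_library_size = [1.0, 2.0, 3.0, 4.0]
log_expression = [1.0, 3.0, 2.0, 4.0]
print(residualize(log_expression, log_library_size))
```

After this step, any remaining variation in the response cannot be explained by a straight-line dependence on the covariate; nonlinear dependence would need additional terms.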


2021 ◽  
Vol 3 (2) ◽  
pp. 41-51
Author(s):  
Sri Hidayat ◽  
Syafri Syafri ◽  
Syahriar Tato

The Hertasning-Tun Abdul Razak road corridor is a peri-urban area undergoing rapid change driven by demand for new housing and activity facilities. This triggers spatial transformation, which in turn increases anthropogenic activity that can alter the urban climate. The increase in anthropogenic activity is indicated by differences in land use and traffic performance along the corridor. This study uses a quantitative method to determine the relationship of land use and traffic performance to urban climatic conditions, with data analysis using SEM-PLS. Statistical hypothesis testing of the effect of each independent variable on the dependent variable led to the conclusion that land use has a significant effect on climatic conditions, with a T-statistic of 2.752 > 1.96 or a P value of 0.040 < 0.05. Meanwhile, traffic performance has no significant effect on urban climatic conditions, with a T-statistic of 1.071 < 1.96 or a P value of 0.285 > 0.05. These results also indicate that land use in the Hertasning-Tun Abdul Razak road corridor can raise urban temperatures in the area. However, the increase in urban temperature there is due more to anthropogenic activity associated with the land use than to the extent of the built-up area.


Blood ◽  
2009 ◽  
Vol 114 (22) ◽  
pp. 3795-3795
Author(s):  
Monika Belickova ◽  
Jaroslav Cermak ◽  
Alzbeta Vasikova ◽  
Eva Budinska

Abstract 3795. Poster Board III-731. Gene expression profiles of CD34+ cells were compared between a cohort of 51 patients with MDS or AML from MDS and 7 healthy controls. The patients were classified according to the WHO criteria as follows: 5q- syndrome (n=7), RA (n=3), RARS (n=2), RCMD (n=10), RAEB-1 (n=7), RAEB-2 (n=15), and AML with MLD (multilineage dysplasia) (n=7). HumanRef-8 v2 Expression BeadChips (Illumina) were used to generate expression profiles of the samples for >22,000 transcripts. The raw data were normalized with the lumi package in R. Normalized data were filtered by detection p-value <0.01, resulting in a total of 9,811 genes. To identify differentially expressed genes, we performed two parallel statistical hypothesis tests: Analysis of Variance (ANOVA) together with the Tukey test and empirical Bayesian thresholding correction for the multiple testing problem; and Significance Analysis of Microarrays (SAM). The results were confirmed by real-time quantitative PCR for six genes (TaqMan Gene Expression Assays). Hierarchical clustering of significantly differentially expressed genes clearly separated patients from controls and identified 5q- syndrome and RAEB-1 as separate entities, confirming the usefulness of the WHO classification subgroups. The most up-regulated genes in all patients included HBG2, HBG1, CYBRD1, HSPA1B, ANGPT1, and MYC. We assume that expression changes in globin genes, both fetal and adult (HBG2, HBG1 and HBA1, HBB), may play a role not only in dysregulation of erythropoiesis but also in disease progression or leukemic transformation of MDS. Among the most down-regulated genes, 13 genes related to B-lymphopoiesis (e.g. POU2AF1, VPREB1, VPREB3, CD79A, EBF1, LEF1, BCL3, IRF8 and IRF4) were detected, suggesting abnormal development of B-cell progenitors in all MDS patients. Some of these genes (e.g. VPREB3, LEF1) showed a decreasing trend in expression level from early to advanced MDS, with the lowest expression in AML with MLD. Patients with advanced MDS had significantly decreased expression of genes involved in the mitotic cell cycle, DNA replication, and chromosome segregation compared to early MDS, where these gene subsets were up-regulated. The DAVID database also identified de-regulation in the cell cycle pathway through 7 of its genes (CDC25C, CDC7, CDC20, ORC1L, CCNB2, BUB1, and CCNA2). On the other hand, advanced MDS patients showed significant up-regulation of proto-oncogenes (BMI1, MERTK) and of genes related to angiogenesis (ANGPT1) and anti-apoptosis (VNN1). The results confirm on a molecular basis that increased cell proliferation and resistance to apoptosis, together with a loss of cell cycle control, damaged DNA repair, and altered immune response, may play an important role in the expansion of the malignant clone in MDS patients. The study was supported by Grant NR-9235 obtained from the Ministry of Health, Czech Republic. Disclosures: No relevant conflicts of interest to declare.


Author(s):  
Helena Kraemer

“As ye sow. So shall ye reap”: For almost 100 years, researchers have been taught that the be-all and end-all in data-based research is the p-value. The resulting problems have now generated concern, often from us who have long so taught researchers. We must bear a major responsibility for the present situation and must alter our teachings. Despite the fact that the Zhang and Hughes paper is titled “Beyond p-value”, the total focus remains on statistical hypothesis testing studies (HTS) and p-values (1). Instead, I would propose that there are three distinct, necessary, and important phases of research: 1) Hypothesis Generation Studies (HGS) or Exploratory Research (2-4); 2) Hypothesis Testing Studies (HTS); 3) Replication and Application of Results. Of these, HTS is undoubtedly the most important, but without HGS, HTS is often weak and wasteful, and without Replication and Application, the results of HTS are often misleading.


Author(s):  
Dyah Wulandari ◽  
Siti Maria Ulfa ◽  
Arfiyan Ridwan

The objective of this research was to compare the Zimmer Twins website, used as a digital storytelling tool, with non-digital media in teaching narrative text writing, measured by the writing ability of eleventh-grade students at MA Yayasan Sirojul Islam Sukolilo in the 2018/2019 academic year. Zimmer Twins is a web-based animated movie maker that lets students turn their short stories into movies with a range of emotions and other elements. The sample was drawn from the eleventh grade of MA YASI: class XI-1 as the experimental class and class XI-2 as the control class, consisting of 20 students. The research used a quantitative method with a quasi-experimental design, and the instrument was a test. Sampling was non-random. The research followed these procedures: giving a pre-test, applying the treatments, and giving a post-test. The data were analyzed and processed using ANCOVA in SPSS 23. The post-test mean was 76.55 in the experimental class and 70.55 in the control class. The statistical hypothesis test gave a p-value of 0.000, which is lower than the significance level of 0.05. Because the p-value was below the 0.05 significance level, H1 was accepted and H0 was rejected. In conclusion, Zimmer Twins can be an effective medium for teaching narrative text writing to the eleventh-grade students of MA Yayasan Sirojul Islam Sukolilo.


Author(s):  
Riko Kelter

The Full Bayesian Significance Test (FBST) and the Bayesian evidence value recently have received increasing attention across a variety of sciences including psychology. Ly and Wagenmakers (2021) have provided a critical evaluation of the method and concluded that it suffers from four problems which are mostly attributed to the asymptotic relationship of the Bayesian evidence value to the frequentist p-value. While Ly and Wagenmakers (2021) tackle an important question about the best way of statistical hypothesis testing in the cognitive sciences, it is shown in this paper that their arguments are based on a specific measure-theoretic premise. The identified problems hold only under a specific class of prior distributions which are required only when adopting a Bayes factor test. However, the FBST explicitly avoids this premise, which resolves the problems in practical data analysis. In summary, the analysis leads to the more important question whether precise point null hypotheses are realistic for scientific research, and a shift towards the Hodges-Lehmann paradigm may be an appealing solution when there is doubt on the appropriateness of a precise hypothesis.


2019 ◽  
Vol 17 ◽  
Author(s):  
Xiaoli Yu ◽  
Lu Zhang ◽  
Na Li ◽  
Peng Hu ◽  
Zhaoqin Zhu ◽  
...  

Aim: We aimed to identify new plasma biomarkers for the diagnosis of pulmonary tuberculosis (PTB). Background: Tuberculosis is an ancient infectious disease that remains one of the major global health problems. Effective, convenient, and affordable methods for the diagnosis of pulmonary tuberculosis are still lacking. Objective: This study focused on constructing a label-free LC-MS/MS-based comparative proteomics analysis between six tuberculosis patients and six healthy controls to identify differentially expressed proteins (DEPs) in plasma. Method: To reduce the influence of high-abundance proteins, albumin and globulin were removed from plasma samples using affinity gels. DEPs from the plasma samples were then identified using a label-free Quadrupole-Orbitrap LC-MS/MS system. The results were analyzed with the protein database search algorithm SEQUEST-HT to match mass spectra to peptides. The predictive abilities of combinations of host markers were investigated by general discriminant analysis (GDA) with leave-one-out cross-validation (LOOCV). Results: A total of 572 proteins were identified and 549 proteins were quantified. The threshold for a differentially expressed protein was set as adjusted p-value < 0.05 and fold change ≥1.5 or ≤0.6667; 32 DEPs were found. ClusterVis, TBtools, and STRING were used to find new potential biomarkers of PTB. Six proteins, LY6D, DSC3, CDSN, FABP5, SERPINB12, and SLURP1, which performed well in the LOOCV validation, were termed potential biomarkers. The percentages of cross-validated grouped cases and original grouped cases correctly classified were greater than or equal to 91.7%. Conclusion: We successfully identified five candidate biomarkers for immunodiagnosis of PTB in plasma: LY6D, DSC3, CDSN, SERPINB12, and SLURP1. Our work supports this group of proteins as potential biomarkers for pulmonary tuberculosis that are worthy of further validation.
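The selection rule stated in the Results (adjusted p-value < 0.05 together with fold change ≥1.5 or ≤0.6667) can be written as a small filter. The sketch below is only illustrative: the function name, the placeholder protein labels, and their values are hypothetical and are not data from the study.

```python
def is_differentially_expressed(adj_p, fold_change,
                                alpha=0.05, fc_up=1.5, fc_down=0.6667):
    """Apply the two-part threshold: significance AND a large enough change.

    A protein passes if its adjusted p-value is below `alpha` and its fold
    change is at least `fc_up` (up-regulated) or at most `fc_down`
    (down-regulated).
    """
    return adj_p < alpha and (fold_change >= fc_up or fold_change <= fc_down)


# Hypothetical (adjusted p-value, fold change) pairs, for illustration only.
proteins = {
    "protein_A": (0.01, 2.0),   # significant, up-regulated
    "protein_B": (0.01, 1.2),   # significant, but change too small
    "protein_C": (0.20, 3.0),   # large change, but not significant
    "protein_D": (0.04, 0.5),   # significant, down-regulated
}

deps = sorted(name for name, (p, fc) in proteins.items()
              if is_differentially_expressed(p, fc))
print(deps)
```

Note that 0.6667 is approximately the reciprocal of 1.5, so the rule treats up- and down-regulation of the same magnitude symmetrically.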

