statistical metrics
Recently Published Documents


TOTAL DOCUMENTS

126
(FIVE YEARS 77)

H-INDEX

10
(FIVE YEARS 5)

PLoS ONE ◽  
2022 ◽  
Vol 17 (1) ◽  
pp. e0262503
Author(s):  
Guhuai Han ◽  
Tao Zhou ◽  
Yuanheng Sun ◽  
Shoujie Zhu

This paper re-examines the relationships between night-time light (NTL) and gross domestic product (GDP), population, road networks, and carbon emissions in China and India. Two treatments are applied to those factors and to NTL: simple summation within each administrative region (total data), and summation normalized by region area (density data). A series of univariate and multiple regression experiments is conducted in both countries and at different scales to find how the relationship between NTL and each parameter changes across situations. Several statistical metrics, such as R², Mean Relative Error (MRE), multiple regression weight coefficients, and Pearson’s correlation coefficient, are given special attention. We found that GDP, as a comprehensive indicator, is more representative of NTL when the administrative region is relatively comprehensive or highly developed. However, when these regions are unbalanced or undeveloped, GDP becomes less representative and other factors can have a more important influence on the multiple regression. Differences in the relationship between NTL and GDP in China and India are also reflected in some other factors. In many cases, regression after normalization by administrative area has a higher R² value than regression on the totals, but it is strongly influenced by a few highly developed regions such as Beijing in China or Chandigarh in India. When the administrative regions become more fragmented, the model must be adjusted to keep the regression meaningful. The relationship between NTL and carbon emissions differs markedly between China and India, and among provinces and counties within China, which may be caused by differences in electric power generation and transmission. From these results, we can see how NTL is reflected by GDP and other factors in different situations and adjust models accordingly.
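As an illustration of the metrics named in this abstract, the following minimal sketch computes R², a mean relative error, and Pearson's correlation for a univariate least-squares fit. The function names and the sample numbers are hypothetical, and the MRE definition used here (mean of |predicted − observed| / |observed|) is an assumption; the paper may define it differently.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_relative_error(y_true, y_pred):
    """Assumed definition: mean of |pred - true| / |true| (no true value may be 0)."""
    return sum(abs(p - t) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(x, y):
    """Pearson's correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sy = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical regional totals and areas (not data from the paper).
gdp_total = [120.0, 340.0, 80.0, 510.0, 260.0]
ntl_total = [15.0, 41.0, 12.0, 58.0, 30.0]
area = [10.0, 20.0, 8.0, 16.0, 25.0]

# "Density data": totals normalized by region area.
gdp_density = [g / a for g, a in zip(gdp_total, area)]
ntl_density = [v / a for v, a in zip(ntl_total, area)]

slope, intercept = fit_line(gdp_density, ntl_density)
predicted = [slope * g + intercept for g in gdp_density]
print(r_squared(ntl_density, predicted), mean_relative_error(ntl_density, predicted))
```

For a univariate least-squares fit with an intercept, R² equals the square of Pearson's r, which gives a quick consistency check on the two functions.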


2022 ◽  
Vol 12 ◽  
Author(s):  
Daniel G. Bunis ◽  
Wanxin Wang ◽  
Júlia Vallvé-Juanico ◽  
Sahar Houshdaran ◽  
Sushmita Sen ◽  
...  

The uterine lining (endometrium) exhibits a pro-inflammatory phenotype in women with endometriosis, resulting in pain, infertility, and poor pregnancy outcomes. The full complement of cell types contributing to this phenotype has yet to be identified, as most studies have focused on bulk tissue or select cell populations. Herein, by integrating whole-tissue deconvolution and single-cell RNAseq, we comprehensively characterized immune and nonimmune cell types in the endometrium of women with or without disease and their dynamic changes across the menstrual cycle. We designed metrics to evaluate the specificity of deconvolution signatures, which resulted in single-cell identification of 13 novel signatures for immune cell subtypes in healthy endometrium. Guided by statistical metrics, we identified contributions of endometrial epithelial, endothelial, plasmacytoid dendritic cells, classical dendritic cells, monocytes, macrophages, and granulocytes to the endometrial pro-inflammatory phenotype, underscoring roles for nonimmune as well as immune cells in the dysfunction of this tissue.


2021 ◽  
Vol 6 (2) ◽  
pp. 140-145
Author(s):  
Mykola Maksymiv ◽  
Taras Rak

Contrast enhancement is a technique for increasing the contrast of an image to obtain better image quality. As many existing contrast enhancement algorithms typically add too much contrast to an image, maintaining visual quality should be considered as a part of enhancing image contrast. This paper focuses on a contrast enhancement method that is based on histogram transformations to improve contrast and uses image quality assessment to automatically select the optimal target histogram. Improvements in contrast and preservation of visual quality are taken into account in the target histogram, so this method avoids the problem of excessive increase in contrast. In the proposed method, the optimal target histogram is the weighted sum of the original histogram, homogeneous histogram and Gaussian histogram. Structural and statistical metrics of “naturalness of the image” are used to determine the weights of the corresponding histograms. Contrast images are obtained by matching the optimal target histogram. Experiments show that the proposed method gives better results compared to other existing algorithms for increasing contrast based on the transformation of histograms.
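The weighted-sum target histogram and the matching step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the weights are fixed by hand here (the paper derives them from naturalness metrics), the Gaussian parameters are arbitrary, and all function names are hypothetical.

```python
import math


def normalize(hist):
    """Scale a histogram so that its bins sum to 1."""
    total = sum(hist)
    return [h / total for h in hist]


def cumulative(hist):
    """Cumulative distribution of a normalized histogram."""
    out, running = [], 0.0
    for h in hist:
        running += h
        out.append(running)
    return out


def gaussian_histogram(bins, mean, sigma):
    """Normalized Gaussian-shaped histogram over bin indices 0..bins-1."""
    return normalize([math.exp(-0.5 * ((i - mean) / sigma) ** 2) for i in range(bins)])


def target_histogram(original, w_orig, w_uniform, w_gauss):
    """Weighted sum of the original, uniform, and Gaussian histograms."""
    bins = len(original)
    orig = normalize(original)
    uniform = [1.0 / bins] * bins
    gauss = gaussian_histogram(bins, (bins - 1) / 2, bins / 5)  # assumed shape
    total_w = w_orig + w_uniform + w_gauss
    return [(w_orig * o + w_uniform * u + w_gauss * g) / total_w
            for o, u, g in zip(orig, uniform, gauss)]


def matching_lut(source, target):
    """Histogram matching: map each source level to the first target level
    whose cumulative value reaches the source's cumulative value."""
    src_cdf = cumulative(normalize(source))
    tgt_cdf = cumulative(target)
    lut, j = [], 0
    for s in src_cdf:
        while j < len(tgt_cdf) - 1 and tgt_cdf[j] < s - 1e-12:
            j += 1
        lut.append(j)
    return lut


# Hypothetical 8-level histogram concentrated in the dark levels.
hist = [40, 30, 15, 8, 4, 2, 1, 1]
target = target_histogram(hist, w_orig=0.5, w_uniform=0.3, w_gauss=0.2)
lut = matching_lut(hist, target)  # new level for each original level
print(lut)
```

Applying `lut` to every pixel value yields the enhanced image. Because part of the original histogram is kept in the target, the mapping stretches contrast less aggressively than plain histogram equalization would.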


2021 ◽  
Vol 25 (2) ◽  
pp. 478-506
Author(s):  
Salvador Pons Bordería ◽  
Elena Pascual Aliaga

As databases make Corpus Linguistics a common tool for most linguists, corpus annotation becomes an increasingly important process. Corpus users need not only raw data but also annotated data, submitted to tagging or parsing processes through annotation protocols. One problem with corpus annotation lies in its reliability, that is, in the probability that its results can be replicated by independent researchers. Inter-annotator agreement (IAA) is the process which evaluates the probability that, applying the same protocol, different annotators reach similar results. To measure agreement, different statistical metrics are used. This study applies IAA for the first time to the Valencia Español Coloquial (Val.Es.Co.) discourse segmentation model, designed for segmenting and labelling spoken language into discourse units. Whereas most IAA studies merely label a set of predefined units, this study applies IAA to the Val.Es.Co. protocol, which involves a more complex two-fold process: first, the speech continuum needs to be divided into units; second, the units have to be labelled. Krippendorff's u-family statistical metrics (Krippendorff et al. 2016) allow measuring IAA in both segmentation and labelling tasks. Three expert annotators segmented a spontaneous conversation into subacts, the minimal discursive unit of the Val.Es.Co. model, and labelled the resulting units according to a set of 10 subact categories. Krippendorff's u coefficients were applied in several rounds to elucidate whether the inclusion of a larger number of categories and their distinction had an impact on the agreement results. The conclusions show high levels of IAA, especially in the annotation of procedural subact categories, where results reach coefficients over 0.8. This study validates the Val.Es.Co. model as an optimal method to fully analyze a conversation into pragmatically-based discourse units.
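To make the idea of an agreement coefficient concrete, here is a minimal sketch of Krippendorff's alpha for nominal labels on units that are already segmented. This is a simpler relative of the u-family coefficients used in the study, which additionally handle disagreement about where unit boundaries fall; it is not the metric the authors applied. The labels "A" and "B" are placeholders, not Val.Es.Co. categories.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    `units` is a list of label sequences, one sequence per annotated unit,
    holding the labels assigned by the annotators who coded that unit.
    """
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a unit coded by a single annotator carries no agreement information
        # Each ordered pair of labels from different annotators adds 1 / (m - 1).
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b]
                   for a in totals for b in totals if a != b) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one category is used")
    return 1.0 - observed / expected


# Three hypothetical annotators labelling four units.
ratings = [("A", "A", "A"), ("A", "A", "B"), ("B", "B", "B"), ("B", "B", "B")]
print(krippendorff_alpha_nominal(ratings))
```

A value of 1 indicates perfect agreement and 0 indicates agreement at chance level, which is the scale on which the abstract's "coefficients over 0.8" should be read.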


PLoS ONE ◽  
2021 ◽  
Vol 16 (12) ◽  
pp. e0260847
Author(s):  
Van Quan Tran ◽  
Hai-Van Thi Mai ◽  
Thuy-Anh Nguyen ◽  
Hai-Bang Ly

An extensive simulation program is used in this study to discover the best ANN model for predicting the compressive strength of concrete containing Ground Granulated Blast Furnace Slag (GGBFS). To accomplish this purpose, an experimental database of 595 samples is compiled from the literature and utilized to find the best ANN architecture. The cement content, water content, coarse aggregate content, fine aggregate content, GGBFS content, carboxylic type hyper plasticizing content, superplasticizer content, and testing age are the eight inputs in this database. The optimal ANN design is then selected and evaluated using conventional statistical metrics. The results demonstrate that the best architecture, [8–14–4–1], selected among the 240 investigated architectures, gives a very efficient predictor of the compressive strength of concrete containing GGBFS, with a maximum R² value of 0.968 on the training part and 0.965 on the testing part. Furthermore, a sensitivity analysis is performed over 500 Monte Carlo simulations using the best ANN model to determine the reliability of the ANN model in predicting the compressive strength of concrete. The findings of this research may make it easier and more efficient to apply the ANN model to many civil engineering challenges.
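As a small worked example of what the [8–14–4–1] notation implies, the sketch below counts the trainable parameters of a fully connected network with those layer sizes. It assumes dense layers with one bias per neuron, which the abstract does not state; `count_parameters` is a hypothetical helper.

```python
def count_parameters(layer_sizes, use_bias=True):
    """Number of weights (and optionally biases) in a fully connected network."""
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out  # one weight per connection
        if use_bias:
            total += fan_out       # one bias per neuron in the receiving layer
    return total


# 8 inputs, hidden layers of 14 and 4 neurons, 1 output.
print(count_parameters([8, 14, 4, 1]))
```

Under these assumptions the network has 172 weights plus 19 biases, which is small relative to the 595 samples in the database.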


2021 ◽  
Vol 24 (4) ◽  
pp. 1-34
Author(s):  
Simon Birnbach ◽  
Richard Baker ◽  
Simon Eberz ◽  
Ivan Martinovic

Drones are becoming increasingly popular for hobbyists and recreational use. But with this surge in popularity comes increased risk to privacy as the technology makes it easy to spy on people in otherwise-private environments, such as an individual’s home. An attacker can fly a drone over fences and walls to observe the inside of a house, without having physical access. Existing drone detection systems require specialist hardware and expensive deployment efforts, making them inaccessible to the general public. In this work, we present a drone detection system that requires minimal prior configuration and uses inexpensive commercial off-the-shelf hardware to detect drones that are carrying out privacy invasion attacks. We use a model of the attack structure to derive statistical metrics for movement and proximity that are then applied to received communications between a drone and its controller. We test our system in real-world experiments with two popular consumer drone models mounting privacy invasion attacks using a range of flight patterns. We are able both to detect the presence of a drone and to identify which phase of the privacy attack was in progress while being resistant to false positives from other mobile transmitters. For line-of-sight approaches using our kurtosis-based method, we are able to detect all drones at a distance of 6 m, with the majority of approaches detected at 25 m or farther from the target window without suffering false positives for stationary or mobile non-drone transmitters.
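The abstract mentions a kurtosis-based method without giving details. As a generic illustration of the statistic only, the sketch below computes excess kurtosis over sliding windows of a signal, such as a series of received signal strength readings. The window size, the sample values, and the function names are assumptions and do not reproduce the authors' detector.

```python
def excess_kurtosis(samples):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    if m2 == 0:
        raise ValueError("kurtosis is undefined for a constant signal")
    return m4 / (m2 * m2) - 3.0


def sliding_kurtosis(signal, window):
    """Excess kurtosis of every full window of length `window`."""
    return [excess_kurtosis(signal[i:i + window])
            for i in range(len(signal) - window + 1)]


# Hypothetical signal-strength readings in dBm with one sharp excursion.
readings = [-70.0, -71.0, -69.5, -70.5, -70.2, -55.0, -70.1, -69.8, -70.4, -70.0]
print(sliding_kurtosis(readings, window=8))
```

Heavy-tailed windows, where a few readings deviate sharply from the rest, produce large positive values, while evenly spread readings produce values near or below zero.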


2021 ◽  
Vol 9 (11) ◽  
pp. 232596712110509
Author(s):  
Ayoosh Pareek ◽  
Chad W. Parkes ◽  
Alexey A. Leontovich ◽  
Aaron J. Krych ◽  
Stan Conte ◽  
...  

Background: Basic pitcher statistics have been used to assess performance in pitchers after injury or surgery without being validated. Even among healthy pitchers, the normal variability of these parameters has not yet been established. Purpose: To determine (1) the normal variability of basic and advanced pitcher statistics in healthy professional baseball pitchers and (2) the minimum pitches needed to predict these parameters. Study Design: Cross-sectional study; Level of evidence, 3. Methods: Publicly available data from the MLB Statcast and PITCHf/x databases were used to analyze MLB pitchers during the 2015 and 2016 seasons who recorded a minimum of 100 innings without injury. Basic and advanced baseball pitcher statistics were analyzed. The variability of each parameter was assessed by computing the coefficient of variation (CV) between individual pitchers and across all pitchers. A CV <10 was indicative of a relatively constant parameter, and parameters with a CV >10 were generally considered inconsistent and unreliable. The minimum number of pitches needed to be followed for each variable was also analyzed. Results: A total of 118 pitchers, 55 baseball-specific statistical metrics (38 basic and 17 advanced), and 7.5 million pitches were included and analyzed. Of the 38 basic pitcher statistics, only fastball velocity demonstrated a CV <10 (CV = 1.5), while 6 of 17 (35%) advanced metrics demonstrated acceptable consistency (CV <10). Release position from plate and velocity from the plate were the 2 most consistent advanced parameters. When separated by pitch type, these 2 parameters were the most constant (lowest CV) across every pitch type. Conclusion: We recommend against utilizing nonvalidated statistical measures to assess performance after injury, as they demonstrated unacceptably high variability even among healthy, noninjured professional baseball pitchers. It is our hope that this study will serve as the foundation for the identification and implementation of validated pitcher-dependent statistical measures that can be used to assess return-to-play performance after injury in the future.
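The coefficient of variation used as the consistency criterion in this abstract can be sketched as below. The code assumes the CV is expressed as a percentage (100 × standard deviation / mean) and uses the sample standard deviation; the abstract states neither explicitly. The velocity and strikeout values are made up for illustration.

```python
import statistics


def coefficient_of_variation(values):
    """CV as a percentage: 100 * sample standard deviation / |mean|."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * statistics.stdev(values) / abs(mean)


def is_consistent(values, threshold=10.0):
    """Apply the abstract's cutoff: a CV below 10 counts as relatively constant."""
    return coefficient_of_variation(values) < threshold


# Hypothetical per-game values for one pitcher.
fastball_velocity_mph = [93.1, 94.0, 92.5, 93.8, 93.3]
strikeouts = [3, 9, 5, 11, 2]

print(is_consistent(fastball_velocity_mph), is_consistent(strikeouts))
```

A tightly clustered measure such as fastball velocity falls well under the cutoff, whereas a count statistic that swings from game to game does not.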


2021 ◽  
Vol 8 (Supplement_1) ◽  
pp. S41-S41
Author(s):  
Courtney Moc ◽  
William Shropshire ◽  
Patrick McDaneld ◽  
Samuel A Shelburne ◽  
Samuel L Aitken ◽  
...  

Abstract. Background: There are several clinical tools that have been developed to predict the likelihood of extended-spectrum β-lactamase producing Enterobacterales; however, the creation of these tools included few patients who had cancer or were otherwise immunosuppressed. The objectives of this retrospective cohort study were to develop a decision tree and traditional risk score to predict ceftriaxone resistance in cancer patients with Escherichia coli (E. coli) bacteremia as well as to compare the predictive accuracy between the tools. Methods: Adults aged ≥ 18 years with E. coli bacteremia at The University of Texas MD Anderson Cancer Center from 1/2018 to 12/2019 were included. Isolates recovered within 1 week from the same patient were excluded. The decision tree was constructed using classification and regression tree analysis, with a minimum node size of 10. The risk score was created using a multivariable logistic regression model derived by using stepwise variable selection with backward elimination at level 0.2. The decision tree and risk score statistical metrics were compared. Results: A total of 629 E. coli isolates were screened, of which 580 isolates met criteria. Ceftriaxone-resistant (CRO-R) E. coli accounted for 36% of isolates. The machine learning-derived decision tree included 5 predictors whereas the logistic regression-derived risk score included 7 predictors. The risk score cutoff point of ≥ 5 points demonstrated the best overall classification accuracy. The positive predictive value of the decision tree was higher than that of the risk score (88% vs 74%, respectively), but the area under the receiver operating characteristic curve and model accuracy of the risk score were higher than those of the decision tree (0.85 vs 0.73 and 82% vs 74%, respectively). Figure 1: Clinical Decision Tree. Table 1: Regression Model and Assigned Points for Clinical Risk Score. Table 2: Statistical Metrics of Clinical Decision Tree and Clinical Risk Score. Conclusion: The decision tree and risk score can be used to determine the likelihood of whether a cancer patient with E. coli bacteremia has a CRO-R infection. In both clinical tools, the strongest predictor was a history of CRO-R E. coli colonization or infection in the last 6 months. The decision tree is more user-friendly, has fewer variables, and has a better positive predictive value in comparison to the risk score. However, the risk score has significantly better discrimination and model accuracy than the decision tree. Disclosures: Samuel L. Aitken, PharmD, MPH, BCIDP, Melinta Therapeutics (Individual(s) Involved: Self): Consultant, Grant/Research Support
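The three metrics compared in this abstract (positive predictive value, accuracy, and area under the ROC curve) can be computed as in the following sketch. The scores and outcomes are invented, the cutoff of 5 points simply mirrors the one quoted above, and the function names are hypothetical. The AUC is computed through its rank interpretation: the probability that a randomly chosen resistant case scores higher than a randomly chosen susceptible one.

```python
def classify(scores, cutoff):
    """Predict 1 (resistant) when the risk score reaches the cutoff."""
    return [1 if s >= cutoff else 0 for s in scores]


def positive_predictive_value(y_true, y_pred):
    """TP / (TP + FP): share of predicted positives that are truly positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp)


def accuracy(y_true, y_pred):
    """Share of all cases classified correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison; ties count one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = ceftriaxone-resistant) and risk-score points.
resistant = [1, 1, 0, 0, 1, 0]
points = [7, 5, 4, 6, 3, 1]

predicted = classify(points, cutoff=5)
print(positive_predictive_value(resistant, predicted),
      accuracy(resistant, predicted),
      auc(resistant, points))
```

The example also shows why the two tools can rank differently: PPV and accuracy depend on a single cutoff, whereas AUC summarizes discrimination across all possible cutoffs.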


2021 ◽  
Author(s):  
Guangyuan Li ◽  
Song Baobao ◽  
H. L Grimes ◽  
V. B. Surya Prasath ◽  
Nathan L Salomonis

Hundreds of bioinformatics approaches now exist to define cellular heterogeneity from single-cell genomics data. Reconciling conflicts between diverse methods, algorithm settings, annotations or modalities has the potential to clarify which populations are real and establish reusable reference atlases. Here, we present a customizable computational strategy called scTriangulate, which leverages cooperative game theory to intelligently mix-and-match clustering solutions from different resolutions, algorithms, reference atlases, or multi-modal measurements. This algorithm relies on a series of robust statistical metrics for cluster stability that work across molecular modalities to identify high-confidence integrated annotations. When applied to annotations from diverse competing cell atlas projects, this approach is able to resolve conflicts and determine the validity of controversial cell population predictions. Tested with scRNA-Seq, CITE-Seq (RNA + surface ADT), multiome (RNA + ATAC), and TEA-Seq (RNA + surface ADT + ATAC), this approach identifies highly stable and reproducible, known and novel cell populations, while excluding clusters defined by technical artifacts (i.e., doublets). Importantly, we find that distinct cell populations are frequently attributed with features from different modalities (RNA, ATAC, ADT) in the same assay, highlighting the importance of multimodal analysis in cluster determination. As it is flexible, this approach can be updated with new user-defined statistical metrics to alter the decision engine and customized to new measures of stability for different measures of cellular activity.
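The abstract refers to cooperative game theory without naming a specific solution concept. One standard concept from that field is the Shapley value, which credits each player with its average marginal contribution over all orderings. The sketch below is a generic exact computation for a small game and is an assumption made for illustration, not a description of the tool's internals; the "players" and the toy value function are hypothetical.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values by enumerating every ordering of the players.

    `value` maps a frozenset of players to the worth of that coalition.
    Enumeration is only feasible for a handful of players.
    """
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            joined = coalition | {p}
            totals[p] += value(joined) - value(coalition)  # marginal contribution
            coalition = joined
    return {p: t / len(orderings) for p, t in totals.items()}


# Toy game: three competing annotations, each with a made-up stability score;
# a coalition is worth the best score among its members.
stability = {"resolution_1": 0.6, "resolution_2": 0.9, "reference_atlas": 0.7}


def coalition_worth(coalition):
    return max((stability[p] for p in coalition), default=0.0)


print(shapley_values(list(stability), coalition_worth))
```

The resulting values always sum to the worth of the full coalition, so the credit assigned to the competing annotations is directly comparable.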

