skewed distribution
Recently Published Documents


TOTAL DOCUMENTS

455
(FIVE YEARS 191)

H-INDEX

29
(FIVE YEARS 4)

Electronics ◽  
2022 ◽  
Vol 11 (2) ◽  
pp. 213
Author(s):  
Ghada Abdelmoumin ◽  
Jessica Whitaker ◽  
Danda B. Rawat ◽  
Abdul Rahman

An effective anomaly-based intelligent IDS (AN-Intel-IDS) must detect both known and unknown attacks. Hence, there is a need to train AN-Intel-IDS using dynamically generated, real-time data in an adversarial setting. Unfortunately, the public datasets available to train AN-Intel-IDS are ineluctably static, unrealistic, and prone to obsolescence. Further, the need to protect private data and conceal sensitive data features has limited data sharing, thus encouraging the use of synthetic data for training predictive and intrusion detection models. However, synthetic data can be unrealistic and potentially biased. Real-time data, on the other hand, are realistic and current, but they are inherently imbalanced due to the uneven distribution of anomalous and non-anomalous examples. In general, non-anomalous or normal examples are more frequent than anomalous or attack examples, leading to a skewed distribution. Imbalanced data are predominant in intrusion detection applications and can lead to inaccurate predictions and degraded performance. Furthermore, the lack of real-time data produces potentially biased models that are less effective in predicting unknown attacks. Therefore, training AN-Intel-IDS using imbalanced and adversarial learning is instrumental to its efficacy and high performance. This paper investigates imbalanced learning and adversarial learning for training AN-Intel-IDS using a qualitative study. It surveys and synthesizes generative-based data augmentation techniques for addressing the uneven data distribution and generative-based adversarial techniques for generating synthetic yet realistic data in an adversarial setting, using rapid review, structured reporting, and subgroup analysis.
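As a rough illustration of the generative augmentation idea surveyed here (not the authors' method), the following sketch fits a simple Gaussian model to the minority (attack) class of a hypothetical intrusion-detection feature matrix and samples synthetic records until the classes are balanced; all data and dimensions are placeholders.

```python
# Minimal sketch (not the paper's technique): rebalance an intrusion-detection
# training set by fitting a Gaussian to the minority (attack) class and
# sampling synthetic records from it. Features and data are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature matrix: many normal flows, few attack flows.
X_normal = rng.normal(loc=0.0, scale=1.0, size=(950, 4))
X_attack = rng.normal(loc=2.0, scale=0.5, size=(50, 4))

# Fit a simple multivariate Gaussian to the minority class ...
mu = X_attack.mean(axis=0)
cov = np.cov(X_attack, rowvar=False)

# ... and draw synthetic attack examples until the classes are balanced.
n_needed = len(X_normal) - len(X_attack)
X_synthetic = rng.multivariate_normal(mu, cov, size=n_needed)

X_balanced = np.vstack([X_normal, X_attack, X_synthetic])
y_balanced = np.concatenate([np.zeros(len(X_normal)),
                             np.ones(len(X_attack) + n_needed)])
print(X_balanced.shape, y_balanced.mean())  # roughly balanced classes
```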


2022 ◽  
Vol 12 ◽  
Author(s):  
Cunguo Wang ◽  
Ivano Brunner ◽  
Junni Wang ◽  
Wei Guo ◽  
Zhenzhen Geng ◽  
...  

Trees can build fine-root systems with high variation in root size (e.g., fine-root diameter) and root number (e.g., branching pattern) to optimize belowground resource acquisition in forest ecosystems. Compared with leaves, which are visible above ground, information about the distribution and inequality of fine-root size and about key associations between fine-root size and number is still limited. We collected 27,573 first-order fine-roots growing out of 3,848 second-order fine-roots, covering 51 tree species in three temperate forests (Changbai Mountain, CBS; Xianrendong, XRD; and Maoershan, MES) in Northeastern China. We investigated the distribution and inequality of fine-root length, diameter, and area (fine-root size), and their trade-off with fine-root branching intensity and ratio (fine-root number). Our results showed a strong right-skewed distribution in first-order fine-root size across the various tree species. Unimodal frequency distributions were observed in all three of the sampled forests for first-order fine-root length and area and in CBS and XRD for first-order fine-root diameter, whereas a marked bimodal frequency distribution of first-order fine-root diameter appeared in MES. Moreover, XRD had the highest and MES the lowest inequality values (Gini coefficients) in first-order fine-root diameter. First-order fine-root size showed a consistent linear decline with increasing root number. Our findings suggest a common right-skewed distribution with unimodality or bimodality of fine-root size and a generalized trade-off between fine-root size and number across temperate tree species. These results improve our understanding of the belowground resource acquisition strategies of temperate trees and forests.
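The skewness and inequality (Gini coefficient) summaries used in this study can be computed directly from a sample of root diameters; the sketch below does so for simulated, right-skewed diameters (not the study's measurements).

```python
# Sketch: quantifying right skew and inequality (Gini coefficient) of
# fine-root diameters. Diameters here are simulated, not the study data.
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(1)
diameters = rng.lognormal(mean=-1.0, sigma=0.4, size=5000)  # mm, right-skewed

def gini(x):
    """Gini coefficient based on the sorted values."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    cum = np.cumsum(x)
    return (n + 1 - 2 * np.sum(cum) / cum[-1]) / n

print(f"skewness: {skew(diameters):.2f}")          # > 0 indicates right skew
print(f"Gini coefficient: {gini(diameters):.2f}")  # 0 = equal, 1 = maximal inequality
```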


2022 ◽  
pp. 001316442110634
Author(s):  
Patrick D. Manapat ◽  
Michael C. Edwards

When fitting unidimensional item response theory (IRT) models, the population distribution of the latent trait (θ) is often assumed to be normally distributed. However, some psychological theories would suggest a nonnormal θ. For example, some clinical traits (e.g., alcoholism, depression) are believed to follow a positively skewed distribution where the construct is low for most people, medium for some, and high for few. Failure to account for nonnormality may compromise the validity of inferences and conclusions. Although corrections have been developed to account for nonnormality, these methods can be computationally intensive and have not yet been widely adopted. Previous research has recommended implementing nonnormality corrections when θ is not “approximately normal.” This research focused on examining how far θ can deviate from normal before the normality assumption becomes untenable. Specifically, our goal was to identify the type(s) and degree(s) of nonnormality that result in unacceptable parameter recovery for the graded response model (GRM) and 2-parameter logistic model (2PLM).
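To make the kind of nonnormal θ concrete, the sketch below draws a positively skewed latent trait from a skew-normal distribution and measures its departure from normality; the distributional family and parameter values are illustrative assumptions, not the simulation conditions of the study.

```python
# Sketch: a positively skewed latent trait and its departure from normality.
# The skew-normal shape and sample size are arbitrary illustrations, not the
# study's simulation design.
import numpy as np
from scipy.stats import skewnorm, skew, kurtosis

rng = np.random.default_rng(2)

a = 5.0                                          # shape parameter (a = 0 is normal)
theta = skewnorm.rvs(a, size=10_000, random_state=rng)
theta = (theta - theta.mean()) / theta.std()     # standardize like a latent trait

print(f"skewness: {skew(theta):.2f}")            # about 0.85 for a = 5
print(f"excess kurtosis: {kurtosis(theta):.2f}")
```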


2022 ◽  
Vol 5 ◽  
Author(s):  
Wytze Marinus ◽  
Eva S. Thuijsman ◽  
Mark T. van Wijk ◽  
Katrien Descheemaeker ◽  
Gerrie W. J. van de Ven ◽  
...  

Smallholder farming in sub-Saharan Africa keeps many rural households trapped in a cycle of poor productivity and low incomes. Two options to reach a decent income are intensification of production and expansion of the farm area per household. In this study, we explore the "viable farm size," i.e., the farm area required to attain a "living income," which sustains a nutritious diet, housing, education, and health care. We used survey data from three contrasting sites in the East African highlands, Nyando (Kenya), Rakai (Uganda), and Lushoto (Tanzania), to explore viable farm sizes in six scenarios. Starting from the baseline cropping system, we built scenarios by incrementally including intensified and re-configured cropping systems, income from livestock, and off-farm sources. In the most conservative scenario (baseline cropping patterns and yields, minus basic input costs), viable farm areas were 3.6, 2.4, and 2.1 ha for Nyando, Rakai, and Lushoto, respectively, whereas current median farm areas were just 0.8, 1.8, and 0.8 ha. Given the skewed distribution of current farm areas, only a few of the households in the study sites (0, 27, and 4% for Nyando, Rakai, and Lushoto, respectively) were able to attain a living income. Raising baseline yields to 50% of the water-limited yields strongly reduced the land area needed to achieve a viable farm size, and thereby enabled 92% of the households in Rakai and 70% of the households in Lushoto to attain a living income on their existing farm areas. By contrast, intensification of crop production alone was insufficient in Nyando, although including income from livestock enabled the majority of households (73%) to attain a living income with current farm areas. These scenarios show that increasing farm area and/or intensifying production is required for smallholder farmers to attain a living income from farming. Such changes would obviously require considerable capital and labor investment, as well as land reform and alternative off-farm employment options for those who exit farming.
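At its core, the viable-farm-size calculation divides the income gap by the net income generated per hectare; the toy sketch below shows that logic with hypothetical placeholder numbers, not the study's survey data or scenario definitions.

```python
# Sketch of the viable-farm-size logic: the area needed so that net farm income
# (plus any livestock or off-farm income) reaches the living-income benchmark.
# All numbers are hypothetical placeholders, not the study's data.

def viable_farm_size(living_income, net_income_per_ha, other_income=0.0):
    """Farm area (ha) required to reach the living-income benchmark."""
    income_gap = max(living_income - other_income, 0.0)
    return income_gap / net_income_per_ha

# Baseline scenario: current yields, only crop income.
print(viable_farm_size(living_income=2500.0, net_income_per_ha=700.0))   # ~3.6 ha

# Intensified scenario: yields raised towards the water-limited level.
print(viable_farm_size(living_income=2500.0, net_income_per_ha=1800.0))  # ~1.4 ha

# Including livestock/off-farm income shrinks the required area further.
print(viable_farm_size(living_income=2500.0, net_income_per_ha=1800.0,
                       other_income=900.0))                              # ~0.9 ha
```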


Energies ◽  
2022 ◽  
Vol 15 (1) ◽  
pp. 325
Author(s):  
Wei Guo ◽  
Xiaowei Zhang ◽  
Lixia Kang ◽  
Jinliang Gao ◽  
Yuyang Liu

Due to the complex microscopic pore structure of shale, large-scale hydraulic fracturing is required to achieve effective development, resulting in very complicated fracturing-fluid flowback characteristics. The flowback volume is time-dependent, whereas other relevant parameters, such as the permeability, porosity, and fracture half-length, are static. Thus, from a machine-learning perspective, it is very difficult to build an end-to-end model that predicts time-dependent flowback curves from static parameters. In order to reduce the time-dependent flowback curve to simple parameters that can serve as target variables for big-data analysis and for analyzing the factors influencing flowback, this paper abstracts the flowback curve into two characteristic parameters, the daily flowback volume coefficient and the flowback decreasing coefficient, based on the analytical solution of the seepage equation of multistage fractured horizontal wells. Taking the dynamic flowback data of 214 shale gas horizontal wells in the Weiyuan shale gas block as a study case, the characteristic parameters of the flowback curves were obtained by exponential curve fitting. The analysis showed a positive correlation between the two characteristic parameters, both of which exhibit right-skewed distributions. A calculation formula for the characteristic flowback coefficient, which represents the flowback potential, was established. The correlations between the characteristic flowback coefficient and the geological and engineering parameters of the 214 horizontal wells were studied using the Spearman correlation coefficient. The results showed that the characteristic flowback coefficient is negatively correlated with the thickness × drilling length of the high-quality reservoir, the fracturing stage interval, the number of fracturing stages, and the brittle mineral content. Through the method established in this paper, a shale gas flowback curve governed by complex flow mechanisms can be abstracted into simple characteristic parameters and coefficients, and a relationship between static and dynamic data is established, which can help build a machine-learning method for predicting the flowback curves of shale gas horizontal wells.
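A sketch of the curve-fitting step follows, assuming an exponential decline of the form q(t) = a·exp(−b·t), with a standing in for the daily flowback volume coefficient and b for the flowback decreasing coefficient; the exact functional form and the synthetic series are assumptions, not taken from the paper.

```python
# Sketch: abstracting a daily flowback series into two characteristic
# parameters via an exponential fit q(t) = a * exp(-b * t). The functional
# form and the synthetic data are illustrative assumptions.
import numpy as np
from scipy.optimize import curve_fit

def flowback(t, a, b):
    return a * np.exp(-b * t)

rng = np.random.default_rng(3)
t = np.arange(1, 61)                                         # days on flowback
q = flowback(t, a=120.0, b=0.05) + rng.normal(0, 3, t.size)  # m3/day + noise

(a_hat, b_hat), _ = curve_fit(flowback, t, q, p0=(100.0, 0.1))
print(f"daily flowback volume coefficient ~ {a_hat:.1f}")
print(f"flowback decreasing coefficient  ~ {b_hat:.3f}")
```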


Energies ◽  
2021 ◽  
Vol 15 (1) ◽  
pp. 212
Author(s):  
Ajit Kumar ◽  
Neetesh Saxena ◽  
Souhwan Jung ◽  
Bong Jun Choi

Critical infrastructures have recently been integrated with digital controls to support intelligent decision making. Although this integration provides various benefits and improvements, it also exposes the system to new cyberattacks. In particular, the injection of false data and commands into communication is one of the most common and fatal cyberattacks in critical infrastructures. Hence, in this paper, we investigate the effectiveness of machine-learning algorithms in detecting False Data Injection Attacks (FDIAs). In particular, we focus on two of the most widely used critical infrastructures, namely power systems and water treatment plants. This study focuses on tackling two key technical issues: (1) finding the set of best features under different combinations of techniques and (2) resolving the class imbalance problem using oversampling methods. We evaluate the performance of each algorithm in terms of time complexity and detection accuracy to meet the time-critical requirements of critical infrastructures. Moreover, we address the inherent skewed distribution problem and the data imbalance problem commonly found in many critical infrastructure datasets. Our results show that the considered minority oversampling techniques can improve the Area Under the Curve (AUC) of GradientBoosting, AdaBoost, and kNN by 10–12%.
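The oversampling-plus-AUC workflow evaluated here can be sketched generically as below; the data are synthetic and the pipeline is an illustration of the general approach, not the study's exact feature sets or datasets.

```python
# Sketch: minority oversampling (SMOTE) before training a classifier and
# comparing AUC with and without oversampling. Data are synthetic; this is a
# generic illustration, not the study's datasets or exact pipeline.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

X, y = make_classification(n_samples=5000, n_features=20, weights=[0.95, 0.05],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Baseline: train on the imbalanced data as-is.
base = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
auc_base = roc_auc_score(y_te, base.predict_proba(X_te)[:, 1])

# Oversample only the training split, then retrain.
X_res, y_res = SMOTE(random_state=0).fit_resample(X_tr, y_tr)
smote = GradientBoostingClassifier(random_state=0).fit(X_res, y_res)
auc_smote = roc_auc_score(y_te, smote.predict_proba(X_te)[:, 1])

print(f"AUC without oversampling: {auc_base:.3f}")
print(f"AUC with SMOTE:           {auc_smote:.3f}")
```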


Sensors ◽  
2021 ◽  
Vol 22 (1) ◽  
pp. 5
Author(s):  
Tianjian Yu ◽  
Fan Gao ◽  
Xinyuan Liu ◽  
Jinjun Tang

Spatial autocorrelation and skewed distributions are among the most frequent issues in crash-rate modelling. Previous studies commonly focus either on the spatial autocorrelation between adjacent regions or on the relationships between the crash rate and potential risk factors across different quantiles of the crash-rate distribution, but rarely both. To address this research gap, this study utilizes the spatial autoregressive quantile (SARQ) model to estimate how contributing factors influence the total and fatal-plus-injury crash rates, and how these modelling relationships change across the distribution of crash rates while accounting for spatial autocorrelation. Three types of explanatory variables were considered: demographics, traffic networks and volumes, and land-use patterns. Using data collected in New York City from 2017 to 2019, the results show that: (1) the SARQ model outperforms the traditional quantile regression model in prediction and fitting performance; (2) the effects of variables vary with the quantiles, falling mainly into three types: increasing, unchanged, and U-shaped; (3) at the high tail of the crash-rate distribution, the effects commonly show sudden increases or decreases. The findings are expected to provide strategies for reducing the crash rate and improving road traffic safety.
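The SARQ model itself is not available off the shelf, but its quantile-regression backbone can be sketched with statsmodels; the example below omits the spatial autoregressive term, and the variables and data are hypothetical stand-ins.

```python
# Sketch: plain quantile regression across several quantiles of a crash-rate
# distribution (statsmodels QuantReg). The spatial lag term of the SARQ model
# is omitted; variables and data are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 500
df = pd.DataFrame({
    "road_density": rng.gamma(2.0, 1.0, n),
    "traffic_volume": rng.gamma(3.0, 1.0, n),
})
# Right-skewed synthetic crash rate.
df["crash_rate"] = (0.5 * df["road_density"] + 0.3 * df["traffic_volume"]
                    + rng.gamma(1.5, 0.5, n))

X = sm.add_constant(df[["road_density", "traffic_volume"]])
for q in (0.25, 0.5, 0.75, 0.9):
    fit = sm.QuantReg(df["crash_rate"], X).fit(q=q)
    print(q, fit.params.round(3).to_dict())  # how effects shift across quantiles
```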


2021 ◽  
Vol 2021 ◽  
pp. 1-15
Author(s):  
Javeria Khaleeq ◽  
Muhammad Amanullah ◽  
Zahra Almaspoor

In biological data, skewed distributions are often approximated by the log-normal regression model (LNRM). Traditional estimation techniques for the LNRM are sensitive to unusual observations, which can strongly affect the analysis and lead to imprecise conclusions. To overcome this issue, we develop diagnostic measures based on local influence to identify such unusual observations in the LNRM under censoring. The proposed measures are derived by perturbing the case weights, the response, and the explanatory variables. Furthermore, we also consider the one-step Newton-Raphson method and the generalized Cook's distance. We illustrate the developed approaches through a Monte Carlo simulation study and an application to real data.
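A sketch of the basic diagnostic idea only (without censoring or the paper's local-influence perturbation schemes): fit a log-normal regression as OLS on the log-transformed response and flag influential cases via Cook's distance, on simulated data.

```python
# Sketch of the diagnostic idea only: a log-normal regression fitted as OLS on
# the log-transformed response, with Cook's distance used to flag influential
# observations. Censoring and local-influence perturbations are not
# implemented here; data are simulated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 200
x = rng.normal(size=n)
y = np.exp(1.0 + 0.5 * x + rng.normal(scale=0.3, size=n))  # log-normal response
y[:3] *= 10.0                                              # inject unusual cases

X = sm.add_constant(x)
fit = sm.OLS(np.log(y), X).fit()

cooks_d, _ = fit.get_influence().cooks_distance
flagged = np.where(cooks_d > 4 / n)[0]      # common rule-of-thumb threshold
print("influential observations:", flagged)
```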


Mathematics ◽  
2021 ◽  
Vol 9 (24) ◽  
pp. 3227
Author(s):  
Franko Hržić ◽  
Michael Janisch ◽  
Ivan Štajduhar ◽  
Jonatan Lerga ◽  
Erich Sorantin ◽  
...  

In clinical practice, fracture age estimation is commonly required, particularly in children with suspected non-accidental injuries. It is usually done by radiologically examining the injured body part and analyzing several indicators of fracture healing, such as osteopenia, periosteal reaction, and fracture gap width. However, age-related changes in healing timeframes, inter-individual variability in bone density, and significant intra- and inter-operator subjectivity all limit the validity of these radiological clues. To address these issues, we suggest, for the first time, an automated neural-network-based system for determining the age of a pediatric wrist fracture. In this study, we propose and evaluate a deep learning approach for automatically estimating fracture age. Our dataset included 3570 medical cases with a distribution skewed toward initial consultations. Each medical case includes a lateral and an anteroposterior projection of a wrist fracture, as well as the patient's age and gender. We propose a neural-network-based system with Monte-Carlo dropout-based uncertainty estimation to address the dataset skewness. Furthermore, this research examines how each component of the system contributes to the final forecast and interprets different scenarios in the system's predictions in terms of their uncertainty. The examination of the proposed system's components showed that fusing features from all available data is necessary to obtain good results. Moreover, adding uncertainty estimation to the system increased accuracy and F1-score to a final 0.906±0.011 on the given task.
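A minimal PyTorch sketch of Monte-Carlo dropout uncertainty estimation follows; the tiny model, input size, and number of passes are placeholders, not the paper's fracture-age architecture.

```python
# Sketch: Monte-Carlo dropout at inference time. Dropout stays active during
# prediction, and the spread of repeated stochastic forward passes serves as
# an uncertainty estimate. The tiny regression model and random input are
# placeholders, not the paper's fracture-age network.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(16, 64), nn.ReLU(),
    nn.Dropout(p=0.3),
    nn.Linear(64, 1),           # e.g., a predicted fracture age
)

x = torch.randn(1, 16)          # placeholder feature vector

model.train()                   # keep dropout active for MC sampling
with torch.no_grad():
    samples = torch.stack([model(x) for _ in range(100)])

mean_pred = samples.mean().item()
uncertainty = samples.std().item()   # larger spread => less confident prediction
print(f"prediction: {mean_pred:.2f} +/- {uncertainty:.2f}")
```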


2021 ◽  
Vol 24 (4) ◽  
pp. 370-381
Author(s):  
Camillo Cammarota

The random sequence of inter-event times of a level-crossing is a statistical tool that can be used to investigate time series from complex phenomena. Typical features of observed series, such as skewed distributions and long-range correlations, are modeled using nonlinear transformations applied to Gaussian ARMA processes. We investigate the distribution of the inter-event times of level-crossing events in ARMA processes as a function of the probability corresponding to the level. For Gaussian ARMA processes we establish a representation of this indicator, prove its symmetry, and show that it is invariant with respect to the application of a nonlinear monotonic transformation. Using simulated series, we provide evidence that the symmetry disappears if a non-monotonic transformation is applied to an ARMA process. We estimate this indicator in wind speed time series obtained from three different databases. The data analysis provides evidence that the indicator is asymmetric, suggesting that only highly nonlinear transformations of ARMA processes can be used in modeling. We discuss the possible use of inter-event times in the prediction task.
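The sketch below computes inter-event times of level-crossings in a simulated ARMA series, with the level set by a chosen probability; the ARMA order, parameters, and probability are illustrative choices, not those analyzed in the paper.

```python
# Sketch: inter-event times of level-crossing events in a simulated ARMA
# process, with the level defined by a chosen probability (quantile). The
# ARMA order and parameters are illustrative, not those used in the paper.
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

np.random.seed(6)
ar = np.array([1, -0.6])   # AR(1), coefficient 0.6 (lag-polynomial sign convention)
ma = np.array([1, 0.3])    # MA(1)
x = ArmaProcess(ar, ma).generate_sample(nsample=20_000)

p = 0.9                    # probability defining the level
level = np.quantile(x, p)

# Upward crossings: the series moves from below the level to at/above it.
crossings = np.where((x[:-1] < level) & (x[1:] >= level))[0]
inter_event_times = np.diff(crossings)

print(f"number of events: {len(crossings)}")
print(f"mean inter-event time: {inter_event_times.mean():.1f} samples")
```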

