A random forest method for constructing long-term time series of nighttime light in Central Asia

Author(s):  
Hui Chen ◽  
Yina Qiao ◽  
Hailong Liu
Data ◽  
2018 ◽  
Vol 4 (1) ◽  
pp. 5 ◽  
Author(s):  
Lyudmyla Kirichenko ◽  
Tamara Radivilova ◽  
Vitalii Bulakh

The article presents a novel method for classifying fractal time series with meta-algorithms based on decision trees. The classification objects are fractal time series, modeled here by binomial stochastic cascade processes. Each class groups model time series with the same fractal properties. Numerical experiments demonstrate that the best results are obtained by the random forest method with regression trees. A comparative analysis of the classification approaches based on the random forest method and of traditional estimation of the degree of self-similarity is performed. The results show the advantage of machine learning methods over traditional time series evaluation. The results were used for detecting distributed denial-of-service (DDoS) attacks and demonstrated a high probability of detection.
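The binomial stochastic cascade used as the model process can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the symmetric Beta(alpha, alpha) weight law, and the depth of 10 levels are all assumptions made for the example. Each cell's mass is split between its two halves by a random weight, so the total mass is conserved at every level.

```python
import random


def binomial_cascade(levels, weight_sampler, rng):
    """Return 2**levels cell masses of a conservative binomial cascade."""
    masses = [1.0]  # start with unit mass on a single cell
    for _ in range(levels):
        next_masses = []
        for mass in masses:
            w = weight_sampler(rng)  # fraction of mass sent to the left half
            next_masses.append(mass * w)
            next_masses.append(mass * (1.0 - w))
        masses = next_masses
    return masses


def beta_weights(alpha):
    """Hypothetical weight law: symmetric Beta(alpha, alpha) on [0, 1]."""
    return lambda rng: rng.betavariate(alpha, alpha)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
series = binomial_cascade(10, beta_weights(2.0), rng)
```

Under this assumed weight law, a smaller alpha spreads the weights further from 1/2 and gives a more heterogeneous series, so series generated with different alpha values could serve as the classes a classifier is trained to separate.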


2020 ◽  
Vol 22 (Supplement_2) ◽  
pp. ii85-ii85
Author(s):  
Ping Zhu ◽  
Xianglin Du ◽  
Yoshua Esquenazi ◽  
Jay-Jiguang Zhu

Abstract OBJECTIVES To investigate long-term survival rates and their predictors in patients with glioblastoma (GBM) using the NCDB. METHODS A total of 51570 GBM patients were identified in the NCDB from 2004 to 2011. Three long-term survival measures were defined: survival for at least 3, 5, or 10 years after diagnosis. Multivariable binary logistic regressions were performed to identify predictors of 3-year, 5-year, and 10-year survival. The relative importance of each survival predictor was calculated, and a random forest method, as well as a decision tree, was used to validate the variable importance. RESULTS A total of 4782 (9.3%), 1481 (3.9%), and 51 (0.9%) GBM patients survived at least 3, 5, and 10 years, respectively. Significant predictors of both 3-year and 5-year survival in multivariable logistic regression included tumor resection, recent year of diagnosis, age < 65 years, private insurance, adjuvant therapy, non-white race, female sex, treatment at a facility located in the South or at an academic facility, higher income, and absence of comorbidity. Moreover, patients who traveled >50 miles for treatment and received care transition were significantly more likely to survive at least 3 years. However, only five predictors were associated with 10-year survivorship: residence-hospital distance >20 miles, non-white race, age < 65 years, resection, and higher income. Based on the calculations of relative importance and the random forest method, the five most important factors for predicting long-term survival were age, tumor resection, year of diagnosis, comorbidity, and adjuvant therapy (3-year survival); age, tumor resection, comorbidity, gender, and insurance (5-year survival); and age, race, residence-hospital distance, income, and comorbidity (10-year survival).
CONCLUSIONS This study identifies non-molecular factors predicting long-term survivorship among GBM patients using the NCDB dataset. Our findings suggest that 3-year and 5-year survivors share similar determinants, while 10-year survivors may differ more in socio-demographic and clinical features.


2021 ◽  
Author(s):  
Spatola Maria Floriana ◽  
Borghetti Marco ◽  
Rita Angelo ◽  
Nolè Angelo

Wildfire disturbances severely modify ecosystem structure and natural regeneration processes. Predicting mid- to long-term post-fire vegetation recovery patterns is pivotal to improving post-fire management and the restoration of burned forest ecosystems. Many research efforts have used machine learning and remote sensing techniques to monitor and predict wildfires. The method proposed in this study instead combines satellite images and a data mining algorithm to process time series data and a regional forest dataset in order to predict post-fire vegetation recovery patterns. We analysed Normalized Burn Ratio (NBR) patterns from Landsat Time Series (LTS) to assess post-fire vegetation recovery for several wildfires that occurred in three forest Corine Land Cover classes (311, 312, 313) in the Basilicata region during the period 2005-2012. A Random Forest model was used to classify the observed recovery patterns and to investigate the influence of burn severity, topographic variables, climate and spectral vegetation indices on post-fire recovery. Image acquisition and Random Forest classification were undertaken in Google Earth Engine (GEE). Results of the bootstrapping classification across forest types show high percentages for the high recovered (HR) and moderate recovered (MR) classes and moderate-to-low percentages for the low recovered (LR) and unrecovered (UR) classes. Specifically, holm- and cork-oak and oak forests show medium to high recovery rates, while Mediterranean pine and conifer-oak forests show the slowest recovery rates. Different post-fire recovery patterns are related to fire severity, vegetation type and post-fire environmental conditions.
Our methodology shows that post-fire recovery classification using the RF classifier provides a robust method for both local and broad-scale monitoring of mid- to long-term recovery response.

Keywords: Wildfires, Post-fire recovery, Landsat Time Series (LTS), Google Earth Engine, Machine Learning, Random Forest.
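For readers unfamiliar with the index, the Normalized Burn Ratio is the normalized difference of near-infrared (NIR) and shortwave-infrared (SWIR) reflectance, NBR = (NIR - SWIR) / (NIR + SWIR). The sketch below only illustrates that formula on a per-pixel time series; the function names and the reflectance values are made up for the example and do not come from the study, which ran its processing in Google Earth Engine.

```python
def nbr(nir, swir):
    """Normalized Burn Ratio from NIR and SWIR surface reflectance."""
    total = nir + swir
    if total == 0:
        return float("nan")  # index is undefined when both bands are zero
    return (nir - swir) / total


def nbr_series(observations):
    """Map (year, nir, swir) tuples to (year, NBR) pairs."""
    return [(year, nbr(nir, swir)) for year, nir, swir in observations]


# Illustrative reflectances only: pre-fire, fire year, two recovery years.
pixel = [(2006, 0.40, 0.10), (2007, 0.15, 0.30), (2009, 0.25, 0.20), (2012, 0.35, 0.12)]
trajectory = nbr_series(pixel)
```

Healthy vegetation reflects strongly in the NIR and weakly in the SWIR, so NBR drops after a fire and rises again as vegetation recovers; the shape of that trajectory is the kind of pattern a classifier could label as a recovery class.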


2020 ◽  
Vol 27 (6) ◽  
pp. 37-55
Author(s):  
E. V. Zarova ◽  
E. I. Dubravskaya

Quantitative research on informal employment remains highly relevant both in the Russian Federation and in other countries, because informal employment depends strongly on cyclicality and crisis stages in the economic dynamics of countries at any level of economic development. Developing effective government policy measures to overcome the negative impact of informal employment requires theoretical and applied research to pay special attention to assessing the factors and conditions of informal employment in the Russian Federation, including at the regional level. Effects of informal employment such as a shortfall in taxes, potential losses in production efficiency, and negative social consequences are a concern for authorities at the federal and regional levels. Overcoming these negative effects requires quantitative indicators that determine the level of informal employment in the regions, taking into account their specifics in the general spatial and economic system of Russia. The article proposes and tests methods for assessing the impact of hierarchical relationships among macroeconomic factors on the regional level of informal employment in constituent entities of the Russian Federation. Most studies of informal employment rely on basic statistical methods of spatial-dynamic analysis, as well as on the now «traditional» methods of cluster and correlation-regression analysis. Without diminishing the merits of these methods, it should be noted that they are somewhat limited in identifying hidden structural connections and interdependencies in such a complex multidimensional phenomenon as informal employment.
To substantiate the possibility of overcoming these limitations, the article proposes indicators of regional statistics that directly and indirectly characterize informal employment, and presents the possibilities of using the «random forest» method to identify groups of constituent entities of the Russian Federation that have similar macroeconomic factors of informal employment. The novelty of this method with respect to the research objectives is that it allows one to assess the impact of macroeconomic indicators of regional development on the level of informal employment while taking into account implicit hierarchical relationships of factor indicators that are not predetermined by the initial hypotheses. Based on a generalization of the studies presented in the literature, as well as on the authors’ statistical calculations using Rosstat data, the authors conclude that macroeconomic parameters of regional development and systemic relationships among macroeconomic indicators are highly important in explaining the differentiation of the level of informal employment across the constituent entities of the Russian Federation.


2020 ◽  
Vol 27 (3) ◽  
pp. 178-186 ◽  
Author(s):  
Ganesan Pugalenthi ◽  
Varadharaju Nithya ◽  
Kuo-Chen Chou ◽  
Govindaraju Archunan

Background: N-glycosylation is one of the most important post-translational mechanisms in eukaryotes. N-glycosylation predominantly occurs in the N-X-[S/T] sequon, where X is any amino acid other than proline. However, not all N-X-[S/T] sequons in proteins are glycosylated. Therefore, accurate prediction of N-glycosylation sites is essential to understand the N-glycosylation mechanism. Objective: In this article, our motivation is to develop a computational method to predict N-glycosylation sites in eukaryotic protein sequences. Methods: In this article, we report a random forest method, Nglyc, to predict N-glycosylation sites from protein sequence, using 315 sequence features. The method was trained using a dataset of 600 N-glycosylation sites and 600 non-glycosylation sites and tested on a dataset containing 295 N-glycosylation sites and 253 non-glycosylation sites. Nglyc prediction was compared with the NetNGlyc, EnsembleGly and GPP methods. Further, the performance of Nglyc was evaluated using human and mouse N-glycosylation sites. Results: The Nglyc method achieved an overall training accuracy of 0.8033 with all 315 features. Performance comparison with the NetNGlyc, EnsembleGly and GPP methods shows that Nglyc performs better than the other methods, with high sensitivity and specificity. Conclusion: Our method achieved an overall accuracy of 0.8248 with 0.8305 sensitivity and 0.8182 specificity. The comparison study shows that our method performs better than the other methods. The applicability and success of our method were further evaluated using human and mouse N-glycosylation sites. The Nglyc method is freely available at https://github.com/bioinformaticsML/ Ngly.


2016 ◽  
Vol 9 (1) ◽  
pp. 53-62 ◽  
Author(s):  
R. D. García ◽  
O. E. García ◽  
E. Cuevas ◽  
V. E. Cachorro ◽  
A. Barreto ◽  
...  

Abstract. This paper presents the reconstruction of a 73-year time series of the aerosol optical depth (AOD) at 500 nm at the subtropical high-mountain Izaña Atmospheric Observatory (IZO) located in Tenerife (Canary Islands, Spain). For this purpose, we have combined AOD estimates from artificial neural networks (ANNs) from 1941 to 2001 and AOD measurements directly obtained with a Precision Filter Radiometer (PFR) between 2003 and 2013. The analysis is limited to summer months (July–August–September), when the largest aerosol load is observed at IZO (Saharan mineral dust particles). The ANN AOD time series has been comprehensively validated against coincident AOD measurements performed with a solar spectrometer Mark-I (1984–2009) and AERONET (AErosol RObotic NETwork) CIMEL photometers (2004–2009) at IZO, showing good agreement on a daily basis: Pearson coefficient, R, of 0.97 between AERONET and ANN AOD, and 0.93 between Mark-I and ANN AOD estimates. In addition, we have analysed the long-term consistency between the ANN AOD time series and long-term meteorological records identifying Saharan mineral dust events at IZO (synoptic observations and local wind records). Both analyses provide consistent results, with correlations > 85 %. Therefore, we can conclude that the reconstructed AOD time series captures well the AOD variations and dust-laden Saharan air mass outbreaks on short-term and long-term timescales and, thus, is suitable for use in climate analysis.

