A Data Adaptive Model for Retail Sales of Electricity

Author(s):  
Johanna Marcelia

When fitting a model to a data set, the goal is to create a model that captures the trends present in the data. However, data sets often contain regions where the underlying model changes or where certain parameters shift because of economic events. These locations in the data are known as changepoints, and ignoring them can lead to large errors and incorrect forecasts. By developing a specific cost function and optimizing it with a genetic algorithm, we are able to locate and account for the changepoints in a given data set. We apply this process to retail sales of electricity in the United States, examining data sets from each state's residential, commercial, and industrial sectors. We demonstrate that model trends can be computed more accurately when changepoints are accounted for. In particular, we explore data sets that exhibit changepoints due to the ongoing pandemic that began in 2020.
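
The abstract does not give the cost function, so the following is only an illustration of the idea under stated assumptions: each segment is fitted with a straight line, the cost is the total squared residual plus a fixed penalty per changepoint (the `penalty` parameter and all function names are hypothetical), and a single changepoint is found by exhaustive search. That exhaustive search is swapped in for the authors' genetic algorithm, which is a more practical search strategy when several changepoints must be placed at once.

```python
"""Sketch: locate one changepoint by minimising a penalised piecewise-linear cost.

Assumptions (not from the paper): linear segments, squared-error cost,
fixed penalty per changepoint, exhaustive search instead of a genetic algorithm.
"""


def linear_sse(ys, start):
    """Sum of squared residuals of an OLS line fitted to ys at x = start, start+1, ..."""
    n = len(ys)
    if n < 2:
        return 0.0
    xs = range(start, start + n)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def cost(series, changepoints, penalty):
    """Penalised cost: SSE of a separate line on each segment + penalty per changepoint."""
    bounds = [0, *sorted(changepoints), len(series)]
    sse = sum(linear_sse(series[a:b], a) for a, b in zip(bounds, bounds[1:]))
    return sse + penalty * len(changepoints)


def best_single_changepoint(series, penalty, min_seg=3):
    """Return the index t splitting series into [0, t) and [t, n), or None if no split pays off."""
    best_tau, best_cost = None, cost(series, [], penalty)
    for t in range(min_seg, len(series) - min_seg + 1):
        c = cost(series, [t], penalty)
        if c < best_cost:
            best_tau, best_cost = t, c
    return best_tau


# Synthetic series whose trend changes abruptly at t = 30.
series = [2 + 0.5 * t for t in range(30)] + [40 - 1.0 * t for t in range(30, 50)]
print(best_single_changepoint(series, penalty=1.0))  # prints 30
```

The penalty controls the trade-off: a larger value makes the search less willing to add a changepoint unless it removes substantial residual error.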

2018 ◽  
Vol 40 ◽  
pp. 06021
Author(s):  
David Abraham ◽  
Tate McAlpin ◽  
Keaton Jones

The movement of bed forms (sand dunes) in large sand-bed rivers is being used to determine the transport rate of bed load. The ISSDOTv2 (Integrated Section Surface Difference Over Time version 2) methodology uses time-sequenced differences of measured bathymetric surfaces to compute the bed-load transport rate. The method was verified using flume studies [1]. In general, the method provides very consistent and repeatable results and agrees well with most other measurement techniques. Over the last 7 years we have measured, computed, and compiled what we believe to be the most extensive data set anywhere of bed-load measurements on large sand-bed rivers. Most of the measurements have been taken on the Mississippi, Missouri, Ohio, and Snake Rivers in the United States. Where multiple measurements were made at varying flow rates, bed-load rating curves have been produced. This paper provides references for the methodology but is intended mainly to discuss the measurements, the resulting data sets, and current and potential uses for the bed-load data.
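
The differencing step at the heart of the method can be illustrated by comparing two gridded bathymetric surveys of the same reach and totalling the volume eroded and deposited between them. This is a simplified sketch assuming a regular grid and a hypothetical function name; converting the scour volume into a bed-load transport rate follows the ISSDOTv2 equations in the methodology references and is not reproduced here.

```python
def erosion_deposition(z_before, z_after, cell_area):
    """Return (eroded_volume, deposited_volume) between two gridded bed surfaces.

    z_before, z_after: equal-shaped lists of rows of bed elevations (m),
        surveyed at two successive times (hypothetical input layout).
    cell_area: plan area of one grid cell (m^2).
    """
    eroded = deposited = 0.0
    for row_b, row_a in zip(z_before, z_after):
        for zb, za in zip(row_b, row_a):
            dz = za - zb  # elevation change at this cell
            if dz < 0:
                eroded += -dz * cell_area   # bed lowered: scour
            else:
                deposited += dz * cell_area  # bed raised (or unchanged)
    return eroded, deposited


before = [[1.0, 1.0], [1.0, 1.0]]
after = [[0.5, 1.0], [1.2, 1.0]]
print(erosion_deposition(before, after, cell_area=2.0))
```

For migrating dunes, material scoured from the stoss side is largely redeposited on the lee side, so the eroded and deposited volumes tend to be of similar size over a reach.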


2020 ◽  
Vol 7 (1) ◽  
pp. 163-180
Author(s):  
Saagar S Kulkarni ◽  
Kathryn E Lorenz

This paper examines two CDC data sets in order to provide a comprehensive overview of COVID-19-related deaths within the United States over the first eight months of 2020 and to discuss their social implications. By analyzing the first data set over this eight-month period with the variables of age, race, and individual US states, we found correlations between COVID-19 deaths and these three variables. Overall, our multivariable regression model was found to be statistically significant. When analyzing the second CDC data set, we used the same variables with one exception: gender was used in place of race. From this analysis, trends in age and individual states were found to be significant. Since gender was not found to be significant in predicting deaths, we concluded that gender does not play a significant role in the prognosis of COVID-19-induced deaths, whereas an individual's age and state of residence potentially play a significant role in determining life or death. Socio-economic analysis of the US population confirms the Qualitative socio-economic Logic-based Cascade Hypotheses (QLCH), under which education, occupation, and income affect races/ethnicities differently. For a given race/ethnicity, education drives occupation, then income, then where a person lives, and in turn his/her access to healthcare coverage. Based on the QLCH framework and socio-economic data, we conclude that different races are poised for differing effects of COVID-19 and that Asians and Whites are in a stronger position to combat COVID-19 than Hispanics and Blacks.


Land ◽  
2019 ◽  
Vol 8 (10) ◽  
pp. 156
Author(s):  
Rafael Moreno-Sanchez ◽  
James Raines ◽  
Jay Diffendorfer ◽  
Mark Drummond ◽  
Jessica Manko

This paper presents a synopsis of the challenges and limitations presented by existing and emerging land use/land cover (LULC) digital data sets when used to analyze the extent, habitat quality, and LULC changes of the monarch (Danaus plexippus) migratory habitat across the United States of America (US) and Mexico. First, the characteristics, state of the knowledge, and issues related to this habitat are presented. Then, the characteristics of the existing and emerging LULC digital data sets with global or cross-border coverage are listed, followed by the data sets that cover only the US or Mexico. Later, we discuss the challenges for determining the extent, habitat quality, and LULC changes in the monarchs’ migratory habitat when using these LULC data sets in conjunction with the current state of the knowledge of the monarchs’ ecology, behavior, and foraging/roosting plants used during their migration. We point to approaches to address some of these challenges, which can be categorized into: (a) LULC data set characteristics and availability; (b) availability of ancillary land management information; (c) ability to construct accurate forage suitability indices for their migration habitat; and (d) level of knowledge of the ecological and behavioral patterns of the monarchs during their journey.


2020 ◽  
Vol 86 (3) ◽  
pp. 208-212
Author(s):  
Brianna Dowd ◽  
Irfan Khan ◽  
Dessy Boneva ◽  
Mark Mckenney ◽  
Adel Elkbuli

Gun-related injuries are a hotly debated sociopolitical topic in the United States. Annually, more than 33 million Americans seek healthcare services for mental health issues. These conditions are the leading cause of combined disability and death among women and the second leading cause among men. Our study's main objective was to identify cases of self-inflicted penetrating firearm injuries with reported pre-existing psychiatric conditions as defined in the 2013–2016 National Trauma Data Standard. The 2013–2016 Research Data Sets (RDSs) were reviewed. Cases were identified using ICD-9 external cause codes E955–E955.4 and ICD-10-CM (Tenth Revision, Clinical Modification) external cause codes X72–X74. Odds ratios were calculated, and categorical data were analyzed using the chi-squared test, with significance defined as P < 0.05. The 2013–2016 Research Data Set comprises 3,577,168 reported cases, with 15,535 observations of self-inflicted penetrating firearm injuries. Of those patients, 18.4 per cent had major psychiatric illnesses, 7.5 per cent had alcohol use disorder, 6.4 per cent had drug use disorder, and 0.6 per cent had dementia. The proportion of patients with major psychiatric illnesses trended upward, from 15.5 per cent in 2013 to 18.6 per cent in 2016, peaking at 20.9 per cent in 2015. Nearly one in three self-inflicted penetrating firearm injuries in the United States is associated with pre-existing behavioral health conditions. Advances in understanding the behavioral and social determinants leading to these conditions, together with strategies to improve the diagnosis of mental illness and access to mental health care, are required.
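
The case definition and the two-by-two analysis described in this abstract can be sketched as follows. The sketch assumes codes are stored as dotted strings (for example `E955.0` or `X72.XXXA`); real registry extracts may format them differently. The odds-ratio and chi-squared helper is a generic Pearson test without continuity correction, with hypothetical function names, not the authors' analysis code.

```python
import math

# ICD-9 E955.0-E955.4: suicide and self-inflicted injury by firearm.
ICD9_CODES = {"E955.0", "E955.1", "E955.2", "E955.3", "E955.4"}
# ICD-10-CM X72-X74: intentional self-harm by firearm discharge.
ICD10_PREFIXES = ("X72", "X73", "X74")


def is_self_inflicted_firearm(code):
    """True if an external-cause code falls in the ranges named in the abstract.

    Assumes dotted code strings; undotted formats (e.g. 'E9550') are not handled.
    """
    code = code.strip().upper()
    return code in ICD9_CODES or code.startswith(ICD10_PREFIXES)


def odds_ratio_chi2(a, b, c, d):
    """Odds ratio and Pearson chi-squared test (1 df, no continuity correction).

    2x2 table:          outcome   no outcome
        exposed            a          b
        unexposed          c          d
    """
    odds_ratio = (a * d) / (b * c)
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-squared distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value


print(is_self_inflicted_firearm("X72.XXXA"), is_self_inflicted_firearm("E955.5"))
print(odds_ratio_chi2(10, 20, 30, 40))
```

Significance at P < 0.05 then amounts to checking `p_value < 0.05` for the table of interest.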


2000 ◽  
Vol 32 (3) ◽  
pp. 411-426 ◽  
Author(s):  
Matthew A Zook

This paper provides a description and analysis of the clustering behavior of the commercial Internet content industry in specific geographical locations within the United States. Using a data set of Internet domain names developed in the summer of 1998, I show that three regions (San Francisco, New York, and Los Angeles) are the leading centers for Internet content in the United States in terms of both absolute size and degree of specialization. To better understand how the industrial structure of a region affects the formation of the Internet content business, I analyze how the commercialization of the Internet changed from 1993 to 1998 and explore the relationship between existing industrial sectors and the specialization in commercial domain names. Over time there appears to be a stronger connection between Internet content and information-intensive industries than between Internet content and the industries providing the computer and telecommunications technology necessary for the Internet to operate. Although it is not possible to assign a definitive causal explanation to the relationships outlined here, this paper provides a first step in theorizing about the overall commercialization process of the Internet.
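
The abstract does not say how degree of specialization was measured. One common choice in regional economics is the location quotient, sketched here as an assumption: a region's count of commercial domain names relative to some base (firms, employment, or total domains; the choice of base is hypothetical) divided by the same ratio for the nation.

```python
def location_quotient(region_count, region_base, nation_count, nation_base):
    """Regional share of an activity relative to the national share.

    region_count / nation_count: e.g. commercial domain names.
    region_base / nation_base: an assumed normalising base such as firms or jobs.
    LQ > 1 means the region is more specialised than the nation as a whole.
    """
    return (region_count / region_base) / (nation_count / nation_base)


# Hypothetical figures: 500 domains per 100,000 firms vs. 10,000 per 5,000,000.
print(location_quotient(500, 100_000, 10_000, 5_000_000))
```

Absolute size and specialization can then diverge: a large region may hold many domains while having a quotient near 1, whereas a smaller region may be highly specialized.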


2020 ◽  
Author(s):  
Xiaoqian Jiang ◽  
Lishan Yu ◽  
Hamisu M. Salihub ◽  
Deepa Dongarwar

BACKGROUND In the United States, state laws require birth certificates to be completed for all births, and federal law mandates the national collection and publication of births and other vital statistics data. The National Center for Health Statistics (NCHS) has published the key statistics of birth data over the years. These data files, from as early as the 1970s, have been released and made publicly available. There are about 3 million new births each year, and every birth is a record in the data set described by hundreds of variables. The total data cover more than half of the current US population, making them an invaluable resource for studying birth epidemiology. Using such big data, researchers can ask interesting questions and study longitudinal patterns, for example, the impact of the mother's drinking status on infertility in metropolitan areas over the last decade, or the impact of the biological father's education level on cesarean sections over the years. However, existing published data sets cannot directly support these research questions because the variables and their categories have been adjusted over time, which leaves the individually published data files fragmented. The information contained in the published data files is highly diverse, with hundreds of variables each year. Besides minor adjustments such as renaming variables and adding categories, some major updates significantly changed the statistical fields (including removal, addition, and modification of variables), making the published data disconnected and ambiguous to use across multiple years. Researchers have previously reconstructed features to study temporal patterns, but only at a limited scale (focusing on a few variables of interest). Many have reinvented the wheel, and such reconstructions lack consistency because different researchers might use different criteria to harmonize variables, leading to inconsistent findings and limiting the reproducibility of research.
There is no systematic effort to combine about five decades of data files into a database that includes every variable ever released by NCHS. OBJECTIVE To use machine learning techniques to combine the United States (US) natality data for the last five decades, with changing variables and factors, into a consistent database. METHODS We developed a feasible and efficient deep-learning-based framework to harmonize data sets of live births in the US from 1970 to 2018. We constructed a graph from the properties and elements of the databases, including variables, and applied a graph convolutional network (GCN) to the graph to learn node embeddings that reflect the similarity of variables. We devised a novel loss function with a slack margin and a banlist mechanism (for a random walk) to learn the desired structure, in which two nodes sharing more information are more similar to each other. We developed an active learning mechanism to conduct the harmonization. RESULTS We harmonized historical US birth data and resolved conflicts in ambiguous terms. Starting from a total of 9,321 variables (i.e., 783 stemmed variables) from 1970 to 2018, we applied our model iteratively together with human review, obtaining 323 hyperchains of variables. Hyperchains for harmonization were composed of 201 stemmed variable pairs when considering any pairs of different stemmed variables that changed over the years. During the harmonization, the first round of our model provided 305 candidate stemmed variable pairs (based on the top-20 most similar variables for each variable according to the learned embeddings) and achieved a recall of 87.56% and a precision of 57.70%. CONCLUSIONS Our harmonized graph neural network (HGNN) method provides a feasible and efficient way to connect relevant databases at a meta-level.
By adapting to the properties and characteristics of the databases, HGNN can learn patterns and search for relations globally, which makes it effective at discovering similarities between variables across databases. Smart use of machine learning can significantly reduce the manual effort in database harmonization and in integrating fragmented data into useful databases for future research.
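
Candidate generation of the kind described under RESULTS (each variable paired with its top-k most similar variables by learned embedding) and the precision/recall calculation can be sketched as below. The embeddings are hypothetical two-dimensional vectors standing in for GCN output, the variable names are invented, and cosine similarity is an assumed choice of similarity measure. As a consistency check, the reported figures both correspond to 176 correct pairs: 176/305 ≈ 57.70% precision and 176/201 ≈ 87.56% recall.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def top_k_candidates(embeddings, k):
    """Unordered candidate pairs: each variable paired with its k most similar others."""
    pairs = set()
    names = list(embeddings)
    for name in names:
        others = sorted(
            (n for n in names if n != name),
            key=lambda n: cosine(embeddings[name], embeddings[n]),
            reverse=True,
        )
        for other in others[:k]:
            pairs.add(frozenset((name, other)))
    return pairs


def precision_recall(candidates, truth):
    """Precision and recall of candidate pairs against reviewed true pairs."""
    tp = len(candidates & truth)
    return tp / len(candidates), tp / len(truth)


# Hypothetical embeddings (the paper uses k = 20 on learned GCN embeddings).
emb = {"mage": [1.0, 0.0], "mother_age": [0.9, 0.1],
       "fage": [0.0, 1.0], "father_age": [0.1, 0.9]}
print(top_k_candidates(emb, k=1))

# Reported: 305 candidates, 201 true pairs. 176 correct pairs is inferred, not reported.
implied_precision = 176 / 305
implied_recall = 176 / 201
```

Because each variable contributes k candidates, raising k tends to increase recall at the cost of precision, which is consistent with the high-recall, moderate-precision first round reported above.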


2011 ◽  
Vol 9 (1-2) ◽  
pp. 58-69
Author(s):  
Marlene Kim

Asian Americans and Pacific Islanders (AAPIs) in the United States face problems of discrimination, the glass ceiling, and very high long-term unemployment rates. As a diverse population, although some Asian Americans are more successful than average, others, like those from Southeast Asia and Native Hawaiians and Pacific Islanders (NHPIs), work in low-paying jobs and suffer from high poverty rates, high unemployment rates, and low earnings. Collecting more detailed and additional data from employers, oversampling AAPIs in current data sets, making administrative data available to researchers, providing more resources for research on AAPIs, and enforcing nondiscrimination laws and affirmative action mandates would assist this population.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Richard Johnston ◽  
Xiaohan Yan ◽  
Tatiana M. Anderson ◽  
Edwin A. Mitchell

The effect of altitude on the risk of sudden infant death syndrome (SIDS) has been reported previously, but with conflicting findings. We aimed to examine whether the risk of sudden unexpected infant death (SUID) varies with altitude in the United States. Data from the Centers for Disease Control and Prevention (CDC)’s Cohort Linked Birth/Infant Death Data Set for births between 2005 and 2010 were examined. County of birth was used to estimate altitude. Logistic regression and a Generalized Additive Model (GAM) were used, adjusting for year, mother’s race, Hispanic origin, marital status, age, education and smoking, father’s age and race, number of prenatal visits, plurality, live birth order, and infant’s sex, birthweight and gestation. There were 25,305,778 live births over the 6-year study period. The total number of deaths from SUID in this period was 23,673 (rate = 0.94/1000 live births). In the logistic regression model there was a small but statistically significant increased risk of SUID associated with birth at > 8000 feet compared with < 6000 feet (aOR = 1.93; 95% CI 1.00–3.71). The GAM showed a similar increased risk above 8000 feet, but this was not statistically significant. Only 9245 (0.037%) of mothers gave birth at > 8000 feet during the study period, and 10 deaths among these births (0.042% of all SUID deaths) were attributed to SUID. The number of SUID deaths at this altitude in the United States is very small (10 deaths in 6 years).
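
The rate and percentages in this abstract can be reproduced from the reported counts, as the short calculation below shows. The crude rate at > 8000 feet is included only as arithmetic; it is not comparable to the adjusted odds ratio, which uses < 6000 feet as the reference group and adjusts for the listed covariates.

```python
births = 25_305_778      # live births, 2005-2010
suid_deaths = 23_673     # SUID deaths in the same period
births_high = 9_245      # births at > 8000 feet
suid_high = 10           # SUID deaths among those births

rate_per_1000 = suid_deaths / births * 1000        # overall SUID rate
share_births_high = births_high / births * 100     # % of births at > 8000 ft
share_deaths_high = suid_high / suid_deaths * 100  # % of SUID deaths at > 8000 ft
crude_rate_high = suid_high / births_high * 1000   # crude SUID rate at > 8000 ft

print(round(rate_per_1000, 2), round(share_births_high, 3),
      round(share_deaths_high, 3), round(crude_rate_high, 2))
```

With only 10 events in the > 8000 feet group, the wide confidence interval on the adjusted odds ratio is unsurprising.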


1998 ◽  
Vol 27 (3) ◽  
pp. 351-369 ◽  
Author(s):  
Michael Noble ◽  
Sin Yi Cheung ◽  
George Smith

This article briefly reviews American and British literature on welfare dynamics and examines the concepts of welfare dependency and ‘dependency culture’ with particular reference to lone parents. Using UK benefit data sets, the welfare dynamics of lone mothers are examined to explore the extent to which they inform the debates. Evidence from Housing Benefits data shows that, even over a relatively short time period, there is significant turnover in the benefits-dependent lone parent population, with movement in and out of income support as well as movement into other family structures. Younger lone parents and owner-occupiers tend to leave the data set, while older lone parents and council tenants are most likely to stay. Some owner-occupier lone parents may be relatively well off and on income support for a relatively short time between separation and the reaching of a financial settlement. They may also represent a more highly educated and highly skilled group with easier access to the labour market than renters. Any policy moves paralleling those in the United States to time-limit benefits will disproportionately affect older lone parents.


2014 ◽  
Vol 7 (5) ◽  
pp. 2477-2484 ◽  
Author(s):  
J. C. Kathilankal ◽  
T. L. O'Halloran ◽  
A. Schmidt ◽  
C. V. Hanson ◽  
B. E. Law

A semi-parametric PAR diffuse radiation model was developed using commonly measured climatic variables from 108 site-years of data from 17 AmeriFlux sites. The model has a logistic form and improves upon previous efforts by using a larger data set and physically viable climate variables as predictors, including relative humidity, clearness index, surface albedo and solar elevation angle. Model performance was evaluated by comparison with a simple cubic polynomial model developed for the PAR spectral range. The logistic model outperformed the polynomial model, with an improved coefficient of determination and slope relative to measured data (logistic: R² = 0.76, slope = 0.76; cubic: R² = 0.73, slope = 0.72), making this the most robust PAR-partitioning model currently available for the United States.
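
The abstract names the model's logistic form and its predictors but not its coefficients, so the sketch below uses placeholder coefficients and a predictor that is linear in the four variables; both are assumptions, not the fitted model. The second function computes the two reported evaluation statistics under one plausible reading: R² as the squared Pearson correlation between modelled and measured values, and slope as the ordinary least-squares slope of modelled on measured.

```python
import math


def logistic_diffuse_fraction(kt, rh, albedo, elev_deg, coeffs):
    """Diffuse fraction of PAR as a logistic function of four predictors.

    kt: clearness index; rh: relative humidity (%); albedo: surface albedo;
    elev_deg: solar elevation angle (degrees).
    coeffs = (b0, b1, b2, b3, b4) are HYPOTHETICAL placeholders, not the
    paper's fitted values; the linear predictor form is also an assumption.
    """
    b0, b1, b2, b3, b4 = coeffs
    z = b0 + b1 * kt + b2 * rh + b3 * albedo + b4 * elev_deg
    return 1.0 / (1.0 + math.exp(-z))  # bounded in (0, 1), as a fraction must be


def r2_and_slope(measured, modelled):
    """Squared Pearson correlation and OLS slope of modelled on measured (assumed definitions)."""
    n = len(measured)
    mx = sum(measured) / n
    my = sum(modelled) / n
    sxx = sum((x - mx) ** 2 for x in measured)
    syy = sum((y - my) ** 2 for y in modelled)
    sxy = sum((x - mx) * (y - my) for x, y in zip(measured, modelled))
    return sxy ** 2 / (sxx * syy), sxy / sxx


print(logistic_diffuse_fraction(0.8, 60.0, 0.2, 45.0, (0.0, -5.0, 0.0, 0.0, 0.0)))
print(r2_and_slope([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

A logistic form keeps predicted diffuse fractions between 0 and 1 for any predictor values, which a cubic polynomial does not guarantee.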

