Coal Production Analysis using Machine Learning

Author(s):  
CN Sujatha

Coal will continue to provide a significant portion of energy requirements in the United States for at least the next several decades. It is essential that accurate information describing the amount, location, and quality of the coal resources and reserves be available to fulfill energy needs. It is also important that the United States extract its coal resources efficiently, safely, and in an environmentally responsible manner. A renewed focus on federal support for coal-related research, coordinated across agencies and with the active participation of the states and industrial sector, is a critical element for each of these requirements. In this project we attempt to predict coal production from the features given in the data set. We implement several regression algorithms, identify the best one, and fine-tune its parameters. The existing system uses a plain linear regression model; one of the main issues with basic linear regression is that it has no regularization parameter and hence overfits the data. The existing system also provides little pre-processing, visualization, or exploratory data analysis (EDA). We aim to build regularized regression models such as ridge and lasso regression and to fine-tune their parameters. These models are trained on a data set engineered carefully through feature engineering.
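As a rough illustration of the modeling approach described above, the following sketch fits ridge and lasso regressions with cross-validated tuning of the regularization strength in scikit-learn. The file name, feature columns, and target column ("production") are hypothetical placeholders, since the abstract does not specify the data set's schema.

```python
# A minimal sketch of regularized regression with parameter tuning.
# File and column names are hypothetical placeholders.
import pandas as pd
from sklearn.linear_model import Ridge, Lasso
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

df = pd.read_csv("coal_production.csv")          # hypothetical file name
X = df.drop(columns=["production"])              # hypothetical target column
y = df["production"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, model in [("ridge", Ridge()), ("lasso", Lasso(max_iter=10000))]:
    pipe = Pipeline([("scale", StandardScaler()), ("reg", model)])
    # Tune the regularization strength alpha by cross-validated grid search.
    search = GridSearchCV(pipe, {"reg__alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}, cv=5)
    search.fit(X_train, y_train)
    print(name, search.best_params_, search.score(X_test, y_test))
```

The regularization term penalizes large coefficients, which is exactly what the plain linear regression baseline lacks; the grid search replaces manual fine-tuning of alpha.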

Genetics ◽  
2021 ◽  
Author(s):  
Marco Lopez-Cruz ◽  
Gustavo de los Campos

Abstract Genomic prediction uses DNA sequences and phenotypes to predict genetic values. In homogeneous populations, theory indicates that the accuracy of genomic prediction increases with sample size. However, differences in allele frequencies and in linkage disequilibrium patterns can lead to heterogeneity in SNP effects. In this context, calibrating genomic predictions using a large, potentially heterogeneous, training data set may not lead to optimal prediction accuracy. Some studies have tried to address this sample size/homogeneity trade-off using training set optimization algorithms; however, this approach assumes that a single training data set is optimal for all individuals in the prediction set. Here, we propose an approach that identifies, for each individual in the prediction set, a subset of the training data (i.e., a set of support points) from which predictions are derived. The methodology that we propose is a Sparse Selection Index (SSI) that integrates selection index methodology with the sparsity-inducing techniques commonly used in high-dimensional regression. The sparsity of the resulting index is controlled by a regularization parameter (λ); G-BLUP (the prediction method most commonly used in plant and animal breeding) appears as the special case obtained when λ = 0. In this study, we present the methodology and demonstrate, using two wheat data sets with phenotypes collected in ten different environments, that the SSI can achieve significant gains in prediction accuracy (between 5 and 10%) relative to G-BLUP.
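The following is a minimal numerical sketch of the SSI idea on simulated data: for each individual in the prediction set, an L1-penalized selection-index problem is solved over the training set, yielding a sparse set of support points. All quantities here (marker matrix, phenotypes, the value of λ) are simulated or assumed for illustration; this is not the authors' implementation, and variance-component scaling is omitted.

```python
# Sketch of a Sparse Selection Index: per-individual sparse index weights
# obtained by recasting the penalized selection-index problem as a lasso.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_trn, n_tst, p = 200, 5, 500
M = rng.standard_normal((n_trn + n_tst, p))       # centered marker matrix
G = M @ M.T / p                                   # genomic relationship matrix
y_trn = rng.standard_normal(n_trn)                # adjusted training phenotypes

G_trn = G[:n_trn, :n_trn] + 1e-6 * np.eye(n_trn)  # jitter for stability
L = np.linalg.cholesky(G_trn)

lam = 0.1                                         # sparsity parameter (lambda)
for i in range(n_trn, n_trn + n_tst):
    g_i = G[:n_trn, i]                            # relationships to individual i
    # Minimize 0.5*b'G_trn*b - b'g_i + lam*||b||_1, rewritten as a lasso
    # problem with design L' and response solve(L, g_i).
    z = np.linalg.solve(L, g_i)
    fit = Lasso(alpha=lam / n_trn, fit_intercept=False, max_iter=50000).fit(L.T, z)
    b = fit.coef_                                 # sparse index weights
    print(f"individual {i}: {np.sum(b != 0)} support points, "
          f"prediction {b @ y_trn:.3f}")
```

Conceptually, letting λ approach zero recovers the dense weights solving G_trn b = g_i, which corresponds (up to variance scaling) to the G-BLUP special case noted above.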


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Daniela C. Rodríguez ◽  
Diwakar Mohan ◽  
Caroline Mackenzie ◽  
Jess Wilhelm ◽  
Ezinne Eze-Ajoku ◽  
...  

Abstract Background In 2015 the US President’s Emergency Plan for AIDS Relief (PEPFAR) initiated its Geographic Prioritization (GP) process, whereby it prioritized high-burden areas within countries, with the goal of more rapidly achieving the UNAIDS 90-90-90 targets. In Kenya, PEPFAR designated over 400 health facilities in Northeastern Kenya to be transitioned to government support (known as central support (CS)). Methods We conducted a mixed methods evaluation exploring the effects of GP on health systems and on HIV and non-HIV service delivery in CS facilities. Quantitative data from a facility survey and health service delivery data were gathered and combined with data from two rounds of interviews and focus group discussions (FGDs) conducted at the national and sub-national levels to document the design and implementation of GP. The survey included 230 health facilities across 10 counties, and 59 interviews and 22 FGDs were conducted with government officials, health facility providers, patients, and civil society. Results We found that PEPFAR moved quickly from announcing the GP to implementation. Despite extensive conversations between the US government and the Government of Kenya, there was little consultation with sub-national actors even though the country had recently undergone a major devolution process. Survey and qualitative data identified a number of effects from GP, including discontinuation of certain services, declines in quality and access to HIV care, loss of training and financial incentives for health workers, and disruption of laboratory testing. Despite these reports, service coverage had not been greatly affected; however, clinician strikes in the post-transition period were potential confounders. Conclusions This study found effects similar to those in earlier research on transition and provides additional insights about internal country transitions, particularly in decentralized contexts. Aside from a need for longer planning periods and better communication and coordination, we raise concerns about transitions driven by epidemiological criteria without adaptation to the local context, and about their implications for priority-setting and HIV investments at the local level.


2021 ◽  
pp. 194016122110091
Author(s):  
Magdalena Wojcieszak ◽  
Ericka Menchen-Trevino ◽  
Joao F. F. Goncalves ◽  
Brian Weeks

The online environment dramatically expands the number of ways people can encounter news, but questions remain about whether these abundant opportunities facilitate diverse news exposure. This project examines key questions regarding how internet users arrive at news and what kinds of news they encounter. We account for a multiplicity of avenues to news online, some of which have never been analyzed: (1) direct access to news websites, (2) social networks, (3) news aggregators, (4) search engines, (5) webmail, and (6) hyperlinks in news. We examine the extent to which each avenue promotes news exposure and also exposes users to news sources that are left-leaning, right-leaning, and centrist. When combined with information on individual political leanings, we show the extent of dissimilar, centrist, or congenial exposure resulting from each avenue. We rely on web browsing history records from 636 social media users in the US paired with survey self-reports, a unique data set that allows us to examine both aggregate and individual-level exposure. Visits to news websites account for about 2 percent of the total number of visits to URLs and are unevenly distributed among users. The most widespread ways of accessing news are search engines and social media platforms (and hyperlinks within news sites once people arrive at news). These two avenues also increase dissimilar news exposure, compared to accessing news directly, yet direct news access drives the highest proportion of centrist exposure.
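A minimal sketch of how such avenues might be coded from browsing histories, using each visit's referrer, is shown below. The domain lists are illustrative stand-ins; the abstract does not describe the authors' actual classification scheme at this level of detail.

```python
# Sketch: classify a news visit into one of the six avenues by its referrer.
# All domain lists and the coding rules are hypothetical examples.
from urllib.parse import urlparse

NEWS_DOMAINS = {"nytimes.com", "foxnews.com", "reuters.com"}   # examples only
SOCIAL = {"facebook.com", "twitter.com", "reddit.com"}
AGGREGATORS = {"news.google.com", "flipboard.com"}
SEARCH = {"google.com", "bing.com", "duckduckgo.com"}
WEBMAIL = {"mail.google.com", "outlook.live.com"}

def domain(url):
    netloc = urlparse(url).netloc.lower()
    return netloc[4:] if netloc.startswith("www.") else netloc

def avenue(visit_url, referrer_url):
    """Return the avenue through which a news visit was reached."""
    if domain(visit_url) not in NEWS_DOMAINS:
        return None                                # not a news visit
    ref = domain(referrer_url) if referrer_url else ""
    if not ref:
        return "direct"
    if ref in SOCIAL:
        return "social"
    if ref in AGGREGATORS:
        return "aggregator"
    if ref in WEBMAIL:
        return "webmail"
    if ref in SEARCH:
        return "search"
    if ref in NEWS_DOMAINS:
        return "news hyperlink"
    return "other"

print(avenue("https://www.reuters.com/world/", "https://www.google.com/search"))
```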


2020 ◽  
Vol 2020 ◽  
pp. 1-6
Author(s):  
Jian-ye Yuan ◽  
Xin-yuan Nan ◽  
Cheng-rong Li ◽  
Le-le Sun

Considering that garbage classification is urgent, a 23-layer convolutional neural network (CNN) model is designed in this paper, with an emphasis on real-time garbage classification, to address the low accuracy of garbage classification and recycling and the difficulty of manual recycling. First, depthwise separable convolution was used to reduce the number of parameters (Params) of the model. Then, an attention mechanism was used to improve the accuracy of the garbage classification model. Finally, model fine-tuning was used to further improve its performance. We compared the model with classic image classification models, including AlexNet, VGG16, and ResNet18, and with lightweight classification models, including MobileNetV2 and ShuffleNetV2, and found that the proposed model, GAF_dense, has a higher accuracy rate and fewer Params and FLOPs. To further check the performance of the model, we tested it on the CIFAR-10 data set and found that the accuracy rates of GAF_dense are 0.018 and 0.03 higher than those of ResNet18 and ShuffleNetV2, respectively. On the ImageNet data set, the accuracy rates of GAF_dense are 0.225 and 0.146 higher than those of ResNet18 and ShuffleNetV2, respectively. Therefore, the garbage classification model proposed in this paper is suitable for garbage classification and other classification tasks that help protect the ecological environment, and it can be applied to tasks in areas such as environmental science, children’s education, and environmental protection.
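For illustration, here is a minimal PyTorch sketch of the two building blocks the abstract highlights: a depthwise separable convolution (to reduce Params) and a simple squeeze-and-excitation style channel-attention block. This is an assumed, generic rendering of those ideas, not the authors' 23-layer GAF_dense architecture.

```python
# Sketch of a depthwise separable convolution followed by channel attention.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Depthwise: one 3x3 filter per input channel (groups=in_ch);
        # pointwise: 1x1 conv mixes channels. Far fewer parameters than a
        # dense 3x3 convolution with the same channel counts.
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class ChannelAttention(nn.Module):
    def __init__(self, ch, reduction=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)        # squeeze: global average pool
        self.fc = nn.Sequential(
            nn.Linear(ch, ch // reduction), nn.ReLU(),
            nn.Linear(ch // reduction, ch), nn.Sigmoid())

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                               # reweight channels

block = nn.Sequential(DepthwiseSeparableConv(32, 64), ChannelAttention(64))
print(block(torch.randn(1, 32, 56, 56)).shape)     # torch.Size([1, 64, 56, 56])
```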


2016 ◽  
Vol 311 (3) ◽  
pp. F539-F547 ◽  
Author(s):  
Minhtri K. Nguyen ◽  
Dai-Scott Nguyen ◽  
Minh-Kevin Nguyen

Because changes in the plasma water sodium concentration ([Na+]pw) are clinically due to changes in the mass balance of Na+, K+, and H2O, the analysis and treatment of the dysnatremias depend on the validity of the Edelman equation in defining the quantitative interrelationship between the [Na+]pw and the total exchangeable sodium (Nae), total exchangeable potassium (Ke), and total body water (TBW) (Edelman IS, Leibman J, O'Meara MP, Birkenfeld LW. J Clin Invest 37: 1236–1256, 1958): [Na+]pw = 1.11(Nae + Ke)/TBW − 25.6. The interrelationship between [Na+]pw and Nae, Ke, and TBW in the Edelman equation is empirically determined by accounting for measurement errors in all of these variables. In contrast, linear regression analysis of the same data set using [Na+]pw as the dependent variable yields the following equation: [Na+]pw = 0.93(Nae + Ke)/TBW + 1.37. Moreover, based on the study by Boling et al. (Boling EA, Lipkind JB. 18: 943–949, 1963), the [Na+]pw is related to the Nae, Ke, and TBW by the following linear regression equation: [Na+]pw = 0.487(Nae + Ke)/TBW + 71.54. The reasons for the disparities between the slopes and y-intercepts of these three equations have been unknown. In this mathematical analysis, we demonstrate that the disparities between the slope and y-intercept in these three equations can be explained by how the osmotically inactive Na+ and K+ storage pool is quantitatively accounted for. Our analysis also indicates that the osmotically inactive Na+ and K+ storage pool is dynamically regulated and that changes in the [Na+]pw can be predicted based on changes in the Nae, Ke, and TBW despite dynamic changes in the osmotically inactive Na+ and K+ storage pool.
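To make the comparison concrete, the following sketch evaluates the three quoted fits at a hypothetical set of physiological values; the inputs are assumptions, not data from the cited studies.

```python
# The three regression fits quoted above, each predicting the plasma water
# sodium concentration [Na+]pw (mmol/L) from the ratio (Nae + Ke)/TBW.
def na_pw_edelman(nae, ke, tbw):
    return 1.11 * (nae + ke) / tbw - 25.6

def na_pw_ols(nae, ke, tbw):          # ordinary least squares fit of same data
    return 0.93 * (nae + ke) / tbw + 1.37

def na_pw_boling(nae, ke, tbw):       # Boling et al. (1963) fit
    return 0.487 * (nae + ke) / tbw + 71.54

nae, ke, tbw = 2800.0, 3200.0, 38.0   # mmol, mmol, liters (hypothetical values)
for f in (na_pw_edelman, na_pw_ols, na_pw_boling):
    print(f.__name__, round(f(nae, ke, tbw), 1))
```

At this input the three fits give similar predictions despite their very different slopes and y-intercepts.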


1995 ◽  
Vol 3 (3) ◽  
pp. 133-142 ◽  
Author(s):  
M. Hana ◽  
W.F. McClure ◽  
T.B. Whitaker ◽  
M. White ◽  
D.R. Bahler

Two artificial neural network models were used to estimate the nicotine content in tobacco: (i) a back-propagation network and (ii) a linear network. The back-propagation network consisted of an input layer, one hidden layer and an output layer. The linear network consisted of an input layer and an output layer. Both networks used the generalised delta rule for learning. The performance of both networks was compared with that of the multiple linear regression (MLR) method of calibration. The nicotine content in tobacco samples was estimated for two different data sets. Data set A contained 110 near infrared (NIR) spectra, each consisting of reflected energy at eight wavelengths. Data set B consisted of 200 NIR spectra, each with 840 spectral data points. The fast Fourier transform was applied to data set B in order to compress each spectrum into 13 Fourier coefficients. For data set A, the MLR model gave the best results, followed by the back-propagation network and then the linear network. The true performance of the MLR model was better than that of the back-propagation and linear networks by 14.0% and 18.1%, respectively. For data set B, the back-propagation network gave the best result, followed by MLR and the linear network. The linear network and MLR models gave almost the same results. The true performance of the back-propagation network model was better than that of the MLR and linear network models by 35.14%.
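The Fourier compression step can be sketched in a few lines. The synthetic spectrum below stands in for the 840-point NIR spectra of data set B; whether the 13 retained coefficients in the original study were real-valued or complex pairs is not specified in the abstract.

```python
# Sketch: compress a long spectrum into its first few Fourier coefficients,
# which then serve as the inputs to a network or regression model.
import numpy as np

rng = np.random.default_rng(1)
wavelength = np.linspace(0, 1, 840)
spectrum = (np.exp(-((wavelength - 0.4) ** 2) / 0.01)
            + 0.05 * rng.standard_normal(840))     # synthetic NIR-like spectrum

coeffs = np.fft.rfft(spectrum)          # real FFT of the spectrum
features = coeffs[:13]                  # keep the first 13 coefficients

# Reconstruct from the retained coefficients to see what was preserved.
truncated = np.zeros_like(coeffs)
truncated[:13] = features
reconstruction = np.fft.irfft(truncated, n=840)
print("compression ratio:", 840 / 13)
print("relative error:", np.linalg.norm(spectrum - reconstruction)
      / np.linalg.norm(spectrum))
```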


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Lam Hoang Viet Le ◽  
Toan Luu Duc Huynh ◽  
Bryan S. Weber ◽  
Bao Khac Quoc Nguyen

Purpose: This paper aims to identify the disproportionate impacts of the COVID-19 pandemic on labor markets.

Design/methodology/approach: The authors conduct a large-scale survey of 16,000 firms from 82 industries in Ho Chi Minh City, Vietnam, and analyze the data set using different machine-learning methods.

Findings: First, job loss and reduction in state-owned enterprises have been significantly larger than in other types of organizations. Second, employees of foreign direct investment enterprises suffer a significantly lower labor income than those of other groups. Third, the adverse effects of the COVID-19 pandemic on the labor market are heterogeneous across industries and geographies. Finally, firms with high revenue in 2019 are more likely to adopt preventive measures, including the reduction of labor forces. The authors also find a significant correlation between firms' revenue and labor reduction, as both traditional econometrics and machine-learning techniques suggest.

Originality/value: This study has two main policy implications. First, although government support through taxes has been provided, the authors highlight evidence that there may be some additional benefit from targeting firms that have characteristics associated with layoffs or other negative labor responses. Second, the authors provide information showing which firm characteristics are associated with particular labor market responses, such as layoffs, which may help target stimulus packages. Although the COVID-19 pandemic affects most industries and occupations, heterogeneous firm responses suggest that there could be several varieties of targeted policies: targeting firms that are likely to reduce labor forces or firms likely to face reduced revenue. The authors outline several industries and firm characteristics that appear to be directly reducing employee counts or having negative labor responses, which may lead to more cost-effective stimulus.


2009 ◽  
Vol 2009 ◽  
pp. 1-8 ◽  
Author(s):  
Janet Myhre ◽  
Daniel R. Jeske ◽  
Michael Rennie ◽  
Yingtao Bi

A heteroscedastic linear regression model is developed from plausible assumptions that describe the time evolution of performance metrics for equipment. The motivation that the model inherits from these assumptions makes the related weighted least squares analysis an essential and attractive selling point for engineers with an interest in equipment surveillance methodologies. A simple test for the significance of the heteroscedasticity suggested by a data set is derived, and a simulation study is used to evaluate the power of the test and compare it with several other applicable tests designed under different contexts. Tolerance intervals within the context of the model are derived, thus generalizing well-known tolerance intervals for ordinary least squares regression. Use of the model and its associated analyses is illustrated with an aerospace application in which hundreds of electronic components are continuously monitored by an automated system that flags components suspected of unusual degradation patterns.
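As a rough sketch of this workflow (not the paper's own test, which is derived specifically for the model's assumptions), the following simulates degradation-style data whose variance grows with time, applies the standard Breusch-Pagan test as a stand-in heteroscedasticity check, and fits a weighted least squares model in statsmodels.

```python
# Sketch: detect heteroscedasticity and fit weighted least squares.
# The Breusch-Pagan test is a generic substitute for the paper's derived test.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(2)
t = np.linspace(1, 10, 120)                               # e.g. operating time
y = 2.0 + 0.5 * t + rng.standard_normal(120) * (0.2 * t)  # variance grows with t

X = sm.add_constant(t)
ols = sm.OLS(y, X).fit()
lm_stat, lm_pvalue, _, _ = het_breuschpagan(ols.resid, X)
print(f"Breusch-Pagan p-value: {lm_pvalue:.4f}")   # small p => heteroscedastic

# Weight observations by the inverse of the assumed variance function var ∝ t².
wls = sm.WLS(y, X, weights=1.0 / t**2).fit()
print(wls.params)
```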

