Data-driven prediction of mean wind turbulence from topographic data

2021 ◽  
Vol 1201 (1) ◽  
pp. 012005
Author(s):  
B Morais da Costa ◽  
J Þ Snæbjörnsson ◽  
O A Øiseth ◽  
J Wang ◽  
J B Jakobsen

Abstract This study presents a data-driven model to predict mean turbulence intensities at desired generic locations, for all wind directions. The model, a multilayer perceptron, requires only information about the local topography and a historical dataset of wind measurements and topography at other locations. Five years of data from six different wind measurement mast locations were used. A k-fold cross-validation evaluated the model at each location, where four locations were used for the training data, another location was used for validation, and the remaining one to test the model. The model outperformed the approach given in the European standard, for both performance metrics used. The results of different hyperparameter optimizations are presented, allowing for uncertainty estimates of the model performances.
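
The location-wise cross-validation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the mast labels are hypothetical, and the rule for picking the validation location (the next location in cyclic order) is an assumption, since the abstract does not say how it was chosen.

```python
def location_splits(locations):
    """Yield (train, validation, test) splits, holding out one location for testing at a time."""
    n = len(locations)
    for i, test_loc in enumerate(locations):
        # Assumption: the validation location is the next one in cyclic order.
        val_loc = locations[(i + 1) % n]
        train = [loc for loc in locations if loc not in (test_loc, val_loc)]
        yield train, val_loc, test_loc


masts = ["A", "B", "C", "D", "E", "F"]  # hypothetical labels for the six masts
splits = list(location_splits(masts))
for train, val, test in splits:
    print(f"train={train} validation={val} test={test}")
```

Each of the six rotations trains on four locations, validates on one, and tests on the remaining one, so every location serves as the test site exactly once.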

Author(s):  
Chao Hu ◽  
Byeng D. Youn ◽  
Pingfeng Wang

The traditional data-driven prognostic approach is to construct multiple candidate algorithms using a training data set, evaluate their respective performance using a testing data set, and select the one with the best performance while discarding all the others. This approach has three shortcomings: (i) the selected standalone algorithm may not be robust, i.e., it may be less accurate when the real data acquired after the deployment differs from the testing data; (ii) it wastes the resources for constructing the algorithms that are discarded in the deployment; (iii) it requires the testing data in addition to the training data, which increases the overall expenses for the algorithm selection. To overcome these drawbacks, this paper proposes an ensemble data-driven prognostic approach which combines multiple member algorithms with a weighted-sum formulation. Three weighting schemes, namely, the accuracy-based weighting, diversity-based weighting and optimization-based weighting, are proposed to determine the weights of member algorithms for data-driven prognostics. The k-fold cross validation (CV) is employed to estimate the prediction error required by the weighting schemes. Two case studies were employed to demonstrate the effectiveness of the proposed prognostic approach. The results suggest that the ensemble approach with any weighting scheme gives more accurate RUL predictions compared to any sole algorithm and that the optimization-based weighting scheme gives the best overall performance among the three weighting schemes.
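
A weighted-sum ensemble of this kind can be illustrated with a short sketch. It assumes that accuracy-based weights are proportional to the inverse of each member's cross-validation error; the abstract does not give the exact formula, and the error values and member predictions below are hypothetical.

```python
def accuracy_based_weights(cv_errors):
    """Return normalized weights, assumed inversely proportional to each member's CV error."""
    inverse = [1.0 / e for e in cv_errors]
    total = sum(inverse)
    return [v / total for v in inverse]


def ensemble_predict(member_predictions, weights):
    """Combine the member RUL predictions with a weighted sum."""
    return sum(w * p for w, p in zip(weights, member_predictions))


cv_errors = [2.0, 4.0, 4.0]              # hypothetical k-fold CV errors of three members
weights = accuracy_based_weights(cv_errors)
rul = ensemble_predict([100.0, 120.0, 80.0], weights)  # hypothetical member predictions
print(weights, rul)
```

Members with a lower cross-validation error receive a larger share of the final prediction, and the weights sum to one.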


2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Yanjuan Li ◽  
Zitong Zhang ◽  
Zhixia Teng ◽  
Xiaoyan Liu

Amyloid is generally an aggregate of insoluble fibrin; its abnormal deposition is the pathogenic mechanism of various diseases, such as Alzheimer’s disease and type II diabetes. Therefore, accurately identifying amyloid is necessary to understand its role in pathology. We proposed a machine learning-based prediction model called PredAmyl-MLP, which consists of the following three steps: feature extraction, feature selection, and classification. In the step of feature extraction, seven feature extraction algorithms and different combinations of them are investigated, and the combination of SVMProt-188D and tripeptide composition (TPC) is selected according to the experimental results. In the step of feature selection, maximum relevant maximum distance (MRMD) and binomial distribution (BD) are, respectively, used to remove the redundant or noise features, and the appropriate features are selected according to the experimental results. In the step of classification, we employed multilayer perceptron (MLP) to train the prediction model. The 10-fold cross-validation results show that the overall accuracy of PredAmyl-MLP reached 91.59%, and the performance was better than the existing methods.


2020 ◽  
Author(s):  
Hylke Beck ◽  
Seth Westra ◽  
Eric Wood

We introduce a unique set of global observation-based climatologies of daily precipitation (P) occurrence (related to the lower tail of the P distribution) and peak intensity (related to the upper tail of the P distribution). The climatologies were produced using Random Forest (RF) regression models trained with an unprecedented collection of daily P observations from 93,138 stations worldwide. Five-fold cross-validation was used to evaluate the generalizability of the approach and to quantify uncertainty globally. The RF models were found to provide highly satisfactory performance, yielding cross-validation coefficient of determination (R²) values from 0.74 for the 15-year return-period daily P intensity to 0.86 for the >0.5 mm d⁻¹ daily P occurrence. The performance of the RF models was consistently superior to that of state-of-the-art reanalysis (ERA5) and satellite (IMERG) products. The highest P intensities over land were found along the western equatorial coast of Africa, in India, and along coastal areas of Southeast Asia. Using a 0.5 mm d⁻¹ threshold, P was estimated to occur on 23.2% of days on average over the global land surface (excluding Antarctica). The climatologies including uncertainty estimates will be released as the Precipitation DISTribution (PDIST) dataset via www.gloh2o.org/pdist. We expect the dataset to be useful for numerous purposes, such as the evaluation of climate models, the bias correction of gridded P datasets, and the design of hydraulic structures in poorly gauged regions.


Repositor ◽  
2020 ◽  
Vol 2 (8) ◽  
Author(s):  
Nabillah Annisa Rahmayanti ◽  
Yufis Azhar ◽  
Gita Indah Marthasari

Abstract Bullying is common among children, especially teenagers, and worries parents. The growing number of bullying cases in this country has even led to fatalities. This can be prevented by recognizing the symptoms of a child who is being bullied. When a child cannot express their distress, parents and teachers find it hard to understand what is happening to them, which may be because the child is being bullied by peers. This study therefore aims to produce a selected set of features using the C5.0 algorithm. Using the selected features reduces the effort of filling out the questionnaire and shortens the time needed to determine whether a child is being bullied, based on the symptoms covered by each question. The data were collected with a questionnaire whose questions address the symptoms of children who are victims of bullying. The respondents' answers were processed into a dataset, which was divided into training data and test data and then analyzed with the C5.0 algorithm. The evaluation method was 10-fold cross-validation, with accuracy assessed using a confusion matrix. The study also compared C5.0 with two other classification algorithms, Naive Bayes and KNN, to see how accurate C5.0 is when performing feature selection. The test results show that C5.0 is capable of feature selection and achieves better accuracy than Naive Bayes and KNN, with an accuracy of 92.77% before feature selection and 93.33% after feature selection.


2018 ◽  
Vol 1 (2) ◽  
pp. 70-75
Author(s):  
Abdul Rozaq

Building materials are an important factor in building a house, and consumers or developers need to estimate the funds required to build one. To address this problem, a case-based reasoning (CBR) approach is used, a method that reasons about or solves a new problem based on existing cases. The system built in this study is a CBR system for determining the building material requirements of a house. In the consultation process, a new case is entered and compared with the old cases, and the similarity value is then calculated using the nearest neighbor method. In the first test, test data were entered and compared with each house type, giving an accuracy of 83.6%. The second test used K-fold cross-validation with K = 25 on 200 data items, so that in each fold 192 items served as training data and 8 items as test data. With K-fold cross-validation, the CBR system achieved an accuracy of 85.71%.
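
The split sizes quoted above follow directly from the fold arithmetic: 200 items divided into 25 folds gives 8 test items per fold and 192 training items. A minimal sketch, assuming contiguous, unshuffled folds (the abstract does not state how the items were assigned to folds):

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds of near-equal size."""
    # The first (n_samples % k) folds get one extra item when the split is uneven.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


folds = list(k_fold_indices(200, 25))
train, test = folds[0]
print(len(folds), len(train), len(test))
```

Every item appears in exactly one test fold, so the reported accuracy averages over all 200 items.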


2021 ◽  
Author(s):  
Elisabeth Pfaehler ◽  
Daniela Euba ◽  
Andreas Rinscheid ◽  
Otto S. Hoekstra ◽  
Josee Zijlstra ◽  
...  

Abstract Background Machine learning studies require a large number of images often obtained on different PET scanners. When merging these images, the use of harmonized images following EARL-standards is essential. However, when including retrospective images, EARL accreditation might not have been in place. The aim of this study was to develop a convolutional neural network (CNN) that can identify retrospectively if an image is EARL compliant and if it is meeting older or newer EARL-standards. Materials and Methods 96 PET images acquired on three PET/CT systems were included in the study. All images were reconstructed with the locally clinically preferred, EARL1, and EARL2 compliant reconstruction protocols. After image pre-processing, one CNN was trained to separate clinical and EARL compliant reconstructions. A second CNN was optimized to identify EARL1 and EARL2 compliant images. The accuracy of both CNNs was assessed using 5-fold cross validation. The CNNs were validated on 24 images acquired on a PET scanner not included in the training data. To assess the impact of image noise on the CNN decision, the 24 images were reconstructed with different scan durations. Results In the cross-validation, the first CNN classified all images correctly. When identifying EARL1 and EARL2 compliant images, the second CNN identified 100% EARL1 compliant and 85% EARL2 compliant images correctly. The accuracy in the independent dataset was comparable to the cross-validation accuracy. The scan duration had almost no impact on the results. Conclusion The two CNNs trained in this study can be used to retrospectively include images in a multi-center setting by e.g. adding additional smoothing. This method is especially important for machine learning studies where the harmonization of images from different PET systems is essential.


2018 ◽  
Vol 232 ◽  
pp. 02026
Author(s):  
Lu Zhou ◽  
Guang-geng Li ◽  
Yu-mei Zhou ◽  
Dan Yin ◽  
Yan Sun ◽  
...  

In this study, we propose a TCM diagnosis model that can be used for multi-label classification and gives a clear diagnosis, as well as the basis for diagnosis and differentiation, when the symptoms correspond to multiple diseases or syndromes. The implementation of the model is divided into three steps. First, a machine learning algorithm is chosen to train the TCM diagnosis model; the features of the training data are symptoms and the labels are diseases or syndromes. Second, given a number α (α>1, α∈Z+), the model outputs the α diagnoses with the highest probability for the input symptoms as candidate diagnoses. Finally, rules of differential diagnosis are designed to determine which candidate diagnoses should be retained, thereby completing the multi-label classification. On our test dataset, with 10-fold cross-validation, the average accuracy of the single-label classification was 0.882, the average precision was 0.974, the average recall was 1.000, and the average F1 score was 0.967; the average accuracy of the multi-label classification was 0.706, the average micro precision was 0.934, the average micro recall was 0.941, and the average Hamming loss was 0.060. The test indicates that the model has good potential for auxiliary decision making in clinical diagnosis and treatment.
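
The candidate-selection step and the Hamming loss metric can be sketched as below. The label names and probabilities are hypothetical, and the differential-diagnosis rules of the third step are not implemented because the abstract does not specify them.

```python
def top_alpha(probabilities, alpha):
    """Return the alpha labels with the highest predicted probability."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return [label for label, _ in ranked[:alpha]]


def hamming_loss(true_sets, pred_sets, labels):
    """Fraction of (sample, label) pairs where prediction and truth disagree."""
    wrong = 0
    for truth, pred in zip(true_sets, pred_sets):
        wrong += sum((label in truth) != (label in pred) for label in labels)
    return wrong / (len(true_sets) * len(labels))


labels = ["syndrome_a", "syndrome_b", "syndrome_c"]  # hypothetical labels
probabilities = {"syndrome_a": 0.6, "syndrome_b": 0.3, "syndrome_c": 0.1}
candidates = top_alpha(probabilities, alpha=2)

# One sample whose true label set is {syndrome_a}; both candidates are kept as the prediction.
loss = hamming_loss([{"syndrome_a"}], [set(candidates)], labels)
print(candidates, loss)
```

Here one of the three labels is wrongly included, so the Hamming loss for this single sample is one third; a rule that discards the weaker candidate would bring it to zero.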


2020 ◽  
Vol 37 (5) ◽  
pp. 1737-1756
Author(s):  
Zhen Yang ◽  
Kangning Song ◽  
Xingsheng Gu ◽  
Zhi Wang ◽  
Xiaoyi Liang

Purpose Nitrogen oxides (NOx) are considered primarily responsible for many serious environmental problems. Removing NO is the key task in eliminating NOx hazards. To clarify the NO removal process for pitch-based spherical activated carbons (PSACs), a real-time online prediction and optimization technique based on support vector regression (SVR) is discussed. The purpose of this paper is to develop a predictor and optimizer system for the selective catalytic reduction of NO (SCRN) using experimental data and data-driven SVR methods. Design/methodology/approach A predictor and optimizer based on the developed SVR are proposed. To improve the training efficiency of SVR, the authors customize batch normalization and k-fold cross-validation techniques according to the unique characteristics of the PSACs model. Findings The results show that SVR provides a property regression model, since it can link linear and non-linear process and property relationships from few experimental data sets. The integrated normalization and k-fold cross-validation also yield a satisfying improvement in SVR optimization. The predicted results of the predictor and optimizer in single- and double-factor systems are in excellent agreement with the experimental data. Originality/value SCRN-PO, a system for predicting and optimizing SCRN problems, is developed using data-driven methods. The SCRN-PO system is used to predict multiple-factor property parameters and obtain optimum technological parameters in real time. Experiment duration is also greatly shortened.


2020 ◽  
Author(s):  
Young Jae Kim ◽  
Eun Young Yoo ◽  
Kwang Gi Kim

Abstract Background: The purpose of this study was to propose a deep learning-based method for automated detection of the pectoral muscle, in order to reduce misdetection in a computer-aided diagnosis (CAD) system for diagnosing breast cancer in mammography. This study also aimed to assess the performance of the deep learning method for pectoral muscle detection by comparing it to an image processing-based method using the random sample consensus (RANSAC) algorithm. Methods: Using the 322 images in the Mammographic Image Analysis Society (MIAS) database, the pectoral muscle detection model was trained with the U-Net architecture. Of the total data, 80% was allocated as training data and 20% was allocated as test data, and the performance of the deep learning model was tested by 5-fold cross validation. Results: The image processing-based method for pectoral muscle detection using RANSAC showed 92% detection accuracy. Using the 5-fold cross validation, the deep learning-based method showed a mean sensitivity of 95.55%, mean specificity of 99.88%, mean accuracy of 99.67%, and mean Dice similarity coefficient (DSC) of 95.88%. Conclusions: The proposed deep learning-based method of pectoral muscle detection performed better than an existing image processing-based method. In the future, by collecting data from various medical institutions and devices to further train the model and improve its reliability, we expect that this model could greatly reduce misdetection rates by CAD systems for breast cancer diagnosis.
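
For reference, the Dice similarity coefficient reported above measures the overlap between a predicted mask and the ground-truth mask, DSC = 2|A ∩ B| / (|A| + |B|). A minimal sketch on flattened binary masks, using toy values rather than data from the study:

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient for two flattened binary masks of equal length."""
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(pred) + sum(truth)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


pred_mask = [1, 1, 0, 0]   # toy predicted pectoral-muscle pixels
true_mask = [1, 0, 0, 0]   # toy ground-truth pixels
dsc = dice_coefficient(pred_mask, true_mask)
print(dsc)
```

In a 5-fold setting, the coefficient would be computed per test image and averaged within and across folds to obtain a mean DSC.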


Sentiment analysis probes public opinion in user-generated content on the Web, such as blogs, social media, and e-commerce websites. Its results are attracting attention from marketers because they make it possible to evaluate the success of an advertising campaign or people's attitude toward a new product launch. Business owners and advertising companies use sentiment analysis to develop new business strategies and to identify opportunities for new product development. In this paper, tweets about the Samsung Galaxy mobile phone and the Apple iPhone were retrieved with R from Twitter for three countries, namely the USA, the UK, and India, to create the dataset. The collected tweets were classified into positive, negative, and neutral sentiments. Machine learning classifiers, namely Naïve Bayes, Support Vector Machine, Random Forest, Decision Tree, Artificial Neural Network, and XGBoost, were applied to the dataset with K-fold cross-validation, and the results were tabulated to compare the classifiers and determine which yields the best accuracy. Other performance metrics, such as F score, precision, and recall, were also calculated to compare classifier performance on sentiment analysis. XGBoost combined with K-fold cross-validation produced the best prediction accuracy. We also applied the SentiStrength algorithm to determine the strength of positive and negative comments in each sentence. Based on these results, we were able to predict the brand of mobile phone preferred in each country.

