the ensemble method
Recently Published Documents


Total documents: 35 (five years: 23)

H-index: 6 (five years: 2)

Author(s):  
Mohammad Sadegh Sheikhaei ◽  
Hasan Zafari ◽  
Yuan Tian

In this article, we propose a new encoding scheme for named entity recognition (NER) called Joined Type-Length encoding (JoinedTL). Unlike most existing named entity encoding schemes, which focus on flat entities, JoinedTL can label nested named entities in a single sequence. JoinedTL uses a packed encoding to represent both the type and the span of a named entity, which not only results in fewer tagged tokens than existing encoding schemes but also enables it to support nested NER. We evaluate the effectiveness of JoinedTL on three nested NER datasets: GENIA in English, GermEval in German, and PerNest, our newly created nested NER dataset in Persian. We apply CharLSTM+WordLSTM+CRF, a three-layer sequence tagging model, to the three datasets encoded using JoinedTL and two existing nested NE encoding schemes, JoinedBIO and JoinedBILOU. Our experimental results show that CharLSTM+WordLSTM+CRF trained on JoinedTL-encoded datasets achieves F1 scores competitive with those of the models trained on datasets encoded with the other two schemes, but with 27%–48% fewer tagged tokens. To leverage the power of the three encodings, JoinedTL, JoinedBIO, and JoinedBILOU, we propose an encoding-based ensemble method for nested NER. Evaluation results show that the ensemble method achieves higher F1 scores on all datasets than each of the three models trained using a single encoding. By using nested NE encodings including JoinedTL with CharLSTM+WordLSTM+CRF, we establish new state-of-the-art performance with an F1 score of 83.7 on PerNest, 74.9 on GENIA, and 70.5 on GermEval, surpassing two recent neural models specially designed for nested NER.


2021 ◽  
Vol 5 (6) ◽  
pp. 1207-1215
Author(s):  
Ulfah Nur Oktaviana ◽  
Yufis Azhar

Garbage is a major problem for the sustainability of the environment, the economy, and society, and the volume of waste grows along with the growth of society and its needs. In 2019 Indonesia produced 66-67 million tons of waste, an increase of 2 to 3 million tons over the previous year. The government has made efforts to manage waste, including issuing waste sorting regulations. This sorting is known as 3R (reduce, reuse, recycle), but most people do not sort their waste properly. In this study, a model was developed that can sort six types of waste: cardboard, glass, metal, paper, plastic, and trash. The model was built using transfer learning with the pretrained DenseNet169 model. The best results were obtained when the classes had been oversampled beforehand, with an accuracy of 91%, an increase of 1% over the model trained on the unbalanced data distribution. The model was further optimized by applying the ensemble method to four models with the same architecture, each trained on the oversampled training dataset. This method yields an increase of 3% to 5%, and the final accuracy on the test dataset is 96%.
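The abstract does not say which oversampling technique was used. As a minimal sketch, assuming plain random oversampling (duplicating minority-class samples until every class matches the largest one), the balancing step could look like this. The function name `random_oversample` and the sample identifiers are hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every class
    has as many samples as the largest class.

    This is an assumed technique for illustration; the paper may have used
    a different oversampling scheme.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        # All original samples that belong to this class.
        pool = [s for s, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


# Hypothetical file names standing in for waste images.
images = ["glass_1", "glass_2", "glass_3", "metal_1"]
classes = ["glass", "glass", "glass", "metal"]
balanced_images, balanced_classes = random_oversample(images, classes)
```

Oversampling should be applied to the training split only, so that duplicated images never leak into the test set.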


Electronics ◽  
2021 ◽  
Vol 10 (24) ◽  
pp. 3066
Author(s):  
Do-Yeon Hwang ◽  
Seok-Hwan Choi ◽  
Jinmyeong Shin ◽  
Moonkyu Kim ◽  
Yoon-Ho Choi

In this paper, we propose a new deep learning-based image translation method to predict and generate images after hair transplant surgery from images taken before the surgery. Because existing image translation models use a naive strategy that learns the whole distribution of the translation, models that take the original image as input convert not only the hair transplant surgery region, which is the region of interest (ROI) for image translation, but also the other image regions, which are not the ROI. To solve this problem, we propose a novel generative adversarial network (GAN)-based ROI image translation method, which converts only the ROI and retains the image for the non-ROI. Specifically, by performing image translation and image segmentation independently, the proposed method generates predictive images from the distribution of images after hair transplant surgery and specifies the ROI to be used for the generated images. In addition, by applying the ensemble method to image segmentation, we obtain a more robust method that compensates for the shortcomings of the individual image segmentation models. In experiments using a real medical image dataset (1394 images before hair transplantation and 896 images after hair transplantation) to train the GAN model, we show that the proposed GAN-based ROI image translation method performed better than the other GAN-based image translation methods, e.g., by 23% in SSIM (Structural Similarity Index Measure), 452% in IoU (Intersection over Union), and 42% in FID (Frechet Inception Distance), on average. Furthermore, the proposed ensemble method not only improves ROI detection performance but also performs consistently in generating better predictive images from preoperative images taken from diverse angles.
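The abstract does not describe how the segmentation outputs are combined. One common option, shown here only as an assumption, is a pixel-wise majority vote over the binary masks produced by several segmentation models, scored with IoU against a reference mask. The masks below are toy 2x2 examples, not data from the paper.

```python
def majority_vote_masks(masks):
    """Fuse binary masks (lists of rows of 0/1) by pixel-wise majority vote.

    A pixel is marked as ROI when more than half of the models mark it.
    """
    n = len(masks)
    rows, cols = len(masks[0]), len(masks[0][0])
    return [
        [1 if 2 * sum(m[r][c] for m in masks) > n else 0 for c in range(cols)]
        for r in range(rows)
    ]


def iou(pred, truth):
    """Intersection over Union of two binary masks of the same shape."""
    inter = sum(p & t for pr, tr in zip(pred, truth) for p, t in zip(pr, tr))
    union = sum(p | t for pr, tr in zip(pred, truth) for p, t in zip(pr, tr))
    # Two empty masks agree perfectly, so define their IoU as 1.0.
    return inter / union if union else 1.0


# Toy outputs of three hypothetical segmentation models.
mask_a = [[1, 1], [0, 0]]
mask_b = [[1, 0], [0, 1]]
mask_c = [[1, 1], [1, 0]]
reference = [[1, 1], [0, 0]]

fused = majority_vote_masks([mask_a, mask_b, mask_c])
```

In this toy case the fused mask matches the reference even though two of the three individual masks contain errors, which is the kind of complementarity the abstract refers to.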


Author(s):  
Amri Muhaimin ◽  
Prismahardi Aji Riyantoko ◽  
Hendri Prabowo ◽  
Trimono Trimono

An intermittent dataset is challenging to forecast because it contains many zeros. Examples of intermittent data include sales data and rainfall data, since in both cases no value may be recorded in a given period. In this research, we build a model to address this problem using the ensemble method. Intermittent data are commonly modeled with the Negative Binomial distribution because the variance exceeds the mean. We use two datasets, rainfall data and sales data. Our approach creates a base model from a time series regression based on the Negative Binomial distribution and then augments the base model with a tree-based model, namely random forest. We then compare the results with two benchmark methods, the Croston method and Single Exponential Smoothing (SES). Our approach outperforms the benchmarks on the metric value by 1.79 and 7.18.
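For context on the benchmark, the following is a minimal sketch of the classic Croston method, not of the authors' ensemble. It smooths the nonzero demand sizes and the intervals between them separately and forecasts their ratio. The smoothing constant `alpha` and the initialization from the first nonzero observation are assumptions; other initializations are in use.

```python
def croston_forecast(demand, alpha=0.1):
    """One-step-ahead Croston forecast for an intermittent series.

    z is the smoothed nonzero demand size and p the smoothed interval
    between nonzero demands; both are updated only when demand occurs.
    """
    z = None  # smoothed demand size
    p = None  # smoothed inter-demand interval
    q = 1     # periods elapsed since the last nonzero demand

    for y in demand:
        if y > 0:
            if z is None:
                # Initialize from the first nonzero observation.
                z, p = float(y), float(q)
            else:
                z += alpha * (y - z)
                p += alpha * (q - p)
            q = 1
        else:
            q += 1

    if z is None:
        return 0.0  # no demand observed at all
    return z / p


# Toy series: a demand of 6 units every third period.
forecast = croston_forecast([0, 0, 6, 0, 0, 6], alpha=0.5)
```

For this toy series the demand size stays at 6 and the interval at 3, so the forecast is a rate of 2 units per period.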


2021 ◽  
Author(s):  
Muhammad Ali Fauzi ◽  
Bian Yang

High stress levels among hospital workers can be harmful to both the workers and the institution. Enabling workers to monitor their stress level has many advantages. Knowing their own stress level can help them stay aware, feel more in control of their response to situations, and know when it is time to relax or take action to treat it properly. This monitoring task can be enabled by using wearable devices to measure physiological responses related to stress. In this work, we propose a continuous stress detection method based on smartwatch sensors, using several individual classifiers and classifier ensembles. The experimental results show that all of the classifiers detect stress quite well, with an accuracy of more than 70%. The results also show that the ensemble method obtained higher accuracy and F1-measure than all of the individual classifiers. The best accuracy, 87.10%, was obtained by the ensemble with the soft voting strategy (ES), while the hard voting strategy (EH) achieved the best F1-measure, 77.45%.
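The difference between the two voting strategies can be sketched as follows. Hard voting takes the most frequent predicted label, while soft voting averages the predicted class probabilities and picks the largest mean. The function names and the probability values are illustrative assumptions, not the authors' implementation or data.

```python
from collections import Counter


def hard_vote(predicted_labels):
    """Return the label predicted by the most classifiers.

    Ties are broken in favor of the label that appears first.
    """
    return Counter(predicted_labels).most_common(1)[0][0]


def soft_vote(probabilities):
    """Average per-class probabilities and return the top class.

    probabilities is a list of dicts mapping label -> probability,
    one dict per classifier, all with the same labels.
    """
    labels = probabilities[0].keys()
    mean = {
        label: sum(p[label] for p in probabilities) / len(probabilities)
        for label in labels
    }
    return max(mean, key=mean.get)


# Made-up outputs of three classifiers for one time window.
probs = [
    {"stress": 0.45, "calm": 0.55},
    {"stress": 0.45, "calm": 0.55},
    {"stress": 0.90, "calm": 0.10},
]
labels = [max(p, key=p.get) for p in probs]

hard_result = hard_vote(labels)
soft_result = soft_vote(probs)
```

Here two classifiers lean weakly toward "calm" and one is strongly confident in "stress", so hard voting returns "calm" while soft voting returns "stress". This shows how the two strategies can rank differently on different metrics.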


2021 ◽  
Author(s):  
Rushad Ravilievich Rakhimov ◽  
Oleg Valerievich Zhdaneev ◽  
Konstantin Nikolaevich Frolov ◽  
Maxim Pavlovich Babich

The objective of this paper is to describe the experience of using a machine learning model built with the ensemble method to prevent stuck pipe events during the well construction process on extended reach wells. The tasks performed include collecting, analyzing, and cleaning historical data, selecting and preparing a machine learning model, and testing it on real-time data by means of a desktop application. The idea is to display the solution at the rig floor, allowing the driller to take quick action to prevent a stuck pipe event. Historical data mining and analysis were performed using remote monitoring software. Preparation, labelling, and cleaning of historical and real-time data were carried out using scripts and big data techniques. The machine learning algorithm was developed using the ensemble method, which combines several models to improve the final result. In the field of interest, the most common type of stuck pipe is the solids-induced pack-off. Pack-offs occur because of insufficient cleaning of drilled cuttings from the hole and wellbore collapse caused by rock instability. Stuck pipe prevention on extended reach drilling (ERD) wells requires a holistic approach, while the final role is assigned to the driller. As the ERD envelope is continuously extended and workloads on both personnel and drilling equipment increase, the effectiveness of accident prevention is deteriorating. This leads to severe consequences: the Bottom Hole Assembly lost in hole, the need to re-drill the bore, and eventually increased Non-Productive Time (NPT). The developed application, based on an ensemble machine learning algorithm, shows prediction accuracy above 94%. By reacting to alarms, the driller can quickly take measures to prevent downhole accidents during the construction of ERD wells.


Author(s):  
Omair Rashed Abdulwareth Almanifi ◽  
Mohd Azraai Mohd Razman ◽  
Ismail Mohd Khairuddin ◽  
Muhammad Amirul Abdullah ◽  
Anwar P.P. Abdul Majeed

Author(s):  
Simone A. Ludwig

Epilepsy is a chronic neurological disorder characterized by unprovoked recurrent seizures. The most commonly used tool for the diagnosis of epilepsy is the electroencephalogram (EEG), which measures the electrical activity of the brain. To prevent potential risks, patients have to be monitored so that an epileptic episode can be detected early and preventive measures can be taken. Many research studies have used a combination of time and frequency features for the automatic recognition of epileptic seizures. In this paper, two fusion methods are compared. The first is based on an ensemble method and the second uses the Choquet fuzzy integral method. In particular, three different machine learning approaches, namely RNN, ML and DNN, are used as inputs for the ensemble method and the Choquet fuzzy integral fusion method. Evaluation measures such as the confusion matrix, AUC, and accuracy are compared, and MSE and RMSE are also provided. The results show that the Choquet fuzzy integral fusion method outperforms the ensemble method as well as other state-of-the-art classification methods.
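The abstract does not give the fuzzy measure used for fusion. The sketch below shows only the standard discrete Choquet integral: scores are sorted in ascending order, and each increment is weighted by the measure of the set of sources whose score is at least that large. The source names and measure values are made-up assumptions for illustration.

```python
def choquet_integral(scores, measure):
    """Discrete Choquet integral of per-source scores.

    scores maps source name -> score. measure maps a frozenset of source
    names -> weight and is expected to be monotone, with the full set
    mapped to 1.0.
    """
    order = sorted(scores, key=scores.get)  # sources by ascending score
    total, previous = 0.0, 0.0
    for i, source in enumerate(order):
        # Coalition of all sources scoring at least as high as this one.
        coalition = frozenset(order[i:])
        total += (scores[source] - previous) * measure[coalition]
        previous = scores[source]
    return total


# Two hypothetical classifiers and their seizure probabilities.
scores = {"model_a": 0.2, "model_b": 0.8}

# Non-additive measure: model_b alone is trusted much more than model_a.
measure = {
    frozenset(["model_a"]): 0.3,
    frozenset(["model_b"]): 0.9,
    frozenset(["model_a", "model_b"]): 1.0,
}

fused_score = choquet_integral(scores, measure)
```

With an additive measure the Choquet integral reduces to a weighted average. The non-additive case is what lets the fusion reward or penalize particular combinations of classifiers.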


Author(s):  
Kiran Maka ◽  
S. Pazhanirajan ◽  
Sujata Mallapur

In this work, two approaches are presented for deriving the important variables that an auditor should watch for during the audit trials of a financial statement. Machine learning modeling is leveraged to achieve this goal. In the first approach, important features or variables are derived using an ensemble method, and in the second approach, an explainable model is used to corroborate and expand the conclusions derived from the ensemble method. A manually labeled dataset of financial statements is used for this purpose. Four measures are employed to derive the important features: the random forest recommendations of the first approach, the Random Forest Explainer p-value, the first Random Forest Explainer multi-way importance plot, and the second Random Forest Explainer multi-way importance plot. A final list of six variables is derived from these two approaches and four measures.


Water ◽  
2021 ◽  
Vol 13 (18) ◽  
pp. 2588
Author(s):  
Hao-Che Ho ◽  
Yen-Ming Chiang ◽  
Che-Chi Lin ◽  
Hong-Yuan Lee ◽  
Cheng-Chia Huang

The change in movable beds is related to the mechanisms of sediment transport and hydrodynamics. Numerical modelling with empirical equations and the simplified momentum equation is the common means of analyzing the complicated sediment transport processes in river channels. Optimizing the parameters is essential to obtain proper results. Inadequate parameters cause errors during the simulation, and these errors accumulate over long simulations. The optimized parameter combination for numerical modelling, however, is rarely discussed. This study adopted the ensemble method to simulate the change in the river channel, using a single model combined with multiple parameter sets. The optimized parameter combinations for a given river reach are investigated. Two river basins located in Taiwan were used as study cases to simulate river morphology with SRH-2D, which was developed by the U.S. Bureau of Reclamation. The input parameters related to the sediment transport module were randomly selected within a reasonable range. The parameter sets that gave proper results were selected as ensemble members. The sediment concentration and the bathymetry elevation were used for calibration. Both study cases show that 20 ensemble members were enough to capture the results while saving simulation time. When the number of ensemble members increased to 100, there was no significant improvement, only a longer simulation time. The results showed that the peak concentration and its time of occurrence could be predicted with an ensemble size of 20. Moreover, with the bed elevation as the target, the results showed that this method could quantitatively simulate the change in bed elevation. For both cases, this study showed that the ensemble method is a suitable approach for numerical modelling of river morphology. An ensemble size of 20 can effectively obtain the result and reduce the uncertainty of sediment transport simulation.
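The workflow described above (random parameter sets, selection of well-performing sets, a fixed-size ensemble) can be sketched roughly as follows. This is an illustration under stated assumptions, not the study's procedure. `run_model` is a hypothetical stand-in for a full SRH-2D run, members are chosen by lowest RMSE against observations, and the ensemble output is the plain mean of the member simulations.

```python
import random


def build_ensemble(run_model, observed, ranges, n_candidates, n_members, seed=0):
    """Sample parameter sets, keep the best n_members by RMSE, and return
    the members together with the ensemble-mean simulation.

    ranges maps parameter name -> (low, high). run_model takes a dict of
    parameters and returns a list of simulated values aligned with observed.
    """
    rng = random.Random(seed)
    scored = []
    for _ in range(n_candidates):
        # Draw each parameter uniformly within its allowed range.
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
        sim = run_model(params)
        rmse = (sum((s - o) ** 2 for s, o in zip(sim, observed)) / len(observed)) ** 0.5
        scored.append((rmse, params, sim))

    # Sort by error only, so the parameter dicts are never compared.
    scored.sort(key=lambda item: item[0])
    members = scored[:n_members]

    mean = [
        sum(member[2][i] for member in members) / len(members)
        for i in range(len(observed))
    ]
    return members, mean


def toy_model(params):
    """Toy stand-in for the numerical model: a linear response."""
    return [params["k"] * x for x in (1.0, 2.0, 3.0)]


observed = [2.0, 4.0, 6.0]
members, ensemble_mean = build_ensemble(
    toy_model, observed, {"k": (1.0, 3.0)}, n_candidates=200, n_members=20
)
```

Rerunning with a larger `n_members` and comparing the ensemble mean and its spread is one simple way to check whether a bigger ensemble is worth the extra simulation time, which is the trade-off the study reports between 20 and 100 members.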

