Application of Improved Boosting Algorithm for Art Image Classification

2021
Vol 2021
pp. 1-11
Author(s):
Yue Wu

In computer science, data mining is a hot topic: a mathematical approach to identifying patterns in enormous amounts of data. Image mining is an important data mining technique that spans a variety of fields, and within it, the organization of art images is a research area worthy of attention. Art image categorization refers to classifying art images into several predetermined sets; it involves image preprocessing, feature extraction, object identification, object segmentation, object classification, and a variety of other techniques. This paper proposes an improved boosting algorithm that combines traditional and simple, yet weak, classifiers into a complex, accurate, and strong classifier for art images as well as realistic images. The paper investigates the characteristics of cartoon images, realistic images, painting images, and photo images, constructs color variance histogram features, and uses them for classification. For the classification experiments, an image database of 10471 images is randomly split into training and test portions: the training dataset contains 6971 images, and the test dataset contains 3478 images. The experimental results show that the proposed algorithm achieves a classification accuracy of approximately 97%. The method can serve as the basis of automatic large-scale image classification and is highly practical.
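The boosting idea above — many simple weak learners combined into one strong classifier — can be sketched with AdaBoost over decision stumps. The feature extractor below is an illustrative stand-in for the paper's color variance histogram features, and the data are synthetic; neither reflects the authors' actual code or dataset.

```python
# Sketch: AdaBoost combines weak decision stumps into a strong classifier.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

def color_variance_histogram(image, bins=16):
    """Toy feature: histogram of per-pixel variance across RGB channels."""
    var = image.astype(float).var(axis=-1)           # (H, W) channel variance
    hist, _ = np.histogram(var, bins=bins, range=(0, var.max() + 1e-9))
    return hist / hist.sum()                         # normalized histogram

# Synthetic stand-in data: 200 random "images" in two classes.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(200, 32, 32, 3))
labels = rng.integers(0, 2, size=200)
X = np.array([color_variance_histogram(im) for im in images])

X_tr, X_te, y_tr, y_te = train_test_split(X, labels, random_state=0)
clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=100)
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```

With real art/photo images and the paper's histogram features, the same pipeline would apply unchanged; only the feature extractor differs.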

2020
Vol 27
Author(s):
Zaheer Ullah Khan
Dechang Pi

Background: S-sulfenylation (S-sulphenylation, or sulfenic acid) of proteins is a special kind of post-translational modification that plays an important role in various physiological and pathological processes such as cytokine signaling, transcriptional regulation, and apoptosis. Given this significance, and to complement existing wet-lab methods, several computational models have been developed for predicting sulfenylation cysteine sites. However, the performance of these models has not been satisfactory, owing to inefficient feature schemes, severe class imbalance, and the lack of an intelligent learning engine. Objective: In this study, our motivation is to establish a strong and novel computational predictor for discriminating sulfenylation from non-sulfenylation sites. Methods: We report an innovative bioinformatics feature encoding tool, named DeepSSPred, in which the encoded features are obtained via an n-segmented hybrid feature scheme; the synthetic minority oversampling technique is then employed to cope with the severe imbalance between SC-sites (minority class) and non-SC sites (majority class). A state-of-the-art 2D convolutional neural network was employed under a rigorous 10-fold jackknife cross-validation technique for model validation and authentication. Results: Following the proposed framework, the strong discriminative representation of the feature space, the machine learning engine, and the unbiased presentation of the underlying training data yielded an excellent model that outperforms all existing established studies. The proposed approach is 6% higher in MCC than the previous best method, which failed to provide sufficient details on an independent dataset. Compared with the second-best method, the model obtained increases of 7.5% in accuracy, 1.22% in Sn, 12.91% in Sp, and 13.12% in MCC on the training data, and 12.13% in ACC, 27.25% in Sn, 2.25% in Sp, and 30.37% in MCC on an independent dataset. These empirical analyses show the superior performance of the proposed model over both the training and independent datasets in comparison with existing studies. Conclusion: In this research, we have developed a novel sequence-based automated predictor for SC-sites, called DeepSSPred. The empirical simulation outcomes on a training dataset and an independent validation dataset reveal the efficacy of the proposed model. The good performance of DeepSSPred is due to several factors, such as novel discriminative feature encoding schemes, the SMOTE technique, and careful construction of the prediction model through a tuned 2D-CNN classifier. We believe that this work provides potential insight into the further prediction of S-sulfenylation characteristics and functionalities, and we hope that the developed predictor will be significantly helpful for the large-scale discrimination of unknown SC-sites in particular and for designing new pharmaceutical drugs in general.
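The SMOTE step used above can be illustrated with a minimal NumPy sketch: synthesize new minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbors. This is a generic illustration of the technique, not the authors' pipeline, and the data are random placeholders.

```python
# Minimal SMOTE sketch: new minority samples lie on segments between
# existing minority samples and their minority-class nearest neighbors.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote(X_min, n_new, k=5, seed=0):
    """Return n_new synthetic samples drawn along minority-minority segments."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)   # +1: self included
    _, idx = nn.kneighbors(X_min)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))                      # random minority sample
        j = idx[i][rng.integers(1, k + 1)]                # random neighbor (skip self)
        lam = rng.random()                                # interpolation factor
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synthetic)

# Toy imbalance: 10 minority samples in 8 dimensions, upsampled by 80.
rng = np.random.default_rng(1)
X_min = rng.normal(size=(10, 8))
X_syn = smote(X_min, n_new=80)
print(X_syn.shape)
```

In practice a maintained implementation such as `imblearn.over_sampling.SMOTE` would typically be used instead of a hand-rolled loop.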


Author(s):
Shaolei Wang
Zhongyuan Wang
Wanxiang Che
Sendong Zhao
Ting Liu

Spoken language is fundamentally different from written language in that it contains frequent disfluencies, i.e., parts of an utterance that are corrected by the speaker. Disfluency detection (removing these disfluencies) is desirable to clean the input for downstream NLP tasks. Most existing approaches to disfluency detection rely heavily on human-annotated data, which is scarce and expensive to obtain in practice. To tackle this training-data bottleneck, in this work we investigate methods for combining self-supervised learning and active learning for disfluency detection. First, we construct large-scale pseudo training data by randomly adding or deleting words from unlabeled data and propose two self-supervised pre-training tasks: (i) a tagging task to detect the added noisy words and (ii) a sentence classification task to distinguish original sentences from grammatically incorrect ones. We then combine these two tasks to jointly pre-train a neural network, which is subsequently fine-tuned on human-annotated disfluency detection data. The self-supervised method captures task-specific knowledge for disfluency detection and achieves better performance than other supervised methods when fine-tuned on a small annotated dataset. However, because the pseudo training data are generated from simple heuristics and cannot fully cover all disfluency patterns, a performance gap remains relative to supervised models trained on the full training dataset. We further explore how to bridge this gap by integrating active learning into the fine-tuning process. Active learning reduces annotation costs by choosing the most informative examples to label, and can address the weakness of self-supervised learning on a small annotated dataset. We show that by combining self-supervised learning with active learning, our model matches state-of-the-art performance with only about 10% of the original training data on both the commonly used English Switchboard test set and a set of in-house annotated Chinese data.
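The pseudo-training-data construction described above can be sketched as follows: randomly insert or delete words in a clean sentence and emit tags marking the inserted "noisy" tokens for the tagging task. The insertion/deletion rates and the noise vocabulary here are illustrative assumptions, not the paper's exact recipe.

```python
# Sketch: build pseudo disfluency data by random word insertion/deletion.
import random

def make_pseudo_example(tokens, vocab, p_add=0.15, p_del=0.1, seed=None):
    rng = random.Random(seed)
    out_tokens, tags = [], []          # tag 1 = inserted (disfluent) token
    for tok in tokens:
        if rng.random() < p_add:       # insert a random noisy word
            out_tokens.append(rng.choice(vocab))
            tags.append(1)
        if rng.random() >= p_del:      # keep (else delete) the original word
            out_tokens.append(tok)
            tags.append(0)
    return out_tokens, tags

sent = "i want a flight to boston".split()
vocab = ["uh", "um", "you", "know", "well"]
noisy, tags = make_pseudo_example(sent, vocab, seed=42)
print(list(zip(noisy, tags)))
```

A tagger pre-trained to recover the `1` tags then only needs a small amount of human-annotated data to adapt to real disfluencies.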


2011
Vol 255-260
pp. 4242-4246
Author(s):
Hui Mi Hsu
Sao Jeng Chao
Jia Ruey Chang

The pavement condition index (PCI), a numerical rating from 0 to 100, gives a good indication of pavement condition. However, the pavement distress survey is a labor-intensive procedure performed quite subjectively by experienced pavement engineers, and a highly complicated calculation is then required to determine the PCI of a road network. It is therefore advantageous to determine the PCI from relevant pavement parameters. This study demonstrates how to develop a PCI assessment model from pavement parameters by combining a data mining technique with the group method of data handling (GMDH). Records from provincial and county roads in Taiwan with asphalt surfaces and a wide variety of pavement structures were employed. After applying the find dependencies (FD) algorithm from data mining, 120 dependent records were extracted from 253 raw records. For model development, 100 records were randomly selected as the training dataset. GMDH was successfully applied to develop a PCI assessment model that uses 7 critical pavement parameters as inputs and PCI as output; the R2 for the training dataset is 0.849. The remaining 20 records were used as the testing dataset, for which the model achieves an R2 of 0.851. This study confirms that combining a data mining technique with the GMDH method has the potential to provide significant assistance in pavement condition assessment, and the proposed model offers a good foundation for further refinement as additional data become available.
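The core of one GMDH layer can be sketched in a few lines: for every pair of input parameters, fit a quadratic polynomial "neuron" by least squares and keep the pair that generalizes best to held-out data. A full GMDH stacks such layers and feeds the surviving neurons' outputs forward; this single-layer version and its synthetic 7-parameter data are illustrative only, not the study's model.

```python
# Sketch of one GMDH layer: quadratic polynomial neurons over input pairs,
# selected by validation error (self-organizing model selection).
import numpy as np
from itertools import combinations

def quad_design(xi, xj):
    return np.column_stack([np.ones_like(xi), xi, xj, xi**2, xj**2, xi * xj])

def gmdh_layer(X_tr, y_tr, X_va, y_va):
    best = (np.inf, None, None)                     # (val MSE, pair, coeffs)
    for i, j in combinations(range(X_tr.shape[1]), 2):
        A = quad_design(X_tr[:, i], X_tr[:, j])
        coef, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
        pred = quad_design(X_va[:, i], X_va[:, j]) @ coef
        mse = np.mean((pred - y_va) ** 2)
        if mse < best[0]:
            best = (mse, (i, j), coef)
    return best

# Toy data: 7 pavement parameters, a PCI-like target, 100/20 train/test split.
rng = np.random.default_rng(0)
X = rng.random((120, 7))
y = 100 - 40 * X[:, 0] - 30 * X[:, 1] * X[:, 2] + rng.normal(0, 2, 120)
mse, pair, coef = gmdh_layer(X[:100], y[:100], X[100:], y[100:])
print("best input pair:", pair, "validation MSE:", round(mse, 2))
```

The validation-driven selection is what lets GMDH build its structure from the data rather than from a fixed architecture.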



2020
Vol 2 (2)
pp. 41-47
Author(s):
Asri Hidayad
Sarjon Defit
Sumijan Sumijan

The purpose of this study is to evaluate whether Tahfiz activities are effective with respect to learning outcomes. The data processed in this study were records of tahfiz activities and student learning outcomes in class XI (eleven), totaling 42 records sourced from tahfiz memorization counts, tahfiz grades, and student grades at Madrasah Aliyah Negeri 1 Bukittinggi. Based on the analysis of the data, the classification uses K-Means Clustering, a Data Mining algorithm that works by grouping: the technique consists of testing data and training data, with the number of tahfiz memorizations, the tahfiz grade, and the learning outcomes as inputs. Based on the results of this study, the school can determine how much these tahfiz activities influence student grades.
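The K-Means grouping described above can be sketched on made-up stand-in data for the three inputs (memorization count, tahfiz grade, learning outcome). These numbers are illustrative; they are not the study's actual 42 records.

```python
# Sketch: K-Means grouping of student records on three input features.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# 42 synthetic students: [memorization count, tahfiz grade, learning outcome]
data = np.column_stack([
    rng.integers(1, 31, 42),          # memorization count
    rng.integers(60, 101, 42),        # tahfiz grade
    rng.integers(60, 101, 42),        # learning outcome grade
]).astype(float)

X = StandardScaler().fit_transform(data)     # put features on one scale
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
for c in range(2):
    print(f"cluster {c}: n={np.sum(km.labels_ == c)}, "
          f"feature means={data[km.labels_ == c].mean(axis=0).round(1)}")
```

Comparing the per-cluster means of tahfiz inputs against learning outcomes is one simple way to read off whether heavier tahfiz activity coincides with higher grades.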


2019
Vol 11 (19)
pp. 2190
Author(s):
Kushiyama
Matsuoka

After a large-scale disaster, many damaged buildings are demolished and treated as disaster waste. Although the weight of disaster waste was estimated two months after the 2016 Kumamoto earthquake in Japan, the estimate differed significantly from the final result when disaster waste disposal was completed in March 2018. The amount of disaster waste generated can be estimated by multiplying the total number of severely and partially damaged buildings by a coefficient of generated weight per building. We suppose that the amount of disaster waste is also affected by the conditions of the demolished buildings, namely their areas and structural typologies, but this has not yet been clarified. Therefore, in this study, we aimed to use geographic information system (GIS) map data to create a time series GIS map dataset labeling demolished and remaining buildings in Mashiki town over the two-year period up to the completion of disaster waste disposal. We used OpenStreetMap (OSM) data as the base data and time series SPOT images observed in the two years following the Kumamoto earthquake to label all demolished and remaining buildings in the GIS map dataset. To efficiently label the approximately 16,000 buildings in Mashiki town, we calculated an indicator of the likelihood that a building was demolished or remaining from the change in brightness in the SPOT images. Using this indicator as a reference, we classified 5701 of 16,106 buildings as demolished, as of March 2018, by visual interpretation of the SPOT and Pleiades images. We verified that this number of demolished buildings almost matches the number reported by the Mashiki municipality.
Moreover, we assessed the accuracy of the proposed method: the F-measure was higher than 0.9 on the training dataset, which was verified by a field survey and visual interpretation and included labels for 55 demolished and 55 remaining buildings. When the method was applied to all labels in the OSM dataset, however, the F-measure was 0.579. On balanced test data comprising another 100 demolished and 100 remaining buildings, disjoint from the training data, the F-measure was 0.790, calculated from the SPOT image of 25 March 2018. The proposed method thus performed well for balanced classification but not for imbalanced classification. We also studied examples of image characteristics behind correct and incorrect estimates obtained by thresholding the indicator.
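The thresholded brightness-change indicator and its F-measure evaluation can be sketched as follows: a building whose mean brightness changes more than a threshold between two acquisition dates is flagged as demolished, and the flags are scored against labels. The brightness distributions and the threshold here are made-up assumptions, not values from the study.

```python
# Sketch: threshold a per-building brightness-change indicator and score
# the demolished/remaining classification with the F-measure.
import numpy as np

def f_measure(y_true, y_pred):
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

rng = np.random.default_rng(0)
n = 200
labels = np.r_[np.ones(100, dtype=int), np.zeros(100, dtype=int)]  # 1 = demolished
# Synthetic mean-brightness change per building: demolition (rubble, bare
# ground) produces a large change, intact buildings a small one.
change = np.where(labels == 1,
                  rng.normal(60, 15, n),    # large brightness change
                  rng.normal(10, 8, n))     # small change
pred = (np.abs(change) > 30).astype(int)    # assumed indicator threshold
print("F-measure:", round(f_measure(labels, pred), 3))
```

On such a balanced synthetic set the score is high; the study's drop to 0.579 on the full OSM labels illustrates how the same indicator degrades once the classes are heavily imbalanced.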


2020
Vol 10 (1)
Author(s):
Poulamee Chakraborty
Bhabani S. Das
Hitesh B. Vasava
Niranjan Panigrahi
Priyabrata Santra

Abstract The pedotransfer function (PTF) approach is a convenient way to estimate difficult-to-measure soil properties from basic soil data. Typically, PTFs are developed by training and testing a predictive model on a large number of samples collected from small (regional) areas. National soil legacy databases offer an opportunity to provide soil data for developing PTFs, although legacy data are sparsely distributed over large areas. Here, we examined the Indian soil legacy (ISL) database to select a comprehensive training dataset for estimating cation exchange capacity (CEC) as a test case for the PTF approach. Geostatistical and correlation analyses showed that legacy data entail the diverse spatial and correlation structure needed to build robust PTFs. Through non-linear correlation measures and intelligent predictive algorithms, we developed a methodology to extract an efficient training dataset from the ISL data for estimating CEC with high prediction accuracy. The selected training data had comparable spatial variation and nonlinearity in parameters across the training and test datasets. We thus identified specific indicators for constructing robust PTFs from legacy data. Our results open a new avenue for using large volumes of existing soil legacy data to develop region-specific PTFs without the need to collect new soil data.
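The PTF idea above — a regression model mapping basic soil properties to a hard-to-measure one — can be sketched with a random forest predicting CEC. The input variables (clay content, organic matter, pH), the synthetic relationship, and the data are illustrative assumptions, not the ISL data or the authors' algorithm.

```python
# Sketch of a pedotransfer function: regress a hard-to-measure property
# (CEC) on basic soil properties using a generic machine-learning model.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n = 500
clay = rng.uniform(5, 60, n)              # clay content (%)
om = rng.uniform(0.2, 5.0, n)             # organic matter (%)
ph = rng.uniform(4.5, 8.5, n)             # soil pH
# Synthetic CEC (cmol/kg), loosely increasing with clay and organic matter.
cec = 0.5 * clay + 3.0 * om + 0.8 * ph + rng.normal(0, 2, n)

X = np.column_stack([clay, om, ph])
X_tr, X_te, y_tr, y_te = train_test_split(X, cec, random_state=0)
ptf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("test R2:", round(r2_score(y_te, ptf.predict(X_te)), 3))
```

The study's contribution sits upstream of this step: choosing which legacy samples form the training set so that a model like this generalizes across a large, sparsely sampled region.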

