The Study of Multiple Classes Boosting Classification Method Based on Local Similarity

Algorithms ◽  
2021 ◽  
Vol 14 (2) ◽  
pp. 37
Author(s):  
Shixun Wang ◽  
Qiang Chen

Boosting in ensemble learning has made great progress, but most existing methods boost over a single modality. For this reason, a simple multiclass boosting framework that uses local similarity as its weak learner is extended here to multimodal multiclass boosting. First, with local similarity as the weak learner, a loss function is used to obtain a baseline loss, and the data points are binarized logarithmically. Then the optimal local similarity is found together with its corresponding loss; whichever of this loss and the baseline loss is smaller becomes the best so far. Second, the local similarity between pairs of points is calculated, and the loss is then computed from that pairwise similarity. Finally, text and images are retrieved from each other, and the retrieval accuracy is obtained for text and for images, respectively. The experimental results show that the multimodal multiclass boosting framework with local similarity as the weak learner, evaluated on standard data sets and compared against other state-of-the-art methods, demonstrates the effectiveness of this approach.
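To make the training loop concrete, below is a minimal, hypothetical sketch of one boosting round of the kind the abstract describes: each candidate weak learner scores points by local similarity to a reference point, and a candidate is kept only if its loss improves on the baseline loss so far. All names and the choice of multiclass logistic loss are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch of one round of a multiclass boosting loop with
# local-similarity weak learners; names and the loss are illustrative.
import numpy as np

def local_similarity(X, ref, gamma=1.0):
    """RBF-style similarity of every row of X to a reference point."""
    return np.exp(-gamma * np.sum((X - ref) ** 2, axis=1))

def multiclass_log_loss(F, y, n_classes):
    """Multiclass logistic loss for a score matrix F (n_samples x n_classes)."""
    P = np.exp(F - F.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)
    return -np.mean(np.log(P[np.arange(len(y)), y] + 1e-12))

def boost_round(X, y, F, n_classes, candidates, lr=0.1):
    """Try each candidate (reference point, class); keep the smallest loss."""
    best_loss = multiclass_log_loss(F, y, n_classes)   # baseline loss
    best_F = F
    for ref, cls in candidates:
        s = local_similarity(X, ref)
        F_new = F.copy()
        F_new[:, cls] += lr * s                        # push scores toward cls
        loss = multiclass_log_loss(F_new, y, n_classes)
        if loss < best_loss:                           # smaller loss wins
            best_loss, best_F = loss, F_new
    return best_F, best_loss
```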

Author(s):  
Jianping Ju ◽  
Hong Zheng ◽  
Xiaohang Xu ◽  
Zhongyuan Guo ◽  
Zhaohui Zheng ◽  
...  

Abstract. Although convolutional neural networks have achieved success in image classification, challenges remain in agricultural product quality sorting, such as machine-vision-based jujube defect detection. The performance of jujube defect detection depends mainly on the features extracted and the classifier used. Owing to the diversity of jujube materials and the variability of the testing environment, traditional hand-crafted features often fail to meet the requirements of practical application. In this paper, a jujube sorting model for small data sets, based on a convolutional neural network and transfer learning, is proposed to meet the practical demands of jujube defect detection. First, the original images collected from an actual jujube sorting production line were pre-processed and augmented to establish a data set covering five categories of jujube defects. The original CNN model was then improved by embedding an SE module and by replacing the softmax loss function with the triplet loss and center loss functions. Finally, a model pre-trained on the ImageNet data set was trained on the jujube defect data set, so that the pre-trained parameters could fit the distribution of the jujube defect images, completing the transfer of the model and enabling detection and classification of jujube defects. Classification results were analyzed through accuracy and confusion matrices against the comparison models and visualized with heatmaps. The experimental results show that the SE-ResNet50-CL model handles the fine-grained classification problem of jujube defect recognition well, reaching a test accuracy of 94.15%. The model is stable and achieves high recognition accuracy in complex environments.
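As an illustration of the architectural change the abstract mentions, below is a generic squeeze-and-excitation (SE) block in PyTorch of the kind commonly embedded into ResNet50 stages; it is a textbook SE module, not the authors' exact implementation, and the reduction ratio is an assumption.

```python
# Generic squeeze-and-excitation block: globally pools each channel, then
# learns per-channel gates used to reweight the feature map.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: global context
        self.fc = nn.Sequential(                     # excitation: channel gates
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                 # reweight channels
```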


Author(s):  
Simona Babiceanu ◽  
Sanhita Lahiri ◽  
Mena Lockwood

This study uses a suite of performance measures, developed by taking into consideration various aspects of congestion and reliability, to assess the impacts of safety projects on congestion. Safety projects are necessary to help move Virginia's roadways toward safer operation, but they can contribute to congestion and unreliability during execution and can affect operations afterwards. However, safety projects are assessed primarily for safety improvements, not for congestion. This study identifies an appropriate suite of measures and quantifies and compares the congestion and reliability impacts of safety projects on roadways before, during, and after project execution. The paper presents the performance measures, examines their sensitivity under different operating conditions, defines thresholds for congestion and reliability, and demonstrates the measures on a set of Virginia safety projects. The data set consists of 10 projects totaling 92 mi and more than 1M data points. The study found that, overall, safety projects tended to have a positive impact on congestion and reliability after completion, and that the congestion variability measures were sensitive to the reliability threshold. The study concludes with practical recommendations for primary measures that may be used to gauge the overall impacts of safety projects: percent vehicle miles traveled (VMT) reliable, with a customized threshold for Virginia; percent VMT delayed; and time to travel 10 mi. However, caution should be used when applying the results directly to other situations because of the limited number of projects in the study.
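As a rough illustration of the kind of measure recommended above, the sketch below computes a percent-VMT-delayed figure from per-segment data; the 1.3 free-flow multiplier used as the delay threshold is a placeholder, not the customized Virginia threshold from the study.

```python
# A segment's VMT counts as "delayed" here if its observed travel time exceeds
# a multiple of free-flow travel time; the 1.3 multiplier is a placeholder.
def percent_vmt_delayed(segments, threshold=1.3):
    """segments: list of (vmt, observed_tt, free_flow_tt) tuples."""
    total = sum(vmt for vmt, _, _ in segments)
    delayed = sum(vmt for vmt, tt, ff in segments if tt > threshold * ff)
    return 100.0 * delayed / total

example = [(1200, 5.5, 4.0), (800, 4.1, 4.0), (1500, 5.0, 4.5)]
print(percent_vmt_delayed(example))  # share of VMT on delayed segments
```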


Sensors ◽  
2021 ◽  
Vol 21 (10) ◽  
pp. 3406
Author(s):  
Jie Jiang ◽  
Yin Zou ◽  
Lidong Chen ◽  
Yujie Fang

Precise localization and pose estimation in indoor environments are commonly required by a wide range of applications, including robotics, augmented reality, and navigation and positioning services. Such problems can be solved via visual localization using a pre-built 3D model. The increase in search space associated with large scenes can be managed by first retrieving candidate images and subsequently estimating the pose. Most current deep learning-based image retrieval methods require labeled data, which increases annotation costs and complicates data acquisition. In this paper, we propose an unsupervised hierarchical indoor localization framework that integrates an unsupervised variational autoencoder (VAE) with a visual Structure-from-Motion (SfM) approach in order to extract global and local features. During localization, global features are used for image retrieval at the scene-map level to obtain candidate images, and local features are subsequently used to estimate the pose from 2D-3D matches between the query and candidate images. Only RGB images are used as input to the proposed localization system, which is both convenient and challenging. Experimental results show that the proposed method localizes images within 0.16 m and 4° on the 7-Scenes data sets, and localizes 32.8% of images within 5 m and 20° on the Baidu data set. Furthermore, the proposed method achieves higher precision than advanced methods.
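A minimal sketch of the two-stage pipeline described above, assuming global descriptors from the VAE encoder and 2D-3D correspondences from SfM local-feature matching; function names are illustrative, and OpenCV's PnP + RANSAC solver stands in for the pose-estimation step.

```python
# Stage 1: retrieve candidates with global (VAE) descriptors.
# Stage 2: estimate the query pose from 2D-3D matches via PnP + RANSAC.
import numpy as np
import cv2

def retrieve_candidates(query_feat, db_feats, k=5):
    """Cosine-similarity retrieval over global descriptors."""
    q = query_feat / np.linalg.norm(query_feat)
    db = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
    return np.argsort(db @ q)[::-1][:k]              # top-k candidate indices

def estimate_pose(pts3d, pts2d, K):
    """Pose from 2D-3D matches (OpenCV PnP with RANSAC)."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d.astype(np.float64), pts2d.astype(np.float64), K, None)
    return (rvec, tvec) if ok else None
```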


2015 ◽  
Vol 15 (1) ◽  
pp. 253-272 ◽  
Author(s):  
M. R. Canagaratna ◽  
J. L. Jimenez ◽  
J. H. Kroll ◽  
Q. Chen ◽  
S. H. Kessler ◽  
...  

Abstract. Elemental compositions of organic aerosol (OA) particles provide useful constraints on OA sources, chemical evolution, and effects. The Aerodyne high-resolution time-of-flight aerosol mass spectrometer (HR-ToF-AMS) is widely used to measure OA elemental composition. This study evaluates AMS measurements of atomic oxygen-to-carbon (O : C), hydrogen-to-carbon (H : C), and organic mass-to-organic carbon (OM : OC) ratios, and of carbon oxidation state (OS_C), for a vastly expanded laboratory data set of multifunctional oxidized OA standards. For the expanded standard data set, the method introduced by Aiken et al. (2008), which uses experimentally measured intensities of all ions to determine elemental ratios (referred to here as "Aiken-Explicit"), reproduces known O : C and H : C ratio values within 20% and 12% (average absolute value of relative errors), respectively. The more commonly used method, which uses empirically estimated H2O+ and CO+ ion intensities to avoid gas-phase air interferences at these ions (referred to here as "Aiken-Ambient"), reproduces O : C and H : C of multifunctional oxidized species within 28% and 14% of known values. The values from the latter method are systematically biased low, however, with larger biases observed for alcohols and simple diacids. A detailed examination of the H2O+, CO+, and CO2+ fragments in the high-resolution mass spectra of the standard compounds indicates that the Aiken-Ambient method underestimates the CO+ and especially the H2O+ produced from many oxidized species. Combined AMS–vacuum ultraviolet (VUV) ionization measurements indicate that these ions are produced by dehydration and decarboxylation on the AMS vaporizer (usually operated at 600 °C). Thermal decomposition is observed to be efficient at vaporizer temperatures down to 200 °C. These results are used together to develop an "Improved-Ambient" elemental analysis method for AMS spectra measured in air. The Improved-Ambient method uses specific ion fragments as markers to correct for molecular functionality-dependent systematic biases, and reproduces known O : C (H : C) ratios of individual oxidized standards within 28% (13%) of the known molecular values. The error in Improved-Ambient O : C (H : C) values is smaller for theoretical standard mixtures of the oxidized organic standards, which are more representative of the complex mix of species present in ambient OA. For ambient OA, the Improved-Ambient method produces O : C (H : C) values that are 27% (11%) larger than previously published Aiken-Ambient values; a corresponding increase of 9% is observed for OM : OC values. These results imply that ambient OA has a higher relative oxygen content than previously estimated. The OS_C values calculated for ambient OA by the two methods agree well, however (average relative difference of 0.06 OS_C units). This indicates that OS_C is a more robust metric of oxidation than O : C, likely because OS_C is not affected by hydration or dehydration, either in the atmosphere or during analysis.
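For reference, the mean carbon oxidation state discussed above is commonly approximated from the two measured elemental ratios; under this approximation, gaining or losing H2O shifts O : C and H : C in compensating ways, which is consistent with the abstract's observation that OS_C is insensitive to hydration and dehydration.

```latex
% Common approximation of mean carbon oxidation state from elemental ratios
\overline{\mathrm{OS}}_{\mathrm{C}} \approx 2\,(\mathrm{O{:}C}) - (\mathrm{H{:}C})
% Losing one H2O from C_nH_mO_p changes O:C by -1/n and H:C by -2/n,
% so the approximated OS_C is unchanged: 2(-1/n) - (-2/n) = 0.
```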


2021 ◽  
Author(s):  
Ahmed Al-Sabaa ◽  
Hany Gamal ◽  
Salaheldin Elkatatny

Abstract. The formation porosity of drilled rock is an important parameter that determines the formation storage capacity. The common industrial technique for rock porosity acquisition is the downhole logging tool. Logging while drilling or wireline porosity logging usually provides a complete porosity log for the section of interest; however, operational constraints on the logging tool might preclude the logging job, in addition to its cost. The objective of this study is to provide an intelligent model that predicts porosity from drilling parameters. An artificial neural network (ANN), a tool of artificial intelligence (AI), was employed in this study to build a porosity prediction model based on the drilling parameters: weight on bit (WOB), drill-string rotating speed (RS), drilling torque (T), standpipe pressure (SPP), and mud pumping rate (Q). The novel contribution of this study is to provide a rock porosity model for complex lithology formations using drilling parameters in real time. The model was built using 2,700 data points from well (A) with a 74:26 training-to-testing ratio. Several sensitivity analyses were performed to optimize the ANN model. The model was validated using an unseen data set (1,000 data points) from well (B), which is located in the same field and drilled across the same complex lithology. The results showed high model performance for the training, testing, and validation processes. The overall accuracy of the model was determined in terms of the correlation coefficient (R) and average absolute percentage error (AAPE). Overall, R was higher than 0.91 and AAPE was less than 6.1% for model building and validation. Predicting rock porosity while drilling in real time will save logging costs and, in addition, will provide a guide for formation storage capacity and interpretation analysis.
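As a hedged sketch of the modeling setup, the snippet below trains a small feed-forward network on five synthetic stand-ins for the drilling parameters (WOB, RS, T, SPP, Q) with the study's 74:26 training-to-testing split, and reports R and AAPE; the architecture and data are illustrative, since the paper's tuned network and well data are not reproduced here.

```python
# Illustrative ANN porosity model: synthetic inputs stand in for
# WOB, RS, T, SPP, Q; the target is a made-up linear mix, not well data.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((2700, 5))                       # stand-ins for the 5 parameters
y = X @ np.array([0.2, 0.1, 0.3, 0.25, 0.15])   # synthetic porosity target

# 74:26 training-to-testing split, as in the study
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.26, random_state=0)
model = MLPRegressor(hidden_layer_sizes=(20, 10), max_iter=2000).fit(X_tr, y_tr)

pred = model.predict(X_te)
r = np.corrcoef(y_te, pred)[0, 1]                   # correlation coefficient R
aape = 100 * np.mean(np.abs((y_te - pred) / y_te))  # avg absolute % error
print(f"R={r:.3f}, AAPE={aape:.2f}%")
```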


2018 ◽  
Vol 11 (2) ◽  
pp. 53-67
Author(s):  
Ajay Kumar ◽  
Shishir Kumar

Several initial center selection algorithms have been proposed in the literature for numerical data, but the values of categorical data are unordered, so these methods are not applicable to categorical data sets. This article investigates the initial center selection process for categorical data and then presents a new support-based initial center selection algorithm. The proposed algorithm measures the weight of the unique data points of an attribute with the help of support, and then integrates these weights along the rows to obtain the support of every row. A data object having the largest support is chosen as the initial center, followed by finding further centers that are at the greatest distance from the initially selected center. The quality of the proposed algorithm is compared with the random initial center selection method, Cao's method, Wu's method, and the method introduced by Khan and Ahmad. Experimental analysis on real data sets shows the effectiveness of the proposed algorithm.
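A minimal, illustrative sketch of the selection procedure the abstract outlines, assuming frequency counts as the support measure and Hamming distance between categorical rows; details such as tie-breaking are guesses rather than the authors' exact algorithm.

```python
# Support-based initial center selection (illustrative): weight each value by
# its frequency, sum weights across a row, take the highest-support row as the
# first center, then pick rows farthest (Hamming) from the chosen centers.
from collections import Counter

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def initial_centers(data, k):
    n_attrs = len(data[0])
    support = [Counter(row[j] for row in data) for j in range(n_attrs)]
    row_support = [sum(support[j][row[j]] for j in range(n_attrs))
                   for row in data]
    centers = [data[max(range(len(data)), key=row_support.__getitem__)]]
    while len(centers) < k:
        # next center: row with the greatest minimum distance to chosen centers
        far = max(data, key=lambda r: min(hamming(r, c) for c in centers))
        centers.append(far)
    return centers

rows = [("a", "x"), ("a", "y"), ("b", "x"), ("a", "x")]
print(initial_centers(rows, 2))
```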


2012 ◽  
Author(s):  
A. Robert Weiß ◽  
Uwe Adomeit ◽  
Philippe Chevalier ◽  
Stéphane Landeau ◽  
Piet Bijl ◽  
...  

2021 ◽  
Vol 50 (1) ◽  
pp. 138-152
Author(s):  
Mujeeb Ur Rehman ◽  
Dost Muhammad Khan

Recently, anomaly detection has attracted growing attention from data mining researchers, and its reputation has risen steadily in practical domains such as product marketing, fraud detection, medical diagnosis, and fault detection. High-dimensional data subjected to outlier detection poses exceptional challenges for data mining experts because of the curse of dimensionality and the increasing resemblance of distant and adjacent points. Traditional algorithms and techniques perform outlier detection over the full feature space. Such conventional methodologies concentrate largely on low-dimensional data and are therefore ineffective at discovering anomalies in data sets with a high number of dimensions. Digging out the anomalies present in a high-dimensional data set becomes very difficult and tiresome when all subsets of projections need to be explored. In high-dimensional data, all data points begin to behave like similar observations because of an intrinsic property of such data: the relative difference between distances to observations approaches zero as the number of dimensions extends toward infinity. This research work proposes a novel technique that explores the deviation among all data points and embeds its findings inside well-established density-based techniques. It is a state-of-the-art technique that opens a new breadth of research toward resolving the inherent problems of high-dimensional data, where outliers reside within clusters of differing densities. A high-dimensional data set from the UCI Machine Learning Repository is chosen to test the proposed technique, and its results are compared with those of density-based techniques to evaluate its efficiency.
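For context, the sketch below shows the standard density-based building block that the proposed technique embeds its deviation measure into, using scikit-learn's Local Outlier Factor on synthetic high-dimensional data; it illustrates the baseline only, not the paper's extension.

```python
# Baseline density-based outlier detection on high-dimensional data:
# Local Outlier Factor flags points whose local density is much lower
# than that of their neighbors.
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 50))          # high-dimensional inliers
X[:5] += 6                              # a few shifted points to flag

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)             # -1 marks detected outliers
print(np.where(labels == -1)[0])
```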


2020 ◽  
Vol 12 (2) ◽  
pp. 869-873
Author(s):  
Jari Pohjola ◽  
Jari Turunen ◽  
Tarmo Lipping

Abstract. Postglacial land uplift is a complex process related to the continental ice retreat that took place about 10 000 years ago, which triggered the viscoelastic response of the Earth's crust as it rebounds back toward its equilibrium state. To empirically model the land uplift process based on the past behaviour of shoreline displacement, data points of known spatial location, elevation, and dating are needed. Such data can be obtained by studying the isolation of lakes and mires from the sea. Archaeological data on human settlements (e.g. human remains, fireplaces) are also very useful, as the settlements were indeed situated on dry land and were often located close to the coast. This information can be used to validate and update the postglacial land uplift model. In this paper, a collection of data underlying empirical land uplift modelling in Fennoscandia is presented. The data set is available at https://doi.org/10.1594/PANGAEA.905352 (Pohjola et al., 2019).

