Lake and mire isolation data set for the estimation of post-glacial land uplift in Fennoscandia

2020 ◽  
Vol 12 (2) ◽  
pp. 869-873
Author(s):  
Jari Pohjola ◽  
Jari Turunen ◽  
Tarmo Lipping

Abstract. Postglacial land uplift is a complex process related to the continental ice retreat that took place about 10 000 years ago, which initiated the viscoelastic rebound of the Earth's crust back towards its equilibrium state. To empirically model the land uplift process based on the past behaviour of shoreline displacement, data points of known spatial location, elevation and dating are needed. Such data can be obtained by studying the isolation of lakes and mires from the sea. Archaeological data on human settlements (e.g. human remains, fireplaces) are also very useful, as the settlements were indeed situated on dry land and were often located close to the coast. This information can be used to validate and update the postglacial land uplift model. In this paper, a collection of data underlying empirical land uplift modelling in Fennoscandia is presented. The data set is available at https://doi.org/10.1594/PANGAEA.905352 (Pohjola et al., 2019).
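As a rough illustration of how dated isolation elevations of this kind can feed an empirical shoreline-displacement model, the following sketch fits a simple exponential uplift curve to hypothetical (age, elevation) points for one site; the functional form and the sample values are assumptions for illustration, not the authors' model or data.

```python
# Minimal sketch: fit an empirical shoreline-displacement curve to dated
# lake/mire isolation points at one site. The exponential form and the
# sample data below are illustrative assumptions only.
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical isolation data: age of isolation (calibrated years BP) and
# present-day elevation of the isolation threshold (m above sea level).
age_bp = np.array([9500.0, 8200.0, 6800.0, 5100.0, 3400.0, 1800.0])
elev_m = np.array([182.0, 148.0, 112.0, 78.0, 46.0, 21.0])

def shoreline_displacement(t, a, tau):
    """Empirical exponential curve: elevation gained since isolation time t."""
    return a * (np.exp(t / tau) - 1.0)

params, _ = curve_fit(shoreline_displacement, age_bp, elev_m, p0=(10.0, 5000.0))
a, tau = params
print(f"fitted amplitude a = {a:.1f} m, time constant tau = {tau:.0f} yr")
```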


Author(s):  
Simona Babiceanu ◽  
Sanhita Lahiri ◽  
Mena Lockwood

This study uses a suite of performance measures, developed to capture various aspects of congestion and reliability, to assess the impacts of safety projects on congestion. Safety projects are necessary to move Virginia's roadways toward safer operation, but they can contribute to congestion and unreliability during execution and can affect operations after execution. However, safety projects are assessed primarily for safety improvements, not for congestion. This study identifies an appropriate suite of measures and quantifies and compares the congestion and reliability impacts of safety projects on roadways before, during, and after project execution. The paper presents the performance measures, examines their sensitivity to operating conditions, defines thresholds for congestion and reliability, and demonstrates the measures using a set of Virginia safety projects. The data set consists of 10 projects totaling 92 mi and more than 1 million data points. The study found that, overall, safety projects tended to have a positive impact on congestion and reliability after completion, and that the congestion variability measures were sensitive to the reliability threshold. The study concludes with practical recommendations for primary measures that may be used to assess the overall impacts of safety projects: percent vehicle miles traveled (VMT) reliable with a customized threshold for Virginia; percent VMT delayed; and time to travel 10 mi. However, caution should be used when applying the results directly to other situations because of the limited number of projects used in the study.
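To make the VMT-weighted measures concrete, the sketch below computes percent-VMT-reliable and percent-VMT-delayed from per-segment records; the field names, the 1.5 and 1.3 thresholds, and the sample values are assumptions for illustration, not the study's calibrated definitions.

```python
# Illustrative computation of VMT-weighted congestion/reliability measures.
# Thresholds and data are placeholders, not the study's exact values.
import numpy as np

# Each row: (vehicle miles traveled, travel time index, planning time index)
segments = np.array([
    # VMT,   TTI,  PTI
    [12000, 1.10, 1.35],
    [ 8000, 1.60, 2.10],
    [15000, 1.05, 1.20],
    [ 9000, 1.30, 1.80],
])
vmt, tti, pti = segments[:, 0], segments[:, 1], segments[:, 2]

RELIABILITY_THRESHOLD = 1.5   # assumed PTI cutoff for "reliable" travel
DELAY_THRESHOLD = 1.3         # assumed TTI cutoff for "delayed" travel

pct_vmt_reliable = 100.0 * vmt[pti <= RELIABILITY_THRESHOLD].sum() / vmt.sum()
pct_vmt_delayed = 100.0 * vmt[tti > DELAY_THRESHOLD].sum() / vmt.sum()
print(f"percent VMT reliable: {pct_vmt_reliable:.1f}%")
print(f"percent VMT delayed:  {pct_vmt_delayed:.1f}%")
```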


Algorithms ◽  
2021 ◽  
Vol 14 (2) ◽  
pp. 37
Author(s):  
Shixun Wang ◽  
Qiang Chen

Boosting of ensemble learning models has made great progress, but most methods boost only a single modality. For this reason, a simple multiclass boosting framework that uses local similarity as the weak learner is extended here to multimodal multiclass boosting. First, with local similarity as the weak learner, the loss function is used to compute a baseline loss, and the logarithmic data points are binarized. The optimal local similarity is then found together with its corresponding loss; whichever of the two losses is smaller becomes the best loss so far. Second, the local similarity between pairs of points is calculated, and the loss is computed from this pairwise similarity. Finally, text and images are retrieved from each other, and the retrieval accuracy is obtained for text and images, respectively. Experimental results on standard data sets show that the multimodal multiclass boosting framework with local similarity as the weak learner compares favourably with other state-of-the-art methods.
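For readers unfamiliar with boosting of weak learners in general, the following sketch shows a standard multiclass AdaBoost loop over simple weak learners in scikit-learn; it is a generic stand-in for context only and does not implement the paper's local-similarity weak learner or its multimodal extension.

```python
# Generic multiclass boosting sketch (not the paper's method): each round fits
# a weak learner (a decision stump by default) on reweighted data and adds it
# to the ensemble.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_classes=3, n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = AdaBoostClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```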


2021 ◽  
Author(s):  
Ahmed Al-Sabaa ◽  
Hany Gamal ◽  
Salaheldin Elkatatny

Abstract. The formation porosity of drilled rock is an important parameter that determines the formation storage capacity. The common industrial technique for acquiring rock porosity is the downhole logging tool. Usually, logging-while-drilling or wireline porosity logging provides a complete porosity log for the section of interest; however, operational constraints may preclude the logging job, in addition to its cost. The objective of this study is to provide an intelligent prediction model that predicts porosity from the drilling parameters. An artificial neural network (ANN), a tool of artificial intelligence (AI), was employed in this study to build the porosity prediction model based on drilling parameters: weight on bit (WOB), drill string rotating speed (RS), drilling torque (T), standpipe pressure (SPP), and mud pumping rate (Q). The novel contribution of this study is to provide a rock porosity model for complex lithology formations using drilling parameters in real time. The model was built using 2,700 data points from Well (A) with a 74:26 training-to-testing ratio. Several sensitivity analyses were performed to optimize the ANN model. The model was validated using an unseen data set (1,000 data points) from Well (B), which is located in the same field and drilled across the same complex lithology. The results showed high model performance for the training, testing, and validation processes. The overall accuracy of the model was determined in terms of the correlation coefficient (R) and the average absolute percentage error (AAPE). Overall, R was higher than 0.91 and AAPE was less than 6.1% for model building and validation. Predicting rock porosity while drilling in real time will save logging costs and, in addition, will provide a guide for the formation storage capacity and interpretation analysis.
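A minimal sketch of this kind of workflow is shown below: a small feed-forward network maps the five drilling parameters to porosity and is scored with R and AAPE. The network architecture, the 74:26 split, and the synthetic stand-in data are assumptions for illustration; the study's actual data and tuned model are not reproduced here.

```python
# Sketch of an ANN porosity-prediction workflow with synthetic placeholder data.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Columns: WOB, RS, T, SPP, Q -- synthetic stand-ins for the drilling parameters.
X = rng.uniform(size=(2700, 5))
porosity = 0.05 + 0.25 * X[:, 0] * X[:, 4] + 0.01 * rng.normal(size=2700)  # toy target

n_train = int(0.74 * len(X))            # 74:26 training-to-testing split, as in the study
scaler = StandardScaler().fit(X[:n_train])
model = MLPRegressor(hidden_layer_sizes=(20, 20), max_iter=2000, random_state=0)
model.fit(scaler.transform(X[:n_train]), porosity[:n_train])

pred = model.predict(scaler.transform(X[n_train:]))
actual = porosity[n_train:]
r = np.corrcoef(actual, pred)[0, 1]                       # correlation coefficient R
aape = 100 * np.mean(np.abs((actual - pred) / actual))    # average absolute percentage error
print(f"R = {r:.3f}, AAPE = {aape:.2f}%")
```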


2018 ◽  
Vol 11 (2) ◽  
pp. 53-67
Author(s):  
Ajay Kumar ◽  
Shishir Kumar

Several initial center selection algorithms have been proposed in the literature for numerical data, but the values of categorical data are unordered, so these methods are not applicable to a categorical data set. This article investigates the initial center selection process for categorical data and then presents a new support-based initial center selection algorithm. The proposed algorithm measures the weight of the unique data points of an attribute with the help of support and then integrates these weights along the rows to obtain the support of every row. Further, the data object having the largest support is chosen as an initial center, followed by finding other centers that are at the greatest distance from the initially selected center. The quality of the proposed algorithm is compared with the random initial center selection method, Cao's method, Wu's method, and the method introduced by Khan and Ahmad. Experimental analysis on real data sets shows the effectiveness of the proposed algorithm.
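The sketch below follows the procedure as described in the abstract: per-attribute value supports are summed along each row, the highest-support row becomes the first center, and further centers are the rows farthest from those already chosen. The Hamming-style distance and the toy data are our assumptions, not necessarily the paper's exact specification.

```python
# Sketch of support-based initial center selection for categorical data.
import numpy as np

def support_based_centers(data, k):
    """data: 2-D array of categorical values; returns k initial center rows."""
    n_rows, n_cols = data.shape
    # Support of a value = its relative frequency within its attribute (column).
    row_support = np.zeros(n_rows)
    for j in range(n_cols):
        values, counts = np.unique(data[:, j], return_counts=True)
        freq = dict(zip(values, counts / n_rows))
        row_support += np.array([freq[v] for v in data[:, j]])

    centers = [int(np.argmax(row_support))]        # first center: row with largest support
    while len(centers) < k:
        # Distance of a row to the chosen centers = fewest mismatching attributes (assumed Hamming).
        dist_to_centers = np.array([
            min((data[i] != data[c]).sum() for c in centers) for i in range(n_rows)
        ])
        centers.append(int(np.argmax(dist_to_centers)))  # farthest row from chosen centers
    return data[centers]

toy = np.array([["a", "x"], ["a", "y"], ["b", "x"], ["c", "z"]])
print(support_based_centers(toy, k=2))
```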


2021 ◽  
Vol 50 (1) ◽  
pp. 138-152
Author(s):  
Mujeeb Ur Rehman ◽  
Dost Muhammad Khan

Recently, anomaly detection has attracted growing interest from data mining scientists, as its reputation has steadily increased in practical domains such as product marketing, fraud detection, medical diagnosis, fault detection, and many other fields. High-dimensional data subjected to outlier detection poses exceptional challenges for data mining experts because of the inherent problems of the curse of dimensionality and the resemblance of distant and adjoining points. Traditional algorithms and techniques have been applied to the full feature space for outlier detection. Customary methodologies concentrate largely on low-dimensional data and hence prove ineffective when discovering anomalies in a data set with a high number of dimensions. Digging out the anomalies present in a high-dimensional data set becomes very difficult and tiresome when all subsets of projections need to be explored. All data points in high-dimensional data behave like similar observations because of an intrinsic property of such data: the contrast between pairwise distances approaches zero as the number of dimensions extends towards infinity. This research work proposes a novel technique that explores the deviation among all data points and embeds its findings inside well-established density-based techniques. It is a state-of-the-art technique, as it opens a new breadth of research towards resolving the inherent problems of high-dimensional data, where outliers reside within clusters having different densities. A high-dimensional data set from the UCI Machine Learning Repository is chosen to test the proposed technique, and its results are compared with those of density-based techniques to evaluate its efficiency.
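The distance-concentration effect invoked above is easy to verify numerically: as the number of dimensions grows, the nearest and farthest neighbours of a point become almost equally far apart. The short sketch below demonstrates this on uniform random data (the data and dimension choices are arbitrary illustrations).

```python
# Numerical illustration of distance concentration in high dimensions.
import numpy as np

rng = np.random.default_rng(0)
for dim in (2, 10, 100, 1000):
    points = rng.uniform(size=(500, dim))
    d = np.linalg.norm(points - points[0], axis=1)[1:]   # distances from one reference point
    contrast = (d.max() - d.min()) / d.min()             # relative contrast between neighbours
    print(f"dim={dim:5d}  relative contrast={contrast:.3f}")
```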


2014 ◽  
Vol 23 (1) ◽  
pp. 75-82 ◽  
Author(s):  
Cagatay Catal

Abstract. Predicting the defect-prone modules when the previous defect labels of modules are limited is a challenging problem encountered in the software industry. Supervised classification approaches cannot build high-performance prediction models with few defect data, leading to the need for new methods, techniques, and tools. One solution is to combine labeled data points with unlabeled data points during the learning phase. Semi-supervised classification methods use not only labeled data points but also unlabeled ones to improve generalization capability. In this study, we evaluated four semi-supervised classification methods for semi-supervised defect prediction. Low-density separation (LDS), support vector machine (SVM), expectation-maximization (EM-SEMI), and class mass normalization (CMN) methods were investigated on the NASA data sets CM1, KC1, KC2, and PC1. Experimental results showed that the SVM and LDS algorithms outperform the CMN and EM-SEMI algorithms. In addition, the LDS algorithm performs much better than SVM when the data set is large. Based on this study, the LDS-based prediction approach is suggested for software defect prediction when fault data are limited.
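To illustrate the general idea of combining labeled and unlabeled modules during learning, the sketch below uses self-training around an SVM base learner on synthetic data; this is a generic stand-in (LDS, EM-SEMI, and CMN are not available in scikit-learn), and the fraction of unlabeled points is an arbitrary assumption.

```python
# Semi-supervised sketch: most labels are hidden, and a self-training SVM
# uses the unlabeled points alongside the labeled ones.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, weights=[0.8, 0.2], random_state=0)
rng = np.random.default_rng(0)
unlabeled = rng.random(len(y)) < 0.9   # pretend 90% of modules have no defect label
y_partial = y.copy()
y_partial[unlabeled] = -1              # -1 marks unlabeled points for scikit-learn

model = SelfTrainingClassifier(SVC(probability=True))
model.fit(X, y_partial)
print("accuracy on the held-back labels:", model.score(X[unlabeled], y[unlabeled]))
```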


2021 ◽  
Vol 87 (6) ◽  
pp. 445-455
Author(s):  
Yi Ma ◽  
Zezhong Zheng ◽  
Yutang Ma ◽  
Mingcang Zhu ◽  
Ran Huang ◽  
...  

Many manifold learning algorithms conduct an eigenvector analysis on a data-similarity matrix of size N × N, where N is the number of data points. Thus, the memory complexity of the analysis is no less than O(N²). We present in this article an incremental manifold learning approach to handle large hyperspectral data sets for land use identification. In our method, the number of dimensions for the high-dimensional hyperspectral-image data set is obtained with the training data set. A local curvature variation algorithm is utilized to sample a subset of data points as landmarks. Then a manifold skeleton is identified based on the landmarks. Our method is validated on three AVIRIS hyperspectral data sets, outperforming the comparison algorithms with a k-nearest-neighbor classifier and achieving the second-best performance with a support vector machine.
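The landmark idea can be sketched as follows: fit the manifold embedding on a small subset of points and then map the remaining points onto it, so the N × N similarity matrix is never formed for the full data set. The sketch uses random landmark sampling and Isomap on toy data; the paper's local-curvature landmark criterion and its incremental update are not reproduced here.

```python
# Landmark-based manifold learning sketch (random landmarks, not the paper's
# curvature-based sampling): embed landmarks, then project the remaining points.
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, _ = make_swiss_roll(n_samples=3000, random_state=0)    # stand-in for hyperspectral pixels
rng = np.random.default_rng(0)
landmarks = rng.choice(len(X), size=300, replace=False)   # small landmark "skeleton"

embedder = Isomap(n_components=2).fit(X[landmarks])       # manifold fitted on landmarks only
X_embedded = embedder.transform(X)                        # remaining points mapped out-of-sample
print(X_embedded.shape)
```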


2021 ◽  
pp. 1-21
Author(s):  
Hany Gamal ◽  
Ahmed Alsaihati ◽  
Salaheldin Elkatatny ◽  
Saleh Haidary ◽  
Abdulazeez Abdulraheem

Abstract. The rock unconfined compressive strength (UCS) is one of the key parameters for geomechanical and reservoir modeling in the petroleum industry. Obtaining the UCS by conventional methods, such as experimental work or empirical correlations from logging data, is time-consuming and costly. To overcome these drawbacks, this paper utilizes artificial intelligence (AI) to predict the rock strength in real time from the drilling parameters using two AI tools. Random forest (RF) based on principal component analysis (PCA) and functional network (FN) techniques were employed to build two UCS prediction models based on drilling data such as the weight on bit (WOB), drill string rotating speed (RS), drilling torque (T), standpipe pressure (SPP), mud pumping rate (Q), and rate of penetration (ROP). The models were built using 2,333 data points from Well (A) with a 70:30 training-to-testing ratio. The models were validated using an unseen data set (1,300 data points) from Well (B), which is located in the same field and drilled across the same complex lithology. The PCA-based RF model outperformed the FN model in terms of the correlation coefficient (R) and the average absolute percentage error (AAPE). The overall accuracy for the PCA-based RF model was an R of 0.99 and an AAPE of 4.3%, while the FN model yielded an R of 0.97 and an AAPE of 8.5%. The validation results showed an R of 0.99 for RF and 0.96 for FN, while the AAPE was 4% and 7.9% for the RF and FN models, respectively. The developed PCA-based RF and FN models provide an accurate UCS estimation in real time from the drilling data, saving time and cost and enhancing well stability by generating a UCS log from the rig drilling data.
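A minimal sketch of the PCA-based random forest side of this workflow is given below, with the same R and AAPE scoring. The number of principal components, the forest size, the 70:30 split, and the synthetic stand-in data are assumptions for illustration; the functional network model is not sketched.

```python
# Sketch of a PCA-based random forest UCS workflow with synthetic placeholder data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
# Columns: WOB, RS, T, SPP, Q, ROP -- synthetic stand-ins for the drilling parameters.
X = rng.uniform(size=(2333, 6))
ucs = 40 + 80 * X[:, 0] - 30 * X[:, 5] + rng.normal(scale=2.0, size=2333)  # toy UCS target

n_train = int(0.7 * len(X))                      # 70:30 training-to-testing split
model = make_pipeline(
    PCA(n_components=4),                         # assumed number of retained components
    RandomForestRegressor(n_estimators=200, random_state=1),
)
model.fit(X[:n_train], ucs[:n_train])

pred, actual = model.predict(X[n_train:]), ucs[n_train:]
r = np.corrcoef(actual, pred)[0, 1]
aape = 100 * np.mean(np.abs((actual - pred) / actual))
print(f"R = {r:.2f}, AAPE = {aape:.1f}%")
```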


Author(s):  
Baoying Wang ◽  
Imad Rahal ◽  
Richard Leipold

Data clustering is a discovery process that partitions a data set into groups (clusters) such that data points within the same group have high similarity while being very dissimilar to points in other groups (Han & Kamber, 2001). The ultimate goal of data clustering is to discover natural groupings in a set of patterns, points, or objects without prior knowledge of any class labels. In fact, in the machine-learning literature, data clustering is typically regarded as a form of unsupervised learning as opposed to supervised learning. In unsupervised learning or clustering, there is no training function as in supervised learning. There are many applications for data clustering including, but not limited to, pattern recognition, data analysis, data compression, image processing, understanding genomic data, and market-basket research.
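A tiny example of the idea described above, using k-means to partition unlabeled points into groups, is shown below; the algorithm choice, the number of clusters, and the toy data are arbitrary illustrations rather than anything prescribed by the chapter.

```python
# Minimal clustering illustration: k-means groups points without class labels.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)  # unlabeled points
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(labels[:10])   # discovered group assignments, no prior labels used
```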

