Journal of Software
Latest Publications


TOTAL DOCUMENTS

2477
(FIVE YEARS 96)

H-INDEX

22
(FIVE YEARS 2)

Published by International Academy Publishing (IAP)

ISSN: 1796-217X

2022 ◽  
pp. 1-20
Author(s):  
Salim Moudache ◽  
Mourad Badri

This work investigates, from different perspectives, the potential of a risk model to support Cross-Version Fault and Severity Prediction (CVFSP) in object-oriented software. The risk of a class is addressed from the perspective of two particular factors: the number of faults it can contain and their severity. We used various object-oriented metrics to capture the two risk factors. The risk of a class is modeled using the concept of Euclidean distance. We used a dataset collected from five successive versions of an open-source Java software system (ANT). We investigated different variants of the considered risk model, based on various combinations of object-oriented metric pairs. We used different machine learning algorithms for building the prediction models: Naive Bayes (NB), J48, Random Forest (RF), Support Vector Machines (SVM), and Multilayer Perceptron (ANN). We investigated the effectiveness of the prediction models for CVFSP using data from prior versions of the considered system. We also investigated whether the considered risk model can output the Empirical Risk (ER) of a class, a continuous value accounting for both the number of faults and their different levels of severity. We used different techniques for building these prediction models: Linear Regression (LR), Gaussian Process (GP), Random Forest (RF) and M5P (two decision-tree algorithms), SMOreg, and Artificial Neural Network (ANN). The considered risk model achieves acceptable results for both cross-version binary fault prediction (a g-mean of 0.714, an AUC of 0.725) and cross-version multi-classification of severity levels (a g-mean of 0.758, an AUC of 0.771). The model also achieves good results in the estimation of the empirical risk of a class, considering both the number of faults and their severity levels (intra-version analysis with a correlation coefficient of 0.659, cross-version analysis with a correlation coefficient of 0.486).
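The abstract does not spell out the Euclidean-distance risk formula, but a minimal reading of it might look like the following sketch, where `metric_a` and `metric_b` are a class's values for the chosen object-oriented metrics pair and `max_a`/`max_b` are their maxima over the system (the normalization step is an assumption, not a detail from the paper):

```python
import math

def class_risk(metric_a, metric_b, max_a, max_b):
    # Normalize each metric by its maximum over all classes, then take
    # the Euclidean distance of the resulting pair from the origin:
    # a class scoring high on both risk-factor metrics lies farthest out.
    na = metric_a / max_a if max_a else 0.0
    nb = metric_b / max_b if max_b else 0.0
    return math.sqrt(na * na + nb * nb)
```

Under this reading, a class with both metrics at their system-wide maxima has risk sqrt(2), and a class with both at zero has risk 0.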


2022 ◽  
pp. 21-28
Author(s):  
Dijana Oreški ◽  

The ability to generate data has never been greater than today, when three quintillion bytes of data are generated daily. In the field of machine learning, a large number of algorithms have been developed that can be used for intelligent data analysis and for solving predictive and descriptive problems in different domains. These algorithms behave differently on different problems: if one algorithm works better on one dataset, the same algorithm may work worse on another, because each dataset has different local and global characteristics. It is therefore imperative to know the intrinsic behavior of algorithms on different types of datasets and to choose the right algorithm for the problem at hand. To address this problem, this paper makes a scientific contribution to the meta-learning field by proposing a framework for identifying the specific characteristics of datasets in two domains of the social sciences, education and business, and develops meta models based on: ranking algorithms, calculating rank correlations, developing a multi-criteria model, a two-component index, and prediction based on machine learning algorithms. Each of the meta models serves as the basis for the development of a version of an intelligent system. Application of such a framework should include a comparative analysis of a large number of machine learning algorithms on a large number of datasets from the social sciences.
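One building block named above, calculating the correlation of algorithm ranks across datasets, can be sketched with a tie-free Spearman rank correlation; the abstract does not say which correlation measure the framework uses, so this choice is an assumption:

```python
def rank(scores):
    # Rank positions 1..n, best (highest) score first; ties not handled.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

def spearman(scores_a, scores_b):
    # Spearman rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)),
    # where d is the per-algorithm rank difference between two datasets.
    ra, rb = rank(scores_a), rank(scores_b)
    n = len(ra)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n * n - 1))
```

Identical algorithm rankings on two datasets give rho = 1.0; fully reversed rankings give rho = -1.0.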


2022 ◽  
pp. 29-35
Author(s):  
Jianping Du ◽  

With the development of the Internet, the electronic resume has gradually replaced the paper one. A basic recruitment requirement for enterprises is to retrieve, quickly and without omission, the talent information that fulfills a given requirement. Based on the Spring Boot framework and the Lucene full-text search engine, this paper implements an intelligent resume filtering algorithm, which improves the query speed of the system by establishing an index database. At the same time, the scoring function improves the accuracy of the filtering results, reduces the pressure of high concurrency on the database, improves the work efficiency of the Human Resources Department, and avoids talent loss.
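The index-database idea can be illustrated with a toy inverted index; the real system delegates this to Lucene, and the scoring here is a plain term-match count rather than Lucene's relevance formula, so everything below is a simplified sketch:

```python
from collections import defaultdict

def build_index(resumes):
    # resumes: dict mapping resume id -> raw text.
    # Returns an inverted index: term -> set of resume ids containing it.
    index = defaultdict(set)
    for rid, text in resumes.items():
        for term in text.lower().split():
            index[term].add(rid)
    return index

def search(index, query):
    # Score each resume by how many query terms it contains,
    # then return the matching ids best-first.
    scores = defaultdict(int)
    for term in query.lower().split():
        for rid in index.get(term, ()):
            scores[rid] += 1
    return sorted(scores, key=scores.get, reverse=True)
```

A query touches only the index, never the full resume texts, which is what makes retrieval fast and keeps load off the primary database.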


2021 ◽  
pp. 267-284
Author(s):  
Ye-In Chang ◽  
Cheng-An Fu ◽  
Jia-Zhen Que

Periodic pattern mining in time series databases plays an important part in data mining. However, most existing algorithms consider only the count of each item, not its value. To take the value of each item into account for periodic pattern mining in time series databases, Chanda et al. proposed an algorithm called WPPM. Their algorithm first constructs a suffix trie to store the candidate patterns; however, the suffix trie uses too much storage space. To decrease the processing time for constructing the data structure, in this paper we propose two data structures to store the candidates. The first is the Weighted Paired Matrix: after scanning the database, we transform it into this matrix form, which is then used to build the second data structure, the Weighted Direction Graph. Our algorithm therefore decreases not only memory usage but also processing time, because we do not need to construct so many nodes and edges. Moreover, we also consider the case of incremental mining as the data length grows. The performance study shows that our proposed algorithm based on the Weighted Direction Graph is more efficient than the WPPM algorithm.
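The abstract does not define its weighted measure precisely; as a hedged illustration of the general idea, the weighted periodic support of a single item at a fixed period might be computed as below (the function name and the exact measure are assumptions for illustration, not the paper's definitions):

```python
def weighted_periodic_support(sequence, item, period, weight):
    # Check the item at every period-spaced position in the series and
    # scale the hit ratio by the item's weight (illustrative measure only).
    positions = list(range(0, len(sequence), period))
    hits = sum(1 for p in positions if sequence[p] == item)
    return weight * hits / len(positions)
```

For example, an item appearing at every second position with weight 0.5 gets support 0.5 at period 2.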


2021 ◽  
pp. 306-314
Author(s):  
Liangliang Shi ◽  
Xia Wang ◽  
Yongliang Shen

In order to improve the accuracy and speed of 3D face recognition, this paper proposes an improved MB-LBP 3D face recognition method. First, the MB-LBP algorithm is used to extract features from the 3D face depth image; then an average-information-entropy algorithm is used to extract the effective feature information of the image; finally, a Support Vector Machine is used to classify the extracted information. The recognition rate on the Texas 3DFRD database is 96.88%, with a recognition time of 0.025 s. The recognition rate on a self-made depth library is 96.36%, with a recognition time of 0.02 s. The experimental results show that the proposed algorithm performs better in terms of both accuracy and speed.
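For reference, the plain LBP code of a single 3x3 neighborhood can be computed as below; MB-LBP differs in that it first averages pixel blocks and compares block means rather than single pixels, so this sketch shows standard LBP, not the paper's exact variant:

```python
def lbp_code(patch):
    # patch: 3x3 list of lists of pixel intensities.
    # Compare the 8 neighbors (clockwise from top-left) with the center;
    # each neighbor >= center contributes one bit to the 8-bit code.
    center = patch[1][1]
    neighbors = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                 patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for bit, val in enumerate(neighbors):
        if val >= center:
            code |= 1 << bit
    return code
```

A histogram of these codes over image regions is the usual LBP feature vector fed to a classifier such as an SVM.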


2021 ◽  
pp. 259-266
Author(s):  
Bojing Feng ◽  
Wenfang Xue

Corporate credit rating is an analysis of the credit risks within a corporation, and it plays a vital role in the management of financial risk. Traditionally, the rating assessment process, based on the historical profile of the corporation, is expensive and complicated and often takes months. Therefore, most corporations, lacking money and time, cannot obtain their own credit level. However, we believe that although these corporations have no credit rating levels (unlabeled data), this big data contains useful knowledge for improving the credit system. The major challenge of this work lies in how to effectively learn from the unlabeled data and use it to improve the performance of the credit rating system. Specifically, we consider the problem of adversarial semi-supervised learning (ASSL) for corporate credit rating, which has rarely been researched before. A novel framework, adversarial semi-supervised learning for corporate credit rating (ASSL4CCR), consisting of two phases, is proposed to address these problems. In the first phase, we train a normal rating system via a machine-learning algorithm to give unlabeled data pseudo rating levels. In the second phase, adversarial semi-supervised learning is applied, uniting labeled data and pseudo-labeled data to build the final model. To demonstrate the effectiveness of the proposed ASSL4CCR, we conduct extensive experiments on the Chinese public-listed corporate rating dataset, which show that ASSL4CCR consistently outperforms the state-of-the-art methods.
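Phase one, giving unlabeled corporations pseudo rating levels, can be sketched as a confidence-thresholded pseudo-labeling step; the threshold and the `predict_proba` interface are assumptions for illustration, not details from the paper:

```python
def pseudo_label(predict_proba, unlabeled, threshold=0.9):
    # Keep only the examples whose most probable rating clears the
    # confidence threshold, pairing each with its pseudo label.
    kept = []
    for x in unlabeled:
        probs = predict_proba(x)  # dict: rating level -> probability
        label = max(probs, key=probs.get)
        if probs[label] >= threshold:
            kept.append((x, label))
    return kept
```

The retained (example, pseudo label) pairs are then merged with the genuinely labeled data for the second, adversarial training phase.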


2021 ◽  
pp. 315-332
Author(s):  
Hui Cao ◽  

Based on fuzzy mathematics and set-similarity theory, an intelligent collaboration assessment method for an engine room simulator was studied. First, an integrated weighting method using both subjective and objective information was designed to obtain the weight vector; second, the fuzzy comprehensive evaluation method was used to calculate the completion degree of team collaboration, and the Dice coefficient and the Tversky coefficient were adopted to quantify the sequence factor, interactivity factor, redundancy factor, and unauthorized factor of team collaboration effectiveness; third, a comprehensive calculation combined the completion degree and the four factors to obtain the team collaboration assessment result; finally, the influence of the collaboration factors on the assessment result was analyzed with an example, and it was found that even if a team achieves a high task completion degree, the score may still be low because of these factors. The research shows that the collaborative performance of a team can greatly influence the final assessment result, and that quantitative analysis of team collaboration reveals this impact more objectively. Adding the influence of team cooperation factors to the traditional individual evaluation is an effective method.
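The two set-similarity coefficients named above are standard; over sets of performed collaboration actions they can be computed as follows (treating the action logs as sets is an assumption made here for illustration):

```python
def dice(a, b):
    # Dice coefficient: 2|A∩B| / (|A| + |B|).
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

def tversky(a, b, alpha=0.5, beta=0.5):
    # Tversky index: |A∩B| / (|A∩B| + alpha*|A-B| + beta*|B-A|).
    # With alpha = beta = 0.5 it reduces to the Dice coefficient.
    a, b = set(a), set(b)
    inter = len(a & b)
    return inter / (inter + alpha * len(a - b) + beta * len(b - a))
```

The Tversky index's asymmetric weights let missing actions and extra (e.g., unauthorized) actions be penalized differently, which suits the factor quantification described above.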


2021 ◽  
pp. 285-305
Author(s):  
Mourad Chabane Oussalah ◽  
Romain Brohan ◽  
Ossama Moustafa

Today, the energy consumption of computers represents a significant part of overall consumption. The purpose of this article is to apply object and architectural metrics to observe their impact on application consumption. This article focuses on the most common object applications to date, and on their architectures, which are already used to optimize the reusability, composability, or dynamicity of these applications. To do this, consumption must be evaluated and compared as the object and architectural metrics vary. These observations help determine how effective these metrics could be.


2021 ◽  
pp. 219-234
Author(s):  
Wen Yin ◽  
Chenchen Pan ◽  
Nanyi Deng ◽  
Dong Ji

The COVID-19 pandemic has had a significant negative impact on countries around the world, and there appears to be an observable difference in severity among nations. This study aims to provide insight into the roles that many social and economic factors played in contributing to this variation. By investigating potential patterns through exploratory data analysis, followed by constructing models using several popular machine learning techniques, we examine the validity of the underlying assumptions and identify potential limitations. Total deaths per million population is used as the dependent variable, with a log transformation to handle outliers. A set of factors such as life expectancy, unemployment rate, and population is available in the dataset. After removing and transforming outliers, various machine learning methods with cross-validation are implemented, and the optimal model is determined by predefined metrics: root mean squared error (RMSE) and mean absolute error (MAE). The results show that the Gradient Boosting Machine (GBM) technique achieves the best results in terms of minimum RMSE and MAE. The RMSE and MAE values indicate no overfitting issues, and the GBM algorithm captures the most influential factors, such as life expectancy, healthcare expense per Gross Domestic Product (GDP), and GDP per capita, which are clearly critical explanatory variables for predicting total deaths per million population.
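The two model-selection metrics used, RMSE and MAE, are standard and can be computed directly from the residuals:

```python
import math

def rmse(y_true, y_pred):
    # Root mean squared error: penalizes large errors quadratically.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    # Mean absolute error: average magnitude of the residuals.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

Because RMSE weights large residuals more heavily than MAE does, reporting both (as the study does) guards against a model that is accurate on average but occasionally far off.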


2021 ◽  
pp. 200-207
Author(s):  
Zhu Ping ◽  

Natural language semantic engineering problems face unknown inputs and knowledge-intensive challenges. In order to adapt to the features of natural language semantic engineering, the AI programming language needs to be extended mathematically: 1) using many ways to improve the spatial distribution and coverage of instances; 2) keeping different abstract function versions running at the same time; 3) providing a large number of knowledge configuration files and supporting functions to deal with knowledge-intensive problems; 4) using highest-possibility-priority calls to solve the problem of traversing multiple running branches. This paper introduces the unknown-oriented programming ideas, basic strategy formulation, language design, and simulated running examples. It provides a new method for the incremental research and development of large-scale natural language semantic engineering applications. Finally, the paper summarizes the work and puts forward further research directions.

