Use case repository framework based on machine learning algorithm to analyze the software development estimation with intelligent information systems

Author(s):  
R. Lalitha ◽  
B. Latha ◽  
G. Sumathi

The success of an information system depends on the accuracy of software estimation. Estimation is done in the initial phase of software development and requires collecting all the information relevant to estimating software effort. In this paper, a methodology is proposed to maintain a knowledgeable use case repository that stores the use cases of various projects across several software project-related domains. The repository acts as a reference model for comparing similar use cases from similar types of projects. Use case points are calculated, and from these, the schedule and effort estimates of a project are derived using the formulas of software engineering. These values are compared with the estimated effort and schedule of a new project under development. In addition, a neural network, an effective machine learning technique, is used to measure how accurately the information is processed by the use case repository framework. The proposed machine learning-based use case repository system helps estimate and analyze effort using machine learning algorithms.
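The "formulas of software engineering" step described above can be sketched as follows: converting a Use Case Point (UCP) count into effort and schedule estimates. The productivity factor of 20 person-hours per UCP and the 160-hour person-month are common textbook defaults, not values taken from the paper.

```python
# Sketch: effort and schedule estimates from a Use Case Point (UCP) count.
# The 20 h/UCP productivity factor and 160 h/month are textbook defaults.

def effort_person_hours(ucp, hours_per_ucp=20):
    """Effort estimate: UCP times a historical productivity factor."""
    return ucp * hours_per_ucp

def schedule_months(effort_hours, team_hours_per_month=160):
    """Rough schedule estimate for a single full-time developer."""
    return effort_hours / team_hours_per_month

effort = effort_person_hours(80)        # 80 UCP -> 1600 person-hours
print(effort, schedule_months(effort))  # 1600 person-hours, 10.0 months
```

A repository of such estimates for completed projects is what lets a new project's figures be compared against similar historical use cases.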

Effort estimation is a crucial step that leads to duration estimation and cost estimation in software development. Estimates made in the initial stage of a project are based on requirements and may determine the project's success or failure: accurate estimates lead to success, and inaccurate estimates lead to failure. No single method can produce accurate estimates in all cases. In this work, we apply the machine learning techniques of linear regression and k-nearest neighbors to predict software effort using the COCOMO81, COCOMONasa, and COCOMONasa2 datasets, and compare the results obtained from the two methods. In each dataset, 80% of the data is used for training and the remainder as the test set. The correlation coefficient, mean squared error (MSE), and mean magnitude of relative error (MMRE) are used as performance metrics. The experimental results show that these models forecast software effort accurately.
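The comparison described above can be sketched with scikit-learn: linear regression versus k-nearest neighbors on an 80/20 train/test split, scored with MSE and MMRE. The data here is synthetic stand-in cost-driver data, not the COCOMO datasets used in the paper.

```python
# Sketch: linear regression vs. k-NN for effort prediction, 80/20 split.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(1, 100, size=(60, 3))                      # stand-in cost drivers
y = 2.5 * X[:, 0] + 0.4 * X[:, 1] + rng.normal(0, 5, 60)   # stand-in effort

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

def mmre(actual, predicted):
    """Mean Magnitude of Relative Error."""
    return np.mean(np.abs(actual - predicted) / np.abs(actual))

for model in (LinearRegression(), KNeighborsRegressor(n_neighbors=3)):
    pred = model.fit(X_tr, y_tr).predict(X_te)
    print(type(model).__name__,
          "MSE:", round(mean_squared_error(y_te, pred), 2),
          "MMRE:", round(mmre(y_te, pred), 3))
```

MMRE is not built into scikit-learn, so it is defined by hand from its standard formula.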


2021 ◽  
Author(s):  
Bhuvaneswari Sankaranarayanan ◽  
Aria Abubakar ◽  
David F. Allen ◽  
Ivan Diaz Granados

Log interpretation is the task of analyzing and processing well logs to generate the subsurface properties around wells. A direct application of machine learning (ML) to this task is to train an ML model to predict properties in target wells, given well logs (data) and properties (labels) in a set of training wells in the same field and/or region. Our ML model of choice for predicting the desired properties is the decision tree-based learning algorithm called random forests (RF). We also devise a mechanism to automatically tune the hyperparameters of this algorithm depending on the data in the training wells. This eliminates the tedious task of carefully tuning the hyperparameters for every new set of training wells and provides a one-click solution. In addition to predicting the properties, we compute the uncertainty in the predicted properties in the form of prediction intervals, using the concept of quantile regression forests (QRF). We test our workflow on two use cases. First, we consider a petrophysics use case on an unconventional land dataset to predict petrophysical properties such as water saturation, total porosity, volume of clay, and total organic carbon from petrophysics logs. Then, we consider a geomechanics use case on a conventional offshore dataset to predict the lithology, pore pressure, and rock mechanical properties. We obtain good prediction performance on both use cases. The uncertainty estimates also complement the ML model's property predictions by explaining the correlations that exist among them based on domain knowledge. The entire workflow of automating the hyperparameter tuning and training the ML model to predict properties, along with its uncertainty estimates, provides a complete solution for applying ML to automated log interpretation.
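A rough sketch of prediction intervals from a random forest, in the spirit of quantile regression forests: here per-tree predictions from scikit-learn's `RandomForestRegressor` are collected and empirical quantiles read off, which is a simplification of the full QRF weighting scheme. The data is synthetic stand-in log data.

```python
# Sketch: point prediction plus a 90% interval from per-tree RF predictions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(200, 4))                        # stand-in log measurements
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.05, 200)      # stand-in property

rf = RandomForestRegressor(n_estimators=200, random_state=1).fit(X, y)

X_new = rng.uniform(0, 1, size=(5, 4))
per_tree = np.stack([t.predict(X_new) for t in rf.estimators_])  # (trees, n)
lo, hi = np.quantile(per_tree, [0.05, 0.95], axis=0)             # 90% interval
point = rf.predict(X_new)                                        # forest mean
print(np.round(lo, 3), np.round(point, 3), np.round(hi, 3))
```

A full QRF keeps the training targets in each leaf and weights them at prediction time; the per-tree shortcut above is only a cheap approximation of that idea.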


2011 ◽  
Vol 7 (3) ◽  
pp. 41-53 ◽  
Author(s):  
Jeremiah D. Deng ◽  
Martin Purvis ◽  
Maryam Purvis

Software development effort estimation is important for quality management in the software development industry, yet its automation still remains a challenging issue. Applying machine learning algorithms alone often cannot achieve satisfactory results. This paper presents an integrated data mining framework that incorporates domain knowledge into a series of data analysis and modeling processes, including visualization, feature selection, and model validation. An empirical study on the software effort estimation problem using a benchmark dataset shows the necessity and effectiveness of the proposed approach.
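The pipeline described above, feature selection followed by model validation, can be sketched with scikit-learn. The dataset and estimator choices here are illustrative, not those used in the paper, and the domain-knowledge and visualization steps are not reproduced.

```python
# Sketch: feature selection + cross-validated model evaluation in one pipeline.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 10))                        # 10 candidate features
y = 3 * X[:, 0] + X[:, 1] + rng.normal(0, 0.1, 50)   # only 2 are informative

pipe = Pipeline([
    ("select", SelectKBest(f_regression, k=2)),  # keep the 2 best features
    ("model", LinearRegression()),
])
scores = cross_val_score(pipe, X, y, cv=5)       # model validation step
print(round(scores.mean(), 3))
```

Putting selection inside the pipeline matters: it is re-fit on each training fold, so the validation scores are not biased by selecting features on the full dataset.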


2020 ◽  
Vol 10 (9) ◽  
pp. 3044 ◽  
Author(s):  
Bo Kyung Park ◽  
R. Young Chul Kim

When requirement specifications do not clearly describe what will satisfy the customer's needs, it can be difficult to develop high-quality software systems. A persistent issue in requirements engineering is how to clearly understand the requirements of a large and complex software project, and how to analyze them exactly. To solve this problem, we propose a linguistic analysis method based on the semantic analysis of Fillmore's textual approach. This method extracts use cases from informal requirement specifications. To apply requirements engineering with this method, we suggest extracting a use-case diagram as well as calculating the software effort estimate with the original use-case point (UCP) method. To simplify the explanation of our use-case extraction method, we use the example of a simple postal information system.
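The original UCP calculation mentioned above can be sketched as follows, using Karner's standard actor and use-case weights. The counts for the postal-system example are made up for illustration.

```python
# Sketch: original Use Case Point (UCP) formula with Karner's weights.
def ucp(actors, use_cases, tcf=1.0, ef=1.0):
    """actors/use_cases: dicts of 'simple'/'average'/'complex' counts.
    tcf: technical complexity factor; ef: environmental factor."""
    uaw = 1 * actors["simple"] + 2 * actors["average"] + 3 * actors["complex"]
    uucw = (5 * use_cases["simple"] + 10 * use_cases["average"]
            + 15 * use_cases["complex"])
    return (uaw + uucw) * tcf * ef

# Hypothetical counts for a small postal information system.
print(ucp({"simple": 1, "average": 1, "complex": 1},
          {"simple": 2, "average": 2, "complex": 1}))  # (6 + 45) * 1 * 1 = 51
```

With the factors left at 1.0 the result is the unadjusted count; in practice TCF and EF are computed from rated technical and environmental questions.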


2020 ◽  
pp. 1-11
Author(s):  
Jie Liu ◽  
Lin Lin ◽  
Xiufang Liang

The online English teaching system places certain requirements on the intelligent scoring system, and the most difficult stage of intelligent scoring in English tests is scoring English compositions with an intelligent model. To improve the intelligence of English composition scoring, this study builds on machine learning algorithms, combines them with intelligent image recognition technology, and proposes an improved MSER-based character candidate region extraction algorithm and a convolutional neural network-based pseudo-character region filtering algorithm. In addition, to verify that the proposed model meets the requirements of composition text, that is, to verify the feasibility of the algorithm, the performance of the proposed model is analyzed through designed experiments. Moreover, the basic conditions for composition scoring are input into the model as constraints. The results show that the proposed algorithm has practical effect and can be applied to English assessment systems and online homework evaluation systems.


Electronics ◽  
2021 ◽  
Vol 10 (5) ◽  
pp. 592
Author(s):  
Radek Silhavy ◽  
Petr Silhavy ◽  
Zdenka Prokopova

Software size estimation is a nontrivial task, based on data analysis or on an algorithmic estimation approach, and it is important for software project planning and management. In this paper, a new method called Actors and Use Cases Size Estimation is proposed. The new method is based on the number of actors and use cases only. It uses stepwise regression and leads to a very significant reduction in errors when estimating the size of software systems, compared to Use Case Points-based methods. The proposed method is independent of Use Case Points, which eliminates the effect of inaccurately determined Use Case Points components, because such components are not used in the proposed method.
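The idea above, regressing software size on actor and use-case counts alone, can be sketched with ordinary least squares. The project data here is synthetic, and the paper uses stepwise regression fitted on historical projects rather than the plain fit shown.

```python
# Sketch: size regressed on (actor count, use-case count) only.
import numpy as np
from sklearn.linear_model import LinearRegression

# Columns: number of actors, number of use cases; target: delivered size.
X = np.array([[3, 10], [5, 22], [2, 8], [7, 30], [4, 15], [6, 25]])
size = np.array([120, 260, 95, 360, 180, 300])   # e.g. in function points

model = LinearRegression().fit(X, size)
print(model.predict([[4, 18]]))   # size estimate for a new project
```

Because only two raw counts enter the model, none of the subjectively rated UCP components (actor/use-case weights, TCF, ECF) can distort the estimate.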


Electronics ◽  
2021 ◽  
Vol 10 (10) ◽  
pp. 1195
Author(s):  
Priya Varshini A G ◽  
Anitha Kumari K ◽  
Vijayakumar Varadarajan

Software project estimation is a challenging and important activity in developing software projects. It includes software time estimation, software resource estimation, software cost estimation, and software effort estimation. Software effort estimation focuses on predicting the number of hours of work (effort in terms of person-hours or person-months) required to develop or maintain a software application, and it is difficult to forecast effort during the initial stages of software development. Various machine learning and deep learning models have been developed for effort estimation. In this paper, single-model approaches and ensemble approaches were considered for estimation. Ensemble techniques combine several single models; those considered here were averaging, weighted averaging, bagging, boosting, and stacking. The stacking models considered and evaluated were stacking using a generalized linear model, stacking using a decision tree, stacking using a support vector machine, and stacking using a random forest. The datasets considered were Albrecht, China, Desharnais, Kemerer, Kitchenham, Maxwell, and Cocomo81, and the evaluation measures used were mean absolute error, root mean squared error, and R-squared. The results showed that the proposed stacking using random forest provides the best results compared with single-model approaches using machine or deep learning algorithms and with the other ensemble techniques.
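The best-performing variant described above, stacking with a random forest as the meta-learner, can be sketched with scikit-learn's `StackingRegressor`. The base learners and the synthetic project data are illustrative, not the paper's exact configuration.

```python
# Sketch: stacking ensemble with a random-forest meta-learner.
import numpy as np
from sklearn.ensemble import StackingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
X = rng.uniform(0, 10, size=(80, 4))                 # stand-in project features
y = 4 * X[:, 0] + X[:, 1] + rng.normal(0, 0.5, 80)   # stand-in effort

stack = StackingRegressor(
    estimators=[("lr", LinearRegression()),
                ("dt", DecisionTreeRegressor(random_state=3)),
                ("svm", SVR())],
    final_estimator=RandomForestRegressor(random_state=3),  # meta-learner
    cv=5,  # base predictions for the meta-learner come from cross-validation
)
stack.fit(X, y)
print(round(stack.score(X, y), 3))   # in-sample R-squared
```

The `cv` argument is what makes this stacking rather than simple blending: the meta-learner is trained on out-of-fold base predictions, not on the bases' training-set outputs.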


2021 ◽  
pp. 1-17
Author(s):  
Ahmed Al-Tarawneh ◽  
Ja’afer Al-Saraireh

Twitter is one of the most popular platforms used to share and post ideas. Hackers and anonymous attackers use these platforms maliciously, and their behavior can be used to predict the risk of future attacks by gathering and classifying hackers' tweets with machine-learning techniques. Previous approaches for detecting infected tweets are based on human effort or text analysis, so they are limited in capturing the hidden text between tweet lines. The main aim of this research is to enhance the efficiency of hacker detection on the Twitter platform using the complex networks technique with adapted machine learning algorithms. This work presents a methodology that collects a list of users, together with their followers, who share posts with similar interests from a hackers' community on Twitter. The list is built from a set of suggested keywords that are terms commonly used by hackers in their tweets. A complex network is then generated over all users to find relations among them in terms of network centrality, closeness, and betweenness. After extracting these values, a dataset of the most influential users in the hacker community is assembled. Subsequently, tweets belonging to users in the extracted dataset are gathered and classified into positive and negative classes. The output of this process is fed into a machine learning stage that applies different algorithms. This research builds and investigates an accurate dataset containing real users who belong to a hackers' community. Correctly classified instances were measured for accuracy using the average values of k-nearest neighbor, naive Bayes, random tree, and support vector machine techniques, demonstrating about 90% and 88% accuracy for cross-validation and percentage split, respectively. Consequently, the proposed network cyber Twitter model is able to detect hackers and determine whether tweets pose a risk to institutions and individuals, providing early warning of possible attacks.
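The graph stage described above can be sketched with networkx: build a follower network and rank users by centrality measures. The toy edge list and account names are illustrative; in the paper the resulting influential-user set feeds the tweet-classification stage.

```python
# Sketch: follower graph + centrality ranking for influential-user selection.
import networkx as nx

# Directed "follows" edges among a handful of hypothetical accounts.
edges = [("a", "b"), ("c", "b"), ("d", "b"), ("b", "e"), ("c", "e")]
G = nx.DiGraph(edges)

scores = {
    "degree": nx.degree_centrality(G),
    "closeness": nx.closeness_centrality(G),
    "betweenness": nx.betweenness_centrality(G),
}
# Most-followed account, i.e. highest in-degree.
top = max(G.nodes, key=lambda n: G.in_degree(n))
print(top)  # "b"
```

Tweets from the top-ranked accounts would then be collected, labeled positive/negative, and passed to the classifiers listed in the abstract.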


Sensors ◽  
2021 ◽  
Vol 21 (2) ◽  
pp. 656
Author(s):  
Xavier Larriva-Novo ◽  
Víctor A. Villagrá ◽  
Mario Vega-Barbas ◽  
Diego Rivera ◽  
Mario Sanz Rodrigo

Security in IoT networks is currently mandatory, due to the high amount of data that has to be handled. These systems are vulnerable to several cybersecurity attacks, which are increasing in number and sophistication. For this reason, new intrusion detection techniques have to be developed that are as accurate as possible for these scenarios. Intrusion detection systems based on machine learning algorithms have already shown high performance in terms of accuracy. This research proposes the study and evaluation of several preprocessing techniques based on traffic categorization for a machine learning neural network algorithm. The evaluation uses two benchmark datasets, UGR16 and UNSW-NB15, and one of the most widely used datasets, KDD99. The preprocessing techniques were evaluated with scaling and normalization functions. All of these preprocessing models were applied to different sets of characteristics based on a categorization composed of four groups of features: basic connection features, content characteristics, statistical characteristics, and finally a group composed of traffic-based features and connection direction-based traffic characteristics. The objective of this research is to evaluate this categorization by using various data preprocessing techniques to obtain the most accurate model. Our proposal shows that, by applying the categorization of network traffic and several preprocessing techniques, accuracy can be enhanced by up to 45%. Preprocessing a specific group of characteristics allows for greater accuracy, enabling the machine learning algorithm to correctly classify parameters related to possible attacks.
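The scaling/normalization comparison described above can be sketched with scikit-learn's standard preprocessors. The dataset is synthetic and stands in for one group of the paper's feature categorization.

```python
# Sketch: comparing scaling/normalization preprocessors on traffic features.
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler, Normalizer

rng = np.random.default_rng(4)
X = rng.uniform(0, 1000, size=(100, 6))   # stand-in traffic features

for prep in (StandardScaler(), MinMaxScaler(), Normalizer()):
    Xp = prep.fit_transform(X)
    # Show each preprocessor's output range to see how it reshapes the data.
    print(type(prep).__name__, round(Xp.min(), 2), round(Xp.max(), 2))
```

In an intrusion-detection pipeline, each preprocessed variant would be fed to the same neural network and the resulting accuracies compared per feature group.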

