Author Reputation Measurement on Question and Answer Sites by the Classification of Author-Generated Content

In the field of software engineering, practitioners’ share in the constructed knowledge should not be underestimated and is mostly in the form of grey literature (GL). GL is a valuable resource, though it is subjective and lacks an objective quality assurance methodology. In this paper, a quality assessment scheme is proposed for question and answer (Q&A) sites. In particular, we target Stack Overflow (SO) and Stack Exchange (SE) sites. We model the problem of author reputation measurement as a classification task on the author-provided answers. The authors’ mean, median, and total answer scores are used as inputs for class labeling. State-of-the-art language models (BERT and DistilBERT) with a softmax layer on top are utilized as classifiers and compared to SVM and random baselines. Our best model achieves [Formula: see text] accuracy in binary classification on the SO design patterns tag and [Formula: see text] accuracy on the SE software engineering category. Superior performance in SE software engineering can be explained by its larger dataset size. In addition to quantitative evaluation, we provide qualitative evidence, which supports that the system’s predicted reputation labels match the quality of provided answers.
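
The labeling step described above can be sketched as follows. This is a minimal illustration, not the paper's procedure: the function names are hypothetical, and the cut-off (the median of the chosen statistic across authors) is an assumption, since the abstract does not state how the scores are thresholded.

```python
from statistics import mean, median


def author_stats(scores):
    """Return the three per-author statistics named in the abstract."""
    return {"mean": mean(scores), "median": median(scores), "total": sum(scores)}


def binary_labels(answer_scores_by_author, stat="mean"):
    """Label each author 1 (higher reputation) or 0 (lower reputation).

    Assumption: the cut-off is the median of the chosen statistic across
    all authors; the abstract does not give the actual thresholding rule.
    """
    values = {
        author: author_stats(scores)[stat]
        for author, scores in answer_scores_by_author.items()
    }
    cutoff = median(values.values())
    # Authors strictly above the cut-off get label 1, all others label 0.
    return {author: int(value > cutoff) for author, value in values.items()}
```

The resulting labels would then serve as classification targets for each author's answer text.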

Distribution-preserving data augmentation

PeerJ Computer Science ◽

10.7717/peerj-cs.571 ◽

2021 ◽

Vol 7 ◽

pp. e571

Author(s):

Nurdan Ayse Saran ◽

Murat Saran ◽

Fatih Nar

Keyword(s):

Data Augmentation ◽

Image Data ◽

Large Data ◽

Data Availability ◽

Superior Performance ◽

Color Distribution ◽

Spatial Transformations ◽

Wide Range ◽

Dataset Size ◽

Existing Data

In the last decade, deep learning has been applied to a wide range of problems with tremendous success. This success mainly comes from large data availability, increased computational power, and theoretical improvements in the training phase. As the dataset grows, the real world is better represented, making it possible to develop a model that can generalize. However, creating a labeled dataset is expensive, time-consuming, and in some domains challenging if not infeasible. Therefore, researchers proposed data augmentation methods to increase dataset size and variety by creating variations of the existing data. For image data, variations can be obtained by applying color or spatial transformations, either alone or in combination. Such color transformations perform linear or nonlinear operations on the entire image or on patches to create variations of the original image. The current color-based augmentation methods are usually based on image processing methods that apply color transformations such as equalizing, solarizing, and posterizing. Nevertheless, these color-based data augmentation methods do not guarantee plausible variations of the image. This paper proposes a novel distribution-preserving data augmentation method that creates plausible image variations by shifting pixel colors to another point in the image color distribution. We achieved this by defining a regularized density decreasing direction to create paths from the original pixels’ color to the distribution tails. The proposed method provides superior performance compared to existing data augmentation methods, as shown using a transfer learning scenario on the UC Merced Land-use, Intel Image Classification, and Oxford-IIIT Pet datasets for classification and segmentation tasks.
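
For context, two of the existing color transformations named above can be sketched on 8-bit pixel values. This is a minimal illustration using the conventional definitions of solarizing and posterizing; it is not the distribution-preserving method the paper proposes, and the function names and default parameters are assumptions.

```python
def solarize(pixels, threshold=128):
    """Invert every 8-bit value at or above the threshold, keep the rest."""
    return [255 - p if p >= threshold else p for p in pixels]


def posterize(pixels, bits=4):
    """Keep only the top `bits` bits of each 8-bit value."""
    # Example: bits=4 gives the mask 0xF0, which zeroes the low four bits.
    mask = 0xFF & ~((1 << (8 - bits)) - 1)
    return [p & mask for p in pixels]
```

Both operations remap values without consulting the image's own color distribution, which is why the result is not guaranteed to look plausible.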

Evaluating Word Similarity Measure of Embeddings Through Binary Classification

Journal of Computer Science Research ◽

10.30564/jcsr.v1i3.1268 ◽

2019 ◽

Vol 1 (3) ◽

Author(s):

A. Aziz Altowayan ◽

Lixin Tao

Keyword(s):

Similarity Measure ◽

Binary Classification ◽

General Purpose ◽

Feature Representation ◽

Entity Recognition ◽

Language Models ◽

Data Set ◽

Word Similarity ◽

Domain Specific ◽

Retrieval Rate

We consider the following problem: given neural language models (embeddings), each of which is trained on an unknown data set, how can we determine which model would provide a better result when used for feature representation in a downstream task such as text classification or entity recognition? In this paper, we assess the word similarity measure by analyzing its impact on word embeddings learned from various datasets and how they perform in a simple classification task. Word representations were learned and assessed under the same conditions. For training word vectors, we used the implementation of Continuous Bag of Words described in [1]. To assess the quality of the vectors, we applied the analogy questions test for word similarity described in the same paper. Further, to measure the retrieval rate of an embedding model, we introduced a new metric (Average Retrieval Error) which measures the percentage of missing words in the model. We observe that a high accuracy on syntactic and semantic similarities between word pairs is not an indicator of better classification results. This observation can be justified by the fact that a domain-specific corpus contributes to the performance better than a general-purpose corpus. For reproducibility, we release our experiment scripts and results.
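
The retrieval metric described above can be sketched as a missing-word percentage. This is a rough illustration only: the abstract says the metric measures the percentage of missing words in the model, but not how it is averaged, so computing it over the unique tokens of a dataset is an assumption, and the function name is hypothetical.

```python
def retrieval_error(tokens, embedding_vocab):
    """Percentage of unique dataset tokens that the embedding model lacks.

    Assumption: the percentage is taken over unique tokens; the paper's
    exact averaging scheme is not given in the abstract.
    """
    unique = set(tokens)
    if not unique:
        return 0.0  # nothing to look up, so nothing is missing
    missing = unique - set(embedding_vocab)
    return 100.0 * len(missing) / len(unique)
```

A domain-specific embedding would be expected to score lower on this measure for in-domain text than a general-purpose one with poorer vocabulary coverage.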

Modelling and Designing of IoT Systems Using UML Diagrams

Advances in Systems Analysis, Software Engineering, and High Performance Computing - Integrating the Internet of Things Into Software Engineering Practices ◽

10.4018/978-1-5225-7790-4.ch003 ◽

2019 ◽

pp. 36-61

Author(s):

K. Sridhar Patnaik ◽

Itu Snigdh

Keyword(s):

Software Engineering ◽

Design Patterns ◽

Architectural Design ◽

Intelligent System ◽

Patient Health ◽

Systematic Development ◽

Uml Diagrams ◽

Robotic Devices ◽

Remote Patient

Despite the rapid growth in IoT research, a general principled software engineering approach for the systematic development of IoT systems and applications is still missing. Software engineering as a discipline provides the necessary platform to carry out the underlying design, coding, implementation, and maintenance of such systems. UML diagrams present a visually comprehensible layout of the construction of IoT systems. The chapter covers the modelling of IoT systems using UML diagrams, from the architectural design of an IoT system to its behavioral aspects, using a case study of an IoT-based remote patient health monitoring system. The diagrams shown in this chapter are sample diagrams for understanding complex IoT-based systems. The chapter focuses on the work carried out by Franco Zambonelli in the context of developing an abstract model of an IoT system using software engineering concepts. The chapter also focuses on the pioneering work carried out by J. F. Peters on intelligent system design patterns for robotic devices using pattern classification.

On Using Grey Literature and Google Scholar in Systematic Literature Reviews in Software Engineering

IEEE Access ◽

10.1109/access.2020.2971712 ◽

2020 ◽

Vol 8 ◽

pp. 36226-36243 ◽

Cited By ~ 3

Author(s):

Affan Yasin ◽

Rubia Fatima ◽

Lijie Wen ◽

Wasif Afzal ◽

Muhammad Azhar ◽

...

Keyword(s):

Software Engineering ◽

Grey Literature ◽

Google Scholar ◽

Literature Reviews

Social Structure Based Design Patterns for Agent-Oriented Software Engineering

Software Applications ◽

10.4018/978-1-60566-060-8.ch049 ◽

2009 ◽

pp. 773-796

Author(s):

Manuel Kolp ◽

Stéphane Faulkner ◽

Yves Wautelet

Keyword(s):

Software Engineering ◽

Design Patterns ◽

Autonomous Agents ◽

Knowledge Bases ◽

Multi Agent Systems ◽

Business Systems ◽

Agent Systems ◽

Social Patterns ◽

Agent Oriented Software Engineering ◽

Multi Agent

Multi-agent systems (MAS) architectures are gaining popularity over traditional ones for building open, distributed, and evolving software required by today’s corporate IT applications such as e-business systems, Web services, or enterprise knowledge bases. Since the fundamental concepts of multi-agent systems are social and intentional rather than object, functional, or implementation-oriented, the design of MAS architectures can be eased by using social patterns. They are detailed agent-oriented design idioms to describe MAS architectures composed of autonomous agents that interact and coordinate to achieve their intentions, like actors in human organizations. This article presents social patterns and focuses on a framework aimed to gain insight into these patterns. The framework can be integrated into agent-oriented software engineering methodologies used to build MAS. We consider the Broker social pattern to illustrate the framework. An overview of the mapping from system architectural design (through organizational architectural styles), to system detailed design (through social patterns), is presented with a data integration case study. The automation of creating design patterns is also discussed.

The Object-Oriented Design Knowledge

Object-Oriented Design Knowledge ◽

10.4018/978-1-59140-896-3.ch001 ◽

2011 ◽

pp. 1-7

Author(s):

Javier Garzas ◽

Mario Piattini

Keyword(s):

Software Engineering ◽

Best Practices ◽

Design Patterns ◽

Architectural Design ◽

Large Body ◽

Object Oriented ◽

Design Knowledge ◽

Object Oriented Design ◽

Body Of Knowledge

In order to establish itself as a branch of engineering, a profession must understand its accumulated knowledge. In this regard, software engineering has advanced greatly in recent years, but it still suffers from the lack of a structured classification of its knowledge. In the field of object-oriented micro-architectural design, designers have accumulated a large body of knowledge, but it has still not been organized or unified. Design patterns are the most popular example of this accumulated knowledge, but other elements of knowledge exist, such as principles, heuristics, best practices, bad smells, refactorings, and so on, which are not clearly differentiated; indeed, many are synonymous and others are just vague concepts.

Automatic and Reliable Leaf Disease Detection Using Deep Learning Techniques

AgriEngineering ◽

10.3390/agriengineering3020020 ◽

2021 ◽

Vol 3 (2) ◽

pp. 294-312

Author(s):

Muhammad E. H. Chowdhury ◽

Tawsifur Rahman ◽

Amith Khandakar ◽

Mohamed Arselene Ayari ◽

Aftab Ullah Khan ◽

...

Keyword(s):

Deep Learning ◽

Binary Classification ◽

Experimental Studies ◽

Plant Diseases ◽

Superior Performance ◽

World Population ◽

Comparative Performance ◽

Learning Techniques ◽

Segmented Images ◽

Segmentation Models

Plants are a major source of food for the world population. Plant diseases contribute to production loss, which can be tackled with continuous monitoring. Manual plant disease monitoring is both laborious and error-prone. Early detection of plant diseases using computer vision and artificial intelligence (AI) can help to reduce the adverse effects of diseases and also overcome the shortcomings of continuous human monitoring. In this work, we propose the use of a deep learning architecture based on a recent convolutional neural network called EfficientNet on 18,161 plain and segmented tomato leaf images to classify tomato diseases. The performance of two segmentation models, U-net and Modified U-net, for the segmentation of leaves is reported. The comparative performance of the models for binary classification (healthy and unhealthy leaves), six-class classification (healthy and various groups of diseased leaves), and ten-class classification (healthy and various types of unhealthy leaves) is also reported. The modified U-net segmentation model showed accuracy, IoU, and Dice score of 98.66%, 98.5%, and 98.73%, respectively, for the segmentation of leaf images. EfficientNet-B7 showed superior performance for the binary classification and six-class classification using segmented images with an accuracy of 99.95% and 99.12%, respectively. Finally, EfficientNet-B4 achieved an accuracy of 99.89% for ten-class classification using segmented images. It can be concluded that all the architectures performed better in classifying the diseases when trained with deeper networks on segmented images. The performance of each of the experimental studies reported in this work outperforms the existing literature.
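
The two segmentation metrics quoted above, IoU and Dice score, can be computed from a predicted and a reference binary mask. The sketch below uses the standard definitions on flattened masks; the function name and the convention of returning 1.0 when both masks are empty are assumptions, not details from the paper.

```python
def iou_and_dice(pred, target):
    """Return (IoU, Dice) for two equal-length binary masks (0/1 values)."""
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - intersection
    if union == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0, 1.0
    iou = intersection / union
    dice = 2 * intersection / (pred_area + target_area)
    return iou, dice
```

For any single mask pair the Dice score is at least as large as the IoU, which is consistent with the ordering of the two figures reported for the modified U-net.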

Expert guided natural language processing using one-class classification

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocv010 ◽

2015 ◽

Vol 22 (5) ◽

pp. 962-966 ◽

Cited By ~ 5

Author(s):

Erel Joffe ◽

Emily J Pettigrew ◽

Jorge R Herskovic ◽

Charles F Bearden ◽

Elmer V Bernstam

Keyword(s):

Breast Cancer ◽

Language Processing ◽

Text Processing ◽

Binary Classification ◽

Model Performance ◽

Imbalanced Data ◽

Superior Performance ◽

Support Vector ◽

Free Text ◽

One Class Classification

Introduction: Automatically identifying specific phenotypes in free-text clinical notes is critically important for the reuse of clinical data. In this study, the authors combine expert-guided feature (text) selection with one-class classification for text processing. Objectives: To compare the performance of one-class classification to traditional binary classification; to evaluate the utility of feature selection based on expert-selected salient text (snippets); and to determine the robustness of these models with respect to irrelevant surrounding text. Methods: The authors trained one-class support vector machines (1C-SVMs) and two-class SVMs (2C-SVMs) to identify notes discussing breast cancer. Manually annotated visit summary notes (88 positive and 88 negative for breast cancer) were used to compare the performance of models trained on whole notes labeled as positive or negative to models trained on expert-selected text sections (snippets) relevant to breast cancer status. Model performance was evaluated using a 70:30 split for 20 iterations and on a realistic dataset of 10 000 records with a breast cancer prevalence of 1.4%. Results: When tested on a balanced experimental dataset, 1C-SVMs trained on snippets had comparable results to 2C-SVMs trained on whole notes (F = 0.92 for both approaches). When evaluated on a realistic imbalanced dataset, 1C-SVMs had a considerably superior performance (F = 0.61 vs. F = 0.17 for the best performing model) attributable mainly to improved precision (p = .88 vs. p = .09 for the best performing model). Conclusions: 1C-SVMs trained on expert-selected relevant text sections perform better than 2C-SVM classifiers trained on either snippets or whole notes when applied to realistically imbalanced data with low prevalence of the positive class.
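
The link between precision and the F values reported above can be illustrated with a short calculation. This sketch assumes that F denotes the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name is hypothetical.

```python
def f1_score(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Upper bound on F1 at the reported 2C-SVM precision of 0.09:
# even with perfect recall, the low precision caps the score.
best_case_low_precision = f1_score(0.09, 1.0)
```

Under that assumption, a precision of 0.09 caps F1 at roughly 0.17 no matter how high recall is, which is why the improvement in precision dominates the comparison on the imbalanced dataset.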

Virtual Parts Repository 2: Model-driven design of genetic regulatory circuits

10.1101/2021.04.11.439316 ◽

2021 ◽

Author(s):

Göksel Mısırlı ◽

Bill Yang ◽

Katherine James ◽

Anil Wipat

Keyword(s):

Design Patterns ◽

Computational Models ◽

Environmental Changes ◽

Computational Design ◽

Language Models ◽

Design Tools ◽

Hierarchical Systems ◽

Genetic Circuits ◽

Model Driven ◽

Regulatory Circuits

Engineering genetic regulatory circuits is key to the creation of biological applications that are responsive to environmental changes. Computational models can assist in understanding especially large and complex circuits where manual analysis is infeasible, permitting a model-driven design process. However, there are still few tools that offer the ability to simulate the system under design. One of the reasons for this is the lack of accessible model repositories or libraries that cater for the modular composition of models of synthetic systems that do not yet exist in nature. Here, we present the Virtual Parts Repository 2, a resource to facilitate the model-driven design of genetic regulatory circuits, which provides reusable, modular and composable models. The repository is service-oriented and can be utilized by design tools in computational workflows. Designs provided in Synthetic Biology Open Language documents are used to derive system-scale and hierarchical Systems Biology Markup Language models. We also present a rule-based modeling abstraction based on reaction networks to facilitate scalable and modular modeling of complex and large designs. This modeling abstraction incorporates design patterns such as roadblocking, distributed deployment of genetic circuits using plasmids and cellular resource dependency. The computational resources and the modeling abstraction presented in this paper allow computational design tools to take advantage of computational simulations and ultimately help facilitate more predictable applications.

Software Engineering Paradigm Independent Design Problems, GoF 23 Design Patterns, and Aspect Design

Informatica ◽

10.15388/informatica.2011.328 ◽

2011 ◽

Vol 22 (2) ◽

pp. 289-317

Author(s):

Žilvinas Vaira ◽

Albertas Čaplinskas

Keyword(s):

Software Engineering ◽

Design Patterns ◽

Design Problems ◽

Independent Design
