Reading functional requirements using machine learning-based language processing

CIRP Annals ◽  
2021 ◽  
Author(s):  
Haluk Akay ◽  
Sang-Gook Kim


2020 ◽  
Vol 30 (2) ◽  
pp. 155-174
Author(s):  
Tim Hutchinson

Purpose This study aims to provide an overview of recent efforts relating to natural language processing (NLP) and machine learning applied to archival processing, particularly appraisal and sensitivity reviews, and propose functional requirements and workflow considerations for transitioning from experimental to operational use of these tools. Design/methodology/approach The paper has four main sections: 1) a short overview of the NLP and machine learning concepts referenced in the paper; 2) a review of the literature reporting on NLP and machine learning applied to archival processes; 3) an overview and commentary on key existing and developing tools that use NLP or machine learning techniques for archives; and 4) a discussion of functional requirements and workflow considerations for NLP and machine learning tools for archival processing, informed by the preceding review and analysis. Findings Applications for processing e-mail have received the most attention so far, although most initiatives have been experimental or project based. It now seems feasible to branch out and develop more generalized tools for born-digital, unstructured records. Effective NLP and machine learning tools for archival processing should be usable, interoperable, flexible, iterative and configurable. Originality/value Most implementations of NLP for archives have been experimental or project based. The main exception that has moved into production is ePADD, which includes robust NLP features through its named entity recognition module. This paper takes a broader view, assessing the prospects and possible directions for integrating NLP tools and techniques into archival workflows.
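
The named entity recognition capability mentioned above (as provided by ePADD) can be illustrated with a short, hedged sketch. The snippet below uses spaCy and its small English model purely as an example of how entities might be surfaced from archival text; it is not a description of ePADD's actual implementation, and the model name and sample text are assumptions.

```python
# Illustrative sketch only: surfacing named entities from archival text with spaCy.
# This is not ePADD's implementation; the model name and sample text are assumptions.
import spacy

nlp = spacy.load("en_core_web_sm")  # small general-purpose English model

sample = (
    "Letter from the university registrar to the provincial archives, "
    "dated 14 March 1998, regarding the Ottawa conference."
)

doc = nlp(sample)
for ent in doc.ents:
    # Entities such as PERSON, ORG, GPE, and DATE could feed appraisal or
    # sensitivity-review workflows as candidate access points.
    print(ent.text, ent.label_)
```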


2019 ◽  
Vol 48 (3) ◽  
pp. 432-445 ◽  
Author(s):  
Laszlo Toth ◽  
Laszlo Vidacs

Software systems are developed based on the expectations of customers, and these expectations are expressed in natural language. To design software that meets the needs of the customer and the stakeholders, their intentions, feedback and reviews must be understood accurately and without ambiguity. These textual inputs often contain inaccuracies and contradictions, and are seldom given in a well-structured form. These issues frequently result in software that does not satisfy stakeholder expectations. Non-functional requirements are particularly affected, since clients rarely emphasize these specifications as much as would be justified. Identifying, classifying and reconciling requirements is one of the main duties of the System Analyst, a task which, without proper tool support, can be very demanding and time-consuming. Tools that support text processing are expected to improve the accuracy of identifying and classifying requirements even in an unstructured set of inputs. System Analysts can also use them in document archaeology tasks, where many documents, regulations, standards, etc. have to be processed. Methods elaborated in natural language processing and machine learning offer a solid basis; however, their usability and the possibility of improving their performance using domain-specific knowledge from software engineering need to be examined thoroughly. In this paper, we present the results of our work adapting natural language processing and machine learning methods for handling and transforming textual inputs of software development. The major contribution of our work is a comparison of the performance and applicability of state-of-the-art natural language processing and machine learning techniques in software engineering. Based on the results of our experiments, tools can be designed that support System Analysts working on textual inputs.
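
As an illustration of the kind of tooling envisioned here (not the authors' actual pipeline), the hedged sketch below classifies requirement sentences as functional or non-functional with a TF-IDF representation and a linear classifier from scikit-learn; the example sentences and labels are invented for demonstration.

```python
# Hedged sketch: classifying requirement sentences as functional (F) vs.
# non-functional (NF). The training data is a toy, invented sample;
# a real comparison would use a labelled requirements corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_texts = [
    "The system shall export reports as PDF files.",
    "The user can reset a forgotten password via e-mail.",
    "The application must respond to queries within two seconds.",
    "All stored personal data must be encrypted at rest.",
]
train_labels = ["F", "F", "NF", "NF"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(train_texts, train_labels)

print(clf.predict(["The service should be available 99.9% of the time."]))
```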


2021 ◽  
Author(s):  
Haluk Akay ◽  
Maria Yang ◽  
Sang-Gook Kim

Abstract Nearly every artifact of the modern engineering design process is digitally recorded and stored, resulting in an overwhelming amount of raw data detailing past designs. Analyzing this design knowledge and extracting functional information from sets of digital documents is a difficult and time-consuming task for human designers. For the case of textual documentation, poorly written, superfluous descriptions filled with jargon are especially challenging for junior designers with less domain expertise to read. If the task of reading documents to extract functional requirements could be automated, designers could benefit from the distillation of massive digital repositories of design documentation into valuable information that can inform engineering design. This paper presents a system for automating the extraction of structured functional requirements from textual design documents by applying state-of-the-art Natural Language Processing (NLP) models. A recursive method utilizing Machine Learning-based question-answering is developed to process design texts by initially identifying the highest-level functional requirement, and subsequently extracting additional requirements contained in the text passage. The efficacy of this system is evaluated by comparing the Machine Learning-based results with a study of 75 human designers performing the same design document analysis task on technical texts from the field of Microelectromechanical Systems (MEMS). The prospect of deploying such a system on the sum of all digital engineering documents suggests a future where design failures are less likely to be repeated and past successes may be consistently used to forward innovation.
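
The recursive question-answering idea can be sketched roughly as follows. This is a hedged illustration using a generic pretrained extractive QA model from the Hugging Face transformers library, not the authors' actual system; the model choice, question template, and recursion depth are assumptions.

```python
# Hedged sketch of recursive extractive question-answering over a design text.
# Model name, question wording, and recursion depth are illustrative
# assumptions, not the published configuration.
from transformers import pipeline

qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")

passage = (
    "The micro-actuator must deflect the cantilever by 5 micrometers "
    "while drawing less than 10 milliwatts, so that the sensor array "
    "can be repositioned without external power."
)

def extract_requirements(text, max_depth=2):
    """Ask for the main function, then recurse on the remaining text."""
    requirements = []
    for _ in range(max_depth):
        answer = qa(question="What must the device do?", context=text)
        requirements.append(answer["answer"])
        # Naive recursion: drop the extracted span and ask again.
        text = text.replace(answer["answer"], "", 1)
    return requirements

print(extract_requirements(passage))
```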


Author(s):  
Wai-Ling Mui ◽  
Edward P. Argenta ◽  
Teresa Quitugua ◽  
Christopher Kiley

Objective The National Biosurveillance Integration Center (NBIC) and the Defense Threat Reduction Agency's Chemical and Biological Technologies Department (DTRA J9 CB) have partnered to co-develop the Biosurveillance Ecosystem (BSVE), an emerging capability that aims to provide a virtual, customizable analyst workbench that integrates health and non-health data. This partnership promotes engagement between diverse health surveillance entities to increase awareness and improve decision-making capabilities. Introduction NBIC collects, analyzes, and shares key biosurveillance information to support the nation's response to biological events of concern. Integration of this information enables early warning and shared situational awareness to inform critical decision making, and direct response and recovery efforts. DTRA J9 CB leads DoD S&T to anticipate, defend, and safeguard against chemical and biological threats for the warfighter and the nation. These agencies have partnered to meet the evolving needs of the biosurveillance community and address gaps in technology and data sharing capabilities. High-profile events such as the 2009 H1N1 pandemic, the West African Ebola outbreak, and the recent emergence of Zika virus disease have underscored the need for integration of disparate biosurveillance systems to provide a more functional infrastructure. This allows analysts and others in the community to collect, analyze, and share relevant data across organizations securely and efficiently. Leveraging existing biosurveillance efforts provides the federal public health community, and its partners, with a comprehensive interagency platform that enables engagement and data sharing. Methods NBIC and DTRA are leveraging existing biosurveillance projects to share data feeds, work processes, resources, and lessons learned. A multi-stakeholder Agile process was implemented to represent the interests of NBIC, DTRA, and their respective partners. System requirements generated by both agencies were combined to form a single backlog of prioritized needs. Functional requirements from NBIC support the development of the prototype by refining system capabilities and providing an operational perspective. DTRA's technical expertise and research and development (R&D) portfolio ensure that robust analytic applications are embedded within a secure, scalable system architecture. Integration of analyst-validated data from the NBIC Biofeeds system serves as a gold standard to improve analytic development in machine learning and natural language processing. Additionally, working groups are formed using NBIC and DTRA extended partnerships with academia and private industry to expand R&D possibilities. These expansions include leveraging existing ontology efforts for improved system functionality and integrating social media algorithms for improved topic analysis output. Results The combined efforts of these two agencies to develop the BSVE and improve overall biosurveillance processes across the federal government have enhanced understanding of the needs of the community in a variety of mission spaces. To date, co-creation of products, joint analysis, and sharing of data feeds has become a major priority for both partners to advance biosurveillance outcomes. Within the larger efforts of system development, possible coordination with other agencies such as the Department of Veterans Affairs (VA) and the US Geological Survey (USGS) could expand the reach of the system to ensure fulfillment of health surveillance requirements as a whole. Conclusions The NBIC and DTRA partnership has demonstrated value in improving biosurveillance capabilities for each agency and their partners. BSVE will provide NBIC analysts with a collaborative tool that can leverage applications that visualize near real-time global epidemic and outbreak data from a range of unique and trusted sources. The continued collaboration means ongoing access to new data streams and analytic processes for all analysts, as well as advanced machine learning algorithms that increase capabilities for joint analysis, rapid product creation, and continuous interagency communication.


Author(s):  
Hazlina Shariff et al.

A key aspect of software quality is that the software operates correctly and meets user needs. A primary concern with non-functional requirements (NFRs) is that they are often neglected because the relevant information is hidden in the documents. NFRs represent tacit knowledge about the system, and users usually find it hard to describe them, so NFRs tend to be absent from the elicitation process. The software engineer therefore has to act proactively and ask the user for software quality criteria so that the objectives of the requirements can be achieved. To overcome these problems, we use machine learning to detect indicator terms of NFRs in textual requirements, so that the software engineer can be reminded to elicit the missing NFRs. We developed a prototype tool to support our approach, classifying the textual requirements using supervised machine learning algorithms. A survey was done to evaluate the effectiveness of the prototype tool in detecting NFRs.
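
A minimal sketch of indicator-term detection is shown below, assuming an invented lexicon of NFR cue words and a Naive Bayes classifier; it illustrates the general idea rather than the authors' prototype tool.

```python
# Minimal sketch: flagging requirements that likely contain a hidden NFR.
# The indicator lexicon, training sentences, and labels are invented examples.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

nfr_indicators = ["secure", "encrypted", "response", "available",
                  "usable", "scalable", "throughput", "latency"]

train = [
    ("Login must be secure and use encrypted connections.", 1),
    ("Pages should load with a response time under one second.", 1),
    ("The user can add items to the shopping cart.", 0),
    ("The admin can delete a registered account.", 0),
]
texts, labels = zip(*train)

# Restricting the vocabulary to indicator terms keeps the model focused
# on NFR cues rather than domain nouns.
model = make_pipeline(CountVectorizer(vocabulary=nfr_indicators), MultinomialNB())
model.fit(texts, labels)

for req in ["The report service must be available around the clock."]:
    if model.predict([req])[0] == 1:
        print("Possible non-functional requirement; remind the analyst:", req)
```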


Author(s):  
Sumit Kaur

Abstract- Deep learning is an emerging research area in the machine learning and pattern recognition field, introduced with the goal of moving machine learning closer to one of its original objectives, artificial intelligence. It tries to mimic the human brain, which is capable of processing and learning from complex input data and solving many kinds of complicated tasks well. Deep learning (DL) is based on a set of supervised and unsupervised algorithms that attempt to model higher-level abstractions in data and learn hierarchical representations for classification. In recent years it has attracted much attention due to its state-of-the-art performance in diverse areas such as object perception, speech recognition, computer vision, collaborative filtering and natural language processing. This paper presents a survey of different deep learning techniques for remote sensing image classification.
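
To ground the idea of hierarchical representation learning, here is a minimal, hedged sketch of a supervised convolutional classifier in PyTorch; the layer sizes, input resolution, and class count are arbitrary assumptions and are not tied to any specific remote sensing dataset from the survey.

```python
# Hedged sketch: a tiny convolutional network for image classification.
# Channel counts, image size (64x64 RGB), and the 10-class output are
# arbitrary illustrative choices, not values from the surveyed papers.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # Early layers learn low-level features (edges, textures);
        # later layers combine them into higher-level abstractions.
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = TinyCNN()
dummy_batch = torch.randn(4, 3, 64, 64)   # four fake 64x64 RGB tiles
print(model(dummy_batch).shape)           # torch.Size([4, 10])
```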


2017 ◽  
Author(s):  
Sabrina Jaeger ◽  
Simone Fulle ◽  
Samo Turk

Inspired by natural language processing techniques, we here introduce Mol2vec, an unsupervised machine learning approach to learn vector representations of molecular substructures. Similarly to Word2vec models, where vectors of closely related words are in close proximity in the vector space, Mol2vec learns vector representations of molecular substructures that point in similar directions for chemically related substructures. Compounds can finally be encoded as vectors by summing up the vectors of the individual substructures and, for instance, fed into supervised machine learning approaches to predict compound properties. The underlying substructure vector embeddings are obtained by training an unsupervised machine learning approach on a so-called corpus of compounds that consists of all available chemical matter. The resulting Mol2vec model is pre-trained once, yields dense vector representations and overcomes drawbacks of common compound feature representations such as sparseness and bit collisions. The prediction capabilities are demonstrated on several compound property and bioactivity data sets and compared with results obtained for Morgan fingerprints as the reference compound representation. Mol2vec can easily be combined with ProtVec, which employs the same Word2vec concept on protein sequences, resulting in a proteochemometric approach that is alignment-independent and can thus also easily be used for proteins with low sequence similarities.
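
A toy, hedged sketch of the underlying idea (substructure identifiers as "words", molecules as "sentences") using gensim's Word2Vec is given below. The identifier strings are invented placeholders, and this is not the released Mol2vec implementation, which is trained on a large corpus of real Morgan substructure identifiers.

```python
# Hedged toy sketch of the Mol2vec idea: treat substructure identifiers as
# "words", molecules as "sentences", train Word2vec, then sum the
# substructure vectors to embed a whole compound. The identifiers below
# are invented placeholders, not real Morgan identifiers.
import numpy as np
from gensim.models import Word2Vec

corpus = [
    ["847433064", "2246728737", "864942730"],   # "sentence" for molecule 1
    ["847433064", "3217380708", "864942730"],   # molecule 2
    ["2246728737", "3217380708", "847957139"],  # molecule 3
]

model = Word2Vec(sentences=corpus, vector_size=32, window=5,
                 min_count=1, sg=1, epochs=50)

def embed_compound(substructure_ids):
    """Sum substructure vectors to obtain a dense compound vector."""
    return np.sum([model.wv[s] for s in substructure_ids if s in model.wv], axis=0)

print(embed_compound(["847433064", "864942730"]).shape)  # (32,)
```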


Author(s):  
Rohan Pandey ◽  
Vaibhav Gautam ◽  
Ridam Pal ◽  
Harsh Bandhey ◽  
Lovedeep Singh Dhingra ◽  
...  

BACKGROUND The COVID-19 pandemic has uncovered the potential of digital misinformation in shaping the health of nations. The deluge of unverified information that spreads faster than the epidemic itself is an unprecedented phenomenon that has put millions of lives in danger. Mitigating this ‘Infodemic’ requires strong health messaging systems that are engaging, vernacular, scalable, effective and continuously learn the new patterns of misinformation. OBJECTIVE We created WashKaro, a multi-pronged intervention for mitigating misinformation through conversational AI, machine translation and natural language processing. WashKaro provides the right information, matched against WHO guidelines through AI, and delivers it in the right format in local languages. METHODS We theorize (i) an NLP-based AI engine that could continuously incorporate user feedback to improve the relevance of information, (ii) bite-sized audio in the local language to improve penetrance in a country with skewed gender literacy ratios, and (iii) conversational yet interactive AI engagement with users towards increased health awareness in the community. RESULTS A total of 5026 people downloaded the app during the study window; among them, 1545 were active users. Our study shows that 3.4 times more females engaged with the app in Hindi compared to males, the relevance of AI-filtered news content doubled within 45 days of continuous machine learning, and the prudence of the integrated AI chatbot “Satya” increased, demonstrating the usefulness of an mHealth platform to mitigate health misinformation. CONCLUSIONS We conclude that a multi-pronged machine learning application delivering vernacular bite-sized audios and conversational AI is an effective approach to mitigate health misinformation. CLINICALTRIAL Not Applicable
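
As a rough illustration of how content might be matched against guideline text (not the WashKaro production pipeline), the hedged sketch below scores a news snippet against a few hand-written guideline sentences with TF-IDF cosine similarity; the guideline sentences are paraphrased placeholders, not actual WHO text.

```python
# Hedged sketch: ranking guideline snippets by relevance to a piece of content.
# The guideline sentences are invented placeholders, and this is not the
# WashKaro system's real matching engine.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

guideline_snippets = [
    "Wash hands frequently with soap and water for at least twenty seconds.",
    "Wear a mask in crowded indoor spaces to reduce transmission.",
    "Vaccines approved by regulators are safe and effective.",
]

news_item = "Viral post claims washing hands with hot water alone kills the virus."

vectorizer = TfidfVectorizer(stop_words="english")
guideline_vecs = vectorizer.fit_transform(guideline_snippets)
news_vec = vectorizer.transform([news_item])

scores = cosine_similarity(news_vec, guideline_vecs).ravel()
best = scores.argmax()
print(f"Closest guideline ({scores[best]:.2f}): {guideline_snippets[best]}")
```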


Author(s):  
Timnit Gebru

This chapter discusses the role of race and gender in artificial intelligence (AI). The rapid permeation of AI into society has not been accompanied by a thorough investigation of the sociopolitical issues that cause certain groups of people to be harmed rather than advantaged by it. For instance, recent studies have shown that commercial automated facial analysis systems have much higher error rates for dark-skinned women, while having minimal errors on light-skinned men. Moreover, a 2016 ProPublica investigation uncovered that machine learning–based tools that assess crime recidivism rates in the United States are biased against African Americans. Other studies show that natural language–processing tools trained on news articles exhibit societal biases. While many technical solutions have been proposed to alleviate bias in machine learning systems, a holistic and multifaceted approach must be taken. This includes standardization bodies determining what types of systems can be used in which scenarios, making sure that automated decision tools are created by people from diverse backgrounds, and understanding the historical and political factors that disadvantage certain groups who are subjected to these tools.

