Reading functional requirements using machine learning-based language processing

CIRP Annals ◽  
2021 ◽  
Author(s):  
Haluk Akay ◽  
Sang-Gook Kim


2020 ◽  
Vol 30 (2) ◽  
pp. 155-174
Author(s):  
Tim Hutchinson

Purpose This study aims to provide an overview of recent efforts relating to natural language processing (NLP) and machine learning applied to archival processing, particularly appraisal and sensitivity reviews, and propose functional requirements and workflow considerations for transitioning from experimental to operational use of these tools. Design/methodology/approach The paper has four main sections: 1) a short overview of the NLP and machine learning concepts referenced in the paper; 2) a review of the literature reporting on NLP and machine learning applied to archival processes; 3) an overview and commentary on key existing and developing tools that use NLP or machine learning techniques for archives; and 4) a discussion of functional requirements and workflow considerations for NLP and machine learning tools for archival processing, informed by the preceding review and analysis. Findings Applications for processing e-mail have received the most attention so far, although most initiatives have been experimental or project based. It now seems feasible to branch out and develop more generalized tools for born-digital, unstructured records. Effective NLP and machine learning tools for archival processing should be usable, interoperable, flexible, iterative and configurable. Originality/value Most implementations of NLP for archives have been experimental or project based. The main exception that has moved into production is ePADD, which includes robust NLP features through its named entity recognition module. This paper takes a broader view, assessing the prospects and possible directions for integrating NLP tools and techniques into archival workflows.
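
The named entity recognition capability mentioned above (as provided by ePADD) can be illustrated with a short, hedged sketch. The snippet below uses spaCy and its small English model purely as an example of how entities might be surfaced from archival text; it is not a description of ePADD's actual implementation, and the model name and sample text are assumptions.

```python
# Illustrative sketch only: surfacing named entities from archival text with spaCy.
# This is not ePADD's implementation; the model name and sample text are assumptions.
import spacy

nlp = spacy.load("en_core_web_sm")  # small general-purpose English model

sample = (
    "Letter from the university registrar to the provincial archives, "
    "dated 14 March 1998, regarding the Ottawa conference."
)

doc = nlp(sample)
for ent in doc.ents:
    # Entities such as PERSON, ORG, GPE, and DATE could feed appraisal or
    # sensitivity-review workflows as candidate access points.
    print(ent.text, ent.label_)
```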


2019 ◽  
Vol 48 (3) ◽  
pp. 432-445 ◽  
Author(s):  
Laszlo Toth ◽  
Laszlo Vidacs

Software systems are developed based on the expectations of customers, and these expectations are expressed in natural language. To design software that meets the needs of the customer and the stakeholders, their intentions, feedback and reviews must be understood accurately and without ambiguity. These textual inputs often contain inaccuracies and contradictions, and are seldom given in a well-structured form. These issues frequently result in software that does not satisfy stakeholder expectations. Non-functional requirements are particularly affected, since clients rarely emphasize these specifications as much as would be justified. Identifying, classifying and reconciling requirements is one of the main duties of the System Analyst, a task which, without proper tool support, can be very demanding and time-consuming. Tools that support text processing are expected to improve the accuracy of identifying and classifying requirements even in an unstructured set of inputs. System Analysts can also use them in document archaeology tasks, where many documents, regulations, standards, etc. have to be processed. Methods elaborated in natural language processing and machine learning offer a solid basis; however, their usability and the possibility of improving their performance using domain-specific knowledge from software engineering need to be examined thoroughly. In this paper, we present the results of our work adapting natural language processing and machine learning methods for handling and transforming textual inputs of software development. The major contribution of our work is a comparison of the performance and applicability of state-of-the-art natural language processing and machine learning techniques in software engineering. Based on the results of our experiments, tools can be designed that support System Analysts working on textual inputs.
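
As an illustration of the kind of tooling envisioned here (not the authors' actual pipeline), the hedged sketch below classifies requirement sentences as functional or non-functional with a TF-IDF representation and a linear classifier from scikit-learn; the example sentences and labels are invented for demonstration.

```python
# Hedged sketch: classifying requirement sentences as functional (F) vs.
# non-functional (NF). The training data is a toy, invented sample;
# a real comparison would use a labelled requirements corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_texts = [
    "The system shall export reports as PDF files.",
    "The user can reset a forgotten password via e-mail.",
    "The application must respond to queries within two seconds.",
    "All stored personal data must be encrypted at rest.",
]
train_labels = ["F", "F", "NF", "NF"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(train_texts, train_labels)

print(clf.predict(["The service should be available 99.9% of the time."]))
```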


2021 ◽  
Author(s):  
Haluk Akay ◽  
Maria Yang ◽  
Sang-Gook Kim

Abstract Nearly every artifact of the modern engineering design process is digitally recorded and stored, resulting in an overwhelming amount of raw data detailing past designs. Analyzing this design knowledge and extracting functional information from sets of digital documents is a difficult and time-consuming task for human designers. For the case of textual documentation, poorly written, superfluous descriptions filled with jargon are especially challenging for junior designers with less domain expertise to read. If the task of reading documents to extract functional requirements could be automated, designers could benefit from the distillation of massive digital repositories of design documentation into valuable information that can inform engineering design. This paper presents a system for automating the extraction of structured functional requirements from textual design documents by applying state-of-the-art Natural Language Processing (NLP) models. A recursive method utilizing Machine Learning-based question-answering is developed to process design texts by initially identifying the highest-level functional requirement, and subsequently extracting additional requirements contained in the text passage. The efficacy of this system is evaluated by comparing the Machine Learning-based results with a study of 75 human designers performing the same design document analysis task on technical texts from the field of Microelectromechanical Systems (MEMS). The prospect of deploying such a system on the sum of all digital engineering documents suggests a future where design failures are less likely to be repeated and past successes may be consistently used to forward innovation.
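
The recursive question-answering idea can be sketched roughly as follows. This is a hedged illustration using a generic pretrained extractive QA model from the Hugging Face transformers library, not the authors' actual system; the model choice, question template, and recursion depth are assumptions.

```python
# Hedged sketch of recursive extractive question-answering over a design text.
# Model name, question wording, and recursion depth are illustrative
# assumptions, not the published configuration.
from transformers import pipeline

qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")

passage = (
    "The micro-actuator must deflect the cantilever by 5 micrometers "
    "while drawing less than 10 milliwatts, so that the sensor array "
    "can be repositioned without external power."
)

def extract_requirements(text, max_depth=2):
    """Ask for the main function, then recurse on the remaining text."""
    requirements = []
    for _ in range(max_depth):
        answer = qa(question="What must the device do?", context=text)
        requirements.append(answer["answer"])
        # Naive recursion: drop the extracted span and ask again.
        text = text.replace(answer["answer"], "", 1)
    return requirements

print(extract_requirements(passage))
```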


Author(s):  
Wai-Ling Mui ◽  
Edward P. Argenta ◽  
Teresa Quitugua ◽  
Christopher Kiley

Objective The National Biosurveillance Integration Center (NBIC) and the Defense Threat Reduction Agency's Chemical and Biological Technologies Department (DTRA J9 CB) have partnered to co-develop the Biosurveillance Ecosystem (BSVE), an emerging capability that aims to provide a virtual, customizable analyst workbench that integrates health and non-health data. This partnership promotes engagement between diverse health surveillance entities to increase awareness and improve decision-making capabilities. Introduction NBIC collects, analyzes, and shares key biosurveillance information to support the nation's response to biological events of concern. Integration of this information enables early warning and shared situational awareness to inform critical decision making, and direct response and recovery efforts. DTRA J9 CB leads DoD S&T to anticipate, defend, and safeguard against chemical and biological threats for the warfighter and the nation. These agencies have partnered to meet the evolving needs of the biosurveillance community and address gaps in technology and data sharing capabilities. High-profile events such as the 2009 H1N1 pandemic, the West African Ebola outbreak, and the recent emergence of Zika virus disease have underscored the need for integration of disparate biosurveillance systems to provide a more functional infrastructure. This allows analysts and others in the community to collect, analyze, and share relevant data across organizations securely and efficiently. Leveraging existing biosurveillance efforts provides the federal public health community, and its partners, with a comprehensive interagency platform that enables engagement and data sharing. Methods NBIC and DTRA are leveraging existing biosurveillance projects to share data feeds, work processes, resources, and lessons learned. A multi-stakeholder Agile process was implemented to represent the interests of NBIC, DTRA, and their respective partners. System requirements generated by both agencies were combined to form a single backlog of prioritized needs. Functional requirements from NBIC support the development of the prototype by refining system capabilities and providing an operational perspective. DTRA's technical expertise and research and development (R&D) portfolio ensure that robust analytic applications are embedded within a secure, scalable system architecture. Integration of analyst-validated data from the NBIC Biofeeds system serves as a gold standard to improve analytic development in machine learning and natural language processing. Additionally, working groups are formed using NBIC and DTRA extended partnerships with academia and private industry to expand R&D possibilities. These expansions include leveraging existing ontology efforts for improved system functionality and integrating social media algorithms for improved topic analysis output. Results The combined efforts of these two agencies to develop the BSVE and improve overall biosurveillance processes across the federal government have enhanced understanding of the needs of the community in a variety of mission spaces. To date, co-creation of products, joint analysis, and sharing of data feeds has become a major priority for both partners to advance biosurveillance outcomes. Within the larger efforts of system development, possible coordination with other agencies such as the Department of Veterans Affairs (VA) and the US Geological Survey (USGS) could expand the reach of the system to ensure fulfillment of health surveillance requirements as a whole. Conclusions The NBIC and DTRA partnership has demonstrated value in improving biosurveillance capabilities for each agency and their partners. BSVE will provide NBIC analysts with a collaborative tool that can leverage applications that visualize near real-time global epidemic and outbreak data from a range of unique and trusted sources. The continued collaboration means ongoing access to new data streams and analytic processes for all analysts, as well as advanced machine learning algorithms that increase capabilities for joint analysis, rapid product creation, and continuous interagency communication.


Author(s):  
Hazlina Shariff et al.

A key aspect of software quality is that the software operates correctly and meets user needs. A primary concern with non-functional requirements (NFRs) is that they are often neglected because the relevant information is hidden in the documents. NFRs represent tacit knowledge about the system, and users usually find it hard to describe them, so NFRs tend to be absent from the elicitation process. The software engineer therefore has to act proactively and ask the user for software quality criteria so that the objectives of the requirements can be achieved. To overcome these problems, we use machine learning to detect indicator terms of NFRs in textual requirements, so that the software engineer can be reminded to elicit the missing NFRs. We developed a prototype tool to support our approach, classifying the textual requirements using supervised machine learning algorithms. A survey was done to evaluate the effectiveness of the prototype tool in detecting NFRs.
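
A minimal sketch of indicator-term detection is shown below, assuming an invented lexicon of NFR cue words and a Naive Bayes classifier; it illustrates the general idea rather than the authors' prototype tool.

```python
# Minimal sketch: flagging requirements that likely contain a hidden NFR.
# The indicator lexicon, training sentences, and labels are invented examples.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

nfr_indicators = ["secure", "encrypted", "response", "available",
                  "usable", "scalable", "throughput", "latency"]

train = [
    ("Login must be secure and use encrypted connections.", 1),
    ("Pages should load with a response time under one second.", 1),
    ("The user can add items to the shopping cart.", 0),
    ("The admin can delete a registered account.", 0),
]
texts, labels = zip(*train)

# Restricting the vocabulary to indicator terms keeps the model focused
# on NFR cues rather than domain nouns.
model = make_pipeline(CountVectorizer(vocabulary=nfr_indicators), MultinomialNB())
model.fit(texts, labels)

for req in ["The report service must be available around the clock."]:
    if model.predict([req])[0] == 1:
        print("Possible non-functional requirement; remind the analyst:", req)
```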


Author(s):  
Sumit Kaur

Abstract- Deep learning is an emerging research area in the machine learning and pattern recognition field, introduced with the goal of moving machine learning closer to one of its original objectives, artificial intelligence. It tries to mimic the human brain, which is capable of processing and learning from complex input data and solving many kinds of complicated tasks well. Deep learning (DL) is based on a set of supervised and unsupervised algorithms that attempt to model higher-level abstractions in data and learn hierarchical representations for classification. In recent years it has attracted much attention due to its state-of-the-art performance in diverse areas such as object perception, speech recognition, computer vision, collaborative filtering and natural language processing. This paper presents a survey of different deep learning techniques for remote sensing image classification.
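
To ground the idea of hierarchical representation learning, here is a minimal, hedged sketch of a supervised convolutional classifier in PyTorch; the layer sizes, input resolution, and class count are arbitrary assumptions and are not tied to any specific remote sensing dataset from the survey.

```python
# Hedged sketch: a tiny convolutional network for image classification.
# Channel counts, image size (64x64 RGB), and the 10-class output are
# arbitrary illustrative choices, not values from the surveyed papers.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # Early layers learn low-level features (edges, textures);
        # later layers combine them into higher-level abstractions.
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = TinyCNN()
dummy_batch = torch.randn(4, 3, 64, 64)   # four fake 64x64 RGB tiles
print(model(dummy_batch).shape)           # torch.Size([4, 10])
```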


2017 ◽  
Author(s):  
Sabrina Jaeger ◽  
Simone Fulle ◽  
Samo Turk

Inspired by natural language processing techniques, we here introduce Mol2vec, an unsupervised machine learning approach to learn vector representations of molecular substructures. Similarly to Word2vec models, where vectors of closely related words are in close proximity in the vector space, Mol2vec learns vector representations of molecular substructures that point in similar directions for chemically related substructures. Compounds can finally be encoded as vectors by summing up the vectors of the individual substructures and, for instance, fed into supervised machine learning approaches to predict compound properties. The underlying substructure vector embeddings are obtained by training an unsupervised machine learning approach on a so-called corpus of compounds that consists of all available chemical matter. The resulting Mol2vec model is pre-trained once, yields dense vector representations and overcomes drawbacks of common compound feature representations such as sparseness and bit collisions. The prediction capabilities are demonstrated on several compound property and bioactivity data sets and compared with results obtained for Morgan fingerprints as the reference compound representation. Mol2vec can easily be combined with ProtVec, which employs the same Word2vec concept on protein sequences, resulting in a proteochemometric approach that is alignment-independent and can thus also easily be used for proteins with low sequence similarities.
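
A toy, hedged sketch of the underlying idea (substructure identifiers as "words", molecules as "sentences") using gensim's Word2Vec is given below. The identifier strings are invented placeholders, and this is not the released Mol2vec implementation, which is trained on a large corpus of real Morgan substructure identifiers.

```python
# Hedged toy sketch of the Mol2vec idea: treat substructure identifiers as
# "words", molecules as "sentences", train Word2vec, then sum the
# substructure vectors to embed a whole compound. The identifiers below
# are invented placeholders, not real Morgan identifiers.
import numpy as np
from gensim.models import Word2Vec

corpus = [
    ["847433064", "2246728737", "864942730"],   # "sentence" for molecule 1
    ["847433064", "3217380708", "864942730"],   # molecule 2
    ["2246728737", "3217380708", "847957139"],  # molecule 3
]

model = Word2Vec(sentences=corpus, vector_size=32, window=5,
                 min_count=1, sg=1, epochs=50)

def embed_compound(substructure_ids):
    """Sum substructure vectors to obtain a dense compound vector."""
    return np.sum([model.wv[s] for s in substructure_ids if s in model.wv], axis=0)

print(embed_compound(["847433064", "864942730"]).shape)  # (32,)
```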


Author(s):  
Rohan Pandey ◽  
Vaibhav Gautam ◽  
Ridam Pal ◽  
Harsh Bandhey ◽  
Lovedeep Singh Dhingra ◽  
...  

BACKGROUND The COVID-19 pandemic has uncovered the potential of digital misinformation in shaping the health of nations. The deluge of unverified information that spreads faster than the epidemic itself is an unprecedented phenomenon that has put millions of lives in danger. Mitigating this ‘Infodemic’ requires strong health messaging systems that are engaging, vernacular, scalable, effective and continuously learn the new patterns of misinformation. OBJECTIVE We created WashKaro, a multi-pronged intervention for mitigating misinformation through conversational AI, machine translation and natural language processing. WashKaro provides the right information, matched against WHO guidelines through AI, and delivers it in the right format in local languages. METHODS We theorize (i) an NLP-based AI engine that could continuously incorporate user feedback to improve the relevance of information, (ii) bite-sized audio in the local language to improve penetrance in a country with skewed gender literacy ratios, and (iii) conversational yet interactive AI engagement with users towards increased health awareness in the community. RESULTS A total of 5026 people downloaded the app during the study window; among them, 1545 were active users. Our study shows that 3.4 times more females engaged with the app in Hindi compared to males, the relevance of AI-filtered news content doubled within 45 days of continuous machine learning, and the prudence of the integrated AI chatbot “Satya” increased, demonstrating the usefulness of an mHealth platform to mitigate health misinformation. CONCLUSIONS We conclude that a multi-pronged machine learning application delivering vernacular bite-sized audios and conversational AI is an effective approach to mitigate health misinformation. CLINICALTRIAL Not Applicable
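
As a rough illustration of how content might be matched against guideline text (not the WashKaro production pipeline), the hedged sketch below scores a news snippet against a few hand-written guideline sentences with TF-IDF cosine similarity; the guideline sentences are paraphrased placeholders, not actual WHO text.

```python
# Hedged sketch: ranking guideline snippets by relevance to a piece of content.
# The guideline sentences are invented placeholders, and this is not the
# WashKaro system's real matching engine.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

guideline_snippets = [
    "Wash hands frequently with soap and water for at least twenty seconds.",
    "Wear a mask in crowded indoor spaces to reduce transmission.",
    "Vaccines approved by regulators are safe and effective.",
]

news_item = "Viral post claims washing hands with hot water alone kills the virus."

vectorizer = TfidfVectorizer(stop_words="english")
guideline_vecs = vectorizer.fit_transform(guideline_snippets)
news_vec = vectorizer.transform([news_item])

scores = cosine_similarity(news_vec, guideline_vecs).ravel()
best = scores.argmax()
print(f"Closest guideline ({scores[best]:.2f}): {guideline_snippets[best]}")
```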


Author(s):  
Timnit Gebru

This chapter discusses the role of race and gender in artificial intelligence (AI). The rapid permeation of AI into society has not been accompanied by a thorough investigation of the sociopolitical issues that cause certain groups of people to be harmed rather than advantaged by it. For instance, recent studies have shown that commercial automated facial analysis systems have much higher error rates for dark-skinned women, while having minimal errors on light-skinned men. Moreover, a 2016 ProPublica investigation uncovered that machine learning–based tools that assess crime recidivism rates in the United States are biased against African Americans. Other studies show that natural language–processing tools trained on news articles exhibit societal biases. While many technical solutions have been proposed to alleviate bias in machine learning systems, a holistic and multifaceted approach must be taken. This includes standardization bodies determining what types of systems can be used in which scenarios, making sure that automated decision tools are created by people from diverse backgrounds, and understanding the historical and political factors that disadvantage certain groups who are subjected to these tools.

