Supporting Biomimetic Design by Embedding Metadata in Natural-Language Corpora

Author(s):  
J. Ke ◽  
I. Chiu ◽  
J. S. Wallace ◽  
L. H. Shu

Biology is a good source of analogies for engineering design. One approach to retrieving biological analogies is to perform keyword searches on natural-language sources such as books, journals, etc. A challenge in retrieving information from natural-language sources is the potential requirement to process a large number of search results. This paper describes how inserting metadata such as part-of-speech, word sense and lexicographical data for each word in a natural-language source can help users identify relevant biological stimuli for biomimetic design. Although this research is still exploratory, initial qualitative observations demonstrate successful identification and separation of biological phenomena relevant to either desired functions or desired qualities. In addition, by incorporating the aforementioned metadata, we can automatically remove search results where search keywords act on abstract nouns or where keywords are used in irrelevant senses. The benefits of embedding metadata are demonstrated through a case study on the redesign of a fuel cell bipolar plate. In this case study, our method can be used to hide 64% of the search results that are unlikely to contain useful biological phenomena, reducing the effort to systematically identify relevant biological analogies.
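The filtering idea above (hiding results where the keyword is used as the wrong part of speech or acts on an abstract noun) can be sketched over pre-annotated sentences. This is a minimal illustration, not the authors' implementation: the `Token` format, the choice of WordNet lexicographer files treated as "abstract", and the "first noun after the verb is the object" heuristic are all assumptions, and the lexicographer labels on the sample tokens are hand-assigned.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    """One word of a pre-annotated sentence (hypothetical annotation format)."""
    word: str
    lemma: str
    pos: str           # Penn Treebank-style part-of-speech tag
    lexname: str = ""  # WordNet lexicographer file, filled in for nouns only

# Which lexicographer files count as "abstract" is an assumption of this sketch.
ABSTRACT_LEXNAMES = {"noun.attribute", "noun.cognition", "noun.communication",
                     "noun.feeling", "noun.state"}

def keep_result(sentence, keyword):
    """True if `keyword` is used as a verb whose first following noun is concrete."""
    for i, tok in enumerate(sentence):
        if tok.lemma != keyword or not tok.pos.startswith("VB"):
            continue  # different word, or keyword used as another part of speech
        for nxt in sentence[i + 1:]:
            if nxt.pos.startswith("NN"):  # naive object detection
                if nxt.lexname not in ABSTRACT_LEXNAMES:
                    return True
                break
    return False

# Two hand-annotated search results for the keyword "remove".
results = [
    [Token("Kidneys", "kidney", "NNS", "noun.body"), Token("remove", "remove", "VBP"),
     Token("water", "water", "NN", "noun.substance")],
    [Token("Sleep", "sleep", "NN", "noun.state"), Token("removes", "remove", "VBZ"),
     Token("fear", "fear", "NN", "noun.feeling")],
]
shown = [s for s in results if keep_result(s, "remove")]
```

Only the first result is shown; the second is hidden because "remove" acts on a noun from an abstract category. A real system would use a parser to find the verb's object rather than the next noun.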

Author(s):  
Ji Ke ◽  
J. S. Wallace ◽  
L. H. Shu

Biology is a good source of analogies for engineering design. One approach to retrieving biological analogies is to perform keyword searches on natural-language sources such as books, journals, etc. A challenge in retrieving information from natural-language sources is the potential requirement to process a large number of search results. This paper describes a categorization method that organizes a large group of diverse biological information into meaningful categories. The benefits of the categorization functionality are demonstrated through a case study on the redesign of a fuel cell bipolar plate. In this case study, our categorization method reduced the effort to systematically identify biological phenomena by up to ∼80%.


2014 ◽  
Vol 136 (11) ◽  
Author(s):  
Tao Feng ◽  
Hyunmin Cheong ◽  
L. H. Shu

The natural-language approach to identifying biological analogies exploits the existing format of much biological knowledge, beyond databases created for biomimetic design. However, designers may need to select analogies from search results, during which biases may exist toward: specific words in descriptions of biological phenomena, familiar organisms and scales, and strategies that match preconceived solutions. Therefore, we conducted two experiments to study the effect of abstraction on overcoming these biases and selecting biological phenomena based on analogical similarities. Abstraction in our experiments involved replacing biological nouns with hypernyms. The first experiment asked novice designers to choose between a phenomenon suggesting a highly useful strategy for solving a given problem, and another suggesting a less-useful strategy, but featuring bias elements. The second experiment asked novice designers to evaluate the relevance of two biological phenomena that suggest similarly useful strategies to solve a given problem. Neither experiment demonstrated the anticipated benefits of abstraction. Instead, our abstraction led to: (1) participants associating nonabstracted words with design problems and (2) increased difficulty in understanding descriptions of biological phenomena. We recommend investigating other ways to implement abstraction when developing similar tools or techniques that aim to support biomimetic design.
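The abstraction manipulation described above, replacing biological nouns with hypernyms, amounts to a dictionary-driven word substitution. The sketch below assumes a small hand-picked noun-to-hypernym table (a real tool might derive it from WordNet, but these entries are illustrative, not the experiments' stimuli). Note that the study reports this kind of abstraction did not deliver the anticipated benefits.

```python
import re

# Hand-picked, illustrative hypernym table (hypothetical; not WordNet output).
HYPERNYMS = {
    "gecko": "lizard",
    "lotus": "plant",
    "leaf": "plant organ",
    "leaves": "plant organs",
}

def abstract_description(text, hypernyms=HYPERNYMS):
    """Replace each listed biological noun with its hypernym."""
    def swap(match):
        word = match.group(0)
        replacement = hypernyms.get(word.lower())
        if replacement is None:
            return word  # not a listed biological noun: keep as-is
        # Preserve sentence-initial capitalization.
        return replacement[0].upper() + replacement[1:] if word[0].isupper() else replacement
    return re.sub(r"[A-Za-z]+", swap, text)

print(abstract_description("The lotus sheds water because its leaves are rough."))
```

This simple substitution leaves every other word untouched, which mirrors one reported side effect: participants could still latch onto the nonabstracted words. It also does not fix articles ("a" vs. "an") or agreement after substitution.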


Author(s):  
Tao Feng ◽  
Hyunmin Cheong ◽  
L. H. Shu

The natural-language approach to identifying biological analogies exploits the existing format of much biological knowledge, beyond databases created for biomimetic design. However, designers may need to select analogies from search results, during which biases may exist towards: specific words in descriptions of biological phenomena, familiar organisms and scales, and strategies that match preconceived solutions. Therefore, we conducted two experiments to study the effect of abstraction on overcoming these biases and selecting biological phenomena based on analogical similarities. Abstraction in our experiments involved replacing biological nouns with hypernyms. The first experiment asked novice designers to choose between a phenomenon suggesting a highly useful strategy for solving a given problem, and another suggesting a less-useful strategy, but featuring bias elements. The second experiment asked novice designers to evaluate the relevance of two biological phenomena that suggest similarly useful strategies to solve a given problem. Neither experiment demonstrated the anticipated benefits of abstraction. Instead, our abstraction led to: 1) novice designers associating non-abstracted words with design problems and 2) increased difficulty in understanding descriptions of biological phenomena. We recommend investigating other ways in which abstraction can be implemented when designing similar tools or techniques that aim to support biomimetic design and other design-by-analogy work.


Author(s):  
Marina Sokolova ◽  
Stan Szpakowicz

This chapter presents applications of machine learning techniques to traditional problems in natural language processing, including part-of-speech tagging, entity recognition and word-sense disambiguation. People usually solve such problems without difficulty, or at least solve them very well. Linguistics may suggest labour-intensive ways of manually constructing rule-based systems. It is, however, the ready availability of large text collections that has made machine learning the method of choice for processing volumes of data far beyond human capacity. A main purpose of such text processing is information and knowledge extraction of all kinds from these large collections. The machine learning methods discussed in this chapter have stimulated wide-ranging research in natural language processing and helped build applications with serious deployment potential.
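To make the contrast with hand-built rules concrete, the simplest data-driven part-of-speech tagger learns each word's most frequent tag from an annotated corpus. This most-frequent-tag baseline is offered only as an illustration of learning from labelled text; it is not claimed to be one of the chapter's methods, and the tiny training set and Penn Treebank-style tags are assumptions.

```python
from collections import Counter, defaultdict

def train_unigram_tagger(tagged_sentences):
    """Learn each word's most frequent tag from hand-tagged (word, tag) sentences."""
    per_word = defaultdict(Counter)
    overall = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            per_word[word.lower()][tag] += 1
            overall[tag] += 1
    lexicon = {w: c.most_common(1)[0][0] for w, c in per_word.items()}
    fallback = overall.most_common(1)[0][0]  # most frequent tag, used for unseen words
    return lexicon, fallback

def tag(words, lexicon, fallback):
    """Tag each word with its learned most-frequent tag, or the fallback tag."""
    return [(w, lexicon.get(w.lower(), fallback)) for w in words]

# Toy annotated corpus (hypothetical).
training = [
    [("the", "DT"), ("cell", "NN"), ("divides", "VBZ")],
    [("cells", "NNS"), ("divide", "VBP"), ("rapidly", "RB")],
    [("membrane", "NN"), ("protects", "VBZ"), ("cell", "NN")],
]
lexicon, fallback = train_unigram_tagger(training)
print(tag(["The", "cell", "divides"], lexicon, fallback))
```

Because it ignores context, this baseline cannot disambiguate words like "divide" (noun or verb); sequence models such as HMMs or discriminative classifiers address that by conditioning on neighbouring tags or features.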


Author(s):  
Hyunmin Cheong ◽  
L. H. Shu

Identifying relevant analogies from biology is a significant challenge in biomimetic design. Our natural-language approach addresses this challenge by developing techniques to search biological information in natural-language format, such as books or papers. This paper presents the application of natural-language processing techniques, such as part-of-speech tags, typed-dependency parsing, and syntactic patterns, to automatically extract and categorize causally related functions from text with biological information. Causally related functions, which specify how one action is enabled by another action, are considered important for both knowledge representation used to model biological information and analogical transfer of biological information performed by designers. An extraction algorithm was developed and scored F-measures of 0.78–0.85 in an initial development test. Because this research approach uses inexpensive and domain-independent techniques, the extraction algorithm has the potential to automatically identify patterns of causally related functions from a large amount of text that contains either biological or design information.
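A causally related function pair, where one action is enabled by another, often surfaces in text as "VERB ... by VERB-ing" (e.g., "reduce water loss by closing stomata"). The sketch below matches that single syntactic pattern over POS-tagged tokens and scores predictions with a balanced F-measure (F1). It is a simplified, hypothetical stand-in: the paper's algorithm also uses typed-dependency parsing and more patterns, and the example sentences and gold pairs are invented for illustration.

```python
def extract_causal_pairs(tagged):
    """Return (enabled_verb, enabling_verb) pairs matching 'VERB ... by VERB-ing'."""
    pairs = []
    last_verb = None
    for i, (word, tag) in enumerate(tagged):
        if tag.startswith("VB") and tag != "VBG":
            last_verb = word.lower()          # candidate enabled action
        elif word.lower() == "by" and last_verb and i + 1 < len(tagged):
            next_word, next_tag = tagged[i + 1]
            if next_tag == "VBG":             # gerund after "by" = enabling action
                pairs.append((last_verb, next_word.lower()))
    return pairs

def f_measure(predicted, gold):
    """Balanced F-measure (F1) of predicted pairs against gold-standard pairs."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(gold)
    return 2 * precision * recall / (precision + recall)

sentences = [
    [("Plants", "NNS"), ("reduce", "VBP"), ("water", "NN"), ("loss", "NN"),
     ("by", "IN"), ("closing", "VBG"), ("stomata", "NNS")],
    [("Geckos", "NNS"), ("climb", "VBP"), ("walls", "NNS"), ("by", "IN"),
     ("using", "VBG"), ("setae", "NNS")],
]
predicted = [p for s in sentences for p in extract_causal_pairs(s)]
print(predicted, f_measure(predicted, [("reduce", "closing"), ("seal", "secreting")]))
```

With one of two predictions correct and one of two gold pairs recovered, precision and recall are both 0.5, so F1 is 0.5.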


Author(s):  
L.H. Shu

This paper summarizes various aspects of identifying and applying biological analogies in engineering design using a natural-language approach. To avoid the immense as well as potentially biased task of creating a biological database specifically for engineering design, the chosen approach searches biological knowledge in natural-language format, such as books and papers, for instances of keywords describing the engineering problem. Strategies developed to facilitate this search are identified, and how text descriptions of biological phenomena are used in problem solving is summarized. Several application case studies are reported to illustrate the approach. The value of the natural-language approach is demonstrated by its ability to identify relevant biological analogies that are not limited to those entered into a database specifically for engineering design.


2021 ◽  
Vol 11 (23) ◽  
pp. 11119
Author(s):  
Van-Hai Vu ◽  
Quang-Phuoc Nguyen ◽  
Ebipatei Victoria Tunyan ◽  
Cheol-Young Ock

With the recent evolution of deep learning, machine translation (MT) models and systems are being steadily improved. However, research on MT for low-resource languages such as Vietnamese and Korean is still very limited. In recent years, a state-of-the-art context-based embedding model introduced by Google, bidirectional encoder representations from transformers (BERT), has begun to be incorporated into neural MT (NMT) models in various ways to enhance the accuracy of MT systems. The BERT model for Vietnamese has been developed and has significantly improved natural language processing (NLP) tasks such as part-of-speech (POS) tagging, named-entity recognition, dependency parsing, and natural language inference. Our research experimented with applying the Vietnamese BERT model to provide POS tagging and morphological analysis (MA) for Vietnamese sentences, and with applying word-sense disambiguation (WSD) to Korean sentences in our Vietnamese–Korean bilingual corpus. In the Vietnamese–Korean NMT system with contextual embedding, the Vietnamese BERT model is connected concurrently to both the encoder and decoder layers of the NMT model. Experimental results assessed with the BLEU, METEOR, and TER metrics show that contextual embedding significantly improves the quality of Vietnamese–Korean NMT.


Author(s):  
Ivey Chiu ◽  
L.H. Shu

Biomimetic, or biologically inspired, design uses analogous biological phenomena to develop solutions for engineering problems. Several instances of biomimetic design result from personal observations of biological phenomena. However, many engineers' knowledge of biology may be limited, thus reducing the potential of biologically inspired solutions. Our approach to biomimetic design takes advantage of the large amount of biological knowledge already available in books, journals, and so forth, by performing keyword searches on these existing natural-language sources. Because of the ambiguity and imprecision of natural language, challenges inherent to natural language processing were encountered. One challenge of retrieving relevant cross-domain information involves differences in domain vocabularies, or lexicons. A keyword meaningful to biologists may not occur to engineers. For an example problem that involved cleaning, that is, removing dirt, a biochemist suggested the keyword “defend.” Defend is not an obvious keyword to most engineers for this problem, nor are the words defend and “clean/remove” directly related within lexical references. However, previous work showed that biological phenomena retrieved by the keyword defend provided useful stimuli and produced successful concepts for the clean/remove problem. In this paper, we describe a method to systematically bridge the disparate biology and engineering domains using natural language analysis. For the clean/remove example, we were able to algorithmically generate several biologically meaningful keywords, including defend, that are not obviously related to the engineering problem. We developed a method to organize and rank the set of biologically meaningful keywords identified, and confirmed that we could achieve similar results for two other examples in encapsulation and microassembly. 
Although we specifically address cross-domain information retrieval from biology, the bridging process presented in this paper is not limited to biology, and can be used for any other domain given the availability of appropriate domain-specific knowledge sources and references.
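One simple way to surface candidate "bridge" keywords like "defend" is to rank words that co-occur in the same sentence as the engineering keyword within a biology corpus. The sketch below is a hypothetical illustration of that idea, not the paper's bridging or ranking method: the stopword list, the sentence-level co-occurrence window, the absence of stemming, and the toy corpus are all assumptions.

```python
import re
from collections import Counter

# Minimal, hypothetical stopword list.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "by", "from", "in",
             "its", "their", "is", "are", "they", "it"}

def bridge_candidates(sentences, seed_terms, top_n=5):
    """Rank words that co-occur in a sentence with any seed term (no stemming)."""
    seeds = {s.lower() for s in seed_terms}
    counts = Counter()
    for sentence in sentences:
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if words & seeds:
            counts.update(w for w in words - seeds if w not in STOPWORDS)
    # Highest count first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

corpus = [
    "Cilia remove mucus to defend the airway.",
    "Leukocytes defend tissue and remove pathogens.",
    "Enzymes remove damaged proteins.",
    "Plants grow toward light.",
]
print(bridge_candidates(corpus, ["remove"], top_n=3))
```

Raw co-occurrence counts favour generally frequent words, so a practical ranking would normalize against a general-English reference corpus and filter candidates by part of speech (for example, keeping only verbs).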


2021 ◽  
Vol 8 (5) ◽  
pp. 1039
Author(s):  
Ilham Firmansyah ◽  
Putra Pandu Adikara ◽  
Sigit Adinugroho

Natural language is the language humans use in written or spoken form. The field of computing that processes it is called natural language processing (NLP): the study of how to process human language and extract information from it with computing technology. One NLP task is part-of-speech tagging, which automatically assigns each word or phrase in a sentence a word class from a predefined tagset. Among other uses, this helps interpret words that have more than one meaning, where the intended meaning depends on context. Part-of-speech tagging underlies other NLP tasks such as machine translation, word sense disambiguation, and sentiment analysis, and it can be applied to any natural language, including Madurese. Madurese is a regional language spoken by the Madurese people, with a morphology similar to that of Indonesian. This part-of-speech tagging study of Madurese uses the Viterbi algorithm and consists of three processes: pre-processing of the training and testing corpora, training a Hidden Markov Model on the training corpus, and classifying word classes with the Viterbi algorithm. The tagset used for Madurese contains 19 classes designed by an expert. The system was evaluated with a multiclass confusion matrix and achieved a micro-averaged accuracy of 0.96, with micro-averaged precision and recall both equal to 0.68. Precision and recall could still be improved by adding more training data.
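The final step of the pipeline above, choosing the most probable tag sequence under a Hidden Markov Model, is what the Viterbi algorithm computes. The sketch below is a generic log-space Viterbi decoder; the two-tag model, English toy words, probabilities, and the small `unk` emission probability for unseen words are all invented for illustration and are not the study's 19-tag Madurese model (whose probabilities would be estimated from the tagged training corpus).

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, unk=1e-6):
    """Most probable tag sequence for `words` under a first-order HMM.

    Works in log space to avoid numerical underflow on long sentences.
    Words unseen for a tag get a small (assumed) emission probability `unk`.
    """
    def emit(tag, word):
        return math.log(emit_p[tag].get(word, unk))

    # best[i][tag] = (log-prob of best path ending in `tag` at word i, previous tag)
    best = [{t: (math.log(start_p[t]) + emit(t, words[0]), None) for t in tags}]
    for i in range(1, len(words)):
        best.append({
            t: max((best[i - 1][p][0] + math.log(trans_p[p][t]) + emit(t, words[i]), p)
                   for p in tags)
            for t in tags
        })

    # Backtrack from the highest-scoring final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]

# Toy model (hypothetical probabilities).
tags = ["NOUN", "VERB"]
start_p = {"NOUN": 0.7, "VERB": 0.3}
trans_p = {"NOUN": {"NOUN": 0.3, "VERB": 0.7}, "VERB": {"NOUN": 0.8, "VERB": 0.2}}
emit_p = {"NOUN": {"fish": 0.5, "swim": 0.1, "water": 0.4},
          "VERB": {"fish": 0.3, "swim": 0.6, "water": 0.1}}
print(viterbi(["fish", "swim"], tags, start_p, trans_p, emit_p))
```

For "fish swim" the best path is NOUN then VERB (0.7 × 0.5 × 0.7 × 0.6 = 0.147, beating every alternative), while for "fish water" the decoder prefers NOUN NOUN.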

