KoRASA: Pipeline Optimization for Open-Source Korean Natural Language Understanding Framework Based on Deep Learning

2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Myeong-Ha Hwang ◽  
Jikang Shin ◽  
Hojin Seo ◽  
Jeong-Seon Im ◽  
Hee Cho

Since the emergence of deep learning-based chatbots for knowledge services, numerous research and development projects have been conducted in various industries. High demand for chatbots has drastically increased the global market size; however, the limited functional scalability of open-domain chatbots is a challenge to their industrial application. Moreover, as most chatbot frameworks are built for English, it is necessary to create chatbots customized for other languages. To address this problem, this paper proposes KoRASA as a pipeline-optimization method, which uses a deep learning-based open-source chatbot framework to understand the Korean language. KoRASA is a closed-domain chatbot that is applicable across a wide range of industries in Korea. KoRASA's operation consists of four stages: tokenization, featurization, intent classification, and entity extraction. The accuracy and F1-score of KoRASA were measured on datasets drawn from tasks common to most industrial fields. The algorithm for intent classification and entity extraction was optimized. The accuracy and F1-score were 98.2% and 98.4% for intent classification and 97.4% and 94.7% for entity extraction, respectively. These results are better than those achieved by existing models. Accordingly, KoRASA can be applied to various industries, including mobile services based on closed-domain chatbots using Korean, robotic process automation (RPA), edge computing, and Internet of Energy (IoE) services.
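The abstract names the four stages but not their implementation; below is a minimal, hypothetical Python sketch of the pipeline flow (tokenization, featurization, intent classification, entity extraction). All function names and the toy logic are illustrative assumptions, not KoRASA's actual API.

```python
# Hypothetical sketch of a four-stage NLU pipeline like KoRASA's.
from typing import Dict, List, Tuple

def tokenize(utterance: str) -> List[str]:
    # Stage 1: Korean is agglutinative, so a real pipeline would use a
    # morphological analyzer; whitespace splitting stands in here.
    return utterance.split()

def featurize(tokens: List[str]) -> List[List[float]]:
    # Stage 2: map tokens to vectors. A real system would use pretrained
    # embeddings; token length is a toy stand-in feature.
    return [[float(len(tok))] for tok in tokens]

def classify_intent(feats: List[List[float]]) -> str:
    # Stage 3: a trained classifier would score intents; hardcoded here.
    return "greet"

def extract_entities(tokens: List[str],
                     feats: List[List[float]]) -> Dict[str, str]:
    # Stage 4: a sequence tagger would label entity spans; empty here.
    return {}

def pipeline(utterance: str) -> Tuple[str, Dict[str, str]]:
    tokens = tokenize(utterance)
    feats = featurize(tokens)
    return classify_intent(feats), extract_entities(tokens, feats)
```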

2021 ◽  
Author(s):  
Sravya Sravya ◽  
Andriy Miranskyy ◽  
Ayse Bener

Software bug localization involves a significant amount of time and effort on the part of the software developer. Many state-of-the-art bug localization models have been proposed in the past to help developers localize bugs more easily. However, none of these models meet the adoption thresholds of software practitioners. Recently, some deep learning-based models have been proposed and shown to perform better than the previous state of the art. With this motivation, we experiment with Convolutional Neural Networks (CNNs) to examine their effectiveness in localizing bugs. We also train a SimpleLogistic model as a baseline for our experiments. We train both models on five open-source Java projects and compare their performance across the projects. Our experiments show that the CNN models perform better than the SimpleLogistic models in most cases but do not meet the adoption criteria set by practitioners.
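The abstract does not specify the network architecture; the following is a minimal sketch, assuming tokenized (bug report, source file) pairs are encoded as a single padded integer sequence and scored for relevance with a 1D CNN. The vocabulary size and all hyperparameters are illustrative, not the paper's.

```python
# Minimal 1D-CNN relevance classifier sketch for bug localization.
import tensorflow as tf

VOCAB_SIZE = 20_000  # assumed vocabulary size

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 64),        # token embeddings
    tf.keras.layers.Conv1D(128, kernel_size=5,
                           activation="relu"),        # local n-gram features
    tf.keras.layers.GlobalMaxPooling1D(),             # strongest signal per filter
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # P(file contains the bug)
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```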



Author(s):  
Svitlana Lobchenko ◽  
Tetiana Husar ◽  
Viktor Lobchenko

This article presents the results of studies of sperm viability after different incubation times, at different concentrations, and with different diluents. Sperm was diluted 1) with its native plasma, 2) with medium 199, or 3) with a mixture of equal volumes of plasma and medium 199. The experiment was designed to generate samples with sperm concentrations of 0.2, 0.1, 0.05, and 0.025 billion/ml. The sperm was evaluated after 2, 4, 6, and 8 hours. Such a study is promising because it makes it possible to investigate various aspects of the subject over a wide range of conditions; a series of experiments was therefore conducted in this area. The data obtained were statistically processed and allow us to highlight the results of each stage of the study. In particular, this article identifies some regularities between sperm viability, the type of diluent, and the degree of dilution, as evidenced by the data presented in the tables. After incubation, sperm viability remained highest (at least as a trend) when the sperm was diluted to a concentration of 0.1 billion/ml, regardless of the diluent used. For maintaining viability at this concentration, medium 199 was no better than native plasma or a mixture of medium 199 with an equal volume of plasma, at any incubation time. It is most often at this concentration that viability showed the lowest coefficient of variation, regardless of diluent, which may indicate the greatest stability of the result under these conditions. Viability at a concentration of 0.1 billion/ml decreased statistically significantly only after 6 or even 8 hours of incubation. When sperm was incubated for only 2 hours, the tested concentrations did not affect viability, regardless of the diluent used. Key words: boar, spermatozoa, sperm plasma, concentration, incubation, medium 199, activity, viability, dilution.


1996 ◽  
Vol 118 (3) ◽  
pp. 439-443 ◽  
Author(s):  
Chuen-Huei Liou ◽  
Hsiang Hsi Lin ◽  
F. B. Oswald ◽  
D. P. Townsend

This paper presents a computer simulation showing how the gear contact ratio affects the dynamic load on a spur gear transmission. The contact ratio can be affected by the tooth addendum, the pressure angle, the tooth size (diametral pitch), and the center distance. The analysis presented in this paper was performed by using the NASA gear dynamics code DANST. In the analysis, the contact ratio was varied over the range 1.20 to 2.40 by changing the length of the tooth addendum. In order to simplify the analysis, other parameters related to contact ratio were held constant. The contact ratio was found to have a significant influence on gear dynamics. Over a wide range of operating speeds, a contact ratio close to 2.0 minimized dynamic load. For low-contact-ratio gears (contact ratio less than two), increasing the contact ratio reduced gear dynamic load. For high-contact-ratio gears (contact ratio equal to or greater than 2.0), the selection of contact ratio should take into consideration the intended operating speeds. In general, high-contact-ratio gears minimized dynamic load better than low-contact-ratio gears.
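For readers who want to reproduce the addendum-to-contact-ratio relationship the study varies, the standard involute spur-gear formula can be computed directly. The gear parameters below are illustrative, not those of the paper's test case.

```python
# Contact ratio of an external involute spur-gear pair:
#   CR = (sqrt(ra1^2 - rb1^2) + sqrt(ra2^2 - rb2^2) - C*sin(phi)) / (pi*m*cos(phi))
import math

def contact_ratio(n1, n2, module, pressure_angle_deg, addendum):
    phi = math.radians(pressure_angle_deg)
    r1, r2 = module * n1 / 2, module * n2 / 2            # pitch radii
    rb1, rb2 = r1 * math.cos(phi), r2 * math.cos(phi)    # base radii
    ra1, ra2 = r1 + addendum, r2 + addendum              # outside (addendum) radii
    center = r1 + r2                                     # standard center distance
    line_of_action = (math.sqrt(ra1**2 - rb1**2)
                      + math.sqrt(ra2**2 - rb2**2)
                      - center * math.sin(phi))
    base_pitch = math.pi * module * math.cos(phi)
    return line_of_action / base_pitch

# Lengthening the addendum raises the contact ratio (illustrative gears):
for addendum_factor in (1.0, 1.35):
    cr = contact_ratio(28, 28, 3.0, 20.0, addendum_factor * 3.0)
    print(f"addendum = {addendum_factor:.2f} * module -> CR = {cr:.2f}")
```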


2021 ◽  
Vol 5 (2) ◽  
pp. 17
Author(s):  
Valli Trisha ◽  
Kai Seng Koh ◽  
Lik Yin Ng ◽  
Vui Soon Chok

Limited research on heat integration has been conducted in the oleochemical field. This paper evaluates the performance of an existing heat exchanger network (HEN) of a 600 tonnes-per-day (TPD) oleochemical plant in Malaysia, with emphasis on annual cost savings and reduction in energy consumption. Using commercial HEN software, Aspen Energy Analyzer v10.0, it was found that the performance of the current HEN is excellent, saving over 80% in annual costs and reducing energy consumption by 1,882,711 gigajoules per year (GJ/year). Further analysis of the HEN was performed to identify untapped heating/cooling process streams with optimisation potential. Two cases, the most cost-effective and the most energy-efficient, were proposed with positive results. The second case performed better than the first, with a lower payback time (0.83 year) and higher annual savings (0.20 million USD/year) from the addition of one heat exchanger at a capital cost of USD 134,620. The first case had a higher payback time (4.64 years), lower annual savings (0.05 million USD/year), and three additional heaters at a capital cost of USD 193,480. This research provides new insight for the oleochemical industry: retrofitting the HEN can further reduce energy consumption, which in turn reduces the overall production cost of oleochemical commodities. This is particularly crucial in making products more competitively priced in the global market.
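The ranking of the two cases follows from the simple payback relation, capital cost divided by net annual savings. A quick check with the rounded figures quoted above reproduces that ranking; the paper's exact payback times (0.83 and 4.64 years) imply slightly different unrounded savings.

```python
# Simple payback = capital cost / annual savings (rounded abstract figures).
cases = {
    "Case 1 (three heaters)":  {"capital_usd": 193_480, "savings_usd_per_yr": 50_000},
    "Case 2 (one exchanger)":  {"capital_usd": 134_620, "savings_usd_per_yr": 200_000},
}
for name, c in cases.items():
    payback_yr = c["capital_usd"] / c["savings_usd_per_yr"]
    print(f"{name}: {payback_yr:.2f} years")  # Case 2 pays back far sooner
```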


2021 ◽  
Vol 49 (1) ◽  
pp. 030006052098284
Author(s):  
Tingting Qiao ◽  
Simin Liu ◽  
Zhijun Cui ◽  
Xiaqing Yu ◽  
Haidong Cai ◽  
...  

Objective: To construct deep learning (DL) models to improve the accuracy and efficiency of thyroid disease diagnosis by thyroid scintigraphy. Methods: We constructed DL models with AlexNet, VGGNet, and ResNet. The models were trained separately with transfer learning. We measured each model's performance with six indicators: recall, precision, negative predictive value (NPV), specificity, accuracy, and F1-score. We also compared the diagnostic performance of first- and third-year nuclear medicine (NM) residents with assistance from the best-performing DL-based model. The Kappa coefficient and average classification time of each model were compared with those of the two NM residents. Results: The recall, precision, NPV, specificity, accuracy, and F1-score of the three models ranged from 73.33% to 97.00%. The Kappa coefficient of all three models was >0.710. All models performed better than the first-year NM resident but not as well as the third-year NM resident in terms of diagnostic ability. However, the ResNet model provided "diagnostic assistance" to the NM residents. The models provided results 400 to 600 times faster than the NM residents. Conclusion: DL-based models perform well in diagnostic assessment by thyroid scintigraphy. These models may serve as tools for NM residents in the diagnosis of Graves' disease and subacute thyroiditis.
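The abstract gives no training details beyond "transfer learning"; a minimal sketch of that setup in PyTorch, assuming an ImageNet-pretrained ResNet with its classification head replaced for the scintigraphy classes, might look as follows. The class count and ResNet depth are assumptions.

```python
# Transfer-learning sketch: freeze a pretrained backbone, train a new head.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 3  # e.g. Graves' disease, subacute thyroiditis, normal (assumed)

model = models.resnet34(weights=models.ResNet34_Weights.IMAGENET1K_V1)
for p in model.parameters():
    p.requires_grad = False                  # freeze pretrained weights
model.fc = nn.Linear(model.fc.in_features,
                     NUM_CLASSES)            # new trainable classification head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
```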


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1031
Author(s):  
Joseba Gorospe ◽  
Rubén Mulero ◽  
Olatz Arbelaitz ◽  
Javier Muguerza ◽  
Miguel Ángel Antón

Deep learning techniques are being used increasingly in the scientific community as a consequence of the high computational capacity of current systems and the growing amount of data available from the digitalisation of society in general and the industrial world in particular. In addition, the rise of edge computing, which focuses on integrating artificial intelligence as close as possible to the client, makes it possible to implement systems that act in real time without the need to transfer all of the data to centralised servers. The combination of these two concepts can yield systems with the capacity to make correct decisions and act on them immediately and in situ. Despite this, the low capacity of embedded systems greatly hinders this integration, so the possibility of integrating them into a wide range of microcontrollers can be a great advantage. This paper contributes an environment based on Mbed OS and TensorFlow Lite that can be embedded in any general-purpose embedded system, allowing the introduction of deep learning architectures. The experiments herein show that the proposed system is competitive when compared with other commercial systems.
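The environment itself is Mbed OS firmware; on the model side, the usual TensorFlow Lite workflow it builds on converts a trained Keras model into a quantized flatbuffer that is then embedded on-device as a C array. A minimal sketch with a placeholder model:

```python
# Convert a trained Keras model to a quantized TFLite flatbuffer for
# a microcontroller target. The tiny model below is a placeholder.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(4, activation="softmax"),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # post-training quantization
tflite_bytes = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_bytes)
# On the embedded side the flatbuffer is typically turned into a C array
# (e.g. `xxd -i model.tflite`) and linked into the Mbed OS firmware.
```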


2020 ◽  
Vol 6 (1) ◽  
Author(s):  
Malte Seemann ◽  
Lennart Bargsten ◽  
Alexander Schlaefer

Deep learning methods produce promising results when applied to a wide range of medical imaging tasks, including segmentation of the artery lumen in computed tomography angiography (CTA) data. However, to perform sufficiently well, neural networks have to be trained on large amounts of high-quality annotated data. In the realm of medical imaging, annotations are not only scarce but also often not entirely reliable. To tackle both challenges, we developed a two-step approach for generating realistic synthetic CTA data for the purpose of data augmentation. In the first step, moderately realistic images are generated in a purely numerical fashion. In the second step, these images are improved by applying neural domain adaptation. We evaluated the impact of the synthetic data on lumen segmentation via convolutional neural networks (CNNs) by comparing the resulting performances. Improvements of up to 5% in terms of Dice coefficient and 20% for Hausdorff distance represent a proof of concept that the proposed augmentation procedure can be used to enhance deep learning-based segmentation of the artery lumen in CTA images.
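The two reported metrics are standard and straightforward to reproduce; a minimal sketch comparing a predicted and a reference lumen mask (the toy arrays and shapes are illustrative):

```python
# Dice coefficient and Hausdorff distance between two binary masks.
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dice(pred: np.ndarray, ref: np.ndarray) -> float:
    pred, ref = pred.astype(bool), ref.astype(bool)
    inter = np.logical_and(pred, ref).sum()
    return 2.0 * inter / (pred.sum() + ref.sum())

def hausdorff(pred: np.ndarray, ref: np.ndarray) -> float:
    a, b = np.argwhere(pred), np.argwhere(ref)   # foreground pixel coordinates
    return max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])

pred = np.zeros((64, 64), dtype=bool); pred[20:40, 20:40] = True
ref  = np.zeros((64, 64), dtype=bool); ref[22:42, 22:42] = True
print(f"Dice = {dice(pred, ref):.3f}, Hausdorff = {hausdorff(pred, ref):.2f}")
```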


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Christian Crouzet ◽  
Gwangjin Jeong ◽  
Rachel H. Chae ◽  
Krystal T. LoPresti ◽  
Cody E. Dunn ◽  
...  

Cerebral microhemorrhages (CMHs) are associated with cerebrovascular disease, cognitive impairment, and normal aging. One method to study CMHs is to analyze histological sections (5–40 μm) stained with Prussian blue. Currently, users manually and subjectively identify and quantify Prussian blue-stained regions of interest, which is prone to inter-individual variability and can lead to significant delays in data analysis. To improve this labor-intensive process, we developed and compared three digital pathology approaches to identify and quantify CMHs from Prussian blue-stained brain sections: (1) ratiometric analysis of RGB pixel values, (2) phasor analysis of RGB images, and (3) deep learning using a mask region-based convolutional neural network. We applied these approaches to a preclinical mouse model of inflammation-induced CMHs. One hundred CMHs were imaged using a 20× objective and an RGB color camera. To determine the ground truth, four users independently annotated Prussian blue-labeled CMHs. Compared with the ground truth, the deep learning and ratiometric approaches performed better than the phasor analysis approach. The deep learning approach was the most precise of the three methods; the ratiometric approach was the most versatile and maintained accuracy, albeit with less precision. Our data suggest that implementing these methods to analyze CMH images can drastically increase processing speed while maintaining precision and accuracy.
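As an illustration of approach (1), a ratiometric rule might score each pixel by how strongly the blue channel dominates the RGB signal (Prussian blue stain) and threshold the result; the specific ratio and threshold below are assumptions, not the paper's published values.

```python
# Ratiometric sketch: flag pixels where blue dominates the RGB channels.
import numpy as np

def prussian_blue_mask(rgb: np.ndarray, threshold: float = 0.45) -> np.ndarray:
    """rgb: H x W x 3 float array in [0, 1]; returns a boolean mask."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    blue_ratio = b / (r + g + b + 1e-8)   # fraction of signal in the blue channel
    return blue_ratio > threshold

img = np.random.rand(128, 128, 3)         # stand-in for a stained section image
mask = prussian_blue_mask(img)
print(f"{mask.mean():.1%} of pixels flagged as stained")
```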

