related text
Recently Published Documents


Total documents: 107 (five years: 31)
H-index: 9 (five years: 2)

2021 ◽  
Author(s):  
Elijah Pelofske ◽  
Lorie M. Liebrock ◽  
Vincent Urias

In this research, we use user-defined labels from three internet text sources (Reddit, Stackexchange, Arxiv) to train 21 different machine learning models for the topic classification task of detecting cybersecurity discussions in natural text. We analyze the false positive and false negative rates of each of the 21 models in a cross-validation experiment. We then present a Cybersecurity Topic Classification (CTC) tool, which uses the majority vote of the 21 trained machine learning models as the decision mechanism for detecting cybersecurity-related text. We also show that the majority vote mechanism of the CTC tool provides lower false negative and false positive rates on average than any of the 21 individual models. Finally, we show that the CTC tool scales to hundreds of thousands of documents with a wall clock time on the order of hours.
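
The majority-vote decision described in this abstract can be sketched as follows. This is a minimal illustration, not the CTC tool itself: the keyword-based `toy_models` are hypothetical stand-ins for the 21 trained models, and `error_rates` simply computes false positive and false negative rates from boolean predictions and labels.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical stand-in for a trained model: maps text to True (cybersecurity) or False.
Classifier = Callable[[str], bool]


def majority_vote(classifiers: Sequence[Classifier], text: str) -> bool:
    """Return True when strictly more than half of the classifiers vote positive."""
    votes = sum(1 for clf in classifiers if clf(text))
    return votes * 2 > len(classifiers)


def error_rates(predictions: Sequence[bool], labels: Sequence[bool]) -> Tuple[float, float]:
    """Return (false positive rate, false negative rate) for boolean predictions."""
    fp = sum(1 for p, y in zip(predictions, labels) if p and not y)
    fn = sum(1 for p, y in zip(predictions, labels) if not p and y)
    negatives = sum(1 for y in labels if not y)
    positives = sum(1 for y in labels if y)
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr


def keyword_classifier(keyword: str) -> Classifier:
    """Toy classifier (an assumption for illustration): fires when the keyword appears."""
    return lambda text: keyword in text.lower()


toy_models = [keyword_classifier(k) for k in ("malware", "exploit", "firewall")]

print(majority_vote(toy_models, "New malware ships a firewall exploit"))
print(majority_vote(toy_models, "The firewall rule was updated"))
```

With an odd number of voters, as with the 21 models in the abstract, a strict majority always exists, so no tie-breaking rule is needed.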


2021 ◽  
Author(s):  
Masataka Nakayama ◽  
Hatanaka Chihiro ◽  
Hisae Konakawa ◽  
Yuka Suzuki ◽  
Alethea Hui Qin Koh ◽  
...  

Chat-based counselling has become increasingly popular in the era of telecommunication. The need for accessible therapy has been exacerbated by the COVID-19 pandemic. Given its text-based nature, chat-based counselling provides an opportunity for machine-based analysis. It even has the potential to provide machine-based counselling services. However, the informational resources for machine-based analysis and interaction are rather scarce, especially in a Japanese-language context. Using machine-based text analysis, we created a Japanese dictionary for sentiment analysis tailored for counselling-related text. It includes 2389 words that were frequently used in chat-based counselling corpora. The following attributes were included for each word: (1) valence rating by the general public, (2) valence rating by clinical psychologists, (3) emotionality, and (4) body-relatedness.


2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Javier Pérez-Guerra

Although Verb-Object (VO) is the basic unmarked constituent order of predicates in Present-Day English, in earlier stages of the language Object-Verb (OV) is the preferred pattern in some syntactic contexts. OV predicates are significantly frequent in Old and Middle English, and are still attested up to 1550, when they “appear to dwindle away” (Moerenhout & van der Wurff 2005: 83). This study looks at OV in Early Modern English (EModE), using a corpus-based perspective and statistical modelling to explore a number of textual, syntactic, and semantic/processing variables which may account for what by that time had already become a marked, though not yet archaic, word-order pattern. The data for the study were retrieved from the Penn-Helsinki Parsed Corpus of Early Modern English (1500–1710) and the Parsed Corpus of Early English Correspondence (c.1410–1695), the largest electronic parsed collections of EModE texts. The findings reveal a preference for OV in speech-related text types, which are less constrained by the rules of grammar, in marked syntactic contexts, and in configurations not subject to the general linearisation principles of end-weight and given-new. Where these principles are complied with, the probability of VO increases.


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Marcello Mariani ◽  
Matteo Borghi

Purpose: This paper aims to analyze if and to what extent mechanical artificial intelligence (AI), embedded in hotel service robots, influences customers’ evaluation of AI-enabled hotel service interactions. This study deploys online reviews (ORs) analytics to understand if the presence of mechanical AI-related text in ORs influences customers’ OR valence across 19 leading international hotels that have integrated mechanical AI, in the guise of service robots, into their operations.
Design/methodology/approach: First, the authors identified the 19 leading hotels across three continents that have pioneered the adoption of service robots. Second, by deploying big data techniques, the authors gathered the entire population of ORs hosted on TripAdvisor (almost 50,000 ORs) and generated OR analytics. Subsequently, the authors used ordered logistic regression analyses to understand if and to what extent AI-enabled hospitality service interactions are evaluated by service customers.
Findings: The presence of mechanical AI-related text (text related to service robots) in ORs positively influences electronic word-of-mouth (e-WOM) valence. Hotel guests writing ORs that explicitly mention their interactions with the service robots are more prone to associate high online ratings with their ORs. The presence of the robot’s proper name (e.g., Alina, Wally) in the OR positively moderates the positive effect of mechanical AI-related text on OR ratings.
Research limitations/implications: Hospitality practitioners should evaluate the possibility of introducing service robots into their operations and develop tailored strategies to name their robots (such as using human-like and short names). Moreover, hotel managers should communicate their initiatives and investments in AI more explicitly, monitor AI-related e-WOM and invest in educating their non-tech-savvy customers to understand and appreciate AI technology. Platform developers might create a robotic tag to be attached to ORs mentioning service robots to signal the presence of this specific element and might design and develop an additional service attribute that might be tentatively named “service robots.”
Originality/value: The current study represents the first attempt to understand if and to what extent mechanical AI in the guise of hotel service robots influences customers’ evaluation of AI-enabled hospitality service interactions.


Agronomy ◽  
2021 ◽  
Vol 11 (7) ◽  
pp. 1307
Author(s):  
Haoriqin Wang ◽  
Huaji Zhu ◽  
Huarui Wu ◽  
Xiaomin Wang ◽  
Xiao Han ◽  
...  

In the question-and-answer (Q&A) communities of the “China Agricultural Technology Extension Information Platform”, thousands of rice-related Chinese questions are newly added every day. Rapid detection of semantically equivalent questions is the key to the success of a rice-related intelligent Q&A system. To allow the fast and automatic detection of semantically equivalent rice-related questions, we propose a new method based on the Coattention-DenseGRU (Gated Recurrent Unit). According to the rice-related question characteristics, we applied Word2vec with the TF-IDF (Term Frequency–Inverse Document Frequency) method to process and analyze the text data and compared it with the Word2vec, GloVe, and TF-IDF methods. Combined with the agricultural word segmentation dictionary, Word2vec with the TF-IDF method effectively solved the problem of high dimension and sparse data in the rice-related text. Each network layer employed the connection information of features and all previous recursive layers’ hidden features. To alleviate the problem of feature vector size increasing due to dense splicing, an autoencoder was used after dense concatenation. The experimental results show that rice-related question similarity matching based on Coattention-DenseGRU can improve the utilization of text features, reduce the loss of features, and achieve fast and accurate similarity matching of the rice-related question dataset. The precision and F1 values of the proposed model were 96.3% and 96.9%, respectively. Compared with seven other question similarity matching models, the proposed method achieves state-of-the-art results on our rice-related question dataset.
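
One common way to combine word embeddings with TF-IDF is to average a question's word vectors using TF-IDF weights and then compare questions by cosine similarity. The sketch below illustrates that idea only; it is an assumption about what "Word2vec with the TF-IDF method" could look like, not the authors' pipeline, and the three-dimensional `toy_embeddings` and the example questions are made up for illustration (real Word2vec vectors would come from a trained model).

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def idf_table(docs: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Smoothed inverse document frequency for every token in the corpus."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + count)) + 1.0 for tok, count in df.items()}


def weighted_vector(tokens: Sequence[str],
                    embeddings: Dict[str, List[float]],
                    idf: Dict[str, float]) -> List[float]:
    """TF-IDF-weighted average of the embeddings of the known tokens."""
    counts = Counter(t for t in tokens if t in embeddings)
    dim = len(next(iter(embeddings.values())))
    vec = [0.0] * dim
    total = 0.0
    for tok, tf in counts.items():
        weight = tf * idf.get(tok, 1.0)
        total += weight
        for i, x in enumerate(embeddings[tok]):
            vec[i] += weight * x
    return [x / total for x in vec] if total else vec


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Made-up embeddings and pre-segmented questions (assumptions for illustration).
toy_embeddings = {
    "rice": [1.0, 0.0, 0.0],
    "blast": [0.0, 1.0, 0.0],
    "disease": [0.0, 1.0, 0.2],
    "fertilizer": [0.0, 0.0, 1.0],
    "dose": [0.0, 0.1, 1.0],
}
q1 = ["rice", "blast", "disease"]
q2 = ["rice", "disease", "blast"]
q3 = ["rice", "fertilizer", "dose"]

idf = idf_table([q1, q2, q3])
v1, v2, v3 = (weighted_vector(q, toy_embeddings, idf) for q in (q1, q2, q3))
same_topic = cosine(v1, v2)
different_topic = cosine(v1, v3)
print(same_topic, different_topic)
```

The weighted average keeps the representation dense and fixed-size regardless of vocabulary size, which is one way to address the high-dimension, sparse-data problem the abstract mentions.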


2021 ◽  
Author(s):  
Rishit Dagli ◽  
Ali Mustufa Shaikh ◽  
Hussain Mahdi ◽  
Sameer Nanivadekar

In this paper, we focus on creating a keyword extractor tailored to a job-related text corpus of job descriptions, for better search engine optimization (SEO), using attention-based deep learning techniques. Millions of jobs are posted, but most of them end up not being found due to improper SEO and keyword management. We aim to make the extractor as easy to use as possible so that it can readily be applied to a large number of job descriptions. We also use these algorithms to screen or get insights from large numbers of resumes, and to summarize and create keywords for a general piece of text or scientific articles. We also investigate the modeling power of BERT (Bidirectional Encoder Representations from Transformers) for the task of keyword extraction from job descriptions. We further validate our results by providing a fully functional API and testing the model with real-time job descriptions.


Entropy ◽  
2021 ◽  
Vol 23 (6) ◽  
pp. 734
Author(s):  
Insu Choi ◽  
Woo Chang Kim

Politically-themed stocks mainly refer to stocks that benefit from the policies of politicians. This study presents an empirical analysis of politically-themed stocks in the Republic of Korea and constructs politically-themed stock networks based on the Republic of Korea’s politically-themed stocks, derived mainly from politicians. To select politically-themed stocks, we calculated the daily politician sentiment index (PSI), which represents politicians’ daily reputation, using politicians’ search volume data and sentiment analysis results from politician-related text data. Additionally, we selected politically-themed stock candidates from politician-related search volume data. To measure causal relationships, we adopted entropy-based measures. We determined politically-themed stocks based on causal relationships from the rates of change of the PSI to their abnormal returns. To illustrate causal relationships between politically-themed stocks, we constructed politically-themed stock networks based on causal relationships using entropy-based approaches. Moreover, we experimented using politically-themed stocks in real-world situations from the schematized networks, focusing on politically-themed stock networks’ dynamic changes. We verified that the investment strategy using the PSI and politically-themed stocks that we selected could benchmark the main stock market indices such as the KOSPI and KOSDAQ around political events.
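
The abstract does not name the specific entropy-based measure. One widely used option for directed relationships between time series is transfer entropy, sketched below as a plug-in estimate on discretized series with a history length of one. The series names `psi_change` and `abnormal_return` and their values are hypothetical (1 for an increase, 0 otherwise); here the toy target simply copies the source with a one-step lag.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def transfer_entropy(source: Sequence[Hashable], target: Sequence[Hashable]) -> float:
    """Plug-in estimate of transfer entropy (in bits) from source to target, lag 1.

    TE = sum over (y_next, y_prev, x_prev) of
         p(y_next, y_prev, x_prev) * log2( p(y_next | y_prev, x_prev) / p(y_next | y_prev) )
    """
    if len(source) != len(target) or len(target) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(target) - 1
    triples = Counter(zip(target[1:], target[:-1], source[:-1]))
    pairs_yx = Counter(zip(target[:-1], source[:-1]))
    pairs_yy = Counter(zip(target[1:], target[:-1]))
    singles_y = Counter(target[:-1])
    te = 0.0
    for (y_next, y_prev, x_prev), count in triples.items():
        p_joint = count / n
        p_given_both = count / pairs_yx[(y_prev, x_prev)]
        p_given_self = pairs_yy[(y_next, y_prev)] / singles_y[y_prev]
        te += p_joint * math.log2(p_given_both / p_given_self)
    return te


# Hypothetical discretized series (assumption for illustration only).
psi_change = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1]
abnormal_return = [0] + psi_change[:-1]  # target follows the source one step later

te_forward = transfer_entropy(psi_change, abnormal_return)
te_constant = transfer_entropy([0] * len(abnormal_return), abnormal_return)
print(te_forward, te_constant)
```

A constant source carries no information about the target, so its estimate is zero, while the lagged copy yields a positive value. Plug-in estimates are biased on short series, so in practice a significance test (for example against shuffled sources) is usually needed before reading a value as evidence of information flow.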

