The Distribution Patterns of Valency-changing Verbs: An Approach of Quantitative Linguistics

Da Qi; Hua Wang

doi:10.32996/ijls.2021.1.2.7

International Journal of Linguistics Studies ◽

10.32996/ijls.2021.1.2.7 ◽

2021 ◽

Vol 1 (2) ◽

pp. 43-51

Author(s):

Da Qi ◽

Hua Wang

Keyword(s):

Language Processing ◽

Statistical Data ◽

Probability Distributions ◽

Distribution Patterns ◽

Frequency Distributions ◽

Spoken English ◽

Quantitative Linguistics ◽

Natural Language Processing Tool ◽

Language Data ◽

Mixed Poisson Distribution

The present study explores the distribution patterns of valency-changing verbs from the perspective of quantitative linguistics. We took authentic spoken language data as the research material: a self-built spoken English corpus of about 21,000 words, which we annotated semi-manually with the help of spaCy, a natural language processing tool. From the annotation results and statistical data, we obtained a total of 217 valency-changing English verbs and 248 sentence components governed by them. The analysis led to the following conclusions: first, bivalent verbs are the most frequent of the three types of valency-changing verbs; second, after fitting all the language data to different probability distributions, we found that the rank-frequency distributions of all the valency-changing English verbs with different numbers of obligatory arguments obey the power law, and the frequencies of bivalent valency-changing verbs obey other kinds of distributions, such as the mixed Poisson distribution.
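As an illustration of what fitting a rank-frequency distribution to a power law can look like, the following is a minimal sketch, not the authors' procedure. It assumes the simple form f(r) = C · r^(-a) and estimates the parameters by ordinary least squares on the log-log values; the function names and the toy verb list are hypothetical, and the study itself may have used different fitting software and goodness-of-fit criteria.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return token frequencies sorted in descending order (rank 1 first)."""
    return sorted(Counter(tokens).values(), reverse=True)


def fit_power_law(freqs):
    """Fit f(r) = C * r**(-a) by least squares on log f = log C - a * log r.

    Returns (C, a, r_squared). Requires at least two ranks and
    frequencies that are not all identical.
    """
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Coefficient of determination in log-log space.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return math.exp(intercept), -slope, r_squared


# Hypothetical toy data: verb lemmas as they might come out of an annotated corpus.
verbs = ["give", "give", "make", "give", "take", "make"]
freqs = rank_frequency(verbs)
C, a, r_squared = fit_power_law(freqs)
```

A least-squares fit in log-log space is only a quick diagnostic; with a corpus of a few hundred verb tokens, the exponent estimate is sensitive to the low-frequency tail.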

STRUCTURAL CONNECTIONS AND INTERCHANGEABILITY OF ASYMMETRIC TYPES OF THEORETICAL DISTRIBUTIONS

EurasianUnionScientists ◽

10.31618/esu.2413-9335.2020.6.77.999 ◽

2020 ◽

Vol 6 (8(77)) ◽

pp. 13-17

Author(s):

Azimkhan Kurmankozhayev ◽

Elmira Seilbekovna Yesbergenova

Keyword(s):

Weibull Distribution ◽

Empirical Analysis ◽

Gamma Distribution ◽

Statistical Data ◽

Distribution Patterns ◽

Structural Relationships ◽

Statistical Regularities ◽

Structural Connections ◽

Characteristic Features ◽

Structural Connection

This paper presents the results of evaluating the structural connection, identity and interchangeability of the main asymmetric types of theoretical distributions most often used to assess the distributions of various indicators in geology and technology. The method of empirical analysis and statistical inference was used, drawing on nonparametric facts about the distribution patterns. The empirical results of applying the lognormal, gamma and Weibull distributions are analysed using extensive statistical data from the literature and research sources. The characteristic features and statistical regularities inherent to these distributions are revealed, and estimated statistical conclusions are obtained, according to which structural relationships between the functions of the lognormal, gamma and Weibull distributions are revealed. The identity and authenticity of the development of probabilistic frequencies in their application have been established, and the complex geometric "image" of asymmetry inherent to these types of distributions is generalized. The structural relationships and interchangeability of asymmetric types of distributions are recommended as a way to increase the reliability and credibility of the estimated choice of distribution under conditions of uncertainty and insignificance of statistical data when solving problems associated with forecasts and with technological and computer developments.

A Hybrid Siamese Neural Network for Natural Language Inference in Cyber-Physical Systems

ACM Transactions on Internet Technology ◽

10.1145/3418208 ◽

2021 ◽

Vol 21 (2) ◽

pp. 1-25

Author(s):

Pin Ni ◽

Yuming Li ◽

Gangmin Li ◽

Victor Chang

Keyword(s):

Natural Language ◽

Language Processing ◽

Short Term Memory ◽

Physical World ◽

Heterogeneous Data ◽

Cyber Physical Systems ◽

Physical Systems ◽

Language Data ◽

Text Language ◽

Different Sources

Cyber-Physical Systems (CPS), as a multi-dimensional complex system that connects the physical world and the cyber world, has a strong demand for processing large amounts of heterogeneous data. These tasks also include Natural Language Inference (NLI) tasks based on text from different sources. However, the current research on natural language processing in CPS does not involve exploration in this field. Therefore, this study proposes a Siamese Network structure that combines Stacked Residual Long Short-Term Memory (bidirectional) with the Attention mechanism and Capsule Network for the NLI module in CPS, which is used to infer the relationship between text/language data from different sources. This model is mainly used to implement NLI tasks and conduct a detailed evaluation in three main NLI benchmarks as the basic semantic understanding module in CPS. Comparative experiments prove that the proposed method achieves competitive performance, has a certain generalization ability, and can balance the performance and the number of trained parameters.

Some Information Geometric Aspects of Cyber Security by Face Recognition

Entropy ◽

10.3390/e23070878 ◽

2021 ◽

Vol 23 (7) ◽

pp. 878

Author(s):

C. T. J. Dodson ◽

John Soldera ◽

Jacob Scharcanski

Keyword(s):

Face Recognition ◽

Cyber Security ◽

Probability Distributions ◽

Geodesic Distance ◽

Recognition Method ◽

Frequency Distributions ◽

Face Images ◽

User Access ◽

Multidimensional Methods ◽

Joint Probabilities

Secure user access to devices and datasets is widely enabled by fingerprint or face recognition. Organization of the necessarily large secure digital object datasets, with objects having content that may consist of images, text, video or audio, involves efficient classification and feature retrieval processing. This usually will require multidimensional methods applicable to data that is represented through a family of probability distributions. Then information geometry is an appropriate context in which to provide for such analytic work, whether with maximum likelihood fitted distributions or empirical frequency distributions. The important provision is of a natural geometric measure structure on families of probability distributions by representing them as Riemannian manifolds. Then the distributions are points lying in this geometrical manifold, different features can be identified and dissimilarities computed, so that neighbourhoods of objects nearby a given example object can be constructed. This can reveal clustering and projections onto smaller eigen-subspaces which can make comparisons easier to interpret. Geodesic distances can be used as a natural dissimilarity metric applied over data described by probability distributions. Exploring this property, we propose a new face recognition method which scores dissimilarities between face images by multiplying geodesic distance approximations between 3-variate RGB Gaussians representative of colour face images, and also obtaining joint probabilities. The experimental results show that this new method is more successful in recognition rates than published comparative state-of-the-art methods.

Creation of a simple natural language processing tool to support an imaging utilization quality dashboard

International Journal of Medical Informatics ◽

10.1016/j.ijmedinf.2017.02.011 ◽

2017 ◽

Vol 101 ◽

pp. 93-99 ◽

Cited By ~ 10

Author(s):

Jordan Swartz ◽

Christian Koziatek ◽

Jason Theobald ◽

Silas Smith ◽

Eduardo Iturrate

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Natural Language Processing Tool ◽

Imaging Utilization

The Convergence of IoT, Machine Learning, and Big Data for Advancing Flood Analytics Knowledge

10.5194/egusphere-egu21-7782 ◽

2021 ◽

Author(s):

Vidya Samadi ◽

Rakshit Pally

Keyword(s):

Machine Learning ◽

Big Data ◽

Language Processing ◽

Situational Awareness ◽

Critical Infrastructure ◽

Probability Distributions ◽

Data Gathering ◽

Flood Frequency ◽

Flood Frequency Analysis ◽

Crowd Intelligence

Floods are among the most destructive natural hazards, affecting millions of people across the world and leading to severe loss of life and damage to property, critical infrastructure, and agriculture. The Internet of Things (IoT), machine learning (ML), and Big Data are exceptionally valuable tools for supporting catastrophe readiness and collecting countless actionable data. The aim of this presentation is to introduce the Flood Analytics Information System (FAIS) as a data gathering and analytics system. The FAIS application is designed to integrate crowd intelligence, ML, and natural language processing of tweets to provide warnings, with the aim of improving flood situational awareness and risk assessment. FAIS has been beta tested during major hurricane events in the US, where successive storms caused extensive damage and disruption. The prototype successfully identifies a dynamic set of at-risk locations/communities using USGS river gauge height readings and geotagged tweets intersected with watershed boundaries. The list of prioritized locations can be updated as the river monitoring system and conditions change over time (typically every 15 minutes). The prototype also performs flood frequency analysis (FFA) using various probability distributions with the associated uncertainty estimation to assist engineers in designing safe structures. This presentation will discuss the FAIS functionalities and the real-time implementation of the prototype across the south and southeast USA. This research is funded by the US National Science Foundation (NSF).

Machine-learning as a validated tool to characterize individual differences in free recall of naturalistic events.

10.31234/osf.io/uygzv ◽

2021 ◽

Author(s):

Xinxu Shen ◽

Troy Houser ◽

David Victor Smith ◽

Vishnu P. Murty

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Natural Language ◽

Individual Difference ◽

Language Processing ◽

Large Scale ◽

High Reliability ◽

Difference Analysis ◽

Universal Sentence ◽

Natural Language Processing Tool

The use of naturalistic stimuli, such as narrative movies, is gaining popularity in many fields, characterizing memory, affect, and decision-making. Narrative recall paradigms are often used to capture the complexity and richness of memory for naturalistic events. However, scoring narrative recalls is time-consuming and prone to human biases. Here, we show the validity and reliability of using a natural language processing tool, the Universal Sentence Encoder (USE), to automatically score narrative recall. We compared the scoring reliability between two independent raters (i.e., hand-scored) and between our automated algorithm and individual raters (i.e., automated) on trial-unique video clips of magic tricks. Study 1 showed that our automated segmentation approaches yielded high reliability and reflected measures yielded by hand-scoring, and further that the results using USE outperformed another popular natural language processing tool, GloVe. In Study 2, we tested whether our automated approach remained valid when testing individuals varying on clinically relevant dimensions that influence episodic memory, namely age and anxiety. We found that our automated approach was equally reliable across both age groups and anxiety groups, which shows the efficacy of our approach for assessing narrative recall in large-scale individual difference analysis. In sum, these findings suggest that machine learning approaches implementing USE are a promising tool for scoring large-scale narrative recalls and performing individual difference analysis for research using naturalistic stimuli.
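To make the idea of embedding-based recall scoring concrete, the sketch below shows one plausible scheme and should not be read as the authors' scoring pipeline. It assumes that each reference event detail and each recalled sentence has already been turned into a vector (for example by a sentence encoder such as USE); the short vectors, the `score_recall` helper and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def score_recall(reference_vecs, recall_vecs, threshold=0.5):
    """Fraction of reference details whose best-matching recall sentence
    reaches the similarity threshold (threshold is an assumed value)."""
    if not reference_vecs:
        return 0.0
    hits = 0
    for ref in reference_vecs:
        best = max((cosine(ref, rec) for rec in recall_vecs), default=0.0)
        if best >= threshold:
            hits += 1
    return hits / len(reference_vecs)


# Hypothetical 3-dimensional stand-ins for sentence embeddings.
reference = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
recall = [[0.9, 0.1, 0.0]]
score = score_recall(reference, recall)
```

In practice the threshold would need to be calibrated against hand-scored recalls, which is the kind of comparison the abstract describes.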

Developing and Analyzing a Spanish Corpus for Forensic Purposes

Linguistic Evidence in Security Law and Intelligence ◽

10.5195/lesli.2019.19 ◽

2019 ◽

Vol 3 ◽

Author(s):

Ángela Almela ◽

Gema Alcaraz-Mármol ◽

Arancha García-Pinar ◽

Clara Pallejá

Keyword(s):

Data Collection ◽

Language Processing ◽

Ad Hoc ◽

Semantic Analysis ◽

Linguistic Features ◽

Natural Language Processing Tool ◽

Check Method ◽

Spanish Universities ◽

Gender Based ◽

Main Instrument

This paper presents the methods for developing a database of Spanish writing that can be used for forensic linguistic research, including our data collection procedures. Specifically, the main instrument used for data collection has been translated into Spanish and adapted from Chaski (2001). It consists of ten tasks, by means of which the subjects are asked to write formal and informal texts about different topics. To date, 93 undergraduates from Spanish universities, as well as prisoners convicted of gender-based abuse, have participated in the study. A twofold analysis has been performed, since the data collected have been approached from a semantic and a morphosyntactic perspective. Regarding the semantic analysis, psycholinguistic categories have been used, many of them taken from the LIWC dictionary (Pennebaker et al., 2001). In order to obtain a more comprehensive depiction of the linguistic data, some other ad hoc categories have been created, based on the corpus itself, using a double-check method for their validation so as to ensure inter-rater reliability. Furthermore, as regards morphosyntactic analysis, the natural language processing tool ALIAS TATTLER is being developed for Spanish. Results show that it is possible to differentiate non-abusers from abusers with strong accuracy based on linguistic features.

Retrieving Linguistic Information from a Corpus on the Example of Negation in Chinese

Acta Linguistica Asiatica ◽

10.4312/ala.9.2.103-115 ◽

2019 ◽

Vol 9 (2) ◽

pp. 103-115

Author(s):

Ľuboš GAJDOŠ

Keyword(s):

Second Language Acquisition ◽

Language Acquisition ◽

Empirical Evidence ◽

Statistical Data ◽

Corpus Analysis ◽

Linguistic Information ◽

New Approach ◽

Language Data ◽

Authentic Language ◽

Practical Implications

The paper deals with the corpus analysis of negation in Chinese, namely the negatives bù 不 and méi/méiyǒu 没/没有. The adverbs BU and MEI are two of the most frequent negatives in Chinese. The aim of this study is to present statistical data together with linguistic analysis. The results provide empirical evidence of a discrepancy between “authentic” language data and linguistic prescription, with practical implications for second-language acquisition. The findings inter alia suggest a new approach to verb categorisation.

A Natural Language Processing Tool to Support the Electronic Invoicing Process in Italy

10.1109/idaacs53288.2021.9660987 ◽

2021 ◽

Author(s):

Luigi Di Puglia Pugliese ◽

Francesca Guerriero ◽

Giusy Macrina ◽

Enza Messina

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Natural Language Processing Tool

03:27 PM Abstract No. 144 A natural language processing tool for real-time cost assessment in interventional radiology

Journal of Vascular and Interventional Radiology ◽

10.1016/j.jvir.2018.12.194 ◽

2019 ◽

Vol 30 (3) ◽

pp. S66-S67

Author(s):

K. Seals ◽

A. Taylor ◽

C. Sanders ◽

E. Lehrman ◽

N. Fidelman ◽

...

Keyword(s):

Interventional Radiology ◽

Natural Language Processing ◽

Natural Language ◽

Real Time ◽

Language Processing ◽

Cost Assessment ◽

Time Cost ◽

Natural Language Processing Tool
