sentence length
Recently Published Documents


Total documents: 361 (five years: 113)
H-index: 29 (five years: 4)

Author(s):  
Harsh Khatter ◽  
Anil Ahlawat

Internet content grows exponentially, so searches increasingly return irrelevant data. The vast amount of web data therefore requires curation to make search results more relevant to the searched topics. The proposed F-CapsNet curates web blog content through a novel integration of fuzzy logic with a machine learning algorithm. The input content is first pre-processed, and seven major features are extracted: sentence position, bigrams, TF-IDF, cosine similarity, sentence length, proper noun score and numeric token. Fuzzy rules are then applied to generate the extractive summary. After the extractive curation, the output is passed to a novel capsule-network-based deep auto-encoder, which produces the abstractive summary. Performance measures such as precision, recall, F1-score, accuracy and specificity are computed, and the results are compared with existing state-of-the-art methods. According to the authors, the simulations show that the proposed method curates content more efficiently than the methods it was compared with.
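Several of the sentence-level features named in the abstract above can be illustrated with a short sketch. This is not the F-CapsNet implementation: the function names, the naive sentence splitter, the capitalisation-based proper noun score and the normalisations are all assumptions made for illustration, and bigrams and TF-IDF are left out to keep the example small.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(sentence):
    return re.findall(r"[A-Za-z0-9']+", sentence)


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sentence_features(text):
    # Returns one feature dict per sentence; all scalings are hypothetical choices.
    sents = split_sentences(text)
    toks = [tokens(s) for s in sents]
    doc_bag = Counter(w.lower() for t in toks for w in t)
    longest = max((len(t) for t in toks), default=1) or 1
    rows = []
    for i, t in enumerate(toks):
        n = len(t) or 1
        rows.append({
            "position": 1.0 - i / len(sents),          # earlier sentences score higher
            "length": len(t) / longest,                # relative to the longest sentence
            "numeric": sum(w.isdigit() for w in t) / n,
            # Crude proper noun proxy: capitalised words that are not sentence-initial.
            "proper_noun": sum(w[0].isupper() for w in t[1:]) / n,
            "cosine": cosine(Counter(w.lower() for w in t), doc_bag),
        })
    return rows
```

Calling `sentence_features("Paris hosted 3 events. The city of Paris was busy.")` yields two feature rows, one per sentence, which a rule-based or learned scorer could then combine.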


2022 ◽  
pp. 088626052110635
Author(s):  
Solveig K. B. Vatnar ◽  
Christine Friestad ◽  
Stål Bjørkly

Intimate partner homicide (IPH) is an extreme outcome of intimate partner violence (IPV). It is a societal challenge that needs to be investigated over time to see whether changes occur concerning the incidence of IPH, IPH characteristics, socioeconomic factors, and contact with service providers. This study includes the total Norwegian cohort of IPHs between 1990 and 2019 with a final conviction (N = 224). Poisson regression was applied to model the incidence rate of homicide and IPH between 1990 and 2020 as well as the incidence rates of immigrant perpetrators and victims. Multivariate logistic regression analyses were used to test the association between characteristics and period, with 1990–2012 compared to after 2012 as the dependent variable. The results show that although homicide incidence rates in Norway declined steadily and significantly after 1990, IPH rates did not begin to decline until 2015. The following IPH characteristics showed reduced incidence after 2012: IPH-suicide, perpetrators with a criminal record, and IPHs perpetrated subsequent to preventive interventions towards the perpetrator. Sentence length in IPH cases had increased. Changes were not observed for any of the other IPH characteristics investigated. IPH is often the culmination of long-term violence and can be prevented, even if risk assessment is challenging due to the low base rates.


2022 ◽  
Vol 0 (0) ◽  
Author(s):  
Ghulam Abbas Khushik ◽  
Ari Huhta

The increasing importance of the Common European Framework of Reference (CEFR) has led to research on the linguistic characteristics of its levels, as this would help the application of the CEFR in the design of teaching materials, courses, and assessments. This study investigated whether CEFR levels can be distinguished with reference to syntactic complexity (SC). 14- and 17-year-old Finnish learners of English (N = 397) wrote three writing tasks which were rated against the CEFR levels. The ratings were analysed with multi-facet Rasch analysis and the texts were analysed with automated tools. Findings suggest that the clearest separators at lower CEFR levels (A1–A2) were the mean sentence and T-unit length, variation in sentence length, infinitive density, clauses per sentence or T-unit, and verb phrases per T-unit. For higher levels (B1–B2) they were modifiers per noun phrase, mean clause length, complex nominals per clause, and left embeddedness. The results support previous findings that the length of and variation in the longer production units (sentences, T-units) are the SC indices that most clearly separate the lower CEFR levels, whereas the higher levels are best distinguished in terms of complexity at the clausal and phrasal levels.
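Two of the indices mentioned above, mean sentence length and variation in sentence length, are simple to compute. The sketch below assumes a naive punctuation-based sentence splitter, word counts as the length unit and the population standard deviation as the measure of variation. The abstract does not name the automated tools used in the study, and T-unit measures would need a syntactic parser, so they are not covered here.

```python
import re
import statistics


def sentence_lengths(text):
    # Split after ., ! or ? followed by whitespace, then count word tokens.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(re.findall(r"\w+", s)) for s in sentences]


def length_profile(text):
    # Mean and spread of sentence length in words (hypothetical helper).
    lengths = sentence_lengths(text)
    if not lengths:
        raise ValueError("text contains no sentences")
    return {
        "sentences": len(lengths),
        "mean_length": statistics.mean(lengths),
        "sd_length": statistics.pstdev(lengths),
    }
```

For the two-sentence text "I like cats. My dog runs very fast." the sentence lengths are 3 and 5 words, giving a mean of 4 and a standard deviation of 1.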


Author(s):  
Timur Radbil ◽  
Marina Markina

The article discusses intermediate research results in the development and improvement of a computerized model of Russian text authorization, which is based on the combined application of probabilistic and statistical methods. The study aims to describe the new capabilities of the system as applied to diagnostic examinations in text authorization, namely detecting the gender of the alleged author of a text. The work presents the next stage of fine-tuning and testing of the improved version of the computer program "CTA" (computerized text authorization), which at this stage was adapted to determine and compare stable relative frequencies of correlation coefficients (ratios of specified linguistic phenomena at different levels of the language system) in texts written by men and by women. The research material is the continuously updated primary bases of literary texts of the 19th and 21st centuries (4 bases, respectively). The work shows that texts written by men and by women differ significantly in correlation coefficients such as average word length, average sentence length, objectivity coefficient, quality coefficient, activity coefficient, dynamism coefficient, connectivity coefficient, etc. Experimental verification of the results has demonstrated that the accuracy of gender determination at this stage of the study is approximately 65%. This figure can be significantly exceeded by increasing the volume and quality specification of the databases and/or by using new models for calculating the correlation coefficients (Spearman's model, etc.).


2021 ◽  
Vol 30 (1) ◽  
pp. 97-121
Author(s):  
Tien-Ping Tan ◽  
Chai Kim Lim ◽  
Wan Rose Eliza Abdul Rahman

A parallel text corpus is an important resource for building a machine translation (MT) system. Existing resources such as translated documents, bilingual dictionaries, and translated subtitles are excellent sources for constructing a parallel text corpus. A sentence alignment algorithm automatically aligns source sentences and target sentences, because manual sentence alignment is resource-intensive. Over the years, sentence alignment approaches have improved from sentence length heuristics to statistical lexical models to deep neural networks. Treating the alignment problem as a classification problem is interesting because classification is at the core of machine learning. This paper proposes a parallel long short-term memory network with attention and a convolutional neural network (parallel LSTM+Attention+CNN) for classifying two sentences as parallel or non-parallel. A sliding window approach is also proposed, used together with the classifier, to align sentences in the source and target languages. The proposed approach was compared with three classifiers, namely a feedforward neural network, a CNN, and a bi-directional LSTM. It was also compared with the BleuAlign sentence alignment system. The classification accuracy of these models was evaluated using a Malay-English parallel text corpus and the UN French-English parallel text corpus. The Malay-English sentence alignment performance was then evaluated using research documents and a very challenging Classical Malay-English document. The proposed classifier obtained more than 80% accuracy in categorizing parallel/non-parallel sentences with a model built using only five thousand training parallel sentences. It has a higher sentence alignment accuracy than the other baseline systems.
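The sliding window idea described above can be sketched as follows. The scoring function here is a simple character-length ratio, in the spirit of the early sentence length heuristics the abstract mentions; it stands in for the paper's trained LSTM+Attention+CNN classifier, which is not reproduced. The function names, the window size and the threshold are hypothetical.

```python
def length_score(src, tgt):
    # Stand-in for a trained parallel/non-parallel classifier (assumption):
    # ratio of the shorter to the longer sentence length in characters.
    a, b = len(src), len(tgt)
    return min(a, b) / max(a, b) if a and b else 0.0


def align(source, target, window=2, threshold=0.6, score=length_score):
    # Monotonic alignment: each source sentence is compared only with the
    # next few unconsumed target sentences, and the best candidate is kept
    # when its score reaches the threshold.
    pairs = []
    cursor = 0  # first target index not yet consumed
    for i, src in enumerate(source):
        candidates = range(cursor, min(cursor + window + 1, len(target)))
        best = max(candidates, key=lambda j: score(src, target[j]), default=None)
        if best is not None and score(src, target[best]) >= threshold:
            pairs.append((i, best))
            cursor = best + 1
    return pairs
```

Any callable that maps a sentence pair to a score between 0 and 1 can be passed as `score`, so a learned classifier could replace the length ratio without changing the windowing logic.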


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Shin'ichiro Ishikawa

Purpose: Using a newly compiled corpus module consisting of utterances from Asian learners during L2 English interviews, this study examined how Asian EFL learners' L1s (Chinese, Indonesian, Japanese, Korean, Taiwanese and Thai), their L2 proficiency levels (A2, B1 low, B1 upper and B2+) and speech task types (picture descriptions, roleplays and QA-based conversations) affected four aspects of vocabulary usage (number of tokens, standardized type/token ratio, mean word length and mean sentence length).
Design/methodology/approach: The four aspects concern speech fluency, lexical richness, lexical complexity and structural complexity, respectively.
Findings: Subsequent corpus-based quantitative data analyses revealed that (1) learner/native speaker differences existed during the conversation and roleplay tasks in terms of the number of tokens, type/token ratio and sentence length; (2) an L1 group effect existed in all three task types in terms of the number of tokens and sentence length; (3) an L2 proficiency effect existed in all three task types in terms of the number of tokens, type/token ratio and sentence length; and (4) the usage of high-frequency vocabulary was influenced more strongly by the task type and was classified into four types: Type A vocabulary for grammar control, Type B vocabulary for speech maintenance, Type C vocabulary for negotiation and persuasion and Type D vocabulary for novice learners.
Originality/value: These findings provide clues for better understanding L2 English vocabulary usage among Asian learners during speech.
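As a rough illustration of two of the four measures, the sketch below computes a standardized type/token ratio as the mean type/token ratio over consecutive fixed-size chunks, plus mean word length. The chunk size of 1000 tokens and the fallback for short texts are assumptions; the abstract does not say how the study standardized the ratio.

```python
import re


def words(text):
    # Lowercased alphabetic tokens; apostrophes are kept inside words.
    return re.findall(r"[a-z']+", text.lower())


def standardized_ttr(tokens, chunk=1000):
    # Mean type/token ratio over consecutive full chunks of `chunk` tokens.
    full = [tokens[i:i + chunk] for i in range(0, len(tokens) - chunk + 1, chunk)]
    if not full:
        # Text shorter than one chunk: fall back to the plain ratio (assumption).
        return len(set(tokens)) / len(tokens) if tokens else 0.0
    return sum(len(set(c)) / chunk for c in full) / len(full)


def mean_word_length(tokens):
    return sum(len(t) for t in tokens) / len(tokens) if tokens else 0.0
```

Averaging over equal-sized chunks makes the ratio comparable across speakers who produce different numbers of tokens, which a plain type/token ratio is not.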


2021 ◽  
pp. 136216882110575
Author(s):  
Nobuhiro Kamiya

In this study, 118 native speakers of Japanese watched 48 separate video clips in which a teacher provided recasts on phonological or lexical errors to students in Portuguese, a language with which the participants were unfamiliar. In the video clips, six recast characteristics were manipulated: length, segmentation (segmented/whole), prosodic emphasis (stressed/non-stressed), intonation (declarative/interrogative), head movements (nodding/shaking), and gestures (beat/deictic/metaphoric). Participants judged whether or not the teacher had corrected errors and stated the reasons for their decisions. Multiple regressions identified segmentation and gestures as significant variables for both phonological and lexical errors. Specifically, recasts were more likely to be accurately perceived as correction when they were provided in sentence-length discourse along with deictic gestures. Additionally, head-shaking and beat gestures contributed to improved accuracy for phonological errors. The analysis of their reasoning indicates that the participants actively compared errors with recasts when judging the presence of a recast. The overall results indicate that, contrary to the common belief that shorter recasts are better than longer ones, true beginners who overhear recasts may find it easier to notice the corrections when they are provided in sentence-length discourse, as such recasts facilitate more accurate perceptions.


2021 ◽  
Author(s):  
Wayne Goodall

This thesis examines the consistency of sentencing between the circuits of the New Zealand District Courts. Four predictions based on a sequence or chain of theories incorporating the concept of bounded rationality from decision making theory, the influence of formal and substantive rationalities on sentencing decisions, court community theory, and personal construct psychology were tested. The circuit in which sentencing took place was expected to affect the likelihood of incarceration and to affect the length of incarceration. If these predictions were met, it was further predicted that the weight applied to some or all of the sentencing factors would vary between circuits. It is understood to be the first study controlling for a wide range of sentencing factors examining the consistency of sentencing between locations in New Zealand and one of the first from anywhere outside of the United States. The four predictions were tested using sentencing data from the two years 2008-2009 for three high-volume offences (aggravated drink driving, male assaults female and burglary). Sentencing was treated as a two-part decision process, with the selection of a sentence type followed by the determination of the sentence amount. Each prediction was separately modelled for each offence. Different types of model were chosen as being more suitable for the specific predictions: logistic regression for the likelihood of incarceration; linear regression for the length of incarceration; multi-level generalised linear regression with random coefficients to determine if the weight applied to specific factors varied by circuit in the determination of whether or not to incarcerate; and multi-level linear regression with random coefficients to determine if the weight applied to specific factors varied by circuit in the determination of sentence length.
The logistic regression and linear regression models demonstrated that there were statistically significant and substantively significant differences between circuits in the likelihood and length of incarceration. The extent of inconsistency varied by offence type with the most marked differences occurring for aggravated drink driving and burglary. Offence seriousness and criminal history factors were found to be the principal influences on both sentence decisions for all three offences. The multi-level models for aggravated drink driving and burglary revealed a core of seriousness and criminal history factors whose influence varied across the circuits. The models for male assaults female were less informative, highlighting the likelihood that these models were limited by the omission of key sentencing variables and the narrow scope of this particular assault type within the wider spectrum of assaults. The findings should not be interpreted as if they are a critique of the sentence imposed in any individual case or of the sentencing by any judge or in any circuit. It is a critique of sentencing guidance in New Zealand and its ability to achieve a fundamental tenet of justice: the similar treatment of similar offenders being sentenced in similar circumstances. In addition to testing the predictions the multi-level models were extended to address whether the observed variation in sentencing was associated with variations in circuit context. Due to the limited number of circuits (17) and multicollinearity between the contextual variables, bivariate analyses had to be employed. The modelling revealed a consistent difference between provincial and metropolitan circuits; the two categories of circuit were distinguished from one another by many of the other more specific variables that had a significant association with sentencing approaches. The provincial circuits were more likely to incarcerate and to impose longer sentences.
However, the small number of circuits and multicollinearity between the variables precluded more detailed analysis to identify which of the contextual variables distinguishing metropolitan and provincial circuits had the greatest influence. These findings have significant implications for the judiciary and for sentencing policy makers. Urgent attention should be given to addressing opportunities to increase the availability of sentencing guidance to reduce the degree of inconsistency. More detailed offence-based sentencing guidance is required; in the current context there are two options that could be used. The Court of Appeal could issue a broader range of guideline judgments or the legislation for the Sentencing Council and the process for developing and promulgating guidelines could be implemented. For logistical and public policy reasons the Sentencing Council approach is preferred. There is a risk that failure to address the levels of inconsistency will result in the sentencing system falling into disrepute.


2021 ◽  
Vol 12 (5) ◽  
pp. 1-21
Author(s):  
Changsen Yuan ◽  
Heyan Huang ◽  
Chong Feng

The Graph Convolutional Network (GCN) is a universal relation extraction method that can predict relations of entity pairs by capturing sentences’ syntactic features. However, existing GCN methods often use dependency parsing to generate graph matrices and learn syntactic features. The quality of the dependency parsing will directly affect the accuracy of the graph matrix and change the whole GCN’s performance. Because of the influence of noisy words and sentence length in the distant supervised dataset, using dependency parsing on sentences causes errors and leads to unreliable information. Therefore, it is difficult to obtain credible graph matrices and relational features for some special sentences. In this article, we present a Multi-Graph Cooperative Learning model (MGCL), which focuses on extracting the reliable syntactic features of relations by different graphs and harnessing them to improve the representations of sentences. We conduct experiments on a widely used real-world dataset, and the experimental results show that our model achieves the state-of-the-art performance of relation extraction.
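To make the role of the graph matrix concrete, the sketch below builds a normalized adjacency matrix from dependency edges and applies one graph convolution step in the widely used form ReLU(Â H W), where Â is the symmetrically normalized adjacency with self-loops. This is a generic single-graph GCN layer in plain Python, not the MGCL model; the function names and the toy inputs are assumptions. It shows why parsing errors matter: a wrong edge changes Â and therefore every feature that passes through it.

```python
def normalized_adjacency(n, edges):
    # Undirected adjacency with self-loops, then D^-1/2 (A + I) D^-1/2.
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / (deg[i] * deg[j]) ** 0.5 for j in range(n)] for i in range(n)]


def matmul(x, y):
    # Plain nested-list matrix product.
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def gcn_layer(a_hat, h, w):
    # One propagation step: ReLU(A_hat @ H @ W).
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, v) for v in row] for row in z]
```

Here `edges` would hold (head, dependent) token index pairs from a dependency parser, `h` one feature row per token, and `w` a learned weight matrix; in this sketch all three are supplied by hand.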

