Sentence Segmentation: Recently Published Documents

TOTAL DOCUMENTS: 78 (five years: 18)
H-INDEX: 7 (five years: 1)

2021, Vol 9 (2), pp. 196-206
Author(s): Parmonangan R. Togatorop, Rezky Prayitno Simanjuntak, Siti Berliana Manurung, Mega Christy Silalahi

Modeling an Entity Relationship Diagram (ERD) can be done manually, but producing an ERD by hand generally takes a long time. An ERD generator driven by requirements specifications is therefore needed to simplify ERD modeling. This study aims to develop a system that generates an ERD from requirements specifications written in Indonesian by applying several stages of Natural Language Processing (NLP) as required by the research. The requirements specifications were collected by the research team using a document-analysis technique. The NLP stages used are: case folding, sentence segmentation, tokenization, POS tagging, chunking, and parsing. The words in the processed text are then identified with a rule-based method to find the words that qualify as ERD components: entities, attributes, primary keys, and relationships. The ERD is then drawn with Graphviz from the extracted components. The generated ERDs were evaluated using the expert-judgement method. Across several case studies, the average precision, recall, and F1 score per expert were: expert 1: 91%, 90%, 90%; expert 2: 90%, 90%, 90%; expert 3: 98%, 94%, 96%; expert 4: 93%, 93%, 93%; and expert 5: 98%, 83%, 90%.
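The pipeline described in the abstract (case folding, sentence segmentation, tokenization, then rule-based extraction of ERD components) can be sketched in miniature. This is a toy illustration, not the authors' system: the single `has`-pattern rule and the example specification are invented for the sketch, and a real pipeline would use POS tagging and chunking rather than one regular expression.

```python
import re

def case_fold(text):
    """Stage 1: lowercase the whole specification."""
    return text.lower()

def segment_sentences(text):
    """Stage 2: naive rule-based sentence segmentation on ., ! and ?."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    """Stage 3: simple word tokenization."""
    return re.findall(r"\w+", sentence)

# Stage 4: one toy rule standing in for POS tagging + chunking +
# rule-based matching: "<entity> has <attr>, <attr> and <attr>"
HAS_RULE = re.compile(r"^(\w+) has (.+)$")

def extract_components(sentence):
    """Map one sentence to an entity and its attributes, or None."""
    m = HAS_RULE.match(sentence.rstrip("."))
    if not m:
        return None
    attrs = re.split(r",\s*|\s+and\s+", m.group(2))
    return {"entity": m.group(1), "attributes": attrs}

spec = "Student has name, id and address. Course has title."
components = [extract_components(s) for s in segment_sentences(case_fold(spec))]
```

The extracted component list could then be serialized to Graphviz DOT nodes and edges, as the abstract describes.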


2021, Vol 25 (6), pp. 15-33
Author(s): Chanatip Saetia, Ekapol Chuangsuwanich, Tawunrat Chalothorn, Peerapon Vateekul

2021, Vol 1 (13), pp. 92-101
Author(s): Serhii Krivenko, Natalya Rotaniova, Yulianna Lazarevska

A scenario (narrative schema) is a socially established sequence of steps for achieving a set goal, and carries the most complete information about every possible way the described situation can unfold (including choice points and branches). The creation of the XML platform opened a new, technologically more advanced stage in the development of the Web. As a result, the XML platform has become a significant component of information-system development technology, and the trend toward integrating such systems at the level of corporations, agencies, and ministries only strengthens the position of XML in information technology in general. A system for automatically detecting non-standard scenarios in text messages has been developed. The system comprises stages of ontology formation, sentence parsing, and scenario comparison. Sentence parsing uses the classic natural language processing (NLP) pipeline, which supports the most common tasks: tokenization, sentence segmentation, part-of-speech tagging, named-entity extraction, chunking, parsing, and coreference resolution. Maximum-entropy and perceptron-based machine learning are also possible. Ontologies are stored using OWL technology. During analysis, the object-target parses of sentences are compared with the described OWL model. A SPARQL query against a source object returns query models to a table object. The table class is the base class for all table objects and provides an interface for accessing values in the rows and columns of the result table. If a table object has exactly three columns, it can be used to build a new data-source object; this provides a convenient mechanism for retrieving a subset of data from one data source and adding it to another. In the context of the RDF API, a node is defined as the set of all statements about the subject of a URI.
The content of the table is compared with the semantics of the sentence. If the sentence's scenario does not match the OWL ontology model, the object may be acting atypically; in that case the message is flagged as suspicious. To make the fullest use of text analysis, it is necessary to build a corpus of ontologies or to reuse existing ones (Akutan, Amazon, etc.) with their features taken into account. To enlarge the object ontologies, additional neural-network training methods can be applied.
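The scenario-comparison step can be illustrated with a toy model. Here the OWL ontology is reduced to a plain set of permitted (subject, action) pairs instead of a real triple store with SPARQL, and the sentence "parser" is a naive subject-verb split; the ontology contents and example sentences are invented for the sketch.

```python
# Toy stand-in for an OWL ontology: the (subject, action) pairs the
# model considers typical for each object class.
ONTOLOGY = {
    ("customer", "order"),
    ("customer", "pay"),
    ("courier", "deliver"),
}

def parse_scenario(sentence):
    """Naive subject-verb extraction; a real system would use full NLP parsing."""
    words = sentence.lower().rstrip(".").split()
    return (words[0], words[1]) if len(words) >= 2 else None

def is_suspicious(sentence):
    """Flag the message when its scenario is absent from the ontology model."""
    return parse_scenario(sentence) not in ONTOLOGY
```

A production system would replace the set lookup with a SPARQL ASK or SELECT query against the OWL store, as the abstract describes.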


PLoS ONE, 2020, Vol 15 (11), pp. e0241979
Author(s): Friederike Tegge, Katharina Parry

The text-evaluation application Coh-Metrix and natural language processing tools rely on the sentence as the unit of text segmentation and analysis, and frequently detect sentence boundaries by means of punctuation. Problems arise when target texts such as pop song lyrics do not follow formal standards of written text composition and lack punctuation in the original. In such cases it is common for human transcribers to prepare texts for analysis, often following unspecified or at least unreported rules of text normalization and potentially relying on an assumed shared understanding of the sentence as a text-structural unit. This study investigated whether using different transcribers to insert typographical symbols into song lyrics during the pre-processing of textual data can result in significant differences in sentence delineation. Results indicate that different transcribers (following commonly agreed-upon rules of punctuation based on their extensive experience with language and writing as language professionals) can produce differences in sentence segmentation. This affects the analysis results for at least some Coh-Metrix measures and highlights the problem of transcription, with potential consequences for quantification at and above the sentence level. It is argued that when analyzing non-traditional written texts or transcripts of spoken language, uniform text interpretation and segmentation during pre-processing cannot be assumed. It is advisable to provide clear rules for text normalization at the pre-processing stage, and to make these explicit in documentation and publication.
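The effect described above can be made concrete: the same originally punctuation-free line, normalized by two hypothetical transcribers, yields different sentence counts under a simple punctuation-driven segmenter. The lyric and both normalizations are invented for illustration.

```python
import re

def count_sentences(text):
    """Punctuation-driven segmentation, as used by many text-analysis tools."""
    return len([s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s])

# One transcriber hears three full stops, the other a single clause chain.
transcriber_a = "I know. You know. We all know."
transcriber_b = "I know, you know, we all know."
```

Any sentence-level measure computed downstream (mean sentence length, syntactic complexity, and so on) inherits this threefold disagreement.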


Author(s): Kairit Sirts, Kairit Peekman

Texts obtained from the web are noisy and do not necessarily follow orthographic sentence and word boundary rules. Thus, sentence segmentation and word tokenization systems developed on well-formed texts may not perform as well on unedited web texts. In this paper, we first describe the manual annotation of sentence boundaries in an Estonian web dataset and then present the evaluation results of three existing sentence segmentation and word tokenization systems on this corpus: EstNLTK, Stanza and UDPipe. While EstNLTK obtains the highest sentence segmentation performance on this dataset, the sentence segmentation performance of Stanza and UDPipe remains well below the results obtained on the more well-formed Estonian UD test set.
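Sentence segmentation systems such as those above are typically scored by comparing predicted boundary positions against manually annotated gold boundaries. A minimal sketch of boundary-level precision, recall and F1 (the character offsets below are invented):

```python
def boundary_prf(gold, predicted):
    """Precision, recall and F1 over sentence-boundary character offsets."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # boundaries both agree on
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# System found boundaries at offsets 10, 25, 55; gold has 10, 25, 40.
p, r, f1 = boundary_prf({10, 25, 40}, {10, 25, 55})
```

Comparing such scores on a web corpus against the Estonian UD test set quantifies the performance drop the abstract reports.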


Author(s): Chanatip Saetia, Tawunrat Chalothorn, Ekapol Chuangsuwanich, Peerapon Vateekul
