sentence classification
Recently Published Documents


TOTAL DOCUMENTS: 129 (FIVE YEARS: 17)
H-INDEX: 12 (FIVE YEARS: 0)

Author(s): Raj Nath Patel, Edward Burgin, Haytham Assem, Sourav Dutta


2021
Author(s): Huihui Xu, Jaromir Savelka, Kevin D. Ashley

In this paper, we treat sentence annotation as a classification task. We employ sequence-to-sequence models that take sentence-position information into account when identifying case-law sentences as issues, conclusions, or reasons. We also compare a legal-domain-specific sentence embedding with general-purpose sentence embeddings to gauge the effect of legal domain knowledge, captured during pre-training, on text classification. We deployed the models on both summaries and full-text decisions and found that sentence-position information is especially useful for full-text sentence classification. We also verified that legal-domain-specific sentence embeddings perform better, and that a meta-sentence embedding can further enhance performance when sentence-position information is included.
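The abstract's seq2seq setup is beyond a short sketch, but the core idea of injecting sentence position into a classifier can be illustrated simply. The sketch below appends each sentence's normalized position in the decision to a generic sentence embedding before a plain classifier; the embeddings, labels, and classifier choice are placeholders, not the authors' models.

```python
# A minimal sketch (not the authors' seq2seq setup): append a normalized
# sentence-position feature to a generic sentence embedding before
# classifying sentences as issue / conclusion / reason.
import numpy as np
from sklearn.linear_model import LogisticRegression

def add_position_feature(embeddings, n_sentences):
    """Concatenate each sentence's relative position in the decision
    (0.0 = first sentence, 1.0 = last) onto its embedding vector."""
    positions = np.arange(n_sentences) / max(n_sentences - 1, 1)
    return np.hstack([embeddings, positions[:, None]])

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(12, 384))  # stand-in for real sentence embeddings
labels = rng.integers(0, 3, size=12)     # 0 = issue, 1 = conclusion, 2 = reason
X = add_position_feature(embeddings, len(labels))
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict(X[:3]))
```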



2021, Vol 2021, pp. 1-11
Author(s): Malik Daler Ali Awan, Sikandar Ali, Ali Samad, Nadeem Iqbal, Malik Muhammad Saad Missen, ...

The use of local languages is becoming common on social media and news channels, where people share valuable insights about various topics related to their lives. A bulk of text in various local languages exists on the Internet and contains invaluable information. Analyzing such local-language text will certainly help improve a number of Natural Language Processing (NLP) tasks, and the information extracted from it can be used to develop applications that mark new milestones in the field of NLP. In this paper, we present an applied research task: multiclass classification of Urdu text, at the sentence level, drawn from social networks (Twitter and Facebook) and news channels, using N-gram features. Our dataset consists of more than 100,000 instances covering twelve (12) different topics. A well-known machine learning classifier, Random Forest, is used to classify the sentences; it achieved 80.15%, 76.88%, and 64.41% accuracy for unigram, bigram, and trigram features, respectively.
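The pipeline the abstract describes (N-gram counts fed to a Random Forest) maps directly onto standard scikit-learn components. Below is a minimal sketch; the toy sentences and the two topic labels are illustrative stand-ins, not the authors' Urdu corpus.

```python
# N-gram features + Random Forest, as described in the abstract.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

# Toy stand-ins for labelled Urdu sentences from Twitter, Facebook, and news.
sentences = [
    "team won the final match today",
    "government announced a new tax policy",
]
topics = ["sports", "politics"]  # two of the twelve topic labels

# ngram_range=(1, 1) gives unigrams; swap in (2, 2) or (3, 3) to mirror
# the bigram and trigram runs reported above.
model = make_pipeline(
    CountVectorizer(ngram_range=(1, 1)),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(sentences, topics)
print(model.predict(["the match was won by the visiting team"]))
```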





2021
Author(s): Qianying Wang, Jing Liao, Mirella Lapata, Malcolm Macleod

Abstract
Background: Natural language processing could assist multiple tasks in systematic reviews and reduce workload, including the extraction of PICO elements such as study populations, interventions, and outcomes. The PICO framework provides a basis for retrieving and selecting published evidence relevant to a specific systematic review question, and automatic approaches to PICO extraction have been developed, particularly for reviews of clinical trial findings. Given the differences between preclinical animal studies and clinical trials, developing a separate approach is necessary. Facilitating preclinical systematic reviews will inform the translation from preclinical to clinical research.

Methods: We randomly selected 400 abstracts from the PubMed Central Open Access database that described in vivo animal research and manually annotated them with PICO phrases for Species, Strain, model Induction, Intervention, Comparator, and Outcome. We developed a two-stage workflow for preclinical PICO extraction. First, we fine-tuned BERT with different pre-trained modules for PICO sentence classification. Then, after removing text irrelevant to PICO features, we explored LSTM-, CRF-, and BERT-based models for PICO entity recognition. We also explored a self-training approach because of the small training corpus.

Results: For PICO sentence classification, BERT models using all pre-trained modules achieved an F1 score over 80%, and models pre-trained on PubMed abstracts achieved the highest F1 of 85%. For PICO entity recognition, fine-tuning BERT pre-trained on PubMed abstracts achieved an overall F1 of 71%, with satisfactory F1 for Species (98%), Strain (70%), Intervention (70%), and Outcome (67%). The scores for Induction and Comparator are less satisfactory, but the F1 for Comparator can be improved to 50% by applying self-training.

Conclusions: Our study indicates that, of the approaches tested, BERT pre-trained on PubMed abstracts is best for both PICO sentence classification and PICO entity recognition in preclinical abstracts. Self-training yields better performance for identifying comparators and strains.
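For the first stage (PICO sentence classification), loading a PubMed-pretrained BERT with a classification head is a few lines with Hugging Face transformers. A hedged sketch follows: the checkpoint name is one plausible choice rather than the paper's exact module, and the binary label set and example sentence are assumptions.

```python
# Sketch: a PubMed-pretrained BERT with a (still untrained) sentence-
# classification head, as a starting point for PICO sentence classification.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Checkpoint is an assumption: any BERT pre-trained on PubMed abstracts works.
checkpoint = "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint, num_labels=2)  # hypothetical binary label: PICO sentence or not

sentence = "Rats were treated with 10 mg/kg of compound X for 14 days."
inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))  # head is randomly initialized: fine-tune before use
```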



2021, Vol 2021, pp. 1-8
Author(s): Wentao Yu, Xiaohui Huang, Qingjun Yuan, Mianzhu Yi, Sen An, ...

Detecting information security events from multimodal data can help analyze how events in the security field evolve. The paper treats the event detection task as a sentence classification task: a Tree-LSTM network augmented with a self-attention mechanism is used to build a sentence-vectorization model (SAtt-LSTM: Tree-LSTM with self-attention), and candidate event sentences are then classified from the SAtt-LSTM representations to obtain their event types. Casting event detection as sentence classification avoids the error cascades of pipeline-based methods, as well as the problem, in joint-learning methods, that CNNs or RNNs cannot make full use of the syntactic information in candidate event sentences. To verify the effectiveness and superiority of this method, experiments were conducted on the DuEE dataset. The results show that the model performs better than methods using a chain-structured LSTM, a CNN, or Tree-LSTM alone.
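A full Tree-LSTM is beyond a short sketch, but the self-attention pooling step that turns encoder hidden states into a single sentence vector for classification can be illustrated. In the PyTorch sketch below, a chain LSTM stands in for the paper's Tree-LSTM encoder, and the five event classes are a placeholder.

```python
# Self-attention pooling over encoder states, then a classification layer:
# a rough stand-in for the SAtt part of SAtt-LSTM.
import torch
import torch.nn as nn

class SelfAttentionPooling(nn.Module):
    """Score each hidden state, softmax the scores, take the weighted
    sum as the sentence vector, and classify it."""
    def __init__(self, hidden_dim, n_classes):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)
        self.classify = nn.Linear(hidden_dim, n_classes)

    def forward(self, states):                              # (batch, seq, hidden)
        weights = torch.softmax(self.score(states), dim=1)  # (batch, seq, 1)
        sentence_vec = (weights * states).sum(dim=1)        # (batch, hidden)
        return self.classify(sentence_vec)

encoder = nn.LSTM(input_size=100, hidden_size=128, batch_first=True)
head = SelfAttentionPooling(hidden_dim=128, n_classes=5)  # 5 event types: placeholder

tokens = torch.randn(2, 20, 100)  # a batch of 2 candidate event sentences
states, _ = encoder(tokens)
print(head(states).shape)         # torch.Size([2, 5])
```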



Author(s): Antara Biswas, Musfiqur Rahman, Zahura Jebin Orin, Zahid Hasan


Author(s): Md. Hasan Imam Bijoy, Mehedi Hasan, Abdur Nur Tusher, Md. Mahbubur Rahman, Md. Jueal Mia, ...


Author(s): Brendan Rogers, Nasimul Noman, Stephan Chalup, Pablo Moscato

