Unsupervised Technique for Automatically Extracting Components of References

2020 ◽  
Vol 9 (1) ◽  
pp. 1000-1004

The automatic extraction of bibliographic data remains a difficult task to the present day, because scientific publications do not follow a standard format and every publication has its own template. Many “regular expression” and “supervised machine learning” techniques exist for extracting the details of the references listed in the bibliographic section, but there is little difference in their success rates. Our idea is to find out whether unsupervised machine learning techniques can help increase that success rate. This paper presents a technique for segregating and automatically extracting the individual components of references, such as authors, title, and publication details, using an unsupervised technique together with Named-Entity Recognition (NER), and for linking these references to their corresponding full-text articles with the assistance of Google.
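
To make the task concrete, here is a minimal Python sketch of the rule-based reference segmentation that such learned approaches compete with; the regular expressions and the sample reference string are illustrative assumptions, not the authors' implementation.

```python
import re

def split_reference(ref: str) -> dict:
    """Heuristically split a raw reference string into components.

    Assumes the common pattern "Authors. Title. Venue, Year." Real
    references vary widely across templates, which is why rule-based
    extraction plateaus and motivates learned approaches.
    """
    year = re.search(r"\b(19|20)\d{2}\b", ref)          # first 4-digit year
    parts = [p.strip() for p in ref.split(".") if p.strip()]
    return {
        "authors": parts[0] if parts else None,         # text before first period
        "title": parts[1] if len(parts) > 1 else None,  # text after it
        "venue": parts[2] if len(parts) > 2 else None,  # remainder
        "year": year.group(0) if year else None,
    }

# Hypothetical reference, for illustration only.
ref = "Doe J and Roe K. An example paper title. Journal of Examples, 2020."
print(split_reference(ref))
```

A reference whose authors are written with dotted initials already breaks the period-based split above, which illustrates why the paper looks beyond hand-written rules.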

2019 ◽  
pp. 1-8 ◽  
Author(s):  
Tomasz Oliwa ◽  
Steven B. Maron ◽  
Leah M. Chase ◽  
Samantha Lomnicki ◽  
Daniel V.T. Catenacci ◽  
...  

PURPOSE: Robust institutional tumor banks depend on continuous sample curation, or else subsequent biopsy or resection specimens are overlooked after initial enrollment. Curation automation is hindered by semistructured free-text clinical pathology notes, which complicate data abstraction. Our motivation is to develop a natural language processing method that dynamically identifies existing pathology specimen elements necessary for locating specimens for future use, in a manner that can be re-implemented by other institutions.

PATIENTS AND METHODS: Pathology reports from patients with gastroesophageal cancer enrolled in The University of Chicago GI oncology tumor bank were used to train and validate a novel composite natural language processing-based pipeline with a supervised machine learning classification step to separate notes into internal (primary review) and external (consultation) reports; a named-entity recognition step to obtain label (accession number), location, date, and sublabels (block identifiers); and a results proofreading step.

RESULTS: We analyzed 188 pathology reports, including 82 internal reports and 106 external consult reports, and successfully extracted named entities grouped as sample information (label, date, location). Our approach identified up to 24 additional unique samples in external consult notes that could have been overlooked. Our classification model obtained 100% accuracy on the basis of 10-fold cross-validation. Precision, recall, and F1 for class-specific named-entity recognition models show strong performance.

CONCLUSION: Through a combination of natural language processing and machine learning, we devised a re-implementable and automated approach that can accurately extract specimen attributes from semistructured pathology notes to dynamically populate a tumor registry.
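
As a rough illustration of the classification step described above, the following scikit-learn sketch separates notes into internal and external reports; the toy note texts and the TF-IDF plus logistic-regression choices are assumptions for illustration, not the authors' exact pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Toy stand-ins for de-identified pathology notes (hypothetical).
notes = [
    "Primary review: gastric biopsy, accession S20-1234, block A1.",
    "In-house specimen received for primary diagnostic review.",
    "Consultation: slides received from outside institution.",
    "External consult material reviewed at referring hospital.",
] * 5  # repeated so 10-fold cross-validation has enough samples
labels = ["internal", "internal", "external", "external"] * 5

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),   # word and bigram features
    LogisticRegression(max_iter=1000),
)
scores = cross_val_score(clf, notes, labels, cv=10)  # 10-fold CV, as in the paper
print(f"mean accuracy: {scores.mean():.2f}")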


Information ◽  
2019 ◽  
Vol 10 (6) ◽  
pp. 186 ◽  
Author(s):  
Ajees A P ◽  
Manju K ◽  
Sumam Mary Idicula

Named Entity Recognition (NER) is the process of identifying the elementary units in a text document and classifying them into predefined categories such as person, location, organization, and so forth. NER plays an important role in many Natural Language Processing applications such as information retrieval, question answering, and machine translation. Resolving the ambiguities of lexical items in a text document is a challenging task. NER in Indian languages is always a complex task due to their morphological richness and agglutinative nature. Even though different solutions have been proposed for NER, it is still an unsolved problem. Traditional approaches to Named Entity Recognition applied hand-crafted features to classical machine learning techniques such as the Hidden Markov Model (HMM), Support Vector Machine (SVM), and Conditional Random Field (CRF). The introduction of deep learning changed this scenario, and state-of-the-art results have since been achieved with deep learning architectures. In this paper, we address the problem of effective word representation for NER in Indian languages by capturing syntactic, semantic, and morphological information. We propose a deep learning based entity extraction system for Indian languages using a novel combined word representation that includes character-level, word-level, and affix-level embeddings. We used the ‘ARNEKT-IECSIL 2018’ shared data for training and testing. Our results highlight the improvement obtained over existing pre-trained word representations.
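
The combined representation can be pictured with a short PyTorch sketch; the vocabulary sizes, embedding dimensions, and the BiLSTM encoder below are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class CombinedEmbeddingNER(nn.Module):
    """Concatenate word-, character-, and affix-level embeddings for NER.

    All dimensions and vocabulary sizes are placeholders (hypothetical).
    """
    def __init__(self, word_vocab=10000, char_vocab=100, affix_vocab=500, n_tags=9):
        super().__init__()
        self.word_emb = nn.Embedding(word_vocab, 100)
        self.char_emb = nn.Embedding(char_vocab, 30)
        # Character-level encoder: BiLSTM over the characters of each word.
        self.char_lstm = nn.LSTM(30, 25, bidirectional=True, batch_first=True)
        self.affix_emb = nn.Embedding(affix_vocab, 20)   # prefix/suffix ids
        self.encoder = nn.LSTM(100 + 50 + 20, 128, bidirectional=True, batch_first=True)
        self.tagger = nn.Linear(256, n_tags)

    def forward(self, words, chars, affixes):
        # words:   (batch, seq)         word ids
        # chars:   (batch*seq, n_chars) character ids per word
        # affixes: (batch, seq)         affix ids
        b, s = words.shape
        w = self.word_emb(words)                          # (b, s, 100)
        _, (h, _) = self.char_lstm(self.char_emb(chars))  # h: (2, b*s, 25)
        c = torch.cat([h[0], h[1]], dim=-1).view(b, s, 50)
        a = self.affix_emb(affixes)                       # (b, s, 20)
        x = torch.cat([w, c, a], dim=-1)                  # combined word representation
        out, _ = self.encoder(x)
        return self.tagger(out)                           # (b, s, n_tags) tag logits
```

The concatenation is the key idea: the character channel captures morphology, the affix channel captures agglutinative endings, and the word channel supplies distributional semantics.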


2007 ◽  
Vol 30 (1) ◽  
pp. 3-26 ◽  
Author(s):  
David Nadeau ◽  
Satoshi Sekine

This survey covers fifteen years of research in the Named Entity Recognition and Classification (NERC) field, from 1991 to 2006. We report observations about languages, named entity types, domains and textual genres studied in the literature. From the start, NERC systems have been developed using hand-made rules, but now machine learning techniques are widely used. These techniques are surveyed along with other critical aspects of NERC such as features and evaluation methods. Features are word-level, dictionary-level and corpus-level representations of words in a document. Evaluation techniques, ranging from intuitive exact match to very complex matching techniques with adjustable cost of errors, are an indisputable key to progress.
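
For instance, the simplest of those evaluation schemes, exact-match scoring, can be written in a few lines of Python; the entity tuples below are hypothetical.

```python
# Exact-match NER evaluation: a predicted entity counts only if its
# text span and its type both match a gold annotation exactly.
gold = {("Acme Corp", "ORG"), ("Paris", "LOC"), ("John Smith", "PER")}
pred = {("Acme Corp", "ORG"), ("Paris", "PER"), ("John Smith", "PER")}

tp = len(gold & pred)                          # exact matches
precision = tp / len(pred)
recall = tp / len(gold)
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, f1)                   # 0.67, 0.67, 0.67
```

Under exact match, the mis-typed "Paris" earns no credit at all; the more complex schemes the survey describes would instead assign it an adjustable partial cost.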


A system for monitoring an infant's health is developed and described in this paper. In this system, a smoke detector, a sound sensor, and a temperature and humidity sensor are interfaced with the NodeMCU ESP8266 controller. The ThingSpeak cloud is used for data processing and is connected to the Wi-Fi-based microcontroller. Detected behavior and problems can be easily notified to the parents as well as to the doctors and nurses, so that even if the nurses or doctors miss an event by chance, the parents can handle the scenario. The collected data can be exported in CSV format and fed into a machine learning model to predict the various problems an infant might be suffering from. These predictions are based solely on the data collected from the individual infant. Furthermore, a separate system-based report is produced by the model itself.
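
As a rough sketch of such a sensor-to-cloud loop, here is MicroPython for the ESP8266 posting a temperature/humidity reading to ThingSpeak; the Wi-Fi credentials, API key, sensor pin, and 20-second interval are placeholder assumptions, not the paper's configuration.

```python
import time
import dht
import machine
import network
import urequests

WIFI_SSID, WIFI_PASS = "your-ssid", "your-password"   # placeholders
API_KEY = "YOUR_THINGSPEAK_WRITE_KEY"                 # placeholder

# Connect the ESP8266 to Wi-Fi in station mode.
wlan = network.WLAN(network.STA_IF)
wlan.active(True)
wlan.connect(WIFI_SSID, WIFI_PASS)
while not wlan.isconnected():
    time.sleep(1)

sensor = dht.DHT22(machine.Pin(4))   # DHT22 on GPIO4 (assumed wiring)

while True:
    sensor.measure()
    # ThingSpeak's update endpoint takes one value per channel field.
    url = ("http://api.thingspeak.com/update?api_key={}"
           "&field1={}&field2={}").format(API_KEY, sensor.temperature(),
                                          sensor.humidity())
    urequests.get(url).close()       # one HTTP GET per reading
    time.sleep(20)                   # ThingSpeak expects >= 15 s between updates
```

The accumulated channel data can then be exported from ThingSpeak as CSV, which is the input the abstract feeds into the machine learning model.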


2020 ◽  
Vol 28 (2) ◽  
pp. 253-265 ◽  
Author(s):  
Gabriela Bitencourt-Ferreira ◽  
Amauri Duarte da Silva ◽  
Walter Filgueira de Azevedo

Background: The elucidation of the structure of cyclin-dependent kinase 2 (CDK2) made it possible to develop targeted scoring functions for virtual screening aimed to identify new inhibitors for this enzyme. CDK2 is a protein target for the development of drugs intended to modulate cell-cycle progression and control. Such drugs have potential anticancer activities.

Objective: Our goal here is to review recent applications of machine learning methods to predict ligand-binding affinity for protein targets. To assess the predictive performance of classical scoring functions and targeted scoring functions, we focused our analysis on CDK2 structures.

Methods: We have experimental structural data for hundreds of binary complexes of CDK2 with different ligands, many of them with inhibition constant information. We investigate here computational methods to calculate the binding affinity of CDK2 through classical scoring functions and machine-learning models.

Results: Analysis of the predictive performance of classical scoring functions available in docking programs such as Molegro Virtual Docker, AutoDock4, and AutoDock Vina indicated that these methods failed to predict binding affinity with significant correlation with experimental data. Targeted scoring functions developed through supervised machine learning techniques showed a significant correlation with experimental data.

Conclusion: Here, we described the application of supervised machine learning techniques to generate a scoring function to predict binding affinity. Machine learning models showed superior predictive performance when compared with classical scoring functions. Analysis of the computational models obtained through machine learning could capture essential structural features responsible for binding affinity against CDK2.
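
A targeted scoring function of this kind can be prototyped in a few lines of scikit-learn; the random features standing in for docking-derived energy terms and the random-forest choice are illustrative assumptions, not the authors' models.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-ins for docking-derived energy terms (hydrogen bonding, van der
# Waals, etc.) of CDK2-ligand complexes, and pKi-like affinities
# (synthetic data, for illustration only).
X = rng.normal(size=(300, 6))
y = X @ rng.normal(size=6) + rng.normal(scale=0.5, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# The paper's figure of merit: correlation with experimental affinity.
r, _ = pearsonr(y_te, model.predict(X_te))
print(f"Pearson r on held-out complexes: {r:.2f}")
```

Training on complexes of a single target is what makes the function "targeted": it trades generality for accuracy on CDK2-like binding sites.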


Author(s):  
Augusto Cerqua ◽  
Roberta Di Stefano ◽  
Marco Letta ◽  
Sara Miccoli

Estimates of the real death toll of the COVID-19 pandemic have proven to be problematic in many countries, Italy being no exception. Mortality estimates at the local level are even more uncertain as they require stringent conditions, such as granularity and accuracy of the data at hand, which are rarely met. The “official” approach adopted by public institutions to estimate the “excess mortality” during the pandemic draws on a comparison between observed all-cause mortality data for 2020 and averages of mortality figures in the past years for the same period. In this paper, we apply the recently developed machine learning control method to build a more realistic counterfactual scenario of mortality in the absence of COVID-19. We demonstrate that supervised machine learning techniques outperform the official method by substantially improving the prediction accuracy of the local mortality in “ordinary” years, especially in small- and medium-sized municipalities. We then apply the best-performing algorithms to derive estimates of local excess mortality for the period between February and September 2020. Such estimates allow us to provide insights about the demographic evolution of the first wave of the pandemic throughout the country. To help improve diagnostic and monitoring efforts, our dataset is freely available to the research community.
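
The core of the machine learning control idea can be sketched as follows: fit a model on pre-pandemic years only, predict what 2020 mortality "should" have been, and take the observed-minus-predicted gap as excess mortality. The gradient-boosting model and synthetic municipal panel below are assumptions for illustration, not the authors' specification.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
n = 500  # hypothetical municipalities

# Synthetic panel: lagged deaths and population stand in for the
# richer predictor set used in the paper.
df = pd.DataFrame({
    "deaths_2018": rng.poisson(50, n).astype(float),
    "deaths_2019": rng.poisson(52, n).astype(float),
    "population": rng.integers(1_000, 100_000, n).astype(float),
})
df["deaths_2020_observed"] = df["deaths_2019"] * rng.normal(1.15, 0.05, n)

# Fit on "ordinary" years: predict year-t deaths from year-(t-1) deaths.
X_train = df[["deaths_2018", "population"]].to_numpy()
model = GradientBoostingRegressor(random_state=0).fit(X_train, df["deaths_2019"])

# Counterfactual 2020: apply the same lag structure one year forward.
X_2020 = df[["deaths_2019", "population"]].to_numpy()
df["deaths_2020_counterfactual"] = model.predict(X_2020)
df["excess_deaths"] = df["deaths_2020_observed"] - df["deaths_2020_counterfactual"]
print(df["excess_deaths"].describe())
```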


Data ◽  
2021 ◽  
Vol 6 (7) ◽  
pp. 71 ◽  
Author(s):  
Gonçalo Carnaz ◽  
Mário Antunes ◽  
Vitor Beires Nogueira

Criminal investigation collects and analyzes the facts related to a crime, from which investigators can deduce evidence to be used in court. It is a multidisciplinary and applied science, which includes interviews, interrogations, evidence collection, preservation of the chain of custody, and other methods and techniques of investigation. These techniques produce both digital and paper documents that have to be carefully analyzed to identify correlations and interactions among suspects, places, license plates, and other entities that are mentioned in the investigation. The computerized processing of these documents is a helping hand to the criminal investigation, as it allows the automatic identification of entities and their relations, some of which are difficult to identify manually. A wide set of dedicated tools exists, but they share a major limitation: they are unable to process criminal reports in the Portuguese language, as an annotated corpus for that purpose does not exist. This paper presents an annotated corpus, composed of a collection of anonymized crime-related documents extracted from official and open sources. The dataset was produced as the result of an exploratory initiative to collect crime-related data from websites and conditioned-access police reports. The dataset was evaluated, and a mean precision of 0.808, recall of 0.722, and F1-score of 0.733 were obtained with the classification of the annotated named entities present in the crime-related documents. This corpus can be employed to benchmark Machine Learning (ML) and Natural Language Processing (NLP) methods and tools to detect and correlate entities in the documents, for example sentence detection, named-entity recognition, and identification of terms related to the criminal domain.
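
As an example of the kind of benchmarking the corpus enables, a general-purpose Portuguese model can be run over a crime-related sentence and its entities compared against the corpus annotations; the sentence is invented, and the spaCy model shown must be downloaded separately.

```python
import spacy

# Requires: python -m spacy download pt_core_news_sm
nlp = spacy.load("pt_core_news_sm")

# Invented, anonymized crime-related sentence (hypothetical example).
doc = nlp("O suspeito João Silva foi visto em Lisboa perto da viatura 00-AA-00.")

for ent in doc.ents:
    print(ent.text, ent.label_)   # compare against the corpus gold labels
```

A generic model will typically find the person and location but miss domain entities such as license plates, which is precisely the gap the annotated corpus is meant to close.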

