Critical Assessment of Information Extraction Systems in Biology

2003 ◽  
Vol 4 (6) ◽  
pp. 674-677 ◽  
Author(s):  
Christian Blaschke ◽  
Lynette Hirschman ◽  
Alexander Yeh ◽  
Alfonso Valencia

An increasing number of groups are now working in the area of text mining, focusing on a wide range of problems and applying both statistical and linguistic approaches. However, it is not possible to compare the different approaches, because there are no common standards or evaluation criteria; in addition, the various groups are addressing different problems, often using private datasets. As a result, it is impossible to determine how well the existing systems perform, and particularly what performance level can be expected in real applications. This is similar to the situation in text processing in the late 1980s, prior to the Message Understanding Conferences (MUCs). With the introduction of a common evaluation and standardized evaluation metrics as part of these conferences, it became possible to compare approaches, to identify those techniques that did or did not work and to make progress. This progress has resulted in a common pipeline of processes and a set of shared tools available to the general research community. The field of biology is ripe for a similar experiment. Inspired by this example, the BioLINK group (Biological Literature, Information and Knowledge [1]) is organizing a CASP-like evaluation for the text data-mining community applied to biology. The two main tasks specifically address two major bottlenecks for text mining in biology: (1) the correct detection of gene and protein names in text; and (2) the extraction of functional information related to proteins based on the GO classification system. For further information and participation details, see http://www.pdg.cnb.uam.es/BioLink/BioCreative.eval.html


With the development of web technologies, databases, social networks and similar platforms, a large amount of text data is generated each day. Most of the data on the internet is in unstructured form, yet this unstructured data can provide valuable knowledge, and text mining techniques are widely used to extract it. Large numbers of research papers are published in journals and conferences every day. These papers are valuable for future research and investigations, acting as a source of future innovations. Researchers write review papers to give updated knowledge about a specific field, but reviews cover a limited number of papers and involve manually reading each one. Given the volume of research published each day, it is not possible for researchers to go through every paper to stay current in their field of interest. Different text mining techniques have therefore been applied to automate the literature analysis process. This paper provides a review of text mining techniques used in automatic literature analysis. We collected papers in which previous literature is analyzed with text mining techniques to extract valuable knowledge. The review presents an overview of text mining techniques, their evaluation criteria, and their limitations and challenges for exploring the literature to find research trends.



2018 ◽  
Vol 12 (4) ◽  
pp. 482-491 ◽  
Author(s):  
Ruriko Watanabe ◽  
Nobutada Fujii ◽  
Daisuke Kokuryo ◽  
Toshiya Kaihara ◽  
Yoichi Abe ◽  
...  

This study aims to build a support method for consulting service companies, allowing them to respond to client demands regardless of the expertise of the individual consultant. With the emphasis placed on revitalizing small and medium-sized enterprises, support systems for consulting services that help such enterprises solve problems they cannot handle alone are becoming increasingly important. Consulting companies can respond to a wide range of management consultations; however, because the content of a consultation is broad and highly specialized, service proposals and problem detection depend on the experience and intuition of the consultant, and a stable quality of service may occasionally not be provided. A support system that provides stable services independent of the ability of individual consultants is therefore desired. In this research, as a first step in constructing such a system, customer information describing the content of consultations with client companies is analyzed to predict the occurrence of future problems. Text data such as the consultant’s visitation history, consultation content from e-mail, and call-center records are used in the analysis, because these sources describe not only current problems but may also contain indications of future ones. This paper describes a method for analyzing the text data using text mining. In the proposed method, correspondence analysis is combined with DEA (Data Envelopment Analysis) discriminant analysis: words strongly related to problem detection are extracted from the large vocabulary obtained from the text data, reducing the number of variables in the DEA discriminant analysis. The proposed method focuses on contract-cancellation problems. The cancellation problem does not involve uncertainty: it is clearly known whether the contract of the consulting service is renewed or cancelled.
In this study, computer experiments were conducted to verify the effectiveness of the proposed method through a comparison with an existing method. The results of the verification experiment are as follows. First, there is a possibility of discovering new factors that cannot be determined from the intuition and experience of the consultant regarding the target problem. Second, through a comparison with the existing method, the effectiveness of the proposed method was confirmed.



2021 ◽  
Author(s):  
Dmitry Evgenievich Prokudin ◽  
Olga Vitalievna Kononova ◽  
Georgy Semeonovich Levit

The objective of this research is to study methods for the search, explication and analysis of text data from scientific publications using information and communication technologies. The research is based on Russian-language scientific publications reflecting the scientific heritage of G. F. Gause. The study builds on the results of case studies conducted to assess the possibilities of using digital information resources in scientific research and of extracting metadata from digital electronic resources for subsequent quantitative processing. It examines methods for the explication and analysis of text data extracted from digital scientific resources (for example, Elibrary). For the analysis, the Sketch Engine information system was used, which provides natural language processing (NLP) tools. Based on the results obtained, conclusions are drawn about the applicability of the studied methods to a wide range of scientific research on various topics.



2021 ◽  
Vol 14 (3) ◽  
pp. 1-26
Author(s):  
Andrea Asperti ◽  
Stefano Dal Bianco

We provide a syllabification algorithm for the Divine Comedy using techniques from probabilistic and constraint programming. We particularly focus on the synalephe, addressed in terms of the "propensity" of a word to take part in a synalephe with adjacent words. We jointly provide an online vocabulary containing, for each word, information about its syllabification, the location of the tonic accent, and the aforementioned synalephe propensity on the left and right sides. The algorithm is intrinsically nondeterministic, producing different possible syllabifications for each verse with different likelihoods; metric constraints relative to accents on the 10th, 4th, and 6th syllables are used to further reduce the solution space. The most likely syllabification is then returned as output. We believe that this work could be a milestone for many different investigations. From the point of view of digital humanities, it opens new perspectives on computer-assisted analysis of digital sources, comprising automated detection of anomalous and problematic cases, metric clustering of verses and their categorization, and more foundational investigations addressing, e.g., the phonetic roles of consonants and vowels. From the point of view of text processing and deep learning, information about syllabification and the location of accents opens a wide range of exciting perspectives, from automatically learning the syllabification of words and verses to the improvement of generative models that are aware of metric issues and more respectful of the expected musicality.
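The generate-and-filter scheme described above can be sketched in miniature. The vocabulary entries, propensity values, and verse below are hypothetical stand-ins, not the authors' actual data or algorithm; the sketch only shows how per-word alternatives, likelihood scoring, and a metric constraint (eleven syllables for a hendecasyllable) interact:

```python
from itertools import product

# Toy per-word alternatives: each word maps to possible syllable counts,
# each with a likelihood standing in for a synalephe "propensity".
# All values are hypothetical, for illustration only.
word_options = {
    "nel":    [(1, 1.0)],
    "mezzo":  [(2, 1.0)],
    "del":    [(1, 1.0)],
    "cammin": [(2, 1.0)],
    "di":     [(1, 0.3), (0, 0.7)],  # may merge with the next word (synalephe)
    "nostra": [(2, 1.0)],
    "vita":   [(2, 1.0)],
}

words = list(word_options)

def syllabifications(target=11):
    """Enumerate joint choices, keep those satisfying the metric constraint,
    and rank them by likelihood (product of per-word propensities)."""
    results = []
    for combo in product(*(word_options[w] for w in words)):
        total = sum(n for n, _ in combo)
        if total == target:          # metric constraint: hendecasyllable
            score = 1.0
            for _, p in combo:
                score *= p
            results.append((score, [n for n, _ in combo]))
    return sorted(results, reverse=True)

best = syllabifications()            # most likely syllabification first
```

In the real algorithm the constraints also involve accent positions (10th, 4th, 6th syllables), which would simply tighten the filter inside the loop.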



Animals ◽  
2021 ◽  
Vol 11 (4) ◽  
pp. 1009
Author(s):  
Javiera Lagos ◽  
Manuel Rojas ◽  
Joao B. Rodrigues ◽  
Tamara Tadich

Mules are essential for pack work in mountainous areas, but there is a lack of research on this species. This study intends to assess the perceptions, attitudes, empathy and pain perception of soldiers about mules, to understand the type of human–mule relationship. For this, a survey was applied with closed-ended questions where the empathy and pain perception tools were included and later analyzed through correlations. Open-ended questions were analyzed through text mining. A total of 73 soldiers were surveyed. They had a wide range of ages and years of experience working with equids. Significant positive correlations were found between human empathy, animal empathy and pain perception. Soldiers show a preference for working with mules over donkeys and horses. Text mining analysis shows three clusters associated with the mules’ nutritional, environmental and health needs. In the same line, relevant relations were found for the word “attention” with “load”, “food”, and “harness”. When asked what mules signify for them, two clusters were found, associated with mules’ working capacity and their role in the army. Relevant relations were found between the terms “mountain”, “support”, and “logistics”, and also between “intelligent” and “noble”. To secure mules’ behavioral and emotional needs, future training strategies should include behavior and welfare concepts.



2020 ◽  
Vol 11 (2) ◽  
pp. 107-111
Author(s):  
Christevan Destitus ◽  
Wella Wella ◽  
Suryasari Suryasari

This study aims to classify tweets on Twitter using the Support Vector Machine and Information Gain methods. The classification itself aims to find a hyperplane that separates the negative and positive classes. The system pipeline comprises text mining and text preprocessing, with stages of tokenizing, filtering, stemming, and term weighting. Feature selection is then performed with information gain, which calculates the entropy contribution of each word. Tweets are finally classified based on the selected features, and the output identifies whether a tweet is bullying or not. The results of this study show that the combination of the Support Vector Machine and Information Gain methods achieves reasonably strong results.
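The information-gain step can be illustrated with a small, self-contained sketch. The tweets, tokens, and labels below are invented for illustration; a real pipeline would first apply the tokenizing, filtering, stemming, and term-weighting stages described above:

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list."""
    if not labels:
        return 0.0
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(docs, labels, word):
    """IG of a word = H(labels) - weighted H(labels | word present/absent)."""
    base = entropy(labels)
    with_w = [l for d, l in zip(docs, labels) if word in d]
    without = [l for d, l in zip(docs, labels) if word not in d]
    n = len(labels)
    cond = (len(with_w) / n) * entropy(with_w) + (len(without) / n) * entropy(without)
    return base - cond

# Hypothetical tokenized tweets with bully / non-bully labels
docs = [{"you", "loser"}, {"nice", "game"}, {"loser", "quit"}, {"good", "job"}]
labels = ["bully", "ok", "bully", "ok"]

# "loser" perfectly separates the classes here, so its IG equals H(labels) = 1.0
ig_loser = information_gain(docs, labels, "loser")
```

Words would then be ranked by information gain, and only the top-scoring ones kept as features for the SVM.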



2018 ◽  
Vol 69 (12) ◽  
pp. 1882 ◽  
Author(s):  
Elena-Maria Klopries ◽  
Zhiqun Daniel Deng ◽  
Theresa U. Lachmann ◽  
Holger Schüttrumpf ◽  
Bradly A. Trumbo

Surface bypasses are downstream migration structures that can help reduce hydropower-induced damage to migrating fish. However, no comprehensive design concept that facilitates good surface bypass performance for a wide range of sites and species is available. This may explain why fish-passage efficiencies at recently built bypass structures vary widely, from 0% to 97%. We reviewed 50 surface bypass performance studies and existing guidelines for salmonids, eels and potamodromous species to identify crucial design criteria for surface bypasses employed in North America, Europe and Australia. Two-tailed Pearson correlation of bypass efficiency and bypass design criteria shows that bypass entrance area (r=0.3300, P=0.0036) and proportion of inflow to the bypass (r=0.3741, P=0.0032) are the parameters most strongly associated with bypass efficiency. However, other parameters such as guiding structures (P=0.2181, ordinary Student’s t-test) and trash-rack spacing (r=–0.1483, P=0.3951, Spearman correlation), although not statistically significant, have been shown to have an effect on efficiency in some studies. The use of different performance criteria and efficiency definitions for bypass evaluation hampers direct comparison of studies and, therefore, deduction of design criteria. To enable meta-analyses and improve bypass design considerations, we suggest a list of standardised performance parameters for bypasses that should be considered in future bypass-performance studies.
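The correlation analysis reported above can be reproduced in form with a short sketch. The inflow and efficiency numbers below are invented for illustration (they are not the study's data), and the sketch computes only the correlation coefficient r, not the two-tailed P-value:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: proportion of inflow to the bypass vs. measured
# fish-passage efficiency at five imaginary sites
inflow = [0.02, 0.05, 0.08, 0.10, 0.15]
efficiency = [0.31, 0.42, 0.55, 0.50, 0.71]

r = pearson_r(inflow, efficiency)
```

A significance test on r (the P-values quoted in the abstract) would additionally require the sample size and a t-distribution lookup.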



Author(s):  
Yegireddi Ramesh ◽  
Kiran Kumar Reddi

With the enormous growth of the Internet and networked systems, data security has become an inevitable concern for any organization, and it has long attracted considerable attention from network researchers. Many cryptographic algorithms address this need, each substantial in its own right. As society moves into the digital information age, we require standardized algorithms that compute quickly even when the data size is large. On survey, numerous sequential approaches using symmetric-key algorithms with a 128-bit block size are found to be insufficiently secure and slow. Since commodity hardware is now massively parallelized on multi-core processors to solve computational problems, we accordingly propose a parallel symmetric-key algorithm to encrypt/decrypt large data for secure transmission. The algorithm operates on 64-character (512-bit) plaintext blocks, processing each group of 16 characters separately in parallel and finally combining the four 16-character cipher chunks to form the 64-character ciphertext. The round function employed in the algorithm is complex, which improves its efficacy.
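The chunk-parallel structure described above can be sketched as follows. The round function here is a deliberately simple XOR-and-rotate toy, not the complex round function the authors propose, and the key is a placeholder; the sketch only demonstrates splitting a 64-character (512-bit) block into four 16-character chunks, transforming them in parallel, and recombining them:

```python
from concurrent.futures import ThreadPoolExecutor

KEY = bytes(range(16))  # toy 128-bit per-chunk key; stand-in for a real key schedule

def toy_round(chunk: bytes, rounds: int = 4) -> bytes:
    """Toy XOR-and-rotate round function (NOT the paper's round function)."""
    data = bytearray(chunk)
    for r in range(rounds):
        data = bytearray(b ^ KEY[i % 16] ^ r for i, b in enumerate(data))
        data.append(data.pop(0))          # rotate left by one byte
    return bytes(data)

def toy_unround(chunk: bytes, rounds: int = 4) -> bytes:
    """Inverse of toy_round: undo the rounds in reverse order."""
    data = bytearray(chunk)
    for r in reversed(range(rounds)):
        data.insert(0, data.pop())        # rotate right by one byte
        data = bytearray(b ^ KEY[i % 16] ^ r for i, b in enumerate(data))
    return bytes(data)

def encrypt(plaintext: bytes) -> bytes:
    assert len(plaintext) == 64           # 64 characters = 512 bits
    chunks = [plaintext[i:i + 16] for i in range(0, 64, 16)]
    with ThreadPoolExecutor() as pool:    # process the four chunks in parallel
        return b"".join(pool.map(toy_round, chunks))

def decrypt(ciphertext: bytes) -> bytes:
    chunks = [ciphertext[i:i + 16] for i in range(0, 64, 16)]
    with ThreadPoolExecutor() as pool:
        return b"".join(pool.map(toy_unround, chunks))
```

Because the chunks are independent, the same structure scales to any number of 512-bit blocks across available cores.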



1997 ◽  
Vol 16 (6) ◽  
pp. 545-559 ◽  
Author(s):  
Edward J. Calabrese ◽  
Linda A. Baldwin

A comprehensive effort was undertaken to identify articles demonstrating chemical hormesis. Nearly 4000 potentially relevant articles were retrieved from preliminary computer searches utilizing various keyword descriptors and extensive cross-referencing. A priori evaluation criteria were established, including study design features (e.g., number of doses, dose range), statistical analysis, and reproducibility of results. Evidence of chemical hormesis was judged to have occurred in approximately 350 of the 4000 studies evaluated. Chemical hormesis was observed in a wide range of taxonomic groups and involved agents representing highly diverse chemical classes, many of potential environmental relevance. Numerous biologic endpoints were assessed, with growth responses the most prevalent, followed by metabolic effects, longevity, reproductive responses, and survival. Hormetic responses were generally observed to be of limited magnitude, with the average low-dose maximum stimulation approximately 50% greater than controls. The hormetic dose-response range was generally limited to about one order of magnitude, with the upper end of the hormetic curve approaching the estimated no-observed-effect level (NOEL) for the particular endpoint. Based on the evaluation criteria, high to moderate evidence of hormesis was observed in studies comprising ≥ doses with <3 doses in the hormetic zone. The present analysis suggests that chemical hormesis is a reproducible and generalizable biologic phenomenon. Over the last decade, advances have been made providing mechanistic insight helpful in explaining the phenomenon of chemical hormesis in multiple biologic systems with various endpoints.
The reason for the uncertainty surrounding the existence of hormesis as a “real phenomenon” is believed to be the result of its relatively infrequent observation in the literature due to experimental design considerations, especially with respect to the number of doses, range of doses, and endpoint selection.



2021 ◽  
Vol 12 (1) ◽  
pp. 45-50
Author(s):  
Jan Szybka ◽  
Sylwester Pabian

The APEKS method was developed in the 1970s and has a wide range of decision-making applications. The article describes the APEKS method, a multi-criteria method consisting of 10 steps. Its application is illustrated with the example of car selection. The problem of choosing a passenger car was analyzed taking into account 6 evaluation criteria: fuel consumption, power, price, annual operating costs, aesthetic value, and utility value. Following the APEKS method, the analysis was completed with the selection of the best variant using the forced-decision method, which consists of individually comparing all criteria with one another. A reference APEKS variant, combining the best features of all the variants under consideration, is used for this purpose; in this sense the APEKS variant is an idealized and fictional one.
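The idea of comparing candidates against an idealized reference variant can be sketched numerically. The cars, criterion values, and weights below are invented for illustration (the article's six criteria are reduced to three, and cost-type criteria are negated so that higher is always better); the sketch shows only the comparison against the ideal variant, not the full 10-step procedure:

```python
# Toy criterion values for three hypothetical cars; price and fuel consumption
# are negated so that a larger value is always better. Numbers are illustrative.
cars = {
    "car_A": {"power": 110, "price": -62000, "fuel": -6.5},
    "car_B": {"power": 150, "price": -78000, "fuel": -8.1},
    "car_C": {"power": 130, "price": -70000, "fuel": -7.0},
}
# Criterion weights, e.g. as obtained from pairwise "forced decision" comparisons
weights = {"power": 0.3, "price": 0.5, "fuel": 0.2}

# The idealized (fictional) variant: the best value of every criterion
ideal = {c: max(v[c] for v in cars.values()) for c in weights}
worst = {c: min(v[c] for v in cars.values()) for c in weights}

def closeness(values):
    """Weighted normalized distance to the ideal variant (0 = ideal itself)."""
    return sum(
        weights[c] * (ideal[c] - values[c]) / (ideal[c] - worst[c])
        for c in weights
    )

# The best variant is the one closest to the idealized reference
best = min(cars, key=lambda name: closeness(cars[name]))
```

The idealized variant scores exactly zero by construction, which is why it can only serve as a reference and never as an actual choice.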


