mining tools
Recently Published Documents


TOTAL DOCUMENTS

459
(FIVE YEARS 104)

H-INDEX

24
(FIVE YEARS 4)

2022 ◽  
Vol 21 (4) ◽  
pp. 346-363
Author(s):  
Hubert Anysz

The use of data mining and machine learning tools is becoming increasingly common. Their usefulness is most noticeable with large datasets, where the information sought or new relationships must be extracted from information noise. As these tools develop, they are increasingly applied to datasets with far fewer records, usually associated with specific phenomena. This specificity usually makes it impossible to increase the number of cases, which would otherwise facilitate the search for dependences in the phenomena under study. The paper discusses the features of applying selected tools to a small dataset. It presents methods of data preparation and methods for calculating the performance of tools that take into account the specifics of databases with a small number of records. The author proposes selected techniques that helped to break the deadlock in calculations, i.e., obtaining results much worse than expected. The need for methods that improve the accuracy of forecasts and of classification arose from the small amount of analysed data. This paper is not a review of popular machine learning and data mining methods; nevertheless, the material collected and presented here will help the reader shorten the path to satisfactory results when using the described computational methods.
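With only a handful of records, setting aside a fixed test split wastes scarce data, so one common way to estimate performance is leave-one-out cross-validation. The abstract does not name this as the author's method; the sketch below is a generic illustration that assumes a single numeric feature and a hypothetical k-nearest-neighbours regressor. The names `knn_predict` and `loocv_mae` and the toy records are illustrative, not taken from the paper.

```python
from statistics import mean

def knn_predict(train, x, k=3):
    """Predict y for x as the mean target of the k nearest training records (one numeric feature)."""
    nearest = sorted(train, key=lambda rec: abs(rec[0] - x))[:k]
    return mean(y for _, y in nearest)

def loocv_mae(data, k=3):
    """Leave-one-out cross-validation: each record is held out once and
    predicted from all the others; returns the mean absolute error."""
    errors = []
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]  # every record except the held-out one
        errors.append(abs(knn_predict(train, x, k) - y))
    return mean(errors)

# Hypothetical toy records (feature, target); not data from the paper.
toy = [(1.0, 2.0), (2.0, 4.1), (3.0, 6.2), (4.0, 7.9), (5.0, 10.1), (6.0, 12.0)]
for k in (1, 2, 3):
    print(f"k={k}: LOOCV MAE = {loocv_mae(toy, k):.3f}")
```

Comparing the leave-one-out error across `k` is one way to tune a model without permanently sacrificing any of the few available records to a test set.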


2021 ◽  
Author(s):  
Fachrul Kurniawan ◽  
Badruddin ◽  
Aji Prasetya Wibawa

Abstract Sentiment analysis is a technique for extracting information about a person's attitude toward an issue or event by identifying the polarity of a text. Texts are grouped to determine whether the reader's attitude is positive or negative. Thanks to the pre-processing stage, the duplicate-removal procedure reduced the preceding 10,997 entries to 4,339, and language detection identified 31 languages. Although the data come from the world's largest Muslim country, the problem is not limited to it, as evidenced by the use of text mining tools to identify languages.
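The duplicate-removal step and the polarity grouping can be illustrated with a minimal sketch. It assumes exact-text duplicates and a tiny hypothetical word lexicon (`POSITIVE`, `NEGATIVE`); the authors' actual lexicon or classifier is not described in the abstract, and language detection, which would need a dedicated library or model, is omitted.

```python
import re

# Hypothetical toy lexicon; a real study would use a curated lexicon or a trained model.
POSITIVE = {"good", "great", "helpful", "love"}
NEGATIVE = {"bad", "poor", "hate", "useless"}

def drop_duplicates(texts):
    """Remove exact duplicate texts, keeping the first occurrence and the original order."""
    return list(dict.fromkeys(texts))

def polarity(text):
    """Classify a text as positive, negative or neutral from lexicon hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Hypothetical posts, including one exact duplicate.
posts = [
    "Great service, love it",
    "Great service, love it",
    "Bad and useless update",
    "It arrived today",
]
unique = drop_duplicates(posts)
print(len(posts), "->", len(unique))
for post in unique:
    print(polarity(post), "|", post)
```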


2021 ◽  
Vol 2 (4) ◽  
Author(s):  
J Borges-Rosa ◽  
M Oliveira-Santos ◽  
M Simoes ◽  
P Carvalho ◽  
G Ibanez-Sanchez ◽  
...  

Abstract Background In ST-segment elevation myocardial infarction (STEMI), the time delay between symptom onset and treatment is critical to outcome. The expected transport delay between the patient's location and a percutaneous coronary intervention (PCI) centre is paramount for choosing the appropriate reperfusion therapy. The “Centre” region of Portugal has heterogeneous access to PCI for geographical reasons. Purpose We aimed to explore time delays between regions using process mining (PM) tools. Methods We retrospectively assessed the Portuguese Registry of Acute Coronary Syndromes for patients with STEMI from October 2010 to September 2019, collecting information on the geographical area of symptom onset, reperfusion option, and in-hospital mortality. We used a PM toolkit (PM4H – PMApp Version) to build two models (one national and one regional) that represent the flow of patients in a healthcare system, highlighting time differences between groups. One-way analysis of variance was used for the global comparison of study variables between groups, and post hoc analysis with Bonferroni correction was used for multiple comparisons. Results Overall, 8956 patients (75% male, 48% aged 51 to 70 years) were included in the national model (Fig. 1A), in which primary PCI was the treatment of choice (73%), with a median time between admission and primary PCI of <120 minutes in every region; “Lisboa” and “Centro” had the longest delays (orange arrows). Fibrinolysis was performed in 4.5% of patients, with a median time delay of <1 hour in every region. In-hospital mortality was 5%, significantly higher for those without reperfusion therapy than for those treated with PCI or fibrinolysis (10% vs. 4% vs. 4%, P<0.001). In the regional model (Fig. 1B), corresponding to the “Centre” region of Portugal divided by districts (n=773, 74% male, 47% aged 51 to 70 years), only 61% had primary PCI, with “Guarda” (05:04) and “Castelo Branco” (06:50) showing significantly longer delays between diagnosis and reperfusion treatment (orange and red arrows, respectively) than “Coimbra” (01:19) (green arrow); only 15% of patients from “Castelo Branco” had primary PCI. Fibrinolysis was chosen in 10% of patients, mostly in “Castelo Branco” (53%), followed by “Guarda” (30%), with median time delays of 39 and 48 minutes, respectively. Regarding mortality, the PCI and fibrinolysis groups had similar death rates, while patients without reperfusion had higher mortality (5% vs. 3% vs. 13%, P=0.001). Conclusion Process mining tools help to visualize referral networks, readily highlighting inefficiencies and potential needs for improvement. The “Centre” region of Portugal has lower rates of, and longer delays to, primary PCI, partly for geographical reasons, with worse outcomes in remote regions. The implementation of a new PCI centre in one of these districts is critical to offer timely first-line treatment to their population. Funding Acknowledgement Type of funding sources: None. Figure 1
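Per-district median delays such as those above come from comparing the timestamps of two activities within each patient's case in an event log. The following plain-Python sketch is not the PM4H/PMApp toolkit; it assumes a hypothetical log of `(case_id, district, activity, timestamp)` tuples with illustrative activity names `diagnosis` and `reperfusion`, and computes the median diagnosis-to-reperfusion delay per district in the hh:mm form used in the abstract.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

# Hypothetical event log: (case_id, district, activity, timestamp).
# Records and activity names are illustrative, not registry data.
EVENTS = [
    ("c1", "Coimbra", "diagnosis", "2019-01-01 10:00"),
    ("c1", "Coimbra", "reperfusion", "2019-01-01 11:20"),
    ("c2", "Coimbra", "diagnosis", "2019-01-02 09:00"),
    ("c2", "Coimbra", "reperfusion", "2019-01-02 10:10"),
    ("c3", "Guarda", "diagnosis", "2019-01-03 08:00"),
    ("c3", "Guarda", "reperfusion", "2019-01-03 13:00"),
    ("c4", "Guarda", "diagnosis", "2019-01-04 12:00"),
    ("c4", "Guarda", "reperfusion", "2019-01-04 17:10"),
    ("c5", "Guarda", "diagnosis", "2019-01-05 07:30"),  # no reperfusion event
]

def median_delay_by_group(events, start="diagnosis", end="reperfusion"):
    """Median minutes from the `start` to the `end` activity, per district.
    Cases missing either activity are skipped."""
    stamps = defaultdict(dict)  # case_id -> {activity: datetime}
    group_of = {}
    for case, group, activity, ts in events:
        stamps[case][activity] = datetime.strptime(ts, "%Y-%m-%d %H:%M")
        group_of[case] = group
    delays = defaultdict(list)
    for case, acts in stamps.items():
        if start in acts and end in acts:
            minutes = (acts[end] - acts[start]).total_seconds() / 60
            delays[group_of[case]].append(minutes)
    return {group: median(values) for group, values in delays.items()}

def as_hhmm(minutes):
    """Format a number of minutes as hh:mm."""
    hours, mins = divmod(int(round(minutes)), 60)
    return f"{hours:02d}:{mins:02d}"

for district, m in median_delay_by_group(EVENTS).items():
    print(district, as_hhmm(m))
```

Using the median rather than the mean keeps a few extreme transfer times from dominating the per-district summary.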


2021 ◽  
Vol 42 (Supplement_1) ◽  
Author(s):  
J Borges-Rosa ◽  
M Oliveira-Santos ◽  
M Simoes ◽  
P Carvalho ◽  
G Ibanez-Sanchez ◽  
...  



2021 ◽  
Author(s):  
Vitor D.T Andrade ◽  
Pedro Ruas ◽  
Francisco M. Couto

Biomedical literature is the main means of communication for researchers to share their findings. Since biomedical literature consists of a large collection of texts expressed in natural language, the use of text mining tools to extract information from those texts automatically is of utmost importance. The problem is that most state-of-the-art tools were not developed to handle languages other than English, a limitation that is even more critical in biomedical literature, since a significant part of health-related texts is written in the author's native language. To address this issue, this work presents a deep learning NERL (Named Entity Recognition and Linking) system and a parallel corpus for Spanish and Portuguese focused on the oncological domain. Both the system and the corpus are available at https://github.com/lasigeBioTM/ICERL_system-ICR_Corpus.


2021 ◽  
Author(s):  
Lucca Portes Cavalheiro ◽  
Marco Antonio Alves Zanata ◽  
Jean Paul Barddal

Author(s):  
Elizabeth T. Hobbs ◽  
Stephen M. Goralski ◽  
Ashley Mitchell ◽  
Andrew Simpson ◽  
Dorjan Leka ◽  
...  

Analysis of high-throughput experiments in the life sciences frequently relies upon standardized information about genes, gene products, and other biological entities. To provide this information, expert curators are increasingly relying on text mining tools to identify, extract and harmonize statements from biomedical journal articles that discuss findings of interest. To determine the reliability of these statements, curators need the evidence used by the authors to support their assertions. It is important to annotate the evidence directly used by authors to qualify their findings, rather than simply annotating mentions of experimental methods without the context of the findings they support. Text mining tools require tuning and adaptation to achieve accurate performance. Many annotated corpora exist to enable developing and tuning text mining tools; however, none currently provides annotations of evidence based on the extensive and widely used Evidence and Conclusion Ontology. We present the ECO-CollecTF corpus, a novel, freely available biomedical corpus of 84 documents that captures high-quality, evidence-based statements annotated with the Evidence and Conclusion Ontology.
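Tuning a text mining tool against an annotated corpus usually means scoring its output against the gold annotations. A minimal sketch of exact-match precision, recall and F1, assuming annotations are `(doc_id, start, end, label)` tuples; the labels `ECO_A` and `ECO_B` are hypothetical placeholders rather than real Evidence and Conclusion Ontology identifiers, and the corpus's own evaluation protocol may differ.

```python
def evaluate(predicted, gold):
    """Exact-match precision, recall and F1 over annotation tuples."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # annotations matching span and label exactly
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical annotations: (doc_id, start_char, end_char, evidence_label).
gold = {("d1", 0, 25, "ECO_A"), ("d1", 40, 70, "ECO_B"), ("d2", 5, 30, "ECO_A")}
predicted = {("d1", 0, 25, "ECO_A"), ("d1", 40, 70, "ECO_A"), ("d2", 5, 30, "ECO_A")}

p, r, f = evaluate(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

The second prediction has the right span but the wrong evidence label, so exact matching counts it as both a false positive and a false negative; lenient span-overlap matching is a common alternative when boundaries are fuzzy.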


2021 ◽  
Vol 17 (2) ◽  
pp. 21-26
Author(s):  
Kaitlyn Hair ◽  
Emily S. Sena ◽  
Emma Wilson ◽  
Gillian Currie ◽  
Malcolm Macleod ◽  
...  

Throughout the global coronavirus pandemic, we have seen an unprecedented volume of COVID-19 research publications. This vast body of evidence continues to grow, making it difficult for research users to keep up with the pace of evolving research findings. To enable the synthesis of this evidence for timely use by researchers, policymakers, and other stakeholders, we developed an automated workflow to collect, categorise, and visualise the evidence from primary COVID-19 research studies. We trained a crowd of volunteer reviewers to annotate studies by relevance to COVID-19, study objectives, and methodological approaches. Using these human decisions, we are training machine learning classifiers and applying text-mining tools to continually categorise the findings and evaluate the quality of COVID-19 evidence.
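Turning the volunteers' relevance decisions into an automatic classifier can be sketched with multinomial naive Bayes. The abstract does not say which classifiers were trained, so this is a stand-in technique; the titles and the `relevant`/`not_relevant` labels below are hypothetical training examples.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lower-case alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> token counts
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total)  # log prior
            for w in tokenize(text):
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical human-annotated titles (not from the actual workflow).
texts = [
    "SARS-CoV-2 infection outcomes in hospitalised patients",
    "COVID-19 vaccine trial in adults",
    "Soil bacteria diversity in alpine meadows",
    "Protein folding in yeast cells",
]
labels = ["relevant", "relevant", "not_relevant", "not_relevant"]

model = NaiveBayes().fit(texts, labels)
print(model.predict("COVID-19 outcomes in vaccinated patients"))
```

In a continually growing collection, new human decisions can simply be appended to `texts` and `labels` and the model refitted.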


2021 ◽  
Author(s):  
Henrique Vicente ◽  
Alexandre Dias ◽  
Margarida Figueiredo ◽  
Humberto Chaves ◽  
José Neves

Nowadays, issues related to environmental preservation are becoming increasingly important. More and more sustainable solutions and techniques are being developed to combat environmental destruction. Including environment-related themes in the curriculum of technological courses in higher education aims to promote more sustainable behaviors and, indirectly, to increase the environmental literacy of the population. This study therefore aims to evaluate environmental literacy in four topics: air pollution, water pollution, global warming, and energy resources. For this purpose, a questionnaire was developed and applied to a convenience sample of individuals of both genders aged 20 to 81 years. The questionnaire collected data to characterize the sample and assess literacy regarding environmental issues. To assess environmental literacy, respondents were asked to express their degree of agreement with statements related to the environmental themes mentioned above. The collected data were analyzed using data mining tools. The results suggest that the population’s literacy is satisfactory for some issues but insufficient for others that are equally important but less widely disseminated.

