Data extraction methods: an analysis of internal reporting discrepancies in single manuscripts and practical advice

2020 ◽  
Vol 117 ◽  
pp. 158-164 ◽  
Author(s):  
Livia Puljak ◽  
Nicoletta Riva ◽  
Elena Parmelli ◽  
Marien González-Lorenzo ◽  
Lorenzo Moja ◽  
...  
2018 ◽  
Author(s):  
Jordan Carlson ◽  
J. Aaron Hipp ◽  
Jacqueline Kerr ◽  
Todd Horowitz ◽  
David Berrigan

BACKGROUND Image-based data collection for obesity research is in its infancy. OBJECTIVE The present study aimed to document challenges to and benefits from such research by capturing examples of research involving the use of images to assess physical activity- or nutrition-related behaviors and/or environments. METHODS Researchers (i.e., key informants) using image capture in their research were identified through the knowledge and networks of the authors of this paper and through a literature search. Twenty-nine key informants completed a survey, developed specifically for this study, covering the type of research, source of images, and challenges and benefits experienced. RESULTS Most respondents used still images in their research, with only 26.7% using video. Image sources were categorized as participant generated (N = 13; e.g., participants using smartphones for dietary assessment), researcher generated (N = 10; e.g., wearable cameras with automatic image capture), or curated from third parties (N = 7; e.g., Google Street View). Two of the major challenges that emerged were the need for automated processing of large datasets (58.8%) and participant recruitment/compliance (41.2%). Benefit-related themes included greater perspectives on obesity with increased data coverage (34.6%) and improved accuracy of behavior and environment assessment (34.6%). CONCLUSIONS Technological advances will support the increased use of images in the assessment of physical activity, nutrition behaviors, and environments. To advance this area of research, more effective collaborations are needed between health and computer scientists. In particular, development of automated data extraction methods for diverse aspects of behavior, environment, and food characteristics is needed. Additionally, progress in standards for addressing ethical issues related to image capture for research purposes is critical. CLINICALTRIAL NA


2020 ◽  
pp. 5-9
Author(s):  
Manasvi Srivastava ◽  
Vikas Yadav ◽  
Swati Singh ◽  
...  

The Internet is the largest source of information created by humanity. It contains a variety of materials in formats such as text, audio, video and much more. Web scraping is one way to collect this information: it is a set of strategies for obtaining information from a website automatically instead of copying the data manually. Many Web-based data extraction methods are designed to solve specific problems and work on ad-hoc domains. Various tools and technologies have been developed to facilitate Web scraping. Unfortunately, the appropriateness and ethics of using these Web scraping tools are often overlooked. There are hundreds of Web scraping software packages available today, most of them designed for Java, Python and Ruby. Both open source and commercial software are available. Web-based software such as YahooPipes, Google Web Scrapers and Firefox extensions for Outwit are the best tools for beginners in Web scraping. Web extraction is used to replace the manual extraction and editing process and to provide an easier and better way to collect data from a Web page, convert it into the desired format and save it to a local or archive directory. In this paper, among the different kinds of scraping, we focus on techniques that extract the content of a Web page. In particular, we apply scraping techniques to a variety of diseases, together with their symptoms and precautions.
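The collect, convert and save workflow described in this abstract can be illustrated with a minimal sketch. It uses only the Python standard library and is not the authors' tool: the `TableScraper` class, the helper functions and the sample HTML fragment (including its placeholder disease, symptom and precaution values) are all hypothetical, and a real scraper would first download the page and check that scraping it is permitted.

```python
import csv
import io
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return the table cells found in `html` as a list of rows."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    """Convert the extracted rows into CSV text (the 'desired format')."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical page fragment with placeholder values, for illustration only.
SAMPLE_HTML = """
<table>
  <tr><th>Disease</th><th>Symptoms</th><th>Precautions</th></tr>
  <tr><td>Disease A</td><td>symptom 1, symptom 2</td><td>precaution 1</td></tr>
</table>
"""

rows = scrape_table(SAMPLE_HTML)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Writing `csv_text` to a file would complete the "save it to a local directory" step; the sketch stops at an in-memory string so that it has no side effects.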


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Irvin Dongo ◽  
Yudith Cardinale ◽  
Ana Aguilera ◽  
Fabiola Martinez ◽  
Yuni Quintero ◽  
...  

Purpose This paper aims to perform an exhaustive review of relevant and recent related studies, which reveals that both extraction methods are currently used to analyze credibility on Twitter. Thus, there is clear evidence of the need for different options to extract different data for this purpose. Nevertheless, none of these studies performs a comparative evaluation of both extraction techniques. Moreover, the authors extend a previous comparison, which uses a recently developed framework that offers both alternatives for data extraction and implements a previously proposed credibility model, by adding a qualitative evaluation and a Twitter Application Programming Interface (API) performance analysis from different locations. Design/methodology/approach As one of the most popular social platforms, Twitter has been the focus of recent research aimed at analyzing the credibility of the shared information. To do so, several proposals use either the Twitter API or Web scraping to extract the data to perform the analysis. Qualitative and quantitative evaluations are performed to discover the advantages and disadvantages of both extraction methods. Findings The study demonstrates the differences in accuracy and efficiency between the two extraction methods and highlights many more problems in this area that must be addressed to pursue true transparency and legitimacy of information on the Web. Originality/value Results show that some Twitter attributes cannot be retrieved by Web scraping. Both methods produce identical credibility values when a robust normalization process is applied to the text (i.e. tweet). Moreover, concerning time performance, Web scraping is faster than the Twitter API and is more flexible in terms of obtaining data; however, Web scraping is very sensitive to website changes. Additionally, the response time of the Twitter API is proportional to the distance from the central server in San Francisco.


2021 ◽  
Author(s):  
Liam Rose ◽  
Linda Diem Tran ◽  
Steven M Asch ◽  
Anita Vashi

Objective: To examine how VA shifted care delivery methods one year into the pandemic. Study Setting: All encounters paid or provided by VA between January 1, 2019 and February 27, 2021. Study Design: We aggregated all VA paid or provided encounters and classified them into community (non-VA) acute and non-acute visits, VA acute and non-acute visits, and VA virtual visits. We then compared the number of encounters by week over time to pre-pandemic levels. Data Extraction Methods: Aggregation of administrative VA claims and health records. Principal Findings: VA has experienced a dramatic and persistent shift to providing virtual care and purchasing care from non-VA providers. Before the pandemic, a majority (63%) of VA care was provided in-person at a VA facility. One year into the pandemic, in-person care at VA facilities constituted just 33% of all visits. Most of the difference was made up by large expansions of virtual care; total VA provided visits (in person and virtual) declined (4.9 million to 4.2 million) while total visits of all types declined only 3.5%. Community provided visits exceeded pre-pandemic levels (2.3 million to 2.9 million, +26%). Conclusion: Unlike private health care, VA has resumed in-person care slowly at its own facilities, and more rapidly in purchased care, with different financial incentives a likely driver. The very large expansion of virtual care nearly made up the difference. Given VA's widespread physical presence across the U.S., this has important implications for access to care and future allocation of medical personnel, facilities, and resources.
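The percentage changes quoted in this abstract follow from the usual relative-change formula. The short sketch below reproduces them from the rounded visit counts given in the text; the function name `percent_change` is illustrative, and the figure for VA provided visits is derived here from the rounded counts rather than stated in the abstract.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, expressed in percent."""
    return (after - before) / before * 100.0


# Visit counts in millions, as rounded in the abstract.
community = percent_change(2.3, 2.9)    # community provided visits
va_provided = percent_change(4.9, 4.2)  # VA provided visits (in person and virtual)

print(f"Community provided visits: {community:+.1f}%")
print(f"VA provided visits: {va_provided:+.1f}%")
```

Because the inputs are rounded to one decimal place, the results can differ slightly from values computed on the underlying encounter counts.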


2000 ◽  
Vol 15 (1) ◽  
pp. 1-8 ◽  
Author(s):  
Steven N. Blair ◽  
Ming Wei

Purpose. To evaluate the relation of physical activity and cardiorespiratory fitness to morbidity, mortality, and functional limitations in older persons. Data Sources. We reviewed published reports related to the review's purpose. Sources were identified from recent major reports and position statements from scientific and public health organizations, our files, and reference lists of published papers. Study Inclusion and Exclusion Criteria. We included prospective epidemiological studies and clinical trials published in the peer-reviewed literature that included data from age groups of people 60 years and older. We evaluated study methods and included studies that used valid measures of exposures, clearly specified outcomes, and controlled for confounders. Data Extraction Methods. We extracted by detailed review data on sample characteristics, outcomes, and rates and relative risks. Data Synthesis. Extracted data were included in tables, figures, or the text and were synthesized by nonquantitative methods. Major Conclusions. Active and fit individuals were at much lower risk for morbidity, mortality, and loss of function when compared with sedentary and unfit persons. Data from the studies generally conformed to a steep inverse dose-response gradient across activity or fitness categories. Results were consistent, temporally appropriate, strong, and graded, and therefore support a causal hypothesis that a fit and active way of life improves health and function in older individuals.


2021 ◽  
Author(s):  
Naomi Shinotsuka ◽  
Franziska Denk

Abstract: Chronic pain and its underlying biological mechanisms have been studied for many decades, with a myriad of molecules, receptors and cell types known to contribute to abnormal pain sensations. We now know that besides an obvious role for neuronal populations in the peripheral and central nervous system, immune cells like microglia, macrophages and T cells are also important drivers of persistent pain. While neuroinflammation has therefore been widely studied in pain research, there is one cell type that appears to be rather neglected in this context: the humble fibroblast. Fibroblasts may seem unassuming, but actually play a major part in regulating immune cell function and driving chronic inflammation. What is known about them in the context of chronic pain? Here we set out to analyze the literature on this topic, using systematic screening and data extraction methods to obtain a balanced view on what has been published. We found that there has been surprisingly little research in this area: 134 articles met our inclusion criteria, only a tiny minority of which directly investigated interactions between fibroblasts and peripheral neurons. We categorized the articles we included, stratifying them according to what was investigated, the estimated quality of results, and any common conclusions. Fibroblasts are a ubiquitous cell type and a prominent source of many pro-algesic mediators in a wide variety of tissues. We think that they deserve a more central role in pain research and propose a new, testable model of how fibroblasts might drive peripheral neuron sensitization.


VASA ◽  
2020 ◽  
Vol 49 (2) ◽  
pp. 87-97 ◽  
Author(s):  
Endre Kolossváry ◽  
Tamás Ferenci ◽  
Tamás Kováts

Summary. Although more and more data on lower limb amputations are becoming available through widening access to health care administrative databases, the applicability of these data to public health decisions is still limited. Problems can be traced back to methodological issues, namely how data are generated, and to conceptual issues, namely how data are interpreted in a multidimensional environment. The present review summarised all of the steps in converting the claims data of administrative databases into analytical data and reviewed the wide array of sources of potential bias in the analysis of such data. The origins of uncertainty in administrative data analysis include uncontrolled confounding due to a lack of clinical data, the left- and right-censored nature of data collection, the non-standardized diagnosis/procedure-based data extraction methods (i.e., numerator/denominator problems) and additional methodological problems associated with temporal and spatial analyses. The existence of these methodological challenges in administrative data-based analysis should not deter analysts from using these data as a powerful tool in the armamentarium of clinical research. However, it must be done with caution and with a thorough understanding of, and respect for, the methodological limitations. In addition to this requirement, there is a profound need for further research on methodology and for widening the search for other indicators (structural, process or outcome) that allow deeper insight into how the quality of vascular care may be assessed. Effective research using administrative data is based on strong collaboration in three domains, namely expertise in claims data handling and processing, the clinical field, and statistical analysis.
The final interpretations of results and the countermeasures on the level of vascular care ought to be grounded on the integrity of research, open discussions and institutionalized mechanisms of science arbitration and honest brokering.


2021 ◽  
Vol 1 (2) ◽  
pp. 125-133
Author(s):  
Hindreen Rashid Abdulqadir ◽  
Adnan Mohsin Abdulazeez ◽  
Dilovan Assad Zebari

Diabetes may be predicted and prevented by exploring critical diabetes characteristics with computational data extraction methods. This study proposed a systems biology approach to the pathogenic process to identify essential biomarkers as drug targets. Because disease recognition and investigation require many details, data mining plays a critical role in healthcare. This study aims to evaluate the efficiency of the classification-based methods used. In addition, the researchers have highlighted the most widely employed techniques and the strategies with the best precision. Many analyses include multiple machine learning algorithms for various disease assessments and predictions to improve overall results. The detection and prediction of diseases is an aspect of classification and prediction. This paper estimates diabetes by its key features and also categorizes the relations between conflicting elements. The recursive random forest removal function provided a significant feature range. A Random Forest (RF) classifier was used to investigate the diabetes estimate. RF offers 75.7813 greater precision than Support Vector Machine (SVM) and may assist medical professionals in making care decisions.

