Crowdsourced-Data Normalization with Python and Pandas

2021 ◽  
Author(s):  
Halle Burns

Pandas is a popular and powerful Python package for data handling and analysis. This lesson describes crowdsourcing as a form of data creation and shows how pandas can be used to prepare a crowdsourced dataset for analysis. It covers managing duplicate and missing data and explains the difficulties of working with dates.
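The lesson's three cleaning concerns (duplicates, missing values, dates) map onto standard pandas calls. The sketch below is a minimal illustration, not the lesson's own code. The `records` table, its column names, the choice of `event` as a required field and the `%Y-%m-%d` date format are all hypothetical.

```python
import pandas as pd

# Hypothetical crowdsourced transcription records (illustrative columns only).
records = pd.DataFrame(
    {
        "id": [1, 2, 2, 3, 4],
        "event": ["Dinner", "Lunch", "Lunch", None, "Supper"],
        "date": ["1900-05-01", "1900-05-02", "1900-05-02", "1900-05-03", "1900-02-30"],
    }
)


def normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of df; the input frame is left unchanged."""
    # 1. Drop rows that are exact duplicates across every column.
    out = df.drop_duplicates()
    # 2. Drop rows missing a field we treat as required (an assumption here).
    out = out.dropna(subset=["event"]).copy()
    # 3. Parse dates with an explicit format; invalid values become NaT.
    out["date"] = pd.to_datetime(out["date"], format="%Y-%m-%d", errors="coerce")
    return out.reset_index(drop=True)


cleaned = normalize(records)
print(cleaned)
```

Passing `errors="coerce"` turns an impossible date such as `1900-02-30` into `NaT`, so such values can be inspected or dropped later instead of halting the whole conversion.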

Author(s):  
Craig K. Enders ◽  
Amanda N. Baraldi

Author(s):  
Pedro J. García-Laencina ◽  
Juan Morales-Sánchez ◽  
Rafael Verdú-Monedero ◽  
Jorge Larrey-Ruiz ◽  
José-Luis Sancho-Gómez ◽  
...  

Many real-world classification scenarios share a common drawback: missing, or incomplete, data. The ability to handle missing data has become a fundamental requirement for pattern classification, because the absence of values for relevant attributes can seriously degrade the accuracy of classification results. This chapter focuses on incomplete pattern classification. Research on this topic is growing, and most machine-learning-based solutions are known to be useful and efficient. The chapter analyzes the most popular and suitable machine-learning techniques for handling missing data in pattern classification tasks, highlighting their advantages and disadvantages.
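As one concrete member of the machine-learning family of missing-data techniques, k-nearest-neighbour (k-NN) imputation fills a missing value from the most similar complete-enough rows. The NumPy sketch below is an illustration chosen for this listing, not the chapter's method. The distance measure (root-mean-square difference over co-observed features), the default `k=2` and the column-mean fallback are assumptions.

```python
import numpy as np


def knn_impute(X, k=2):
    """Return a copy of X with NaNs filled by k-nearest-neighbour imputation.

    Assumptions for this sketch: distance is the root-mean-square difference
    over features observed in both rows, and a feature with no available
    donor falls back to its column mean.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    missing = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    for i in np.where(missing.any(axis=1))[0]:
        # Rank every other row by distance over co-observed features.
        ranked = []
        for j in range(X.shape[0]):
            shared = ~missing[i] & ~missing[j]
            if j == i or not shared.any():
                continue
            d = np.sqrt(np.mean((X[i, shared] - X[j, shared]) ** 2))
            ranked.append((d, j))
        ranked.sort()
        for f in np.where(missing[i])[0]:
            # Donors must have actually observed feature f.
            donors = [j for _, j in ranked if not missing[j, f]][:k]
            filled[i, f] = X[donors, f].mean() if donors else col_means[f]
    return filled


# Hypothetical feature matrix; row 1 is missing its third feature.
X = np.array([[1.0, 2.0, 3.0],
              [1.1, 2.1, np.nan],
              [5.0, 6.0, 7.0],
              [1.2, 1.9, 3.2]])
completed = knn_impute(X, k=2)
print(completed)
```

Rows 0 and 3 are closest to row 1 on the two features it does have, so its missing value is filled with the mean of their third feature (3.0 and 3.2). A global column mean would instead be pulled upward by the distant row 2.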


2004 ◽  
Vol 3 (6) ◽  
pp. 1210-1218 ◽  
Author(s):  
Jinsook Chang ◽  
Holly Van Remmen ◽  
Walter F. Ward ◽  
Fred E. Regnier ◽  
Arlan Richardson ◽  
...  

RMD Open ◽  
2021 ◽  
Vol 7 (2) ◽  
pp. e001708
Author(s):  
Nasim A Khan ◽  
Karina D Torralba ◽  
Fawad Aslam

Objectives: To analyse the amount, reporting and handling of missing data, the approach to applying the intention-to-treat (ITT) principle, and the use of sensitivity analysis in randomised clinical trials (RCTs) of rheumatoid arthritis (RA), and to assess the trend in such reporting 10 years apart (2006 and 2016).

Methods: Parallel-group drug therapy RA RCTs with a clinical primary endpoint were included.

Results: 176 studies enrolling a median of 160 (IQR 62–339) patients were eligible. In terms of the analysis actually performed, 81 (46%) RCTs conducted ITT analysis, 42 (23.9%) conducted modified ITT analysis and 53 (30.1%) conducted non-ITT analysis. Only 58 of the 97 (59.8%) RCTs reporting an ITT analysis actually performed one. The median (IQR) percentages of participants completing the trial and included in the primary outcome analysis were 86% (74%–91%) and 100% (97.1%–100%), respectively. 53 (32.7%) and 65 (40.1%) RCTs had >20% and 10%–20% missing primary outcome data, respectively. Missing data handling was not reported by 58 of 171 (33.9%) RCTs; when it was reported, the vast majority used simple imputation methods. No significant trend towards improved reporting was seen between 2006 and 2016. Use of sensitivity analysis numerically increased from 2006 to 2016 (14.7% vs 21.4%).

Conclusions: There is a significant discrepancy between the reported and the actually performed analyses in RA drug therapy RCTs. Nearly one-third of RCTs had >20% missing data. The reporting and methods of missing data handling remain inadequate, with frequent use of non-preferred simple imputation methods. Use of sensitivity analysis was low. No trend towards better reporting and handling of missing data was seen.
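The abstract groups trials by their share of missing primary outcome data (>20% and 10%–20%). The sketch below shows that arithmetic under an assumed definition: missing data means randomised participants without an observed primary outcome, divided by the number randomised. The `Trial` record, the handling of trials at exactly 10% and 20%, and the example numbers are hypothetical and are not taken from the study.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Trial:
    name: str          # hypothetical trial label
    randomised: int    # participants randomised
    with_outcome: int  # participants with an observed primary outcome


def missing_category(t: Trial) -> str:
    """Band a trial by its share of missing primary outcome data.

    Assumed definition: missing = randomised - with_outcome. Integer
    comparisons avoid floating-point error at the 10% and 20% boundaries.
    """
    missing = t.randomised - t.with_outcome
    if missing * 100 > 20 * t.randomised:
        return ">20%"
    if missing * 100 >= 10 * t.randomised:
        return "10%-20%"
    return "<10%"


def summarise(trials):
    """Map each band to (number of trials, percentage of all trials)."""
    counts = Counter(missing_category(t) for t in trials)
    n = len(trials)
    return {band: (c, round(100 * c / n, 1)) for band, c in counts.items()}


# Hypothetical trials, not data from the study.
trials = [
    Trial("A", 200, 150),  # 25% missing
    Trial("B", 100, 85),   # 15% missing
    Trial("C", 160, 155),  # about 3% missing
    Trial("D", 100, 90),   # exactly 10% missing
    Trial("E", 100, 80),   # exactly 20% missing
]
print(summarise(trials))
```

Under the boundary convention chosen here, a trial with exactly 20% missing data falls in the 10%–20% band, consistent with reading ">20%" as strictly greater. The study itself may draw its bands differently.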

