ABSTRACT
As sequencing read length has increased, researchers have quickly adopted longer reads for their experiments. Here, we examine host-pathogen interaction studies to assess whether using longer reads is warranted. Six diverse datasets encountered in studies of host-pathogen interactions were used to assess which genomic attributes might affect the outcome of differential gene expression analysis, including gene density, operons, gene length, number of introns/exons, and intron length. Principal components analysis, hierarchical clustering with bootstrap support, and regression analyses of pairwise comparisons were undertaken on the same reads, looking at all combinations of paired and unpaired reads trimmed to 36, 54, 72, and 101 bp. For E. coli, 36-bp single-end reads performed as well as any other read length and as well as paired-end reads. For all other comparisons, 54-bp and 72-bp reads were typically equivalent and different from 36-bp and 101-bp reads. Read pairing improved the outcome in several, but not all, comparisons in no discernible pattern, such that using paired reads is recommended in most scenarios. No specific genome attribute appeared to influence the data. However, experiments expected a priori to have greater biological complexity had more variable results at all read lengths than those expected to have lower complexity. When cost is taken into account, 54-bp paired-end reads provided the most robust, internally reproducible results across all comparisons.
However, using 36-bp single-end reads may be desirable for bacterial samples, although possibly only if the transcriptional response is expected a priori to be robust.

DATA SUMMARY
The human-only CSHL ENCODE data set (1) was downloaded from ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeCshlLongRnaSeq/.
The data from mouse vaginas infected with Candida albicans (2) was downloaded from the SRA (URL: https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP057050).
The data from Aspergillus fumigatus cells in contact with human cells was downloaded from the SRA (URL: https://www.ncbi.nlm.nih.gov/bioproject/399754).
The data from a strand-specific library from a study comparing C. albicans cells in contact with human cells with those in media (3) was downloaded from the SRA (URL: https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP011085).
The data from C. albicans in culture media (3) was downloaded from the SRA (URL: https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP011085).
The data from Escherichia coli grown in different media (4) was downloaded from the SRA (URL: https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP056578).
I/We confirm all supporting data, code and protocols have been provided within the article or through supplementary data files.

IMPACT STATEMENT
As sequencing technologies improve, sequencing costs decrease and read lengths increase. We examine host-pathogen interaction studies to assess whether using these longer reads is warranted given their increased cost relative to using the same number of shorter reads. To this end, we compared the use of various read lengths and read pairing for six diverse host-pathogen datasets with varying genomic attributes, including gene density, operons, gene length, number of introns/exons, and intron length. We find that in the bacterial sample, 36-bp single-end reads performed as well as any other read length and as well as paired-end reads.
When cost is taken into account, 54-bp paired-end reads provided the most robust, internally reproducible results for all other comparisons. Read pairing improved the outcome in several, but not all, comparisons in no discernible pattern, such that using paired reads is recommended in most scenarios. No specific genome attribute appeared to influence the data.
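
The comparison summarized above rests on analyzing the same reads under every combination of read length (36, 54, 72, and 101 bp) and pairing. The following is a minimal sketch of how such conditions could be enumerated and applied to FASTQ records. It is illustrative only: the function names are hypothetical, and it assumes that trimming keeps the first N bases of each read and that the single-end condition uses only the first mate, neither of which is stated in the text above.

```python
from itertools import product

# Read lengths and layouts named in the text.
READ_LENGTHS = (36, 54, 72, 101)
LAYOUTS = ("single", "paired")


def conditions():
    """Return every (layout, length) combination to be compared."""
    return list(product(LAYOUTS, READ_LENGTHS))


def trim_record(record, length):
    """Trim one FASTQ record to at most `length` bases.

    `record` is a (header, sequence, separator, qualities) tuple.
    Assumption: bases are kept from the start of the read.
    """
    header, seq, sep, qual = record
    return (header, seq[:length], sep, qual[:length])


def apply_condition(mate1, mate2, layout, length):
    """Return the trimmed record(s) for one read pair under one condition.

    Assumption: the single-end condition keeps only the first mate.
    """
    first = trim_record(mate1, length)
    if layout == "single":
        return (first,)
    return (first, trim_record(mate2, length))


if __name__ == "__main__":
    # Toy 101-bp read pair, purely for demonstration.
    m1 = ("@read1/1", "A" * 101, "+", "I" * 101)
    m2 = ("@read1/2", "C" * 101, "+", "I" * 101)
    for layout, length in conditions():
        trimmed = apply_condition(m1, m2, layout, length)
        print(layout, length, [len(r[1]) for r in trimmed])
```

Deriving every condition from the same starting reads keeps sequencing depth and library identical across conditions, so differences in the downstream analyses can be attributed to read length and pairing alone.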