Large swathes of dark matter (unsequenced regions) in the SARS-CoV-2 spike protein in a significant number of GISAID samples, probably due to ARTIC primer artifacts, will mask real mutations in these genomic regions and obscure where and when some mutations arose.
The COVID-19 pandemic, caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2 [1, 2]), has caused significant mortality globally [3], along with severe socio-economic damage [4, 5]. Many vaccines have been given emergency authorization in different countries [6, 7]. Mutations in the virus raise concerns about the efficacy of these vaccines [8] and about re-infections [9]. Genome sequencing has been deployed globally to analyze these variants [10, 11]. Among the different methods, amplicon sequencing using a set of ∼100 primers (ARTIC) was adopted in early January 2020 (https://www.protocols.io/view/ncov-2019-sequencing-protocol-bbmuik6w). However, subsequent studies found that, while clinical samples with relatively high viral loads showed no amplification bias, samples with lower viral loads showed a significant decrease in the abundance of several amplicons [12, 13]. This led to newer versions of these primers, the current one being V3. Here, I report large swathes of *dark matter* (not sequenced) in multiple parts of the spike protein. These exact protein sequences occur in different countries in different time frames, up to the latest data submitted from South Africa on the B.1.351 variant (accession PRJNA694014) [14]. While these are ARTIC primer artifacts, real mutations in these genomic regions will escape detection. This will also give a wrong estimate of when, and in which country, certain mutations actually arose in the population.
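As a minimal sketch of how such unsequenced stretches might be flagged (not necessarily the procedure used for this report), assume that each sample's spike protein is available as a string in which unsequenced residues appear as `X`. The function names, the `min_len` threshold, and the toy sequences below are all illustrative assumptions.

```python
import re
from collections import Counter


def find_dark_regions(protein_seq, min_len=10, unknown="X"):
    """Return 1-based inclusive (start, end) spans of runs of `unknown`
    that are at least `min_len` residues long."""
    pattern = re.compile("(?:%s){%d,}" % (re.escape(unknown), min_len))
    return [(m.start() + 1, m.end()) for m in pattern.finditer(protein_seq.upper())]


def shared_dark_regions(samples, min_len=10):
    """Count how many samples share each exact unsequenced span.

    `samples` maps a sample name to its spike protein string.
    """
    counts = Counter()
    for seq in samples.values():
        for span in find_dark_regions(seq, min_len):
            counts[span] += 1
    return counts


# Toy data (hypothetical, not real GISAID records).
toy_samples = {
    "sample_A": "MFVF" + "X" * 12 + "LVLL",
    "sample_B": "MFVF" + "X" * 12 + "LVLL",
    "sample_C": "MFVFLVLLPLVSSQCV",
}
shared = shared_dark_regions(toy_samples)
```

A span that recurs with identical coordinates across samples from different countries and dates would point to a systematic amplicon dropout rather than to independent sequencing failures.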