We’ve been seeing an increased prevalence of barcode error recently, especially on NovaSeq X. Because loading concentration is known to affect this, we ran a range of loading concentrations across 6 individual lanes of a NovaSeq X 10B flow cell. The goal was to find the loading concentration that avoids underclustering without raising the barcode error rate.
Based on these results, we recommend a loading concentration of around 120 – 140 pM for our Single Cell RNA Sequencing Kit libraries, which lowers barcode error while avoiding the risk of underloading. (Note: these are final loading concentrations, reported after dilute and denature!)
But wait, this is higher barcode error than we’d expect for these libraries.
Barcode error is likely caused by short fragment bias on the NovaSeq X (NSX), a known phenomenon with the new XLEAP chemistry. Short fragments occur naturally in our libraries because of the tagmentation reaction; however, they typically make up < 0.5% of the total library.
These short fragments (150-180 bp) are generally removed during bead cleanup, which is verified by checking the library size on a Bioanalyzer or TapeStation trace. Any small fragments that remain and carry RT, Ligation, UMI, or Linker barcode errors are bioinformatically categorized as “errors.” Combined with the NSX short fragment bias, this means that small fragments that do not appear on the trace are overclustered on the flow cell and then identified bioinformatically.
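To make the idea of a read being bioinformatically categorized as an "error" more concrete, here is a minimal Python sketch of one common approach: matching an observed barcode against a list of expected barcodes and allowing a single mismatch. This is an illustration only and not the Scale Bio pipeline's actual method. The function names, the one-mismatch threshold, and the example barcodes are all hypothetical.

```python
def hamming(a, b):
    """Count mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def classify_barcode(observed, whitelist, max_mismatches=1):
    """Return (status, matched_barcode), where status is 'exact', 'corrected' or 'error'.

    Assumption: a barcode is rescued only if exactly one expected barcode
    lies within max_mismatches of it; anything else is counted as an error.
    """
    if observed in whitelist:
        return "exact", observed
    candidates = [
        bc for bc in whitelist
        if len(bc) == len(observed) and hamming(bc, observed) <= max_mismatches
    ]
    if len(candidates) == 1:
        return "corrected", candidates[0]
    return "error", None


# Hypothetical expected barcodes, for illustration only
whitelist = {"ACGTACGT", "TTGGCCAA", "GATCGATC"}

for read_barcode in ["ACGTACGT", "ACGTACGA", "CCCCCCCC"]:
    print(read_barcode, classify_barcode(read_barcode, whitelist))
```

In this sketch, the first barcode matches exactly, the second is one mismatch away from an expected barcode and is rescued, and the third matches nothing and is counted as an error. The barcode error rate would then be the fraction of reads falling in that last category.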
In our testing, we have seen this more often on NovaSeq X, but it can happen with any library and any sequencer, as a byproduct of tagmentation.
But does this impact your data? No, it does not. As demonstrated below, library quality metrics are comparable across the NovaSeq 6000 (NS6k) and the NSX.
TL;DR: short fragment bias from the sequencer does NOT impact the data quality of passing reads. The Scale Bio Seq Suite: RNA pipeline identifies and excludes short fragments through trimming and detects the passing sample reads.
Interested in running your libraries on NovaSeq X?
For the 10B flow cell, start at 120 – 140 pM final loading concentration with at least 1% PhiX. Adding more PhiX won’t help with barcode error, but follow your core’s recommendation for PhiX spike-in.
For the 25B flow cell, the general recommendation from the field has been to start a little higher, around 185 pM.
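If you want to sanity-check the arithmetic behind a target loading concentration, the standard dilution relationship C1 x V1 = C2 x V2 can be sketched in a few lines of Python. This is a rough sketch, not a protocol: the function name, the 1 nM stock, and the 100 µL final volume are hypothetical, and the calculation treats the final volume as the total after every dilute and denature addition. Always follow your sequencer's official loading guide for actual volumes.

```python
def dilution_volumes(stock_pm, target_pm, final_volume_ul):
    """Return (library_ul, diluent_ul) using C1 * V1 = C2 * V2.

    Assumption: final_volume_ul is the total volume after all additions,
    so target_pm is the final loading concentration.
    """
    if stock_pm <= 0 or target_pm <= 0 or final_volume_ul <= 0:
        raise ValueError("concentrations and volume must be positive")
    if target_pm > stock_pm:
        raise ValueError("target concentration exceeds stock concentration")
    library_ul = target_pm * final_volume_ul / stock_pm
    return library_ul, final_volume_ul - library_ul


# Hypothetical example: 1 nM (1000 pM) stock, 130 pM target, 100 µL final volume
library_ul, diluent_ul = dilution_volumes(1000.0, 130.0, 100.0)
print(f"{library_ul:.1f} µL library + {diluent_ul:.1f} µL diluent")
# prints "13.0 µL library + 87.0 µL diluent"
```

The same function works for the 25B starting point by passing 185.0 as the target concentration.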
Moving Forward with Confidence
While our investigation revealed some nuances in NovaSeq X sequencing, the key takeaway is reassuring: despite the observed barcode error rates, your data quality remains robust and reliable. We've provided specific, tested recommendations to help you achieve optimal results from day one. As the sequencing landscape continues to evolve, we're committed to providing you with practical, data-driven guidance to support your research success.
Remember, our support team at support@scale.bio is always ready to help you navigate these waters. Together, we're pushing the boundaries of what's possible in single cell analysis while ensuring the highest standards of data quality. Feel free to also contact us below if you have further questions.