The taxonomy nightmare before Christmas…

This post is intended to educate people on the more technical aspects of the microbiome. I am not talking about taking 4 samples from one stool and sending them to 4 different testing companies. I am talking about one sample sent to one testing company, which then provided its analysis and a FASTQ file: the raw data.

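What is a FASTQ file (besides being megabytes big)? It holds the DNA sequences read from the bacteria in the stool (technically, in 16S-based tests, the DNA of the gene that codes for ribosomal RNA), together with a quality score for every letter. A single sequence looks like this (using the 4 letters that DNA has):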

CCGGACTACACGGGTTTCTAATCCTGTTTGATACCCACTCTTTCGAGCATCAGTGTCAGTTGCAGTCCAGTGAGCAGCCTTCGCAATCGGAGTTCATCGTTATATCTAAGCATTTCACCGCTACACAACGAATTCCGCACACCTCTA

As plain text, the file that I am using would be around 16 megabytes. This data comes from a lab sequencing machine. The company then processes it through its software to match sequences to bacteria.
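To make the file format concrete, here is a minimal Python sketch of reading a FASTQ file. It assumes the common layout of four lines per read (an "@" header, the sequence, a "+" separator, and a quality string of the same length). The names read_fastq and FastqRecord, and the tiny example read, are made up for illustration; this is not how any of the providers actually process the data.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    read_id: str   # header line without the leading "@"
    sequence: str  # the DNA letters (A, C, G, T, sometimes N)
    quality: str   # one quality character per letter


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ file, assuming four lines per read."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"Malformed FASTQ record near: {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"Sequence/quality length mismatch in: {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


# Made-up single read, only to show the shape of the data.
seq = "CCGGACTACACGGG"
example = io.StringIO(f"@read_1\n{seq}\n+\n{'I' * len(seq)}\n")
for record in read_fastq(example):
    print(record.read_id, len(record.sequence), "bases")
```

With a real file you would pass an open file handle instead of the in-memory example, for instance open("sample.fastq") (a hypothetical file name).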

In this post, I am using the FASTQ from uBiome and getting reports on the bacteria from:

  • uBiome
  • Thryve Inside
  • BiomeSight
  • Sequentia Biotech

Naively, one would expect almost identical results. What I got is shown in detail below. At a high level, these are the numbers of taxa each provider reported:

  • uBiome – 253
  • Thryve Inside – 632
  • BiomeSight – 558
  • Sequentia Biotech – 366

I did a more technical post on my other blog. The same taxon may be reported at 40% by one provider, at 2% by another, and not at all by a third… ugly!
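A rough Python sketch of how such a disagreement can be laid out side by side. The provider names, the taxon name, and the percentages below are placeholders that only echo the 40% / 2% / none pattern; they are not my actual results, and compare_providers and spread are hypothetical helper names.

```python
from typing import Dict, Optional

# One report: taxon name -> percentage of the sample.
Report = Dict[str, float]


def compare_providers(reports: Dict[str, Report]) -> Dict[str, Dict[str, Optional[float]]]:
    """Build a taxon-by-provider table; None means the provider did not report the taxon."""
    all_taxa = sorted({taxon for report in reports.values() for taxon in report})
    return {
        taxon: {provider: report.get(taxon) for provider, report in reports.items()}
        for taxon in all_taxa
    }


def spread(row: Dict[str, Optional[float]]) -> float:
    """Difference between the highest and lowest reported percentage for one taxon."""
    present = [value for value in row.values() if value is not None]
    return max(present) - min(present)


# Placeholder data: same FASTQ, three hypothetical providers.
reports = {
    "provider_a": {"Taxon X": 40.0},
    "provider_b": {"Taxon X": 2.0},
    "provider_c": {},  # does not report Taxon X at all
}

for taxon, row in compare_providers(reports).items():
    missing = [provider for provider, value in row.items() if value is None]
    print(taxon, row, "spread:", spread(row), "missing from:", missing)
```

Keeping None for "not reported" instead of 0 matters here: a provider that never lists a taxon is saying something different from one that measured it at zero.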

Standards seekers put the human microbiome in their sights, 2019

The headaches!

Number One Issue: You cannot, repeat cannot, compare a taxonomy report from one lab with a report from another. EVER!

  • I have 8 uBiome reports and 2 Thryve reports. I can compare the uBiome reports with each other and the Thryve reports with each other. I can never mix their direct taxonomy reports!

Number Two Issue: If I wish to compare reports from different labs, I MUST obtain the FASTQ files from each lab and process them through the same provider. The FASTQ files are the raw data! I prefer to push them through multiple providers, which means that the 10 reports suddenly become 40 or 50 different reports on my site.
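The rule in both issues boils down to: only compare reports that came out of the same provider's software. A small Python sketch of that bookkeeping follows; the Report fields, the function names, and the sample labels are my own illustrative assumptions, not the actual structure of my site.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Report:
    sample: str        # which stool sample the FASTQ came from
    sequenced_by: str  # lab that produced the FASTQ file
    processed_by: str  # provider whose software turned the FASTQ into taxa


def comparable(a: Report, b: Report) -> bool:
    """Two reports can be compared only if the same provider processed both."""
    return a.processed_by == b.processed_by


def group_by_processor(reports: Iterable[Report]) -> Dict[str, List[Report]]:
    """Bucket reports so comparisons stay inside one provider's results."""
    groups: Dict[str, List[Report]] = defaultdict(list)
    for report in reports:
        groups[report.processed_by].append(report)
    return dict(groups)


# Illustrative labels only: two samples, each with the lab's own report
# plus a report from re-processing its FASTQ through one shared provider.
reports = [
    Report("sample 1", sequenced_by="uBiome", processed_by="uBiome"),
    Report("sample 2", sequenced_by="Thryve", processed_by="Thryve"),
    Report("sample 1", sequenced_by="uBiome", processed_by="BiomeSight"),
    Report("sample 2", sequenced_by="Thryve", processed_by="BiomeSight"),
]

print(comparable(reports[0], reports[1]))  # direct lab reports: False
print(comparable(reports[2], reports[3]))  # same provider for both: True
for provider, group in group_by_processor(reports).items():
    print(provider, [r.sample for r in group])
```

Every extra provider a FASTQ is pushed through adds one more bucket per sample, which is why the number of reports multiplies so quickly.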

My Headaches

I need to revise my site to show data by specific provider (while keeping the data combined across all providers available). That is a lot of pages to revise and test.