New Suggestion Algorithm

Determining what is or is not a significant shift in the microbiome is challenging. My original algorithm on http://microbiomeprescription.com/ was based on reverse engineering the normative values that uBiome appears to be working from. This was a quick-and-dirty solution, the best that was possible a year ago. After doing some R analysis of the contributed microbiomes and rethinking how to determine what is atypical, I have borrowed some algorithms and observations from a work project whose goal was to detect abnormal behavior in computer systems.

Microbiomes and complex computer systems share similar challenges: their measurements are not normally distributed and are often long-tailed (skewed). This means that averages and standard deviations often produce poor results when detecting abnormal values.
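To illustrate the problem with averages and standard deviations, here is a small Python sketch on made-up, long-tailed abundance values (not real data). A single large value drags the mean far above the median, and the "mean minus 2 standard deviations" lower bound goes negative, so no low abundance could ever be flagged as atypical.

```python
from statistics import mean, median, stdev

# Made-up relative abundances (1.0 = 100%) for one taxon in ten samples:
# most are small, one is very large (a long right tail).
values = [0.001, 0.002, 0.002, 0.003, 0.003, 0.004, 0.005, 0.006, 0.008, 0.30]

avg = mean(values)
sd = stdev(values)  # sample standard deviation
lower, upper = avg - 2 * sd, avg + 2 * sd

print(f"median = {median(values):.4f}, mean = {avg:.4f}")
# The single large value pulls the mean above 9 of the 10 samples,
# and the lower bound is negative, so no low value can ever fall below it.
print(f"mean +/- 2 SD: {lower:.4f} to {upper:.4f}")
```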

We will Box Up the Issue!

A common step when filtering data for machine learning, etc., is excluding outliers. Here, we are actually interested in finding the outliers! This is often done with boxplots. An example for some phylum-level bacteria is shown below. (Note: 1.0 = 100%)

Outliers are shown as round circles.
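The standard boxplot convention (Tukey's rule, also the default in R's boxplot()) marks as outliers any values more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile. Here is a minimal Python sketch of that rule, assuming it is the convention behind the plots above; the function names and sample values are hypothetical. Quartiles come from statistics.quantiles with method="inclusive", which matches R's default quantile(); R's boxplot() itself uses Tukey's hinges, which can differ slightly for small samples.

```python
from statistics import quantiles

def tukey_fences(values, k=1.5):
    """Return the boxplot fences (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range: the height of the box
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, k=1.5):
    """Return the values outside the fences (the circles on a boxplot)."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower or v > upper]

# Made-up relative abundances (1.0 = 100%) for one taxon across samples.
sample = [0.0001, 0.010, 0.011, 0.012, 0.013, 0.014, 0.015, 0.016, 0.018, 0.060]
print(tukey_fences(sample))   # fences at roughly 0.0045 and 0.0225
print(find_outliers(sample))  # one unusually low and one unusually high value
```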

We can do the same at lower taxonomic levels, for example, order:

Down to Species

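The solid black line is the median (the middle value, which is less affected by extreme values than the average). For B. vulgatus, the range of values from the 25th percentile to the median is almost the same as from the median to the 75th percentile. For B. uniformis, the two halves of the box are very different, a sign of a skewed distribution.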
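The asymmetry described above can be quantified by comparing the widths of the two halves of the box. A small Python sketch follows; the helper name box_halves is illustrative, and the numbers are made up, not real B. vulgatus or B. uniformis data.

```python
from statistics import quantiles

def box_halves(values):
    """Return (median - Q1, Q3 - median), the widths of the two halves of a box."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return med - q1, q3 - med

# Made-up values: one roughly symmetric taxon, one skewed taxon.
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
skewed = [1, 1, 2, 2, 3, 5, 9, 14, 20]

print(box_halves(symmetric))  # (2.0, 2.0)
print(box_halves(skewed))     # (1.0, 6.0)
```

A roughly symmetric taxon gives two similar widths; a skewed one gives an upper half several times wider than the lower half.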

The new algorithm looks for these outliers and deems only them significant. For a post showing more details, click here.
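Here is a sketch of how per-taxon outlier detection could turn one sample into a list of significant shifts. This is an illustration under assumptions, not the site's actual code: the reference population is assumed to be a dictionary of contributed samples' abundances per taxon, the 1.5 × IQR boxplot rule is assumed, and significant_shifts, the min_reference cutoff of four values, and all numbers are hypothetical.

```python
from statistics import quantiles

def tukey_fences(values, k=1.5):
    """Boxplot fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def significant_shifts(sample, reference, k=1.5, min_reference=4):
    """Flag taxa whose abundance in `sample` is an outlier versus `reference`.

    sample:    {taxon: relative abundance in the sample being analyzed}
    reference: {taxon: [relative abundances from contributed samples]}
    Returns {taxon: "high" or "low"} for values outside the fences.
    """
    shifts = {}
    for taxon, value in sample.items():
        ref = reference.get(taxon, [])
        if len(ref) < min_reference:
            continue  # too few reference values to estimate quartiles
        lower, upper = tukey_fences(ref, k)
        if value > upper:
            shifts[taxon] = "high"
        elif value < lower:
            shifts[taxon] = "low"
        # values inside the fences are treated as not significant
    return shifts

# Made-up reference data and sample (not real measurements).
reference = {
    "B. vulgatus": [0.010, 0.011, 0.012, 0.013, 0.014, 0.015, 0.016, 0.018, 0.020],
    "B. uniformis": [0.001, 0.001, 0.002, 0.002, 0.003, 0.005, 0.009, 0.014, 0.020],
}
sample = {"B. vulgatus": 0.017, "B. uniformis": 0.060}
print(significant_shifts(sample, reference))
```

Because a long-tailed taxon can have a lower fence below zero (as B. uniformis does in this made-up data), this sketch can only flag such a taxon as unusually high.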

What is the main difference?

I ran several samples through both algorithms, comparing the NUMBER of bacterial shifts deemed significant. The first number is from the old algorithm, the second from the new one.

  • 181 -> 55
  • 193 -> 56
  • 160 -> 43
  • 133 -> 33
  • 133 -> 29
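The size of the drop is easier to see as a percentage reduction per sample, computed from the counts listed above:

```python
# (old algorithm, new algorithm) counts of significant shifts, from the list above
counts = [(181, 55), (193, 56), (160, 43), (133, 33), (133, 29)]

reductions = []
for old, new in counts:
    pct = 100 * (old - new) / old  # percentage of shifts no longer flagged
    reductions.append(round(pct, 1))
    print(f"{old} -> {new}: {pct:.1f}% fewer")
```

This works out to roughly 70% to 78% fewer shifts flagged per sample.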

In theory, this means we are much more focused on the major shifts rather than on every shift. You can see the number of items identified on the suggestion page.

Where you will see it:
http://microbiomeprescription.com/email/analysis?….

Again, this is experimental (as is the entire site).