Wound Data is not Normal

Data science in wound care is a tricky thing. Most data in wound care is not normal. What I mean is that it does not follow the Gaussian distribution (aka the Normal distribution). We run into some funky shapes of data, which take us away from the safe, predictable land of the Gaussian (or bell curve), where things are nice and neat.

The sweet predictable Gaussian…

Wound care data almost never looks like that. Let’s take some real-world data representing 100 starting areas of pressure ulcers of the heel (all are greater than zero). The top of the chart shows the smoothed density of the distribution of the data, and the bottom is the individual data points.

As we can see, the data looks nothing like a Gaussian distribution. It peaks on the left of the distribution and has a long tail of data on the right. Let’s look at the data from another perspective, namely the cumulative distribution function of the data:

Here we can see that 50% of the data is approximately 3.5 cm² or less and that approximately 80% of the data is 12 cm² or less. The largest observed area is roughly double that. The implication of this non-normality is that we must be very careful with assumptions and statistical machinery that lean on the properties of the normal distribution. Leaning on them carelessly will almost inevitably lead to errors in inference and prediction.
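For readers who want to read off the same kind of numbers themselves, here is a minimal sketch of an empirical CDF using only the Python standard library. The data is simulated from a Gamma distribution as a stand-in (the shape and scale values are my assumptions, not the real wound data), so the printed numbers will not match the chart above.

```python
import random
import statistics

random.seed(0)
# Simulated stand-in for 100 starting areas (cm²); NOT the article's data.
# The shape (0.8) and scale (9.0) are assumed values that give a right skew.
areas = [random.gammavariate(0.8, 9.0) for _ in range(100)]


def ecdf(data, x):
    """Fraction of observations that are less than or equal to x."""
    return sum(v <= x for v in data) / len(data)


median = statistics.median(areas)
mean = statistics.fmean(areas)
# quantiles(n=10) returns 9 cut points: the 10th, 20th, ..., 90th percentiles.
deciles = statistics.quantiles(areas, n=10)
p80 = deciles[7]

print(f"median = {median:.1f} cm², mean = {mean:.1f} cm², 80th pct = {p80:.1f} cm²")
print(f"share of wounds at or below the mean: {ecdf(areas, mean):.0%}")
```

In right-skewed data the mean typically sits well above the median, which is one quick symptom that the bell curve is a poor description.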

By way of illustration, let’s perform two straightforward tasks (using our 100 data points from above):

  1. Infer the mean starting wound area using a simple Bayesian model

  2. Retrodict (predict backward) the full distribution of starting areas to see how well our model performs

We will perform this analysis using the Gaussian likelihood on the one hand, and the Gamma likelihood on the other, so we can compare and contrast our results. The Gamma distribution is commonly used for data that is non-negative, continuous, and skewed to the right, like our starting wound areas above.
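To make the contrast between the two likelihoods concrete, here is a small sketch of their log densities in plain Python. The Gamma is parameterized by its mean and shape (so rate = shape / mean), which is one common convention and an assumption on my part; the parameter values below are hypothetical and chosen only for illustration.

```python
import math


def normal_logpdf(x, mu, sigma):
    """Log density of a Gaussian with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def gamma_logpdf(x, mu, shape):
    """Log density of a Gamma with mean mu and the given shape.

    The rate is shape / mu, so the mean is shape / rate = mu.
    Returns -inf for x <= 0, which lies outside the Gamma's support.
    """
    if x <= 0:
        return -math.inf
    rate = shape / mu
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1) * math.log(x) - rate * x)


# Hypothetical parameter values, for illustration only.
mu, sigma, shape = 7.0, 8.0, 0.8

for area in (-2.0, 0.5, 3.5, 25.0):
    print(area, normal_logpdf(area, mu, sigma), gamma_logpdf(area, mu, shape))
```

Note that the Gaussian happily assigns a finite log density to an area of -2 cm², while the Gamma rules it out entirely.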

Inferring the Mean Starting Wound Area

After fitting the same simple Bayesian model, using the Gaussian and Gamma likelihoods, our respective posterior distributions of the mean of the starting wound areas are:

The means and HDIs (Highest-Density Intervals) of the two posterior distributions are fairly similar. It seems that when estimating the mean, the Gaussian likelihood can perform fairly well. However, we can see a subtle difference in the right tail of the posterior distributions of the mean, where the Gamma likelihood extends more than 1 cm² past the tail of the Gaussian likelihood.
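For readers unfamiliar with the HDI, it is the narrowest interval that contains a given share of the posterior. Here is a minimal sketch of how one could compute it from posterior draws. The draws are simulated from a skewed distribution as a stand-in for real posterior samples, and the function name `hdi` is my own choice, not a reference to any particular library.

```python
import math
import random


def hdi(samples, mass=0.95):
    """Narrowest interval that contains `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    k = max(1, math.ceil(mass * n))  # number of samples inside the interval
    # Slide a window of k consecutive samples and keep the narrowest one.
    start = min(range(n - k + 1),
                key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[start], ordered[start + k - 1]


random.seed(3)
# Simulated, right-skewed stand-in for posterior draws of a mean (cm²).
draws = [random.gammavariate(20.0, 0.35) for _ in range(5000)]

lo, hi = hdi(draws, mass=0.95)
print(f"95% HDI: {lo:.2f} to {hi:.2f} cm²")
```

For a skewed posterior, the HDI shifts toward the dense side of the distribution compared with an equal-tailed interval, which is why the right tail is worth a second look.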

Retrodicting the Full Distribution of Starting Areas

Using the parameters that were learned from our previous models, we can sample from the posterior predictive distribution to see if our model outputs data that looks like our original data. This is called a Posterior Predictive Check (PPC). This is a good way to visually check if a model makes sense across the full distribution of possibilities and not just at the posterior distribution of the mean.

The PPC lays bare the problem of the Gaussian likelihood when we extend our view past the mean. It predicts negative values upon retrodiction! A wound obviously cannot have a negative area! It also does a terrible job of recreating the shape of the observed data.

However, the Gamma likelihood does far better on the PPC. It is able to retrodict the shape of the original data well, although one could reasonably argue that its tail extends too far to the right. But visually, it is a far better fit than the Gaussian likelihood.
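The negative-area problem is easy to reproduce. The sketch below is a simplified, plug-in version of a PPC: instead of drawing parameters from a posterior, it fits each distribution with point estimates (sample mean and standard deviation for the Gaussian, method of moments for the Gamma) and then simulates new data. The observed data is itself simulated as a stand-in, so treat this as an illustration of the idea rather than a reproduction of the analysis above.

```python
import random
import statistics

random.seed(1)
# Simulated stand-in for the observed areas (cm²); NOT the article's data.
observed = [random.gammavariate(0.8, 9.0) for _ in range(100)]

mean = statistics.fmean(observed)
sd = statistics.stdev(observed)

# Method-of-moments Gamma fit: mean = shape * scale, variance = shape * scale**2.
shape = (mean / sd) ** 2
scale = sd ** 2 / mean

# Simulate replicated data from each fitted model.
n_draws = 10_000
gauss_draws = [random.gauss(mean, sd) for _ in range(n_draws)]
gamma_draws = [random.gammavariate(shape, scale) for _ in range(n_draws)]

gauss_negative = sum(d < 0 for d in gauss_draws) / n_draws
gamma_negative = sum(d < 0 for d in gamma_draws) / n_draws

print(f"Gaussian: {gauss_negative:.1%} of simulated areas are negative")
print(f"Gamma:    {gamma_negative:.1%} of simulated areas are negative")
```

A full PPC would also propagate the uncertainty in the parameters, but even this shortcut shows the Gaussian producing impossible wounds while the Gamma cannot.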

Model Comparison via LOOCV

We can use a formal model-checking method called LOOCV (Leave-One-Out Cross-Validation) to compare our models in a more rigorous way. Essentially, the LOOCV algorithm takes each model, trains it on all but one of the data points, and then attempts to predict the held-out data point. It then repeats this for all data points. In effect, it creates, in our case, 100 sub-models for each of the Gaussian and Gamma likelihoods and compares how accurate their out-of-sample predictions are. By so doing it quantifies the relative compatibility of each model with the data.

The LOOCV clearly shows that the Gamma likelihood is more compatible with the data than the Gaussian likelihood.
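The leave-one-out procedure described above can be sketched in a few lines. This is a brute-force, plug-in version under stated assumptions: each model is refitted with simple point estimates (not a full Bayesian fit), the data is simulated rather than the real wound areas, and the helper names are mine. Bayesian toolkits usually approximate this score rather than refitting the model once per data point.

```python
import math
import random
import statistics


def normal_logpdf(x, mu, sigma):
    """Log density of a Gaussian with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def gamma_logpdf(x, shape, scale):
    """Log density of a Gamma with the given shape and scale, for x > 0."""
    return (-math.lgamma(shape) - shape * math.log(scale)
            + (shape - 1) * math.log(x) - x / scale)


def fit_normal(data):
    """Point estimates (mean, standard deviation)."""
    return statistics.fmean(data), statistics.stdev(data)


def fit_gamma(data):
    """Method-of-moments estimates (shape, scale)."""
    m, v = statistics.fmean(data), statistics.variance(data)
    return m * m / v, v / m


def loo_log_score(data, fit, logpdf):
    """Sum of log densities of each point under a model fitted without it."""
    total = 0.0
    for i, held_out in enumerate(data):
        training = data[:i] + data[i + 1:]
        total += logpdf(held_out, *fit(training))
    return total


random.seed(2)
# Simulated stand-in for 100 starting areas (cm²); NOT the article's data.
areas = [random.gammavariate(0.8, 9.0) for _ in range(100)]

score_normal = loo_log_score(areas, fit_normal, normal_logpdf)
score_gamma = loo_log_score(areas, fit_gamma, gamma_logpdf)
print(f"LOO log score, Gaussian: {score_normal:.1f}")
print(f"LOO log score, Gamma:    {score_gamma:.1f}")
```

A higher (less negative) score means the model assigned more probability to the points it had not seen.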

Towards a new Normal

If we are serious about leveraging RWE (Real-World Evidence) in wound care, we are going to have to leave the Gaussian likelihood mostly behind. The outcomes that we care about, like wound area (and by extension PAR (Percent Area Reduction)), time-to-heal, wound healing rate, wound costs, predicted pressure ulcer stage, pain on the VAS scale, etc., are all non-Gaussian.

The use of non-Gaussian likelihoods should become the norm in wound-care data science. This is true whether you are a Likelihoodist, Frequentist, or, like me, a Bayesian.
