Interpreting the knife blade DNA results in the trial of Amanda Knox and Raffaele Sollecito

Introduction

The most important and apparently decisive evidence advanced by the prosecution was DNA evidence found on a knife seized by the police from the kitchen of Raffaele Sollecito’s apartment, and on the hook of the bra worn by the victim, discovered in the room where the murder was committed. According to the Scientific Police, Amanda Knox’s DNA profile was present on the handle of the knife and that of Meredith Kercher on the blade.

This article discusses whether Meredith Kercher’s DNA was on the knife blade, or whether the result was due to contamination.

Evidence

According to the Hellmann report, Dr Stefanoni "performed a second electrophoretic run of the same amplified sample and compared the two graphs". Thus we have two sets of readings, which are set out below:

Run 1: {41,28}, {23,0}, {48,15}, {87}, {47,32}, {30,29}, {32,27}, {21,45}, {55,27}, {99,36}, {41,25}, {30,51}, {75,39}, {113,36}, {27,43}

Run 2: {108,55}, {80,35}, {64,51}, {140}, {33,37}, {17,0}, {56,39}, {42,0}, {32,66}, {49,61}, {57,0}, {65,76}, {40,0}, {53,59}, {0,0}

Here each number is the height of a peak in the electropherogram, representing (in rough terms) the amount of DNA found at that genetic location.

For each item enclosed in braces { }, except as noted below, we have a pair of numbers: one for the marker inherited from Meredith’s father, and one for the marker inherited from her mother.

For the fourth item in each run, only a single number is given: this is because at this location in Meredith’s DNA, the markers from her mother and father happen to be the same.

There are some missing peaks (zeros): just one in the first run, at the second location, {23,0}. In the second run there are 6 missing peaks, spread over five locations.
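The counts of missing peaks can be checked mechanically. The following is only a minimal sketch: the peak heights are copied from the two runs above, while the names RUN1, RUN2, missing_peaks and dropout_locations are my own illustrative choices and do not come from any laboratory software.

```python
# Peak heights as listed above; a one-element tuple marks the location
# where both inherited markers are the same (a single peak).
RUN1 = [(41, 28), (23, 0), (48, 15), (87,), (47, 32), (30, 29), (32, 27),
        (21, 45), (55, 27), (99, 36), (41, 25), (30, 51), (75, 39),
        (113, 36), (27, 43)]
RUN2 = [(108, 55), (80, 35), (64, 51), (140,), (33, 37), (17, 0), (56, 39),
        (42, 0), (32, 66), (49, 61), (57, 0), (65, 76), (40, 0), (53, 59),
        (0, 0)]


def missing_peaks(run):
    """Count zero-height entries (missing peaks) in one run."""
    return sum(1 for location in run for height in location if height == 0)


def dropout_locations(run):
    """Return 1-based positions of locations with at least one missing peak."""
    return [i for i, location in enumerate(run, start=1) if 0 in location]


for name, run in (("Run 1", RUN1), ("Run 2", RUN2)):
    print(name, "missing peaks:", missing_peaks(run),
          "at locations:", dropout_locations(run))
```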

A problem

The major problem here is simply that the two runs did not give similar results. This is quite inexplicable if we accept that the second run was analysing the same amplified sample. Electrophoresis is an accurate process, and the results should be essentially the same. For example, at the ninth location, we have {55,27} in the first run and {32,66} in the second: the taller peak in one run is the shorter peak in the other. At the 14th location we have {113,36} in the first run and {53,59} in the second. This is simply impossible if we are analysing the same sample. For examples of multiple electrophoresis runs on the same sample, see this blog post:

http://forensicdnaconsulting.wordpress.com/2012/04/04/multiple-injections-of-same-sample-capillary-electrophoresis/
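To make the disagreement concrete, one can compare the ratio of the two peaks at a location across the runs. This is a rough sketch only: the ratio of first peak to second peak is an illustrative measure I have chosen here, not a standard forensic statistic, and the names LOCATIONS and first_to_second_ratio are hypothetical.

```python
# Peak-height pairs quoted in the text for the 9th and 14th locations.
LOCATIONS = {
    9: {"run1": (55, 27), "run2": (32, 66)},
    14: {"run1": (113, 36), "run2": (53, 59)},
}


def first_to_second_ratio(pair):
    """Ratio of the first peak height to the second at one location."""
    first, second = pair
    return first / second


for location, runs in LOCATIONS.items():
    r1 = first_to_second_ratio(runs["run1"])
    r2 = first_to_second_ratio(runs["run2"])
    print(f"location {location}: run 1 ratio {r1:.2f}, run 2 ratio {r2:.2f}")
```

If both runs were injections of the same amplified product, the two ratios at a location would be expected to be close to each other; here they fall on opposite sides of 1 at both locations.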

Also, by looking at the 5-digit sample numbers in http://kermit-analysis.wikispaces.com/file/view/Quantificazione.pdf (some hand-written, others printed), we can see that Stefanoni’s method of working was to divide each sample into two wells for amplification. There is no reason why she would have deviated from this method for this particular sample.

Thus my first conclusion (contrary to Hellmann) is that either Stefanoni was confused, or she was misunderstood by the court. These cannot possibly be the same amplified sample, and we should assume that the sample was split into two wells before amplification, the same as for the hundreds of samples shown in the "Quantificazione" pdf linked above.

The main issue

The main issue is whether the results were due to contamination. When a DNA sample is analysed, it is first "amplified": a chemical reaction known as PCR (Polymerase Chain Reaction) is performed, which duplicates selected portions of the DNA. This produces millions or even billions of copies, which can then be accurately measured using electrophoresis. Because so much amplified material is produced in the laboratory, there is an ever-present risk of contamination of a sample by the amplified material from an earlier test.

In fact the results from the two runs, now assumed to be distinct amplifications, very strongly suggest contamination.

The difference (between contamination and no contamination)

How is it possible to tell the difference? The difference is that when the DNA from a small number of cells is analysed, we start with an equal number of copies of each marker. Thus we expect peak heights to be approximately equal (after combining the two runs). Why not exactly equal? The chemical reaction is not 100% reliable: a duplication fails 20% of the time. This means that there can be some variation in the output. The variation will tend to be larger when we start with a small number of initial copies, because a failure in an early cycle affects a large share of the final product.
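The effect of the starting copy number on the variation can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the program used for the results reported in this article: the 20% failure rate is taken from the text, but the cycle count of 28, the normal approximation used for large counts, and the names amplify and relative_spread are my own assumptions.

```python
import random
import statistics


def amplify(initial_copies, cycles=28, efficiency=0.8, rng=random):
    """Simulate PCR: in each cycle every copy is duplicated with
    probability `efficiency`. Returns the final number of copies."""
    copies = initial_copies
    for _ in range(cycles):
        if copies < 1000:
            # Exact: one independent success/failure trial per copy.
            new = sum(1 for _ in range(copies) if rng.random() < efficiency)
        else:
            # Assumption: normal approximation to the binomial for speed.
            mean = copies * efficiency
            sd = (copies * efficiency * (1 - efficiency)) ** 0.5
            new = max(0, min(copies, round(rng.gauss(mean, sd))))
        copies += new
    return copies


def relative_spread(initial_copies, trials=500, seed=0):
    """Standard deviation of the final copy count divided by its mean."""
    rng = random.Random(seed)
    finals = [amplify(initial_copies, rng=rng) for _ in range(trials)]
    return statistics.pstdev(finals) / statistics.mean(finals)


for start in (2, 3, 20):
    print(f"{start} initial copies: relative spread {relative_spread(start):.3f}")
```

The relative spread shrinks as the number of initial copies grows, which is why only very small starting numbers can produce large imbalances by chance.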

By contrast, when the sample is contaminated, we do not start with an equal number of copies of each marker: it is a small random sample from a practically infinite source (the billions of fragments from earlier tests).

So the big question is: can the variations (imbalances) seen be attributed to chance variations in the amplification process?

We can find out by running a large number of simulations of the amplification process, and see how often the imbalance in the peaks matches (or exceeds) the imbalance in the actual results.

Simulation results

As explained above, the variation will tend to be larger starting with a small number of initial copies. We don’t need to consider a single copy, since a single copy could end up in only one of the two amplifications, yet at many locations there are peaks in both runs.

In fact there are just two cases to be considered: two initial copies and three initial copies.

For three copies, after 10 million simulations, only 23 times were the peaks as imbalanced as the actual results, a 1 in 435,000 chance.

For two copies, after 100,000 simulations, only 110 times were the peaks as imbalanced as the actual results, roughly a 1 in 900 chance. But two copies is unlikely for another reason: in the first run, there is only one missing peak. Starting with two copies, the chance of this happening is only 1 in 350. Considering this could have happened in either run, let’s say 1 in 175. Multiplying the two probabilities together, we see that for two copies we have roughly a 1 in 157,500 chance.
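The arithmetic behind these figures can be reproduced in a few lines. This is only a sketch of the calculation: the counts are those quoted above, the function name one_in is my own, and multiplying the two chances treats the events as independent, which is an assumption carried over from the text.

```python
def one_in(hits, trials):
    """Express hits out of trials as the N in '1 in N'."""
    return trials / hits


# Three initial copies: 23 hits in 10 million simulations.
three_copies = one_in(23, 10_000_000)

# Two initial copies: 110 hits in 100,000 simulations.
two_copies_imbalance = one_in(110, 100_000)

# Chance of only one missing peak, allowed in either run: 1 in 350 -> 1 in 175.
single_dropout_either_run = 350 / 2

# Combined figure for two copies, using the rounded value of 900.
combined_rounded = 900 * single_dropout_either_run
# The same product without rounding 909 down to 900.
combined_unrounded = two_copies_imbalance * single_dropout_either_run

print(f"three copies: 1 in {three_copies:,.0f}")
print(f"two copies, imbalance only: 1 in {two_copies_imbalance:,.0f}")
print(f"two copies, combined (rounded inputs): 1 in {combined_rounded:,.0f}")
print(f"two copies, combined (unrounded): 1 in {combined_unrounded:,.0f}")
```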

The program used to perform these simulations is at http://kermit-analysis.wikispaces.com/file/view/montecarlo.txt

Conclusion

It is highly improbable that these results could occur without contamination, for either two or three initial copies, and thus PCR contamination is established with high probability.