The Problem of Probabilistic Genotyping

Recent years have seen DNA profiling progress from relatively crude methods that could only obtain results from visible body fluids such as blood and saliva. Because only limited regions of DNA were being profiled, it was necessary to produce statistics estimating the probability that the profile could have come from anyone other than the suspect; such statistics were typically in the millions. Improvements in sensitivity and specificity mean that we can now profile invisible stains and produce much larger statistics. With these improvements, a ‘ceiling’ statistic of one in a billion (a thousand million) was introduced in the UK (but not in the USA) to ensure confidence in the reliability of the calculations involved. A straightforward single-person profile is therefore normally reported as having a chance of one in a billion of coming from another, unrelated person.
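
As a rough illustration (our own simplified sketch, with invented allele frequencies rather than real population data), the conventional single-source statistic multiplies genotype frequencies across the tested loci using the product rule:

```python
# Illustrative only: a simplified random match probability (RMP) using the
# product rule across independent loci. Allele frequencies are invented for
# the example and are NOT real population data.

# Each locus: the person's two alleles and the population frequency of each.
profile = {
    "locus_1": ("a", "b", {"a": 0.10, "b": 0.20}),        # heterozygote
    "locus_2": ("c", "c", {"c": 0.15}),                   # homozygote
    "locus_3": ("d", "e", {"d": 0.05, "e": 0.30}),
}

rmp = 1.0
for locus, (allele1, allele2, freq) in profile.items():
    if allele1 == allele2:
        genotype_freq = freq[allele1] ** 2                 # p^2 for a homozygote
    else:
        genotype_freq = 2 * freq[allele1] * freq[allele2]  # 2pq for a heterozygote
    rmp *= genotype_freq                                   # product rule: loci treated as independent

print(f"Random match probability: about 1 in {1 / rmp:,.0f}")
# With a full set of modern loci the product routinely exceeds one in a
# billion, hence the reporting 'ceiling' described above.
```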

Mixtures of DNA from different people can be notoriously difficult to assess. Until recently, some of these were regarded by DNA analysts as uninterpretable. Recent trials in which we have been involved have seen the introduction of complex, and not widely accepted, computer programmes to produce statistics for profiles for which the DNA expert has been unable to provide statistics using conventional methods. These programmes are collectively called probabilistic genotyping systems.
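
In outline (again a deliberately simplified sketch of our own, not the model used by any particular programme), such systems report a likelihood ratio: the probability of the observed profile if the suspect contributed, divided by its probability if an unknown person contributed instead. A toy calculation for one locus of an unambiguous two-person mixture, ignoring peak heights, drop-out and drop-in, shows the basic idea:

```python
# Illustrative only: a likelihood ratio (LR) for one locus of a simple
# two-person mixture, ignoring peak heights, drop-out and drop-in.
# Allele frequencies are invented for the example.

freq = {"8": 0.12, "11": 0.08, "12": 0.20, "14": 0.10}

observed = {"8", "11", "12", "14"}   # alleles seen in the mixed profile
victim   = {"8", "12"}               # known contributor
suspect  = {"11", "14"}              # person of interest

# Hp (prosecution): victim + suspect. Their genotypes account for every
# observed allele, so in this no-dropout model the probability is 1.
p_given_hp = 1.0 if (victim | suspect) == observed else 0.0

# Hd (defence): victim + one unknown person, who must then carry exactly
# the two alleles the victim does not explain (here 11 and 14).
a, b = sorted(observed - victim)
p_given_hd = 2 * freq[a] * freq[b]   # frequency of that genotype in the population

print(f"LR at this locus: {p_given_hp / p_given_hd:,.1f}")
# A full calculation multiplies LRs across loci; probabilistic genotyping
# systems extend the idea to peak heights, drop-out and drop-in, and that is
# where the different models (and their results) diverge.
```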

Debate continues within the scientific community even about the best way to interpret mixtures with clear, unambiguous alleles. Other mixtures exhibit the features normally associated with Low Template DNA (LTDNA); that is, variable results. The interpretation of such profiles remains controversial, with various approaches being proposed. Despite that, courts appear loath to reject the apparently compelling statistics that these programmes produce.

We have been involved in several trials in which evidence from such statistical software programmes has been introduced. A trial in Oxford in 2010 heard testimony from statisticians from both the UK and the USA, presenting similar, yet different, computer models for the statistical evaluation of complex DNA mixtures. The judge ruled that the statistics from the UK-based statistician (using LikeLTD) could be admitted in evidence, but ruled the statistics from the American programme (TrueAllele) ‘not yet ready’ for admission. The accused was convicted.

In a subsequent trial in Northern Ireland (concluded in January 2012), the same American expert gave evidence based on his statistical programme (TrueAllele). Its admissibility was challenged, but the judge ruled that the system had reached a stage at which it could be regarded as reliable, and it was admitted in evidence. The statistics generated by this programme ran into the trillions, thereby apparently providing even better evidence from a complex mixture than a straightforward single-source profile provides. This is illogical: a complex mixture contains less information about any individual contributor than a clean single-source profile, so it should not yield a stronger statistic. One accused was convicted, while the other was found not guilty.

In a California case in which we were involved, Cybergenetics (the vendor of TrueAllele) actually threatened to withdraw their evidence when ordered by the trial judge to disclose the software code. The trial judge's ruling was overturned on appeal. Other software, such as LikeLTD, is freely available on the internet.

In February 2012, statistics provided by the same programme as in the Oxford case above were presented in evidence in a murder trial in Liverpool. Professor Allan Jamieson gave evidence in which he accepted the conclusion of the Crown’s DNA expert that the profiles were not capable of conventional statistical analysis. However, he challenged the reliability of the statistical programme. The defendant was found not guilty.

More recently, in New York, we were part of a team that, in a Frye hearing, successfully challenged the use of another programme (FST) developed and used by the Office of the Chief Medical Examiner (OCME). Professor Jamieson also gave evidence in another high-profile case involving OCME’s low template DNA method as well as their FST probabilistic genotyping system. Both defendants were found not guilty (the press appeared to lose interest at that point!).

We have seen the statistics from these programmes change significantly during investigations. In a trial in Pittsburgh, USA, the TrueAllele programme provided a ‘preliminary’ figure of 100 million, then about 2 billion, which increased to 5 billion as a result of ‘improvements’ to the programme. The vendor of the programme (Cybergenetics) staunchly refuses to disclose the software code, which would enable checking that the software implements the published statistical model. Nevertheless, the defendant was found not guilty of a double homicide.

Closer to home, in our first two cases involving STRMix, the prosecution withdrew the evidence pre-trial when it became obvious that there was to be a serious challenge to the validation of the software in this jurisdiction. Ultimately, we were never provided with the software code, as we refused to sign a non-disclosure agreement that we considered prohibitive and that included a clause requiring us not to disclose the fact that we had signed a non-disclosure agreement! Examination of the software proved unnecessary given developments at court.

Of course, it is impossible to know the effect that this evidence had on the various decisions. But, aside from the novelty of such programmes and the current debate regarding how to interpret ‘normal’ mixtures, the acceptance of these statistical models by the prosecution appears contrary to the increasing recognition that scientific techniques used in courts (and it is arguable whether statistics is a science per se) should be tested for their reliability before being used.

The International Society of Forensic Genetics stated,

“The implementation of such an approach in routine casework, in particular when using a computer-based expert system for mixture interpretation, requires an extensive validation of the variable parameters such as Hb and Mx, as well as appropriate guidelines for all laboratory procedures.”
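
For readers unfamiliar with those parameters, Hb (heterozygote balance) and Mx (mixture proportion) are ratios derived from peak heights in the profile; the sketch below, using peak heights invented for illustration, shows the usual definitions:

```python
# Illustrative only: the two parameters mentioned in the ISFG guidance,
# computed from made-up peak heights (in RFU) at a single locus.

# Heterozygote balance (Hb): ratio of the smaller to the larger peak for the
# two alleles of one heterozygous contributor. Values well below the commonly
# used ~0.6 guideline are among the features associated with low template DNA.
peak_a, peak_b = 820, 640
hb = min(peak_a, peak_b) / max(peak_a, peak_b)

# Mixture proportion (Mx): the share of the total signal attributed to one
# contributor, here assuming the four observed peaks can be cleanly paired
# into contributor 1 (820 + 640) and contributor 2 (210 + 170).
contributor_1 = 820 + 640
contributor_2 = 210 + 170
mx = contributor_1 / (contributor_1 + contributor_2)

print(f"Hb = {hb:.2f}, Mx = {mx:.2f}")
# Validation, as the ISFG notes, means establishing how such parameters behave
# across many known samples before relying on them in casework.
```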

The 2009 report of the United States National Academy of Sciences on the state of forensic science was clear,

“The simple reality is that the interpretation of forensic evidence is not always based on scientific studies to determine its validity. This is a serious problem. Although research has been done in some disciplines, there is a notable dearth of peer-reviewed, published studies establishing the scientific bases and validity of many forensic methods. …

However, some courts appear to be loath to insist on such research as a condition of admitting forensic science evidence in criminal cases, perhaps because to do so would likely “demand more by way of validation than the disciplines can presently offer.” …

The bottom line is simple: In a number of forensic science disciplines, forensic science professionals have yet to establish either the validity of their approach or the accuracy of their conclusions, and the courts have been utterly ineffective in addressing this problem.”

The President’s Council of Advisors on Science and Technology (PCAST) is an advisory group of leading scientists and engineers, appointed by the President of the United States to provide scientific advice. Crucially, these advisers are drawn primarily not from the forensic science community but from a range of other scientific fields in which they are authoritative experts; in effect, they are external scientific reviewers. In September 2016, PCAST released a critique of several methods used in ‘forensic science’, including the interpretation of mixed DNA profiles.

PCAST noted,

“Judges’ decisions about the admissibility of scientific evidence rest solely on legal standards; they are exclusively the province of the courts and PCAST does not opine on them. But, these decisions require making determinations about scientific validity.

It is the proper province of the scientific community to provide guidance concerning scientific standards for scientific validity ...

The fundamental difference between DNA analysis of complex-mixture samples and DNA analysis of single-source and simple mixtures lies not in the laboratory processing, but in the interpretation of the resulting DNA profile. ...

probabilistic genotyping software programs clearly represent a major improvement over purely subjective interpretation. However, they still require careful scrutiny to determine

(1) whether the methods are scientifically valid, including defining the limitations on their reliability (that is, the circumstances in which they may yield unreliable results) and

(2) whether the software correctly implements the methods. This is particularly important because the programs employ different mathematical algorithms and can yield different results for the same mixture profile.

Appropriate evaluation of the proposed methods should consist of studies by multiple groups, not associated with the software developers, that investigate the performance and define the limitations of programs by testing them on a wide range of mixtures with different properties.”
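
One hedged illustration of PCAST’s second point: where the statistical model is published, the software’s output for a simple, unambiguous case can be re-derived independently and compared with what the program reports; that comparison cannot be carried out thoroughly while the source code and parameters remain undisclosed. The figures below are invented for illustration.

```python
# Illustrative only: a minimal check of whether a program's reported LR
# matches an independent calculation for a simple, unambiguous mixture.
# Both figures below are invented; a real exercise would cover many such
# cases, constructed from known contributors.

def independent_lr(p, q):
    """LR for a two-person mixture with one known contributor and no
    drop-out: the unknown under Hd must carry the two unexplained alleles."""
    return 1.0 / (2 * p * q)

reported_by_program = 63.0                 # value read from the program's report
recomputed = independent_lr(0.08, 0.10)    # hand calculation from the published model

relative_difference = abs(reported_by_program - recomputed) / recomputed
print(f"Recomputed LR = {recomputed:.1f}, "
      f"relative difference = {relative_difference:.1%}")
# Small discrepancies may be explained by documented modelling choices;
# unexplained ones are exactly what code review and independent validation
# are intended to detect.
```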

We continue to challenge this interpretation of DNA data, and maintain our view that it is time for a comprehensive, independent, and authoritative review of the reliability and limitations of the software programmes being used to provide statistical estimates from DNA profiles in court. The recent scandal involving software in Volkswagen cars shows the necessity of looking closely at the software itself to see how it operates, rather than simply looking at its output. These reviews need to involve known, real mixtures under conditions found in casework. The Scientific Working Group on DNA Analysis Methods (SWGDAM) has recently published guidelines for the validation of such software. It’s a start, but not the whole solution.

Updated April 2017

See also 'something from nothing' and 'Y is that?'.
