Thursday, October 20, 2016

What error rate would justify excluding non-science-based forensics?

A recent report from the President's Council of Advisors on Science and Technology renewed concerns first raised by the National Academy of Sciences in 2009 about the lack of scientific foundation for many, if not most, commonly used forensic disciplines besides DNA and toxicology. Our friends at TDCAA shared on their user forum a link to the first federal District Court ruling citing the PCAST report, focused in this instance on ballistics matching.

The federal judge out of Illinois admitted ballistics evidence despite the PCAST report because he considered estimated false-positive rates relatively low. Here's the critical passage on that score:
PCAST did find one scientific study that met its requirements (in addition to a number of other studies with less predictive power as a result of their designs). That study, the “Ames Laboratory study,” found that toolmark analysis has a false positive rate between 1 in 66 and 1 in 46. Id. at 110. The next most reliable study, the “Miami-Dade Study” found a false positive rate between 1 in 49 and 1 in 21. Thus, the defendants’ submission places the error rate at roughly 2%. The Court finds that this is a sufficiently low error rate to weigh in favor of allowing expert testimony. See Daubert v. Merrell Dow Pharms., 509 U.S. 579, 594 (1993) (“the court ordinarily should consider the known or potential rate of error”); United States v. Ashburn, 88 F. Supp. 3d 239, 246 (E.D.N.Y. 2015) (finding error rates between 0.9 and 1.5% to favor admission of expert testimony); United States v. Otero, 849 F. Supp. 2d 425, 434 (D.N.J. 2012) (error rate that “hovered around 1 to 2%” was “low” and supported admitting expert testimony). The other factors remain unchanged from this Court’s earlier ruling on toolmark analysis.
Using a 2 percent error rate could understate things: the error rates from the studies he cited ranged from 1.5 to 4.8 percent, so the true figure could be twice as high (1 in 21). Still, I'm not surprised that some judges might consider an error rate of 1.5 to 4.8 percent acceptable. And the judge is surely right that the PCAST report provides a new basis for cross-examining experts and reduces the level of certainty experts can portray to juries about their findings, so that's a plus.
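
For readers who want to check the arithmetic, here is a minimal sketch (in Python) converting the studies' 1-in-N figures, as quoted in the excerpt above, into percentages:

```python
# Convert the 1-in-N false-positive figures quoted from the PCAST report
# into percentages, as a check on the "roughly 2%" and "1.5 to 4.8 percent"
# characterizations above.
study_rates = {
    "Ames Lab (low end)": 66,
    "Ames Lab (high end)": 46,
    "Miami-Dade (low end)": 49,
    "Miami-Dade (high end)": 21,
}
for label, n in study_rates.items():
    print(f"{label}: 1 in {n} = {100 / n:.1f}%")
# Prints roughly 1.5%, 2.2%, 2.0%, and 4.8%, respectively.
```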

OTOH, an erroneous ballistics match - and even though analysts can't use the word "match" any more, it's how jurors will inevitably view such testimony - will loom large for jurors and be highly prejudicial as evidence. So if you're the unlucky one in 49, one in 21, or whatever the real number is of people falsely accused by ballistics comparisons, jurors are likely to go with the so-called "expert" and the defendant is basically screwed.

Grits has estimated before that two to three percent of criminal convictions involve actually innocent defendants - not too different from the error rate the judge considers allowable for ballistics. But that rate gets you to thousands of unexonerated people sitting in Texas prisons alone, with many more on probation, on parole, or having already completed their sentences. Given the volume of humanity that churns through the justice system, two or three percent is quite a significant number of people.

I'm curious as to Grits readers' opinions: How high a false positive rate is too high? Is forensic evidence that's 95 to 98 percent accurate good enough to secure a conviction "beyond a reasonable doubt" if it's the principal evidence against a defendant? At what error threshold should forensic evidence be excluded? Make your case in the comment section.

32 comments:

A Waco Friend said...

1. Each so-called expert should have to be tested for his/her error rate, and that particular expert should have an error rate of no more than 1 percent.

2. The evidence should be presented to the jury. Photographs or videos of the entire evidence bullet and of the test bullet showing the entire surfaces should be required to show the jury the degree of similarity, with the items at the same magnification and parallel. The technology is available to do this and there is no excuse not to do it.

Anonymous said...

2-3% of the convicted are innocent. I think that percentage is too low.

Anonymous said...

Crime lab analysts who fail a competency or proficiency test are usually given another test until they get it right (not very real world). Crime lab managers are fully aware of those disciplines (forensic science specialties) that require carefully crafted 'tests' before their people can pass them. National statistics on these failures would reveal those disciplines that everybody already knows are suspect. It is noted that "inconclusive" is not a suitable answer on tests for continued employment in some places. Common sense would dictate that those disciplines in which it is difficult to prepare a passable test be eliminated from a crime lab's menu of services. This, however, is a lab revenue problem for some: the government-supported labs that charge a fee for service. Fewer tests on the menu, less revenue.

Anonymous said...

For the nerds...

Koehler, J.J. “Forensics or fauxrensics? Ascertaining accuracy in the forensic sciences.” (2016)

http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2773255

Soronel Haetir said...

A Waco Friend:

My understanding is that #2 is generally the case already; certainly I have seen plenty of photos of two bullets lined up under microscopes, yet that is one of the disciplines under fire. Are you saying that when it comes time for the expert to testify (or even for the defense to receive the expert's report), those photos are not part of the package?

James Dark said...

I just wrote a true crime novel about a conviction in Dallas County. It was the hammer-murder of a gay pornographer who had an MO of abusing, on camera, otherwise straight males under the guise (which was crap) of auditioning for straight porn. Very sick and twisted.

The physical evidence was pretty damning for the defendant. But when the jury found out that the victim was an unregistered child sex offender, they seemed to see the light.

Found guilty of murder, he received an affirmative finding on sudden passion and was given 9 years in prison.

Frankly, I would have hung the jury, but I was just there as an observer, and had information that was withheld from the jury.

My book on Amazon is called Sudden Passion: The Anatomy of a Murder.


Anonymous said...

I believe that your "estimate" of False Positives in the criminal justice system is problematic since you are not taking into account the Base Rate in your "estimate". Since the Base Rate for "guilty" (based on ground truth) individuals is extremely high, it is most probable that the error rate for False Negatives will be very high, and the rate of False Positives very low. Without understanding Base Rate you are falling into the trap of Base Rate Fallacy in reasoning.

Anonymous said...

GFB:

The question you pose is an interesting one. However, the idea that these error rates will benefit the defense by reducing the level of certainty of the findings may not play out that way in practice.

The Ames Laboratory study cited above estimated a very specific base error rate: the base error rate per analyst and per item. How that base error rate applies in a particular case will depend upon the circumstances of the case, in particular the structure of the laboratory's processes and the number and types of evidence items involved.

For example, most firearms labs have identifications verified by a second examiner. The Ames study did not examine the effect of this verification process on the base error rate. The study just looked at the per analyst error rate. But in the extreme case where the verification is a blind, independent verification by a second examiner, the error rate for the full process would be the product of the base rates for each of the two examiners. So, the first analyst has an error rate (say) of 1%. The verifier has an error rate of 1%. The combined error rate for the process is then 1% x 1% = 0.01% or 1-in-10 thousand. (This becomes a strong argument for blind, independent verifications, btw.)

As far as the number and types of evidence items, take the example of a homicide victim shot 4 times, where the 4 bullets recovered at autopsy are all "identified" to the defendant's gun. The defense's explanation is that all 4 identifications are false positive identifications. Is that a reasonable explanation? Well, if the false positive error rate per item is 1%, then the probability that all 4 identifications are false positive identifications is 1% x 1% x 1% x 1% = 0.000001% = 1-in-1 billion. That would be if no verification is performed. If the laboratory uses blind, independent verification by a second analyst, then the probability would be 0.01% x 0.01% x 0.01% x 0.01% = 1-in-10 quadrillion.

These are the sorts of numbers that will come up in full-blown admissibility hearings, not just the simple base error rate. (It doesn't sound like the Illinois federal district court decision was based upon a full admissibility hearing.)

Anonymous said...

Sorry. Not 1-in-1 billion. 1-in-100 million.
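
A quick check of the arithmetic in the two comments above; the 1% per-analyst false-positive rate and the independence assumptions come from that comment, not from the underlying studies:

```python
# Sketch of the arithmetic in the comments above, assuming (as that commenter
# does) a 1% (1-in-100) per-analyst false-positive rate and independence
# between analysts and between evidence items.
odds_per_analyst = 100                        # 1% false-positive rate = 1 in 100

# Blind, independent verification: both examiners must err on the same item.
odds_verified = odds_per_analyst ** 2         # 1 in 10,000

# Four bullets each falsely "identified" to the same gun:
odds_four_unverified = odds_per_analyst ** 4  # 1 in 100,000,000 (100 million, per the correction)
odds_four_verified = odds_verified ** 4       # 1 in 10,000,000,000,000,000 (10 quadrillion)

print(f"one verified identification:     1 in {odds_verified:,}")
print(f"four unverified identifications: 1 in {odds_four_unverified:,}")
print(f"four verified identifications:   1 in {odds_four_verified:,}")
```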

d said...

It's interesting, because if a birth control pill's theoretical effectiveness rate were less than 99.9%, it would be considered useless. Because if there is room for error, there will be errors. Why is that so understandable when trying to prevent pregnancy, but so hard to grasp when trying to prevent wrongful convictions?

Anonymous said...

Lindsey,

Per the CDC, the ineffectiveness of birth control pills (when properly used) is about 9% (9 pregnancies per 100 women per year). The number you mention (99.9%) appears to be a per-coitus probability, and doesn't take into account the frequency of coitus.

The devil is always in the details with these numbers.

Cheers,

A med school professor

d said...

@Anonymous 11:18, I'm afraid you're right. The devil is always in the details, whether we are talking about the ineffectiveness of birth control or the ineffectiveness of forensics. If you ask me, the numbers for both are too high for comfort.

Anonymous said...

A Waco friend said:

"2. The evidence should be presented to the jury. Photographs or videos of the entire evidence bullet and of the test bullet showing the entire surfaces should be required to show the jury the degree of similarity, with the items at the same magnification and parallel. The technology is available to do this and there is no excuse not to do it."

You are correct that these photos can be shown to juries now. They are not routinely shown by either the prosecution or the defense (and the defense could require their production). The reason for that, as I understand it, is that neither the prosecution nor the defense want jurors to start interpreting the evidence themselves using their inexpert eyes. You can't reliably predict what a layman will see and what he will ignore when he looks at a complicated pattern. For the defense, it's much better to get a defense expert to look at the evidence and (hopefully) reach a different conclusion.

Anonymous said...

Lindsey,

Grits' original question was: what would be an acceptable value for a false positive error rate?

So for you, what would the acceptable value be if you were a juror? 99.9% accuracy corresponds to an error rate of 1-in-1,000. Would 1-in-10,000 be okay? Or would you want to see something smaller?

Anonymous said...

Lindsey, I think you do not understand "tests". All tests have error rates and are simply probabilistic estimates of what we expect to find. The vast majority of medical tests have far higher degrees of error than most people understand. This is why the medical field looks at sensitivity vs. specificity and uses multiple tests. Tests such as the old TB test in school had a very high degree of sensitivity (they picked up a lot) but poor specificity (being right). Additional tests were needed for specificity (to determine if you actually had TB). Hence the need in most testing for a "successive hurdles" approach.

What is confusing for many people is that if a test is 99% accurate, they presume that it is just that. There are a couple of problems. The first is that almost all tests have a range of accuracy, and those who give a point estimate either do not understand this or are simply trying to make it easier for others.

The second problem is understanding Base Rate and the Base Rate fallacy. Base Rate is simply the prevalence in the population of whatever we are testing for (how many red marbles are in a bag of green marbles, for example). If the numbers are the same, red vs. green, then the two types of errors, false positives (accusing the innocent) and false negatives (the guilty getting away), occur at the same rate.

So, if we are testing for X (red marbles) and the test accuracy is 90%, we think the test is simply 90% accurate. However, if the bag holds 10 red and 90 green marbles, the false positives among the green will pile up: 90 x .90 gives 81 true negatives and 9 false positives, while 10 x .90 gives 9 true positives and 1 false negative for the red. So 9 innocent green marbles are false positives and 1 red marble is a false negative; half of everything the test flags as red is actually green. This is with a highly accurate test by any standard, and it must be understood.
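
Here is the same marble arithmetic as a short sketch, using only the numbers from the paragraph above (10 red, 90 green, a 90%-accurate test):

```python
# Worked version of the marble example: 10 red marbles (the thing we are
# testing for) and 90 green, with a test that is 90% accurate on each marble.
red, green = 10, 90
accuracy = 0.90

true_positives = red * accuracy           # 9 red marbles correctly flagged
false_negatives = red * (1 - accuracy)    # 1 red marble missed
true_negatives = green * accuracy         # 81 green marbles correctly cleared
false_positives = green * (1 - accuracy)  # 9 green marbles wrongly flagged

# The base-rate effect: only half of what the test flags as "red" really is.
share_correct = true_positives / (true_positives + false_positives)
print(f"flagged marbles that are actually red: {share_correct:.0%}")  # 50%
```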

This is why I pointed out to Grits that his concern for False Positives is incorrect when analyzing the criminal justice system: there are far more guilty than innocent defendants, so the mix of errors is skewed toward False Negatives.

It does help to explain why there is a problem with interdiction stops as pointed out in another article by Grits.

It is fairly well established that humans are only about 54% accurate in detecting deception from an interview. If the base rate for crime (drugs in a car) is low, then we will have a lot more false positives than false negatives.

Grits probably knows this but was more influenced by emotions than reasoning in his statement that I objected to.

Anonymous said...

@7:47-

Adding to Anony 7:47: alternatively, if you happen to be a bench analyst at SWIFS, the Dallas County crime lab, you simply have the Lab Supervisor write a memo claiming that the PROFICIENCY TEST was actually wrong and the ANALYST was right.

https://sliterchewspens.files.wordpress.com/2013/03/slide341.jpg

This is also a good way to cover up contamination caused by the analyst while taking the proficiency test. Just blame the manufacturer of the proficiency test for contaminating it before it was sent to the crime lab.

Oh yeah, make sure the Lab Supervisor does not perform additional analysis to validate that the contamination didn't occur inside the crime lab. And make sure the Lab Supervisor does not follow up with the manufacturer about her findings of contamination.

Because, ya know, who needs absolute confirmation when you can have fraudulent documentation.

-SCP

Anonymous said...

Anonymous 7:47 said:
"Crime lab analysts who fail a competency or proficiency test are usually given another test until they get it right."

It's not that simple in any laboratory I know of.

Competency tests occur during the training of analysts. An unsuccessful competency test would be followed by additional training and practice. Then there would be follow-up competency testing.

Proficiency tests occur after training is completed. An unsuccessful proficiency test would be followed by a corrective action that is appropriate to address the root cause of the unsuccessful result. Depending upon the root cause, there might be additional testing performed to demonstrate competence. For example, an unsuccessful result due to an administrative error in transcribing a number from an instrument printout to a test result form would not be particularly worthy of a follow-up test.

Anonymous said...

If you knew there was a 2% chance that your parachute wouldn't open, would you still go skydiving?

The error the Judge made is ascribing the error rate of a scientific discipline to a specific individual expert. How is the Judge to know whether the expert in that case used the same methodology as the Ames Study or the Miami-Dade Study?

Moreover, the studies do not have the added biasing variables of Prosecution or Peer Pressure for favorable outcomes, nor do the studies incorporate the so-called "technical review" of a second analyst's (amicable co-worker's) assessment (confirmation bias), nor (I don't think) do the studies incorporate contaminating-information bias (extraneous info related to the nature of the crime that is not necessary for the analysis but may emotionally sway the analysts unconsciously).

It's not error rates associated with Junk Science that we should be concerned about per se, but the error rates associated with the Junk Scientist who is providing the expertness.

So every time a Judge lets the "expert" testify asserting the PCAST statistics, the judge should be required to go skydiving.

Question: Why don't we let Expert Witnesses cross-examine the expert witnesses of the opposition? Why do we insist that an unknowledgeable lawyer conduct the "rigorous" cross-exam to ferret out the charlatans?

Anonymous said...

Answer: Because they are not qualified or competent. The work-around is to sit with the attorney and feed him/her questions, which works fine in my experience.

Gritsforbreakfast said...

@3:57 who declared ''his concern for False Positives is incorrect"

Everyone understands that many guilty people go unpunished. In fact, the system is designed to value protecting the innocent over punishing the guilty (Blackstone, etc.). It's set up to allow some guilty folks to go free, even at extreme cost, to avoid the injustice created by convicting the innocent. Your observation doesn't minimize the injustice created when the innocent are punished. Many guilty go free precisely BECAUSE false positives are so abhorrent to the system.

Anyway, as a practical matter, if the guilty reoffend, maybe they'll be punished next time. The system arrests and incarcerates a lot of different people for a lot of different stuff, with TDCJ taking in and releasing about 70k inmates per year. So if they don't get a criminal for one thing, they'll likely get him for another down the line. E.g., quite a few DNA exonerees saw their evidence match suspects who were already in prison for other rapes - the person continued to commit crimes and was eventually caught and punished. By contrast, the harm done to the innocent person falsely convicted can never be fully undone and is a greater injustice.

Anonymous said...

GFB, I wasn't declaring a concern; I was pointing out that you were making a statistical and reasoning error. I agree that it is more appropriate for the guilty to go unconvicted than for the innocent to be wrongfully incarcerated. I was pointing out to the lady who was uncomfortable with a test that is not 100% accurate that it is unreasonable to expect 100% accuracy, as no such test exists. All tests are probabilistic statements. Base rates do matter in these tests, as they will affect the type of errors that are made.

Anonymous said...

@8:25-

Please elaborate on your answer. Experts are qualified by the judge to answer questions, but not qualified to ask questions??

One could argue that the lawyers aren't qualified to ask questions pertaining to the science behind the analysis by the expert.
Most lawyers don't know what IQ/PQ/OQ stands for or what a validation study might look like, much less ask questions related to those processes and their weaknesses.

And you may believe that "sitting with an expert" is a suitable answer, but that wouldn't explain the high number of wrongful convictions stemming from misleading, exaggerated, or fraudulent testimony of Experts (assuming that the opposing team's expert was feeding the attorney questions or had adequately prepped the cross-examining attorney). That also wouldn't explain how the APD DNA lab's erroneous application of statistics persisted for 6-plus years. Surely there was an expert who could have assisted the defense in 2010 to identify the problem and retrain the personnel sooner. So either defense attorneys aren't using experts, or defense attorneys aren't adequately prepared to ask the right questions during cross.

Anonymous said...

@2:30 -

@8:25 here.

Scientists are neither competent nor qualified to examine/cross-examine witnesses in criminal proceedings.

Trials and hearings are structured processes that have formalized rules. Scientists have not been trained in these rules. Scientists have been trained to discuss and argue with one another, generally on arcane bits of specialized knowledge, and generally using professional jargon that is extremely meaningful to fellow scientists and extremely mind-numbing to everyone else. This is not particularly helpful to the legal process, where laymen have to evaluate competing claims and explanations.

What you want here is reasonable. But the way you want to get it is not. I have testified as an expert in many, many trials and evidentiary hearings, and know the process as an expert witness as well as anyone, and better than most. There is no way I would ever do what you are suggesting.

The way to do what you want is through an evidentiary hearing, where experts can hear what each other are going to say, and then get on the stand and offer opposing testimony.

Re the APD DNA issue, I can't explain that. I have reviewed cases for defense attorneys, but never an APD case. The problems seem to be the sort that would be easily found in a review. But I say that without ever having looked at one of their reports or having read their protocols.

Gritsforbreakfast said...

@12:37, I have made no statistical or reasoning error that you have identified, nor did I suggest tests must be 100 percent accurate. I asked how accurate the tests should be to justify their use in court. One notices you didn't answer.

Gritsforbreakfast said...

Also, 12:37, not all forensics suffer from such big error rates. Toxicology and non-mixture DNA tests are actual science-based forensics that are not built on subjective comparisons but instead use techniques derived from the scientific method. Those methods are far more accurate, approaching the percentages for birth control pills, etc. The question here is how accurate NON-science-based forensics should be, when essentially you just have some guy or gal at the bench making a subjective judgment rather than a science-based analysis.

Anonymous said...

One approach to answering the question is to look at the choices that people routinely make for themselves and their loved ones, and the risks associated with those choices. The National Safety Council calculates that the lifetime risk of dying in a motor vehicle crash is 1-in-113 (0.9%). Injuries in motor vehicle crashes typically run about 50 times the number of fatalities (per the National Highway Traffic Safety Administration), which translates into a lifetime risk of injury in a motor vehicle crash approaching 50%.
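
Multiplying out the figures quoted in this comment (nothing here beyond the NSC and NHTSA numbers already cited):

```python
# Multiplying out the driving-risk figures quoted above.
lifetime_fatality_risk = 1 / 113    # NSC lifetime risk of dying in a crash, ~0.9%
injury_to_fatality_ratio = 50       # NHTSA: injuries run ~50x fatalities
lifetime_injury_risk = lifetime_fatality_risk * injury_to_fatality_ratio

print(f"lifetime fatality risk: {lifetime_fatality_risk:.1%}")  # ~0.9%
print(f"lifetime injury risk:   {lifetime_injury_risk:.0%}")    # ~44%, approaching 50%
```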

In this context, a false positive error rate of up to 20% seems reasonable to me for tests that are not used to directly identify a perpetrator (e.g., firearms analysis which would identify a gun but not the shooter of the gun). The 20% risk associated with this test is less than the risk of injury-when-driving that people routinely view as an acceptable risk.

For tests that are used to identify a perpetrator (fingerprints, DNA, handwriting analysis) I would want to see a false positive error rate of 0.1%, which is less than the 1% risk of death-when-driving that people routinely view as an acceptable risk.

Anonymous said...

SCP/4:46-

That's quite an accusation. If true, has the Dallas District Attorney been notified? Or the accreditation agency?

I mean, if the Judges are allowing expert testimony based on error rates, and the proficiency test is a (sorta) measure of error, then how is a Judge to know if the proficiency test from a crime lab hasn't been manipulated to hide the errors of its analysts (thus allowing them to testify)? To take it a step further, how does the Judge know if the actual lab report hasn't been manipulated?

Problems.

Anonymous said...

@2:02
Like other testimony, laboratory results are presented under oath. Unlike other testimony, laboratory work can be repeated by a defense expert if there is a question about the original work.

Proficiency tests as they are currently administered are not designed to give individual analyst error rates. Proficiency tests typically consist of 1-2 test samples and are taken annually. For a test with an overall population error rate of 1%, reliably detecting an analyst with a 2% error rate would require about 1800 known non-matching samples and a similar number of known matching samples, so about 3600 samples. This is not logistically feasible.
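
One way to land in that ballpark is a standard sample-size calculation for detecting a difference between two proportions; the 95% power and 5% two-sided significance level below are illustrative assumptions, not figures from the comment above:

```python
from math import sqrt

# Rough sample-size estimate: how many known non-matching samples would it
# take to reliably distinguish an analyst with a 2% false-positive rate from
# a 1% population rate? Significance level and power are assumed values.
p0, p1 = 0.01, 0.02              # population rate vs. analyst rate to detect
z_alpha, z_beta = 1.96, 1.645    # 5% two-sided significance, 95% power

n = ((z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1)))
     / (p1 - p0)) ** 2
print(f"known non-matching samples needed: about {round(n)}")  # roughly 1,800
```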

So the "population" error rates that are cited are really sample error rates with a confidence interval applied - so a statistical worst case value.

But, as was mentioned in a previous comment, the per analyst error rate is only part of the story. Labs can and do institute process controls that have the effect of reducing the false positive error rates.

Anonymous said...

Apparently, the original seemingly easy question ("How high a false positive rate is too high?") is not so easy after all. I suspect that this is because people generally are uncomfortable with anything less than belief in complete and absolute certainty, even if it is a delusional belief.

Anonymous said...

@8:25/5:52-

Alright. So you meant "unqualified" and "not competent" by legal definitions, not as a pejorative. I'm with you now.

But given the function of the Expert (to assist the Judge and Jury with understanding the technical nature of his/her findings), I would disagree that two opposing Experts would result in questions and testimony made up of "arcane bits of specialized knowledge, and generally using professional jargon that is extremely meaningful to fellow scientists and extremely mind-numbing to everyone else." Frankly, this sounds like an excuse a lawyer would use to keep the Experts out of the room -- they want the jury to be dumb, uninformed. Similar to commenter 2:56: "neither the prosecution nor the defense want jurors to start interpreting the evidence themselves using their inexpert eyes." Lawyers want to keep control of the information getting to the judge and jury, even if they (the lawyers) don't understand it. And much of the time, lawyers will exaggerate or twist the conclusions to fit their narrative.

The "arcane bits" are just as germane, if not more so, to the scientific conclusions. Juries can be educated about these "arcane bits" if the Expert Witnesses are doing their job. Better for the jury to have too much information than not enough.

And the evidentiary hearings... I haven't had the pleasure of seeing one of those, although it would make for some good reading. And would an evidentiary hearing have uncovered the types of lab problems (improper statistics, expired reagents, etc.) found in the APD DNA Lab fiasco? [Clearly not, since the public/lawyers were not aware of the problems until the results of an external audit hit the news.]

But from what testimony I have seen or read, the Experts always drop the "accreditation" line, as if that would absolve them from error rates. Rarely (if ever) have I heard an attorney ask about error rates or even standard lab protocols ("Did you use freshly prepared reagents in accordance with the protocols or did you use a reagent that was 3 days old?" Not really mind-numbing or arcane, and I think a jury could follow the implications.)

So, as far as I can tell, Texas labs have a problem of either not knowing the error rates of their tests or their analysts, or not reporting those error rates to the accreditation auditors. (Or, as seen from the commenter above, they just falsify the information.)

From Grits' example, what would have happened if the Judge did NOT think the error rate was low enough and did not let the Expert testify? Would the Expert's crime lab revamp its protocols for better precision/accuracy? Would the accreditation agency (which already gave the "ok" for the lab's protocols) be accountable for giving an accreditation certificate to a crime lab that can't actually use its Experts in court (per the Judge's "gatekeeper" function)? Or would the Prosecution claim that the Judge's opinion cannot supersede the accreditation seal of approval... and thus the expert testifies?

Anonymous said...

@7:05-

If it is not logistically feasible, as you suggest, then why are they required? Why are they given? Why is the TFSC discussing certification and licensing exams/requirements? Is this just more smoke and mirrors for the forensic community? Probably.

"Labs can and do institute process controls that have the effect of reducing the false positive error rates." Can you give an example? Because from the labs Ive seen and the news reports of the lab errors across Texas, confirmation bias and mob mentality drives the errors up (and simultaneously dilutes accountability to a number of people, adds greater confusion for internal investigations, and increases costs for the fixes.) Human nature is difficult to control, but it can be circumnavigated if transparent.

I think the money used for accreditation, certification and licensing could be better used on body cams for the scientists. Like the police, I wanna see what the analyst is doing with the evidence. I wanna see where error could be introduced (since they're not going to tell us). I wanna know that they know what they are doing.

Sad excuses and finger-pointing will go by the wayside.

Anonymous said...

2:26 -

@ 7:05 here.

Confirmation bias is certainly a concern, particularly in opinion-based disciplines.

The use of independent, blind verification of firearms comparisons is an example of a process control that will reduce the false positive error rate. This is where a second analyst (the verifier) looks at the evidence without knowing the conclusions of the first analyst. If the per analyst error rate is 1%, then the combined process error rate is 1% x 1% = 0.01%. That is basic statistics.

Re proficiency tests: Proficiency tests are widely used by testing laboratories (not just forensic labs; check out the College of American Pathologists for clinical proficiency tests, and ASTM for testing of fuels, textiles, plastics, etc.). All testing laboratories accredited to ISO/IEC 17025 are required to perform proficiency testing as one component of their quality management programs. There are good discussions of the purpose and value of proficiency tests on the web.

My point was not that proficiency tests are valueless. My point was that they don't give a per analyst error rate. That is not what they were designed to do, and they don't do it. So, you can't really complain that they don't do something that you think they should do. It's like complaining to your car's manufacturer that the gas gauge of your car didn't tell you that you put diesel fuel into your tank instead of the gasoline that it needed. It might make you feel good to complain. But, all you've really achieved is to give a lot of people their laugh of the day.

If you have ideas about how to better spend the taxpayer's money, I am sure your elected officials will be happy to hear from you.