Thursday, October 20, 2011

Judges cautioned against reliance on overstated ballistics testimony

Recently, thanks to contributions from readers, Grits purchased a copy of the brand spanking new third edition of the "Reference Manual on Scientific Evidence," produced by the Federal Judicial Center and the National Research Council of the National Academies of Science. It's the first update of the manual in more than a decade, and I just finished reading the chapter on "Forensic Identification Expertise," which may end up providing fodder for multiple Grits posts.

Basically, the book expands on work by the NAS in their 2009 report on the science (or lack thereof) behind forensics commonly used in criminal courtrooms, creating a reference manual for judges that combines the latest scientific assessments with an analysis of the relevant case law governing each technique discussed. Very helpful, and enlightening.

This 1,000-page tome addresses myriad aspects of scientific evidence used in courtrooms, but I thought I'd start with a discussion of the section on "Firearms Identification Evidence," or "ballistics," which has been used as an identifier in court dating back to the 1920s. "In 1923, the Illinois Supreme Court wrote that positive identification of a bullet was not only impossible but 'preposterous.' Seven years later, however, that court did an about-face and became one of the first courts in the country to admit firearms identification evidence. The technique subsequently gained widespread judicial acceptance and was not seriously challenged until recently." (Citations omitted.)

The 2009 NAS report found that "Sufficient studies have not been done to understand the reliability and repeatability of the methods" for matching fired bullets or cartridges with the originating weapon, but the studies that have been done certainly give one pause. Tests in 1978 by the Crime Laboratory Proficiency Testing Program found a 5.3% error rate in one case and a 13.6% error rate in another.  Experts evaluating those errors called them "particularly grave in nature." A third test by the same group found a 28.2% error rate.

Later proficiency testing produced lower error rates, but "Questions have arisen concerning the significance of these tests." Only firearms experts in accredited labs participated in testing, for starters, and they weren't "blind" studies; i.e., participants knew they were being tested. Some proficiency testing even reported zero errors, but in 2006 a federal court observed, "One could read these results to mean the technique is foolproof, but the results might instead indicate that the test was somewhat elementary."
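As an aside, there's a simple statistical reason a zero-error result can't be taken at face value: the "rule of three," which says that observing zero errors in n trials still leaves roughly 3/n as a 95% upper bound on the true error rate. Here's a minimal sketch (the test sizes are hypothetical, chosen only for illustration):

```python
# A zero-error proficiency result says little by itself. By the
# statisticians' "rule of three," observing 0 errors in n independent
# trials still leaves an approximate 95% upper confidence bound of
# about 3/n on the true error rate.

def rule_of_three_upper_bound(n_trials: int) -> float:
    """Approximate 95% upper confidence bound on an error rate
    when zero errors are observed in n_trials trials."""
    return 3.0 / n_trials

# Hypothetical test sizes, for illustration only:
for n in (20, 50, 200):
    bound = rule_of_three_upper_bound(n)
    print(f"0 errors in {n:>3} tests -> true error rate may still be ~{bound:.1%}")
```

A perfect score on a small or easy test, in other words, is entirely consistent with an error rate a court would find troubling.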

Then, "In 2008, NAS published a report on computer imaging of bullets" that commented on the subject of identification, concluding that "Additional general research on the uniqueness and reproducibility of firearms-related toolmarks would have to be done if the basic premises of firearms identification are to be put on a more solid scientific footing." That report cautioned:
Conclusions drawn in firearms identification should not be made to imply the presence of a firm statistical basis when none has been demonstrated. Specifically, ... examiners tend to cast their assessments in bold absolutes, commonly asserting that a match can be made "to the exclusion of all other firearms in the world." Such comments cloak an inherently subjective assessment of a match with an extreme probability statement that has no firm grounding and unrealistically implies an error rate of zero. (emphasis in original)
In 1993, the US Supreme Court issued its pivotal Daubert ruling, which prescribed new evidentiary standards for scientific evidence, but it took years for those standards to be rigorously applied to ballistics evidence. "This changed in 2005 in United States v. Green where the court ruled that the expert could describe only the ways in which the casings were similar but not that the casings came from a specific weapon." A 2008 case said an expert could not testify that a bullet matched a weapon to a "reasonable scientific certainty," but was only permitted to say that it was "more likely than not" that a bullet came from a particular weapon.

As with other comparative forensic techniques from fingerprints to bitemarks to microscopic hair examination, essentially all a ballistics expert is really saying is "After looking at them closely, I think these two things look alike." It strikes this writer that it's quite a big leap from "reasonable scientific certainty" to "more likely than not." Basically it's the leap from "beyond a reasonable doubt" to having "substantial doubt." I wonder how many past convictions hinged on testimony where experts used phrases like "reasonable scientific certainty" or "to the exclusion of all other firearms in the world"? And I wonder how many times those experts were simply wrong?


Anonymous said...

Finally! Someone in the legal system is growing a brain! I'm a retired machinist and amateur gunsmith. When firearms are mass-produced, the manufacturers will most likely machine several gun barrels from the same broach and reamer set. Of course these barrels are very likely going to have the same toolmarks! So, you may ask, with all the hundreds to thousands of rifles of the same caliber, manufacturer, and model, what are the chances of encountering two or more guns that came from the same set of cutting tools? I say it's pretty good, since I have twice encountered hunters/shooters who owned guns that were one serial number apart from a firearm I owned. These guns most likely came off the same set of cutting tools and probably would leave toolmarks on bullets identical to those left by my guns. I can only hope that they don't commit a crime with those guns, and the cops come knocking on MY door wanting to test MY guns!

Ryan Paige said...

A couple of years ago, I went back and read the old newspaper stories covering the trial of Johnny Frank Garrett, a teenager who was convicted of and executed for raping and murdering an elderly nun in Amarillo back in the early 1980s.

In one of the articles, the prosecutor is quoted grilling Garrett on the stand, asking him how his pubic hair ended up on the victim, going so far as to ask, as a follow-up "did they fall out, down your leg, over your shoes and socks and onto the floor?"

The idea that the science of the time could definitively narrow down a hair as having come from a single person to the exclusion of all others is ludicrous, but that's certainly not how that evidence played in Garrett's trial.

And since then, Garrett's guilt has come into question due to DNA results in a similar crime committed nearby four months earlier.

Anonymous said...

Anonymous 10:01 - It might seem reasonable that consecutively machined parts would have indistinguishable toolmarks. But in fact they can be reliably distinguished by trained examiners. For instance, a number of blind studies involving comparisons of bullets from consecutively machined barrels show that trained individuals can reliably distinguish the bullets fired by consecutively manufactured barrels.

Gritsforbreakfast said...

11:23, would you have us believe that the National Academy of Sciences and the Federal Judicial Center willfully ignored those studies when they reached their conclusions? This publication was issued less than a month ago; if what you say is true, why do you think these agencies would advise judges that trained ballistics examiners cannot reliably individualize beyond "more likely than not"?

Anonymous said...

Willfully? No. Ignorantly? Yes. The NAS did a pitiful job of "investigating" the comparative sciences, firearms and toolmarks in particular. They never looked at the Association of Firearm and Toolmark Examiners' (AFTE) journal or read the published studies there and in other journals concerning the validity of the science. Their investigation was woefully lacking, but what do you expect when you have academics in charge instead of scientists.

And as far as firearms marking the same, as a previous poster asserted, I suggest reading the Brundage study of 1998, where 10 consecutively manufactured Ruger barrels (all made with the exact same manufacturing tools, one after the other) were used to create a blind study for firearms examiners. All barrels were found to mark uniquely, all bullets were identified to the origin barrel, and no mistakes were made at that time. The Brundage study continues to be utilized in the field, data is continually collected, and as of 2008 it had an error rate of 0.0712%. If memory serves, the errors that have occurred were made by trainees and/or interns who weren't yet qualified to look at actual casework.

Or there are plenty of other studies for anyone who cares to actually look for them: Freeman (1978), Brown (1995), DeFrance (2003), Hall (1983), Lutz (1970), Miller (1998, 2000, 2001), Murdock (1981), Bunch (2003), Tulleners (1998, 1999), Lardizabal (1995), Smith (2004, 2005), Lomoro (1974), Matty (1984, 1985), Skolrood (1975), Grooss (1995), Kennington (1999), Lopez (2000), Rosati (2000), Thompson (1994, 1996), Uchiyama (1986).... the list goes on and on, and these were all published before the NAS report, so there's no excuse for its researchers never to have found them.

All these studies and more have found time and time again that firearms and toolmark identification is a valid science when the technique is properly applied. Mistakes can and do happen, but they are exceedingly rare and there are a number of steps crime laboratories take to diminish their likelihood. All accredited laboratories require verification of an examiner's findings by a second qualified examiner. They also undergo annual audits where cases are reviewed for errors.

Honestly, the lack of research on the part of the NAS and the author of this blog is appalling. The studies and support for the science are out there if people would just spend the time looking for them before putting pen to paper or hand to keyboard.

Anonymous said...

GFB 11:23. One of the issues addressed by the NAS in the 2009 report has to do with expressing the weight of a match (or an exclusion) in language that suggests a statistical underpinning when there isn't a sufficient basis for it. Statements such as "to a scientific certainty" in scientific parlance imply a statistical basis - basically, a probability of a random match of less than some critical value (usually 0.05). That type of statistical analysis exists for some kinds of evidence (notably DNA) but not for the toolmark comparisons that form the basis of firearms analysis, which is a qualitative, opinion-based type of analysis.
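To make the scale of that problem concrete, one can compute the expected number of coincidental matches implied by a given random-match probability. The sketch below uses hypothetical round numbers for the firearm population, purely for illustration:

```python
# If "match" claims had a statistical basis, one could ask: given a
# random-match probability p, how many other firearms would be expected
# to match by coincidence? Population figure below is a hypothetical
# round number, used only to illustrate the scale.

def expected_random_matches(population: int, random_match_prob: float) -> float:
    """Expected number of firearms in the population that would
    'match' purely by chance."""
    return population * random_match_prob

# Even a 1-in-10,000 random-match probability implies tens of thousands
# of coincidental matches among hundreds of millions of firearms.
for p in (0.05, 1e-4, 1e-9):
    matches = expected_random_matches(300_000_000, p)
    print(f"p = {p:g}: ~{matches:,.0f} expected chance matches")
```

Only a vanishingly small random-match probability, which no one has demonstrated for toolmarks, could support a claim like "to the exclusion of all other firearms in the world."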

In regard to the document in question, it's noteworthy that the chapter cited on firearms is written by three non-scientists: Giannelli & Imwinkelried are law professors, and Peterson is a criminologist (i.e., a social scientist). The references cited in the chapter clearly do ignore much recent literature. For instance, they state: "Because these imperfections are randomly produced, examiners assume that they are unique to each firearm. Although the assumption is plausible, there is no statistical basis for this assumption." The only citation they offer for this statement is a 1964 journal article. If I were writing an NIH grant application and justified the proposal using 1960s literature citations, I would be roundly chastised for not being current with the literature.

That being said, they do cite more recent relevant literature. For instance, they say: "A recent study reported testing concerning 10 consecutively rifled Ruger pistol barrels. In 463 tests during the study, no false positives were reported; 8 inconclusive results were reported." The study they cite is a 2009 study that indicates a high degree of reliability.

Gritsforbreakfast said...

2:58/3:09, why don't you give us some links instead of vague cites nobody can access to prove your point?

In the meantime, given that the NAS took input from the very accreditation agencies you say validate the techniques, held public hearings where ballistics and toolmark experts could present all the research you describe, etc., you really can't blame the NAS; blame those in your own profession who either didn't present the studies you cite or (much more likely, IMO) couldn't explain away the criticisms cited by the court that the proficiency testing itself was too "elementary" to validate the practice.

IMO calling any comparative forensic technique "science" is just silliness. I don't dispute that experienced examiners may be right more often than wrong (hence, "more likely than not"), but it's not "science" in the sense of application of the scientific method. At best, you're just saying, "After looking at them closely, I think these two things look alike." That's subjective comparison, not "science" in any way, shape or form.

Gritsforbreakfast said...

BTW, 3:09, in this case "the document in question" isn't the NAS report but the new manual for judges linked in the post which includes the most recent case law on the topic. So you're saying that not only was the NAS ignorant, but all the prosecutors presenting support for ballistics evidence in the relevant post-Daubert court cases relied on experts who also could not or did not produce convincing studies to keep the court from radically limiting testimony from how it had been presented in the past.

Could it be that all those folks were "ignorant" but a blog commenter who doesn't even write under their own name should be definitively relied upon? Everybody is wrong but you? How plausible is that?

Bottom line: What's cited in this post is the current, best advice being given to judges on how to interpret this evidence based on existing science and case law. Even if you're right and all the prosecution experts from precedent-setting court cases, the NAS, etc., are "ignorant," what's described here is how judges are being told to interpret the law for now; until the technique is validated by the scientific method, with replicable results, that's how the courts will handle such evidence.

Anonymous said...

GFB - Here is a short list of references for anyone interested in reading the primary literature.

Stone, R., "How Unique Are Impressed Marks?", AFTE Journal, Vol. 35, No. 4, Fall 2003, pp. 376-383.

Collins, E.R., "How 'Unique' Are Impressed Toolmarks? An Empirical Study of 20 Worn Hammer Faces," AFTE Journal, Vol. 37, No. 4, 2005, pp. 252-295.

Howitt, D., and Tulleners, F., "A Calculation of the Theoretical Significance of Matched Bullets," Journal of Forensic Sciences, Vol. 53, No. 4, July 2008, pp. 868-.

Neel, M., and Wells, M., "A Comprehensive Statistical Analysis of Striated Tool Mark Examinations Part I: Comparing Known Matches and Known Non-Matches," AFTE Journal, Vol. 39, No. 4, Summer 2007, pp. 176-198.

May, L., "Identification of Knives, Tools and Instruments," Journal of Police Science, Vol. 1, No. 3, 1930, pp. 247-248.

Brackett, J.W., "A Study of Idealized Striated Marks and Their Comparisons Using Models," Journal of the Forensic Science Society, Vol. 10, No. 1, January 1970, pp. 27-56.

Biasotti, A., "Bullet Bearing Surface Composition and Rifling (Bore) Conditions as Variables in the Reproduction of Individual Characteristics on Fired Bullets," AFTE Journal, Vol. 13, No. 2, 1981, pp. 94-102.

Roberge, D., and Beauchamp, A., "The Use of BulletTrax-3D in a Study of Consecutively Manufactured Barrels," AFTE Journal, Vol. 38, No. 2, 2006, pp. 166-172.

Uchiyama, T., "Toolmark Reproducibility on Fired Bullets and Expended Cartridge Cases," AFTE Journal, Vol. 40, No. 1, 2008, pp. 3-46.

Bacharach, B., "Statistical Validation on the Individuality of Tool Marks Due to the Effect of Wear, Environment Exposure and Partial Evidence," NIJ/NCJRS Document, 2009.

Gouwe, J., Hamby, J.E., and Norris, S., "Comparison of 10,000 Consecutively Fired Cartridge Cases from a Model 22 Glock .40 S&W Caliber Semiautomatic Pistol," AFTE Journal, Vol. 40, No. 1, 2008, pp. 57-63.

Kirby, S., "Comparison of 900 Consecutively Fired Bullets and Cartridge Cases from a .455 Caliber S&W Revolver," AFTE Journal, Vol. 33, No. 3, Summer 2001, pp. 113-125.

Biasotti, A.A., "A Statistical Study of the Individual Characteristics of Fired Bullets," Journal of Forensic Sciences, Vol. 4, No. 1, January 1959, pp. 34-50.

Brundage, D.J., "The Identification of Consecutively Rifled Gun Barrels," AFTE Journal, Vol. 30, No. 3, Summer 1998, pp. 438-444.

DeFrance, C.S., and VanArsdale, M., "Validation Study of Electrochemical Rifling," AFTE Journal, Vol. 35, No. 1, Winter 2003, pp. 35-37.

Fadul, T.G., "An Empirical Study to Evaluate the Repeatability and Uniqueness of Striations/Impressions Imparted on Consecutively Manufactured Glock EBIS Gun Barrels," AFTE Journal, Vol. 43, No. 1, Winter 2011, pp. 37-44.

Hamby, J.E., Brundage, D.J., and Thorpe, J.W., "The Identification of Bullets Fired from 10 Consecutively Rifled 9mm Ruger Pistol Barrels: A Research Project Involving 507 Participants from 20 Countries," AFTE Journal, Vol. 41, No. 2, Spring 2009, pp. 99-110.

Gritsforbreakfast said...

Thanks 7:05, that's helpful. It is not, however, dispositive for at least two reasons: First, quite a few of those don't relate to bullets but other types of toolmarks, and more importantly, checking just a few of the bullet-specific cites, some of them (and many, many others) were indeed cited in the 2008 NAS study on ballistic imaging that preceded the broader 2009 report and on which the latter relied.

IMO your assertion that the NAS ignored that research is just not true.

Anonymous said...

Certainly you can determine a similarity, but that is far from being exact. There are too many variables at play. Sure, SOME guns can leave a very distinguishable extractor mark, firing pin mark, throat mark, or rifling mark. But most won't. And before you can get down to exact minute detail, you have another hurdle to overcome. You have to be testing the same ammunition from the same lot number that was fired in the suspect weapon. Cartridge rim thickness, exact rim diameter, case length, case thickness, bullet cladding, primer anvil thickness, primer seat depth, bullet seat depth, powder charge, and a host of other variables are all subject to change between manufactured ammo lots. Now, what are your chances of coming up with a box of ammo to test that came from the same lot number as the ammo used in the crime?

Did the perp use a cleaned firearm to commit the crime, or had it been fired prior? If so, how many times? This can certainly affect what you see, and the marks it will leave. And, were the barrels clean when you tested them? Were they powder-fouled? How much leading did they have? Were they copper-fouled? All these can also affect what you see in the barrel throat and riflings, and the marks they will leave on a bullet.

There are way too many variables at play here to determine exact scientific matches.

Anonymous said...

I reviewed the study on bullets fired from the 10 Ruger barrels published in AFTE Journal--Volume 41 Number 2--Spring 2009.

The first thought that came to mind is how does this apply to real world forensic work?
It appears the test was conducted with an extreme lack of variation that weighs heavily in favor of the examiners.

Full-metal-jacketed bullets were fired into a tank of water and grouped into sets of seven consecutive shots out of the barrel. How does such a test compare to the day-in, day-out examination work in a real-world environment?

I'm just a manufacturing guy, but the study comes across as a dog and pony show to me. I can assure you that if a supplier were to submit such a DOE (design of experiments) study to an automotive OEM as proof of capability, it would be rejected out of hand.

Anonymous said...

7:56 So is your objection to basic studies of variation, or blinded studies to address a single specific issue? The whole point of the NAS report is that there is a greater need for exactly these sorts of basic studies that address underlying variation.

Anonymous said...

The Reference Manual on Scientific Evidence is available in print form, but there is also a free, fully searchable PDF of the document, available by clicking "Download free PDF."

Anonymous said...

I have no objection to the study if one used it to counter a blog post by an amateur gunsmith.
My objection would be to someone taking a very controlled and narrowly defined study such as this and making general claims of capability as applied to real-world cases.

Anonymous said...

8:34 I was pressed for time with my response at 12:07.

In manufacturing systems analysis we are concerned about two things: the capability of the process to produce good parts, and the ability of the measurement system to distinguish good parts from bad parts.

To that end, various tools are used: process capability studies (both short- and long-term), multi-vari studies, and measurement systems analysis (repeatability and reproducibility). These are basic aspects of the production part approval process.

It's a significant criterion of process validation that capability is calculated using production tooling, personnel, materials, gauges, and measurement systems in the production environment, running at rate, with a statistically valid sample size. Final validation studies are conducted on the production parts.
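For readers outside manufacturing, one such objective criterion is a process capability index like Cpk, which measures how far the process mean sits from the nearest specification limit in units of three standard deviations. A minimal sketch, with made-up spec limits and measurements:

```python
# A minimal sketch of a short-term process capability calculation (Cpk).
# Spec limits and sample data below are made up, for illustration only.
import statistics

def cpk(samples, lsl, usl):
    """Process capability index: distance from the process mean to the
    nearer spec limit, in units of three standard deviations."""
    mu = statistics.mean(samples)
    sigma = statistics.stdev(samples)  # sample standard deviation
    return min(usl - mu, mu - lsl) / (3 * sigma)

# Hypothetical measurements of a machined dimension (mm):
parts = [10.01, 9.98, 10.02, 10.00, 9.99, 10.01, 10.00, 9.97, 10.03, 10.00]
print(f"Cpk = {cpk(parts, lsl=9.90, usl=10.10):.2f}")
```

A Cpk of 1.33 or better is a common minimum for calling a process capable; the point is that the judgment is numeric and repeatable rather than a matter of trained opinion.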

A critical parameter directing these activities is the PFMEA (Process Failure Mode Effects Analysis). Used properly, the PFMEA begins before the first prototype is made and is used throughout the production lifecycle. This is a closed loop mechanism for directing process improvements and evaluation before and after the inevitable failure occurs.

One of the great strengths of using an analytical model such as FMEA is that the team is tasked with identifying methods to change the subjective into the objective. New measurement systems, test equipment, or standards may need to be purchased or developed in order for the process to be quantified and evaluated based on objective criteria, the advantages of which are obvious to all involved, whether scientist, academic, technician, or production personnel.

Getting back to my specific criticism of the study, the study contains the following statement.
“In a Daubert Hearing (a legal challenge in the United States), an examiner could state something like the following: ‘A long-term, internationally administered validity test using consecutively rifled barrels, a condition widely considered the most likely to produce errors, was completed by 507 different participants (502 examiners, 5 using instrumentation) and resulted in 7,597 correct identification conclusions and no false positive conclusions.’”

It's clear that the purpose of this study is to serve as a claim of capability of the process. But there is no relation to the actual process here. Samples were mailed or distributed at the SHOT Show. Compliance was voluntary. Respondents knew they were being evaluated. They were given as much time as they wanted. Results were self-reported. Overall, I would say these results likely bear as much resemblance to the day-in, day-out analysis of these labs as sample parts built by engineers do to serial production off the assembly line. I absolutely believe this is a misuse of the data. The statement in fact validates the criticism leveled by Grits and the NAS.

The other claim is that the condition is considered worst-case, or "most likely to produce errors." I don't presume to be an expert on what you guys do, but the first question is: what percentage of the analysis you conduct is on FMJ bullets recovered from bodies of water? What's the effect on the distribution if these bullets pass through a layer or two of sheetrock, or a live hog, or plywood? What effect does substituting semi-wadcutters or hollow points or other expanding projectiles have? Button or electrolytic rifling? Why limit to seven consecutive shots; why not 7 to the 2nd, 3rd, or 4th power? What minimum percentage of a projectile has to be recovered to proceed with an analysis, and what is the effect? Why not write an FMEA and begin validating your process in the same manner as any other competent discipline in this day and age?

Anonymous said...

All of the sources cited by 7:05 come from journals or sources that appear to be aligned with law enforcement and thus are very likely biased. I doubt you'd catch one of these journals publishing a study that didn't go the way they wanted. It's kind of like allowing the drug companies to do their own studies on psychotropic drugs: they get the results they want and throw out any studies that don't say what they want. But when independent studies are done, they show some of these drugs are no more effective than placebo. So, consider the sources. I'd like to see some studies by groups not aligned with law enforcement.

Anonymous said...

One additional comment to add to my previous post: I am immediately suspicious of studies in a journal called the "Association of Firearm and Toolmark Examiners Journal." Think about it - would this group really want to publish anything that was critical of its field? If the field were discredited, there would no longer be an Association of Firearm and Toolmark Examiners Journal. This group has an inherent conflict of interest: in order to perpetuate its existence it must validate its methods. Hardly an unbiased source of information. Let's see some studies from folks who don't have a dog in the fight.

Anonymous said...

Psychics have an accreditation agency. If they had a journal, I bet every study they published would show that psychics are right most of the time.

Anonymous said...

And by the way, you stated, "All accredited laboratories require verification of an examiner's findings by a second qualified examiner." Assuming for a moment that this is an effective detection measure: in what percentage of cases do the second examiners fail to validate the results of the first?