The Likelihood of Guilt and Statistical Probability
Via Simple Justice and Eugene Volokh, I found this pair of posts from Steven Landsburg at The Big Questions both provocative and interesting. Go read them both before continuing with this post, since I'll refer to them but won't quote them extensively:
As a probability game, that's all fine and good. As an analogy for the likelihood of guilt in a given situation, however, perhaps not so much. Even from a statistical perspective, I think the good professor is wrong, at least when you scale up his analysis to a systemic level. He is engaging in what's known among mathematicians as the "base rate fallacy," illustrated by this example offered up by, of all sources, the CIA:
During the Vietnam War, a fighter plane made a non-fatal strafing attack on a US aerial reconnaissance mission at twilight. Both Cambodian and Vietnamese jets operate in the area. You know the following facts:
(a) Specific case information: The US pilot identified the fighter as Cambodian. The pilot's aircraft recognition capabilities were tested under appropriate visibility and flight conditions. When presented with a sample of fighters (half with Vietnamese markings and half with Cambodian) the pilot made correct identifications 80 percent of the time and erred 20 percent of the time.
(b) Base rate data: 85 percent of the jet fighters in that area are Vietnamese; 15 percent are Cambodian.
Question: What is the probability that the fighter was Cambodian rather than Vietnamese?
A common procedure in answering this question is to reason as follows: We know the pilot identified the aircraft as Cambodian. We also know the pilot's identifications are correct 80 percent of the time; therefore, there is an 80 percent probability the fighter was Cambodian. This reasoning appears plausible but is incorrect. It ignores the base rate--that 85 percent of the fighters in that area are Vietnamese. The base rate, or prior probability, is what you can say about any hostile fighter in that area before you learn anything about the specific sighting.
It is actually more likely that the plane was Vietnamese than Cambodian despite the pilot's "probably correct" identification. Readers who are unfamiliar with probabilistic reasoning and do not grasp this point should imagine 100 cases in which the pilot has a similar encounter. Based on paragraph (b), we know that 85 of these encounters will be with Vietnamese aircraft, 15 with Cambodian. Based on paragraph (a), we know that 80 percent or 68 of the 85 Vietnamese aircraft will be correctly identified as Vietnamese, while 20 percent or 17 will be incorrectly identified as Cambodian.
Similarly, 80 percent or 12 of the 15 Cambodian aircraft will be correctly identified as Cambodian, while 20 percent or three will be incorrectly identified as Vietnamese. This makes a total of 71 Vietnamese and 29 Cambodian sightings, of which only 12 of the 29 Cambodian sightings are correct; the other 17 are incorrect sightings of Vietnamese aircraft. Therefore, when the pilot claims the attack was by a Cambodian fighter, the probability that the craft was actually Cambodian is only 12/29ths or 41 percent, despite the fact that the pilot's identifications are correct 80 percent of the time.
This may seem like a mathematical trick, but it is not. The difference stems from the strong prior probability of the pilot observing a Vietnamese aircraft. The difficulty in understanding this arises because untrained intuitive judgment does not incorporate some of the basic statistical principles of probabilistic reasoning.
Bruce Schneier offers a similar example regarding data mining:
Data mining is like searching for a needle in a haystack. There are 900 million credit cards in circulation in the United States. According to the FTC September 2003 Identity Theft Survey Report, about 1% (10 million) cards are stolen and fraudulently used each year. Terrorism is different. There are trillions of connections between people and events -- things that the data mining system will have to "look at" -- and very few plots. This rarity makes even accurate identification systems useless.
Let's look at some numbers. We'll be optimistic. We'll assume the system has a 1 in 100 false positive rate (99% accurate), and a 1 in 1,000 false negative rate (99.9% accurate).
Assume one trillion possible indicators to sift through: that's about ten events -- e-mails, phone calls, purchases, web surfings, whatever -- per person in the U.S. per day. Also assume that 10 of them are actually terrorists plotting.
This unrealistically-accurate system will generate one billion false alarms for every real terrorist plot it uncovers. Every day of every year, the police will have to investigate 27 million potential plots in order to find the one real terrorist plot per month. Raise that false-positive accuracy to an absurd 99.9999% and you're still chasing 2,750 false alarms per day -- but that will inevitably raise your false negatives, and you're going to miss some of those ten real plots.
So if the evaluation of Prof. Landsburg's culpability were an isolated incident, perhaps his analytical assumptions make sense. If the same investigative technique is applied to many people, however, even a 98% level of certainty can produce many false positives, as in these examples.
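The arithmetic in both quoted examples can be checked with a short Python sketch. It is an illustration, not code from the CIA or Schneier: the function names are hypothetical, and the false-alarm count follows Schneier's simplification of applying the false-positive rate to every event.

```python
from fractions import Fraction

def posterior(base_rate, accuracy):
    """P(target | witness says target), by Bayes' rule.

    base_rate: prior share of the target class (15% Cambodian jets)
    accuracy:  chance the witness labels either class correctly (80%)
    """
    true_hits = base_rate * accuracy               # Cambodian jets called Cambodian
    false_hits = (1 - base_rate) * (1 - accuracy)  # Vietnamese jets called Cambodian
    return true_hits / (true_hits + false_hits)

def false_alarm_load(events_per_year, false_positive_rate, plots_per_year):
    """Schneier-style estimate: false alarms per real plot, and per day."""
    alarms_per_year = events_per_year * false_positive_rate
    return alarms_per_year / plots_per_year, alarms_per_year / 365

# CIA example: 15% base rate, 80% pilot accuracy.
cambodian = posterior(Fraction(15, 100), Fraction(80, 100))
print(cambodian, f"({float(cambodian):.0%})")  # 12/29 (41%)

# Schneier example: one trillion events, ten real plots a year, 1% false positives.
per_plot, per_day = false_alarm_load(10**12, Fraction(1, 100), 10)
print(per_plot, round(per_day))  # 1000000000 27397260

# Same system with a 99.9999% false-positive accuracy.
_, per_day_strict = false_alarm_load(10**12, Fraction(1, 10**6), 10)
print(round(per_day_strict))  # 2740
```

Using exact fractions keeps the 12/29 result free of rounding noise. The last figure, about 2,740 alarms a day, sits close to the 2,750 Schneier quotes; the small gap is rounding.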
There's another mitigating wildcard to examine before we declare this single piece of data sufficient for a conviction: Landsburg's assumptions included the statement, "While you weren’t looking, I reached into one of these urns and randomly drew out a dozen balls…4 of them were red and 8 were black." But what if that's a lie? What if while you weren't looking, Prof. Landsburg went into the left-hand urn and counted out a precise, predetermined number of red and black balls for his own purposes? Provocatively, Landsburg goes so far as to declare that "If I were on trial for the crime of drawing from the right urn, I hope this evidence would be strong enough to convict me," without even insisting that the balls in each urn be counted to ensure the theft really took place!
If taking balls from the right urn is a crime, why should we take the word of a criminal that the balls were indeed chosen randomly? :) In all seriousness, making it a crime to take balls from the right urn gives the ball-taker an incentive to lie. For that matter, if someone wanted to make it falsely appear that the right urn was the source of balls that were really handpicked from the left, they would fool others 98% of the time simply by making sure their "random" pickings mirrored a right-urn pattern. In the world of statistical parlor games governed by hard-and-fast assumptions and rational decisions, those percentages make sense. In a world where 25% of DNA exonerations involved false confessions or guilty pleas, not so much.
Landsburg says "I hope this evidence would be strong enough to convict me," but depending on the jurisdiction that might be overly hopeful. In Texas state court, it might be. In military courts-martial, though, as well as in federal court, a confession must be independently corroborated by other evidence. The assumption that the balls were drawn from an urn at random comes from a stand-alone confession for which there is no corroboration. And of course, as one of Landsburg's commenters noted, in the real world one seldom would know the exact distribution of balls in each urn. In other words, much evidence in criminal cases includes elements of uncertainty that are merely assumed away in this hypothetical. Moreover, witnesses sometimes embellish or lie in ways that aren't always anticipated by game theory.
'n Guilty Men'
Relatedly, via Landsburg's discussion and Eugene Volokh's tout, I also discovered this wonderful and humorous 1997 law review article, "n Guilty Men," by Alexander Volokh (brother of Eugene), which traces the variations throughout history of the sentiment, most famously stated by William Blackstone, that it's better for ten guilty men to go free than to punish an innocent one. He begins by pointing out that there's no universal agreement on whether ten is the right number:
But why ten? Other eminent legal authorities through the ages have put their weight behind other numbers. "One" has appeared on Geraldo. 7 "It's better for four guilty men to go free than one innocent man to be imprisoned," says basketball coach George Raveling. 8 But "it's better to turn five guilty men loose than it is to convict one innocent man," according to ex-Mississippi executioner and roadside fruit stand operator Thomas Berry Bruce, who ought to know. 9 "It is better to let nine guilty men free than to convict one innocent man," counters lawyer Bruce Rosen from Madison, Wisconsin. 10 Justice Benjamin Cardozo certainly believed in five for execution, 11 and allegedly favored ten for imprisonment, 12 which is a bit counterintuitive. Benjamin Franklin thought "that it is better [one hundred] guilty Persons should escape than that one innocent Person should suffer." 13 Mario Puzo's Don Clericuzio heard about letting a hundred guilty men go free and, "struck almost dumb by the beauty of the concept . . . became an ardent patriot." 14 Denver radio talk show host Mike Rosen claims to have heard it argued "in the abstract" that it's better that 1000 guilty men go free than one innocent man be imprisoned, and comments, "Well, we get our wish." 15
Or, perhaps, it may be merely "a few," 16 "some," 17 "several," 18 "many" (and particularly more than eight), 19 "a considerable amount," 20 or even "a goodly number." 21 Not all commentators weigh acquitting the guilty against the conviction of one innocent man. A Missouri district court said in 1877 that it was "better that some guilty ones should escape than that many innocent persons should be subjected to the expense and disgrace attendant upon being arrested upon a criminal charge." 22 And in Judge Henry J. Friendly's opinion, "Most Americans would allow a considerable number of guilty persons to go free than to convict any appreciable number of innocent men." 23 It is unclear whether "considerable" is greater or less than "appreciable." 24
n guilty men, then. The travels and metamorphoses of n through all lands and eras are the stuff that epic miniseries are made of. n is the father of criminal law. This is its story.
Excellent stuff, huh? Volokh sees the roots of these mathematical calculations over the relative value of innocent men in the story of Abraham haggling with God over the fate of the sinful city Sodom, the scriptural account of which is the epigraph to his article:
And Abraham drew near and said, Wilt thou also destroy the righteous with the wicked? Peradventure there be fifty righteous within the city: wilt thou also destroy and not spare the place for the fifty righteous that are therein? That be far from thee to do after this manner, to slay the righteous with the wicked: and that the righteous should be as the wicked, that be far from thee: Shall not the Judge of all the earth do right? And the Lord said, If I find in Sodom fifty righteous within the city, then I will spare all the place for their sakes.
And Abraham answered and said, Behold now, I have taken upon me to speak unto the Lord, which am but dust and ashes: Peradventure there shall lack five of the fifty righteous: wilt thou destroy all the city for lack of five? And he said, If I find there forty and five, I will not destroy it. And he spake unto him yet again, and said, Peradventure there shall be forty found there. And he said, I will not do it for forty's sake. And he said unto him, Oh let not the Lord be angry, and I will speak: Peradventure there shall thirty be found there. And he said, I will not do it, if I find thirty there. And he said, Behold now, I have taken upon me to speak unto the Lord: Peradventure there shall be twenty found there. And he said, I will not destroy it for twenty's sake.
And he said, Oh let not the Lord be angry, and I will speak yet but this once: Peradventure ten shall be found there. And he said, I will not destroy it for ten's sake.
From this Volokh sensibly takes it that God's bottom line n for guilty people who may be freed for the sake of an innocent person would be n = (P - 10)/10, where P = the population of Sodom. So, if Sodom was a town of 3000 people, by this logic God would countenance letting 299 guilty men go free for the sake of a single innocent [(3000 - 10)/10].
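Volokh's formula, and its inverse P = 10n + 10, fit in a few lines of Python. This is an illustrative sketch: the function names and the `righteous` parameter (the ten righteous people Abraham finally bargains down to) are hypothetical choices, not Volokh's.

```python
from fractions import Fraction

def guilty_per_innocent(population, righteous=10):
    """n = (P - 10) / 10: guilty people spared per innocent one."""
    return Fraction(population - righteous, righteous)

def population_for(n, righteous=10):
    """Invert the formula: P = 10n + 10."""
    return righteous * n + righteous

print(guilty_per_innocent(3000))  # 299
print(population_for(1000))       # 10010
```

Feeding in n = 1,000 reproduces the 10,010 figure for the Talmudic thousand-to-one standard.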
Volokh also cites Talmudic interpretations from the Middle Ages of Exodus 23:7 that concluded that, for purposes of execution, it was better to let 1,000 guilty men go free than to convict an innocent one! (To jibe with Volokh's calculations vis-à-vis Abraham's bargain, Sodom's population would have had to be 10,010.) I'm really just providing a taste, though; the whole thing is a must-read and IMO flat-out hysterical. (My favorite part is the send-up of Blackstone's divinity and immortality.)
Acceptable Error Rates?
These discussions dance around a central question: How many innocent people are falsely accused and convicted? And assuming the answer is greater than zero, what error rate is tolerable? Over the past couple of years this blog has compiled available estimates of actual innocence rates among prison inmates based on various methodologies and datasets, with results ranging from 0.75% on the low end to 3.3% on the high end. However you slice it, that's a not-insignificant false-positive rate. At a rate of 0.75%, we would expect 1,200 actually innocent people to be locked up in Texas prisons right now; at 3.3%, the total exceeds 5,000. (The rate among probationers is likely a little higher because of the incentive innocent people have to take a deal to avoid incarceration.)
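The Texas estimates come from multiplying the prison population by each innocence rate. The post does not state the population it used, so the Python sketch below assumes roughly 160,000 inmates, which is what 1,200 innocent people at 0.75% implies. The constant and function names are hypothetical.

```python
LOW_RATE = 0.0075   # low-end estimate of actual innocence among inmates
HIGH_RATE = 0.033   # high-end estimate

# Assumption: inmate count backed out from the post's own 1,200 figure.
implied_population = round(1200 / LOW_RATE)

def innocent_inmates(population, rate):
    """Expected number of actually innocent people at a given rate."""
    return round(population * rate)

print(implied_population)                               # 160000
print(innocent_inmates(implied_population, LOW_RATE))   # 1200
print(innocent_inmates(implied_population, HIGH_RATE))  # 5280
```

At that assumed population, the high-end rate gives about 5,280 people, consistent with "exceeds 5,000."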
Perhaps, given these data, Blackstone's 10-to-1 false-negative to false-positive ratio was always a little too stingy; perhaps Ben Franklin's n = 100 is a better standard. I fail to understand why, if manufacturing companies can adopt "six sigma" error standards, "in which 99.99966% of the products manufactured are statistically expected to be free of defects (3.4 defects per million)," the criminal justice system couldn't shoot for at least a four-sigma error rate (99.38% accurate). Criminal justice is a field where relatively high error rates are routinely tolerated in high-stakes circumstances.
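The six-sigma percentages assume the industry convention of a 1.5-sigma long-term shift, so a "k-sigma" process leaves a one-sided normal tail beyond k − 1.5 standard deviations. A minimal Python sketch under that assumption reproduces the 3.4 defects per million and the 99.38% four-sigma figure, and, purely for comparison, places the 0.75% and 3.3% innocence estimates on the same scale. The function names are illustrative.

```python
from math import erfc, sqrt
from statistics import NormalDist

SHIFT = 1.5  # conventional long-term shift assumed by six-sigma tables

def defects_per_million(sigma_level):
    """One-sided normal tail beyond (sigma_level - 1.5) SDs, per million."""
    return 0.5 * erfc((sigma_level - SHIFT) / sqrt(2)) * 1_000_000

def percent_accurate(sigma_level):
    """Defect-free share of output, in percent."""
    return 100 - defects_per_million(sigma_level) / 10_000

def sigma_level(error_rate):
    """Sigma level whose shifted tail equals the given error rate."""
    return NormalDist().inv_cdf(1 - error_rate) + SHIFT

print(round(defects_per_million(6), 1))  # 3.4
print(round(percent_accurate(4), 2))     # 99.38
for rate in (0.0075, 0.033):
    print(f"{rate:.2%} error -> {sigma_level(rate):.2f} sigma")
```

On these assumptions both innocence estimates come out below four sigma (roughly 3.9 and 3.3).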
These debates home in on key, fundamental questions: What error rate is tolerable in the justice system, for both false positives and false negatives, and how can those errors be reduced? How many innocent people is it acceptable to punish in pursuit of punishing the guilty, or is that ever justified? How much evidence is sufficient to convict and what constitutes reasonable doubt? I don't know the answer to all these questions, but they're interesting topics to think about and discuss.