Wednesday, September 9, 2015

The Heuristic About Representativeness Heuristic

Some people think that the problem with the representativeness heuristic is base rate neglect. I hold that this is incorrect: the problem is deeper than that, and simply using a base rate isn't going to fix it. This makes "look at the base rate!" a heuristic as well.

The thing is, there is a fundamental difference between "how strongly E resembles H" and "how strongly H implies E". The latter question is about P(E|H), and this number can be used in Bayesian reasoning if you add P(E|!H) and P(H)[1]. The former question (the one humans actually answer when asked to judge whether something is likely) sometimes cannot be salvaged at all.
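To make the role of the three numbers concrete, here is a minimal sketch of Bayes' rule in Python. The function name and all the numbers are made up for illustration and are not taken from any real case. It shows that when E is equally likely with and without H, the posterior stays at the prior, no matter how much E "resembles" H.

```python
def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """Bayes' rule: P(H|E) from P(H), P(E|H) and P(E|!H)."""
    # Total probability of seeing E at all, in either kind of world.
    p_e = p_e_given_h * prior_h + p_e_given_not_h * (1.0 - prior_h)
    return p_e_given_h * prior_h / p_e


# Illustrative numbers only (assumptions of this sketch).
# E is just as likely with or without H, so it carries no information:
unchanged = posterior(prior_h=0.01, p_e_given_h=0.9, p_e_given_not_h=0.9)

# E is nine times likelier under H than under !H, so the posterior moves:
updated = posterior(prior_h=0.01, p_e_given_h=0.9, p_e_given_not_h=0.1)

print(round(unchanged, 4), round(updated, 4))
```

Note that P(E|H) alone (0.9 in both calls) does not decide the answer; the comparison with P(E|!H) does.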

Several examples to get the point across:

1) Conspiracy theorists / ufologists: naively, their existence strongly points to a world where UFOs exist, but really, their existence is very weak evidence for UFOs (human psychology suggests that ufologists could exist in a perfectly alienless world). It could even be evidence against them: if the Secret World Government were real, we would expect it to be very good at hiding, and therefore any voices that got close to the truth would be quickly silenced.

So, the answer to "how strongly does E resemble H?" is very different from the answer to "how large is P(E|H)?". No amount of accounting for the base rate is going to fix this.

2) Suppose that some analysis comes out too strongly in favor of some hypothesis.

Maybe some paper argues that leaded gasoline accounts for 90% of the variation in violent crime (credit for this example goes to /u/yodatsracist on reddit). Or some ridiculously simple school intervention is claimed to have a gigantic effect size.

Let's take leaded gasoline, for example. On the surface, this data strongly "resembles" a world where leaded gasoline is indeed causing violence, since 90% suggests that the effect is very large and very unlikely to be a fluke. On the other hand, this effect is too large, and the 10% left for "other factors" (including but not limited to: abortion rate, economic situation, police budget, alcohol consumption, imprisonment rate) is too small a share.

The decline we expect in a world of harmful leaded gasoline is more like 10% than 90%, so this model is too good to be true; instead of being very strong evidence in favor, this analysis could be either irrelevant (just a random botched analysis with faked data, nothing to see here) or offer some evidence against (for reasons related to the conservation of expected evidence, for example).

So, how should it be done? Remember that P(E|H) would be written as P(H -> E), were the notation a bit saner. P(E|H) asks "to what degree does H imply E?", so the correct method for answering this query involves imagining the world where H is true and asking yourself "how often does E occur here?" instead of answering the question "which world comes to my mind after seeing E?".
 
--------------

[1] And often just using the base rate is good enough, but this is another, even less correct heuristic. See: Prosecutor's Fallacy.

Tuesday, September 8, 2015

Bayesianism for humans: prosaic priors

(note: this is a copy of my LW post. I'm gathering all my stuff in one place)


There are two insights from Bayesianism which occurred to me and which I hadn't seen anywhere else before.
I like lists in the two posts linked above, so for the sake of completeness, I'm going to add my two cents to the public domain. This post is about the second penny; the first one is here.

Prosaic Priors

The second insight can be formulated as «the dull explanations are more likely to be correct because they tend to have high prior probability.»

Why is that? 

1) Almost by definition! Some property X is 'banal' if X applies to a lot of people in a disappointingly mundane way, not having any redeeming features which would make it more rare (and, hence, interesting).

In other words, X is banal iff the base rate of X is high. Or, you can say, the prior probability of X is high.

1.5) Because of Occam's Razor and burdensome details. One way to make something boring more exciting is to add interesting details: some special features which will make sure that this explanation is about you as opposed to 'about almost anybody'.

This could work the other way around: sometimes the explanation feels unsatisfying exactly because it was shaved of any unnecessary and (ultimately) burdensome details.

2) Often, the alternative to a mundane explanation is something unique and custom made to fit the case you are interested in. And anybody familiar with overfitting and the conjunction fallacy (and the fact that people tend to love coherent stories with blinding passion[1]) should be very suspicious of such things. So, there could be a strong bias against stale explanations, which should be countered.
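The "burdensome details" point can be sketched numerically. The snippet below is only an illustration: the function name and the probabilities are invented, and each number is assumed to be the probability of a detail given that the previous details hold. Every extra detail multiplies in a factor of at most 1, so the embellished story can never be more probable than the plain one.

```python
def story_probability(detail_probs):
    """Probability that every detail of a story holds.

    Each entry is assumed to be P(detail | all previous details).
    """
    p = 1.0
    for q in detail_probs:
        p *= q
    return p


# Hypothetical numbers, chosen only for illustration.
plain = story_probability([0.6])               # the dull explanation
detailed = story_probability([0.6, 0.7, 0.5])  # same claim plus two vivid details

print(round(plain, 2), round(detailed, 2))
```

The detailed story may feel more convincing, but its probability is bounded above by that of the plain one.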

* * *

I fully grokked this while in the process of CBT-induced soul-searching; usage in this context still looks the most natural to me, but I believe that the area of application of this heuristic is wider.

Examples

1) I'm fairly confident that I'm an introvert. Still, sometimes I can behave like an extrovert. I was interested in the causes of this "extroversion activation", as I called it[2]. I suspected that I really had two modes of functioning (with "introversion" being the default one), and that some events made me switch between them: for example, mutual interest (when I am interested in the person I am talking to, and xe is interested in me) or feeling high-status.

Or, you know, it could be just a reduction in social anxiety, which makes people more communicative. Anxiety wasn't a new element to be postulated; I already knew I had it, yet I was tempted to make up new mental entities, and the prosaic explanation about anxiety managed to elude me for a while.

2) I find it hard to do something I consider worthwhile while on spring break, despite having lots of free time. I tend to make grandiose plans (I should meet new people! I should be more involved in sports! I should start using Anki! I should learn Lojban! I should practice meditation! I should read these textbooks, including doing most of the exercises!) and then fail to do almost anything. Yet I manage to do some impressive stuff during the academic term, despite having less time and more commitments.

This paradoxical situation calls for explanation.

The first hypothesis that came to my mind was about activation energy. It takes effort to go from "procrastinating" to "doing something"; speaking more generally, you can say that it takes effort to go from a "lazy day" to a "productive day". During the academic term, I am forced to make most of my days productive: I have to attend classes, do homework, etc. And, already having done something good, I can do something else as well. During spring break, I am deprived of that natural structure, and hence I am on my own when it comes to starting something I find worthwhile.

The alternative explanation: I was tired. Because, you know, vacation comes right after midterms, and I tend to go all out while preparing for midterms. I am exhausted, my energy and willpower are scarce, so it's no wonder I am having trouble utilizing them.

(I don't really believe in the latter explanation; I think that my situation is caused by several factors, including the two outlined above. So it is also an example of a descriptive "probable enough" hypothesis.)

3) This example comes from Slate Star Codex. Nerds tend to find aversive many group bonding activities that ordinary people supposedly enjoy, such as patriotism, prayer, team sports, and pep rallies. Supposedly, they should feel (with the tear-jerking passion of a thousand exploding suns) a great unity with their fellow citizens, church-goers, teammates or pupils respectively, but instead they feel nothing.

Might it be that nerds are unable to enjoy these activities because something is broken inside their brains? One could be tempted to construct an elaborate argument involving the autism spectrum and a mild case of schizoid personality disorder. In other words, this calls for postulating a rare form of autism which affects only some types of social behaviour (perception of group activities), leaving other types unchanged.

Or, you know, maybe nerds just don't like the group they are supposed to root for. Maybe nerds don't feel unity and relationship to The Great Whole because they don't feel like they truly belong here.

As Scott put it, "It’s not that we lack the ability to lose ourselves in an in-group, it’s that all the groups people expected us to lose ourselves in weren’t ones we could imagine as our in-group by any stretch of the imagination"[3].

4) This example comes from a short comic titled "Sherlock Holmes in real life".

5) Scott Aaronson uses something similar to Hanlon's Razor to explain that CS theorists' lack of practical expertise isn't caused by arrogance or anything like that:

"If theorists don’t have as much experience building robots as they should have, don’t know as much about large software projects as they should  know, etc., then those are all defects to add to the long list of their other, unrelated defects.  But it would be a mistake to assume that they failed to acquire this knowledge because of disdain for practical people, rather than for mundane reasons like busyness or laziness."

6) (NEW) An answer seen in a Quora thread titled "Why aren't there a lot of old programmers at software companies?". I feel like the answer by Zach Brook is worth quoting (almost) in its entirety:

Demographics. There weren't very many programmers 40 years ago, so therefore there aren't many programmers with 40 years of experience. Ditto 30 and 20.

Note that this explanation is straightforward and does not require:

  • Mass stereotyping of older developers
  • Conspiracy theories involving hiring managers at thousands of companies colluding
  • Characterization of modern companies as no longer solving interesting or hard technical problems
  • Blaming 20-something-year-olds (a.k.a "kids these days")
  • Suspension of disbelief in the free market

* * *

...and after this the word "prosaic" quickly turned into an awesome compliment. Like, "so, this hypothesis explains my behaviour well; but is it boring enough?", or "your claim is refreshingly dull; I like it!".


1. If you have read Thinking, Fast and Slow, you probably know what I mean. If you haven't, you can look up the narrative fallacy to get a general idea.
2. Which was, as I now realize, an excellent way to deceive myself by using a word with a lot of hidden assumptions. Taboo your words, folks!
3. As a side note, my friend proposed an alternative explanation: the thing is, nerds are often defined as "the sort of people who dislike pep rallies". So, naturally, we have "usual people" who like pep rallies and "nerds" who avoid them. And then "nerds dislike pep rallies" is a tautology rather than something to be explained.

Bayesianism for humans: "probable enough"

(note: this is a copy of my LW post. I'm gathering all my stuff in one place)


There are two insights from Bayesianism which occurred to me and which I hadn't seen anywhere else before.
I like lists in the two posts linked above, so for the sake of completeness, I'm going to add my two cents to the public domain. The second penny is here.


"Probable enough"

When you have eliminated the impossible, whatever remains is often more improbable than your having made a mistake in one of your impossibility proofs.


The Bayesian way of thinking introduced me to the idea of a "hypothesis which probably isn't true, but is probable enough to rise to the level of conscious attention": in other words, to the situation where P(H) is notable but less than 50%.

Looking back, I think that the notion of taking seriously something which you don't think is true was alien to me. Hence, everything was either probably true or probably false; things from the former category were over-confidently certain, and things from the latter category were barely worth thinking about.

This model was correct, but only in a formal sense.

Suppose you are living in Gotham, the city famous for its crime rate and its masked (and well-funded) vigilante, Batman. Recently you have read The Better Angels of Our Nature: Why Violence Has Declined by Steven Pinker, and according to some theories described there, Batman isn't good for Gotham at all.

Now you know, for example, the theory of Donald Black that "crime is, from the point of view of the perpetrator, the pursuit of justice". You know about the idea that in order for the crime rate to drop, people should perceive their legal system as legitimate. You suspect that criminals beaten by Bats don't perceive the act as a fair and regular punishment for something bad, or an attempt to defend them from injustice; instead the act is perceived as a round of bad luck. So, the criminals are busy plotting their revenge, not internalizing civil norms.

You believe that if you send your copy of the book (with key passages highlighted) to the person connected to Batman, Batman will change his ways and Gotham will become much nicer in terms of homicide rate.

So you are trying to find out Batman's secret identity, and there are 17 possible suspects. Derek Powers looks like a good candidate: he is wealthy, and has a long history of secretly delegating illegal-violence-including tasks to his henchmen; however, his motivation is far from obvious. You estimate P(Derek Powers employs Batman) as 20%. You have very little information about other candidates, like Ferris Boyle, Bruce Wayne, Roland Daggett, Lucius Fox or Matches Malone, so you assign an equal 5% to everyone else.

In this case you should pick Derek Powers as your best guess when forced to name only one candidate (for example, if you are forced to send the book to someone today), but you should also be aware that your guess is 80% likely to be wrong. When making expected utility calculations, you should take Derek Powers more seriously than Lucius Fox, but only by 15 percentage points (20% versus 5%).

In other words, you should take the maximum a posteriori probability hypothesis into account while not deluding yourself into thinking that now you understand everything or nothing at all. The Derek Powers hypothesis probably isn't true; but it is useful.
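The numbers of the Gotham example can be checked with a short Python sketch. Only six suspects are named in the text, so the remaining eleven get placeholder labels, which are an assumption of this sketch, as are the variable names.

```python
# Prior from the example: 17 suspects, Derek Powers at 20%, everyone else at 5%.
named_others = ["Ferris Boyle", "Bruce Wayne", "Roland Daggett",
                "Lucius Fox", "Matches Malone"]
# Placeholder labels for the suspects the text does not name (hypothetical).
unnamed_others = [f"unnamed suspect {i}" for i in range(1, 12)]

prior = {"Derek Powers": 0.20}
prior.update({name: 0.05 for name in named_others + unnamed_others})

# The maximum a posteriori (MAP) guess is the single most probable suspect.
map_guess = max(prior, key=prior.get)
p_map_wrong = 1.0 - prior[map_guess]

# Two ways to compare the best guess with any other candidate.
difference = prior["Derek Powers"] - prior["Lucius Fox"]
ratio = prior["Derek Powers"] / prior["Lucius Fox"]

print(map_guess, round(p_map_wrong, 2), round(difference, 2), round(ratio, 1))
```

The best single guess is still wrong four times out of five, which is exactly the "probable enough" situation described above.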

Sometimes I find it easier to reframe the question from "what hypothesis is true?" to "what hypothesis is probable enough?". Now it's totally okay that your pet theory isn't probable but is still probable enough, so doubt becomes easier. Also, you are aware that your pet theory is likely to be wrong (and this is nothing to be sad about), so the alternatives come to mind more naturally.

These "probable enough" hypotheses can serve as very concise summaries of the state of your knowledge when you simultaneously outline the general sort of evidence you've observed and stress that you aren't really sure. I like to think about it as a rough, qualitative and more System1-friendly variant of likelihood ratio sharing.

Planning Fallacy

The original explanation of the planning fallacy (proposed by Kahneman and Tversky) is about people focusing on the most optimistic scenario when asked about a typical one (instead of trying to take an Outside View). If you keep the distinction between "probable" and "probable enough" in mind, you can see this claim in a new light.

Because the most optimistic scenario is the most probable and the most typical one, in a certain sense.

The illustration, with numbers pulled out of thin air, goes like this: so, you want to visit a museum.

The first thing you need to do is to get dressed and take your keys and stuff. Usually (with 80% probability) you do this very quickly, but there is a small chance of your museum ticket having been devoured by an entropy monster living on your computer table.

The second thing is to catch the bus. Usually (p = 80%), the bus is on schedule, but sometimes it can be too early or too late. After this, the bus could (20%) or could not (80%) get stuck in a traffic jam.

Finally, you need to find the museum building. You've been there once before, so you sorta remember your route, yet you could still get lost with 20% probability.

And there you have it: P(everything is fine) is about 40% (0.8 to the fourth power), and the probability of every other scenario is about 10% or less. "Everything is fine" is probable enough, yet likely to be false. Supposedly, humans pick the MAP hypothesis and then forget about every other scenario in order to save computations.
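The arithmetic behind the museum illustration can be reproduced by enumerating every scenario. This is a minimal Python sketch that treats the four steps as independent, which is an assumption of the sketch rather than something stated in the text; the step labels are mine.

```python
from itertools import product

# Success probabilities of the four steps, as given in the illustration.
steps = {
    "get ready quickly": 0.8,
    "bus on schedule": 0.8,
    "no traffic jam": 0.8,
    "find the museum": 0.8,
}

# Enumerate all success/failure combinations, assuming independent steps.
scenarios = {}
for outcome in product([True, False], repeat=len(steps)):
    p = 1.0
    for ok, p_ok in zip(outcome, steps.values()):
        p *= p_ok if ok else 1.0 - p_ok
    scenarios[outcome] = p

all_fine = (True,) * len(steps)
p_all_fine = scenarios[all_fine]
p_something_wrong = 1.0 - p_all_fine
# Probability of the most likely scenario other than "everything is fine".
runner_up = max(p for outcome, p in scenarios.items() if outcome != all_fine)

print(round(p_all_fine, 4), round(runner_up, 4), round(p_something_wrong, 4))
```

"Everything is fine" is the single most probable scenario, roughly four times likelier than any one specific mishap, yet the combined probability that something goes wrong is larger than the probability that nothing does.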

Also, "everything is fine" is a good description of your plan. If your friend asks you, "so how are you planning to get to the museum?", and you answer "well, I catch the bus, get stuck in a traffic jam for 30 agonizing minutes, and then just walk from there", your friend is going to get a completely wrong idea about the dangers of your journey. So, in a certain sense, "everything is fine" is a typical scenario.

Maybe it isn't the human inability to pick the most likely scenario which should be blamed. Maybe it is the false assumption that "most likely == likely to be correct" which contributes to this ubiquitous error.

In this case you would be better off picking "something will go wrong, and I will be late" instead of "everything will be fine".

So, sometimes you are interested in the best specimen out of your hypothesis space, sometimes you are interested in the most likely thingy (and it doesn't matter how vague it would be), and sometimes there are no shortcuts, and you have to do an actual expected utility calculation.