The third book of rationality

This is part 3 of 6 in my series of summaries. See this post for an introduction.



Interlude: The Power of Intelligence

OUR BRAINS ARE basically yucky lumps of slimy grayish tissue. Aristotle thought the brain was an organ for cooling blood. The brain doesn’t look like some of the most powerful stuff in the universe, but it is the ultimate weapon and ultimate defense.

Humans don’t have armored shells, claws, or venoms like other animals. But we have machine guns, fighter jets, and nuclear bombs. We can manipulate DNA. If you were alive when human civilization began, you would not have predicted these things.

Such is the power of intelligence and creativity. The gray wet thing is the trick behind landing on the moon, curing smallpox, and manufacturing computers. But people still don’t understand how their brains work, so the power of intelligence seems less real, and harder to imagine.

But the one trick works anyway. The footprints are on the moon nonetheless. If you are ignorant about a phenomenon, that is a fact about your state of mind, not the phenomenon. Intelligence is as real as electricity. So if we could understand deeply enough, we could create and shape that power – except it is far more powerful and dangerous than electricity. One fell swoop could solve even such prosaic problems as obesity, cancer, and aging. Don’t think that an Artificial Intelligence can’t do anything interesting over the internet unless a human programmer builds it a robot body.




Part III

The Machine in the Ghost



Why haven’t we evolved to be more rational? To get a realistic picture of how and why our minds execute their biological functions, this part cracks open the hood and looks at how evolution works and how our brains work, with more precision. By locating our minds within a larger space of goal-directed systems, we can identify some peculiarities of human reasoning.

You are a mind, which means you can make predictions and form plans, and even form, inside your mind, a picture of your whole mind. However, your mind is implemented on a human brain, which despite its flexibility, can follow patterns and routines for a lifetime without noticing that it is doing so. These routines can have great consequences. A mental pattern that serves you well is called “rationality”. But because of your ancestry, you are hard-wired to exhibit certain species of irrationality. You are built on the echoes of your ancestors’ struggles and victories.

We tend to think of our minds in terms of mental categories (e.g. ideas, feelings) rather than physical categories. Philosophers in the past have argued that minds and brains are fundamentally distinct and separate phenomena. This is dualism, also known as “the dogma of the Ghost in the Machine”. But even a modern scientific view needs a precise overarching theory that predicts how the mind works, to avoid making the same kinds of mistakes. Perhaps we can learn about ourselves from inhuman mind-like systems/processes in evolutionary biology and artificial intelligence (AI).

Yudkowsky is a decision theorist who works on foundational issues in Artificial General Intelligence (AGI), the theoretical study of domain-general problem-solving systems. His work in AI has been a major driver behind his exploration of human rationality. Yudkowsky predicts that in the long run, AI will surpass humans in an “intelligence explosion” (or “technological singularity”), which will likely result in social upheaval. The term “Friendly AI” refers to research into techniques for aligning AGI preferences with the preferences of humans. This is technically difficult, because AI systems will learn and evolve over time, as will circumstances and our desired responses to those circumstances – so we need to give the program a utility function that remains harmless to humans dynamically. Cognitive bias can interfere with our ability to forecast existential risks in advance.



12

The Simple Math of Evolution

Knowing the designer can tell you much about the design. This chapter is about the dissonance and divergence between our hereditary history, our present-day biology, and our ultimate aspirations. It will dig deeper than the surface-level features of natural selection.

We humans tend to see “purpose” in the natural world. But there is no “Evolution Fairy” with the prudential foresight to design purposeful creatures. Rather, whatever happens is caused by genes’ effects on those genes’ frequency in the next generation. Thus foxes catch rabbits, and rabbits evade foxes. We are simply the embodied history of which organisms did in fact survive and reproduce. This doesn’t seem like it was designed by the Judeo-Christian concept of one benevolent God. Evolution is like an amoral blind idiot god – not like Jehovah, but like Azathoth (from H.P. Lovecraft’s stories), burbling chaotically at the center of everything. Evolution is awesomely powerful, unbelievably stupid, incredibly slow, monomaniacally single-minded, irrevocably splintered in focus, blindly shortsighted, and itself a completely accidental process.

The wonder of evolution is that despite being stupid and inefficient, it works anyway. The first accidental replicator (probably RNA) did not replicate gracefully, so don’t praise it too highly for being a “wonderful designer” – that’s missing the point. Evolution doesn’t work amazingly well, but that it works at all is the wonder: that a first replicator could arise by pure accident in the primordial seas of Earth, and that a brainless, mindless optimization process can produce complex designs by working incrementally. But evolution is not a wonderfully intelligent designer that we humans ought to imitate!

We have a definite picture of evolution’s capabilities. Evolution is slow: it can take an allele thousands of generations to reach fixation in the gene pool, and complex adaptations take millions of years. Evolution is sufficiently simpler than organic brains that we can describe mathematically how slow and stupid it is. The expected number of generations to fixation is 2ln(N)/s, where N is the population size and (1+s) is the fitness advantage of the gene; the probability that a single new mutation reaches fixation at all is 2s. For example, a gene conveying a 3% advantage would take about 768 generations (on average) to reach universality among a population of 100,000, and would have only a 6% chance of reaching fixation in the first place. By comparison, human technology seems like magic.
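To make those numbers concrete, here is a minimal Python sketch of the two formulas above; the function names are mine, not from the book:

```python
import math

def generations_to_fixation(population_size, s):
    """Average generations for a beneficial allele to reach fixation: 2*ln(N)/s."""
    return 2 * math.log(population_size) / s

def probability_of_fixation(s):
    """Probability that a single new beneficial mutation reaches fixation: 2*s."""
    return 2 * s

# The example from the text: a 3% fitness advantage in a population of 100,000.
print(round(generations_to_fixation(100_000, 0.03)))  # 768 generations on average
print(probability_of_fixation(0.03))                  # 0.06, i.e. a 6% chance
```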

Evolution is not invoked wherever “reproduction” exists. Corporations or self-replicating nanodevices do not “evolve” in the Darwinian sense because they don’t conform to Price’s Equation, which says that for each generation, the change in average characteristic (e.g. a gene for height) equals the covariance between the characteristic and its relative reproductive fitness. It is practically impossible for complex adaptations to arise without constant use. You need high-fidelity long-range heritability across the generations, plus variance in reproduction, plus frequent death of old generations to have enough cumulative selection pressure to produce complex adaptations by evolution.
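Here is a minimal sketch of the covariance form of Price’s Equation described above, assuming a simple population with perfect transmission of the trait from parent to offspring; the variable names and numbers are invented for illustration:

```python
# Price's Equation in its simplest form (assuming perfect trait transmission):
# change in mean trait = Cov(fitness, trait) / mean(fitness),
# where "fitness" is each individual's reproductive output.

def mean(xs):
    return sum(xs) / len(xs)

def covariance(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def price_delta_mean_trait(trait_values, offspring_counts):
    """Predicted one-generation change in the population's mean trait value."""
    return covariance(offspring_counts, trait_values) / mean(offspring_counts)

# Invented example: taller individuals happen to leave more offspring,
# so the mean height is predicted to drift upward next generation.
heights = [160, 170, 180, 190]
offspring = [1, 2, 2, 3]
print(price_delta_mean_trait(heights, offspring))  # 3.75: selection for height
```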

Contrary to common misconception, evolution doesn’t optimize for species survival, but for allele frequency. Evolution cares only about the inclusive fitness of genes relative to each other. Genes that “harm” the species may outreproduce their alternative alleles. It is thus possible to evolve to extinction, because individual competition can overcome group selection pressures (cancer cells are one example). There are related concepts like bystander apathy (not helping someone because there are others around) and existential risk (species-level extinction threats that no one is fighting).

The group selectionists (some pre-1960s biologists) had the romantic aesthetic idea that individuals would voluntarily restrain reproduction in order to avoid collapse of the predator-prey system, for the good of the species. They expected evolution to do smart, nice things like they would do themselves. Group selection is difficult to make work mathematically, and evolution doesn’t usually do this sort of thing. So the biologists conducted a lab experiment with insects. But Nature didn’t listen: it selected for individuals who cannibalize others’ offspring! This case warns us not to anthropomorphize evolution.

Humans are good at arguing that almost any optimization criterion suggests almost any policy. Natural selection optimizes only for inclusive genetic fitness – not politics or aesthetics. That is why it’s a good case study to see the results of a monotone optimization criterion (like that of evolution), to compare to human rationalizations of choices in a complex society. Studying evolution lets us see an alien optimization process and its consequences up close, and lets us see how optimizing for inclusive genetic fitness does not require predators to restrain their breeding to live in harmony with prey. Humans have trouble seeing what option a choice criterion really endorses.

A central principle of evolutionary science is that we are adaptation-executors, not fitness-maximizers; and that is why the consequences of having taste buds are different today than fifty thousand years ago. Taste buds are an executing adaptation because they are adapted to an ancestral environment where calories were scarce, but that doesn’t mean that humans today would find a cheeseburger distasteful. Purpose or meaning exists in the mind of a designer, not the tool itself. Hence the real-world operations of the tool won’t necessarily match the expectations.

Our brains are reproductive organs too, and the evolutionary cause of anger or lust is that ancestors with them reproduced more. Humans have the ability for thought and emotion for the same reason that birds have wings: they are adaptations selected for by evolution. However, this does not mean that the cognitive purpose of anger is to have children! Our neural circuitry has no sense of “inclusive genetic fitness”. The reason why a particular person was angry at a particular thing is separate! The cause, shape, meaning, and consequence of an adaptation are all separate things. Evolutionary psychology demands careful nitpicking of facts.

We should use language that separates out evolutionary reasons for behavior and clearly labels them. Draw a clear boundary between psychological events in a brain (cognitive reasons for behavior), and historical evolutionary causes! We should be less cynical about executing adaptations to do X, and more cynical about doing X because people (consciously or subconsciously) expect it to signal Y. For example, it would be disturbing if parents in general showed no grief for the loss of a child who they consciously believed to be sterile.

In a 1989 experiment, Canadian adults who imagined their expected grief of losing a child of varying age showed a strong correlation (0.92) with the reproductive-potential curve of the !Kung hunter-gatherers – in other words, with the future reproductive success rate of children at that age in the tribe. This shows that the parental grief adaptation continues executing as if the parent were living in a !Kung tribe rather than Canada. Although the grief was created by evolution, it is not about the children’s reproductive value! Parents care about children for their own sake.

The modern world contains things that match our desires more strongly than anything in the evolutionary environment. Candy bars, supermodels, and video games are superstimuli. As of 2007, at least three people have died by playing online games non-stop. Superstimuli exist as a side-effect of evolution. A candy bar corresponds overwhelmingly well to the food characteristics of sugar and fat that were healthy in the environment of evolutionary adaptedness. Today the market incentive is to supply us with more temptation, even as we suffer harmful consequences. Thus, superstimuli are likely to get worse. This doesn’t necessarily mean the government can fix it – but it’s still an issue.

Why did evolution create brains that would invent condoms? Because the blind idiot god wasn’t smart enough to program a concept of “inclusive genetic fitness” into us, so its one pure utility function splintered into a thousand shards of desire. We don’t want to optimize for genetic fitness, but we want to optimize for other things. Human values are complex. We care about sex, cookies, dancing, music, chocolate, and learning out of curiosity. At some point we evolved tastes for novelty, complexity, elegance, and challenge. The first protein computers (brains) had to use short-term reinforcement learning to reproduce, but now we use intelligence to get the same reinforcers. With our tastes we judge the blind idiot god’s monomaniacal focus as aesthetically unsatisfying. We don’t have to fill the future with maximally-efficient replicators, and it would be boring anyway. Being a thousand shards of desire isn’t always fun, but at least it’s not boring.


13

Fragile Purposes

This chapter abstracts from human cognition and evolution to the idea of minds and goal-directed systems at their most general. It also explains Yudkowsky’s general approach to philosophy and the science of rationality, which is strongly informed by his work in AI.

What does a belief that an agent is intelligent look like, and what predictions does it make? For optimization processes (like Artificial Intelligence or natural selection) we can know the final outcomes without being able to predict the exact next action or intermediary steps, because we know the target toward which they are trying to steer the future. For example, you don’t know which moves Garry Kasparov will make in a chess game, yet your belief that Kasparov is a better chess player than his opponent tells you to anticipate that Kasparov will win (because you understand his goals).

Complex adaptations in a sexually-reproducing species are universal, and this applies to the human brain. We share universal psychological properties that let us tell stories, laugh, cry, keep secrets, make promises and be sexually jealous, and so on. All human beings employ nearly identical facial expressions and share the emotions of joy, sadness, fear, disgust, anger and surprise. We take these for granted. But thus we naively expect all other minds to work like ours, which causes problems when trying to predict the actions of non-human intelligences.

It’s really hard to imagine aliens that are fundamentally different from human beings, which is why aliens in movies (e.g. Star Trek) always look pretty much like humans. People anthropomorphize evolution, AI and fictional aliens because they fail to understand (or imagine) humanity as a unique special case of intelligence. Not all intelligence requires looking or even thinking like a human! If there are true alien species, they might be more different from us than we are from insects, and they may feel emotions (if they even have emotions) that our minds cannot empathize with.

Your power as a mind is your ability to hit small targets in a large search space – either the space of possible futures (planning) or the space of possible designs (invention). Intelligence and natural selection are meta-level processes of optimization, with technology and cats on the object-level as outputs. We humans have invented science, but we have not yet redesigned the protected, hidden meta-level structure of the human brain itself. Artificial Intelligence (AI) that could rewrite its own machine code would be able to recursively optimize the optimization structure itself – a historical first! This “singularity” would therefore be far more powerful than calculations based on human progress would suggest. The graph of “optimization power in” versus “optimized product out” after a fully general AI is going to look completely different from Earth’s history so far.

Students who take computer programming courses often have the intuition that you can program a computer by telling it what to do – literally, like writing “put these letters on the screen”. But the computer isn’t going to understand you unless you program it to understand. Artificial Intelligence can only do things that follow a lawful chain of cause-and-effect starting at the programmer’s source code. It has no ghostly free will that you can just tell what to do as if it were a human being. If you are programming an AI, you aren’t giving instructions to a little ghost in the machine who looks over the instructions and decides how to follow them; you are creating the ghost, because the program is the AI. If you have a program that computes which decision the AI should make, you’re done. So, a Friendly AI cannot decide to just remove any constraints you place on it.

Imagine that human beings had evolved a built-in ability to count sheep, but had absolutely no idea how it worked. These people would be stuck on the “artificial addition” (i.e. machine calculator) problem, the way that people in our world currently are stuck on artificial intelligence. In that world, philosophers would argue that calculators only “simulated” addition rather than really adding. They would take the same silly approaches, like trying to “evolve” an artificial adder or invoking the need for special physics. All these methods dodge the crucial task of understanding what addition involves. The real history of AI teaches us that we should beware of assertions that we can’t generate from our own knowledge, and that we shouldn’t dance around confusing gaps in our knowledge. Until you know a clever idea will work, it won’t!

Instrumental values (means) are different from terminal values (ends), yet they get confused in moral arguments. Terminal values are world states that we assign some sort of positive or negative worth (utility) to. Instrumental values are links in a chain of events that lead to desired world states. Our decisions are based on expected utilities of actions, which depend on factual beliefs about the consequences (i.e. the probability of the outcome conditional on taking the action), and the utility of possible outcomes. An ideal Bayesian decision system picks an action whose expected utility is maximal. Many debates fail to distinguish between terminal and instrumental values. People may share values (e.g. that crime is bad) but disagree about the factual instrumental consequences of an action (e.g. banning guns).
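As a rough illustration of that decision rule, here is a sketch that picks the action with maximal expected utility; the actions, outcomes, probabilities and utilities are all invented:

```python
def expected_utility(action, outcome_probs, utilities):
    """Sum over outcomes of P(outcome | action) * utility(outcome)."""
    return sum(p * utilities[outcome]
               for outcome, p in outcome_probs[action].items())

# Invented factual beliefs: probability of each outcome conditional on each action.
outcome_probs = {
    "ban_guns":   {"crime_falls": 0.4, "crime_unchanged": 0.6},
    "status_quo": {"crime_falls": 0.2, "crime_unchanged": 0.8},
}
# Invented terminal values: only the outcomes carry utility, not the actions.
utilities = {"crime_falls": 10, "crime_unchanged": 0}

best = max(outcome_probs, key=lambda a: expected_utility(a, outcome_probs, utilities))
print(best)  # the action whose expected utility is maximal under THESE beliefs
```

Two people who share the same utilities but hold different conditional probabilities can disagree about the best action: a factual disagreement about instrumental consequences, not a disagreement about terminal values.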

Despite our need for closure, generalizations are leaky. The words and statements we use at the macroscopic level are inherently “leaky” because they do not precisely convey absolute and perfect information. For example, most humans have ten fingers, but if you know that someone is a human, that doesn’t guarantee that they have ten fingers. This also applies to planning and ethical advice, like instrumental values (i.e. expected utility). They often have no specification that is both compact (simple) and local. Yet this complexity of action doesn’t say much about the complexity of goals. If expected utilities are leaky and complicated, it doesn’t mean that utilities (terminal values) must be leaky and complicated as well. For example, we might say that killing anyone, even Hitler, is bad. But the positive utility of saving many lives by shooting Hitler can make the net expected utility positive.

There are a lot of things that humans care about. Therefore, the wishes we make (as if to a genie) are enormously more complicated than we would intuitively suspect. If you wish for your mother to “get out of a burning building”, there are many paths through time and many roads leading to that destination. For example, the building may explode and send her flying through the air; she could fall from a second-story window; a rescue worker may remove her body after the building finishes burning down, etc. Only you know what you actually wanted. A wish is a leaky generalization: there is no safe genie except one who shares your entire morality, especially your terminal values. Otherwise the genie may take you to an unexpected (and undesirable) destination – a safe genie just does what it is you wish for (so you might as well say, “I wish for you to do what I should wish for”).

Remember the tragedy of group selectionism: evolution doesn’t listen to clever, persuasive arguments for why it will do things the way you prefer, so don’t bother. The causal structure of natural selection or AI is so different from your brain that it is an optimistic fallacy to argue that they will necessarily produce a solution that ranks high in your preference-ordering. Yet people do this out of instinct, since their brains are black boxes, and we evolved to argue politics. We tend not to bring to the surface candidate strategies we know no person wants. But remember, different optimization processes will search for possible solutions in different orders.

Civilization often loses sight of purpose (terminal values). Various steps in a complex plan to achieve some desired goal become valued in and of themselves, due to incentive structures. For example, schools focus more on preparing for tests than instilling knowledge, because it gives bureaucracy something to measure today. Such measures are often leaky generalizations. The extreme case is Soviet shoe factories manufacturing tiny shoes to meet their production quotas. Notice when you are still doing something that has become disconnected from its original purpose.


14

A Human’s Guide to Words

This chapter discusses the basic relationship between cognition and concept formation. You may think that definitions can’t be “wrong”, but suboptimal use of categories can have negative side effects on your cognition – like teaching you the attitude of not admitting your mistakes.

In the Parable of the Dagger (adapted from Raymond Smullyan), a court jester is shown two boxes: one with the inscription “Either both inscriptions are true, or both inscriptions are false” and another with “This box contains the key”. The jester is told by the King that if he finds the key, he’ll be let free, but that the other box contains a dagger for his heart. The jester tries to reason logically that the second box must contain the key, but opens it to find a dagger. The King explained: “I merely wrote those inscriptions on two boxes, and then I put the dagger in the second one.” This tale illustrates how self-referential sentences can fool you, since their truth value cannot be empirically determined. It is a version of the liar paradox (i.e. “this sentence is a lie”). Words should connect to reality – otherwise I could ask you: Is Socrates a framster? Yes or no? Statements are only entangled with reality if the process generating them made them so.

A standard syllogism goes: “All men are mortal. Socrates is a man. Therefore Socrates is mortal.” But if you defined humans to not be mortal, would Socrates live forever? No. You can’t make reality go a different way by choosing a different definition. A logically valid syllogism is valid in all possible worlds, so it doesn’t tell us which possible world we actually live in. If you predict that Socrates would die if he drank hemlock, this is an empirical proposition – it is not true “by definition”, because there are logically possible worlds where Socrates may be immune to hemlock due to some quirk of biochemistry. Logic can help us predict, but not settle an empirical question. And if mortality is necessary to be “human” by your definition, then you can never know for certain that Socrates is a “human” until you observe him to be mortal – but you know perfectly well that he is human.

Our brains make inferences quickly and automatically – which is fine for recognizing tigers, but can cause you to make philosophical mistakes. The mere presence of words can influence thinking, sometimes misleading it. Labeling something can disguise a challengeable inductive inference you are making. For example, defining humans as mortal means you can’t classify someone as human until you observe their mortality. Or, imagine finding a barrel with a hole large enough for a hand. If the last 11 small, curved egg-shaped objects drawn have been blue, and the last 8 hard, flat cubes drawn have been red, it is a matter of induction to say that this rule will hold in the future (it is not proven). But if you call the blue eggs “bleggs” and the red cubes “rubes”, you may reach into the barrel, feel an egg shape, and think “Oh, it’s a blegg”.

Definitions are maps, not treasure. They can be intensional (referring to other words) or extensional (pointing to real-life examples). Ideally, the extension should match the intension, so you shouldn’t define “Mars” as “The God of War” and simultaneously point to a red light in the night sky and say “that’s Mars”. And you should avoid defining words using ever-more-abstract words without being able to point to an example. If someone asks “What is red?” it’s better to point to a stop sign and an apple than to say, “Red is a color, and color is a property of a thing.” But the brain applies intensions sub-deliberately, and you cannot capture in words all the details of the cognitive concept as it exists in your mind. Thus you cannot directly program the neural patterns of a concept into someone else’s brain; hence why people have arguments over definitions.

Humans have too many things in common to list; definitions like “featherless biped” are just clues that lead us to similarity clusters or properties that can distinguish similarity clusters. Our psychology doesn’t follow Aristotelian logical class definitions. If your verbal definition doesn’t capture more than a tiny fraction of the category’s shared characteristics, don’t try to reason as if it does. Otherwise you’ll be like the philosophers of Plato’s Academy who claimed that the best definition of a human was a “featherless biped”, whereupon Diogenes the Cynic is said to have exhibited a plucked chicken and declared, “Here is Plato’s Man.” The Platonists then changed their definition to “a featherless biped with broad nails.” Better would be to say, “See Diogenes over there? That’s a human, and I’m a human, and you’re a human, and that chimpanzee is not, though fairly close.”

Cognitive psychology experiments have found that people think that pigeons and robins are more “typical birds” than penguins and ducks (due to typicality effects or prototype effects), and that 98 is closer to 100 than 100 is to 98. Interestingly, a between-groups experiment showed that subjects thought a disease was more likely to spread from robins to ducks on an island, than from ducks to robins. Likewise, Americans seem to reason as if closeness is an inherent property of Kansas and distance is an inherent property of Alaska, and thus that Kansas is closer to Alaska than Alaska is to Kansas. Aristotelian categories (true-or-false membership) aren’t even good ways of modeling human psychology! Category membership is not all-or-nothing; there are more or less typical subclusters.

Objects have positions in configuration space, along all relevant dimensions (e.g. mass, volume). The position of an object’s point in this space corresponds to all the information in the real object itself. Concepts roughly describe clusters in thingspace. Categories may radiate a glow: an empirical cluster may have no set of necessary and sufficient properties, so definitions cannot exactly match them. The map is smaller and much less complicated than the territory. In practice, a verbal definition may work well enough to point out the intended cluster of similar things, in which case you shouldn’t nitpick exceptions. If you look for the empirical cluster of “has ten fingers, wears clothes, uses language”, then you’ll get enough information that the occasional nine-fingered human won’t fool you.

Questions may represent different queries on different occasions, and definitional disputes are about whether to infer a characteristic shared by most things in an empirical cluster. But a dictionary can’t move the actual points in Thingspace. Is atheism a “religion”? Regardless of what the dictionary says, the disguised query is about whether the empirical cluster of atheists use the same reasoning methods (or violence) used in religion. It is problematic when you ask whether something “is” or “is not” a category member but can’t name the question you really want answered. For example, what is a “man”, and is Barney the Baby Boy a “man”? The “correct” answer may depend considerably on whether the query you really want answered is “Would hemlock be a good thing to feed Barney?” or “Will Barney make a good husband?”

Our brains probably use a fast, cheap and scalable neural network design, with a central node for a category (e.g. “human”). Thus we associate humans with wearing clothes and being mortal, but it’s less intuitive to connect clothes with mortality. This design is visualized in the following figure:

We intuitively perceive hierarchical categories, but it’s a mistake to treat them like the only correct way to analyze the world, because other forms of statistical inference are possible (even though our brains don’t use them). For example, compare the following two networks:

A human might easily notice whether an object is a “blegg” or a “rube”, but not that red furred objects which never glow in the dark have all the other characteristics of bleggs. Other statistical algorithms (like the network on the right) work differently.

Our cognitive design means that even after we know all the surrounding nodes, the central node may still be unactivated – thus we feel as if there is a leftover question. This is what our brain’s algorithm feels like from inside. To fix your errors, you must realize that your mind’s eye is looking at your intuitions, not at reality directly. Categories are inferences implemented in a real brain, not manna fallen from the Platonic Realm. One example of a mistake is when you argue about a category membership even after screening off all questions that could possibly depend on a category-based inference. After you observe that an object is blue, egg-shaped, furred, flexible, opaque, luminescent, and palladium-containing, there’s nothing left to ask by arguing “Is it a blegg?” even if it may still feel like there’s a leftover question.

If a tree falls in a deserted forest, does it make a sound? The word “sound” can refer to auditory experiences in a brain, or acoustic vibrations in the air, and two disputants would probably agree about what is actually going on inside the forest. But even after knowing that the falling tree creates acoustic vibrations but not auditory experiences, it feels like there’s a leftover question. So they argue about definitions, even though dictionary editors are mere historians of usage, not legislators of language. Oftentimes, the definition of a word becomes politically charged only after the argument has started. Don’t allow an argument to slide into being about definitions if it isn’t what you originally wanted to argue about. It may help to keep track of some testable proposition that the argument is actually about, and to avoid using the word that’s causing problems. Remember that anything which is true “by definition” is true in all possible worlds, so observing its truth can never constrain which world you live in.

For evolutionary reasons, the complexity of communication is hidden from us (because fast transmission of thoughts is crucial in times of danger, like when there is a tiger around). So we intuitively feel like the label or word and its meaning (concept) are identical, or that the meaning is an intrinsic property of the word. Hence disputing definitions feels like disputing facts, and people want to find the correct meaning of a label like “sound”. But meaning is not a property of the word itself; there is just a label that your brain associates to a particular concept.

Language relies on coordination to work, so we have a mutual interest in using the same words for similar concepts. When each side understands what’s in the other’s mind, you’re done – in that case there’s no need to argue over the meanings of a word. If you defy common usage without reason, you make it gratuitously hard for people to understand you.  People who argue definitions often aren’t trying to communicate, but to infer an empirical or moral proposition. Yet dictionaries can’t settle such queries, as dictionary editors are not legislators of language and don’t have ultimate wisdom on substantive issues. If the common definition contains a problem (e.g. if “dolphin” is defined as a kind of fish), the dictionary will reflect the standard mistake. Do not pull out a dictionary in the middle of an empirical or moral argument, or any argument ever!

People use complex renaming to create the illusion of inference; e.g. “a ‘human’ is a ‘mortal featherless biped’”. But when you replace words with their definitions, Aristotelian syllogisms look less impressive. We would write: “All [mortal featherless bipeds] are mortal; Socrates is a [mortal featherless biped]; therefore Socrates is mortal.” That is because they are empirically unhelpful, and the labels merely conceal the premises and pretend to novelty in the conclusion. If you define a word rigidly in terms of attributes and state that something is that word, you assert that it has all those attributes; if you then go on to say it thus has one of those attributes, you are simply repeating that assertion. People think they can define a word any way they like, but a better proverb is: definitions don’t need words.

In the game Taboo by Hasbro, the player tries to get their partner to guess a word on a card without using that word or five additional words listed on the card. If you play Rationalist’s Taboo on a word and its close synonyms (i.e. thinking without using those words), you will reveal which expectations you anticipate, and reduce philosophical difficulties and disputes. It will help you describe outward observables and interior mechanisms, without using handles. It works better than trying to define your problematic terms. For example, eliminating the word “sound” can avoid the argument between two people about whether a tree falling in a forest produces a sound if nobody hears it.

The existence of a neat little word can prevent you from seeing the details of the thing you are trying to think about. What actually goes on in schools once you stop calling it “education”? Using Rationalist’s Taboo involves visualizing the details and hugging the query. Since categories can throw away information or lead to lost purpose (e.g. conflating a degree for learning), the fix is to replace the symbol with the substance. In other words: replace the word with the meaning, the label with the concept, the signifier with the signified; and dereference the pointer. Zoom in on your map.

The map is not the territory, but you can’t fold up the territory and put it in your glove compartment. Our map inevitably compresses reality, causing things that are distinct to feel like one point. Thus you may end up using one word when there are two or more different things-in-reality, and dump all the facts about them into a single undifferentiated mental bucket (e.g. using the word “consciousness” to refer to both being awake and reflectivity). You may not even know that two distinct entities exist! Expanding the map is a scientific challenge, but sometimes using words wisely can help to allocate the right number of mental buckets. A good hint is noticing a category with self-contradictory attributes.

The Japanese have a theory of personality based on blood types; people with blood type A are earnest and creative while those with type B are wild and cheerful, for example. This illustrates how we can make up stuff and see illusory patterns as soon as we name the category. Once you draw a boundary around a group, the mind starts trying to harvest similarities from it. Our brains detect patterns whether they are there or not. It’s just how our neural algorithms work – thus categorizing has consequences! One more reason not to believe that you can define a word any way you like.

We use words to infer the unobserved from the observed, and this can sneak in many connotations. These characteristics usually aren’t listed in the dictionary, implying that people argue because they care about the connotation, not the dictionary definition. For example, suppose the dictionary defines a “wiggin” as a person with green eyes and black hair. If “wiggin” also carries the connotation of someone who commits crime, but this part isn’t in the dictionary, then saying “See, I told you he’s a wiggin! Watch what he steals next…” is sneaking in the connotation.

Disputes are rarely about visible, known, and widely-believed characteristics, but usually about sneaking in connotations when the subject arguably differs from an empirical similarity cluster. For example, some declare that “atheism is a religion by definition!” But you cannot establish membership in an empirical cluster by definition; atheism does not resemble the central members of the “religion” cluster. When people insist that something is “true by definition”, it’s usually when there is other information that calls the default inference into doubt.

It may feel like a word has a meaning that you can discover by finding the right definition, but it is not so. The real challenge of categorization is figuring out which things are clustered together, and to carve reality along its natural joints. Both intensional and extensional definitions can be “wrong” if they fail to do so. For example, dolphins are not fish, because they don’t belong together with salmon and trout. Drawing the right boundary in thingspace is a scientific challenge (not the job of dictionary editors). Which concepts usefully divide the world is a question about the world. And you should be able to admit when your definition-guesses are mistaken.

A good code transmits messages quickly by reserving short words for concepts used frequently. The theoretical optimum (according to the Minimum Description Length formalization of Occam’s Razor) is to have the length of the message needed to describe something correspond to the negative logarithm of its probability, so the most frequent things get the shortest descriptions. The labels or sounds we attach to concepts aren’t arbitrary! Humans tend to use basic-level categories (e.g. “chair”) more than specific ones (e.g. “recliner”). If you use a short word for something you won’t need to describe often or a long word for something you will, the result is inefficient thinking or even a misapplication of Occam’s Razor if your mind thinks that short sentences sound “simpler”. For instance, “God did a miracle” sounds more plausible than “a supernatural universe-creating entity temporarily suspended the laws of physics.”
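A small sketch of that optimum, assuming the ideal code length for a concept is -log2 of its probability; the example frequencies are made up:

```python
import math

def ideal_code_length_bits(probability):
    """Optimal message length, in bits, for a concept used with this frequency."""
    return -math.log2(probability)

# Made-up frequencies: common concepts deserve short labels, rare ones long labels.
for concept, p in {"chair": 0.05, "recliner": 0.002, "wiggin": 1e-7}.items():
    print(f"{concept}: about {ideal_code_length_bits(p):.1f} bits")
```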

When two variables are entangled, such that P(Z, Y) ≠ P(Z)*P(Y), they have mutual information. This means that one is Bayesian evidence about the other. Words are wrong when some of the thing’s properties don’t help us do probabilistic inference about the others. The way to carve reality at its joints is to draw your boundaries around concentrations of unusually high probability density in Thingspace. So if green-eyed people are not more likely to have black hair, or vice versa, and they don’t share any other characteristics in common, why have a word for “wiggin”?
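Here is a minimal sketch of that entanglement test on a toy joint distribution (the numbers are invented): mutual information is zero exactly when the joint probabilities factor into the product of the marginals.

```python
import math

def mutual_information(joint):
    """I(Z;Y) in bits, given a dict {(z, y): probability} that sums to 1."""
    pz, py = {}, {}
    for (z, y), p in joint.items():
        pz[z] = pz.get(z, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (pz[z] * py[y]))
               for (z, y), p in joint.items() if p > 0)

# Invented numbers where green eyes and black hair are independent:
# knowing one tells you nothing about the other, so a word "wiggin" buys nothing.
joint = {("green", "black"): 0.02, ("green", "other"): 0.08,
         ("other", "black"): 0.18, ("other", "other"): 0.72}
print(mutual_information(joint))  # ~0.0 bits
```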

A concept is any rule for classifying things. The conceptspace of definable rules that include or exclude examples is exponentially larger than thingspace (the space of describable things), because the space of all possible concepts grows superexponentially in the number of attributes. Thus, learning requires inductive bias: we have to use highly regular concepts and draw simple boundaries around high probability-density concentrations. Don’t draw an unsimple boundary without any reason to do so. It would be suspicious if you defined a word to refer to all humans except black people. If you don’t present reasons to draw that particular boundary, raising it to the level of our deliberate attention is like a detective saying, “Well, I haven’t the slightest shred of support one way or the other for who could’ve murdered those orphans… but have we considered John Q. Wiffleheim of 1234 Norkle Road as a suspect?”
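To see how fast conceptspace outruns thingspace, a quick sketch: with n binary attributes there are 2^n describable things but 2^(2^n) possible rules for including or excluding them.

```python
def thingspace_size(n_attributes):
    """Number of describable things over n binary attributes."""
    return 2 ** n_attributes

def conceptspace_size(n_attributes):
    """Number of possible rules that include or exclude each thing."""
    return 2 ** thingspace_size(n_attributes)

for n in (3, 5, 10):
    print(n, thingspace_size(n), conceptspace_size(n))
# At n = 10 there are only 1,024 things but about 1.8e308 concepts,
# which is why learning needs inductive bias toward simple boundaries.
```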

In a system with three perfectly correlated variables, learning Z can render X and Y conditionally independent. Likewise, a neural network can use a central variable to screen off the others from each other to simplify the math: this is called Naïve Bayes. If you want to use categorization to make inferences about properties, you need the appropriate empirical structure, which is conditional independence given knowledge of the class (to be well-approximated by Naïve Bayes). For example, if the central class variable is “human”, then we can predict the probabilities of the thing having ten fingers, speaking, and wearing clothes – but these are all conditionally independent given the class (so just because a person is missing a finger doesn’t make them more likely to be a nudist than the next person). This Bayesian structure applies to many AI algorithms.
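A minimal Naïve Bayes sketch under that conditional-independence assumption; the class “human”, the features, and all the probabilities are invented for illustration:

```python
def naive_bayes_posterior(prior, p_feature_if_class, p_feature_if_not, observed):
    """P(class | observations), assuming features are independent given the class."""
    joint_class = prior        # running product: P(class) * prod P(feature | class)
    joint_not = 1 - prior      # running product for the complement class
    for feature, value in observed.items():
        p = p_feature_if_class[feature]
        q = p_feature_if_not[feature]
        joint_class *= p if value else (1 - p)
        joint_not *= q if value else (1 - q)
    return joint_class / (joint_class + joint_not)

# Invented probabilities for the central class "human" vs. "not human".
p_if_human = {"ten_fingers": 0.98, "speaks": 0.95, "wears_clothes": 0.99}
p_if_not   = {"ten_fingers": 0.01, "speaks": 0.01, "wears_clothes": 0.01}

# A nine-fingered speaker in clothes is still overwhelmingly likely to be human,
# and the missing finger says nothing extra about the clothes.
obs = {"ten_fingers": False, "speaks": True, "wears_clothes": True}
print(naive_bayes_posterior(0.5, p_if_human, p_if_not, obs))  # ~0.99
```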

Visualize a “triangular lightbulb”. What did you see? Can you visualize a “green dog”? Your brain can visualize images, and words are like paintbrush handles which we use to draw pictures of concepts in each other’s minds. The mental image is more complex and detailed than the word itself or the sound in the air. Words are labels that point to things/concepts. Words are not like tiny little LISP symbols in your mind. When you are handed a pointer, sooner or later you have to dereference it and actually look in the relevant memory area to see what the word points to.

Sentences may contain hidden speaker-dependent or context-dependent variables, which can make reality seem protean, shifting and unstable. This illusion may happen especially when you use a word that has different meanings in different places as though it meant the same thing on each occasion. For example, “Martin told Bob the building was on his left” – but whose “left”, Bob’s or Martin’s? The function-word “left” evaluates with a speaker-dependent variable grabbed from the surrounding context. Remember that your mind’s eye sees the map, not the territory directly.

Using words unwisely or suboptimally can adversely affect your cognition, since your mind races ahead unconsciously without your supervision. Therefore you shouldn’t think that you can define a word any way you like. In practice, saying “there’s no way my choice of X can be ‘wrong’” is nearly always an error, because you can always be wrong. Everything you do in the mind has an effect. You wouldn’t drive a car over thin ice with the accelerator floored and say “Looking at this steering wheel, I can’t see why one radial angle is special – so I can turn the steering wheel any way I like.”


Interlude: An Intuitive Explanation of Bayes’s Theorem

THIS ESSAY FROM Yudkowsky’s website gently introduces Bayesian inference. If you look up “Bayes’s Theorem” or “Bayes’s Rule”, you’ll find a random statistical equation and wonder why it’s useful or why your friends and colleagues sound so enthusiastic about it. Soon you will know.

To begin, here’s a situation that doctors often encounter: 1% of women at age forty who participate in routine screening have breast cancer. 80% of women with breast cancer will get positive mammographies, but 9.6% of women without breast cancer will also get positive mammographies. A woman in this age group had a positive mammography in a routine screening. What is the probability that she actually has breast cancer? What do you think the answer is? Surprisingly, most doctors get the same wrong answer on this problem!

Studies find that most doctors estimate the probability to be between 70% and 80%, which is incorrect. To get the correct answer, suppose that 100 out of 10,000 women (1%) have breast cancer and 9,900 do not. From the group of 100, 80 will have a positive mammography. From the group of 9,900 about 950 will also have a positive mammography. So the total number of women with positive mammographies is 950 + 80 = 1,030. Of these 1030 women, 80 will have breast cancer. As a proportion, this is 80/1030 = 0.07767 = 7.77% or approximately 7.8%. This is the correct answer.
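The same arithmetic as a short Python check, using the numbers from the problem statement:

```python
population      = 10_000
cancer          = population * 0.01            # 100 women with breast cancer
no_cancer       = population - cancer          # 9,900 without
true_positives  = cancer * 0.80                # 80 positive mammographies
false_positives = no_cancer * 0.096            # ~950 positive mammographies

posterior = true_positives / (true_positives + false_positives)
print(posterior)  # ~0.078, i.e. roughly a 7.8% chance of cancer given a positive test
```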

Figuring out the final answer requires three pieces of information: the original percentage of women with breast cancer (the prior probability); the percentage of women without breast cancer who receive false positives; and the percentage of women with breast cancer who receive true positives. The last two are known as conditional probabilities. All of this initial information is collectively known as the priors. The final answer is known as the revised probability or posterior probability, and as we have just seen, it depends in part on the prior probability.

To see why you always need all three pieces of information, imagine an alternate universe where only one woman out of a million has breast cancer. If a mammography gave a true positive result in 8 out of 10 cases and had a false positive rate of 10%, then the initial probability that a woman has breast cancer is so incredibly low that even if she gets a positive result, it’s still probably the case that she does not have breast cancer (because there are a lot more women getting false positives than true positives). Thus, the new evidence you get from the mammography does not replace the data you had at the outset, but merely slides the probability in one direction or the other from that starting point. And if someone is equally likely to get the same test result regardless of whether or not she has cancer, then the test doesn’t tell us anything, and we don’t shift our probability.
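Plugging the alternate-universe numbers into the same calculation:

```python
prior          = 1 / 1_000_000   # one woman in a million has breast cancer
true_positive  = 0.8             # detection rate: 8 out of 10 cases
false_positive = 0.1             # 10% of healthy women also test positive

posterior = (prior * true_positive) / (
    prior * true_positive + (1 - prior) * false_positive)
print(posterior)  # ~8e-06: still almost certainly no cancer after a positive test
```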

Now suppose a barrel contains many small plastic eggs, some red and some blue. If 40% of the eggs contain pearls and 60% contain nothing, we can write P(pearl) = 40%. Furthermore, if 30% of the eggs containing pearls are painted blue and 10% of the eggs containing nothing are painted blue, we can write P(blue|pearl) = 30% which means “the probability of blue given pearl”, and P(blue|~pearl) = 10% which means “the probability that an egg is painted blue, given that the egg does not contain a pearl”. What is the probability that a blue egg contains a pearl? In other words, P(pearl|blue) = ?

If 40% of eggs contain pearls and 30% of those are painted blue, then the fraction of all eggs that are blue and contain pearls is 12%. If 10% of the 60% of eggs containing nothing are painted blue, then 6% of all the eggs contain nothing and are painted blue. So 12% + 6% = 18% of eggs are blue. Therefore the chance a blue egg contains a pearl is 12/18 = 2/3 = 67%. Without Bayesian reasoning, one might respond that the probability a blue egg contains a pearl is 30% or maybe 20% (by subtracting the false positive rate). But that makes no sense in terms of the question asked.
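The same calculation, as a sketch:

```python
p_pearl        = 0.40
p_blue_pearl   = 0.30   # P(blue | pearl)
p_blue_nopearl = 0.10   # P(blue | ~pearl)

p_blue_and_pearl = p_pearl * p_blue_pearl                      # 0.12
p_blue = p_blue_and_pearl + (1 - p_pearl) * p_blue_nopearl     # 0.18
print(p_blue_and_pearl / p_blue)                               # 0.666..., i.e. 2/3
```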

It might help to visualize what is going on here. In the following graphic, the bar at the top is measuring all eggs, both blue and red; while the bottom bar is measuring only blue eggs. You can see how the prior probability and two conditional probabilities determine the posterior.

If we grab an egg from the bin and see that it is blue, and if we know that the chances it would be blue if it had a pearl are higher than the chances it would be blue if it didn’t have a pearl, then we slide our probability that the egg contains a pearl in the upward direction.

Studies find that people do better on these problems when they are phrased as natural frequencies (i.e. absolute numbers like “400 out of 1000 eggs”) rather than percentages or probabilities. A visualization in terms of natural frequencies might look like this:

Here you can see that the collection of all eggs is larger than the collection of just blue eggs, and you can see that the pearl condition takes up a larger proportion of the bottom bar than it does in the top bar, which means we are updating our probability upward.

The quantity P(A,B) is the same as P(B,A) but P(A|B) is not the same thing as P(B|A) and P(A,B) is completely different from P(A|B). The probability that a patient has breast cancer and a positive mammography is P(positive, cancer) which we can find by multiplying the fraction of patients with cancer P(cancer) and the chance that a cancer patient has a positive mammography P(positive|cancer). These three variables share two degrees of freedom among them, because if we know any two, we can deduce the third.

E.T. Jaynes suggested that credibility and evidence should be measured in decibels, which is given by 10*log10(intensity). Suppose we start with a 1% prior probability that a woman has breast cancer. This can be expressed as an odds ratio of 1:99. Now we administer three independent tests for breast cancer. The probability that a test gives a true positive divided by the probability that it gives a false positive is known as the likelihood ratio of that test. Let’s say the likelihood ratios for the three tests are 25:3, 18:1 and 7:2 respectively. We can multiply the numbers to get the odds for women with breast cancer who score positive on all three tests versus women without breast cancer who score positive on all three tests: 1*25*18*7 : 99*3*1*2 = 3150:594. To convert the odds back into probabilities, we just write 3150/(3150+594) = 84%.

Alternatively, you could just add the logarithms. We start out with 10*log10(1/99) = -20 decibels for our credibility that a woman has breast cancer, which is fairly low. But then the three test results come in, corresponding to 10*log10(25/3) = 9 decibels of evidence for the first test, 13 for the second and 5 for the third. This raises the credibility level by a total of 27 decibels, which shifts the prior credibility from -20 decibels to a posterior credibility of 7 decibels, so the odds go from 1:99 to 5:1 and the probability goes from 1% to around 83%.
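A quick sketch checking both routes, multiplying the odds directly and adding the evidence in decibels:

```python
import math

def decibels(ratio):
    """Evidence or credibility in decibels: 10 * log10(odds ratio)."""
    return 10 * math.log10(ratio)

prior_odds = (1, 99)
likelihood_ratios = [(25, 3), (18, 1), (7, 2)]

# Route 1: multiply the odds directly.
num, den = prior_odds
for a, b in likelihood_ratios:
    num, den = num * a, den * b
print(num, den, num / (num + den))   # 3150 594 ~0.84

# Route 2: add the evidence in decibels, then convert back to a probability.
total_db = decibels(prior_odds[0] / prior_odds[1])
total_db += sum(decibels(a / b) for a, b in likelihood_ratios)
odds = 10 ** (total_db / 10)
print(total_db, odds / (1 + odds))   # ~7.2 dB and ~0.84 (rounded decibels give ~83%)
```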

To find the probability that a woman with a positive mammography has breast cancer, or P(cancer|positive), we computed P(positive, cancer)/P(positive), which is

P(cancer|positive) = P(positive, cancer) / P(positive)

which is the same as

P(cancer|positive) = P(positive|cancer)*P(cancer) / [P(positive|cancer)*P(cancer) + P(positive|~cancer)*P(~cancer)]

which in its general form is known as Bayes’s Theorem or Bayes’s Rule.

When there is some phenomenon A we want to investigate, and some observation X that is evidence about A, Bayes’s Theorem tells us how we should update our probability of A given the new evidence X. The formula describes what makes something “evidence” and how much evidence it is. In cognitive science, a rational mind is a Bayesian reasoner. In statistics, models are judged by comparison to the Bayesian method. There is even the idea that science itself is a special case of Bayes’s Theorem, because experimental evidence is Bayesian evidence.

The Bayesian revolution in the sciences is currently replacing Karl Popper’s philosophy of falsificationism. If your theory makes a definite prediction that P(X|A) ≈ 1, then observing ~X strongly falsifies A, whereas observing X does not definitely confirm the theory, because there might be some other condition B such that P(X|B) ≈ 1. It is a Bayesian rule that falsification is much stronger than confirmation – but falsification is still probabilistic in nature, and Popper’s idea that there is no such thing as confirmation is incorrect.

This is Bayes’s Theorem:

P(A|X) = P(X|A)*P(A) / [P(X|A)*P(A) + P(X|~A)*P(~A)]

On the left side of the equation, X is the evidence we’re using to make inferences about A. Think of it as the degree to which X implies A. In other words, P(A|X) is the proportion of things that have property A and property X among all things that have X; e.g. the proportion of women with breast cancer and a positive mammography within the group of all women with positive mammographies. In a sense, P(A|X) really means P(A,X|X), but since you already know it has property X, specifying the extra X all the time would be redundant. When you take property X as a given, and restrict your focus to within this group, then looking for P(A|X) is the same as looking for just P(A).

The right side of Bayes’s Theorem is derived from the left side since P(A|X) = P(X,A)/P(X). The implication when we reason about causal relations is that facts cause observations, e.g. that breast cancer causes positive mammography results. So when we observe a positive mammography, we infer an increased probability of breast cancer, written P(cancer|positive) on the left side of the equation. Symmetrically, the right side of the equation describes the elementary causal steps, so they take the form P(positive|cancer) or P(positive|~cancer). So that is Bayes’s Theorem.

Rational inference (mind) on one side is bound to physical causality (reality) on the other.

Reverend Thomas Bayes welcomes you to the Bayesian Conspiracy.



