The penultimate book of rationality
This is part 5 of 6 in my series of summaries. See this post for an introduction.
Part V: Mere Goodness
(with an interlude: The Twelve Virtues of Rationality)

This part asks how we can justify, revise,
and naturalize our values and desires. What makes something valuable in the moral, aesthetic or
prudential sense? How can we understand our goals without compromising our
efforts to actually achieve them? When should we trust our impulses?
Value theory is the study of what people
care about: goals, tastes, pleasures, pains, fears, and ambitions. This
includes morality, but also everyday things like art, food, sex, friendship,
and going to the movies with Sam. It includes not only things we already care about, but also things we wish we would care about if we were
wiser and better people. How we act is not always how we wish we’d act. Our
preferences and desires may conflict with each other, or we may lack the will
or insight needed to act the way we’d like to. Hence, humans are not
instrumentally rational.
Philosophers, psychologists and
politicians disagree wildly about what we want and about what we ought to want;
they even disagree about what it means to
“ought” to want something. There is a gulf between how we think we wish we’d act, and how we actually wish we’d act. Humanity’s history with moral theory does
not bode well for you if you’re trying to come up with a reliable and
pragmatically useful specification of your goals – whether you want to build
functional institutions, decide which charity to donate to, or design safe
autonomous adaptive AI.
But a deeper understanding of your values
should make you better at actually fulfilling them. And it helps to ask, what
outcomes are actually valuable? Yudkowsky calls his attempt to figure out what
our ideal vision of the future would look like “Fun Theory”. Questions of fun
theory intersect with questions of transhumanism, like hedonism vs. eudaimonia,
cryonics, mind uploading, and large-scale space colonization. Yet creating a
valuable future takes work, so we also have to ask: how shall we get there?
It’s not just about the destination, but also the journey.
Chapter 21: Fake Preferences

This chapter collects a sequence of blog posts on failed attempts at theories of human value.
It is a misconception that “rational”
preferences must reduce to selfish hedonism (i.e. caring strictly about
personally experienced pleasure). Our values can’t be reduced to happiness
alone, because happiness is not the only important consequence of our decisions.
We treat people as ends in themselves, and care not only about subjective
states, but also objective accomplishments. We have preferences for how the
world is, not just for how we think the world is. An ideal Bayesian
agent can have a utility function that ranges over anything (including art,
science, love, freedom, etc.), not just internal subjective experiences.
Some people think they ought to be
selfish, and they find ways to rationalize altruistic behavior (e.g.
contributing to society) in selfish terms. But they probably started with the
bottom line of espousing this idea, rather than truly trying to be selfish.
Thus, they aren’t really selfish; if they were, there would be a lot more
productive things to do with their time than espouse the philosophy of
selfishness. Instead, these people do whatever it is they actually want,
including being nice to others, and then find some sort of self-interest
rationalization for it.
Many people provide fake reasons for
their own moral reasoning, such as following divine command ethics. Religious
fundamentalists who fear the prospect of God withdrawing the threat to punish
murder thereby reveal an internal moral compass: in other words, a revulsion to murder
which is independent of whether God punishes murder or not (a revulsion they do not feel about, say, eating pork). They steer their decision system by that moral compass. If you
fear that humans would lack morality without an external threat, it means you like morality, rather than just being
forced to abide by it. Other examples of fake morality include selfish-ists who
provide altruistic justifications for selfishness, and altruists who provide
selfish justifications for altruism. If you want to know how moral someone is,
don’t look at their reasons; look at what they actually do.
The only process that reliably
regenerates all the local decisions you would make given your morality, is your
morality. For very tough problems, like Friendly AI, people propose solutions
very fast: they suggest an “amazingly simple utility function”. These folks
should learn from the tragedy of group selectionism, our thousand shards of
desire, the hidden complexity of planning, and the importance of keeping to
your original purpose. Yudkowsky had to write the sequences on evolution, fragile
purposes, and affective death spirals so he could make this point.
Many people seem fascinated with trying to compress morality down to a single principle. They think they
know the amazingly simple utility
function that is all we need to program into an artificial superintelligence and
then everything will turn out fine. But a utility function doesn’t have to be
simple. We try to steer the future according to our terminal values, which are
complicated, and leaving one value
out of an AI’s utility function could lead to existential catastrophe. Yet the
“One Great Moral Principle” makes people go off in a happy death spiral.
The detached
lever fallacy is thinking that you can pry a control lever (e.g. words,
loving parents) from a human context and use it to program an AI, without
knowing how the underlying machinery works. The lever may be visible and
variable, but there is a lot of constant machinery hidden beneath the words
(and rationalist’s Taboo is one way to make a step towards exposing it). If the
AI doesn’t have the internal machinery, then prying the lever off the ship’s
bridge won’t do anything. People (and AIs) aren’t blank slates. For example,
even when human culture genuinely contains a whole bunch of complexity, it is
still acquired as a conditional genetic response (which isn’t always “mimic the
input” – which is why you can’t raise children to be perfectly selfless). In
general, the world is deeper by far than it appears.
Aristotle talked about telos, the “final cause” (or purpose) of
events. But there are three fallacies of teleological reasoning. The first is backward
causality: making the future a literal cause of the past. The second is
anthropomorphism: to attribute goal-directed behavior to things that are not
goal-directed, because you used your own brain as a black box to predict
external events. And the third is teleological capture: to treat telos as if it were an inherent property
of an object or system, thus committing the mind projection fallacy. These
contribute to fake reductions of mental properties.
It can feel as though you understand how to build an AI when
really, you’re still making all your predictions based on empathy. AI designs
made of empathic human parts (detached levers) are only dreams; you need
genuine reduction, not mysterious black boxes, to arrive at a non-mental causal
model that explains where intelligence comes from and what it will do. Your AI
design will not work until you figure out a way to reduce the mental to the
non-mental. Otherwise it can exist in the imagination but not translate into
transistors. And this is why AI is so hard.
AIs don’t form a natural class like
humans do; they are part of the vast space of minds-in-general, which is part
of the space of optimization processes. So don’t generalize over mind design space (e.g. by claiming that
all or no minds do X)!
There is an incredibly wide
range of possibilities. Somewhere in mind design space is at least one possible
mind with almost any kind of logically consistent property you care to imagine.
Having one word for “AI” is like having a word for everything which isn’t a
duck. This is also a reason why predicting what the future will be like after
the Singularity is pretty hard.
Chapter 22: Value Theory

This chapter is about obstacles to developing a new theory of value, and some intuitively desirable features of such a theory.
If every belief must be justified, and those
justifications in turn must be justified, how do you terminate the infinite
recursion of justification? Ultimately, when you reflect on how your mind
operates and consider questions like “why does Occam’s Razor work?” and “why do
I expect the future to be like the past?” (the problem of induction), you have no other option but to use your own
mind. Always question everything and play to win, but note that you can only do
so using your current intelligence and rationality. Reflect on your mind’s
degree of trustworthiness, using your current mind – it’s not like you can use
anything else. There is no way to jump to an ideal state of pure emptiness and
evaluate these claims without using your existing mind (just as there is no
argument that you can explain to a rock). This reflective loop has a meta-level
character and is not circular logic.
Yudkowsky’s ideas on reflection differ from
those of other philosophers; his method is massively influenced by his work in
AI. If you think induction works, then you should use it in order to use your
maximum power. It’s okay to use induction and Occam’s Razor to think about and
inspect induction and Occam’s Razor, because that is how a self-modifying AI
would improve its source code with the aim of winning. Reflective coherence is
just a side-effect. (And since this loop of justifications goes through the
meta-level, it’s not the same thing as circular logic on the object level.)
There is no irreducible central ghost inside a
mind who looks over the neurons or source code. Minds are causal physical
processes, so it is theoretically possible to specify a mind which draws any
conclusion in response to any argument. For every system processing
information, there is a system with inverted output which makes the opposite
conclusion. This applies to moral conclusions, and regardless of the
intelligence of the system. For every argument, however convincing it may seem
to us, there exists at least one possible mind that doesn’t buy it. There is no
argument that will convince every possible mind. There are no universally
compelling arguments because compulsion is a property of minds, not a property of arguments! Thus there
is no “one true universal morality that must persuade any AI”.
When confronted with a difficult question,
don’t try to point backwards to a misunderstood black box. When you find a
black box, look inside it, and resist mysterious answers or postulating another
black box inside it (which is called passing the recursive buck). For example, you don’t need “meta free will”
to explain why your brain chooses to follow empathy over fear. The buck stops here: you did not choose for
heroic duty to overpower selfishness – that overpowering was the choice. Similarly, in Lewis Carroll’s story “What the
Tortoise Said to Achilles”, when Achilles says “If you have [(A&B)→Z] and you also have
(A&B) then surely you have Z,” and the Tortoise replies “Oh! You mean
<{(A&B)&[(A&B)→Z]}→Z> don’t you?”, the Tortoise is passing the
recursive buck. To stop the buck immediately, the Tortoise needs the dynamic of adding Z to the belief pool.
A mind, in order to be a mind,
needs some sort of dynamic rules of inference or action. Your mind is created
“already in motion”, because it dynamically implements modus ponens by translating it into action. But you can’t give
dynamic processes to a static thing like a rock; thus it won’t add Y to its
belief pool when it already has beliefs X and (X→Y). There is no computer program so persuasive
that you can run it on a rock. A “dynamic” is something that happens inside a
cognitive system over time; so if you try to write a dynamic on a piece of
paper, the paper will just lie there.
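Here is a minimal sketch of that idea (the propositions and the representation are made up purely for illustration): a static set of beliefs, like the rock, never changes, while a mind “already in motion” has a dynamic that actually adds Z to the belief pool.

```python
# Beliefs are either atoms like "A" or implications ("A", "Z") meaning A -> Z.
# The pool already contains A and the implication A -> Z.
belief_pool = {"A", ("A", "Z")}

def modus_ponens_dynamic(beliefs):
    """One step of the dynamic: add Y whenever X and (X -> Y) are both held."""
    new = {conclusion for premise, conclusion in
           (b for b in beliefs if isinstance(b, tuple))
           if premise in beliefs}
    return beliefs | new

# A rock is just the static belief pool: nothing ever happens to it.
rock = set(belief_pool)
assert "Z" not in rock

# A mind "already in motion" runs the dynamic and actually adds Z.
mind = modus_ponens_dynamic(belief_pool)
assert "Z" in mind
print(mind)
```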
This is a parable about anthropomorphic
optimism. There is an imaginary society with arbitrary, alien values. The species has a passion for sorting pebbles into
correct heaps. The Pebblesorting People want to create correct heaps, but don’t
know what makes a heap “correct” (although their heaps tend to be prime-number
sized). Their disagreements have caused wars. They now want to build a
self-improving AI and they assume that its intelligence would inevitably result
in it creating reasonable heap sizes. Why wouldn’t smarter minds equal smarter
heaps?
It is possible to talk about “sexiness” as a
property of an observer and a subject: Sexiness
as a 2-place function depends on both the admirer and the entity being
admired. This function takes two parameters and gives a result, for example:
Sexiness: (Admirer, Entity) → [0, ∞).
However, it is also possible to talk about “sexiness” as a property of the subject alone: Sexiness as a 1-place function depends only on the entity, provided each observer uses their own process to determine how sexy someone is (so different speakers may be using different 1-place words!). This function could look like:
Sexiness: Entity → [0, ∞).
In this case, Fred and Bloogah
are different admirers and Fred::Sexiness is different from Bloogah::Sexiness. Through the
mathematical technique of currying, a two-parameter
function (the uncurried form) is
equivalent to a one-parameter function returning another function (the curried form). We can curry x=plus(2, 3) and write y=plus(2); x=y(3) instead, thus turning
a 2-place function plus into a
1-place function y. Failing to keep
track of this distinction can cause you trouble. Treating a function of two
arguments as though it were a function of one argument is an instance of the
Mind Projection Fallacy or Variable Question Fallacy. This is relevant to
confusing words like “objective”, “subjective” and “arbitrary”.
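To make the currying point concrete, here is a minimal sketch in Python (the scoring rule inside the 2-place function is invented for illustration; only the Fred/Bloogah naming and the plus example come from the summary):

```python
def sexiness_2place(admirer, entity):
    """Uncurried form: Sexiness(Admirer, Entity) -> [0, inf)."""
    # Hypothetical scoring rule, purely for illustration.
    return float(len(set(admirer) & set(entity)))

def curry(f, first_arg):
    """Turn a 2-place function into a 1-place function with the first
    argument fixed (the curried form)."""
    def curried(second_arg):
        return f(first_arg, second_arg)
    return curried

# Each admirer gets their own 1-place function: Fred::Sexiness, Bloogah::Sexiness.
fred_sexiness = curry(sexiness_2place, "fred")
bloogah_sexiness = curry(sexiness_2place, "bloogah")

# The two 1-place functions can disagree about the same entity,
# even though each speaker uses the single word "sexiness".
print(fred_sexiness("fyodor"), bloogah_sexiness("fyodor"))

# The summary's plus example: y = plus(2); x = y(3).
def plus(a, b):
    return a + b

y = curry(plus, 2)
x = y(3)
assert x == plus(2, 3) == 5
```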
To those who say “nothing is real”, Yudkowsky
replies, “that’s great, but how does the nothing work?” Imagine being persuaded
that there was no morality and that everything was permissible. Suppose that
nothing is moral and that all utilities equal zero. What would you do? Would
you still tip cabdrivers? Would you still eat the same kinds of foods, and
would you still drag a child off the train tracks? When you cannot be innocent
or praiseworthy, what will you choose anyway? And if some “external objective
morality” (some great stone tablet upon which Morality was written) tells you to
kill people, what then? Why should you even listen, rather than just do what
you’d have wished the stone tablet
had said in the best case scenario? If you could write the stone tablet
yourself, what would it say? Maybe you should just do that.
There is opposition to rationality from people
who think it drains meaning from the universe. But if life seems painful,
reductionism may not be the real source of your problem; if living in a world
of mere particles seems too unbearable, maybe your life isn’t exciting enough
right now? A lot of existential
angst comes from people
trying to solve the wrong problem; so check and see if you’re not just feeling
unhappy because of something else going on in your life. Don’t blame the
universe for being a “meaningless dance of particles”, but try to solve
problems like unsatisfying relationships, poor health, boredom, and so on. It
is a general phenomenon that poor metaethics (e.g. misunderstanding where your
morality comes from) messes people up, because they end up joining a cult or
conclude that love is a delusion or that real morality arises only from
selfishness, etc.
It is easier to agree on
morality (e.g. “killing is wrong”) than metaethics
– what it means for something to be
bad or what makes it bad. Yet to make
philosophical progress, you might need to shift metaethics at some point in
your life. To shift your metaethics, you need a line of retreat to hold your
will in place; for example, by taking responsibility and deciding to save lives
anyway. If your current source of moral authority stops telling you to save
lives, you could just drag the child off the train tracks anyway – and it was
your own choice to follow this morality in the first place.
Causation is distinct from justification; so
while our emotions and morals ultimately come from evolution, that’s no reason
to accept or reject them. Just go on
morally reflecting. You can’t jump out of the system, for even rebelling
against your evolved goal
system must take place within it! When we rebel against our own nature, we act
in accordance with our own nature. There is no ghost of perfect emptiness by
which you can judge your brain from outside your brain. So can you trust your
moral intuitions at all, when they
are the product of mere evolution?
You do know quite a bit about
morality… not perfect or reliable information, but you have some place to
start. Otherwise you’d have a much harder time thinking about morality than you
do. If you know nothing about morality, how could you recognize it when you
discover it? Discarding all your
intuitions is like discarding your brain. There must be a starting point baked
into the concept of morality – a moral frame of reference. And we shouldn’t
just be content with ignorance about moral facts. Why not accept that, ceteris paribus, joy is preferable to
sorrow?
When you think about what you could do, your brain generates a forward-chaining
search tree of states that are primitively reachable from your current state. Should-ness flows backwards in time and
collides with the reachability algorithm to produce a plan that your brain
labels “right”. This makes rightness a derived property of
an abstract computation capable of being true (like counterfactuals),
subjunctively objective (like mathematics), and subjectively objective (like
probability). So morality is a 1-place function that quotes a huge number of
values and arguments.
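This picture of planning can be sketched as code, though the toy state space and goal test below are purely illustrative assumptions and not anything from the original essays: forward-chaining search explores what is primitively reachable, a goal test stands in for “should-ness”, and the first path where the two meet is the plan labeled “right”.

```python
from collections import deque

# Hypothetical toy world: each state maps actions to primitively reachable next states.
actions = {
    "child on tracks": {"walk away": "child still on tracks",
                        "pull child off": "child safe"},
    "child still on tracks": {},
    "child safe": {},
}

def is_good_outcome(state):
    """Stand-in for 'should-ness': which end states we actually value."""
    return state == "child safe"

def plan(start):
    """Forward-chaining breadth-first search; returns the first action
    sequence that reaches a state the goal test approves of."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if is_good_outcome(state):
            return path            # the plan the brain labels "right"
        for action, next_state in actions.get(state, {}).items():
            if next_state not in seen:
                seen.add(next_state)
                frontier.append((next_state, path + [action]))
    return None

print(plan("child on tracks"))     # ['pull child off']
```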
Imagine a calculator which,
instead of computing the answer to the question, computes “what do I output?”
This calculator could output anything and
be correct. If it were an AI trying to maximize expected utility, it would have
a motive to modify the programmer to want something easily obtainable. Analogously
in moral philosophy, if what is “right” is a mere preference, then anything
that anyone wants is “right”. But we ourselves are like a calculator that
computes “what is 2+3?” except we don’t know our own fixed question, which is
extremely complicated and includes a thousand shards of desire. So saying “I should X” means that X answers the question,
“what will save my people, and how can we all have more fun, and how can we get
more control over our own lives, and what’s the funniest joke we can tell,
etc.?”
Some mental categories we draw are tricky
because they are primarily drawn in such a way that whether or not something
fits into that category is important information to our utility function (so
they are relevant to decision-making). Unnatural categories are formed not just by empirical structures,
but also our values themselves. Moral arguments are about redrawing the
boundaries, e.g. of a concept like “personhood”. This issue is partly why
technology creates new moral dilemmas, and why teaching morality to a computer
is so hard.
The fallacy of magical categories is thinking you can
train an AI to solve a real problem by using shallow machine-learning data that
reflect an unnatural category (i.e. your preferences). We underestimate the
complexity of our own unnatural categories. The problem of Friendly AI is one
of communication: transmitting category boundaries like “good” that can’t be
fully delineated in any training data you can give the AI during its childhood.
Generally, there are no patches or band-aids for Friendly AI!
The standard visualization of the Prisoner’s Dilemma is fake, because
neurologically intact human beings are not truly and entirely selfish – we have
evolved impulses for honor, fairness, sympathy and so on. We don’t really,
truly and entirely prefer the outcome where we defect (D) and our confederate
cooperates (C) such that we go free and he spends three years in prison. The
standard payoff matrix for player 1 and player 2, in the usual telling, looks like this: if both cooperate, each serves one year; if both defect, each serves two years; and if one defects while the other cooperates, the defector goes free while the cooperator serves three years. So in the True Prisoner’s Dilemma, (D,C)>(C,C)>(D,D) must actually hold from our perspective. A situation where mutual cooperation doesn’t intuitively seem right is as follows: imagine that four billion human beings
have a fatal disease that can only be cured by substance S, which can only be
obtained by working with an alien paperclip maximizer from another dimension,
who wants to use substance S for paperclips.
We would feel indignant at
trading off billions of human lives for a few paperclips, yet still prefer
(C,C) to (D,D).
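As a minimal sketch (using the conventional prison terms above, which are an assumption rather than something the summary specifies), the ordering can be checked mechanically:

```python
# Standard Prisoner's Dilemma, payoffs as years in prison (lower is better).
# Key: (player1_move, player2_move) -> (player1_years, player2_years)
payoffs = {
    ("C", "C"): (1, 1),  # both cooperate
    ("C", "D"): (3, 0),  # player 1 is the sucker
    ("D", "C"): (0, 3),  # player 1 defects and goes free
    ("D", "D"): (2, 2),  # both defect
}

def player1_years(p1, p2):
    return payoffs[(p1, p2)][0]

# The dilemma's ordering for player 1: (D,C) > (C,C) > (D,D) > (C,D),
# where "better" means fewer years in prison.
assert player1_years("D", "C") < player1_years("C", "C") \
       < player1_years("D", "D") < player1_years("C", "D")
print("Ordering (D,C) > (C,C) > (D,D) > (C,D) holds for player 1.")
```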
Empathy is how we humans predict each other’s
minds. Mirror neurons allow us to empathize with other humans, and sympathy
reinforces behavior that helps relatives and allies (which is probably why we
evolved it). And the most formidable among humankind is not the strongest
or the smartest, but the one who can call upon the most friends. But not all
optimization processes are sympathetic and worth being friends with (e.g. AI,
some aliens, natural selection). An AI could model our minds directly, but we should
not expect it to have the human form of sympathy.
Life should not always be made easier for the
same reason that video games should not always be made easier. You need games
that are fun to play, not just fun to
win. As humans we don’t wish to be
passive blobs experiencing pleasure; we value ongoing processes like solving
challenging problems – thus subjective experience is not enough: it matters
that the journey and destination are real. We prefer real goals to fake ones;
goals that are good to pursue and not just good to satisfy. So think in terms
of eliminating low-quality work to make way for high-quality work, rather than
eliminating all challenge. There must be the true effort, true experience, and
true victory.
Pain can be much more intense than pleasure,
and it seems that we prefer empathizing with hurting, sad, and even dead
characters. Stories require conflict; otherwise we wouldn’t want to read them.
Do we want the post-Singularity world to contain no stories worth telling?
Perhaps Eutopia is not a complete
absence of pain, but the absence of systematic pointless sorrow, plus more
intense happiness and stronger minds not overbalanced by pain. Or perhaps we
should eliminate pain entirely and rewire our neurons not to feel like a life
of endless success is as painful as reading a story in which nothing ever goes
wrong. Yudkowsky prefers the former approach, but doesn’t know if it can last
in the long run.
Goodness is inseparable from an optimization
criterion or utility function; there is no ghostly essence of goodness apart
from values like life, freedom, truth, happiness, beauty, altruism, excitement,
humor, challenge, and many others. So don’t go looking for some pure essence of
goodness distinct from, you know, actual good. If we cannot take joy in things that
are merely
good, our lives shall be
empty indeed. Moreover, any future not shaped
by a goal system with detailed reliable inheritance from human morals and
metamorals will contain almost nothing of worth.
An interesting universe (that
would be incomprehensible to us today) is what the future looks like
if things go right. But there are a
lot of things that we value such that if we did everything else right when
building an AI, but left out that one thing, the future would end up looking
flat, pointless or empty. Merely human values do not emerge in all possible
minds, and they will not appear from nowhere to rebuke and revoke the utility
function of an expected paperclip maximizer. Value is fragile, as there is more
than one way to shatter all value, especially when we let go of the steering
wheel of the future. A universe devoid of human morals is dull moral noise,
because value is physically represented in our brains alone.
Why would evolution produce morally motivated creatures?
If evolution is stupid and cruel, how come we experience love and beauty and
hope? Because long ago, evolution coughed up minds with values and preferences,
and they were adaptive in the context of the hunter-gatherer savanna… these
things need a lawful causal story that begins somehow. A complex pattern like
love has to be explained by a cause that is not already that complex pattern.
So once upon a time, there were lovers created by something that did not love.
And because we love, so will our children’s children. Our complicated values
are the gift that we give to tomorrow.
Chapter 23: Quantified Humanism

This chapter is on the tricky question of how we should apply value theory to our ordinary moral intuitions and decision-making. The cash value of a normative theory is how well it translates into normative practice. When should we trust our vague, naïve snap impulses, and when should we ditch them for a more explicit, informed, sophisticated and systematic model?
The human brain can’t represent large
quantities. Scope insensitivity (aka scope neglect) manifests itself when the
willingness to pay for an altruistic action is affected very little by the
scope of the action, even as exponentially more lives are at stake. For
example, studies find that an environmental measure (cleaning up polluted
lakes) that will save 200,000 birds doesn’t conjure anywhere near a hundred
times the emotional impact and willingness-to-pay of a measure that would save
2,000 birds. This may be caused by “evaluation by prototype” or “purchase of
moral satisfaction”.
Saving one human life might feel just as good
as saving the whole world, but we ought to maximize and save the world when
given the choice. Just because you have saved one life does not mean your duty
to save lives has been satisfied. Beyond the warm glow of moral satisfaction, there is a huge difference:
however valuable a life is, the whole world is billions of times as valuable.
Choosing to save one life when you could have saved more is damning yourself as
thoroughly as any murderer. To be an effective altruist you need to process
those unexciting inky zeroes on paper.
The Allais
Paradox shows that experimental subjects, when making decisions, violate
the axiom of independence of decision theory (which says that if you prefer X over Y, then you should prefer a P chance of X over a P chance of Y, whatever happens with the remaining 1−P of the probability). You are offered two pairs of gambles, each with a choice: the first pair gives you $24K with certainty (1A) or a 33/34 chance of winning $27K and a 1/34 chance of nothing (1B); the second pair gives you a 34% chance of winning $24K and a 66% chance of nothing (2A) or a 33% chance of winning $27K and a 67% chance of nothing (2B). Most people prefer 1A over 1B but also prefer 2B over 2A. This is inconsistent, because 2A is simply a 34% chance of playing 1A, and 2B is simply a 34% chance of playing 1B. This inconsistency turns you into a
money pump, because dynamic preference reversals can be exploited by trading
bets for a price. Beware departing the Bayesian way!
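A minimal sketch of the arithmetic (treating utility as simply proportional to dollars, which is an assumption the paradox itself does not require): the “2” gambles are just the “1” gambles scaled by a common 34% factor, so a coherent preference between 1A and 1B has to carry over to 2A and 2B.

```python
from fractions import Fraction as F

# Gambles from the Allais Paradox, as (probability, payoff) pairs.
gamble_1A = [(F(1), 24_000)]
gamble_1B = [(F(33, 34), 27_000), (F(1, 34), 0)]
gamble_2A = [(F(34, 100), 24_000), (F(66, 100), 0)]
gamble_2B = [(F(33, 100), 27_000), (F(67, 100), 0)]

def expected_value(gamble):
    return sum(p * payoff for p, payoff in gamble)

for name, g in [("1A", gamble_1A), ("1B", gamble_1B),
                ("2A", gamble_2A), ("2B", gamble_2B)]:
    print(name, float(expected_value(g)))

# 2A and 2B are the 1-gambles scaled down by the same 34% factor, so an agent
# that coherently prefers 1A to 1B must also prefer 2A to 2B.
assert expected_value(gamble_2A) == F(34, 100) * expected_value(gamble_1A)
assert expected_value(gamble_2B) == F(34, 100) * expected_value(gamble_1B)
```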
When hearing about the Allais Paradox, some
people defend their silly, incoherent preferences. Their intuition tells them
that the certainty of $24K should
count for something. But the elegance of Bayesian decision theory is not
pointless. You are a flawed piece of machinery. If you want to achieve a goal,
warm fuzzies and intuitions can lead you astray – so do the math and maximize
expected utility. Preference reversals and the certainty effect (treating a shift from 0.99 to 1 as special) make
you run around in circles. It’s not steering the future; it’s a mockery of
optimization.
Would you rather have a googolplex people get
dust specks in their eyes, which irritates their eyes a little for a fraction
of a second (the “least bad” bad thing), or one person horribly tortured for 50
years? Many people would choose dust
specks, or refuse to answer.
They’d feel indignant at anyone who suggests torture. To merely multiply
utilities seems too cold-blooded. But altruism isn’t the warm fuzzy feeling you
get from being altruistic; it’s about helping others whatever the means. When
you face a difficult and uncomfortable choice, you have to grit your teeth and choose
anyway.
Suppose a disease or war is
killing people, and you only have enough resources to implement one of the
following policies: option 1 saves 400 lives with certainty; option 2 saves 500
lives with 90% probability and no lives with 10% probability. Many people would
refuse to gamble with human lives. But if the options were (1) that 100 people
die with certainty, or (2) that there is a 90% chance that nobody dies and 10%
chance that 500 people die, the majority would choose the second option. The exact same gamble, framed differently,
causes circular preferences. People prefer certainty, and they refuse to trade
off sacred values (e.g. life) for unsacred ones. But our moral preferences
shouldn’t be circular. If policy A is better than B and B is better than C,
then A should really be better than C. If you actually want to help people,
it’s not about your feelings; just shut up and multiply!
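A minimal sketch of why the two framings are the same gamble (assuming 500 lives are at stake in total, as in the example):

```python
TOTAL = 500  # assumed total number of people at risk

# Framing in terms of lives saved: (probability, lives saved).
saved_option1 = [(1.0, 400)]
saved_option2 = [(0.9, 500), (0.1, 0)]

# The same two policies reframed in terms of deaths.
deaths_option1 = [(p, TOTAL - saved) for p, saved in saved_option1]
deaths_option2 = [(p, TOTAL - saved) for p, saved in saved_option2]

def expectation(lottery):
    return sum(p * x for p, x in lottery)

print("Expected lives saved:", expectation(saved_option1), expectation(saved_option2))
print("Expected deaths:     ", expectation(deaths_option1), expectation(deaths_option2))
# Option 2 saves 450 lives in expectation versus 400, under either framing;
# only the wording changed, so preferences should not flip.
```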
Why should we accept that e.g. two harmful
events are worse than one? Like anything, utilitarianism
is built on intuitions; but ignoring utilitarianism leads to moral
inconsistencies (like circular preferences). Our intuitions, the underlying
cognitive tricks that we use to build our thoughts, are an indispensable part
of our cognition; but many of those intuitions are incoherent or undesirable
upon reflection. Our intuitions about how
to get to the destination are messed up due to things like scope
insensitivity, so when lives are at stake we should care more about the
destination than the journey. If you try to “renormalize” your intuition, you
wind up with what is essentially utilitarianism. So if you want to save as many
lives as possible, shut up and multiply.
As Lord Acton said, “power tends to corrupt,
and absolute power corrupts absolutely.” The possibility of unchecked power is
tempting, and can
corrupt even those who aren’t
evil mutants; perhaps for the evolutionary reason that exploiting power helped
our ancestors leave more offspring (ceteris
paribus). Hence, young revolutionaries with good intentions will execute
the adaptation for seeking power by arguing that the old power-structure is
corrupt and should be overthrown (repeating the cycle). So can you trust your
own seeming moral superiority?
The human brain is
untrustworthy hardware, so we reflectively endorse deontological principles
like “the ends don’t justify the means”. We have bizarre rules like “for the
good of the tribe, do not cheat to seize power even for the good of the tribe”,
to prevent people’s corrupted hardware from computing the idea that it would be
righteous and altruistic to seize power for themselves. Thus we can live in
peace. But superintelligent Friendly AI would follow classical decision theory,
so it may well (and rightly) kill one person to save five. In the Trolley Problem thought experiment, we
humans cannot occupy the epistemic state the thought experiment stipulates, but
a hypothetical AI might. If the ends don’t justify the means, what does? In
fact, deontological prohibitions are just consequentialist reasoning at one
meta-level up.
We evolved to feel ethically inhibited from
violating the group code, because ethically cautious individuals reproduced
more. The rare instances of punishment outweighed the value of e.g. stealing;
likewise for hurting others “for the greater good”. You can get
caught even if you think you
can get away with it. Some people justify lying by appealing to expected utility;
but maintaining lies is complex and when they collapse, you can get hurt. It is
easier to recover from honest mistakes when you have ethical constraints.
Ethics can sometimes save you from
yourself.
Ethical
injunctions (simple exceptionless
principles) against self-deception or murdering innocents are useful for
protecting you from your own cleverness when you’re tempted to do what seems
like the right thing, because you are likely to be mistaken that these are
right. When you lie, you take a black
swan bet that can blow up and undo all the good it ever did. Knowing when you are better off with a false
belief is the epistemically difficult part – it’s like trying to know which
lottery ticket will win. Do not be impressed with people who say in grave
tones, “it is rational to do unethical thing X because it will have benefit Y”.
They will abandon their ethics at the very first opportunity.
A rationalist acquires his or her powers from
having something to protect. Rationalists must value something more than
“rationality”; something more than your own life and pride must be at stake,
and you must have a desperate need to succeed. It takes something really scary
to cause you to override your intuitions with math. Only then will you grow beyond
your teachings and master the Way. In the dilemma where you can save 400 lives
with certainty or save 500 lives with
90% probability (and no lives with 10% probability), your daughter is one of
the 500 but you don’t know which one. Will you refuse to gamble with human
lives and choose the comforting feeling of certainty because you think it is
“rational” to choose certainty, or will you shut up and multiply to notice that
you have an 80% chance of saving your daughter in the first case, and 90% in
the second? Hopefully you care more about your daughter’s life than your pride
as a rationalist.
There is a chance, however remote, that novel
physics experiments could destroy the Earth. For example, some people fear that
the Large Hadron Collider (LHC) might create a black hole. They have even
assigned a 1 in 1,000,000 probability of the LHC destroying the world. Banning
novel physics experiments may be infeasible, but supposing it could be done,
would it be wise given the risk? Should we ban
physics? But these made-up
numbers give an undue air of authority. Just debate the general rule of banning
physics experiments without assigning specific probabilities.
Eliezer Yudkowsky does not
always advocate using probabilities. Don’t make up verbal probabilities using a
non-numerical procedure. Your brain is not so well-calibrated that you can pull
a number out of thin air, call it a “probability”, perform deliberate Bayesian
updates, and arrive at an accurate map. You may do better with your nonverbal
gut feeling, for example when trying to catch a ball. You have evolved
abilities to reason in the presence of uncertainty. The laws of probability
theory govern, but often we can’t compute them; don’t bother trying if it won’t
help.
A very famous problem in decision theory is Newcomb’s Problem. A superintelligence
called Omega (who can perfectly predict you)
offers you a transparent box (Box A) containing $1,000 and an opaque box (Box B) that contains $1 million only if Omega predicted you would take Box B alone (otherwise it contains nothing). Do you take both boxes, or only Box B? Causal decision theory (the traditionally dominant view) says to take
both, because the money is already in the boxes, and Omega has already left. But
a rationalist should win the $1
million regardless of the algorithm needed, so you should take only Box B. It
may appear to causal decision theorists as if the “rational” move of two-boxing
is consistently punished, but that’s the wrong attitude to take. Rationalists
should not envy others’ choices; the winning choice is the reasonable choice. Rationalists should win. If your
particular ritual of cognition consistently fails to yield good results, change
the ritual – don’t change the definition of winning. The winning Way is
currently Bayescraft, but if it ever turns out that Bayes systematically fails
relative to a superior alternative, then it has to go out the window. Pay
attention to the money!
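As a minimal sketch of why one-boxing “wins” (the 99% accuracy figure is an illustrative assumption, since the summary’s Omega is a perfect predictor, and the calculation conditions Box B’s contents on your choice, which is precisely the step causal decision theorists reject):

```python
def expected_payoff(strategy, accuracy=0.99):
    """Expected dollars for a given strategy, where `accuracy` is the
    probability that Omega correctly predicted that strategy."""
    if strategy == "one-box":
        # Box B is full iff Omega predicted one-boxing.
        return accuracy * 1_000_000 + (1 - accuracy) * 0
    elif strategy == "two-box":
        # You always get Box A's $1,000; Box B is full only if Omega erred.
        return 1_000 + (1 - accuracy) * 1_000_000
    raise ValueError(strategy)

for s in ("one-box", "two-box"):
    print(s, expected_payoff(s))
# With a sufficiently accurate predictor, one-boxing wins:
# 990,000 versus 11,000 at 99% accuracy (and $1M vs. $1,000 for a perfect Omega).
```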
Interlude: The Twelve Virtues of Rationality
The first virtue is curiosity: a burning itch to relinquish
your ignorance, and to know. The second virtue is relinquishment: not flinching from experiences that might destroy
your beliefs, but evaluating beliefs before arriving at emotions. P.C. Hodgell
said, “That which can be destroyed by the truth should be.” The third virtue is
lightness: being quick to follow the
evidence wherever it leads and surrendering to the truth. Let the winds of
evidence blow you about as though you are a leaf, with no direction of your
own.
The fourth virtue is evenness: not being selective about
which arguments you inspect for flaws or attending only to favorable evidence. Use
reason, not rationalization. The fifth virtue is argument: to strive for exact honesty rather than thinking it
“fair” to balance yourself evenly between positions. Do not avoid arguing when
truth is not handed out in equal portions before a debate. The sixth virtue is empiricism: asking not which beliefs to
profess, but which experiences to anticipate. Base your knowledge in the roots
of observation and the fruits of prediction.
The seventh virtue is simplicity: keeping additional
specifications to a minimum. Each additional detail is another chance for the
belief to be wrong or the plan to fail. The eighth virtue is humility: to take specific actions in
anticipation of your own errors in your beliefs and plans. You are fallible,
but do not boast of modesty. The ninth virtue is perfectionism: always seeking to do better so that you do not halt
before taking your first steps, and settling for no less than the answer that
is perfectly right. The more errors you correct in yourself, the more you
notice, and the more you can advance.
The tenth virtue is precision: to shift your beliefs by
exactly the right amount in response to each piece of evidence, in accordance
with probability theory. Narrower statements expose themselves to a stricter
test, and are more useful. The eleventh virtue is scholarship: studying many sciences and absorbing their power as
your own, especially those that impinge upon rationality. Consume fields like
decision theory, psychology, and more. Before these eleven virtues is the nameless virtue: to above all carry your
map through to reflecting the territory, not by asking whether it is “the Way”
to do this or that, or by believing the words of the Great Teacher, but by
asking whether the sky is blue or green. Every step of your reasoning must cut
through to the correct answer. If you fail to achieve a correct answer, it is
futile to protest that you acted with propriety.
These then are twelve virtues
of rationality: curiosity, relinquishment, lightness, evenness, argument,
empiricism, simplicity, humility, perfectionism, precision, scholarship, and
the void.