Rationality
Added 23 March 2018: See also my summary and notes here on Stuart Sutherland's "Irrationality" for an introduction to similar content.
This is a summary of Rationality: From AI to Zombies, which is a book-form compilation of Eliezer Yudkowsky's Less Wrong Sequences of 2006-2009. LessWrong.com is a "community blog devoted to refining the art of human rationality."
The PDF version can be found at this link. The HTML version is below the line.
Preface
Contents
Glossary
Rationality: Abridged
The writings of
Eliezer Yudkowsky
Summarized by
Quaerendo
Preface
THE PURPOSE OF this book is to summarize
Eliezer Yudkowsky’s Sequences in
order to provide a shorter and more accessible introduction to the foundational
ideas of the rationality community, and to serve as a common reference point. The
original Sequences were the result of
two years of daily blog posts on OvercomingBias.com
(founded by his co-blogger, the economist Robin Hanson) and later LessWrong.com from 2006 to 2009, dealing
with the topics of rational belief and decision-making and the underlying
sciences, mathematics and philosophy. In 2015, an edited and reorganized
version of the sequences was published as “Rationality: From AI to Zombies”,
which consists of six books worth of essays, plus introductions by the editor,
Rob Bensinger.
It is a must-read for those
who want to improve their thinking and fulfill their goals. RAZ serves both as
an introduction to thinking about thinking, and a resource for people
interested in digging deeper into epistemology, metacognition, and how to be less
wrong. It discusses the math of probability theory and decision theory, and the
science of cognitive and social psychology and behavioral economics, which have
exposed dozens of systematic flaws in human reasoning. It is one of the best
places to start for people who are new to Less
Wrong.
Eliezer Yudkowsky works as a decision theorist and
researcher at the Machine Intelligence Research Institute (MIRI), a nonprofit with
the mission of ensuring that smarter-than-human artificial intelligence has a
positive impact. The findings of cognitive science and ideas of naturalistic
philosophy explained in RAZ help to motivate why MIRI’s research program
exists. In addition to co-founding MIRI and Less Wrong, Yudkowsky has also
written the fanfic “Harry Potter and the Methods of Rationality”, which is
often cited as the most popular work of Harry
Potter fanfiction.
Yudkowsky was moved to write these essays by his
professional challenges in AI theory and his own philosophical mistakes, but he
has also inspired a wider community of intellectuals and lifehackers on Less Wrong, which in turn helped seed the
effective altruism movement (an effort to identify the most high-impact humanitarian
charities and causes) as well as the establishment of the Center for Applied
Rationality (CFAR), a nonprofit organization that aims to translate the science
of rationality into useful techniques for self-improvement.
With the benefit of hindsight, Yudkowsky would have
prioritized writing about how to practice the skills without knowing the
theory, and how people can do better in their everyday lives; he would have
focused more on rational action, not just rational belief; he would have better
organized the content; and he would have written more courteously. He considers not doing
these things a mistake. Nevertheless, he still believes his two years of blog
posting was better than nothing, and that glimpsing the rhythm of this valuable
way of thinking has helped a surprising number of people a surprising amount.
I hope that this summary can
extend the reach of the Sequences even further. A summary was needed because
the original series of essays was far too long for some people, especially for
newcomers to the rationality community who wanted an overview of the core
propositions but didn’t have a lot of time available. Furthermore, the existing
summaries out there tend to be too short for the unacquainted to fully
comprehend the ideas or place them in proper context. Therefore, what makes this
summary different is that it is not overly
short, and may even be redundant at times (deliberately so, because
explaining one point from different angles can help different people understand
it better). Of course, it can be useful not just for newcomers but also for
older members of the community, to refresh their memories and identify points
of departure.
In that spirit, this summary
includes all 333 blog posts from RAZ in addition to the supplemental interludes
and introductions by Rob Bensinger that are often left out of other summaries.
It also sometimes links to other essays that did not make it into RAZ, as a way
of providing additional background. Pictures and illustrations have also been
reproduced here. At the end, a partial glossary is appended. By clicking the
headline of a section (green links), you can read the full article on Less Wrong to dive in deeper, and to
read or leave comments.
Each one of the 26 Sequences
is summarized across two or three pages on average. While the result is
something like a short book in itself, it is still significantly shorter than
the 1,750 pages of Rationality: From AI
to Zombies. Hence the title: “Rationality: Abridged”.
Finally, and needless to say,
the ideas contained herein are not my own, so I don’t take credit for them.
Michael (alias
Quaerendo)
January, 2018
Note: Since this page would otherwise have been too long, I have divided it up into six separate entries.
Contents
Preface
Contents
Part I: Map and Territory
Predictably Wrong
Fake Beliefs
Noticing Confusion
Mysterious Answers
Interlude: The Simple Truth
Part II: How to Actually
Change Your Mind
Overly Convenient Excuses
Politics and Rationality
Against Rationalization
Against Doublethink
Seeing with Fresh Eyes
Death Spirals
Letting Go
Interlude: The Power of
Intelligence
Part III: The Machine in the
Ghost
The Simple Math of Evolution
Fragile Purposes
A Human’s Guide to Words
Interlude: An Intuitive
Explanation of Bayes’s Theorem
Part IV: Mere Reality
Lawful Truth
Reductionism 101
Joy in the Merely Real
Physicalism 201
Quantum Physics and Many
Worlds
Science and Rationality
Interlude: A Technical
Explanation of Technical Explanation
Part V: Mere Goodness
Fake Preferences
Value Theory
Quantified Humanism
Interlude: The Twelve Virtues
of Rationality
Part VI: Becoming Stronger
Yudkowsky’s Coming of Age
Challenging the Difficult
The Craft and the Community
Glossary
§ a priori.
A proposition that is reasonable to believe even without any experiential
evidence. A priori claims are in some
way introspectively self-evident, or justifiable using only abstract reasoning.
Pure mathematics is often claimed to be a
priori, while scientific knowledge is claimed to be a posteriori, or dependent on (sensory) experience. These two terms
shouldn’t be confused with prior and posterior probabilities.
§ affective death spiral.
A halo effect that perpetuates and exacerbates itself over time.
§ AI-Box Experiment.
A demonstration by Yudkowsky that people tend to overestimate how hard it is to
manipulate human beings, and therefore underestimate the dangers of building an
Unfriendly AI that can only interact with its environment through verbal
communication. One participant plays the role of an AI, while another plays a
human whose job it is to interact with the AI without voluntarily releasing it
from its “box”. Yudkowsky and a few other people who have role-played the AI
have succeeded in getting the human supervisor to agree to release them, which
suggests that a superhuman intelligence would have an even easier time
escaping.
§ algorithm.
A specific procedure for computing some function. A mathematical object
consisting of a finite, well-defined sequence of steps that concludes with some
output determined by its initial input. Multiple physical systems can simultaneously
instantiate the same algorithm.
§ amplitude.
A quantity in a configuration space, represented by a complex number.
Amplitudes are physical, not abstract or formal. The complex number’s modulus
squared (i.e. its absolute value multiplied by itself) yields the Born
probabilities, but we don’t know why.
§ anchoring.
The cognitive bias of relying excessively on initial information after
receiving relevant new information.
§ anthropomorphism.
The tendency to assign human qualities to non-human phenomena or objects.
§ beisutsukai.
Japanese for "Bayes user." A fictional order of high-level
rationalists, also known as the Bayesian Conspiracy.
§ bias.
(a) A cognitive bias. In Rationality:
From AI to Zombies, this is the default meaning. (b) A statistical bias.
(c) An inductive bias. (d) Colloquially: prejudice or unfairness.
§ black box.
Any process whose inner workings are mysterious or poorly understood.
§ bucket.
See “pebble and bucket.”
§ comparative advantage.
An ability to produce something at a lower cost than someone else could. This
is not the same as having an absolute
advantage; you may be a better cook in general than someone, but that person
will still have a comparative advantage over you at cooking some specific dishes.
Your cooking skills make your time more valuable. The worse cook may have a
comparative advantage at baking bread, for example, since it doesn’t cost them
much to spend a lot of time on baking, whereas you could be spending that time
creating a large number of high-quality dishes. Baking bread is more costly for
the good cook than for the bad cook because the good cook is paying a larger
opportunity cost (by giving up more valuable opportunities to be doing other
things).
§ conjunction.
A compound sentence asserting two or more distinct things, such as "A and B" or "A even though B." The conjunction
fallacy is the tendency to count some conjunctions as more probable than their
components even though they can’t be more probable (and are almost always less
probable).
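To make the inequality concrete, here is a minimal Python sketch of my own (hypothetical numbers, in the spirit of the classic "Linda" example; not from the original glossary):

    # Hypothetical probabilities for illustration only.
    p_teller = 0.05                  # P(Linda is a bank teller)
    p_feminist_given_teller = 0.2    # P(feminist | bank teller)

    # A conjunction's probability is a product of factors no greater than 1,
    # so it can never exceed the probability of either component alone.
    p_both = p_teller * p_feminist_given_teller   # P(teller AND feminist)

    assert p_both <= p_teller
    print(p_teller, p_both)   # 0.05 0.01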
§ decision theory.
(a) The mathematical study of correct decision-making in general, abstracted
from an agent's particular beliefs, goals, or capabilities. (b) A well-defined
general-purpose procedure for arriving at decisions, e.g. causal decision
theory.
§ entanglement.
(a) Causal correlation between two things. (b) In quantum physics, the mutual
dependence of two particles' states upon one another. Entanglement in this
sense occurs when a quantum amplitude distribution cannot be factorized.
§ entropy.
(a) In thermodynamics, the number of different ways a physical state may be
produced (its Boltzmann entropy). For example, a slightly shuffled card deck
has lower entropy than a fully shuffled one, because there are many more
configurations a fully shuffled deck is likely to end up in. (b) In information
theory, the expected value of the information contained in a message (its
Shannon entropy). A random variable’s Shannon entropy is how many bits of
information one would be missing (on average) if one did not know the
variable’s value. Boltzmann entropy and Shannon entropy have turned out to be
equivalent: a system’s thermodynamic disorder corresponds to the number of bits
needed to fully characterize it.
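As an added illustration of the Shannon sense (my own sketch, not from the original glossary), a few lines of Python computing the expected number of bits for some simple distributions:

    import math

    def shannon_entropy(probs):
        """Expected number of bits needed to learn the variable's value."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    print(shannon_entropy([0.5, 0.5]))   # 1.0 bit: a fair coin
    print(shannon_entropy([1.0]))        # 0.0 bits: no uncertainty at all
    print(shannon_entropy([1/8] * 8))    # 3.0 bits: 8 equally likely states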
§ epistemic.
Concerning knowledge.
§ eutopia.
Yudkowsky’s term for a utopia that’s actually nice to live in, as opposed to
one that’s unpleasant or infeasible.
§ evolution.
(a) In biology, change in a population’s heritable features. (b) In other
fields, change of any sort.
§ expected utility.
A measure of how much an agent’s goals will tend to be satisfied by some
decision, given uncertainty about the decision’s outcome. Accepting a 10% chance
of winning a million dollars will usually leave you poorer than accepting a 100%
chance of winning one dollar, because nine times out of ten, the certain
one-dollar gamble has higher actual utility. Nevertheless, we say that the 10%
shot at a million dollars is better because it has higher expected utility in all cases.
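A quick worked version of that comparison (my own sketch, assuming for simplicity that utility is linear in dollars):

    # Expected utility of each gamble, assuming utility ~ dollars won.
    risky = 0.10 * 1_000_000   # 10% chance of $1,000,000 -> expectation 100,000
    safe  = 1.00 * 1           # 100% chance of $1        -> expectation 1

    # Nine times out of ten the risky gamble pays nothing, yet its
    # expected utility is far higher, so it is the better choice ex ante.
    print(risky, safe)         # 100000.0 1.0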
§ fitness.
See “inclusive fitness.”
§ formalism.
A specific way of logically or mathematically representing something.
§ function.
A relation between inputs and outputs such that every input has exactly one
output. A mapping between two sets in which every element in the first set is
assigned a single specific element from the second.
§ graph.
In graph theory, a mathematical object consisting of simple atomic objects
(vertices, or nodes) connected by lines (edges) or arrows (arcs).
§ happy death spiral.
See “affective death spiral.”
§ hedonic.
Concerning pleasure.
§ heuristic.
An imperfect method for achieving some goal. A useful approximation. Cognitive
heuristics are innate, humanly universal brain heuristics.
§ inclusive fitness.
The degree to which a gene causes more copies of itself to exist in the next
generation. Inclusive fitness is the property propagated by natural selection.
Unlike individual fitness, which is a specific organism’s tendency to promote
more copies of its genes, inclusive fitness is held by the genes themselves.
Inclusive fitness can sometimes be increased at the expense of the individual
organism’s overall fitness.
§ instrumental value.
A goal that is only pursued in order to further some other goal.
§ Machine Intelligence Research
Institute. A small non-profit organization that works on
mathematical research related to Friendly AI. Yudkowsky co-founded MIRI in
2000, and is the senior researcher there.
§ magisterium.
Stephen Jay Gould’s term for a domain where some community or field has
authority. Gould claimed that science and religion were separate and
non-overlapping magisteria, meaning that religion has authority to answer
questions of “ultimate meaning and moral value” (but not empirical fact) and
science has authority to answer questions of empirical fact (but not meaning or
value).
§ map and territory.
A metaphor for the relationship between beliefs (or other mental states) and
the real-world things they purportedly refer to.
§ Maxwell’s equations.
In classical physics, a set of differential equations that model the behavior
of electromagnetic fields.
§ meme.
Richard Dawkins’s term for a thought that can be spread through social
networks.
§ meta level.
A domain that is more abstract or derivative than some domain it depends on (the
"object level"). A conversation can be said to operate on a meta
level when it switches from discussing a set of simple or concrete objects to
discussing higher-order or indirect features of those objects.
§ metaethics.
A theory about what it means for ethical statements to be correct, or the study
of such theories. Whereas applied ethics speaks to questions like "Is
murder wrong?" and "How can we reduce the number of murders?",
metaethics speaks to questions like "What does it mean for something to be
wrong?" and "How can we generally distinguish right from wrong?"
§ Minimum Message Length
Principle. A formalization of Occam’s Razor that judges
the probability of a hypothesis based on how long it would take to communicate
the hypothesis plus the available data. Simpler hypotheses are favored, as are
hypotheses that can be used to concisely encode the data.
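A toy Python sketch of the trade-off (the bit counts and hypotheses here are hypothetical, chosen only to illustrate the scoring; not from the original glossary):

    def description_length(hypothesis_bits, data_bits_given_hypothesis):
        """Total message length: the hypothesis plus the data encoded using it."""
        return hypothesis_bits + data_bits_given_hypothesis

    # A complex hypothesis that fits the data perfectly vs. a simple one
    # that leaves some of the data to be encoded separately.
    complex_fit = description_length(hypothesis_bits=500, data_bits_given_hypothesis=0)
    simple_fit  = description_length(hypothesis_bits=20,  data_bits_given_hypothesis=80)

    # The shorter total message wins; here the simpler hypothesis is preferred.
    print(min(("complex", complex_fit), ("simple", simple_fit), key=lambda t: t[1]))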
§ MIRI.
See “Machine Intelligence Research
Institute.”
§ money pump.
A person who is irrationally willing to accept sequences of trades that add up
to an expected loss.
§ motivated cognition.
Reasoning and perception that is driven by some goal or emotion of the reasoner
that is at odds with accuracy. Examples of this include non-evidence-based
inclinations to reject a claim (motivated
skepticism), to believe a claim (motivated
credulity), to continue evaluating an issue (motivated continuation), or to stop evaluating an issue (motivated stopping).
§ mutual information.
For two variables, the amount that knowing about one variable tells you about
the other's value. If two variables have zero mutual information, then they are
independent; knowing the value of one does nothing to reduce uncertainty about
the other.
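An added Python sketch (my own, not from the original glossary) computing mutual information from a small joint distribution:

    import math

    def mutual_information(joint):
        """I(X;Y) in bits, given a dict {(x, y): probability}."""
        px, py = {}, {}
        for (x, y), p in joint.items():
            px[x] = px.get(x, 0) + p
            py[y] = py.get(y, 0) + p
        return sum(p * math.log2(p / (px[x] * py[y]))
                   for (x, y), p in joint.items() if p > 0)

    # Two independent fair coins: zero mutual information.
    independent = {(x, y): 0.25 for x in "HT" for y in "HT"}
    # Two perfectly correlated fair coins: one full bit.
    correlated = {("H", "H"): 0.5, ("T", "T"): 0.5}

    print(mutual_information(independent))  # 0.0
    print(mutual_information(correlated))   # 1.0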
§ nanotechnology.
Technologies based on the fine-grained control of matter on a scale of
molecules, or smaller. If known physical law (or the machinery inside
biological cells) is any guide, it should be possible in the future to design
nanotechnological devices that are much faster and more powerful than any
extant machine.
§ natural selection.
The process by which heritable biological traits change in frequency due to
their effect on how much their bearers reproduce.
§ negentropy.
Negative entropy. A useful concept because it allows one to think of
thermodynamic regularity as a limited resource one can possess and make use of,
rather than as a mere absence of entropy.
§ Newcomb’s Problem.
A central problem in decision theory. Imagine an agent that can predict your
decisions in advance decides to either fill two boxes with money, or fill one
box, based on their prediction. They put $1000 in a transparent box no matter
what, and they then put $1 million in an opaque box if (and only if) they
predicted that you’d only take the opaque box. The predictor tells you about
this, and then leaves. Which do you pick? If you take both boxes, you get only
the $1000, because the predictor foresaw your choice and didn’t fill the opaque
box. On the other hand, if you only take the opaque box, you leave with $1
million. So it seems like you should take only the opaque box. However, many
people object to this strategy on the grounds that you can’t causally control
what the predictor did in the past; the predictor has already made their
decision at the time when you make yours, and regardless of whether or not they
placed the $1 million in the opaque box, you’ll be throwing away a free $1000
if you choose not to take it. This view that we should take both boxes is
prescribed by causal decision theory, which (for much the same reason)
prescribes defecting in Prisoner’s Dilemmas (even if you’re playing against a
perfect atom-by-atom copy of yourself).
§ normality.
(a) What’s commonplace. (b) What’s expected, prosaic, and unsurprising.
Categorizing things as “normal” or “weird” can cause one to conflate these two
definitions, as though something must be inherently
extraordinary or unusual just because one finds it surprising or difficult to
predict. This is an example of confusing a feature of mental maps with a
feature of the territory.
§ normativity.
A generalization of morality to include other desirable behaviors and outcomes.
If it would be prudent and healthy and generally a good idea for me to go
jogging, then there is a sense in which I should
go jogging, even if I’m not morally obliged to do so. Prescriptions about what
one ought to do are normative, even when the kind of “ought” involved isn’t
moral or interpersonal.
§ object level.
A domain that is relatively concrete -- e.g. the topic of a conversation, or
the target of an action. One might call one’s belief that murder is wrong
"object-level" to contrast it with a meta-level belief about moral
beliefs, or about the reason murder is wrong, or about something else that
pertains to murder in a relatively abstract and indirect way.
§ objective.
(a) Remaining real or true regardless of what one’s opinions or other mental
states are. (b) Conforming to generally applicable moral or epistemic norms
(e.g. fairness or truth) rather than to one’s biases or idiosyncrasies. (c)
Perceived or acted on by an agent. (d) A goal.
§ Objectivism.
A philosophy and social movement invented by Ayn Rand, known for promoting
self-interest and laissez-faire capitalism as “rational.”
§ Occam’s Razor.
The principle that, all else being equal, a simpler claim is more probable than
a relatively complicated one. Formalizations of Occam’s Razor include
Solomonoff induction and the Minimum Message Length Principle.
§ odds ratio.
A way of representing how likely two events are relative to each other. For
example, if I have no information about which day of the week it is, the odds
are 1:6 that it’s Sunday. If x:y is the odds ratio, the probability of x is x /
(x + y); so the prior probability that it’s Sunday is 1/7. Likewise, if P is my
probability and I want to convert it into an odds ratio, I can just write P :
(1 - P). For a percent probability, this becomes P : (100 - P). If my
probability of winning a race is 40%, my odds are 40:60, which can also be
written 2:3. Odds ratios are useful because they are easy to update. If I
notice that the mall is closing early, and that’s twice as likely to happen on
a Sunday as it is on a non-Sunday (a likelihood ratio of 2:1), I can simply
multiply the left and right sides of my prior that it’s Sunday (1:6) by the
evidence’s likelihood ratio (2:1) to arrive at a correct posterior probability
of 2:6, or 1:3. This means that if I guess it’s Sunday, I should expect to be
right 1/4 of the time -- 1 time for every 3 times I’m wrong. This is usually faster
to calculate than Bayes’s Rule for real-numbered probabilities.
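The same Sunday update, written out as a small Python sketch of my own:

    from fractions import Fraction

    def update_odds(prior_for, prior_against, likelihood_for, likelihood_against):
        """Multiply prior odds by the likelihood ratio to get posterior odds."""
        return prior_for * likelihood_for, prior_against * likelihood_against

    # Prior odds that it's Sunday: 1 : 6.
    # The mall closing early is twice as likely on a Sunday: likelihood ratio 2 : 1.
    post_for, post_against = update_odds(1, 6, 2, 1)
    print(post_for, post_against)                        # 2 6  (i.e. 1 : 3)

    # Converting odds x : y back to a probability x / (x + y):
    print(Fraction(post_for, post_for + post_against))   # 1/4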
§ Omega.
A hypothetical arbitrarily powerful agent used in various thought experiments.
§ ontology.
An account of the things that exist, especially one that focuses on their most
basic and general similarities. Things are “ontologically distinct” if they are
of two fundamentally different kinds.
§ optimization process.
Yudkowsky’s term for an agent or agent-like phenomenon that produces
surprisingly specific (e.g. rare or complex) physical structures. A
generalization of the idea of efficiency and effectiveness, or “intelligence.”
The formation of water molecules and planets isn’t “surprisingly specific” in
this context, because it follows in a relatively simple and direct way from
garden-variety particle physics. For similar reasons, the existence of rivers
does not seem to call for a particularly high-level or unusual explanation. On
the other hand, the existence of trees seems too complicated for us to usefully
explain it without appealing to an optimization process such as evolution.
Likewise, the arrangement of wood into a well-designed dam seems too
complicated to usefully explain without appealing to an optimization process
such as a human, or a beaver.
§ Overcoming Bias.
The blog where Yudkowsky originally wrote most of the content of Rationality: From AI to Zombies. It can
be found at www.overcomingbias.com, where it now functions as the personal
blog of Yudkowsky’s co-blogger, Robin Hanson. Most of Yudkowsky’s writing is
now hosted on the community blog Less
Wrong.
§ pebble and bucket.
An example of a system for mapping reality, analogous to memory or belief. One
picks some variable in the world, and places pebbles in the bucket when the
variable’s value (or one’s evidence for its value) changes. The point of this
illustrative example is that the mechanism is very simple, yet achieves many of
the same goals as properties that see heated philosophical debate, such as
perception, truth, knowledge, meaning, and reference.
§ phase space.
A mathematical representation of physical systems in which each axis of the
space is a degree of freedom (a property of the system that must be specified
independently) and each point is a possible state.
§ phlogiston.
A substance hypothesized in the 17th century to explain phenomena
such as fire and rust. Combustible objects were thought by late alchemists and
early chemists to contain phlogiston, which evaporated during combustion.
§ photon.
An elementary particle of light.
§ physicalism.
The belief that all mental phenomena can in principle be reduced to physical
phenomena. May also be referred to as “materialism”.
§ positive bias.
Bias toward noticing what a theory predicts you’ll see instead of noticing what
a theory predicts you won’t see.
§ possible world.
A way the world could have been. One can say “there is a possible world in which
Hitler won World War II” instead of “Hitler could have won World War II,”
making it easier to contrast the features of multiple hypothetical or
counterfactual scenarios. Not to be confused with the worlds of the many-worlds
interpretation of quantum physics or Max Tegmark's Mathematical Universe
Hypothesis, which are claimed (by their proponents) to be actual.
§ posterior probability.
An agent’s beliefs after acquiring evidence. Contrasted with its prior beliefs,
or priors.
§ prior probability.
An agent’s information -- beliefs, expectations, etc. -- before acquiring some
evidence. The agent’s beliefs after processing the evidence are its posterior
probability.
§ Prisoner’s Dilemma.
A game in which each player can choose to either "cooperate" or
"defect" with the other. The best outcome for each player is to defect
while the other cooperates, and the worst outcome is to cooperate while the
other defects. Mutual cooperation is second-best, and mutual defection is
second-worst. On conventional analyses, this means that defection is always the
correct move: it improves your reward if the other player independently
cooperates, and it lessens your loss if the other player independently defects.
This leads to the pessimistic conclusion that many real-world conflicts that
resemble Prisoner’s Dilemmas will inevitably end in mutual defection even
though both players would be better off if they could find a way to force
themselves to mutually cooperate. A minority of game theorists argue that
mutual cooperation is possible even when the players cannot coordinate,
provided that the players are both rational and both know that they are both
rational. This is because two rational players in symmetric situations should
pick the same option; so each player knows that the other player will cooperate
if they cooperate, and will defect if they defect.
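A minimal payoff table in Python (my own illustration, with hypothetical numbers in the standard ordering), showing why defection dominates on the conventional analysis:

    # payoffs[(my_move, their_move)] = my payoff; higher is better.
    payoffs = {
        ("cooperate", "cooperate"): 3,   # second-best: mutual cooperation
        ("cooperate", "defect"):    0,   # worst: I'm exploited
        ("defect",    "cooperate"): 5,   # best: I exploit the cooperator
        ("defect",    "defect"):    1,   # second-worst: mutual defection
    }

    # Whatever the other player does, defecting pays strictly more here,
    # which is the conventional argument for mutual defection.
    for their_move in ("cooperate", "defect"):
        assert payoffs[("defect", their_move)] > payoffs[("cooperate", their_move)]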
§ probability.
A number representing how likely a statement is to be true. Bayesians favor
using the mathematics of probability to describe and prescribe subjective
states of belief, whereas frequentists generally favor restricting probability
to objective frequencies of events.
§ probability theory.
The branch of mathematics concerned with defining statistical truths and
quantifying uncertainty.
§ problem of induction.
In philosophy, the question of how we can justifiably assert that the future
will resemble the past (scientific induction) without relying on evidence that
presupposes that very fact.
§ proposition.
Something that is either true or false. Commands, requests, questions, cheers,
and excessively vague or ambiguous assertions are not propositions in this
strict sense. Some philosophers identify propositions with sets of possible
worlds. They think of propositions like “snow is white” not as particular
patterns of ink in books, but rather as the thing held in common by all
logically consistent scenarios featuring white snow. This is one way of
abstracting away from how sentences are worded, what language they are in etc.
and merely discussing what makes the sentences true or false.
§ quantum mechanics.
The branch of physics that studies subatomic phenomena and their nonclassical
implications for larger structures, and the mathematical formalisms used by
physicists to predict such phenomena. Although the predictive value of such
formalisms is extraordinarily well-established experimentally, physicists
continue to debate how to incorporate gravitation into quantum mechanics,
whether there are more fundamental patterns underlying quantum phenomena, and
why the formalisms require a “Born rule” to relate the deterministic evolution
of the wavefunction under Schrödinger’s equation to observed experimental
outcomes. Related to the last question is a controversy in philosophy of
physics over the physical significance of quantum-mechanical concepts like
“wavefunction”, for instance whether this mathematical structure in some sense
exists objectively, or whether it is merely a convenience for calculation.
§ quark.
An elementary particle of matter.
§ rationalist.
A person interested in rationality, especially one who is attempting to use new
insights from psychology and the formal sciences to become more rational.
§ rationality.
The property of employing useful cognitive procedures. Making systematically
good decisions (instrumental rationality)
based on systematically accurate beliefs (epistemic
rationality).
§ recursion.
A sequence of similar actions that each build on the result of the previous
action.
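A standard illustration (my own, not from RAZ), where each call builds on the result of the call below it:

    def factorial(n):
        """Recursive definition: n! = n * (n-1)!, with 0! = 1 as the base case."""
        if n == 0:
            return 1
        return n * factorial(n - 1)

    print(factorial(5))  # 120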
§ reduction.
An explanation of a phenomenon in terms of its origin or parts, especially one
that allows you to re-describe the phenomenon without appeal to your previous
conception of it.
§ reductionism.
(a) The practice of scientifically reducing complex phenomena to simpler
underpinnings. (b) The belief that such reductions are generally possible.
§ representativeness heuristic.
A cognitive heuristic where one judges the probability of an event based on how
well it matches some mental prototype or stereotype.
§ Schrödinger equation.
A fairly simple partial differential equation that defines how quantum
wavefunctions evolve over time. This equation is deterministic. It is not known
why the Born Rule, which converts the wavefunction into an experimental
prediction, is probabilistic; though there have been many attempts to make
headway on that question.
§ scope insensitivity.
A cognitive bias where large changes in an important value have little or no
effect on one's behavior.
§ screening off.
Making something informationally irrelevant. A piece of evidence A screens off
a piece of evidence B from a hypothesis C if, once you know about A, learning
about B doesn’t affect the probability of C.
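An added Python sketch (hypothetical probabilities of my own choosing) of a chain B -> A -> C, where A screens off B from C:

    from itertools import product

    # A hypothetical chain B -> A -> C, where C depends only on A.
    p_b = {True: 0.3, False: 0.7}
    p_a_given_b = {True: 0.9, False: 0.2}
    p_c_given_a = {True: 0.8, False: 0.1}

    joint = {}
    for b, a, c in product([True, False], repeat=3):
        pa = p_a_given_b[b] if a else 1 - p_a_given_b[b]
        pc = p_c_given_a[a] if c else 1 - p_c_given_a[a]
        joint[(b, a, c)] = p_b[b] * pa * pc

    def p(c, a, b=None):
        """P(C=c | A=a[, B=b]) read off the joint table."""
        keep = {k: v for k, v in joint.items()
                if k[1] == a and (b is None or k[0] == b)}
        return sum(v for k, v in keep.items() if k[2] == c) / sum(keep.values())

    # Once A is known, also learning B doesn't change the probability of C:
    print(p(True, a=True), p(True, a=True, b=True), p(True, a=True, b=False))  # all ~0.8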
§ search tree.
A graph with a root node that branches into child nodes, which can then either
terminate or branch once more. The tree data structure is used to locate
values. In chess, for example, each node can represent a move, which branches
into the other player’s possible responses, and searching the tree is intended
to locate winning sequences of moves.
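A minimal sketch of such a search in Python (the tree and goal here are hypothetical, my own illustration):

    def search(tree, goal):
        """Depth-first search over a nested dict {node: {child: {...}}}."""
        for node, children in tree.items():
            if node == goal:
                return [node]
            path = search(children, goal)
            if path:
                return [node] + path
        return None

    game = {"start": {"a": {"win": {}}, "b": {"lose": {}}}}
    print(search(game, "win"))  # ['start', 'a', 'win']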
§ separate magisteria.
See “magisterium.”
§ sequences.
Yudkowsky’s name for short series of thematically linked blog posts or essays.
§ Singularity.
One of several claims about a radical future increase in technological
advancement. Kurzweil’s “accelerating change” singularity claims that there is
a general, unavoidable tendency for technology to improve faster and faster.
Vinge’s “event horizon” singularity claims that intelligences will develop that
are too advanced for humans to model. Yudkowsky’s “intelligence explosion”
singularity claims that self-improving AI will improve its own ability to
self-improve, thereby rapidly achieving superintelligence. These claims are
often confused with one another.
§ Solomonoff induction.
An attempted definition of optimal (albeit computationally infeasible)
reasoning. A combination of Bayesian updating with a simplicity prior that
assigns less probability to percept-generating programs the longer they are.
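A toy sketch of the simplicity prior (hypothetical program lengths of my own; the real definition is uncomputable, and this ignores the normalization over all possible programs):

    # Each candidate program that could generate the observed percepts is
    # weighted by 2^(-length in bits): shorter programs get more prior mass.
    program_lengths = {"short program": 10, "longer program": 20}

    weights = {name: 2 ** -length for name, length in program_lengths.items()}
    total = sum(weights.values())
    priors = {name: w / total for name, w in weights.items()}

    print(priors)  # the 10-bit program gets 2^10 = 1024 times the mass of the 20-bit one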
§ subjective.
(a) Conscious, experiential. (b) Dependent on the particular distinguishing
features of agents, for example mental states. (c) Playing favorites,
disregarding others’ knowledge or preferences, or otherwise violating some norm
as a result of personal biases. Importantly, something can be subjective in
sense (a) or (b) without being subjective in sense (c). For example, one’s ice
cream preferences and childhood memories are “subjective” in a perfectly
healthy sense.
§ superintelligence.
An agent much smarter (more intellectually resourceful, rational, etc.) than
present-day humans. This can be a purely hypothetical agent or it can be a
predicted future technology.
§ System 1.
The brain’s fast, automatic, emotional, and intuitive judgments.
§ System 2.
The brain’s slow, deliberative, reflective, and intellectual judgments.
§ Taboo.
A game by Hasbro where you try to get teammates to guess what word you have in
mind while avoiding conventional ways of communicating it. Yudkowsky uses this
as an analogy for the rationalist skill of linking words to the concrete
evidence you use to decide when to apply them. Ideally, one should know what
one is saying well enough to paraphrase the message in several different ways,
and to replace abstract generalizations with concrete observations.
§ terminal value.
A goal that is pursued for its own sake, and not just to further some other
goal.
§ territory.
See “map and territory.”
§ theorem.
A statement that has been mathematically or logically proven.
§ Traditional Rationality.
Yudkowsky’s term for the scientific norms and conventions espoused by thinkers
like Richard Feynman, Thomas Kuhn, Karl Popper, Carl Sagan, Martin Gardner, and
Charles S. Peirce. Yudkowsky contrasts this with the ideas of rationality in
contemporary mathematics and cognitive science.
§ truth-value.
A proposition’s truth or falsity. True statements and false statements have
truth-values, but questions, imperatives, and strings of gibberish do not.
“Value” is meant here in a mathematical sense, not a moral one.
§ Unfriendly AI.
A hypothetical smarter-than-human artificial intelligence that causes a global
catastrophe by pursuing a goal without regard for humanity’s well-being.
Yudkowsky predicts that superintelligent AI will be “Unfriendly” by default,
unless a special effort goes into researching how to give AI stable, known,
humane goals. Unfriendliness doesn’t imply malice, anger, or other human
characteristics. A completely impersonal optimization process can be
“Unfriendly” even if its only goal is to make paperclips. This is because even
a goal as innocent as “maximize the expected number of paperclips” could
motivate an AI to treat humans as competitors for physical resources, or as
threats to the AI’s aspirations.
§ updating.
Revising one’s beliefs in light of new evidence. If the updating is
epistemically rational (i.e. it follows the rules of probability
theory) then it counts as Bayesian inference.
§ utilitarianism.
An ethical theory asserting that one should act in a way that causes the most net
benefit to people. Standard utilitarianism argues that acts can be justified even
if they are morally counterintuitive and harmful, provided that the benefit
outweighs the harm.
§ utility.
The amount some outcome satisfies a set of goals, as defined by a utility
function.
§ utility function.
A function that ranks outcomes by how well they satisfy some set of goals.
§ utility maximizer.
An agent that always picks actions with better outcomes over ones with worse
outcomes (relative to its utility function). An expected utility maximizer is more realistic, given that real-world
agents must deal with ignorance and uncertainty. It picks the actions that are
likeliest to maximize its utility, given the available evidence. An expected
utility maximizer’s decisions would sometimes be suboptimal in hindsight or
from an omniscient perspective; but they won’t be foreseeably inferior to any
alternative decision given the agent’s available evidence. Humans can sometimes
be usefully modeled as expected utility maximizers with a consistent utility
function, but this is at best an approximation, since humans are not perfectly
rational.
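An added Python sketch (my own, with hypothetical actions and outcome distributions) of an expected utility maximizer:

    def expected_utility(outcomes):
        """outcomes: list of (probability, utility) pairs for one action."""
        return sum(p * u for p, u in outcomes)

    # Hypothetical actions with uncertain outcomes.
    actions = {
        "safe bet":  [(1.0, 1)],
        "risky bet": [(0.1, 100), (0.9, 0)],
    }

    # Pick the action whose expected utility is highest given the available evidence,
    # even though it may turn out worse in hindsight.
    best = max(actions, key=lambda a: expected_utility(actions[a]))
    print(best)  # 'risky bet' (expected utility 10 vs 1)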
§ utilon.
Yudkowsky’s name for a unit of utility, i.e. something that satisfies a goal.
The term is deliberately vague, to permit discussion of desired and desirable
things without relying on imperfect proxies such as monetary value and
self-reported happiness.
§ wavefunction.
A complex-valued function used in quantum mechanics to explain and predict the
wave-like behavior of physical systems at small scales. Realists about the
wavefunction treat it as a good characterization of the way the world really
is, more fundamental than earlier atomic models. Anti-realists disagree,
although they grant that the wavefunction is a useful tool by virtue of its
mathematical relationship to observed properties of particles (the Born Rule).
§ winning.
Yudkowsky’s term for getting what you want. The result of instrumental
rationality.
§ zombie.
In philosophy, a perfect atom-by-atom replica of a human that lacks a human’s
subjective awareness. Zombies behave exactly like humans, but they lack
consciousness. Some philosophers argue that the idea of zombies is coherent --
that zombies, although not real, are at least logically possible. They conclude
from this that facts about first-person consciousness are logically independent
of physical facts, and that our world breaks down into both physical and
nonphysical components. Most philosophers reject the idea that zombies are
logically possible, though the topic continues to be actively debated.
❖