Optimized Hope, Or, The Wisdom of the Algorithms

Book Review:

Brian Christian & Tom Griffiths, "Algorithms to Live By: The Computer Science of Human Decisions", William Collins, 2016.


After finishing "Decisive" by Chip and Dan Heath, which prescribes a somewhat algorithmic approach to making decisions using the WRAP process, I started reading "Algorithms to Live By". This is a book about what humans can learn from computers, specifically from computer algorithms -- i.e. sequences of steps that are used to solve problems. The problems faced by computer scientists, like how to allocate processing power, when to switch between different tasks, how to use memory resources, and when to collect more data, have parallels in the everyday problems of human thought and action and interaction. As such, the computer scientist's solutions to these problems (whether optimal or merely approximate) can give us some wisdom. Sometimes that wisdom is bittersweet, for instance:
"Life is full of problems that are, quite simply, hard. And the mistakes made by people often say more about the intrinsic difficulties of the problem than about the fallibility of human brains."
As is noted in the introduction of the book, Brian Christian is a writer who studied computer science and philosophy, while Tom Griffiths is a professor of psychology and statistics. Together, they set out to gain "not only a set of concrete takeaways for the problems around us, not only a new way to see the elegant structures behind even the hairiest human dilemmas, not only a recognition of the travails of humans and computers as deeply conjoined, but something even more profound: a new vocabulary for the world around us, and a chance to learn something truly new about ourselves."

What follows is my summary of each chapter, with some concluding evaluative remarks.

***

Chapter 1 is about optimal stopping problems. When searching for a love relationship, or a secretary to hire, or a place to park your car, or even someone to sell your house to, you face an all-important question: how much time should you spend searching before you commit or quit? How many options should you consider before choosing? Well, computer science can tell you the answer: look at the first 37% (or anywhere between 35% and 40% as an approximation) of candidates in the pool without choosing, and then pick the first one that is better than the best you've seen so far. This way, you won't stop too early or too late (on average), because you maximize the probability of getting the best option -- which happens to also be 37% in this case (which may seem low, but it's a lot better than picking randomly). Fortunately, if you want to hire someone based on an objective criterion, your chance of hiring the best applicant is higher (around 58%) if you immediately accept an applicant whenever they score above a certain percentile (say the 95th percentile if it is the first applicant you look at, and the 50th percentile if it's the second-to-last applicant you look at). Of course, these algorithms change when you allow the possibility to recall previous applicants, or the possibility that you are rejected. In reality, people tend to stop early, which may reflect the endogenous time cost of searching itself.
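The 37% figure is easy to check empirically. Below is a minimal simulation sketch of the classic secretary problem (my own illustration, not code from the book), assuming candidates arrive in random order and can be strictly ranked:

```python
import random

def simulate_secretary(n=100, look_fraction=0.37, trials=100_000):
    """Estimate how often the look-then-leap rule lands the single best candidate."""
    cutoff = int(n * look_fraction)
    successes = 0
    for _ in range(trials):
        candidates = random.sample(range(n), n)   # distinct scores in random order
        best_seen = max(candidates[:cutoff])      # look phase: observe, never choose
        chosen = next((c for c in candidates[cutoff:] if c > best_seen),
                      candidates[-1])             # leap: first to beat the benchmark
        successes += (chosen == n - 1)            # did we get the very best one?
    return successes / trials

print(simulate_secretary())   # hovers around 0.37, matching the theory
```

Moving the cutoff noticeably above or below 37% lowers the success rate, which is a quick way to convince yourself that the rule really is tuned correctly for this setup.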

The second chapter is connected to the first in that it also deals with the way we spend our time: when should we try out new things (explore) and when should we savor our familiar favorites (exploit)? This trade-off is relevant to choosing a restaurant, making new friends or spending time with old friends, listening to music, and even concerns drug companies (how much money to invest in R&D to discover new drugs) and big tech firms (using A/B testing to maximize click-through rates). The optimal balance between exploring and exploiting depends on the interval over which you are choosing, and in general, the value of exploration goes down over time while the value of exploitation goes up over time. In formal terms, this is known as the multi-armed bandit problem, which is named after the dilemma of picking which slot machine to play in a casino. The solution is given by the Gittins index, which is the guaranteed payout rate you would accept in place of playing the slot machine again (assuming geometric discounting and no switching costs). The Gittins index tells us that testing a completely unknown machine can be more valuable than pulling the handle on a machine that you know pays out seven times out of ten! Alternatively, the Upper Confidence Bound algorithm can be used to minimize your amount of regret, by choosing the option for which the top of the confidence interval is highest (i.e. the highest value that the option could reasonably have based on the information available so far). Both the Gittins index and Upper Confidence Bound are higher than the expected value initially, but converge to the expected value in the long run. One implication of these algorithms is that doctors can improve outcomes for patients who are participating in a clinical trial as they gain information about which option is better (so-called "adaptive design trials"). Another implication is that exploration becomes even more important when the world is constantly changing -- unless you are near the end of your life, in which case it makes sense to focus on your most meaningful existing social connections and savor your favorite restaurants.
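To make the Upper Confidence Bound idea concrete, here is a small bandit sketch. The book stays informal, so I am assuming the standard UCB1 formula from the bandit literature (observed mean plus an exploration bonus that shrinks as an arm accumulates plays):

```python
import math
import random

def ucb1(payout_probs, pulls=10_000):
    """Play a multi-armed bandit by always choosing the arm whose
    confidence interval has the highest upper bound."""
    n_arms = len(payout_probs)
    counts = [0] * n_arms     # times each arm has been played
    wins = [0.0] * n_arms     # total reward collected per arm
    for t in range(1, pulls + 1):
        if t <= n_arms:
            arm = t - 1       # play every arm once to initialize
        else:
            arm = max(range(n_arms),
                      key=lambda a: wins[a] / counts[a]                         # observed mean
                                    + math.sqrt(2 * math.log(t) / counts[a]))  # optimism bonus
        if random.random() < payout_probs[arm]:
            wins[arm] += 1.0
        counts[arm] += 1
    return counts

print(ucb1([0.3, 0.5, 0.7]))  # pulls concentrate on the 0.7 arm over time
```

The bonus term embodies "optimism in the face of uncertainty": a rarely-tried arm gets the benefit of the doubt, exactly as the chapter describes.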

In Chapter 3, Christian and Griffiths deal with sorting algorithms, used in computers to organize information and used by search engines like Google to show you the most relevant web pages. In our daily lives, we use sorting to arrange our bookshelves in alphabetical order, to sort our socks when doing laundry, and to rank sports teams. The key challenge of sorting is that the more items you have to sort, the more time it takes per item (i.e. the opposite of economies of scale). In computer science, Big-O notation is used to express an algorithm's worst-case running time as a function of the size of the problem. For example, O(1) is "constant time", meaning that it always takes the same amount of time. O(n) is "linear time", meaning that when n doubles, the running time also doubles. O(n^2) is "quadratic time", meaning that when n doubles, the time required quadruples. There is also O(2^n) or "exponential time", where each additional item doubles the amount of work, and O(n!) or "factorial time", which is too horrifying to talk about. There are a number of common sorting algorithms:
  • Bubble Sort, which runs in quadratic time: you run down a list and for each pair that is out-of-order, you switch them around, and then continue scanning the list. Repeat until the whole list is in order.
  • Insertion Sort, which is slightly faster than Bubble Sort but still in quadratic time: choose one item at random, then pick another one and insert it either before or after the first one. Repeat for each additional item, finding the right spot to insert it.
  • Mergesort, which runs in "linearithmic time" O(n log n), which lies in between linear and quadratic time: you divide the list to be sorted, sort each half, and then merge the sorted halves back together (a short sketch follows below). This is essentially the best way to achieve a fully ordered set: no comparison-based sort can beat O(n log n) in general.
  • Bucket Sort, which can run in linear time if you don't need a fully ordered set or item-to-item comparisons: you just group your list of n items into m buckets, where each bucket is simply a chunk of unsorted data.
Basically, if you want to alphabetize your bookshelf, avoid Bubble Sort; instead, get some friends to help you do a Mergesort, or combine Bucket Sort with Insertion Sort. However, you probably shouldn't bother sorting your bookshelf in the first place! This is because there is a trade-off between sorting and searching. The purpose of sorting is to save you time later on when you need to search for something, but the time you spend organizing your books is rarely worth it. Likewise in competitive sports, the purpose is largely to enjoy the games -- not necessarily to find the champion in the minimum number of games (in which case you'd use Single Elimination tournaments, which run in linear time, rather than the quadratic time of the Round-Robin format). Incidentally, if you really wanted to be confident in the ranking of teams, you could use an algorithm called Comparison Counting Sort, which is very robust to noise (statistical chance) because it compares each team to every other team and ranks them according to win-loss record, similar to a Round-Robin tournament. Chapter 3 ends with a discussion of social hierarchies, which are established by violence in many species of animals, but which humans can establish by using cardinal numbers that serve as benchmarks (e.g. income or age) -- thus obviating the need for head-to-head match-ups. Not only is this approach more computationally scalable, it may also help reduce war between nations.
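For the curious, Mergesort is short enough to sketch in full (my illustration; real libraries use tuned variants of the same idea):

```python
def merge_sort(items):
    """O(n log n): split the list, sort each half, merge the sorted halves."""
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):   # repeatedly take the smaller head
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]      # one half may have leftovers

print(merge_sort(["Woolf", "Borges", "Tolstoy", "Austen"]))
```

The recursion is about log n levels deep and each level does O(n) work in the merges, which is where the "linearithmic" running time comes from.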

The next chapter is about caching, which refers to the management of memory in order to save time and money. What information should you keep, and how should you arrange it? Computers have caches in their processors, hard drives, operating systems, web browsers and servers. When a cache gets full, the computer needs to evict an item -- and ideally it would evict the data it won't need again for the longest time. Since the future is unknown, the best practical proxy is the Least Recently Used (LRU) principle: evict whatever has gone unused the longest. This is much more efficient than overwriting old data at random, or evicting whatever has been sitting in the cache the longest. It works due to temporal locality -- i.e. the fact that in the near future, you are most likely to use a piece of information you've used recently. The implications of LRU include advice for libraries: put newly acquired books at the back and the most recently returned (read) ones in the front lobby. Additionally, when cleaning your house, LRU is useful for deciding what to keep and what to throw away. When organizing your files and documents (whether physical or on your computer), put the most recently accessed files on the left-hand side of the filing cabinet or on top of the pile. You can also exploit geographic proximity (just like web servers and staging warehouses do, in order to quickly cater to areas where a product is in high demand); for example, put things closest to where they are typically used. Finally, don't be dismayed if you find it more difficult to remember things as you get older. Human memory works like a computer cache: it puts the most used facts at the front of your mind, and the more things you store in your memory, the longer it inevitably takes to retrieve the exact piece of information that you need.
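The LRU eviction policy itself takes only a few lines to implement. A minimal sketch, assuming Python's OrderedDict (which remembers insertion order) as the bookkeeping structure:

```python
from collections import OrderedDict

class LRUCache:
    """A fixed-size cache that evicts the least recently used item when full."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None                     # cache miss
        self.items.move_to_end(key)         # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")                              # "a" is now most recently used
cache.put("c", 3)                           # evicts "b", not "a"
print(list(cache.items))                    # ['a', 'c']
```

The filing-cabinet advice above is exactly this data structure made physical: touching a file moves it to the "most recent" end, and whatever drifts to the other end is the first candidate for the attic.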

Chapter 5 deals with scheduling, or what to do and in what order. This is relevant not only for construction projects but for tasks as mundane as doing your laundry: if you have more than one load of clothes to wash, what is the best approach? Selmer Johnson found that you should start with the load that washes the quickest, and save the load that dries the quickest for last. But what if you're dealing with scheduling for a single machine... such as yourself? Well, if you're going to do all of your tasks, the total time it takes will be the same regardless of the ordering. If you have deadlines, you can minimize your maximum lateness using the Earliest Due Date rule: start with the task due soonest. Similarly, if you want to minimize the number of rotten foods in your fridge, Moore's Algorithm says to start by eating the food with the earliest "best before" date, and throw away the biggest (slowest to consume) item whenever you won't have enough time to eat the next item. In terms of getting things done, the algorithm called Shortest Processing Time suggests that you always do the quickest task you can. However, since some tasks are more important (or effortful) than others, you can multiply each task's completion time by its weight, and try to minimize the weighted sum. The weighted version of Shortest Processing Time works by dividing the weight of each task by its duration and starting with the one with the highest "importance-per-unit-time". Interestingly, a similar strategy is used by animals (who pursue foods in order of highest caloric-energy-to-time-required ratio) and in debt reduction (pay off the debts with the highest interest rates first). Perhaps even more interestingly, Christian and Griffiths argue that procrastination is a problem of solving for the wrong metric, rather than laziness -- procrastinators are simply acting to reduce the number of outstanding tasks on their minds (hence focusing on the trivial). Scheduling gets more complicated when a certain task cannot be started until another is finished ("precedence constraints"), but it gets easier when you are able to interrupt one task and switch to another ("preemption"). Both Earliest Due Date and Shortest Processing Time can still be used: just switch to a new task if it is due sooner than the current one or can be finished faster. However, switching tasks (known in computer science as a context switch) incurs overhead costs (e.g. delays and errors), which can reduce the amount of actual work you do. In the extreme scenario this is called thrashing, and it feels like paralysis. To combat this, you can set a minimum amount of time to work on any particular task (e.g. "pomodoros"), or use the idea of interrupt coalescing, where you set aside a block of time to handle all potential interruptions at once.
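The weighted Shortest Processing Time rule amounts to a single sort. A small illustration with made-up tasks and weights (my example, not the book's):

```python
def weighted_spt(tasks):
    """Order tasks by importance-per-unit-time (weight / duration), highest first;
    this minimizes the sum of weighted completion times on a single machine."""
    return sorted(tasks, key=lambda t: t["weight"] / t["hours"], reverse=True)

tasks = [
    {"name": "write report", "hours": 4, "weight": 8},   # density 2.0
    {"name": "answer email", "hours": 1, "weight": 1},   # density 1.0
    {"name": "fix bug",      "hours": 2, "weight": 6},   # density 3.0
]
for task in weighted_spt(tasks):
    print(task["name"])   # fix bug, write report, answer email
```

Setting every weight to 1 recovers plain Shortest Processing Time. For debts, a balance's "weight" (cost per unit time) is roughly its size times its interest rate, while the time to pay it off scales with its size -- so the density ranking collapses to interest rate alone, matching the advice above.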

***

Let's take a quick break by examining the cover art on the various editions of this book.
The first two strike me as more "computer-sciency" than the last.

***

In Chapter 6, the authors present a concept that is not strictly speaking from computer science, but from probability theory: Bayes's Rule. This should be familiar to rationalists. You can make a prediction from a single data point using the logic developed by Reverend Thomas Bayes (in 1740s England) and Pierre-Simon Laplace (in 1770s France). Bayes provided the insight that you could reason forward from hypothetical pasts, and then work backward to the most probable one. Laplace then used calculus to prove that a vast number of hypotheses could be distilled into a single estimate, yielding Laplace's Law. For example, suppose you buy a ticket for a raffle without knowing how many prizes are to be won; Laplace's Law says that you should expect the proportion of winning tickets in the pool to be your number of winning tickets (w) plus one, divided by your number of attempts (n) plus two: so (w+1)/(n+2). If your first ticket was a winner, then your estimate will be 2/3, whereas if you buy three tickets, all of which are winners, then your estimate will be 4/5. But Laplace also developed the ironically-named Bayes's Rule, which is a formula for combining one's preexisting beliefs with observed evidence -- simply by multiplying their probabilities together. Bayes's Rule always requires you to use some prior. In case you don't know anything about the time span you are trying to predict, you can use an "uninformative prior" in the form of the Copernican Principle, which assumes that the current moment isn't special and is therefore likely (on average) to be at the halfway point. Hence, you should guess that the phenomenon will last as long into the future as it already has lasted. However, in reality we can often get richer priors by looking at distributions of data, of which there are three common types: the normal distribution (or "Gaussian bell curve"), for phenomena like human life-span and height; power-law distributions (or "scale-free distributions"), for phenomena like town population and movie box-office grosses; and the Erlang distribution, for "memoryless" phenomena like radioactive decay or the amount of time politicians stay in office. Thus we get three rules:
  • Normal prior: use the "average rule". Simply predict that someone will live to the average age of death. Be surprised when the person dies early (but not when their death is overdue).
  • Power-law prior: use the "multiplicative rule". Multiply the quantity observed so far by a constant factor (which happens to be 2 in the case of the Copernican principle). Be surprised the longer you have been waiting for a power-law event (e.g. the collapse of a corporation) to happen, because for such events you'd expect something long-lasting to keep lasting long.
  • Erlang prior: use the "additive rule". Always predict that things will go on just a constant amount longer. Be surprised... well, all events in an Erlang distribution are equally surprising regardless of when they occur.
Studies have shown that in the real world, people's predictions implicitly reflect a surprisingly good understanding of which distribution to use (presumably because our brains carry around information about movie grosses, human life spans and political terms of office)... as long as they have good priors. More surprisingly, Christian and Griffiths suggest that in Mischel's famous marshmallow test, the children who succumbed to temptation weren't actually displaying poor self-control, but expected the experimenter to be unreliable (i.e. they used the multiplicative rule rather than the average rule). The chapter closes with a brief discussion of how the news and social media tend to report on interesting, salient and uncommon things, and as such do not give us accurate priors about the true frequency of events.
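The chapter's three prediction rules can be captured in one small function. This is a schematic sketch of the advice (the book states the rules in prose; the multiplier of 2 is only right for the Copernican-style uninformative prior, and real priors would supply their own constants):

```python
def predict_total(seen_so_far, prior, mean=None, factor=2.0, extra=None):
    """Predict the total extent of a phenomenon from a single observation,
    using the rule matched to the shape of the prior distribution."""
    if prior == "normal":                    # average rule
        return max(mean, seen_so_far)        # predict the mean, unless already past it
    if prior == "power-law":                 # multiplicative rule
        return factor * seen_so_far
    if prior == "erlang":                    # additive rule
        return seen_so_far + extra           # always "a constant amount more"
    raise ValueError(f"unknown prior: {prior}")

print(predict_total(36, "normal", mean=76))    # a 36-year-old will live to ~76
print(predict_total(100, "power-law"))         # grossed $100M so far? predict $200M
print(predict_total(8, "erlang", extra=4))     # expect about 4 more, whenever you ask
```

What's striking is how differently the three answers respond to the observation: the normal prior barely moves with it, the power-law prior scales with it, and the Erlang prior adds the same amount no matter what.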

The next chapter is about overfitting -- i.e. the phenomenon that a more complex, multifactorial model that better fits the data is not always better at prediction. Indeed, when you tune a statistical or machine-learning model exactly to a specific data set, it can produce wildly varying results if the data is altered slightly (which is inevitable due to random noise). Overfitting reflects an "idolatry of data", as Christian and Griffiths call it: you focus too much on what you're able to measure, as opposed to what really matters. Overfitting happens in daily life. For example, low body fat and high muscle mass are proxy measures of good health, but that doesn't mean that consuming steroids to build muscle and eating an extreme weight-loss diet will make you healthy. In the business world, incentive structures can create unintended consequences (e.g. maximizing production while neglecting maintenance). Military and law enforcement personnel can acquire so-called "training scars", which is when they reflexively use procedures that have been drilled into them in training regardless of whether these are appropriate in the situation at hand. So what can we do about this problem? One approach is cross-validation: testing to what extent the model generalizes to data it hasn't seen (e.g. by withholding some of the available data points, or using different metrics). Another approach is inspired by Occam's Razor: all things being equal, the simplest model is probably the correct one. In machine learning, complex models are penalized using regularization (such that only factors with a big impact on the result remain in the equation, which makes the model more robust). In nature, the requirements of metabolism act as a constraint on organism complexity. In our daily lives, using simple heuristics might sometimes be rational, because less information processing can improve accuracy (especially because uncertainty creates a gap between what we can measure and what matters). Overfitting tells us that when we have limited information, the simplest plans may be the best ones (as we tend to come up with the most important factors first), and that cultural conservatism can "buffer us against the boom-and-bust cycle of fads."
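Cross-validation is easy to demonstrate: fit models of increasing complexity on half of a noisy dataset and score them on the held-out half. A toy sketch using NumPy's polynomial fitting (my example, not the book's):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = 2 * x + rng.normal(0, 0.2, size=20)     # the truth is a straight line, plus noise

train, test = np.arange(0, 20, 2), np.arange(1, 20, 2)   # hold out every other point
for degree in (1, 3, 9):
    coeffs = np.polyfit(x[train], y[train], degree)      # fit on the training half only
    test_error = np.mean((np.polyval(coeffs, x[test]) - y[test]) ** 2)
    print(degree, round(float(test_error), 4))
```

The degree-9 polynomial passes through its ten training points almost exactly, yet typically scores worst on the held-out points -- the signature of overfitting, with the wiggles between training points being pure "idolatry of data".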

Chapter 8 tackles the question of intractable problems in the form of the traveling salesman problem, which requires a fearsome O(n!) approach ("factorial time") to solve by brute force. The traveling salesman problem basically asks: how can you visit every town in a circuit, visiting each town only once, while minimizing your total distance? This is a kind of "constrained optimization" problem, and the issue is that as the number of towns grows, the list of possible routes connecting them grows factorially. Finding the optimal/perfect solution is effectively beyond the reach of computers for all but the smallest cases, because the Cobham-Edmonds thesis says that problems are only tractable if they can be solved by an efficient algorithm, meaning one that runs in polynomial time (for example O(n^2)) or less. But faced with an intractable problem, we don't have to surrender. Computer scientists employ a trick known as Constraint Relaxation, which makes the problem tractable by reducing it to a less constrained, idealized version in order to make progress, whose solution can then be ported back to the real world. For example, the traveling salesman problem can be relaxed by letting the salesman visit the same place more than once and retrace his steps for free. There are similar problems (e.g. where to place fire trucks so that every house can be reached in a fixed amount of time) that can be relaxed by turning discrete variables (e.g. number of trucks) into continuous ones, and then rounding the results to whole numbers or interpreting them as probabilities. Other problems are helped by Lagrangian Relaxation, which turns impossibilities into merely costly penalties -- for example, when planning seating arrangements for a wedding, you might ideally want to sit ten people per table, but you could relax this rule by allowing overfull tables with an elbow-room penalty. Although relaxation may not get you the perfect answer, it can get fairly close at a fraction of the computational resources and time -- and therefore allow you to make progress.
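To feel the factorial wall for yourself, here is brute force in miniature (my sketch, with a made-up distance matrix):

```python
from itertools import permutations

def tour_length(order, dist):
    legs = zip(order, order[1:] + order[:1])    # include the leg back to the start
    return sum(dist[a][b] for a, b in legs)

def brute_force_tsp(dist):
    """Try every ordering of towns: (n-1)! candidate tours, hopeless beyond ~a dozen."""
    towns = tuple(range(1, len(dist)))          # fix town 0 as the starting point
    return min(((0,) + p for p in permutations(towns)),
               key=lambda order: tour_length(order, dist))

dist = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 3],
        [10, 4, 3, 0]]
print(brute_force_tsp(dist))   # (0, 1, 3, 2): length 2 + 4 + 3 + 9 = 18
```

Four towns mean only 3! = 6 candidate tours; twenty towns already mean 19! ≈ 10^17 of them, which is why relaxation earns its keep.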

The topic of Chapter 9 is randomness. Specifically, the chapter continues the thread of intractable problems by showing how randomized algorithms can solve really hard problems when nothing else helps. During the Second World War, Stanislaw Ulam realized that simulating a nuclear chain reaction and sampling from the results was an alternative to exactly calculating and analyzing every possibility (which is near impossible). This led to the development of Monte Carlo Methods, which are central to scientific computing today. (Monte Carlo is the name of a casino district in Monaco; it happened that Ulam had his insight while playing solitaire, a game with an overwhelming space of possibilities.) Randomized algorithms can even help identify prime numbers faster than "deterministic" algorithms -- which is important because modern encryption (used by your laptop and phone) works by multiplying together secret primes, as trying to factor them back out from the product would take far too long to be worth it. Surprisingly, the authors managed to include John Rawls's "veil of ignorance" thought experiment in this chapter, arguing that there are too many variables to take into account when deciding on an ideal society from behind the veil -- so let's use random sampling! For example, the charity GiveDirectly distributes unconditional cash transfers, and randomly selects a cash recipient to interview each week, transparently publishing the results on its website. (I am glad that Christian and Griffiths gave a shout-out to an effective charity; if you want to read more about organizations similar to GiveDirectly, check out my page on effective altruism.) What randomized algorithms do, essentially, is trade off certainty (accepting some probability of error) against time and space. Thus, applied to the traveling salesman problem (or an international vacation plan), the tactical use of randomness can outperform "greedy" algorithms that always take the shortest route available at every step, because the greedy algorithm may only reach a local optimum in the solution space rather than the global optimum -- in other words, it may look like your solution cannot be improved by any small tweak, even though a much better solution exists elsewhere. Randomness can help by temporarily experimenting with worse solutions so that you may later climb a hill with a higher peak. In fact, there are a number of strategies:
  • Jitter: make a few random small changes, even if they are for the worse, and then go back to hill climbing.
  • Random-Restart: start again from a random new point.
  • Metropolis Algorithm: named after Nicholas Metropolis, who worked with Ulam on the Manhattan Project, this algorithm says to try out different small-scale tweaks on a solution, and to always accept improvements but occasionally accept bad tweaks too. In fact, your likelihood of following a bad idea should be inversely proportional to how bad an idea it is.
  • Simulated Annealing: based on an analogy to physics, this approach starts with a lot of randomness (like molecules at high temperature) and "cools down" by gradually reducing the amount of randomness injected over time. At some point you should settle down.
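A minimal simulated-annealing sketch on a toy one-dimensional landscape (my example; a real application would anneal over tours or seating plans rather than a single number):

```python
import math
import random

def simulated_annealing(cost, neighbor, start, temp=10.0, cooling=0.999, steps=100_000):
    """Always accept improvements; accept a worse move with a probability that
    shrinks both with how much worse it is and as the temperature cools."""
    current = start
    for _ in range(steps):
        candidate = neighbor(current)
        delta = cost(candidate) - cost(current)
        if delta <= 0 or random.random() < math.exp(-delta / temp):
            current = candidate
        temp *= cooling               # gradually withdraw the randomness
    return current

# a landscape with a local minimum near x = 2 and a deeper one near x = -2
cost = lambda x: x**4 - 8 * x**2 + x
best = simulated_annealing(cost, lambda x: x + random.uniform(-0.5, 0.5), start=3.0)
print(round(best, 2))   # usually about -2.0, having escaped the shallower basin
```

Set the temperature to zero from the start and you get plain hill climbing, which from start=3.0 would almost always get stuck near x = 2.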
The chapter closes with a consideration of how human creativity and "aha!" moments are influenced by chance. Innovation is arguably the outcome of our brains generating lots of new ideas randomly, and then retaining the best ones. Hence, introducing a random element into your thinking (e.g. by using Eno and Schmidt's "Oblique Strategies" card deck, or setting Wikipedia's "random article" link as your browser's default homepage) or into your life (e.g. joining clubs for books, wines and chocolates-of-the-month) can expose you to possibilities you otherwise would never have considered.

In Chapter 10, the penultimate chapter, Christian and Griffiths analogize computer networking to human communication. Both are based on protocol, i.e. a shared convention of norms, expectations and procedures. The Internet works mostly according to the Transmission Control Protocol (TCP), which uses packet switching rather than circuit switching -- making it more analogous to the snail mail postal system (which delivers letters, or "packets") than to telephone lines (which use a dedicated channel for each connection). Since there are many ways for data to flow, packet-switched networks become more reliable the larger they get. However, the Byzantine generals problem asks: how can you know whether someone has received your message? In TCP, this is resolved by sending acknowledgement packets (ACKs), with the consequence that a considerable amount of online traffic is actually control messages. Yet another problem is how to handle unreliable transmissions. Whether a silence is indicative of a problem depends on how long the round-trip between sender and receiver takes on average. But when both parties try to reconnect at the same time, the result is interference. So one possible solution (and indeed, the one that is actually used in cases of networking failure) is to wait for a certain period of time before trying to transmit again, and to double this delay for each successive failure -- this is known as Exponential Backoff. There are implications for human society: instead of giving people a fixed number of chances, we could require exponentially increasing periods of sobriety, which means that our patience will tend toward zero in the long run but never quite reach it, such that we never have to give up on someone completely. Yet another problem for packet-switched networks is congestion, which TCP handles using Additive Increase, Multiplicative Decrease (AIMD). Basically, when a connection drops a packet, AIMD cuts the transmission rate in half, and then increases the number of packets in flight linearly, creating a "sawtooth" shape in the bandwidth. A similar kind of flow control is seen in ant colonies (successful round-trips of foragers prompt more to leave the nest whereas unsuccessful ones cause a reduction in foraging activity) and human communications (every text reply encourages another whereas every unreturned message diminishes the flow). Hypothetically, one could apply the AIMD algorithm in corporations to promote employees annually, either one step up the ladder or a couple of steps down. This could mitigate the Peter Principle, which says that "every employee tends to rise to his/her level of incompetence." Chapter 10 makes a few other interesting points, for instance that human "backchannels" (e.g. saying uh-huh, yeah, hmm, oh and nodding one's head when listening) are akin to ACKs in TCP, and that buffers (in the form of queues) only work correctly when they allow you to maximize throughput; if your buffer is too large, you build up an excessive workload, causing bufferbloat. The solution for bufferbloat is Tail Drop, which simply means rejecting every packet (or customer at a crêpe stand) after a certain point. The main issue with modern communication is not that we're always connected, but that we are always buffered. What we need is less latency (as opposed to larger bandwidth).
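Exponential Backoff fits in a few lines. In this sketch, `send` is a hypothetical stand-in for any unreliable transmission attempt, not a real networking API:

```python
import random
import time

def send_with_backoff(send, max_attempts=8, base_delay=0.1):
    """Retry a flaky transmission, doubling the wait after each failure,
    with jitter so competing senders don't collide again in lockstep."""
    for attempt in range(max_attempts):
        if send():
            return True
        delay = base_delay * (2 ** attempt) * random.uniform(0.5, 1.5)
        time.sleep(delay)               # ~0.1s, 0.2s, 0.4s, 0.8s, ... on average
    return False

flaky = lambda: random.random() < 0.3   # a channel that succeeds 30% of the time
print(send_with_backoff(flaky))
```

The "increasing periods of sobriety" idea is this same loop with the giving-up clause removed: the delays grow without bound, but no single failure is ever final.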

Finally, Chapter 11 presents us with game theory (or more specifically, algorithmic game theory), which can help us deal with problems we cause each other (as opposed to problems derived from the fundamental structure of the world or our limited information-processing capacities). For example, in games of poker, the players can get sucked into the rabbit hole of recursion when they try to simulate each other's minds -- you believe that your opponent has a certain hand, but you also believe that your opponent believes that you believe this, et cetera. Game theory tells us that every two-player game has at least one Nash equilibrium, which is a set of strategies such that neither player will regret their own strategy given what the other has played. Seeking the equilibrium side-steps the problem of recursion, but unfortunately game theory doesn't tell us how to get there. In fact, finding Nash equilibria has been proven to be intractable. There are types of games where we know the equilibria, but here we face another problem: those equilibria aren't necessarily good. For example, in the Prisoner's Dilemma, your dominant strategy (i.e. the best response to all of your opponent's possible strategies) is to defect and rat your accomplice out to the police, even though this leads to an outcome that is worse than if you both cooperated and stayed silent. Likewise in the Tragedy of the Commons, each farmer (or corporation) gains personally by being more reckless than their competitors, but when everyone acts this way, the whole Earth gets ravaged. The bad news is that we cannot do anything to shift these dominant strategies from within the game; the good news is that there is such a thing as mechanism design, which "reverses" game theory by asking: what rules will give us the behavior we want to see? We learn that adding a "Godfather" in the case of the prisoner's dilemma can make defection less attractive, and that using government regulations to limit shop hours and mandate minimum vacation days prevents the race to the bottom of everyone being exhausted without improving their relative positions. One might even argue that human emotions (e.g. anger, compassion, guilt, love) are evolution's way of inducing us to cooperate, since they are mostly involuntary and non-rational, and thus they can replace externally-enforced contracts (plus, as a bonus, they save us from recursive overthinking). Incidentally, paying attention to the actions of others can lead to fads, herd-following bubbles and even economic disaster due to information cascades; the problem is that others often base their actions on the behavior of other people rather than those people's beliefs -- which essentially makes their actions uninformative. Beware of situations where public information seems to exceed private information! Information cascades, like the prisoner's dilemma, remind us that sometimes games just have lousy rules. One game that does not have lousy rules, however, is the Vickrey auction, a sealed-bid format in which the highest bidder wins but pays only the amount of the second-highest bid. These rules incentivize people to be honest: there is no gain from strategically masking the truth. Moreover, according to the revelation principle, any game that can be played for you by some agent to whom you'll tell the truth (e.g. a lawyer) can be transformed into an honesty-is-best game by incorporating the behavior you want from the agent into the rules of the game itself.
If you can't adopt an optimal strategy that avoids recursion, and if you can't change a game with a bad equilibrium, then you can at least choose to play games where honesty is the dominant strategy.
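The Vickrey auction is simple enough to sketch in full, which is part of its elegance (my illustration, not the book's):

```python
def vickrey_auction(bids):
    """Sealed-bid second-price auction: the highest bidder wins,
    but pays only the second-highest bid."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]              # the runner-up's bid sets the price
    return winner, price

# bidding your true value is the dominant strategy here: shading your bid
# can only lose you an auction you'd have been happy to win at the price,
# and overbidding risks paying more than the item is worth to you
print(vickrey_auction({"alice": 120, "bob": 95, "carol": 80}))   # ('alice', 95)
```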

Algorithms to Live By closes with a consideration of "computational kindness", which is the term the authors use for the ethical concern of reducing the computational load we impose on each other. You can personally benefit by using good algorithmic approaches (e.g. the 37% rule, the LRU criterion for maintaining your cache, the Upper Confidence Bound approach to the explore/exploit trade-off) and, in the case of intractable problems, "good enough" approaches (e.g. heuristics, approximations, strategic randomness). But if you remember that even the best possible process won't always yield the best outcome, you will be kinder to yourself when things don't go your way. (Recall the similarity to the Heath brothers' emphasis on process.) What computer science affirms, moreover, is that cognitive labor is costly; therefore, whenever you can, try to reduce the computational cost of your interactions with others. For example, by politely asserting your preferences and reducing the number of options you give people, you are actually being computationally kinder toward them than if you used language like "What do you want to do tonight? I'm flexible." Likewise, designers, architects, and urban planners should take an algorithmic perspective to make the kinds of computational problems we face easier (and thereby reduce tension and stress). In doing so, we acknowledge the fact that sometimes, being rational means making trade-offs rather than finding the perfect solution.

***

My initial impression of this book was one of intrigue, which gradually transformed into astonishment. Not only does this book take a new perspective on common problems, it does so in a way that is genuinely insightful. One might be skeptical of the book's premise, but the authors manage to make it work. Christian and Griffiths explain the key concepts from computer science well, and they relate those concepts to everyday experience, both theoretically and empirically, by referencing psychological studies and anecdotal examples. Furthermore, the writing is clear, concise and non-repetitive. The sub-headings and occasional diagrams aid the comprehensibility of the book. For readers who want more detail, there is an extensive "Notes" section at the end. Finally, it is impressive how the authors have made a book about computer science into a humanistic book, with a message of hope and optimism.

I say "hope and optimism" because a recurring theme throughout "Algorithms to Live By" is that problems, even impossibly hard ones, can usually be approached using techniques that will get you "close enough" to the theoretically optimal solution. Hence we can be optimistic, and rationally so, that "the best is yet to come". Alternatively, we can see that the way humans typically approach such situations is in some cases already close to the computer-science-suggested algorithm; meaning that despite all our irrationalities and biases, we are still surprisingly good.

For me, the key takeaways (which are often counterintuitive) include these nuggets:
  • The 37% Rule may be the best you can do in dating, but you still face a 63% chance of not finding the best match within your pool of candidates.
  • Old people are not necessarily "set in their ways"; they are just (rationally) choosing to exploit the things they know are good instead of exploring new possibilities, since exploitation has greater value to them than exploration.
  • You should only sort your stuff if it helps you save time searching later. And you should "sort" people and countries according to cardinal measurements rather than ordinal ranks.
  • The best way to store your stuff is to put the most recently used items in the most accessible locations. But beware: the more stuff you have, the longer it will take you to find what you need (this applies to your memories too).
  • To be productive, do things in the order of earliest deadline first, or quickest to finish first. To avoid procrastination and paralysis, weight your tasks by importance and schedule a minimum period of uninterrupted work.
  • To make good predictions from one data point, multiply the prior probability of your hypothesis by the likelihood of the data point. If you lack good preexisting information, assume that the trend will continue for as long as it has already existed.
  • Good models don't necessarily capture all the observable data points, so make sure that you are measuring only the few factors that really matter.
  • Some problems cannot be solved (even by the best computers!) unless you relax the rules.
  • Creativity may consist of randomly sampling from a pool of ideas, and sometimes what may seem like a bad idea in the short run might actually be exactly what you need in the long run.
  • Good communication requires you to acknowledge what the other person is saying (even if it's just a "hmm, yeah" or nod) and to avoid talking over them. Additionally, you should make it a policy to delete unread emails or text messages once in a while, lest you suffer bufferbloat.
  • Instead of playing mind games, find the dominant strategy or find a game where people can win by being honest. (But don't follow the herd unless you know why people are acting as they are.)
  • Being assertive about your preferences rather than forcing others to guess what you want is actually kinder on them from a computational perspective.
Hypothetically, one could combine these rules and the algorithms mentioned in this book into the WRAP process from "Decisive", especially the "decision playlist" aspect of the W stage (widening the frame of options). Come to think of it, at some point I might collect pointers from all the books related to thinking and decision-making that I've read and collate them into a massive "playlist".

Overall, I think this book deserves 5/5 stars. It probably won't be for everyone, but for nerds it is recommended.
