Alignment Bootstrapping Is Dangerous

AI companies want to bootstrap weakly-superhuman AI to align superintelligent AI. I don’t expect them to succeed. I could give various arguments for why alignment bootstrapping is hard and why AI companies are ignoring the hard parts of the problem; but you don’t need to understand any details to know that it’s a bad plan.

When AI companies say they will bootstrap alignment, they are admitting defeat on solving the alignment problem, and saying that instead they will rely on AI to solve it for them. So they’re facing a problem of unknown difficulty, but one hard enough that they don’t think they can solve it themselves. And to remedy this, they will use a novel technique never before used in history: counting on slightly-superhuman AI to do the bulk of the work.

If they mess up and this plan doesn’t work, then superintelligent AI kills everyone.

And they think this is an acceptable plan, and it is acceptable for them to build up to human-level AI or beyond on the basis of this plan.

What?

Continue reading

We won't solve non-alignment problems by doing research

Introduction

Even if we solve the AI alignment problem, we still face non-alignment problems, which are all the other existential problems1 that AI may bring.

People have written research agendas on various imposing problems that we are nowhere close to solving, and that we may need to solve before developing ASI. An incomplete list of topics: misuse; animal-inclusive AI; AI welfare; S-risks from conflict; gradual disempowerment; risks from malevolent actors; moral error.

The standard answer to these problems, the one that most research agendas take for granted, is “do research”. Specifically, do research in the conventional way where you create a research agenda, explore some research questions, and fund other people to work on those questions.

If transformative AI arrives within the next decade, then we won’t solve non-alignment problems by doing research on how to solve them.

Continue reading

Do Disruptive or Violent Protests Work?

Previously, I reviewed the five strongest studies on protest outcomes and concluded that peaceful protests probably work (credence: 90%).

But what about disruptive or violent protests?

Peaceful protests use nonviolent, non-disruptive tactics such as picketing and marches.

Disruptive protests use nonviolent, in-your-face tactics such as civil disobedience, sit-ins, and blocking roads.

Violent protests use violence.

There isn’t much evidence on the other two categories of protest. My best guesses are:

  • Violent protests probably don’t work. (credence: 80%)
  • Violent protests may reduce support for a cause, but it’s unclear. (credence: 40%)
  • For disruptive protests, it’s hard to say whether they have a positive or negative impact on balance. I’m about evenly split on whether a randomly-chosen disruptive protest is net helpful, neutral, or harmful.
  • A typical disruptive protest doesn’t work as well as a typical peaceful protest. (credence: 80%)
  • Peaceful protests are a better idea than disruptive protests. (credence: 90%)
Continue reading

Epistemic Spot Check: Expected Value of Donating to Alex Bores's Congressional Campaign

Political advocacy is an important lever for reducing existential risk. One way to make political change happen is to support candidates for Congress.

In October, Eric Neyman wrote Consider donating to Alex Bores, author of the RAISE Act. He created a cost-effectiveness analysis to estimate how donations to Bores’s campaign change his probability of winning the election. It’s excellent that he did that—it’s exactly the sort of thing that we need people to be doing.

We also need more people to check other people’s cost-effectiveness estimates. To that end, in this post I will check Eric’s work.

I’m not going to talk about who Alex Bores is, why you might want to donate to his campaign, or who might not want to donate. For that, see Eric’s post.

Continue reading

Writing Your Representatives: A Cost-Effective and Neglected Intervention

Is it a good use of time to call or write your representatives to advocate for issues you care about? I did some research, and my current (weakly-to-moderately-held) belief is that messaging campaigns are very cost-effective.

In this post:

Continue reading

Do Small Protests Work?

TLDR: The available evidence is weak. It looks like small protests may be effective at garnering support among the general public. Policy-makers appear to be more sensitive to protest size, and it’s not clear whether small protests have a positive or negative effect on their perception.

Previously, I reviewed evidence from natural experiments and concluded that protests work (credence: 90%).

My biggest outstanding concern is that all the protests I reviewed were nationwide, whereas the causes I care most about (AI safety, animal welfare) can only put together small protests. Based on the evidence, I’m pretty confident that large protests work. But what about small ones?

I can see arguments in both directions.

On the one hand, people are scope insensitive. I’m pretty sure that a 20,000-person protest is much less than twice as impactful as a 10,000-person protest. And this principle may extend down to protests that only include 10–20 people.

On the other hand, a large protest and a small protest may send different messages. People might see a small protest and think, “Why aren’t there more people here? This cause must not be very important.” So even if large protests work, it’s conceivable that small protests could backfire.

What does the scientific literature say about which of those ideas is correct?

Continue reading

My Third Caffeine Self-Experiment

Last year I did a caffeine cycling self-experiment and I determined that I don’t get habituated to caffeine when I drink coffee three days a week. I did a follow-up experiment where I upgraded to four days a week (Mon/Wed/Fri/Sat) and I found that I still don’t get habituated.

For my current weekly routine, I have caffeine on Monday, Wednesday, Friday, and Saturday. Subjectively, I often feel low-energy on Saturdays. Is that because the caffeine I took on Friday is having an aftereffect that makes me more tired on Saturday?

When I ran my second experiment, I took caffeine on four days a week, including the three-day stretch of Wednesday–Thursday–Friday. I found that my performance on a reaction time test was comparable between Wednesday and Friday. If my reaction time stayed the same after taking caffeine three days in a row, that’s evidence that I didn’t develop a tolerance over the course of those three days.

But if three days isn’t long enough for me to develop a tolerance, why is it that lately I feel tired on Saturdays, after taking caffeine for only two days in a row? Was the result from my last experiment incorrect?

So I decided to do another experiment to get more data.

This time I did a new six-week self-experiment where I kept my current routine, but I tested my reaction time every day. I wanted to test two hypotheses:

  1. Is my post-caffeine reaction time worse on Saturday than on Mon/Wed/Fri?
  2. Is my reaction time worse on the morning after a caffeine day than on the morning after a caffeine-free day?

The first hypothesis tests whether I become habituated to caffeine, and the second hypothesis tests whether I experience withdrawal symptoms the following morning.
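A comparison like hypothesis 1 can be sketched as a simple permutation test on the two groups of post-caffeine reaction times. The numbers below are hypothetical placeholders for illustration, not my actual measurements:

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def permutation_test(a, b, n_iter=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the p-value: the fraction of random relabelings whose
    mean difference is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = a + b
    count = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            count += 1
    return count / n_iter

# Hypothetical post-caffeine reaction times in milliseconds (lower is faster).
mwf_times = [242, 238, 251, 247, 240, 244, 249, 243, 246, 241]
sat_times = [245, 250, 239, 248, 244, 243]

p = permutation_test(mwf_times, sat_times)
```

Under a test like this, "no detectable difference" means a large p-value: the gap between the Saturday mean and the Mon/Wed/Fri mean is no bigger than what random relabeling of the days produces.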

The answers I got were:

  1. No, there’s no detectable difference.
  2. No, there’s no detectable difference.

Therefore, in defiance of my subjective experience—but in agreement with my earlier experimental results—I do not become detectably habituated to caffeine on the second day.

However, it’s possible that caffeine habituation affects my fatigue even though it doesn’t affect my reaction time. So it’s hard to say for sure what’s going on without running more tests (which I may do at some point).

Continue reading

Will Welfareans Get to Experience the Future?

Epistemic status: This entire essay rests on two controversial premises (linear aggregation and antispeciesism) that I believe are quite robust, but I will not be able to convince anyone that they’re true, so I’m not even going to try.

Cross-posted to the Effective Altruism Forum.

If welfare is important, and if the value of welfare scales something-like-linearly, and if there is nothing morally special about the human species1, then these two things are probably also true:

  1. The best possible universe isn’t filled with humans or human-like beings. It’s filled with some other type of being that’s much happier than humans, or has much richer experiences than humans, or otherwise experiences much more positive welfare than humans, for whatever “welfare” means. Let’s call these beings Welfareans.
  2. A universe filled with Welfareans is much better than a universe filled with humanoids.

(Historically, people referred to these beings as “hedonium”. I dislike that term because hedonium sounds like a thing. It doesn’t sound like something that matters. It’s supposed to be the opposite of that—it’s supposed to be the most profoundly innately valuable sentient being. So I think it’s better to describe the beings as Welfareans. I suppose we could also call them Hedoneans, but I don’t want to constrain myself to hedonistic utilitarianism.)

Even in the “Good Ending” where we solve AI alignment and governance and coordination problems and we end up with a superintelligent AI that builds a flourishing post-scarcity civilization, will there be Welfareans? In that world, humans will be able to create a flourishing future for themselves; but beings who don’t exist yet won’t be able to give themselves good lives, because they don’t exist.

Continue reading

How Much Does It Cost to Offset an LLM Subscription?

Is moral offsetting a good idea? Is it ethical to spend money on something harmful, and then donate to a charity that works to counteract those harms?

I’m not going to answer that question. Instead I’m going to ask a different question: if you use an LLM, how much do you have to donate to AI safety to offset the harm of using an LLM?

I can’t give a definitive answer, of course. But I can make an educated guess, and my educated guess is that for every $1 spent on an LLM subscription, you need to donate $0.87 to AI safety charities.
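The shape of the estimate can be sketched in a few lines: an offset ratio is expected harm per dollar spent divided by expected harm averted per dollar donated. The inputs below are illustrative placeholders chosen to reproduce the headline figure, not the actual numbers behind the estimate:

```python
# Illustrative placeholders -- not the real inputs to the estimate.
harm_per_revenue_dollar = 1.0        # expected harm per $1 of LLM spending
harm_averted_per_donated_dollar = 1.15  # expected harm averted per $1 donated

def offset_ratio(harm_per_dollar, averted_per_dollar):
    """Dollars of donation needed to offset $1 of spending."""
    return harm_per_dollar / averted_per_dollar

ratio = offset_ratio(harm_per_revenue_dollar, harm_averted_per_donated_dollar)
print(round(ratio, 2))  # ~0.87 with these illustrative inputs
```

The interesting work, of course, is in estimating the two inputs, which is what the rest of the post is about.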

Continue reading
