Why would AI companies use human-level AI to do alignment research?

Many plans for how to safely build superintelligent AI have a critical section that goes like this:

  1. Develop AI that’s powerful enough to do AI research, but not yet powerful enough to pose an existential threat.
  2. Use it to assist with alignment research, thus greatly accelerating the pace of work—hopefully enough to solve all alignment problems.

You could call this process “alignment bootstrapping”.

This is a central feature of DeepMind’s plan (see “Amplified oversight”), Anthropic’s plan (see “Scalable Oversight”), and independent plans written by Sam Bowman (an AI safety manager at Anthropic), Joshua Clymer (a researcher at Redwood Research), and Marius Hobbhahn (CEO of Apollo Research).

There are various reasons why alignment bootstrapping could fail1 even if implemented well, and some of those plans acknowledge this. But I’m also concerned about whether alignment bootstrapping will be implemented at all.

When the time comes, will AI companies actually spend their resources on alignment bootstrapping?

When AI companies have human-level AI systems, will they use them for alignment research, or will they use them (mostly) to advance capabilities instead?


The Triple-Interaction-Effects Argument

In this post I will explain the most impressive argument I heard in 2024.

First, some context:

There is an ongoing debate in the bodybuilding/strength training community about how much protein you should eat while losing weight.

Some say you should eat more protein if you’re losing weight:

If you’re eating less, your body is under extra pressure to cannibalize your muscles. Therefore, you should eat more protein to cancel this out.

The standard rebuttal:

Experimental trials have found that muscle gains max out when subjects eat 0.7–0.8 grams of protein per pound of bodyweight, and that’s true both when participants are maintaining weight and when they’re losing weight. There doesn’t appear to be a difference.

And the counter-rebuttal:

Almost all research looks at novice lifters. Experienced athletes have a more difficult time gaining muscle,1 so losing weight will have a bigger negative impact on them, and therefore they need to eat more protein.

I used to believe this. Then I heard the most impressive argument of 2024.

I heard the argument in a YouTube video by Menno Henselmans:

It’s possible that in trained individuals there is a triple interaction effect, because that’s what you’re arguing here. If you’re saying that protein requirements increase in an energy deficit, but only in strength-trained individuals, then you are arguing for a triple interaction effect. […] That is very, very, very rare. Triple interaction effects, biologically speaking, simply do not occur much.

I didn’t understand what he was talking about. I spent two days pondering what it meant. On the third day, it finally clicked and I realized he was right.

To claim that trained lifters should eat more protein on an energy deficit, you’d need to believe that:

  1. Above a certain level of protein intake (0.7–0.8 grams per pound), additional protein has no effect on muscle growth.2
  2. Most of the time, trained athletes don’t need more protein than novices.
  3. Novices don’t need more protein while losing weight than while maintaining/gaining weight.
  4. HOWEVER, (a) among trained individuals who are (b) losing weight, the ones (c) who eat more protein (beyond 0.7–0.8 g/lb) gain more muscle.

The first variable (protein intake) has no interaction with muscle growth.

The second variable (trained vs. untrained) has no interaction with muscle growth.

The third variable (losing vs. maintaining weight) has no interaction with muscle growth.

The first and second variables together (protein intake + trained/untrained) have no interaction with muscle growth.

The first and third variables together (protein intake + losing/maintaining weight) have no interaction with muscle growth.

HOWEVER, when you put all three variables together, an interaction suddenly appears—a triple interaction effect.
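Here is a minimal Python sketch of the claim's structure as a 2×2×2 design. The cell means, the `BASELINE`/`BONUS` values, and the function names are hypothetical and made up purely for illustration. They are not study data. The sketch assumes extra protein helps only trained lifters in a deficit (claim 4) and has no effect anywhere else (claims 1–3).

```python
from itertools import product

# Hypothetical cell means (arbitrary units), invented for illustration only.
# Each factor is coded 0/1:
#   extra_protein: 1 = eating more than ~0.7-0.8 g/lb
#   trained:       1 = experienced lifter
#   deficit:       1 = losing weight
BASELINE = 10.0
BONUS = 2.0  # hypothetical extra gain claimed for trained lifters in a deficit


def mean_gain(extra_protein: int, trained: int, deficit: int) -> float:
    """Mean muscle gain if extra protein helps ONLY trained lifters in a deficit."""
    if extra_protein and trained and deficit:
        return BASELINE + BONUS
    return BASELINE


def protein_effect(trained: int, deficit: int) -> float:
    """Simple effect of extra protein within one (trained, deficit) subgroup."""
    return mean_gain(1, trained, deficit) - mean_gain(0, trained, deficit)


effects = {(t, d): protein_effect(t, d) for t, d in product((0, 1), repeat=2)}

# Protein x deficit among novices (claim 3 says this is zero).
protein_x_deficit_novices = effects[(0, 1)] - effects[(0, 0)]
# Protein x training at maintenance (claim 2 says this is zero).
protein_x_trained_maintaining = effects[(1, 0)] - effects[(0, 0)]
# Three-way contrast: how much the protein x deficit interaction
# differs between trained and untrained lifters.
three_way = (effects[(1, 1)] - effects[(1, 0)]) - (effects[(0, 1)] - effects[(0, 0)])

for (t, d), effect in sorted(effects.items()):
    print(f"trained={t} deficit={d}: protein effect = {effect}")
print("protein x deficit (novices):", protein_x_deficit_novices)
print("protein x trained (maintaining):", protein_x_trained_maintaining)
print("three-way contrast:", three_way)
```

Under these assumptions the protein effect is zero in three of the four subgroups and both two-way contrasts come out as zero. The entire effect appears only in the three-way contrast, which is exactly the pattern the argument says is biologically rare.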

This is a very strange claim. If all three variables together affect muscle growth, then you would expect each variable individually to affect muscle growth. Or at least you would expect two out of three variables together to affect muscle growth.

(In fact, on a connected domain it is mathematically impossible to construct a differentiable function f(x, y, z) that is constant with respect to x, constant with respect to y, and constant with respect to z, but not constant overall. You could, however, have a function f(x, y, z) whose slope with respect to each individual variable is close to 0, but not quite 0.)
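The parenthetical can be written out as follows. The first line is the standard consequence of the mean value theorem, which needs the domain D to be connected. The example function with a small constant ε is a hypothetical illustration, not something from the studies.

```latex
\[
\frac{\partial f}{\partial x} = \frac{\partial f}{\partial y} = \frac{\partial f}{\partial z} = 0
\text{ everywhere on a connected open set } D
\;\Longrightarrow\; f \text{ is constant on } D
\]
\[
f(x, y, z) = \varepsilon\,xyz
\quad\Longrightarrow\quad
\frac{\partial f}{\partial x} = \varepsilon\,yz,\;
\frac{\partial f}{\partial y} = \varepsilon\,xz,\;
\frac{\partial f}{\partial z} = \varepsilon\,xy,\;
\frac{\partial^{3} f}{\partial x\,\partial y\,\partial z} = \varepsilon
\]
```

For bounded x, y, z and small ε, every first-order slope is close to 0, yet f is not constant. The only term that carries the whole effect is the third mixed partial, i.e. a pure triple interaction.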

Not to say a triple interaction effect can’t occur in the real world. It could be that muscle growth does depend on each of (protein intake, training experience, calorie deficit), but the relationships are so weak that the studies failed to pick them up.

But if you believe the studies’ results are correct, then it seems difficult—maybe even impossible—to still believe that trained lifters need to eat more protein while on a calorie deficit.

***

This was the best argument I heard in 2024 because:

  • If you think about it, it’s obviously correct. It changed my mind as soon as I understood it.
  • It’s difficult to come up with. (I’ve never heard anyone else make this argument.)

Notes

  1. I’m conflating gaining strength with putting on muscle. There’s a difference, but we can consider them the same thing for the purposes of this post. 

  2. This claim is somewhat controversial, but let’s assume it’s true for the sake of this argument.

    Randomized controlled trials find no benefit to more than ~0.7 g/lb, and I quoted a range of 0.7–0.8 g/lb to account for variation between individuals. But the existing studies aren't that great, so I don't have high confidence that that's the correct range. 


You can now read my reading notes

Since 2015, I have been taking notes on most articles I read. I figured other people might find them useful, so I cleaned them up and published them on my website. You can find them via the new “Notes” tab.

I will update the page every once in a while as I read more articles and take more notes.

I also have notes on every educational book I’ve read since 2015, but the notes are on physical paper (can you believe it?). I might digitize them at some point.


There Are Three Kinds of "No Evidence"

David J. Balan once proposed that there are two kinds of “no evidence”:

  1. There have been lots of studies directly on this point which came back with the result that the hypothesis is false.
  2. There is no evidence because there are few or no relevant studies.

I propose that there are three kinds of “no evidence”:

  1. The hypothesis has never been studied.
  2. There are studies, the studies failed to find supporting evidence, but they wouldn’t have found supporting evidence even if the hypothesis were true.
  3. There are studies, the studies should have found supporting evidence if the hypothesis were true, and they didn’t.

Example of type 1: A 2003 literature review found that there were no randomized controlled trials1 showing that parachutes could prevent injury when jumping out of a plane.

Example of type 2: In 2018, there was finally a randomized controlled trial2 on the effectiveness of parachutes, and it found no difference between the parachute group and the control group. However, participants only jumped from a height of 0.6 meters (~2 feet). I don’t know about you, but this result does not make me want to jump out of a plane without a parachute.

Like in the parachute example, you see type-2 “no evidence” whenever the conditions of a study don’t match the real-world environment. You also see type-2 “no evidence” when an experiment is underpowered. Say you want to test the hypothesis that boys are taller than girls. So you go find your niece Sally and your neighbor’s son James and it turns out Sally is an inch taller than James. Your methodology was valid—you can indeed test the hypothesis by finding some people and measuring their heights—but your sample size was too small.

(The difference between type 2 and type 3 can be a matter of degree. The more powerful a study is, the stronger its “no evidence” if it fails to find an effect.)
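The idea that a null result's strength depends on power can be put in numbers. The Python sketch below assumes a true height gap of 5 cm, a within-group SD of 7 cm, normally distributed heights, and a one-sided z-test at α = 0.05. All of these, and the function names, are hypothetical choices for illustration. It uses the fact that a non-significant result has probability 1 − α if there is no effect and 1 − power if the assumed effect is real, so their ratio measures how much a null result favors "no effect".

```python
from statistics import NormalDist

# Hypothetical numbers for illustration only (not measured data).
MEAN_DIFF_CM = 5.0  # assumed true height gap between the two groups
SD_CM = 7.0         # assumed within-group standard deviation
ALPHA = 0.05        # one-sided significance level

std_normal = NormalDist()


def prob_single_pair_reversed() -> float:
    """Chance that one randomly chosen girl is taller than one randomly chosen boy."""
    # The difference of two independent normals has SD = SD_CM * sqrt(2).
    return 1 - std_normal.cdf(MEAN_DIFF_CM / (SD_CM * 2 ** 0.5))


def power(n_per_group: int) -> float:
    """Power of a one-sided z-test comparing two group means (known SD)."""
    standard_error = SD_CM * (2 / n_per_group) ** 0.5
    z_crit = std_normal.inv_cdf(1 - ALPHA)
    return std_normal.cdf(MEAN_DIFF_CM / standard_error - z_crit)


def null_result_bayes_factor(n_per_group: int) -> float:
    """How much a non-significant result favors 'no effect' over the assumed effect.

    P(null result | no effect)      = 1 - ALPHA
    P(null result | assumed effect) = 1 - power
    """
    return (1 - ALPHA) / (1 - power(n_per_group))


print(f"P(Sally taller than James) ~ {prob_single_pair_reversed():.2f}")
for n in (1, 5, 20, 50):
    print(f"n={n:>3}: power={power(n):.2f}, "
          f"null result favors 'no effect' by {null_result_bayes_factor(n):.1f}x")
```

With a single Sally-and-James pair the ratio stays close to 1, which is type-2 "no evidence". As the sample grows, the same null result becomes much stronger and shifts toward type 3.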

Notes


The 7 Best High-Protein Breakfast Cereals

Updated 2025-03-19 to add Catalina Crunch Cinnamon Toast.

(I write listicles now)

(there are only 7 eligible high-protein breakfast cereals, so the ones at the bottom are still technically among the 7 best even though they’re not good)

If you search the internet, you can find rankings of the best “high-protein” breakfast cereals. But most of the entries on those lists don’t even have that much protein. I don’t like that, so I made my own list.

This is my ranking of genuinely high-protein breakfast cereals, which I define as containing at least 25% of calories from protein.

Many food products like to advertise how many grams of protein they have per serving. That number doesn’t matter because it depends on how big a serving is. Hypothetically, if a food had 6g protein per serving but each serving contained 2000 calories, that would be a terrible deal. The actual number that matters is the proportion of calories from protein.
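The proportion of calories from protein is easy to compute from a nutrition label. Below is a minimal Python sketch. It assumes the standard Atwater factor of about 4 kcal per gram of protein, and the function names and the 10 g / 150 kcal cereal are hypothetical. Label values are rounded, so treat the result as approximate.

```python
# Atwater general factor: protein provides about 4 kcal per gram.
KCAL_PER_GRAM_PROTEIN = 4
HIGH_PROTEIN_THRESHOLD = 0.25  # the cutoff used here: at least 25% of calories


def protein_calorie_share(protein_g: float, calories: float) -> float:
    """Fraction of a food's calories that come from protein."""
    return protein_g * KCAL_PER_GRAM_PROTEIN / calories


def is_high_protein(protein_g: float, calories: float) -> bool:
    """True if protein supplies at least the threshold share of calories."""
    return protein_calorie_share(protein_g, calories) >= HIGH_PROTEIN_THRESHOLD


# The hypothetical from the text: 6 g protein in a 2000-calorie serving.
print(f"{protein_calorie_share(6, 2000):.1%}")  # 1.2%

# A made-up cereal: doubling the serving doubles the grams but not the share.
print(protein_calorie_share(10, 150) == protein_calorie_share(20, 300))  # True
print(is_high_protein(10, 150))  # True
```

Because the share is a ratio, it does not change with serving size, which is why it is a better comparison than grams per serving.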

My ranking only includes vegan cereals because I’m vegan. Fortunately most cereals are vegan anyway. The main exception is that some cereals contain whey protein, but that’s not too common—most of them use soy, pea, or wheat protein instead.

High-protein cereals, ranked by flavor


My submission for Worst Argument In The World

Scott Alexander once wrote:

David Stove once ran a contest to find the Worst Argument In The World, but he awarded the prize to his own entry, and one that shored up his politics to boot. It hardly seems like an objective process.

If he can unilaterally declare a Worst Argument, then so can I.

If those guys can unilaterally declare a Worst Argument, then so can I. I declare the Worst Argument In The World to be this:

“A long time ago, not-A, and also, not-B. Now, A and B. Therefore, A caused B.”

Example: In 1820, pirates were everywhere. Now you hardly ever see pirates, and global temperatures are rising. Therefore, the lack of pirates caused global warming.

(This particular argument was originally made as a joke, but I will give some real examples later.)

Naming fallacies is hard. Maybe we could call this the “two distant points in time fallacy”. For now I’ll just call it the Worst Argument.

