What I Learned Running Refine

Conjecture

Nov 24, 2022

You have one job: Solving problems. You have multiple tools. Maybe you use code as a tool to solve some problems. Maybe you use design for others. Maybe you use good communication and negotiation skills.

Mike Acton, How much time should I spend coding versus managing?

If you seek tranquility, do less. Or, more accurately, do what’s essential.

Marcus Aurelius, Meditations, Book 4.24

This post is part of the work done at Conjecture.

Refine, the alignment research incubator we ran at Conjecture, finished its first cohort a few weeks ago. So now is a good time to take stock, share what we’ve learned, and discuss its future.

Let’s get this out of the way first: we are not planning any new cohort in the foreseeable future. There are multiple reasons for this, which I’ll expand on in this post. But to summarize:

  • Running Refine in a way that fully aimed at its stated target would require more effort.

  • SERI MATS is doing a great job of scaling conceptual alignment research, and seems open to integrating some of the ideas behind Refine.

  • In my view, the work we’re doing in Conjecture’s epistemology team is far more fundamental and neglected than field-building, at least in the current climate.

Now for the details.

The Target

The key idea behind Refine was to create more conceptual alignment researchers with their own radically different agendas, rather than new researchers following established approaches. To create more researchers like John, Paul, Vanessa, Evan, Steve, and the others.

We operationalized this goal by looking for relentlessly resourceful thinkers with unorthodox shapes of mind for the alignment community.

The Result

Now that the first cohort is over, how well have we hit this target? Out of 5 participants:

  • 2 are pursuing their own research bets, though these are not radically different from established approaches.

  • 1 is still building theirs.

  • 1 has found a neglected field-building opportunity.

  • 1 feels like they still need to upskill before working directly on alignment.

Based only on The Target above, this is 0/5.

Of course that doesn’t mean the program didn’t have positive outcomes and externalities! On the contrary, I’m really happy with how a lot of things turned out, and I’ve heard from all participants that they got a lot out of Refine. Non-negligible accomplishments include:

  • Feedback from multiple alignment researchers that Refine participants had a deep model of the alignment problem at the end of the program.[1]

  • Refine participants improved their productivity across the board, some on writing and others on iterating on ideas.

  • All Refine participants met and talked with many alignment researchers and newcomers like them, considerably expanding their network and understanding of the alignment space.

  • Participants posted around 25 posts in total on the Alignment Forum, some of which I find exciting.

  • I got a crash course in management that helped me upskill quickly.

  • We shared a lot of great moments and supported each other.

  • In our leaving survey, all participants said they would highly recommend the program, and that it was more counterfactually useful than what they would have done by default.

  • I expect most, if not all, participants to make relevant contributions to the field.

None of these are irrelevant. Yet if we focus on the original metric, the pilot of Refine failed. Having reflected on this, I have some thoughts on how we could have better aimed at this target (whether it is the correct target is a question for a later section).

It all amounts to a lack of optimization.

Failing to Optimize

The first place where we failed to optimize for wildly different research agendas was the pool we selected from. Given where we advertised (various EA and rationalist websites, Slacks, and Discords), we drew a crowd that was homogeneous along many dimensions. There was no way we were going to end up with a linguist or a sociologist, for example. That would have required more targeted outreach.

This failure mode is shared by all training programs I know about: even PIBBSS, which successfully brought together a more diverse cohort, had trouble attracting people from the fields most different from alignment, like the social sciences.

Our second lack of optimization came from the selection process itself. If you want to create independent conceptual researchers who work on the problem right after your program, you need to push really hard for the following traits:

  • Wants to work on conceptual alignment ASAP

  • Can tolerate the emotional and material difficulties of independent research

  • Can generate their own ideas

  • Can make mistakes and update

Looking back, most participants in the first cohort scored well along these lines, but each of them had at least one of these traits that still needed improvement.

Last but not least, the process within Refine itself could have focused more on guiding participants to build a gears-level model of alignment. What we ended up doing was mostly discussing Unbounded Atomic Optimization and Epistemological Vigilance, and providing feedback on ideas. In retrospect, I see more explicit exercises (like building a theory of change), an early focus on poking as many holes as possible in models of alignment, and a sweeping tour of the state of the art as the necessary first steps to produce worthwhile conceptual alignment research quickly.

In the end, all participants of the first cohort did learn a deep model of the alignment problem, but a better program structure could have accelerated this. And with such a deep gears-level model from the start, all the mentoring focused on pushing ideas towards the most relevant form for alignment would have been vastly more effective, as there would have been significantly less “translation effort” on the mentor’s side.

The Right Target?

Note that the above assumes that Refine’s original goal, creating more conceptual alignment researchers with their own radically different agendas, was the right one.

But is it really? Even if it is a good goal, is it the most important one for solving alignment?

I have updated toward no. Or rather, I have updated toward being suspicious of targets that look as instrumental as this one.

For creating new research directions and new researchers sidelines the key difficulty in solving alignment: figuring out what concretely needs to be done, and what kind of researcher is best suited to do it. Instead of answering these hard questions yourself, you delegate them to the future, to the next generation.

On a problem with longer timelines, this might be the right move: let compound interest do the work. Even with short timelines, if I had no ideas and no plans for addressing these hard questions, passing the buck might have been the best decision.

But I have an angle and a plan. Figuring out how to tackle alignment, why it is hard, and how to deal with these difficulties is literally the task of my epistemology team at Conjecture. In these conditions, spending that much of my time on field-building seems like a bad bet: I would be doing a worse job than most field-builders I know (really only contributing my own idiosyncratic ideas, which can be shared anyway), while neglecting an angle of attack on the problem that is almost untouched and appears promising to me and to Conjecture.

I’m excited to see SERI MATS and other programs step up to train new alignment researchers, and I will continue to encourage them and give them feedback. But my personal arena is elsewhere.
