
We Are Conjecture, A New Alignment Research Startup

Conjecture

Apr 8, 2022

Conjecture is a new alignment startup founded by Connor Leahy, Sid Black and Gabriel Alfour, which aims to scale alignment research. We have VC backing from, among others, Nat Friedman, Daniel Gross, Patrick and John Collison, Arthur Breitman, Andrej Karpathy, and Sam Bankman-Fried. Our founders and early staff are mostly EleutherAI alumni and previously independent researchers like Adam Shimi. We are located in London.

Of the options we considered, we believe that being a for-profit company with products[1] on the market is the best one to reach our goals. This lets us scale investment quickly while maintaining as much freedom as possible to expand alignment research. The more investors we appeal to, the easier it is for us to select ones that support our mission (like our current investors), and the easier it is for us to guarantee security to alignment researchers looking to develop their ideas over the course of years. The founders also retain complete control of the company.

We're interested in your feedback, questions, comments, and concerns. We'll be hosting an AMA on the Alignment Forum this weekend, from Saturday 9th to Sunday 10th, and would love to hear from you all there. (We'll also be responding to the comments thread here!)

Our Research Agenda

We aim to conduct both conceptual and applied research that addresses the (prosaic) alignment problem. On the experimental side, this means leveraging our hands-on experience from EleutherAI to train and study state-of-the-art models without pushing the capabilities frontier. On the conceptual side, most of our work will tackle general problems of alignment such as deception, inner alignment, value learning, and amplification, with a slant towards language models and backchaining to local search.

Our research agenda is still actively evolving, but some of the initial directions are:

  • New frames for reasoning about large language models:

  • What: Propose and expand on a frame of GPT-like models as simulators of various coherent text-processes called simulacra, as opposed to goal-directed agents (upcoming sequence to be published on the AF, see this blogpost for preliminary thoughts).

  • Why: Both an alternative perspective on alignment that highlights different questions, and a high-level model to study how large language models will scale and how they will influence AGI development.

  • Scalable mechanistic interpretability:

  • What: Mechanistic interpretability research in a similar vein to the work of Chris Olah and David Bau, but with less focus on circuits-style interpretability and more on research whose insights can scale to models with many billions of parameters and beyond. Some example approaches might be:

  • Locating and editing factual knowledge in a transformer language model.

  • Using deep learning to automate deep learning interpretability, for example by training a language model to give semantic labels to neurons or other internal circuits (see the illustrative sketch after this list).

  • Studying the high-level algorithms that models use to perform, e.g., in-context learning or prompt programming.

  • Why: Provide tools to implement alignment proposals on neural nets, and insights that reframe conceptual problems in concrete terms.

  • History and philosophy of alignment:

  • What: Map different approaches to alignment, translate between them, explore ideas that were abandoned too quickly, and propose exciting new directions (upcoming sequence on pluralism in alignment to be published on the AF).

  • Why: Help alignment research become even more pluralistic while remaining productive. Understanding historical patterns helps put our current paradigms and assumptions into perspective.
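To make the automated-interpretability direction above a bit more concrete, here is a minimal, hypothetical sketch of one way it could look: collect the text snippets that most strongly activate a single neuron, then ask a language model to propose a semantic label for it. Everything here, including the toy data and the `label_with_language_model` stub, is an illustration under assumed names, not a description of our actual tooling or pipeline.

```python
# Hypothetical sketch of automated neuron labeling: rank snippets by how
# strongly they activate one neuron, then prompt a language model for a label.
import numpy as np

def top_activating_snippets(snippets, activations, k=5):
    """Return the k snippets with the highest activation for one neuron."""
    order = np.argsort(activations)[::-1][:k]
    return [snippets[i] for i in order]

def build_labeling_prompt(snippets):
    """Format the evidence into a prompt asking for a short semantic label."""
    evidence = "\n".join(f"- {s}" for s in snippets)
    return (
        "The following text snippets strongly activate one neuron in a "
        "language model:\n"
        f"{evidence}\n"
        "In a few words, what concept does this neuron seem to detect?"
    )

def label_with_language_model(prompt):
    # Placeholder: in practice this would call whatever language model is available.
    return "<label proposed by the language model>"

if __name__ == "__main__":
    # Toy data standing in for real activation records.
    snippets = [
        "Paris is the capital of France.",
        "She flew from Berlin to Madrid.",
        "The weather in Tokyo was humid.",
        "He prefers tea to coffee.",
    ]
    activations = np.array([0.9, 0.8, 0.7, 0.1])  # one neuron's activations
    prompt = build_labeling_prompt(top_activating_snippets(snippets, activations, k=3))
    print(prompt)
    print(label_with_language_model(prompt))
```

In practice the labeling step would call a real model, and the interesting research questions are how well such labels hold up under validation and whether the approach scales to models with many billions of parameters.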

We target the Alignment Forum as our main publication outlet, and aim to publish posts there regularly and interact with the community through it. That said, our publication model is non-disclosure-by-default, and everything we share will first go through an internal review process to screen for infohazards.

In addition to this research, we want to create a structure, managed by Adam Shimi, to host externally funded independent conceptual researchers. It will also include an incubator for new conceptual alignment researchers to propose and grow their own research directions.

How We Fit in the Ecosystem

Our primary goal at Conjecture is to conduct prosaic alignment research which is informed by the ultimate problem of aligning superintelligence. We default to short timelines, generally subscribe to the scaling hypothesis, and believe it is likely that the first AGI will be based on modern machine-learning architectures and learning methods.

We believe that combining conceptual research, applied research, and hosting independent researchers in one integrated organization is a recipe for making promising untapped research bets, fostering collaboration between high-concept and experimental work, and truly scaling alignment research.

Among the other existing safety orgs, we consider ourselves closest in spirit to Redwood Research in that we intend to focus primarily on (prosaic) alignment questions and embrace the unusual epistemology of the field. Our research agenda overlaps in several ways with Anthropic's, especially in our acceptance of the scaling hypothesis and interest in mechanistic interpretability, but with more emphasis on conceptual alignment.

We Are Hiring!

If this sounds like the kind of work you’d be interested in, please reach out!

We are always looking to hire more engineers and researchers. At the time of writing, we are particularly interested in hiring devops and infrastructure engineers with supercomputing experience, and are also looking for one to two fullstack/frontend webdevs, preferably with data visualization experience. We are located in London and pay is competitive with FAANG. If you have experience building, serving, and tuning large-scale ML models and experiments, or have done interesting alignment theory work, we’d love to hear from you. We also accept Alignment Forum posts as applications!

We will open applications for the incubator in about a month, and are interested in hearing from any funded independent conceptual researcher who would like to be hosted by us.

If you don’t fit these descriptions but would like to work with us, please consider reaching out anyway if you think you have something interesting to bring to the table.

And if you’re around London and would like to meet, feel free to drop us an email as well!
