Human decision processes are not well factored
Conjecture
Feb 17, 2023
A classic example of human bias is when our political values interfere with our ability to accept data or policies from people we perceive as opponents. When new evidence feels like a threat to their values, most people's first instinct is to deny it or subject it to extra scrutiny rather than consider it openly. Such reactions are common: when challenged, it takes active effort not to react purely defensively and to consider the critic's models, even when they are right.
How can we understand why this occurs? In a stereotypical view of human behavior, our values and preferences about the world are decoupled from our world model and beliefs. An independent decision theory uses them to guide our actions, and we use our epistemology to update our world models with new information. In this view, all these parts are nice, independent, consistent things. This view works to some degree: we know facts about all sorts of things, from politics to biology, and we seem to have values about various circumstances, and so on. Cases like political values interfering with our ability to update our beliefs are then modeled as irrational noise on top of this clean model of human behavior. If we can eliminate this noise and get ourselves to overcome our irrational urges, we can (in principle) behave rationally.
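To make this picture concrete, here is a minimal sketch of what such a cleanly factored agent could look like. The Python structure and names below are purely illustrative assumptions, not something from this post: beliefs are updated by evidence alone, values are a fixed utility function, and a decision rule reads both without modifying either.

```python
# Purely illustrative sketch of a "well factored" agent: beliefs, values,
# and the decision rule are separate, independent pieces. All names here
# are hypothetical, chosen only to mirror the example in this post.
from typing import Dict

Action = str
Outcome = str

class Beliefs:
    """World model: a probability distribution over outcomes for each action."""
    def __init__(self, model: Dict[Action, Dict[Outcome, float]]):
        self.model = model

    def update(self, action: Action, evidence: Dict[Outcome, float]) -> None:
        """Epistemology: fold new evidence (likelihoods) into the prior and
        renormalize. Crucially, this step never consults values."""
        prior = self.model[action]
        posterior = {o: prior[o] * evidence.get(o, 1.0) for o in prior}
        total = sum(posterior.values())
        self.model[action] = {o: p / total for o, p in posterior.items()}

def utility(outcome: Outcome) -> float:
    """Values: a fixed preference over outcomes, independent of beliefs."""
    return {"tool_is_useful": 1.0, "tool_is_useless": -0.5}.get(outcome, 0.0)

def decide(beliefs: Beliefs) -> Action:
    """Decision theory: pick the action with the highest expected utility.
    It reads beliefs and values but modifies neither."""
    def expected_utility(action: Action) -> float:
        return sum(p * utility(o) for o, p in beliefs.model[action].items())
    return max(beliefs.model, key=expected_utility)

# In this factored picture, critical feedback is just evidence: it shifts
# beliefs, leaves values untouched, and the decision follows mechanically.
beliefs = Beliefs({
    "build_tool": {"tool_is_useful": 0.6, "tool_is_useless": 0.4},
    "do_nothing": {"no_tool": 1.0},
})
beliefs.update("build_tool", {"tool_is_useful": 0.2, "tool_is_useless": 0.9})
print(decide(beliefs))  # the unfavorable evidence flips the choice to "do_nothing"
```

In this sketch, no amount of evidence can ever touch the utility function, and no value can block an update. The rest of this post argues that humans are not actually organized this way.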
But to make this model of factored rationality work, we need a lot of corrections to account for all of the cognitive biases that humans display. Each one adds more parameters and noise, and there are a lot of them! It should make us suspicious that, every time we come across something that contradicts this clean model, we just bolt on more noise and parameters instead of questioning the model itself.
Another potential model, one that avoids this ad hoc addition of noise, treats this unwillingness to update as something more fundamental: our values, beliefs, and decision theory are entangled and do not exist independently.
Consequences of taking this model seriously include:
Changing beliefs can change values and vice versa, making us more resistant to updating
To have sharp beliefs and values, we must actively implement them. This does not happen by default
Even after we implement a belief or value, and a decision theory around it, it is still local. The implementation may still clash with other parts of the messy processes driving our behavior
Implementing values and beliefs isn’t free and takes time and effort to do well, so we need to decide when this is worthwhile
To continue the example of accepting critical feedback, say I discuss an idea for a tool I want to build with one of my colleagues. He pushes me on a few practical details: I need to be more concrete about the use case, whether there are better ways to do it, and whether this is the best use of my time. But in its original form, my idea wasn't a clean set of claims about the world that I use to make decisions about what to build. Instead, it was tangled up with my values: I like my ideas; they are mine, after all. If I put in the work to untangle my model of reality from these emotions and values, accepting that it might feel bad, I can more directly apply the evidence and models he presents to my idea.
In this model, humans do not have cleanly separated values, world models, and decision theory! One method of dealing with this is explicitly implementing locally consistent beliefs and values and a decision theory based on them. This implementation is limited: it is, at best, locally consistent and takes time and energy to create.