“Plans are worthless, but planning is everything.” - Dwight D. Eisenhower
In most fields it is possible to solve problems iteratively, without much of a long-term plan. In fact, this can be the best strategy, since your results will frequently be unexpected. Managing risk from advanced AI is not like most fields, and there are two big problems with taking a merely iterative approach. First, while the field has a lot more shape than it used to, it is still somewhat ‘pre-paradigmatic’. Amongst other things, this means there are almost as many theories as there are theorists. There is much disagreement over which types of research are actually useful (and even which are actively harmful), and it isn’t clear that separate agendas particularly complement each other (or ‘stack’). Second, the approach of trying some stuff, seeing what fails, and then patching the failures does not work if the failure modes are potentially existential. This suggests that some kind of long-term plan is necessary, albeit one that is lightly held.
More generally in science, the best work is done when theory and experiment feed off one another. If you want to improve your model of the world, you have to query it through experiment. But you need theory to process that information, as well as to decide which experiment to do in the first place (and what to do next). The theory doesn’t have to be rigorous, although rigour sometimes helps, but it does need to provide structure.
My plan is to work through a generating process for research. This will operate as a feedback loop that guides my work so that it is both experimentally testable and keeps long-term considerations in mind. It aims to balance tinkering and planning, tactics and strategy. Doing this has the added bonus of forcing me to clarify a lot of my assumptions, helping both myself and others understand exactly what I am trying to achieve. It also provides a systematic entry point into what is often a bewildering problem.
The process proceeds through an indefinite feedback loop of eight steps. Each step refers to a particular thing that needs to be defined, sketched out, or implemented, and together they provide a skeleton for the research agenda.
The steps are:
1. The Problem
2. What I Believe
3. What Success Looks Like
4. Where We Are Now
5. Theoretical Solution
6. Minimum Viable Experiment
7. Experimental Results
8. What Was Learnt?
Now, I will give a bit more detail about each.
1. The Problem
Briefly describe the actual problem you are trying to solve. Why are we here going through a step-by-step plan at all? This should be succinct and to the point, as detail is for later steps.
2. What I Believe
List your key background assumptions pertaining to the Problem. Most importantly, concentrate on anything non-standard and load-bearing. Too many discussions of AI risk descend into people talking past each other, so setting out this context up front is important. The exercise is also great for clarifying your thoughts.
3. What Success Looks Like
Describe what you think a successful end state looks like, in as much detail as seems sensible. Predicting the future is hard, but it is crucial to give it your best shot if you are not going to simply iterate into the dark. This step grounds everything that comes later (it is a kind of mission statement) and has the added benefit of making your goals legible to others, which should make it easier for them to collaborate with you or give you constructive feedback.
4. Where We Are Now
Give an overview of the aspects of the current situation that seem most relevant to you (together, steps 3 and 4 form a more detailed view of the Problem: ‘we are here, but we need to be there’). Clearly, there is too much going on to cover everything, but it is important to state the things that most need fixing. As with step 3, this should make you easier to collaborate with, as other people can see what you believe the most significant issues are.
5. Theoretical Solution
Given where you are and where you want to go, how, in theory, are you going to get there? In other words, given your current world model, state the sequence of steps that would be required. Do not worry at this stage about whether you actually know how to do any of them (although it is of course good if you do). This can be formal or informal, as seems appropriate.
6. Minimum Viable Experiment
Identify the parts of your Theoretical Solution that most require experimental validation. Or, phrased from the other direction: what experiments can teach you the most about it? Develop a tractable experimental plan to do this.
7. Experimental Results
Do the planned experiment and report back on the results.
8. What Was Learnt?
Aggregate and summarise what important lessons were learnt from your iteration through the loop, and in particular from your Minimum Viable Experiment.
Now, we feed our new knowledge back into the start of the process, redoing steps 1-5 and asking what has changed. It is important to record these changes for the same reason a forecaster keeps score: doing so highlights which parts of your model are working and which keep getting revised. Then, we construct a new Minimum Viable Experiment and go again.
Now that’s all laid out, on to step 1!
If you have any feedback, please leave a comment. Or, if you wish to give it anonymously, fill out my feedback form. Thanks!