This is an idea I had for my capstone project on EA Cambridge’s AGI Safety Fundamentals (governance track) course. I am sharing live progress here for better accountability and feedback.
It is of paramount importance that developers of large AI models are fully aware of the dangers of large misaligned models (which are perhaps the greatest source of existential risk facing humanity this century). My hope is that if all developers of large models both understood these risks deeply and kept them salient whilst working, then the chance of building a misaligned human-level artificial intelligence, and with it the chance of existential catastrophe, would be reduced.
Summary of the theories of impact
- AI developers adopt safer practices in their work
- AI developers are empowered to whistleblow on dangerous/reckless projects
- AI developers value working for safer companies/organisations, and companies competing for talent will recognise this
- Occupational licensing and regulation discourage the supply of AI developers, causing progress to slow
- Katja Grace’s survey (2017) https://arxiv.org/pdf/1705.08807.pdf
- GCRI survey (2020) https://gcrinstitute.org/papers/055_agi-2020.pdf
The main result here is a distribution of responses for subjective probabilities of different outcomes. This is only really useful for my argument if there is a ‘true’ benchmark to compare against. e.g. if a respondent puts a 0.1% chance on ‘catastrophically bad outcome’, is this evidence they don’t care enough about safety? What does ‘enough’ mean? Is there an ideal amount we want researchers to care about safety?
The GCRI survey identified 72 AGI projects in 2020 versus 45 in 2017. About half of the projects were based in the US, which suggests that US-based regulation should probably be a priority.
Many of the projects had no stated emphasis or awareness of safety and some were actively dismissive of safety concerns.
The main issue with interpreting this survey is how to weight projects by how likely they are to succeed. Presumably how much DeepMind cares about safety matters substantially more than how much a small university project in Finland does.
Another issue is that dangerous AGI might not come from an organisation explicitly trying to build an AGI: it might just come from a very capable narrow AI model.
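The weighting problem above can be made concrete with a small sketch. Everything in it is hypothetical: the project names, the `success_weight` values and the `safety_engaged` flags are made up for illustration and are not taken from the GCRI survey. It only shows how a raw count of projects and a success-weighted share can give quite different pictures.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    success_weight: float  # hypothetical subjective weight, NOT survey data
    safety_engaged: bool   # hypothetical coding of the project's stance


def unweighted_safety_share(projects):
    """Fraction of projects that engage with safety, counting each equally."""
    return sum(p.safety_engaged for p in projects) / len(projects)


def weighted_safety_share(projects):
    """Fraction of total success weight held by safety-engaged projects."""
    total = sum(p.success_weight for p in projects)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    engaged = sum(p.success_weight for p in projects if p.safety_engaged)
    return engaged / total


# Illustrative, made-up portfolio: one large lab and three smaller projects.
projects = [
    Project("large lab A", 60, True),
    Project("small project B", 5, False),
    Project("small project C", 5, False),
    Project("mid-size lab D", 30, False),
]

unweighted = unweighted_safety_share(projects)
weighted = weighted_safety_share(projects)
print(f"unweighted share: {unweighted:.2f}, weighted share: {weighted:.2f}")
```

In this toy portfolio only one project in four engages with safety, but it holds most of the weight, so the weighted share is much higher. The hard part, which the sketch does not solve, is choosing the weights.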
- Short answer: I don’t know, because this is quite a technical question and no one knows exactly how AGI will happen anyway
- A more tractable question: are there robustly good practices that developers of large AI models could adopt that reduce the risk of catastrophic outcomes?
“slowing down AI development is robustly good”
https://www.nber.org/papers/w25262 (2018) - finds that occupational licensing reduces equilibrium employment by 17-27%. Pretty good methodology (boundary discontinuities between US states).
- issues with external validity: the AI sector is probably much faster growing, more lucrative and more international than the industries surveyed in these papers.
- All these studies show increased wages too (due to reduced supply)
- Another theory of change: labour is less mobile
- industry self-regulation? https://openai.com/blog/cooperation-on-safety/
- global vs national?
- existing vs new regulatory body?
(maybe look at the Institute of Nuclear Power Operations)
Hunch: probably national for now
Hunch: new body probably needed eventually.
- To whom should it apply?
- Developers writing or designing code for models above some size threshold (>X parameters) or with access to compute above some threshold (>Y FLOPs), whether private, corporate, academic or nonprofit
- & their managers
- university compsci syllabuses
- How burdensome can regulators be before companies seek to escape their jurisdiction?
- How do countries coordinate on regulations to avoid races to the bottom?
- What can we learn from other industries? Which are the best comparisons? What do analogous regulations look like?
- some bio links: https://www.knaw.nl/en/news/publications/a-code-of-conduct-for-biosecurity/@@download/pdf_file/20071092.pdf and https://www.ukhsa-protectionservices.org.uk/nadp/courses/
- What should be taught (and examined) to practitioners, and how often should it be refreshed?
- Robustly good development principles: infosec, verifiability, transparency
- Awareness of the arguments for misalignment risk, frontiers in safety research, and possible methods of solving some of these problems
- other burdensome regulation: background checks, ethics/values test (could this backfire?)
LH thinks managers matter more than developers (they have the profit-maximising motive). There is also leverage: fewer people to influence or regulate. ‘Human crunch time’: a developer comes across a very capable AI. Who makes the decision to release it into the wild? These are the people who would presumably matter most.
Ideally I’d want something burdensome enough that fewer people work on AI.
- How to make it happen
- Politics and history of how industries become regulated, specifically through occupational licensing.
- The Precipice, Toby Ord, 2020