Below you will find the Plurality Public Comment on U.S. AI Safety Institute Guidance. Public commenting is an important opportunity to have a voice on the topic at hand and is essential to providing input into the development of effective rules and regulations that serve the community.
The U.S. AI Safety Institute’s guidelines for managing AI misuse risks are commendable, especially their focus on mitigating risks before deployment. The principles in Objectives 5 and 6, which emphasize both pre-deployment safety and ongoing post-deployment monitoring, are particularly strong. The recommendations for independent third-party safety research, external safety testing, and internal reporting protections are also welcomed. Overall, the draft guidance offers a solid foundation for ensuring the safety of dual-use AI models for the public.
We commend the leadership of the U.S. AI Safety Institute for establishing guidelines for managing misuse risk and, importantly, for its bedrock principle that risk be properly managed and mitigated before AI deployment. This is a positive step forward for AI safety and a promising direction for the development of standards in the field. In particular, we felt the recommendations in Objectives 5 and 6 were especially strong. We appreciated the recognition that safety spans the full lifecycle of AI – not only the pre-deployment stages of development, but also post-deployment monitoring and response. We agree with the need to provide safe harbors for independent third-party safety research, the need to establish a robust regime of both external safety testing of models and protections for internal reporting of safety concerns, and the creation of other internal processes and norms that will set an organization up for success. We believe that this draft guidance is a strong starting point for the guidelines needed to ensure that dual-use foundation models are as safe as possible for the public.
Below, we offer a few suggestions for your consideration in revision which we believe will further strengthen the guidance:
Consider open-source models.
Many leading companies today are releasing open-source AI models (e.g. Meta’s Llama 3), which have already seen rapid adoption. This guidance does not seem to adequately address the open-source approach to model deployment. For example, mentions of model theft (Objective 3) would not be as relevant. If open-source development is purposely out-of-scope, we believe it would be helpful to add more information addressing this in the Scope and Key Challenges sections.
Too much is left up to developers’ own risk thresholds and determinations.
While the guidance is understandably written to be flexible, we worry that far too much is deferred to developers’ own risk thresholds without clear guidance to standardize or categorize the degree of risk, such as in Objective 2 Practice 2.1, Objective 3 Practice 3.2, and Objective 5 Practice 5.3. An organization’s own interpretation of the risks of its AI models is highly subjective, and history suggests that market incentives will drive many developers to underrate the safety risks their AI systems pose to the public. To address this, we recommend that the guidance further define acceptable risk thresholds for developers to follow.
The guidance seems to assume a high degree of internal capacity and expertise among model developers to understand and parse dual-use risks to society. How will such developers – especially lesser-resourced ones, but even the largest companies and labs – acquire and leverage multifaceted risk expertise to ascertain what kinds and degrees of risk their models pose to individuals and society? If they are making use of external expertise, how can the public trust and validate that expertise? This concern applies to most of the objectives in the draft guidance. We recommend that the guidance advise model developers on how best to find, requisition, and make use of threat and risk expertise to pressure-test their models.
Create clear, transparent, and public deployment criteria to guide decision making.
Defined criteria and thresholds, established in the early phases of a project, should be used to make decisions that either justify deployment or serve as a “tripwire” that can block or roll back a deployment. While this type of consideration is mentioned in Objective 5 Practice 5.3, we believe it should be strengthened, and that clear, transparent, and public criteria – aligned with risk thresholds and mitigation processes – should be established by AI developers well in advance to guide deployment decisions.
Perhaps a flowchart-style decision tree (e.g., the Frontier AI Regulation Blueprint) for these objectives would be helpful, so that “stop, go back” steps can be included when deployment criteria are not met.
Cultivate internal incentives, cultures, and norms for anticipating, reporting, and mitigating safety risks upstream in the development process.
We appreciate the inclusion of recommendations for creating incentive systems, such as in Objective 6 Practice 6.5, but believe the draft guidance would benefit from additional measures and a strong overall call for developers to institute robust internal incentive systems and cultures for addressing safety risks. Beyond strengthening and expanding the bounty program, these could include rewards for employees who identify and report safety issues, channels of direct communication to company leaders, and other mechanisms that encourage raising concerns, so that all actors are aligned around identifying and mitigating risks early and often.
We support Objective 6 Practice 6.3 establishing protections for internal reporting, but would recommend moving this to the pre-deployment stage, as well as including documentation about how the whistleblower protections have been communicated to staff. It is important that employees clearly understand their protections, and documenting communications encourages companies to be more transparent and forthright with their employees about such protections.
Include rapid incident reporting to relevant authorities.
As this is such a quickly evolving space, information sharing will be critical to refining the ability to identify and respond to threats. In Objective 7, if a misuse issue does occur, we recommend adding reporting, escalation, or notification steps to relevant authorities and partners, as well as to the U.S. AI Safety Institute. We would also encourage sharing findings back with the developer of the proxy model used. More generally, additional guidance on how information should flow, and to whom, would be incredibly helpful in strengthening this iterative process.
Add direct references to help operationalize this guidance where possible.
We think this guidance could be strengthened by a few direct references to common benchmarks (Objective 1 Practice 1.3), proxy models one might use (Objective 4), and case studies – perhaps using a referenced proxy model – of the suggested threat models and impact assessments.
The comments provided are from members of our research community. For any additional information, please reach out to Sarah Hubbard (sarah_hubbard@hks.harvard.edu).
The year 2024 was dubbed “the largest election year in global history,” with half the world’s population voting in national elections. Earlier this year, we hosted an event on AI and the 2024 Elections where scholars spoke about the potential influence of artificial intelligence on the election cycle – from misinformation to threats on election infrastructure. This webinar offered a reflection and exploration of the impacts of technology on the 2024 election landscape.
Earlier this year, the Allen Lab for Democracy Renovation hosted a convening on the Political Economy of AI. This collection of essays from leading scholars and experts raises critical questions surrounding power, governance, and democracy as they consider how technology can better serve the public interest.
As a part of the Allen Lab’s Political Economy of AI Essay Collection, David Gray Widder and Mar Hicks draw on the history of tech hype cycles to warn against the harmful effects of the current generative AI bubble.