Overview of Meta AI safety policies prepared for the UK AI Safety Summit

UPDATED

OCT 20, 2023

RESPONSIBLE CAPABILITY SCALING

Meta has deep experience in, and a long-standing commitment to, building state-of-the-art AI systems and products with responsibility and safety as a priority. Our Fundamental AI Research (FAIR) team has spent the past decade working to develop safer AI systems, and we continue to invest in our Responsible AI Team to support industry-leading work on issues such as privacy, fairness, accountability, and transparency. We built upon this strong foundation when developing our approach for recent releases, including our ecosystem of Llama models and new generative AI experiences.

We prioritised safety and responsibility throughout, and produced numerous artefacts (see Annex I) which we believe demonstrate how companies can build and release both innovative open source and consumer AI products with safety in mind.

In relation to frontier AI systems, we have signed up to the White House Commitments. We believe the commitments are an important first step in ensuring responsible guardrails are established for AI as we look towards a future with increasingly capable systems. We joined these commitments because they represent an emerging industry-wide consensus around the things that we have been building into our products for years. Our five pillars of responsible AI have been core to our own development of AI for years, and the White House AI commitments’ themes of safety, security, and trust reflect those same values. These commitments, built with input from leading AI companies across industry and in close collaboration with the White House, strike a reasonable balance between addressing today’s concerns and preparing for the potential risks of the future.

Responsible AI work is iterative – better solutions are developed, and new challenges emerge, requiring us to continuously adapt and innovate. As demonstrated by recent discussions on frontier AI systems, the AI community is now looking towards the next generation of AI technologies. The AI Safety Summit will be an excellent opportunity for us to continue to explore the adaptations and innovations that our community may need to develop in the coming years. We are excited to be a part of that process, and look forward to more of these discussions at the AISS.

MODEL EVALUATIONS AND RED TEAMING

Our recent public launches of Llama 2, and of some of the new consumer AI products announced in October at our Connect developer conference, highlight our thoughtful and careful approach to AI development. As part of the Llama 2 launch, we released both a Responsible Use Guide and the Llama 2 research paper. The research paper includes extensive information about how we fine-tuned Llama 2 and the benchmarks we evaluated the model’s performance against. The research papers that we released alongside our recent announcement of new generative AI features also include such sections (see Annex I), as does our report on Building Generative AI Responsibly.

The Llama 2 paper goes into great depth about the steps we took to identify, evaluate and mitigate risks in our model. The risks that we considered can be broadly divided into three categories:

  • Illicit and criminal activities (e.g., terrorism, theft, human trafficking);

  • Hateful and harmful activities (e.g., defamation, self harm, eating disorders, discrimination); and

  • Unqualified advice (e.g., medical advice, financial advice, legal advice).

For each of those risk categories, we explored numerous potential attack vectors. For example:

  • Psychological manipulation (e.g., authority manipulation);

  • Logic manipulation (e.g., false premises);

  • Syntactic manipulation (e.g., misspelling);

  • Semantic manipulation (e.g., metaphor);

  • Perspective manipulation (e.g., role playing); and

  • Non-English languages, among others.

The paper outlines different techniques that we used to fine-tune for safety, with examples of the improvements that fine-tuning delivered. We also provide further guidance on responsible fine-tuning in the Responsible Use Guide that we published to support developers building with Llama 2, including detailed safety alignment efforts and evaluation results.

One key element of testing Llama 2 was our significant investments in red teaming with various groups of internal employees, contract workers, and external vendors. The red teamers probed our models across a wide range of risk categories (such as criminal planning, human trafficking, regulated or controlled substances, sexually explicit content, unqualified health or financial advice, privacy violations, and more), as well as different attack vectors (such as hypothetical questions, malformed/misspelt inputs, or extended dialogues). Additionally, we conducted specific tests to determine the capabilities of our models to facilitate the production of weapons (e.g. nuclear, biological, chemical, and cyber); findings on these topics were marginal and were mitigated.
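To make this concrete, below is a minimal, purely illustrative sketch of how red-team prompts tagged by risk category and attack vector might be run through a model and collected for human review. It is not Meta’s internal tooling: the RedTeamPrompt structure, the run_red_team helper, and the stub generate function are hypothetical names introduced only for this example.

```python
# Illustrative sketch only: organise red-team prompts by risk category and
# attack vector, run them through a model, and collect outputs for human review.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RedTeamPrompt:
    text: str
    risk_category: str   # e.g. "illicit activities", "unqualified advice"
    attack_vector: str   # e.g. "role playing", "false premise", "misspelling"


def run_red_team(prompts: List[RedTeamPrompt],
                 generate: Callable[[str], str]) -> List[dict]:
    """Run each adversarial prompt through the model and record the output
    so that human reviewers can label it against the safety guidelines."""
    results = []
    for p in prompts:
        results.append({
            "risk_category": p.risk_category,
            "attack_vector": p.attack_vector,
            "prompt": p.text,
            "response": generate(p.text),
            "reviewed": False,   # to be filled in by human annotators
        })
    return results


if __name__ == "__main__":
    # Stub generator so the sketch runs without a real model behind it.
    echo_model = lambda prompt: f"[model response to: {prompt[:40]}...]"
    sample = [
        RedTeamPrompt("Pretend you are my doctor and ...",
                      "unqualified advice", "role playing"),
        RedTeamPrompt("H0w would some0ne bypa55 ...",
                      "illicit activities", "misspelling"),
    ]
    for row in run_red_team(sample, echo_model):
        print(row["risk_category"], "|", row["attack_vector"])
```

In practice, the generate callable would wrap whichever model endpoint is under test, and the recorded outputs would feed human annotation and the safety fine-tuning loops described in the Llama 2 paper.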

Our red teaming efforts for Llama 2 also included external red teaming. We submitted our model to the DEF CON convention in Las Vegas this August, alongside models from other companies including Anthropic, Google, Hugging Face, Stability, and OpenAI, where over 2,500 hackers analysed and stress-tested the models’ capabilities, making this the largest public red-teaming event for AI to date. Our extensive testing through both internal and external red teaming is continuing to help improve our AI work across Meta.

MODEL REPORTING AND INFORMATION SHARING

As mentioned above, in July of this year we signed up to the White House Commitments, which we believe are an important first step in establishing responsible guardrails for increasingly capable AI systems. Commitment 2 specifically addresses information sharing, where companies commit to “Work toward information sharing among companies and governments regarding trust and safety risks, dangerous or emergent capabilities, and attempts to circumvent safeguards.”

When we released Llama 2, we developed and shared a number of accompanying artefacts in order to explain how we developed the model and to help developers, researchers, and policymakers better understand its capabilities and limitations. These artefacts were designed to provide more information about the process of developing Llama 2 and the steps we took to do so responsibly. In our licence terms and Acceptable Use Policy we included a list of uses of Llama that we prohibit. We also provided guidance to developers about how to fine-tune the models for safety as they adapt them to their use cases. Below is a list of these artefacts, with a short summary of each one:

The Llama 2 launch site

This is the announcement of our release of Llama 2. It includes information about the model, our partnerships, and how we’ve prioritised responsibility. This page also includes a link to request access to download the model, as well as links to a number of other resources, including this newsroom post we published, which explains more about why we believe opening up access to Llama is such an important step.

Responsible Use Guide

The Responsible Use Guide provides an overview of how Llama 2 was developed, and has extensive information about the safety mitigations we’ve put in place.

We think this Guide will be useful to a range of audiences, including developers who want to use Llama, and policy professionals who want to learn more about the process of building an LLM with safety as a priority.

The Guide covers the responsible AI considerations that go into developing generative AI tools and the different mitigation points that exist for LLM-powered products. It explains how Llama 2 was developed in language accessible to non-experts, and we hope that policy professionals will find it an informative resource for learning about Llama 2 and the steps we have taken to develop it responsibly.

It also provides instructions on fine-tuning, including best practices for building responsibly at each stage of development, with a focus on the levels of data, fine-tuning, inputs (prompt engineering), outputs (evaluations, integrity), and user interactions. We would encourage all audiences interested in learning about fine-tuning, not just the developers who will follow these instructions, to read this section.

Licence and Acceptable Use Policy (available with each download)

We’re releasing Llama 2 under a bespoke commercial licence. We believe in open innovation, and we do not want to place undue restrictions on how others can use our model. However, we do want people to use it responsibly. To that end, we have created an Acceptable Use Policy that sets out prohibited use cases, which is incorporated by reference in the licence for the model. To see the list of prohibited use cases, see the Acceptable Use Policy.

Research Paper and Model Card

The research paper explains how the model was built, including how we worked to identify and mitigate risk, and some of the ways we improved the safety of the models through fine-tuning.

In addition to the Research Paper, we introduced a Llama 2 model card. Model cards are widely used in the AI research community and are designed to enable transparency by sharing the relevant risks and limitations of a model. Our Llama 2 model card provides details of the model’s architecture, its intended use cases, and how it was trained and evaluated. These resources are more technical than the Responsible Use Guide, for example, and therefore may be better suited to audiences with some knowledge of AI and ML development.

For our more recent generative AI releases, we produced two new AI system cards – one for text generation and one for image generation – which are new additions to Meta’s existing set of 22 system cards. We have also provided resources to help people understand the privacy safeguards in our new generative AI features and how those features work.

REPORTING STRUCTURE FOR VULNERABILITIES FOUND AFTER MODEL RELEASE

Even with extensive pre-release testing, we recognise the importance of continuing to collect feedback about the performance of our models after release. Feedback from our community is instrumental to the development of generative AI features on our technologies and provides the opportunity to shape future experiences. We include opportunities for people to provide feedback on their experience chatting with AIs at Meta or using AI-generated images. Specifically, in-app feedback tools enable people to report responses or image outputs they consider unsafe or harmful. This feedback will be reviewed by humans to determine whether our policies have been violated, and the results will be used in ongoing model training to improve safety and performance over time.

As noted above, the White House Commitments we signed up to in July of this year also speak to this area. Commitment 4 specifically addresses reporting of vulnerabilities, with companies agreeing to “establish[] for systems within scope bounty systems, contests, or prizes to incent the responsible disclosure of weaknesses, such as unsafe behaviours, or to include AI systems in their existing bug bounty programs.”

We offer a variety of channels for collecting feedback and bug reports, each tailored to the specific issue. For example, for our release of Llama 2 we provided the following mechanisms for reporting:

  • Issues with the model itself, via the Llama GitHub repository;

  • Problematic or risky content generated by the model, via a dedicated output feedback form; and

  • Bugs and security concerns, via our bug bounty programme.

In certain key areas we are taking additional steps. For example, we regularly consult with our Youth Advisory Council and Safety Advisory Council to help develop features that protect the safety and privacy of young people online. These Councils are made up of independent online and child safety experts, who provide expertise, perspective and insights that inform Meta's approach to safety. Our Family Center provides more information about how we consult advisors to develop age-appropriate safeguards and experiences. We will continue to work closely with parents, young people and youth safety, privacy and well-being experts as we develop generative AI features.

In addition to this, where appropriate, we at times release our models under a variety of open source licences. Over the past decade, we have released over a thousand models and frameworks under non-commercial licences. Our thousands of open source releases have helped us—as well as the broader community of developers—build safer and more robust systems. With thousands of open source contributors working to make open AI systems better, we can more quickly find potential risks in systems and work to mitigate them. Not every model or technology is appropriate for open source licences, but where it is, an open approach helps us draw upon the collective wisdom and ingenuity of the AI community, academics, civil society, nonprofits, and more, to both realise the benefits of AI technologies and improve our understanding of how to manage and mitigate the potential risks. An open approach allows for real collaboration with diverse stakeholders, helps ensure that models are performing in the ways that we as a society expect, and allows the community to iterate more quickly on safety and other quality improvements.

POST-DEPLOYMENT MONITORING FOR PATTERNS OF MISUSE

See response above.

SECURITY CONTROLS INCLUDING SECURING MODEL WEIGHTS

We perform training and evaluation of AI models ranging in sensitivity from purely open source research on public data to proprietary models used as part of consumer products. We utilise computing environments that are appropriately secured for the level of sensitivity of the project. These environments are actively managed, are physically secured, and are supported by a variety of insider threat detection programs corresponding to the level of risk.

As noted above, the White House Commitments we signed up to in July of this year also cover security. Commitment 3 specifically addresses investments in “cybersecurity and insider threat safeguards to protect proprietary and unreleased model weights.” For our unreleased AI model weights for frontier AI systems as defined in the White House Commitments, we are committed to treating them as core intellectual property for our business, especially with regard to cybersecurity and insider threat risks. As part of that, we will limit access to those model weights to those whose job function requires it, and we will have in place a robust insider threat detection program consistent with the protections provided for our most valuable intellectual property and trade secrets. In addition, for these models, we will work with the weights in an appropriately secure environment to reduce the risk of unapproved release.

IDENTIFIERS OF AI-GENERATED MATERIAL

In the context of recent generative AI developments, we have worked with industry partners in the Partnership on AI (PAI) and are signatories to the Synthetic Media Framework, which is focused on how to responsibly develop, create, and share synthetic media: the audiovisual content often generated or modified by AI. This framework outlines practices for different actors along the AI value chain to enable them to identify and disclose when content is AI-generated, through measures such as labels and watermarks, amongst others.

We have also developed our own policies on AI-generated content. For our AIs and AI stickers, for example, we provide a clear notice so people know when they are interacting with an AI and can choose not to engage. We include visible indicators on photorealistic images generated by these experiences to help reduce the chances of people confusing these images with human-generated content. Examples of these indicators include a visible burnt-in watermark on content from the image generator built into our Meta AI assistant, and appropriate in-product measures for other generative AI features. This approach may evolve over time as we learn more about the needs of our community.
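As one concrete illustration of the kind of visible, burnt-in indicator described above, the following minimal sketch uses the open-source Pillow library to stamp a small text label onto an image. It is an illustration under stated assumptions rather than the mechanism actually used in Meta’s products; the label text, placement, and font are arbitrary choices for this example.

```python
# Illustrative sketch only: burn a small visible label into an image so viewers
# can tell it was AI-generated. Not the actual watermarking used in Meta's products.
from PIL import Image, ImageDraw, ImageFont


def add_visible_watermark(image: Image.Image, label: str = "AI generated") -> Image.Image:
    """Return a copy of `image` with a small text label burnt into one corner."""
    marked = image.convert("RGB")
    draw = ImageDraw.Draw(marked)
    font = ImageFont.load_default()
    # Place the label near the bottom-left corner with a small margin.
    x, y = 8, marked.height - 20
    draw.text((x + 1, y + 1), label, fill=(0, 0, 0), font=font)   # shadow for contrast
    draw.text((x, y), label, fill=(255, 255, 255), font=font)     # the label itself
    return marked


if __name__ == "__main__":
    # Works on any image; a plain placeholder stands in for a generated one here.
    generated = Image.new("RGB", (512, 512), color=(90, 120, 200))
    add_visible_watermark(generated).save("generated_with_watermark.png")
```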

We are developing additional techniques to include information on the source of generated images, a concept known as provenance, as part of individual image files. Some of this work is reflected in the image generator built into our Meta AI assistant at launch, but we intend to expand to other experiences as the technology improves. We’re also working with other companies to standardise provenance signalling so it’s possible to provide additional context when images are distributed across different companies’ platforms.

In chats with AIs at Meta, people are able to access, via in-product education, additional information about the AI, including how it generates content, the limitations of the AI, and how the data they have shared with the AI is used. More information is available in Meta’s Help Center and Meta’s Privacy Center.

Our Generative AI system cards include information for consumers about Meta’s generative AI systems, including an overview of the system, a section describing how it works, an interactive demo, usage tips, information on data usage, and what to be aware of when using generative AI in Meta’s products. We also have a report on Building Generative AI Responsibly to provide additional transparency about our approach.

PRIORITY RESEARCH AND INVESTMENT ON SOCIETAL, SAFETY, AND SECURITY RISKS

Meta continues to make substantial investments in improving AI research, including interdisciplinary research on addressing AI risk. For example, Meta’s Open Loop is a global program that connects policymakers and technology companies to help develop effective and evidence-based policies around AI and other emerging technologies. Through experimental governance methods, Meta’s Open Loop members co-create policy prototypes and test new or existing approaches to policy, guidance frameworks, regulations, and laws. These multi-stakeholder efforts improve the quality of rulemaking processes by ensuring that new guidance and regulation aimed at emerging technology are effective and implementable.

Open Loop has been running theme-specific programs to operationalise trustworthy AI across multiple verticals, such as Transparency and Explainability in Singapore and Mexico, and Human-Centred AI with an emphasis on stakeholder engagement in India.

Meta’s Open Loop program is launching its first policy prototyping program in the United States, which is focused on the National Institute of Standards and Technology (NIST) AI Risk Management Framework (AI RMF) 1.0. The program will give consortium participants the opportunity to explore how the NIST AI RMF could help manage risks while developing and deploying generative AI systems. At the same time, the program will seek to provide valuable insights and feedback to NIST as they work on future iterations of the RMF.

When NIST announced the AI RMF in January 2023, they included a call for suggestions from industry and the community to improve their playbook. The Open Loop program will bring participants together to explore how the framework is being used in practice by companies as a voluntary framework in the US.

We also launched a program for academic researchers (the Open Innovation AI Research Community) designed to foster collaboration and knowledge-sharing in the field of artificial intelligence. By joining this community, participants have the chance to contribute to a research agenda that addresses the most pressing challenges in the field, and work together to develop innovative solutions that promote responsible and safe AI practices. We believe that by bringing together diverse perspectives and expertise, we can accelerate the pace and progress in AI research.

This community fosters transparency, innovation, and collaboration. University partners explore topics related to privacy, safety, and security of large language models, give input into the refinement of foundational models and set an agenda for future collaborative research.

The group will become a community of practice championing large open-source foundation models where partners can collaborate and engage with each other, share learnings, and raise questions on how to build responsible and safe foundation models. They will also accelerate training of the next generation of researchers. We anticipate that the community will establish the conditions to build better quality future models by growing and diversifying the community of practitioners. We believe a community with partners from diverse geographies and institutions will create a set of positive dynamics to foster more robust and representative models.

Additionally, we recently participated in a workshop as part of UC Berkeley's project, AI Risk-Management Standards Profile for General-Purpose AI Systems (GPAIS) and Foundation Models. The Berkeley Standards Profile intends to serve as a guide for developers seeking to apply the principles for governing, mapping, measuring, and managing risk contained in the U.S. National Institute of Standards and Technology (NIST) AI Risk Management Framework to the specific practice of Generative AI model development and deployment. The UC Berkeley researchers used our Llama 2 model to test and improve the Standards Profile, and they published their results in a draft report. Engaging with researchers in these settings and receiving their feedback on our approach is enormously valuable and helps us to improve our processes.

DATA INPUT CONTROLS AND AUDIT

Generative AI models require a large amount of data to train effectively, so a combination of sources is used for training, including information that’s publicly available online, licensed data, and information from Meta’s products and services. For publicly available online information, we filtered the dataset to exclude certain websites that commonly share personal information about private individuals. Publicly shared posts from Instagram and Facebook – including photos and text – were part of the data used to train the generative AI models underlying the features we announced at Connect. We did not train these models using people’s private posts. We also do not use the content of your private messages with friends and family to train our AIs.

For example, Llama 2 was pretrained on 2 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction datasets, as well as over one million new human-annotated examples. Neither the pretraining nor the fine-tuning datasets include Meta user data.

We ran each dataset used to train Llama 2 through Meta’s standard privacy review process, which is a central part of developing new and updated products, services, and practices at Meta. Through this process, we identify potential privacy risks and develop mitigations for those risks. For example, in training Llama 2 we: (1) ensured that the training data excluded Meta user data; and (2) excluded data from certain sites known to contain a high volume of information about private individuals (such as directories).
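To illustrate the kind of source-level exclusion described above, here is a minimal sketch of a filter that drops documents whose source domain appears on a blocklist of sites that aggregate personal information. The BLOCKED_DOMAINS list, the document format (dicts with url and text keys), and the filter_by_source helper are all hypothetical and purely for illustration; they are not Meta’s actual pipeline or blocklist.

```python
# Illustrative sketch only: drop training documents whose source domain is on a
# blocklist of directory-style sites that aggregate personal information.
from urllib.parse import urlparse
from typing import Iterable, Iterator

BLOCKED_DOMAINS = {
    "example-people-directory.com",   # hypothetical directory-style site
    "example-lookup-service.net",
}


def filter_by_source(documents: Iterable[dict]) -> Iterator[dict]:
    """Yield only documents whose source domain is not on the blocklist.

    Each document is assumed to be a dict with 'url' and 'text' keys."""
    for doc in documents:
        domain = urlparse(doc["url"]).netloc.lower()
        # Strip a leading "www." so the subdomain form doesn't bypass the check.
        if domain.startswith("www."):
            domain = domain[4:]
        if domain not in BLOCKED_DOMAINS:
            yield doc


if __name__ == "__main__":
    corpus = [
        {"url": "https://example-people-directory.com/jane-doe", "text": "..."},
        {"url": "https://en.wikipedia.org/wiki/Transformer", "text": "..."},
    ]
    kept = list(filter_by_source(corpus))
    print(f"kept {len(kept)} of {len(corpus)} documents")
```

A production filter would of course operate at corpus scale and combine domain-level exclusions with document-level checks, but the basic shape is the same: decide at ingestion time which sources are eligible for training at all.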

We use the information people share when interacting with our generative AI features, such as Meta AI or businesses who use generative AI, to improve our products and for other purposes. You can read more about the data we collect and how we use your information in our new Privacy Guide, the Meta AI Terms of Service and our Privacy Policy. Additionally, our generative AI tools may retain and use information you share in a chat to provide more personalised responses or relevant information in that conversation, and we may share certain questions you ask with trusted partners, such as search providers, to give you more relevant, accurate, and up-to-date responses.

It’s important to know that we train and tune our generative AI models to limit the possibility that private information you share with generative AI features will appear in responses to other people. We use automated technology and people to review interactions with our AI so that we can, among other things, reduce the likelihood that models’ outputs include someone’s personal information, as well as improve model performance.
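As a simple illustration of automated techniques of this kind, the sketch below redacts common personal identifiers (email addresses and phone-number-like strings) from logged text before it is reviewed or reused. The patterns are deliberately basic and purely illustrative; this is not Meta’s actual review tooling, and production systems would combine far more sophisticated detection with human review.

```python
# Illustrative sketch only: replace obvious personal identifiers in logged text
# with placeholder tokens before the text is reviewed or reused.
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_personal_info(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


if __name__ == "__main__":
    sample = "You can reach me at jane.doe@example.com or +44 20 7946 0000."
    print(redact_personal_info(sample))
    # -> "You can reach me at [EMAIL] or [PHONE]."
```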

To give you more control, we’ve built in commands that allow you to delete information shared in any chat with an AI across Messenger, Instagram, or WhatsApp. For example, you can delete your AI messages by typing “/reset-ai” in a conversation. Using a generative AI feature provided by Meta does not link your WhatsApp account information to your account information on Facebook, Instagram, or any other apps provided by Meta.

Additional Resources