Prevalence

UPDATED JUL 29, 2021

Our goal is to minimize the impact caused by violations of our policies on people using our services. We measure prevalence of violating content to gauge how we’re performing against that goal.

What is prevalence

Prevalence considers all the views of content on Facebook or Instagram and measures the estimated percentage of those views that were of violating content. (Learn more about how we define views in “Why we measure the prevalence of views.”) This metric assumes that the impact caused by violating content is proportional to the number of times that content is viewed.

Another way to think of prevalence is how many views of violating content we didn’t prevent — either because we haven’t caught the violations early enough or we missed them altogether.

How we measure prevalence

Prevalence of violating content is estimated using samples of content views from across Facebook or Instagram. We calculate it as: the estimated number of views that showed violating content, divided by the estimated number of total content views on Facebook or Instagram. If the prevalence of adult nudity and sexual activity was 0.18% to 0.20%, that would mean of every 10,000 content views, 18 to 20 on average were of content that violated our standards for adult nudity and sexual activity.

[Figure: 10,000 total views, of which 20 are violating content views; 1 dot = 10 views]

If prevalence was 0.20%, that would mean that for every 10,000 views, 20 views were of violating content. Although these numbers can be very low, even a small number of violating views can have a significant impact on people.
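The arithmetic behind this example can be written out as a minimal sketch. The function names `prevalence` and `per_10k` are illustrative only and do not come from the original text.

```python
def prevalence(violating_views: float, total_views: float) -> float:
    """Return prevalence as a fraction of all content views."""
    if total_views <= 0:
        raise ValueError("total_views must be positive")
    return violating_views / total_views


def per_10k(fraction: float) -> float:
    """Express a fraction as views per 10,000 content views."""
    return fraction * 10_000


# The example from the text: 20 violating views out of 10,000 total views.
p = prevalence(violating_views=20, total_views=10_000)
print(f"{p:.2%}")         # 0.20%
print(round(per_10k(p)))  # 20
```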

Some types of violations occur very infrequently on our services. The likelihood that people view content that violates these policies is very low, and we remove much of that content before people see it. As a result, we often do not find enough violating samples to precisely estimate prevalence. In these cases, we can estimate an upper limit of how often someone would see content that violates these policies. For example, if the upper limit for terrorist propaganda was 0.04%, that means that out of every 10,000 views on Facebook or Instagram in that time period, we estimate that no more than 4 of those views contained content that violated our terrorist propaganda policy.

It’s important to note that when the prevalence of a violation type is so low that we can only provide upper limits, this limit may change by a few hundredths of a percentage point between reporting periods. However, changes this small may not be statistically significant; in such cases, these small changes do not indicate an actual difference in the prevalence of this violating content on the service.

Why we measure the prevalence of views

We estimate how often content is seen rather than the amount of content posted because we want to determine how much that content affected people on Facebook or Instagram. A piece of violating content could be published once but seen 1,000 times, 1 million times or not at all. Measuring views of violating content rather than the amount of violating content published better reflects the impact on the community. A small prevalence number can still correspond to a large amount of impact on our services, due to the large number of overall views of content on our services.
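To make the difference between counting content and counting views concrete, here is a small sketch on made-up data. The `posts` list and both function names are hypothetical and chosen only for illustration.

```python
# Hypothetical toy data: (is_violating, number_of_views) per piece of content.
posts = [
    (True, 1_000),   # one violating post that was seen many times
    (True, 0),       # one violating post that nobody saw
    (False, 500),
    (False, 8_500),
]


def content_share(items):
    """Fraction of posted items that violate, ignoring how often they were seen."""
    return sum(1 for is_violating, _ in items if is_violating) / len(items)


def view_share(items):
    """Fraction of all views that landed on violating content."""
    total = sum(views for _, views in items)
    violating = sum(views for is_violating, views in items if is_violating)
    return violating / total if total else 0.0


print(content_share(posts))  # 0.5
print(view_share(posts))     # 0.1
```

In this toy data half of the posted items violate, but only one view in ten was of violating content, and the unseen violating post contributes nothing to the view-based figure.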

We record a content view when a piece of content appears on a user’s screen. Specifically, a view happens when someone:

  • Views a post – even if there are multiple pieces of content in that post, the view is assigned to the post

  • Clicks to enlarge a photo or video player – the view is assigned to the photo or video
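The two rules above can be sketched as a simple tally. The event format `(kind, post_id, media_id)` and the event names `post_shown` and `media_enlarged` are assumptions made for this example, not a description of any real logging schema.

```python
from collections import Counter


def tally_views(events):
    """Count views per item from (kind, post_id, media_id) tuples.

    Hypothetical event kinds:
      "post_shown"     -> the view is assigned to the post
      "media_enlarged" -> the view is assigned to the photo or video
    """
    views = Counter()
    for kind, post_id, media_id in events:
        if kind == "post_shown":
            views[("post", post_id)] += 1
        elif kind == "media_enlarged":
            views[("media", media_id)] += 1
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return views


events = [
    ("post_shown", "p1", None),
    ("post_shown", "p1", None),
    ("media_enlarged", "p1", "m7"),
]
print(tally_views(events))
```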

How we use sampling to estimate prevalence

We estimate prevalence by sampling content views on Facebook or Instagram.

To do this, we manually review samples of views and the content shown in them. Then we label the samples as violating or not violating according to our policies. The teams who do this sampling review the entire post for violations, even if the sampled view didn’t expose all the content in the post.

Using the portion of these samples that were of violating content, we estimate the percentage of all views that were of violating content. Note that we do not sample from every part of Facebook or Instagram for every violation type.
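As a rough illustration of turning labeled samples into an estimate with a range, the sketch below computes a sample proportion and a 95% Wilson score interval. The choice of the Wilson interval is an assumption for this example; the text does not say which interval method is actually used, and the sample counts are invented.

```python
import math
from statistics import NormalDist


def wilson_interval(violating: int, sampled: int, confidence: float = 0.95):
    """Return (low, high) bounds on the violating share of views.

    Uses the Wilson score interval, an illustrative choice rather than
    a method confirmed by the source.
    """
    if sampled <= 0 or not 0 <= violating <= sampled:
        raise ValueError("need 0 <= violating <= sampled and sampled > 0")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = violating / sampled
    denom = 1 + z**2 / sampled
    center = (p_hat + z**2 / (2 * sampled)) / denom
    half = z * math.sqrt(
        p_hat * (1 - p_hat) / sampled + z**2 / (4 * sampled**2)
    ) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Invented numbers: 190 of 100,000 sampled views were labeled violating.
low, high = wilson_interval(violating=190, sampled=100_000)
print(f"{low * 10_000:.1f} to {high * 10_000:.1f} per 10,000 views")
```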

For certain violation types, we use stratified sampling, which increases the sample rate if the context indicates the content view is more likely to contain a violation. For example, if violations were viewed more frequently in Groups than in News Feed, we would sample views in Groups with a higher probability than we sample views in News Feed. One reason we do this is to reduce the uncertainty due to sampling. We express this uncertainty by quoting a range of values, for example by saying that 18 to 20 out of every 10,000 views are of content that violates our standards for adult nudity and sexual activity. This range reflects a 95% confidence window: if we performed this measurement 100 times using different samples each time, we would expect the true number to lie within the computed range in about 95 of the 100 measurements.
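A stratified estimate has to weight each stratum by its share of all views, otherwise the oversampled surface would dominate the result. The sketch below shows one standard way to do this, with a normal-approximation interval. The `Stratum` class, the view shares, and the counts are all hypothetical, and the source does not confirm that this is the estimator actually used.

```python
import math
from dataclasses import dataclass
from statistics import NormalDist


@dataclass
class Stratum:
    name: str
    view_share: float  # fraction of all content views that fall in this stratum
    sampled: int       # number of views sampled from this stratum
    violating: int     # sampled views labeled as violating


def stratified_prevalence(strata, confidence: float = 0.95):
    """Return (estimate, low, high) using view-share weights per stratum."""
    if not math.isclose(sum(s.view_share for s in strata), 1.0, abs_tol=1e-9):
        raise ValueError("view shares must sum to 1")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    estimate = 0.0
    variance = 0.0
    for s in strata:
        p = s.violating / s.sampled
        estimate += s.view_share * p
        variance += s.view_share**2 * p * (1 - p) / s.sampled
    half = z * math.sqrt(variance)
    return estimate, max(0.0, estimate - half), min(1.0, estimate + half)


# Invented numbers: Groups holds 20% of views but gets half of the sample.
strata = [
    Stratum("Groups", view_share=0.2, sampled=50_000, violating=300),
    Stratum("News Feed", view_share=0.8, sampled=50_000, violating=45),
]
estimate, low, high = stratified_prevalence(strata)
print(f"{estimate * 10_000:.1f} per 10,000 views")  # 19.2 per 10,000 views
```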

For violation types that are viewed very infrequently, sampling requires a very large number of content samples to estimate a precise prevalence measure. For these types of violations, rather than use stratified sampling, we do random sampling. In these cases, we can only estimate the upper limit, meaning we are confident that the prevalence of violating views is below that limit, but we cannot precisely say how far below. Our confidence window for these upper limits is also 95%.
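One common way to obtain such an upper limit from a random sample with very few violating views is a one-sided exact binomial (Clopper-Pearson) bound. The sketch below finds it by bisection using only the standard library. This method is an assumption for illustration; the text does not specify how the upper limits are computed, and the sample size is invented.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), with terms computed in log space."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if k >= n else 0.0
    total = 0.0
    for i in range(k + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_term)
    return min(1.0, total)


def upper_limit(violating: int, sampled: int, confidence: float = 0.95) -> float:
    """One-sided upper bound on prevalence given `violating` hits in `sampled` views."""
    if violating >= sampled:
        return 1.0
    alpha = 1.0 - confidence
    lo, hi = 0.0, 1.0
    for _ in range(100):  # bisection: find p where P(X <= violating) equals alpha
        mid = (lo + hi) / 2
        if binom_cdf(violating, sampled, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


# Invented numbers: no violating views found among 10,000 sampled views.
bound = upper_limit(violating=0, sampled=10_000)
print(f"no more than about {bound * 10_000:.1f} per 10,000 views")  # about 3.0
```

With zero violating samples this bound reduces to 1 - 0.05^(1/n), which is close to the familiar "rule of three" approximation of 3/n.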

Caveats

The people who apply labels to our samples sometimes make mistakes, including labeling violations as non-violating or vice versa. The relative rate of these mistakes could impact the prevalence measurement. We use audits to measure errors and then adjust the prevalence calculation to account for them. For areas such as violent and graphic content, where we cover posts that may be disturbing to some audiences, our prevalence calculation only accounts for views of that content before the cover was added.
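To show how measured labeling error rates could feed into an adjustment, the sketch below applies the Rogan-Gladen correction, a standard technique for correcting a proportion measured with an imperfect classifier. It is swapped in here purely as an illustration; the source does not say which adjustment is used. The parameters `sensitivity` (share of true violations labeled violating) and `specificity` (share of non-violations labeled non-violating) are assumed to come from audits.

```python
def adjust_for_label_errors(
    observed_rate: float, sensitivity: float, specificity: float
) -> float:
    """Rogan-Gladen correction of an observed violating rate.

    observed = true * sensitivity + (1 - true) * (1 - specificity),
    solved for `true` and clipped to the range [0, 1].
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0.0:
        raise ValueError("labels must be better than chance")
    corrected = (observed_rate + specificity - 1.0) / denom
    return min(1.0, max(0.0, corrected))


# Invented audit results: reviewers catch 90% of violations and wrongly
# flag 0.05% of non-violating views.
observed = 0.002299
print(f"{adjust_for_label_errors(observed, 0.90, 0.9995):.4%}")
```

Because violations are rare, even a small false-positive rate can inflate the observed figure noticeably, which is why the correction depends on both error rates.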

Prevalence for fake accounts on Facebook

Prevalence for fake accounts on Facebook is an estimate of the percentage of monthly active Facebook accounts that were fake. Unlike prevalence for content violations, fake accounts prevalence assumes the impact on users is proportional to the number of active fake accounts on Facebook, even if people don’t ever see or experience these accounts.

To estimate the prevalence of fake accounts, we sample monthly active users and label them as fake or not. We define a monthly active user (MAU) as a registered Facebook user who logged in and visited Facebook through our website or a mobile device, or used our Messenger application (and is also a registered Facebook user), in the last 30 days as of the date of measurement.
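A minimal sketch of this calculation follows. The record format `(last_active_date, labeled_fake)` is hypothetical, and treating activity exactly 30 days before the measurement date as still inside the window is an assumption, since the text does not spell out the boundary.

```python
from datetime import date, timedelta


def is_monthly_active(last_active: date, measured_on: date, window_days: int = 30) -> bool:
    """True if the account was active within the window ending on the measurement date."""
    age = measured_on - last_active
    return timedelta(0) <= age <= timedelta(days=window_days)


def fake_account_prevalence(sample, measured_on: date) -> float:
    """Share of sampled monthly active accounts that were labeled fake.

    `sample` is an iterable of (last_active_date, labeled_fake) pairs
    (hypothetical format).
    """
    labels = [
        fake for last_active, fake in sample
        if is_monthly_active(last_active, measured_on)
    ]
    if not labels:
        raise ValueError("no monthly active accounts in sample")
    return sum(labels) / len(labels)


measured_on = date(2021, 7, 29)
sample = [
    (date(2021, 7, 20), True),
    (date(2021, 7, 1), False),
    (date(2021, 5, 1), True),    # not active in the last 30 days, so excluded
    (date(2021, 7, 28), False),
]
print(fake_account_prevalence(sample, measured_on))
```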