
OpenAI Releases Privacy Filter: A 1.5B Open-Weight Model for PII Detection

OpenAI released Privacy Filter, a 1.5B-parameter bidirectional token-classification model for detecting and redacting personally identifiable information, available under the Apache 2.0 license on HuggingFace and GitHub.

On April 22, 2026, OpenAI released Privacy Filter, a lightweight open-weight model designed to detect and redact personally identifiable information (PII) in unstructured text[1]. The model is positioned as a building block for privacy-by-design workflows, enabling developers to sanitize training data, knowledge-base indexes, system logs, and user-facing content without sending sensitive text to third-party APIs[1].

Technical Architecture

Privacy Filter is a bidirectional token-classification model with span decoding[1]. It starts from an autoregressive pretrained checkpoint and replaces the language-modeling head with a token-classification head, then post-trains with a supervised classification objective[1]. Unlike a generative LLM, it does not produce text token by token; instead, it labels an entire input sequence in a single forward pass and decodes coherent spans using a constrained Viterbi procedure[1].

The model has 1.5 billion total parameters with 50 million active parameters[1], making it practical to run in production pipelines with modest compute. It supports a 128,000-token context window[1], allowing it to process long documents in a single pass.
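To make the decoding step concrete, here is a minimal sketch of a constrained Viterbi decode over per-token label scores. The label subset, the score format, and the transition rule (two different PII labels may not abut without an intervening non-PII token) are illustrative assumptions, not the released decoding specification.

```python
import math

# Illustrative subset of the taxonomy; the real model uses 8 PII labels plus "O".
LABELS = ["O", "private_person", "private_email"]

def viterbi_decode(scores, labels=LABELS):
    """scores: one dict per token mapping label -> log-score.
    Returns the highest-scoring label sequence subject to the constraint
    that two *different* PII labels cannot be adjacent."""
    n = len(scores)
    dp = [{} for _ in range(n)]          # dp[i][label] = (best score, backpointer)
    for lab in labels:
        dp[0][lab] = (scores[0][lab], None)
    for i in range(1, n):
        for lab in labels:
            best = (-math.inf, None)
            for prev in labels:
                # Forbid jumping directly between two distinct PII spans.
                if prev != "O" and lab != "O" and prev != lab:
                    continue
                cand = dp[i - 1][prev][0] + scores[i][lab]
                if cand > best[0]:
                    best = (cand, prev)
            dp[i][lab] = best
    # Backtrack from the best final label.
    path = [max(labels, key=lambda lab: dp[n - 1][lab][0])]
    for i in range(n - 1, 0, -1):
        path.append(dp[i][path[-1]][1])
    path.reverse()
    return path

scores = [{"O": 0.0, "private_person": 2.0, "private_email": 0.0},
          {"O": 1.0, "private_person": 0.0, "private_email": 2.0},
          {"O": 2.0, "private_person": 0.0, "private_email": 0.0}]
print(viterbi_decode(scores))  # ['private_person', 'O', 'O']
```

Note how the greedy per-token argmax would emit a person span directly followed by an email span; the transition constraint forces the decoder to pick the best coherent alternative instead.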

Detection Taxonomy

The model classifies tokens into an 8-label privacy taxonomy[1]:

  Label              Coverage
  private_person     Personal identity (names, usernames)
  private_address    Physical addresses
  private_email      Email addresses
  private_phone      Phone numbers
  private_url        Personal URLs
  private_date       Personal dates (birthdays, anniversaries)
  account_number     Financial account numbers, credit cards
  secret             Secrets (passwords, API keys)

Redaction output replaces detected spans with their label in uppercase (e.g., [PRIVATE_PERSON], [ACCOUNT_NUMBER]) to preserve structural information while removing sensitive content[1].
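The replacement step itself is straightforward; a minimal helper (not the model's own code, and assuming the detector emits non-overlapping character spans) could look like:

```python
def redact(text, spans):
    """Replace detected (start, end, label) character spans with [LABEL]
    placeholders. Spans are processed right-to-left so earlier offsets
    stay valid as the string shrinks or grows."""
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{label.upper()}]" + text[end:]
    return text

text = "Email alice@example.com from Alice"
print(redact(text, [(6, 23, "private_email"), (29, 34, "private_person")]))
# Email [PRIVATE_EMAIL] from [PRIVATE_PERSON]
```

Keeping the label in the placeholder preserves the sentence's structure, so downstream systems can still tell *what kind* of entity was removed.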

Performance

On the PII-Masking-300k benchmark, Privacy Filter achieves an F1 of 96% (precision 94.04%, recall 98.04%)[1]. After correcting annotation errors in the benchmark, the F1 rises to 97.43% (precision 96.79%, recall 98.08%), placing it at state-of-the-art on this benchmark[1].
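As a quick sanity check (our arithmetic, not a figure from the announcement), both reported F1 scores follow from the stated precision and recall, since F1 is their harmonic mean:

```python
def f1(precision, recall):
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

# Reported: precision 94.04%, recall 98.04%
print(round(f1(0.9404, 0.9804), 4))  # 0.96
# After annotation corrections: precision 96.79%, recall 98.08%
print(round(f1(0.9679, 0.9808), 4))  # 0.9743
```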

The model is designed to be domain-adaptable: with a small amount of domain-specific labeled data for fine-tuning, the F1 on a target domain can jump from 54% to 96%[1].

Availability and Licensing

Privacy Filter is released under the Apache 2.0 license[1]. It is available immediately from:

  • HuggingFace: openai/privacy-filter[1]
  • GitHub: openai/privacy-filter[1]
  • Model Card: a detailed PDF covering architecture, label schema, decoding rules, intended use cases, evaluation setup, and known limitations[1]

OpenAI reports that it has already deployed a fine-tuned version of Privacy Filter internally within its own privacy workflows[1].

Limitations

OpenAI explicitly states that Privacy Filter is not a compliance certification tool and does not replace domain-specific policy review or human oversight in high-sensitivity contexts (legal, healthcare, finance)[1]. Detection performance depends on the training label schema and decision boundaries; organizations with privacy policies that differ materially from the training distribution may need additional domain evaluation or fine-tuning[1].

The model may miss uncommon identifiers or ambiguous personal references, and in low-context settings (especially short text sequences), it may over-redact or under-redact[1].

Significance

Privacy Filter represents OpenAI’s effort to contribute practical privacy infrastructure to the open-source community. By releasing a capable, lightweight, Apache-licensed PII detection model, OpenAI lowers the barrier for developers building privacy-respecting AI applications — particularly those who cannot or will not send user data to third-party APIs for sanitization.

The 50M active parameter count is notably small, making the model viable for high-throughput production pipelines where latency and cost matter. The bidirectional architecture (unusual for a model derived from an autoregressive pretrained checkpoint) demonstrates a deliberate design choice: classification accuracy benefits from bidirectional context, even when the base model is autoregressive.

Footnotes

  1. OpenAI Official Blog — Introducing OpenAI Privacy Filter, model architecture, performance data, taxonomy, availability, and limitations.
    https://openai.com/index/introducing-openai-privacy-filter/