# Sanitize on input, escape on output: the WordPress rule behind XSS

This is the first post in a series about the security of AI-generated WordPress code: not how people write plugins, but what a coding assistant hands back when you ask it for one, and whether that output is safe to ship. Before judging any of that, we need a yardstick. To tell whether AI-written code is safe, you first have to know what safe output is made of.

So the series starts with two WordPress functions for handling untrusted text that constantly get mistaken for each other: `sanitize_text_field()` and `esc_html()`. They read like two names for the same idea. They are not. They do opposite things, at opposite ends of a request, and mixing them up is one of the most common ways a cross-site scripting (XSS) hole gets into a plugin.

Here is the whole difference on one screen. The same messy string, through both functions:

```plaintext
input    : Loved it & "highly recommend" <script>alert(1)</script> <b>5/5</b>
sanitize : Loved it & "highly recommend" 5/5
esc_html : Loved it &amp; &quot;highly recommend&quot; &lt;script&gt;alert(1)&lt;/script&gt; &lt;b&gt;5/5&lt;/b&gt;
```

One of them threw data away. The other kept all of it. That is the whole lesson. The rest is just the why.

## sanitize\_text\_field(): cleans input

`sanitize_text_field()` is for data coming **in**. The WordPress documentation describes it as a function that "sanitizes a string from user input or from the database". It strips all tags (dropping `<script>` and `<style>` elements together with their contents), removes line breaks, tabs, and extra whitespace, checks for invalid UTF-8, and drops percent-encoded characters. What comes back is a plain, trimmed line of text.
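As a rough model of those steps, here is a Python sketch. It is an approximation, not WordPress's code: it follows the list above, uses a simple regular expression in place of PHP's `strip_tags()`, and skips the UTF-8 check and the special handling of a lone `<`. The name `sanitize_text_sketch` is hypothetical.

```python
import re

def sanitize_text_sketch(value: str) -> str:
    """Rough Python model of sanitize_text_field(); not WordPress's code.

    Skips the invalid-UTF-8 check and the handling of a lone '<'.
    """
    # Drop <script>/<style> elements together with their contents,
    # as wp_strip_all_tags() does before calling strip_tags().
    text = re.sub(r"<(script|style)[^>]*?>.*?</\1>", "", value, flags=re.I | re.S)
    # Strip every remaining tag but keep the text between tags
    # (a simple regex standing in for PHP's strip_tags()).
    text = re.sub(r"<[^>]*>", "", text)
    # Collapse line breaks, tabs, and runs of spaces into one space, then trim.
    text = re.sub(r"[\r\n\t ]+", " ", text).strip()
    # Remove percent-encoded octets such as %3C, then tidy spaces again.
    found = False
    while (match := re.search(r"%[a-f0-9]{2}", text, flags=re.I)):
        text = text.replace(match.group(0), "")
        found = True
    if found:
        text = re.sub(r" +", " ", text).strip()
    return text

raw = 'Loved it & "highly recommend" <script>alert(1)</script> <b>5/5</b>'
print(sanitize_text_sketch(raw))  # Loved it & "highly recommend" 5/5
```

Every step either deletes characters or squeezes whitespace; none of them adds anything to the string.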

Look at the example again: the `<script>` element is gone along with the `alert(1)` inside it, the `<b>` tags are gone while their text `5/5` stays, and the double space they left behind is collapsed into one. The string itself was changed. That is the point of it. You run it when a value arrives (from a form, a URL, an API) and before you store or use it.

## esc\_html(): encodes output

`esc_html()` is for data going **out**. The docs call it "escaping for HTML blocks". It removes nothing. It encodes the characters that carry meaning in HTML: `&` becomes `&amp;`, `<` becomes `&lt;`, `>` becomes `&gt;`, `"` becomes `&quot;`, and `'` becomes `&#039;`.
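A Python sketch of the same encoding, assuming the five replacements PHP's `htmlspecialchars()` makes with `ENT_QUOTES`. One known difference: WordPress leaves entities that are already encoded alone, while this sketch encodes them again. The name `esc_html_sketch` is hypothetical.

```python
def esc_html_sketch(value: str) -> str:
    """Rough Python model of esc_html(); not WordPress's code.

    Encodes the five characters htmlspecialchars() handles with ENT_QUOTES.
    Unlike WordPress, it re-encodes entities that are already encoded.
    """
    # '&' must be replaced first, or the '&' of '&lt;' would be encoded again.
    for char, entity in (
        ("&", "&amp;"),
        ("<", "&lt;"),
        (">", "&gt;"),
        ('"', "&quot;"),
        ("'", "&#039;"),
    ):
        value = value.replace(char, entity)
    return value

raw = 'Loved it & "highly recommend" <script>alert(1)</script> <b>5/5</b>'
print(esc_html_sketch(raw))
# Loved it &amp; &quot;highly recommend&quot; &lt;script&gt;alert(1)&lt;/script&gt; &lt;b&gt;5/5&lt;/b&gt;
```

Python's standard library ships the same idea as `html.escape()`, which writes the single quote as `&#x27;` instead of `&#039;`.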

In the example nothing was deleted. The `<script>` is still there, written as `&lt;script&gt;`, so the browser prints it as plain text instead of running it. You call `esc_html()` at the moment you echo a value into a page.
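One way to see what "prints it as plain text" means is to hand both strings to an HTML parser and look at what it reports. This sketch uses Python's standard `html.parser` as a stand-in for a browser's parser, which is far more complete; `TagCounter` and `parse` are hypothetical helper names.

```python
from html.parser import HTMLParser

class TagCounter(HTMLParser):
    """Records the start tags and text an HTML parser finds."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: entities arrive decoded
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)

def parse(markup: str) -> TagCounter:
    parser = TagCounter()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return parser

raw = 'Loved it & "highly recommend" <script>alert(1)</script> <b>5/5</b>'
escaped = ('Loved it &amp; &quot;highly recommend&quot; '
           '&lt;script&gt;alert(1)&lt;/script&gt; &lt;b&gt;5/5&lt;/b&gt;')

print(parse(raw).tags)                      # ['script', 'b']
print(parse(escaped).tags)                  # []
print("".join(parse(escaped).text) == raw)  # True
```

The raw string produces a real `script` element; the escaped one produces no elements at all, only text that decodes back to exactly the original characters.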

Here is a review like that on a real WordPress page, printed straight from the comment box, two ways: printed raw, the script runs; passed through `esc_html()`, it shows up as plain text, tags and all.

![The same review on a WordPress page printed two ways. Without escaping the script runs and injects a visible badge; through esc_html() the value is shown as plain text, tags and all.](https://cdn.hashnode.com/uploads/covers/6a3e269bd57f12e314f16a11/53bb1af1-9227-4e74-91e9-05be5ef07099.png align="center")

None of this is theoretical. Swap that script for the textbook one, `<script>alert(1)</script>`, and on a real page it runs:

![A WordPress page printing the review without escaping: the browser executes the script and a JavaScript alert box appears.](https://cdn.hashnode.com/uploads/covers/6a3e269bd57f12e314f16a11/d630dd2d-071d-41c2-9c14-d30ada4a918c.png align="center")

## The rule: sanitize on input, escape on output

That line is worth memorizing. Sanitizing and escaping are not two flavors of one step. They are two stages:

*   Sanitize when data comes in, before you store or use it.
    
*   Escape when data goes out, at the moment you print it.
    
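Here is the rule as a tiny request flow, sketched in Python. The in-memory store, the helper names, and the tag-stripping cleaner are all hypothetical; `html.escape()` plays the role of `esc_html()`.

```python
import html
import re

# Hypothetical in-memory store standing in for the database.
STORE: dict[str, str] = {}

def clean_input(value: str) -> str:
    """Hypothetical input cleaner: drop tags and trim, like a crude sanitizer."""
    return re.sub(r"<[^>]*>", "", value).strip()

def save_review(review_id: str, submitted: str) -> None:
    # Sanitize once, when the value comes in, before it is stored.
    STORE[review_id] = clean_input(submitted)

def render_review(review_id: str) -> str:
    # Escape every time the value goes out, whatever is in the store.
    return "<p>" + html.escape(STORE[review_id]) + "</p>"

save_review("r1", "  Great & cheap <b>5/5</b> ")
print(render_review("r1"))  # <p>Great &amp; cheap 5/5</p>
```

The placement is the point: cleaning lives in the save path, escaping lives in the render path, so escaping runs no matter how a value got into the store.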

And here is the trap. Sanitizing on input is **not** a substitute for escaping on output. It feels like it should be. It is not, for two reasons:

*   Not every value passes through your sanitizer. Plenty of data reaches your output straight from the database, from another plugin, or from code written before that sanitizer existed.
    
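*   Sanitizing is lossy. It deletes tags, so you cannot run it on content that is meant to contain formatting. That content arrives at the output step with its markup still in it, and has to be made safe there.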
    
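A Python sketch of what relying on the sanitizer alone looks like, with a crude tag stripper standing in for the sanitizer and `html.escape()` for escaping (both stand-ins, not WordPress code). Case 1 is the first bullet: a value that never met the sanitizer. Case 2 leans on something visible in the example at the top: sanitizing leaves `"` untouched, so a value with no tags passes through unchanged and can still break out of an HTML attribute. (For attribute values, WordPress's dedicated escaper is `esc_attr()`.)

```python
import html
import re

def strip_tags(value: str) -> str:
    """Crude stand-in for an input sanitizer: removes tags, keeps quotes."""
    return re.sub(r"<[^>]*>", "", value).strip()

# Case 1: the value never met the sanitizer (say, another plugin wrote it).
from_db = "<script>alert(1)</script>"
print("<p>" + from_db + "</p>")               # <p><script>alert(1)</script></p>
print("<p>" + html.escape(from_db) + "</p>")  # <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>

# Case 2: the value did pass the sanitizer, but had no tags to strip.
sanitized = strip_tags('" autofocus onfocus="alert(1)')
print('<input value="' + sanitized + '">')
# <input value="" autofocus onfocus="alert(1)">  (the quote closes the attribute)
print('<input value="' + html.escape(sanitized) + '">')
# <input value="&quot; autofocus onfocus=&quot;alert(1)&quot;">
```

In both cases the escaped line is safe and the unescaped one is not, whatever happened to the value on the way in.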

Escaping on output is the one step that always applies, whatever the source. It is the last line of defense, and the one that holds.

## One level under the hood

If you want the mechanical reason these behave so differently, it is right there in WordPress core. `esc_html()` ends up calling PHP's `htmlspecialchars()` (through the internal `_wp_specialchars()`), which **encodes** characters into entities. `sanitize_text_field()` ends up calling `strip_tags()` (through `wp_strip_all_tags()`) plus a few regular expressions, which **delete** content. Same goal of safety, opposite mechanics: one rewrites characters as entities and loses nothing, the other strips content out for good.
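The difference in mechanics shows up in whether the operation can be undone. In this Python sketch, `html.escape()` stands in for `htmlspecialchars()` and a regular expression stands in for `strip_tags()`; both are approximations of the PHP functions, not the same code.

```python
import html
import re

original = 'Use <b>bold</b> & "quotes"'

# Encoding (the htmlspecialchars() approach): every character survives
# as an entity, so decoding gives back exactly what went in.
encoded = html.escape(original)
print(encoded)                             # Use &lt;b&gt;bold&lt;/b&gt; &amp; &quot;quotes&quot;
print(html.unescape(encoded) == original)  # True

# Deleting (the strip_tags() approach): the tags are gone for good.
stripped = re.sub(r"<[^>]*>", "", original)
print(stripped)                            # Use bold & "quotes"
print(stripped == original)                # False
```

Encoding is a reversible change of notation; deletion has no inverse.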

## Takeaway

Here is the one habit worth keeping: escape every dynamic value at the moment you output it, and make that the step you never skip. Sanitizing input is good hygiene on top, not a replacement for it.

With these two straight, the next post turns them on real code: I asked a coding assistant to build WordPress features, then read what it did with exactly these functions. I am working through the security of AI-generated WordPress code in the open, one piece at a time, and if something here is wrong or thin, I would rather hear it than keep repeating it.

* * *

*Sources: the WordPress developer reference for* [`sanitize_text_field()`](https://developer.wordpress.org/reference/functions/sanitize_text_field/) *and* [`esc_html()`](https://developer.wordpress.org/reference/functions/esc_html/)*, and* `wp-includes/formatting.php` *in WordPress core.*
