The Curious Case of Windows-1258 Encoding Errors: When Outlook Speaks Vietnamese to French Emails
We initially dismissed the bug reports. A handful of customers were seeing
client_error responses when forwarding emails through CloudMailin—the telltale
sign of an email encoding error that asks customers to contact support. The
emails were marked as Windows-1258, the Vietnamese code page, but customers told
us they didn't have any Vietnamese users. The volume was tiny, so we set it
aside.
Then one user reported something that caught our attention: they had created the email themselves. They had no idea where Vietnamese encoding could possibly be coming from. We jumped on a call and investigated together.
The emails weren't Vietnamese at all—they were French.
What We Discovered
When users replied to or forwarded emails using Outlook 365, something strange was happening. Perfectly valid UTF-8 emails were being converted to Windows-1258, even when the content was entirely in French or Spanish. The accented characters still appeared correct, but the declared charset was wrong—and that charset problem matters more than you might think.
A Quick History of Character Encoding
Before Unicode became the standard, computers needed a way to represent text in different languages. The solution was code pages—lookup tables that mapped byte values to characters. Microsoft Windows introduced a family of these code pages, each designed for a specific region:
| Code Page | Name | Languages |
|---|---|---|
| Windows-1250 | Central European | Polish, Czech, Hungarian |
| Windows-1251 | Cyrillic | Russian, Bulgarian, Serbian |
| Windows-1252 | Western European | English, French, German, Spanish |
| Windows-1253 | Greek | Greek |
| Windows-1254 | Turkish | Turkish |
| Windows-1255 | Hebrew | Hebrew |
| Windows-1256 | Arabic | Arabic |
| Windows-1257 | Baltic | Latvian, Lithuanian |
| Windows-1258 | Vietnamese | Vietnamese |
Windows-1252 became the de facto standard for Western European languages and was so widely used that many systems treated it as synonymous with "Latin" or "ANSI" encoding.
Today, UTF-8 has largely replaced these code pages. It can represent any Unicode character and has become the universal standard for web and email. But legacy code pages still lurk in email systems, causing occasional headaches.
Windows-1252 vs Windows-1258: Spot the Difference
Here's where things get interesting. Windows-1252 and Windows-1258 are nearly identical. Both are single-byte encodings that share the same basic Latin character set. For most Western European characters, the byte values are the same:
| Character | Byte Value | Windows-1252 | Windows-1258 |
|---|---|---|---|
| é | 0xE9 | é (e acute) | é (e acute) |
| à | 0xE0 | à (a grave) | à (a grave) |
| ç | 0xE7 | ç (c cedilla) | ç (c cedilla) |
| ñ | 0xF1 | ñ (n tilde) | ñ (n tilde) |
This is why French text encoded as Windows-1258 still looks correct—the byte values for common accented characters happen to map to the same glyphs in both encodings.
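The overlap is easy to check. The sketch below assumes Python's built-in `cp1252` and `cp1258` codecs are a faithful stand-in for the Windows code pages, and decodes the four bytes from the table both ways:

```python
# Bytes for é, à, ç and ñ, as listed in the table above.
sample = bytes([0xE9, 0xE0, 0xE7, 0xF1])

as_1252 = sample.decode("cp1252")
as_1258 = sample.decode("cp1258")

print(as_1252)  # éàçñ
print(as_1258)  # éàçñ

# A typical French phrase produces identical bytes under either code page.
french = "Réunion du 11 décembre"
same_bytes = french.encode("cp1252") == french.encode("cp1258")
print(same_bytes)  # True
```

Any text that stays within these shared byte values is indistinguishable at the byte level, so only the charset label reveals which code page the sender chose.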
The difference lies in the upper range, where Windows-1258 replaces some characters with Vietnamese letters and combining diacritics:
| Byte Value | Windows-1252 | Windows-1258 |
|---|---|---|
| 0xC3 | Ã (A tilde) | Ă (A breve) |
| 0xCC | Ì (I grave) | ̀ (combining grave) |
| 0xD2 | Ò (O grave) | ̉ (combining hook above) |
| 0xEC | ì (i grave) | ́ (combining acute) |
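Vietnamese requires more letter-and-diacritic combinations than fit in the 128 upper code points of a single-byte encoding, so Windows-1258 uses combining characters, a fundamentally different approach from Windows-1252, where every accented letter is a single precomposed byte.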
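To see every byte where the two code pages disagree, rather than a hand-picked few, a short script can compare them directly. This is a minimal sketch that again relies on Python's built-in codecs; `decode_byte` is a helper name invented for this example.

```python
import unicodedata


def decode_byte(value: int, codec: str):
    """Decode one byte, returning None where the code page leaves it undefined."""
    try:
        return bytes([value]).decode(codec)
    except UnicodeDecodeError:
        return None


# Map each upper-range byte to its (Windows-1252, Windows-1258) characters,
# keeping only the bytes where the two code pages disagree.
differing = {}
for value in range(0x80, 0x100):
    pair = (decode_byte(value, "cp1252"), decode_byte(value, "cp1258"))
    if pair[0] != pair[1]:
        differing[value] = pair

for value, (cp1252_char, cp1258_char) in differing.items():
    print(f"0x{value:02X}  {cp1252_char!r:>10}  {cp1258_char!r:>10}")

# Vietnamese "ề" is stored as two bytes: ê (0xEA) plus a combining grave (0xCC).
decomposed = bytes([0xEA, 0xCC]).decode("cp1258")
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))  # 2 1
```

The last three lines show the combining approach at work: one visible Vietnamese letter arrives as a base letter plus a combining mark, and Unicode NFC normalisation folds the pair into a single code point.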
The Outlook Bug in Action
Let's look at what happens when Outlook 365 processes an email. Here's the original message, sent from Gmail:
From: sender@gmail.com
Subject: =?utf-8?B?UsOpdW5pb24gZHUgMTEgZMOpY2VtYnJl?=
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
The email is properly encoded as UTF-8, with a French subject line ("Réunion du 11 décembre" - "Meeting on December 11th").
Now here's the same email after being forwarded through Outlook 365:
From: recipient@outlook.com
Subject: =?windows-1258?Q?FW:_R=E9union_du_11_d=E9cembre?=
Content-Type: text/plain; charset="windows-1258"
Content-Transfer-Encoding: quoted-printable
Outlook has:
- Converted from UTF-8 to a legacy code page
- Chosen Windows-1258 (Vietnamese) instead of Windows-1252 (Western European)
- Applied this to both the headers and body
The text still displays correctly because =E9 (the byte value for "é") decodes
to the same character in both encodings. But the email now claims to be Vietnamese.
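Both subject lines can be decoded with Python's standard `email.header` module to confirm they carry the same text. In this sketch, `decode_subject` is a hypothetical helper that only handles a header made of a single encoded-word, which is all these two examples need.

```python
from email.header import decode_header


def decode_subject(raw: str):
    """Decode a header consisting of one RFC 2047 encoded-word.

    Returns (declared charset, raw payload bytes, decoded text).
    """
    (payload, charset), = decode_header(raw)
    return charset, payload, payload.decode(charset)


original = "=?utf-8?B?UsOpdW5pb24gZHUgMTEgZMOpY2VtYnJl?="
forwarded = "=?windows-1258?Q?FW:_R=E9union_du_11_d=E9cembre?="

for raw in (original, forwarded):
    charset, payload, text = decode_subject(raw)
    print(charset, payload, text)

# The forwarded payload decodes to the same text under Windows-1252,
# so for this subject line only the label is wrong, not the bytes.
_, forwarded_payload, forwarded_text = decode_subject(forwarded)
print(forwarded_payload.decode("cp1252") == forwarded_text)  # True
```

The UTF-8 original stores "é" as two bytes (0xC3 0xA9), while the forwarded copy stores it as the single byte 0xE9.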
Why Email Parsers Care
You might wonder: if the text displays correctly, what's the problem? This email encoding bug might seem harmless, but it causes real issues downstream.
Email systems don't just display text—they process it. When an email declares
charset="windows-1258", downstream systems trust that declaration:
- Search indexing may apply Vietnamese text analysis rules, breaking search for French terms.
- Text extraction tools may attempt Vietnamese-specific character normalisation.
- Webhook receivers parsing the email programmatically will decode using the wrong character map, potentially corrupting any characters that don't overlap between the encodings.
- Archive systems may store the text with incorrect metadata, causing issues years later when someone searches for "réunion" and the system thinks it's looking at Vietnamese.
There's even a term for the garbled text that results from charset problems: mojibake (文字化け), from the Japanese for "character transformation." It's so common that it has its own Wikipedia page. When you see "RÃ©union" instead of "Réunion", you're looking at mojibake, and a charset mismatch is usually the culprit.
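Mojibake is simple to reproduce. The sketch below uses Python's built-in codecs to decode the same bytes with the wrong code page in both directions. The Italian word "Così" is an example chosen here, not taken from the bug reports, because its "ì" sits on one of the bytes where the two code pages differ.

```python
# UTF-8 bytes read as a single-byte code page: the classic mojibake.
utf8_bytes = "Réunion".encode("utf-8")   # b'R\xc3\xa9union'
as_1252 = utf8_bytes.decode("cp1252")
as_1258 = utf8_bytes.decode("cp1258")
print(as_1252)  # RÃ©union
print(as_1258)  # RĂ©union

# Windows-1252 bytes read as Windows-1258: fine for é, wrong for ì.
italian = "Così".encode("cp1252")        # b'Cos\xec'
mislabelled = italian.decode("cp1258")
print(ascii(mislabelled))  # 'Cos\u0301'
```

In the second case the "ì" disappears and a combining acute accent attaches itself to the "s", which is the kind of silent corruption a wrong Windows-1258 label can cause once the text leaves the shared range.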
The Fix
At CloudMailin, we now handle this gracefully. Those client_error responses
that first alerted us to this bug are a thing of the past—we detect and process
these mismatched encodings without issue.
The problem, however, remains in Outlook, which is still incorrectly encoding French and Spanish emails as Vietnamese.
What Causes This Bug?
We can only speculate, but there are really two mysteries here.
Why convert from UTF-8 at all? The original email was perfectly valid UTF-8—the universal encoding that can represent any character. Why would Outlook downgrade to a legacy Windows code page when forwarding? Perhaps it's for compatibility with older systems, or a legacy code path that predates UTF-8's dominance. It's surprising to see this behaviour in 2025.
Why Windows-1258 specifically? Windows-1252 (Western European) and Windows-1258 (Vietnamese) are numerically close—code pages 1252 vs 1258. It's possible that:
- A locale detection algorithm is incorrectly identifying Western European text as Vietnamese
- There's an off-by-one or similar bug in code page selection
- Some Outlook configuration is defaulting to the wrong code page
This appears to affect Outlook 365, particularly when replying to or forwarding emails. The original encoding information seems to be lost, and Outlook selects an incorrect replacement.
The Takeaway
Email encoding issues remain surprisingly common in 2025. Even major email clients can introduce bugs that propagate incorrect charset declarations through email chains.
If you're building systems that process email:
- Don't blindly trust Content-Type headers—the declared encoding may not match reality
- Prefer UTF-8 everywhere—it's the universal standard and eliminates these issues
- Consider encoding detection as a fallback when processing emails from unknown sources
- Test with international content—accented characters in French, German, and Spanish are common edge cases
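The first three points can be combined into a small fallback decoder. What follows is a minimal sketch under stated assumptions, not CloudMailin's implementation: `decode_with_fallback`, `looks_mislabelled_1258` and `VIETNAMESE_ONLY` are names invented for this example, and the heuristic of treating Windows-1258 text with no Vietnamese-specific characters as suspect is our illustration only.

```python
import codecs
from typing import Optional, Tuple

# Characters Windows-1258 can produce that Windows-1252 cannot:
# Vietnamese letters, the dong sign, and the five combining tone marks.
VIETNAMESE_ONLY = set("ĂăĐđƠơƯư₫\u0300\u0301\u0303\u0309\u0323")


def decode_with_fallback(raw: bytes, declared: Optional[str]) -> Tuple[str, str]:
    """Return (text, codec actually used) for a message body."""
    # 1. Strict UTF-8 first: legacy single-byte text rarely passes by accident.
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass
    # 2. Then the declared charset, if Python recognises it.
    if declared:
        try:
            codec = codecs.lookup(declared).name
            return raw.decode(codec), codec
        except (LookupError, UnicodeDecodeError):
            pass
    # 3. Last resort: Windows-1252, replacing undecodable bytes with U+FFFD.
    return raw.decode("cp1252", errors="replace"), "cp1252"


def looks_mislabelled_1258(text: str, codec: str) -> bool:
    """Flag text decoded as Windows-1258 that has no Vietnamese-only characters."""
    return codecs.lookup(codec).name == "cp1258" and not (VIETNAMESE_ONLY & set(text))


text, codec = decode_with_fallback(b"R\xe9union", "windows-1258")
print(text, codec, looks_mislabelled_1258(text, codec))  # Réunion cp1258 True
```

A flagged message decodes to the same text under Windows-1252, so a receiver could relabel it before indexing or archiving. Genuine Vietnamese text will normally contain at least one of the listed characters and is left alone.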
And if you ever see Windows-1258 declared on an email that clearly isn't Vietnamese, you've likely encountered this Outlook bug. Welcome to the club.