The Curious Case of Windows-1258 Encoding Errors: When Outlook Speaks Vietnamese to French Emails

We initially dismissed the bug reports. A handful of customers were seeing client_error responses when forwarding emails through CloudMailin. That response signals an email encoding error and asks the customer to contact support. The emails were marked as Windows-1258, the Vietnamese code page, but customers told us they didn't have any Vietnamese users. The volume was tiny, so we set it aside.

Then one user reported something that caught our attention: they had created the email themselves. They had no idea where Vietnamese encoding could possibly be coming from. We jumped on a call and investigated together.

The emails weren't Vietnamese at all—they were French.

What We Discovered

When users replied to or forwarded emails using Outlook 365, something strange was happening. Perfectly valid UTF-8 emails were being converted to Windows-1258, even when the content was entirely in French or Spanish. The accented characters still appeared correct, but the declared charset was wrong—and that charset problem matters more than you might think.

A Quick History of Character Encoding

Before Unicode became the standard, computers needed a way to represent text in different languages. The solution was code pages—lookup tables that mapped byte values to characters. Microsoft Windows introduced a family of these code pages, each designed for a specific region:

Code Page	Name	Languages
Windows-1250	Central European	Polish, Czech, Hungarian
Windows-1251	Cyrillic	Russian, Bulgarian, Serbian
Windows-1252	Western European	English, French, German, Spanish
Windows-1253	Greek	Greek
Windows-1254	Turkish	Turkish
Windows-1255	Hebrew	Hebrew
Windows-1256	Arabic	Arabic
Windows-1257	Baltic	Latvian, Lithuanian
Windows-1258	Vietnamese	Vietnamese

Windows-1252 became the de facto standard for Western European languages and was so widely used that many systems treated it as synonymous with "Latin" or "ANSI" encoding.

Today, UTF-8 has largely replaced these code pages. It can represent any Unicode character and has become the universal standard for web and email. But legacy code pages still lurk in email systems, causing occasional headaches.

Windows-1252 vs Windows-1258: Spot the Difference

Here's where things get interesting. Windows-1252 and Windows-1258 are nearly identical. Both are single-byte encodings that share the same basic Latin character set. For most Western European characters, the byte values are the same:

Character	Byte Value	Windows-1252	Windows-1258
é	0xE9	é (e acute)	é (e acute)
à	0xE0	à (a grave)	à (a grave)
ç	0xE7	ç (c cedilla)	ç (c cedilla)
ñ	0xF1	ñ (n tilde)	ñ (n tilde)

This is why French text encoded as Windows-1258 still looks correct—the byte values for common accented characters happen to map to the same glyphs in both encodings.
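You can check this overlap yourself. The following is a minimal sketch using Python's built-in cp1252 and cp1258 codecs; the sample phrase is our own illustration, not taken from a customer email.

```python
# Encode the same French/Spanish sample under both code pages.
text = "Réunion du 11 décembre, à côté du café, señor"

as_1252 = text.encode("cp1252")
as_1258 = text.encode("cp1258")

# The two encodings produce byte-for-byte identical output for this text.
same_bytes = as_1252 == as_1258
print(same_bytes)

# So bytes labelled Windows-1258 still decode to the right text as Windows-1252.
round_trip = as_1258.decode("cp1252")
print(round_trip == text)
```

Both comparisons hold because every accented character in the sample sits at the same byte value in the two code pages.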

The difference lies in the upper range, where Windows-1258 replaces some characters with Vietnamese letters and combining diacritics:

Byte Value	Windows-1252	Windows-1258
0xC3	Ã (A tilde)	Ă (A breve)
0xCC	Ì (I grave)	̀ (combining grave)
0xD2	Ò (O grave)	̉ (combining hook above)
0xEC	ì (i grave)	́ (combining acute)
0xF2	ò (o grave)	̣ (combining dot below)

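Vietnamese needs more letter-and-diacritic combinations than fit in the 128 code points above ASCII, so Windows-1258 stores many tone marks as separate combining characters that follow the base letter. Windows-1252 has no combining characters at all: every accented letter it supports is a single precomposed byte.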
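To see exactly where the two code pages part ways, the sketch below walks every byte above 0x7F and records the ones that decode differently. It relies only on Python's standard codecs and unicodedata modules; the helper names decode_byte and describe are ours.

```python
import unicodedata

def decode_byte(value, codec):
    """Decode a single byte, or return None if the codec leaves it undefined."""
    try:
        return bytes([value]).decode(codec)
    except UnicodeDecodeError:
        return None

# Collect every upper-range byte whose meaning differs between the code pages.
differences = {}
for value in range(0x80, 0x100):
    in_1252 = decode_byte(value, "cp1252")
    in_1258 = decode_byte(value, "cp1258")
    if in_1252 != in_1258:
        differences[value] = (in_1252, in_1258)

def describe(char):
    """Return the Unicode name of a character, or a marker for undefined bytes."""
    return unicodedata.name(char, "unnamed") if char else "undefined"

for value, (in_1252, in_1258) in sorted(differences.items()):
    print(f"0x{value:02X}: {describe(in_1252)} -> {describe(in_1258)}")

# Vietnamese "ạ" is stored as two bytes: the base letter, then a combining mark.
decomposed = b"a\xf2".decode("cp1258")
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))
```

The last three lines show the combining approach in practice: the decoded string has two code points, and NFC normalisation folds them into a single precomposed letter.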

The Outlook Bug in Action

Let's look at what happens when Outlook 365 processes an email. Here's the original message, sent from Gmail:

From: sender@gmail.com
Subject: =?utf-8?B?UsOpdW5pb24gZHUgMTEgZMOpY2VtYnJl?=
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64

The email is properly encoded as UTF-8, with a French subject line ("Réunion du 11 décembre" - "Meeting on December 11th").

Now here's the same email after being forwarded through Outlook 365:

From: recipient@outlook.com
Subject: =?windows-1258?Q?FW:_R=E9union_du_11_d=E9cembre?=
Content-Type: text/plain; charset="windows-1258"
Content-Transfer-Encoding: quoted-printable

Outlook has:

Converted from UTF-8 to a legacy code page
Chosen Windows-1258 (Vietnamese) instead of Windows-1252 (Western European)
Applied this to both the headers and body

The text still displays correctly because =E9 is the quoted-printable escape for byte 0xE9, which maps to "é" in both Windows-1252 and Windows-1258. But the email now claims to be Vietnamese.
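The two Subject headers above can be decoded with Python's standard email.header module to confirm they carry the same text. This is an illustrative sketch; decode_subject is a helper name we made up, and it assumes the charset label is one Python recognises.

```python
from email.header import decode_header

def decode_subject(raw):
    """Decode an RFC 2047 encoded Subject header into a plain string."""
    parts = []
    for chunk, charset in decode_header(raw):
        if isinstance(chunk, bytes):
            # Trust the declared charset here; fall back to ASCII if none is given.
            parts.append(chunk.decode(charset or "ascii"))
        else:
            parts.append(chunk)
    return "".join(parts)

original = decode_subject("=?utf-8?B?UsOpdW5pb24gZHUgMTEgZMOpY2VtYnJl?=")
forwarded = decode_subject("=?windows-1258?Q?FW:_R=E9union_du_11_d=E9cembre?=")

print(original)
print(forwarded)
print(forwarded == "FW: " + original)
```

Apart from the "FW: " prefix, both headers decode to the same French subject, even though one is labelled UTF-8 and the other Vietnamese.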

Why Email Parsers Care

You might wonder: if the text displays correctly, what's the problem? This email encoding bug might seem harmless, but it causes real issues downstream.

Email systems don't just display text—they process it. When an email declares charset="windows-1258", downstream systems trust that declaration:

Search indexing may apply Vietnamese text analysis rules, breaking search for French terms.

Text extraction tools may attempt Vietnamese-specific character normalisation.

Webhook receivers parsing the email programmatically will decode using the wrong character map, potentially corrupting any characters that don't overlap between the encodings.

Archive systems may store the text with incorrect metadata, causing issues years later when someone searches for "réunion" and the system thinks it's looking at Vietnamese.
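To illustrate the corruption risk for characters that don't overlap, here is a hypothetical case: a payload whose bytes really are Windows-1252 but which a receiver decodes as Windows-1258 because of the label. The Italian word is our own example, not something observed in the Outlook emails.

```python
import unicodedata

# Hypothetical payload: Italian text whose bytes really are Windows-1252.
payload = "così".encode("cp1252")  # the final letter is the single byte 0xEC

intended = payload.decode("cp1252")  # what the sender meant
trusted = payload.decode("cp1258")   # what a receiver gets if it trusts the label

print(intended == "così")
print([unicodedata.name(c) for c in trusted])

# After NFC normalisation the stray combining mark attaches to the "s".
normalised = unicodedata.normalize("NFC", trusted)
print(normalised)
```

The receiver ends up with a different word entirely: the final "ì" is gone, and the preceding "s" has gained an accent it never had.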

There's even a term for the garbled text that results from charset problems: mojibake (文字化け), from the Japanese for "character transformation." It's so common that it has its own Wikipedia page. When you see "RÃ©union" instead of "Réunion", you're looking at mojibake—and a charset mismatch is usually the culprit.
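The "RÃ©union" example is easy to reproduce. The sketch below decodes UTF-8 bytes with the wrong single-byte codec, then reverses the mistake; it is an illustration only, and the reversal works only when the wrong decode did not lose any bytes.

```python
text = "Réunion"

# UTF-8 stores "é" as two bytes (0xC3 0xA9). A single-byte codec reads them
# as two separate characters, which produces the classic mojibake.
garbled_1252 = text.encode("utf-8").decode("cp1252")
garbled_1258 = text.encode("utf-8").decode("cp1258")

print(garbled_1252)
print(garbled_1258)

# Re-encoding with the same wrong codec and decoding as UTF-8 undoes the damage.
repaired = garbled_1252.encode("cp1252").decode("utf-8")
print(repaired == text)
```

Note that the Windows-1258 version of the garbage differs from the Windows-1252 one, because byte 0xC3 is one of the positions where the two code pages disagree.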

The Fix

At CloudMailin, we now handle this gracefully. Those client_error responses that first alerted us to this bug are a thing of the past—we detect and process these mismatched encodings without issue.

The problem, however, remains in Outlook, which still incorrectly labels French and Spanish emails as Vietnamese.

What Causes This Bug?

We can only speculate, but there are really two mysteries here.

Why convert from UTF-8 at all? The original email was perfectly valid UTF-8—the universal encoding that can represent any character. Why would Outlook downgrade to a legacy Windows code page when forwarding? Perhaps it's for compatibility with older systems, or a legacy code path that predates UTF-8's dominance. It's surprising to see this behaviour in 2025.

Why Windows-1258 specifically? Windows-1252 (Western European) and Windows-1258 (Vietnamese) are numerically close—code pages 1252 vs 1258. It's possible that:

A locale detection algorithm is incorrectly identifying Western European text as Vietnamese
There's an off-by-one or similar bug in code page selection
Some Outlook configuration is defaulting to the wrong code page

This appears to affect Outlook 365, particularly when replying to or forwarding emails. The original encoding information seems to be lost, and Outlook selects an incorrect replacement.

The Takeaway

Email encoding issues remain surprisingly common in 2025. Even major email clients can introduce bugs that propagate incorrect charset declarations through email chains.

If you're building systems that process email:

Don't blindly trust Content-Type headers—the declared encoding may not match reality
Prefer UTF-8 everywhere—it's the universal standard and eliminates these issues
Consider encoding detection as a fallback when processing emails from unknown sources
Test with international content—accented characters in French, German, and Spanish are common edge cases
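As a starting point for the first three recommendations, here is one possible defensive decoder. It is a sketch under our own assumptions, not a description of how CloudMailin handles these emails: decode_payload is a hypothetical helper, and trying strict UTF-8 before the declared charset is a heuristic we chose because legacy single-byte text containing accented characters rarely happens to be valid UTF-8.

```python
import codecs
from typing import Optional, Tuple

def decode_payload(payload: bytes, declared: Optional[str] = None) -> Tuple[str, str]:
    """Return (text, codec_used) for raw body bytes.

    Order of attempts: strict UTF-8, then the declared charset (if Python
    knows it), then Windows-1252 with replacement characters as a last resort.
    """
    candidates = ["utf-8"]
    if declared:
        try:
            # Normalise labels such as "windows-1258" to Python's codec name.
            candidates.append(codecs.lookup(declared).name)
        except LookupError:
            pass  # Unknown label: ignore it rather than fail.

    for codec in candidates:
        try:
            return payload.decode(codec), codec
        except UnicodeDecodeError:
            continue

    # Last resort: never raise, mark anything undecodable instead.
    return payload.decode("cp1252", errors="replace"), "cp1252"

# Bytes that are really UTF-8 but labelled Windows-1258.
print(decode_payload("Réunion".encode("utf-8"), "windows-1258"))
# Bytes that are genuinely single-byte, as in the forwarded Outlook email.
print(decode_payload(b"R\xe9union", "windows-1258"))
```

For production use you would likely want a real detection library and logging of every fallback, so that mislabelled emails can be investigated rather than silently patched up.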

And if you ever see Windows-1258 declared on an email that clearly isn't Vietnamese, you've likely encountered this Outlook bug. Welcome to the club.

2025-11-26
