Receiving email attachments in S3 compatible storage
CloudMailin Team
19 Mar 2025
We initially dismissed the bug reports. A handful of customers were seeing
client_error responses when forwarding emails through CloudMailin—the telltale
sign of an email encoding error that asks customers to contact support. The
emails were marked as Windows-1258, the Vietnamese code page, but customers told
us they didn't have any Vietnamese users. The volume was tiny, so we set it
aside.
Then one user reported something that caught our attention: they had created the email themselves. They had no idea where Vietnamese encoding could possibly be coming from. We jumped on a call and investigated together.
The emails weren't Vietnamese at all—they were French.
When users replied to or forwarded emails using Outlook 365, something strange was happening. Perfectly valid UTF-8 emails were being converted to Windows-1258, even when the content was entirely in French or Spanish. The accented characters still appeared correct, but the declared charset was wrong—and that charset problem matters more than you might think.
Before Unicode became the standard, computers needed a way to represent text in different languages. The solution was code pages—lookup tables that mapped byte values to characters. Microsoft Windows introduced a family of these code pages, each designed for a specific region:
| Code Page | Name | Languages |
|---|---|---|
| Windows-1250 | Central European | Polish, Czech, Hungarian |
| Windows-1251 | Cyrillic | Russian, Bulgarian, Serbian |
| Windows-1252 | Western European | English, French, German, Spanish |
| Windows-1253 | Greek | Greek |
| Windows-1254 | Turkish | Turkish |
| Windows-1255 | Hebrew | Hebrew |
| Windows-1256 | Arabic | Arabic |
| Windows-1257 | Baltic | Latvian, Lithuanian |
| Windows-1258 | Vietnamese | Vietnamese |
Windows-1252 became the de facto standard for Western European languages and was so widely used that many systems treated it as synonymous with "Latin" or "ANSI" encoding.
Today, UTF-8 has largely replaced these code pages. It can represent any Unicode character and has become the universal standard for web and email. But legacy code pages still lurk in email systems, causing occasional headaches.
Here's where things get interesting. Windows-1252 and Windows-1258 are nearly identical. Both are single-byte encodings that share the same basic Latin character set. For most Western European characters, the byte values are the same:
| Character | Byte Value | Windows-1252 | Windows-1258 |
|---|---|---|---|
| é | 0xE9 | é (e acute) | é (e acute) |
| à | 0xE0 | à (a grave) | à (a grave) |
| ç | 0xE7 | ç (c cedilla) | ç (c cedilla) |
| ñ | 0xF1 | ñ (n tilde) | ñ (n tilde) |
This is why French text encoded as Windows-1258 still looks correct—the byte values for common accented characters happen to map to the same glyphs in both encodings.
The difference lies in the upper range, where Windows-1258 replaces some characters with Vietnamese-specific combining diacritics:
| Byte Value | Windows-1252 | Windows-1258 |
|---|---|---|
| 0xC3 | Ã (A tilde) | Ă (A breve) |
| 0xCC | Ì (I grave) | ̀ (combining grave) |
| 0xD2 | Ò (O grave) | ̉ (combining hook above) |
| 0xF2 | ò (o grave) | ̣ (combining dot below) |
Vietnamese requires more diacritical combinations than can fit in the 128 upper code points of a single-byte encoding, so Windows-1258 uses combining characters, a fundamentally different approach from the precomposed letters of Windows-1252.
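Both the overlap and the divergence can be checked directly. The following is a minimal sketch, assuming Python's built-in `cp1252` and `cp1258` codecs follow the Microsoft code page tables; the helper name `compare` is ours, for illustration only.

```python
import unicodedata

def compare(byte_value):
    """Decode one byte under Windows-1252 and under Windows-1258."""
    raw = bytes([byte_value])
    return raw.decode("cp1252"), raw.decode("cp1258")

# Bytes used by common French and Spanish accented letters decode identically.
for b in (0xE9, 0xE0, 0xE7, 0xF1):
    western, vietnamese = compare(b)
    print(f"0x{b:02X}: same={western == vietnamese} ({unicodedata.name(western)})")

# Bytes that Windows-1258 reassigns to combining marks decode differently.
for b in (0xCC, 0xD2):
    western, vietnamese = compare(b)
    print(f"0x{b:02X}: {unicodedata.name(western)} vs {unicodedata.name(vietnamese)}")
```

Printing the Unicode names rather than the characters themselves avoids the problem that a lone combining mark is hard to see in a terminal.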
Let's look at what happens when Outlook 365 processes an email. Here's the original message, sent from Gmail:
From: sender@gmail.com
Subject: =?utf-8?B?UsOpdW5pb24gZHUgMTEgZMOpY2VtYnJl?=
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
The email is properly encoded as UTF-8, with a French subject line ("Réunion du 11 décembre" - "Meeting on December 11th").
Now here's the same email after being forwarded through Outlook 365:
From: recipient@outlook.com
Subject: =?windows-1258?Q?FW:_R=E9union_du_11_d=E9cembre?=
Content-Type: text/plain; charset="windows-1258"
Content-Transfer-Encoding: quoted-printable
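Outlook has changed the declared charset from UTF-8 to Windows-1258, switched the transfer encoding from base64 to quoted-printable, and re-encoded the subject line to match.
The text still displays correctly because =E9 (the quoted-printable escape for byte 0xE9, "é") decodes to the same character in both Windows-1252 and Windows-1258. But the email now claims to be Vietnamese.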
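To see what a receiving system does with these two subject lines, here is a small sketch using Python's standard `email.header.decode_header`. The helper `decode_subject` is a hypothetical name for illustration; it simply trusts whatever charset each encoded word declares.

```python
from email.header import decode_header

def decode_subject(raw):
    """Return (decoded text, declared charsets) for a raw Subject header value."""
    text, charsets = [], []
    for chunk, charset in decode_header(raw):
        if isinstance(chunk, bytes):
            # Trust the declared charset, as most downstream systems would.
            text.append(chunk.decode(charset or "ascii"))
        else:
            text.append(chunk)
        charsets.append(charset)
    return "".join(text), charsets

original = "=?utf-8?B?UsOpdW5pb24gZHUgMTEgZMOpY2VtYnJl?="
forwarded = "=?windows-1258?Q?FW:_R=E9union_du_11_d=E9cembre?="

print(decode_subject(original))
print(decode_subject(forwarded))
```

Both subjects decode to readable French, but the second one carries `windows-1258` as its charset, and that label is what any later processing step will see.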
You might wonder: if the text displays correctly, what's the problem? This email encoding bug might seem harmless, but it causes real issues downstream.
Email systems don't just display text—they process it. When an email declares
charset="windows-1258", downstream systems trust that declaration:
- Search indexing may apply Vietnamese text analysis rules, breaking search for French terms.
- Text extraction tools may attempt Vietnamese-specific character normalisation.
- Webhook receivers parsing the email programmatically will decode using the wrong character map, potentially corrupting any characters that don't overlap between the encodings.
- Archive systems may store the text with incorrect metadata, causing issues years later when someone searches for "réunion" and the system thinks it's looking at Vietnamese.
There's even a term for the garbled text that results from charset problems: mojibake (文字化け), from the Japanese for "character transformation." It's so common that it has its own Wikipedia page. When you see "RÃ©union" instead of "Réunion", you're looking at mojibake (here, UTF-8 bytes decoded as Windows-1252), and a charset mismatch is usually the culprit.
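Mojibake is easy to reproduce. This short sketch encodes a French word as UTF-8 and then decodes the bytes with the wrong single-byte code page:

```python
text = "Réunion"
utf8_bytes = text.encode("utf-8")  # "é" becomes the two bytes 0xC3 0xA9

# Decoding UTF-8 bytes as Windows-1252 turns each byte into its own character.
garbled = utf8_bytes.decode("cp1252")
print(garbled)

# The damage can be undone only if the wrong decoding step is known.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired)
```

The round trip at the end works because no information was lost, only misread. Once garbled text has been re-saved or normalised, that is often no longer true.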
At CloudMailin, we now handle this gracefully. Those client_error responses
that first alerted us to this bug are a thing of the past—we detect and process
these mismatched encodings without issue.
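How CloudMailin detects these messages is not described here, so the following is only one possible heuristic and not our implementation. It flags content declared as Windows-1258 whose bytes would decode identically under Windows-1252. The names `DIFFERING_BYTES` and `is_suspect_windows_1258` are hypothetical.

```python
import codecs

def _decode_byte(value, codec):
    """Decode a single byte, returning None where the code page leaves it undefined."""
    try:
        return bytes([value]).decode(codec)
    except UnicodeDecodeError:
        return None

# High bytes whose meaning differs between Windows-1252 and Windows-1258.
DIFFERING_BYTES = frozenset(
    b for b in range(0x80, 0x100)
    if _decode_byte(b, "cp1252") != _decode_byte(b, "cp1258")
)

def is_suspect_windows_1258(body, declared_charset):
    """True if the charset is Windows-1258 but no Vietnamese-specific byte is used."""
    try:
        codec_name = codecs.lookup(declared_charset).name
    except LookupError:
        return False  # unknown charset label: nothing to say about it
    if codec_name != "cp1258":
        return False
    has_high_bytes = any(b >= 0x80 for b in body)
    return has_high_bytes and DIFFERING_BYTES.isdisjoint(body)

print(is_suspect_windows_1258(b"R\xe9union du 11 d\xe9cembre", "windows-1258"))
```

A positive result does not prove the message is not Vietnamese. It only means the decoded text is the same under either code page, so treating the label with suspicion cannot change the characters a reader sees.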
The problem, however, remains in Outlook, which is still incorrectly labelling French and Spanish emails as Vietnamese.
We can only speculate, but there are really two mysteries here.
Why convert from UTF-8 at all? The original email was perfectly valid UTF-8—the universal encoding that can represent any character. Why would Outlook downgrade to a legacy Windows code page when forwarding? Perhaps it's for compatibility with older systems, or a legacy code path that predates UTF-8's dominance. It's surprising to see this behaviour in 2025.
Why Windows-1258 specifically? Windows-1252 (Western European) and Windows-1258 (Vietnamese) are numerically close—code pages 1252 vs 1258. It's possible that:
This appears to affect Outlook 365, particularly when replying to or forwarding emails. The original encoding information seems to be lost, and Outlook selects an incorrect replacement.
Email encoding issues remain surprisingly common in 2025. Even major email clients can introduce bugs that propagate incorrect charset declarations through email chains.
If you're building systems that process email:
And if you ever see Windows-1258 declared on an email that clearly isn't Vietnamese, you've likely encountered this Outlook bug. Welcome to the club.