
Text Encoding

Text Character Encoding

The scheme used to represent characters as bytes — including UTF-8, UTF-16, ISO-8859-1, and Windows-1252.

Technical detail

Text encoding builds on the Unicode standard, which assigns a unique code point (U+0000 to U+10FFFF) to each character across writing systems. UTF-8 uses 1 to 4 bytes per code point: ASCII characters take 1 byte, most CJK ideographs take 3 bytes, and characters outside the Basic Multilingual Plane (such as most emoji) take 4 bytes. UTF-16 uses 2 or 4 bytes per code point and is the internal string representation in JavaScript and Java. Declaring the encoding correctly prevents mojibake (garbled text) when files cross system boundaries.
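The byte counts above can be checked directly. The following is a minimal sketch, assuming a runtime that provides the standard `TextEncoder` and `TextDecoder` globals (modern browsers and Node.js) and supports the `windows-1252` decoder label. The helper names `utf8ByteLength` and `utf16ByteLength` are hypothetical, introduced only for this illustration.

```javascript
// TextEncoder always produces UTF-8 bytes.
const encoder = new TextEncoder();

// Hypothetical helper: number of bytes the text occupies in UTF-8.
function utf8ByteLength(text) {
  return encoder.encode(text).length;
}

// Hypothetical helper: number of bytes the text occupies in UTF-16.
// JavaScript strings are sequences of 16-bit code units, so each unit is 2 bytes.
function utf16ByteLength(text) {
  return text.length * 2;
}

// Escapes are used so the sample does not depend on this file's own encoding.
const samples = [
  ['Latin capital A', '\u0041'],     // ASCII
  ['e with acute', '\u00E9'],        // Latin-1 range
  ['CJK ideograph', '\u4E2D'],       // Basic Multilingual Plane
  ['emoji', '\u{1F600}'],            // supplementary plane
];

for (const [label, ch] of samples) {
  console.log(label, 'UTF-8:', utf8ByteLength(ch), 'UTF-16:', utf16ByteLength(ch));
}

// Mojibake: UTF-8 bytes decoded with the wrong encoding (Windows-1252).
const garbled = new TextDecoder('windows-1252').decode(encoder.encode('\u00E9'));
console.log(garbled); // the two UTF-8 bytes 0xC3 0xA9 are shown as two separate characters
```

The last lines show why a correct encoding declaration matters: the single character U+00E9 is stored as two bytes in UTF-8, and a reader that assumes Windows-1252 renders each byte as its own character.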

Example

```javascript
// Text processing example: trim surrounding whitespace, then split into words
const input = 'Sample text for processing';
const result = input
  .trim()
  .split(/\s+/)
  .filter(Boolean);
console.log(result); // ['Sample', 'text', 'for', 'processing']
```
