UTF-8 Encoder
Convert Unicode text into its UTF-8 byte sequence and view the resulting byte values in hex or decimal.
Convert text to UTF-8 byte array representation (decimal values).
About UTF-8 Encoding
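UTF-8 encoding shows the bytes a JavaScript string produces when it is serialized as UTF-8. In memory, JavaScript strings are sequences of UTF-16 code units, so the UTF-8 bytes are what gets stored or transmitted, not the internal representation.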
Common use cases:
- Understanding string byte sizes for APIs
- Debugging character encoding issues
- Working with binary protocols
- File size calculations for text data
Note: Each number represents one byte (0-255). ASCII characters take one byte each; every other Unicode character takes two to four bytes.
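As a rough illustration of the conversion described above, the following sketch uses the standard TextEncoder API to turn a string into its UTF-8 byte values. The helper names toUtf8Bytes and toHex are made up for this example and are not taken from the tool itself.

```javascript
// Sketch: encode a string as UTF-8 and list each byte in decimal and hex.
// toUtf8Bytes and toHex are illustrative helper names (assumptions).
function toUtf8Bytes(text) {
  // TextEncoder always produces UTF-8; encode() returns a Uint8Array.
  return Array.from(new TextEncoder().encode(text));
}

function toHex(bytes) {
  // Two uppercase hex digits per byte, e.g. 195 -> "C3".
  return bytes.map((b) => b.toString(16).toUpperCase().padStart(2, "0"));
}

const bytes = toUtf8Bytes("H\u00E9\u20AC"); // "H", e-acute, euro sign
console.log(bytes.join(" "));        // 72 195 169 226 130 172
console.log(toHex(bytes).join(" ")); // 48 C3 A9 E2 82 AC
```

The three characters produce six bytes: one for the ASCII letter, two for the accented letter, and three for the euro sign.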
What is UTF-8 Encoding?
UTF-8 is a variable-width character encoding that represents every Unicode code point using one to four bytes. ASCII characters (U+0000–U+007F) map to a single byte identical to their ASCII value, making UTF-8 fully backward-compatible with ASCII. Characters outside that range use multi-byte sequences with specific bit patterns that allow decoders to identify boundaries unambiguously.
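The bit patterns mentioned above can be made concrete with a hand-written encoder for a single code point. This is only a sketch for illustration: encodeCodePoint is a hypothetical name, and production code would normally rely on TextEncoder instead.

```javascript
// Sketch: encode one Unicode code point by hand to show the UTF-8 bit patterns.
// encodeCodePoint is an illustrative name (assumption), not a built-in.
function encodeCodePoint(cp) {
  const isSurrogate = cp >= 0xd800 && cp <= 0xdfff;
  if (!Number.isInteger(cp) || cp < 0 || cp > 0x10ffff || isSurrogate) {
    throw new RangeError("not a Unicode scalar value: " + cp);
  }
  if (cp <= 0x7f) {
    return [cp]; // 0xxxxxxx
  }
  if (cp <= 0x7ff) {
    // 110xxxxx 10xxxxxx
    return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)];
  }
  if (cp <= 0xffff) {
    // 1110xxxx 10xxxxxx 10xxxxxx
    return [0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)];
  }
  // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
  return [
    0xf0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3f),
    0x80 | ((cp >> 6) & 0x3f),
    0x80 | (cp & 0x3f),
  ];
}

console.log(encodeCodePoint(0x41).join(" "));    // 65
console.log(encodeCodePoint(0x20ac).join(" "));  // 226 130 172
console.log(encodeCodePoint(0x1f600).join(" ")); // 240 159 152 128
```

Because a lead byte never starts with the bits 10 and every continuation byte always does, a decoder that lands in the middle of a sequence can skip forward to the next character boundary.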
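UTF-8 is the dominant encoding for text on the web, in databases, and in source code. It is the encoding required for JSON exchanged between systems (RFC 8259), the encoding the HTML standard requires authors to use, and the encoding XML parsers assume when a document declares none.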
Common Use Cases
- Internationalisation (i18n): Encode multilingual text (Arabic, Chinese, emoji, and more) into bytes that can be stored and transmitted uniformly.
- Network Protocol Implementation: Construct byte-level payloads for protocols (e.g., WebSocket, HTTP/2) that require UTF-8 encoded strings.
- File I/O: Confirm the byte sequence written to disk when saving text files to ensure correct encoding rather than relying on system defaults.
- Cryptographic Input Preparation: Hash functions and MACs operate on bytes; UTF-8 encoding provides a canonical byte representation of text input.
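To illustrate the network protocol use case, here is a minimal sketch of a length-prefixed text frame. The frame layout (a 4-byte big-endian length followed by the UTF-8 payload) and the names encodeFrame and decodeFrame are assumptions invented for this example; they do not correspond to WebSocket, HTTP/2, or any other specific protocol.

```javascript
// Sketch of a hypothetical frame: 4-byte big-endian byte length, then UTF-8 payload.
function encodeFrame(text) {
  const payload = new TextEncoder().encode(text);
  const frame = new Uint8Array(4 + payload.length);
  // The prefix counts bytes, not characters; false selects big-endian order.
  new DataView(frame.buffer).setUint32(0, payload.length, false);
  frame.set(payload, 4);
  return frame;
}

function decodeFrame(frame) {
  const view = new DataView(frame.buffer, frame.byteOffset, frame.byteLength);
  const length = view.getUint32(0, false);
  // fatal: true makes the decoder throw on malformed UTF-8
  // instead of silently substituting U+FFFD.
  const decoder = new TextDecoder("utf-8", { fatal: true });
  return decoder.decode(frame.subarray(4, 4 + length));
}

const frame = encodeFrame("h\u00E9llo"); // 5 characters, 6 UTF-8 bytes
console.log(frame.length);       // 10
console.log(decodeFrame(frame)); // the original 5-character string
```

The length prefix is computed from the encoded bytes because a count taken from the string itself would be too small for any non-ASCII text.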
Tips
- Always specify UTF-8 explicitly when opening files or making network requests; never rely on the operating system's locale default encoding.
- A single Unicode code point (e.g., an emoji) may occupy 4 bytes in UTF-8; string length in characters and length in bytes are not the same.
- The UTF-8 BOM (EF BB BF) is optional and generally discouraged; omit it unless a specific tool or legacy system requires it.
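The length and BOM tips above can be checked with a short sketch. The helper name stripUtf8Bom is hypothetical and exists only for this example.

```javascript
// Sketch: character counts versus byte counts for a single emoji (U+1F600).
const text = "\u{1F600}";
console.log(text.length);                           // 2 (UTF-16 code units)
console.log([...text].length);                      // 1 (code point)
console.log(new TextEncoder().encode(text).length); // 4 (UTF-8 bytes)

// Sketch: drop a leading UTF-8 BOM (EF BB BF) from a Uint8Array if present.
// stripUtf8Bom is an illustrative name (assumption).
function stripUtf8Bom(bytes) {
  const hasBom =
    bytes.length >= 3 &&
    bytes[0] === 0xef &&
    bytes[1] === 0xbb &&
    bytes[2] === 0xbf;
  // subarray returns a view on the same buffer, so nothing is copied.
  return hasBom ? bytes.subarray(3) : bytes;
}

const withBom = new Uint8Array([0xef, 0xbb, 0xbf, 0x68, 0x69]);
console.log(Array.from(stripUtf8Bom(withBom)).join(" ")); // 104 105
```

Note that TextDecoder removes a leading BOM by default when decoding, so a helper like this is mainly useful when the raw bytes are passed on without being decoded.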