doc: Add UTF-8 byte structure section and resources (#10)

Added a section on UTF-8 byte structure with a diagram and a recommended video for further understanding.
2026-05-15 10:48:38 +00:00 · 2026-04-25 00:51:54 +02:00 · 2026-04-25 00:51:54 +02:00 · 8d0e744991
commit 8d0e744991
parent 7a480b44b9
1 changed files with 12 additions and 23 deletions
--- a/docs/appdev/inputs_api_(utf8).md
+++ b/docs/appdev/inputs_api_(utf8).md
@@ -176,6 +176,15 @@ Invalid sequences are replaced with:
 - code point: `0xFFFD`
 - UTF-8 encoding: `0xEF 0xBF 0xBD`
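To make the replacement rule concrete, here is a minimal decoder sketch in C that substitutes U+FFFD for malformed input. The function name `utf8_decode_sketch` and its signature are hypothetical and are not this module's API. The sketch assumes the simplest recovery policy: on any error it emits U+FFFD and consumes one byte so decoding always makes progress. It also rejects overlong forms, surrogates, and values above U+10FFFF, which may be stricter than what the module itself does.

```c
#include <stddef.h>
#include <stdint.h>

#define UTF8_REPLACEMENT 0xFFFDu /* U+FFFD, encoded as EF BF BD */

/* Hypothetical sketch (not this module's API): decode one code point from
 * s[0..len-1] into *cp and return the number of bytes consumed.
 * Assumption: on any malformed input, emit U+FFFD and consume one byte.
 * Requires len >= 1. */
size_t utf8_decode_sketch(const uint8_t *s, size_t len, uint32_t *cp)
{
    uint8_t b = s[0];
    size_t need;        /* number of continuation bytes expected */
    uint32_t min;       /* smallest code point allowed for this length */
    uint32_t value;

    if (b < 0x80) { *cp = b; return 1; }                               /* ASCII */
    else if ((b & 0xE0) == 0xC0) { need = 1; min = 0x80;    value = b & 0x1F; }
    else if ((b & 0xF0) == 0xE0) { need = 2; min = 0x800;   value = b & 0x0F; }
    else if ((b & 0xF8) == 0xF0) { need = 3; min = 0x10000; value = b & 0x07; }
    else { *cp = UTF8_REPLACEMENT; return 1; } /* stray continuation or 0xF8..0xFF */

    if (len < need + 1) { *cp = UTF8_REPLACEMENT; return 1; } /* truncated */

    for (size_t i = 1; i <= need; ++i) {
        if ((s[i] & 0xC0) != 0x80) { *cp = UTF8_REPLACEMENT; return 1; }
        value = (value << 6) | (s[i] & 0x3F); /* append 6 payload bits */
    }

    /* Reject overlong forms, out-of-range values and UTF-16 surrogates. */
    if (value < min || value > 0x10FFFF || (value >= 0xD800 && value <= 0xDFFF)) {
        *cp = UTF8_REPLACEMENT;
        return 1;
    }
    *cp = value;
    return need + 1;
}
```

Decoding the bytes `0xEF 0xBF 0xBD` with this sketch yields `0xFFFD` and consumes 3 bytes. Note that the Unicode Standard recommends a slightly different policy for truncated sequences (one U+FFFD per maximal invalid subpart), so the one-byte rule here is a simplification.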
 ---
 ### UTF-8 Byte Structure
 The following diagram illustrates how UTF-8 bytes are structured, including
 ASCII, continuation bytes, and multi-byte sequence headers:
 <img width="815" height="1003" alt="image" src="https://github.com/user-attachments/assets/0d289a94-6037-4039-87a3-125c0c0e83d0" />
 <sub>Source: <a href="https://www.youtube.com/watch?v=vpSkBV5vydg">Nic Barker — "UTF-8, Explained Simply"</a> (YouTube)</sub>
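The categories in the diagram can be told apart by masking a byte's high bits. The following C sketch is illustrative only; the enum and the function `utf8_classify` are hypothetical names, not part of this module. It checks bit patterns alone, so bytes such as `0xC0`, `0xC1`, and `0xF5` to `0xF7` are reported as lead bytes even though they never appear in well-formed UTF-8.

```c
#include <stdint.h>

/* Hypothetical classification of a single byte by its high bits. */
typedef enum {
    UTF8_ASCII,        /* 0xxxxxxx: complete one-byte character      */
    UTF8_CONTINUATION, /* 10xxxxxx: never starts a character         */
    UTF8_LEAD2,        /* 110xxxxx: starts a two-byte sequence       */
    UTF8_LEAD3,        /* 1110xxxx: starts a three-byte sequence     */
    UTF8_LEAD4,        /* 11110xxx: starts a four-byte sequence      */
    UTF8_INVALID       /* 11111xxx: never valid in UTF-8             */
} utf8_byte_kind;

utf8_byte_kind utf8_classify(uint8_t b)
{
    if ((b & 0x80) == 0x00) return UTF8_ASCII;        /* top bit clear     */
    if ((b & 0xC0) == 0x80) return UTF8_CONTINUATION; /* top bits 10       */
    if ((b & 0xE0) == 0xC0) return UTF8_LEAD2;        /* top bits 110      */
    if ((b & 0xF0) == 0xE0) return UTF8_LEAD3;        /* top bits 1110     */
    if ((b & 0xF8) == 0xF0) return UTF8_LEAD4;        /* top bits 11110    */
    return UTF8_INVALID;                              /* 0xF8..0xFF        */
}
```

For example, in the replacement character's encoding `0xEF 0xBF 0xBD`, the first byte classifies as `UTF8_LEAD3` and the other two as `UTF8_CONTINUATION`.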
 ---
 ### Control Signals
@@ -233,30 +242,10 @@ Example:
 ---
-## Limitations
+## Also worth watching
- No full UTF-8 validation (overlong, surrogates not fully rejected)
+For a deeper dive, or simply a more intuitive understanding of UTF-8, the following video is highly recommended:
 - No grapheme cluster handling
 - No Unicode normalization
---
+[Nic Barker — "UTF-8, Explained Simply"](https://www.youtube.com/watch?v=vpSkBV5vydg)
 ## Best Practices
 - Never treat individual bytes of a UTF-8 string as characters
 - Always use the provided helpers to move between code points
 - Track byte length separately from character (code point) count
 - Handle invalid sequences safely
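The difference between byte length and character count can be illustrated with a small C sketch. Both functions below, `utf8_count_code_points` and `utf8_next`, are hypothetical and stand in for whatever helpers the module actually provides. They inspect bytes internally only to locate code point boundaries (any byte that is not `10xxxxxx` starts a new code point), so the caller never treats a single byte as a character. Both assume NUL-terminated input that is already valid UTF-8; they do not validate.

```c
#include <stddef.h>

/* Hypothetical sketch: count code points by counting every byte that is
 * NOT a continuation byte (10xxxxxx). Assumes valid, NUL-terminated UTF-8. */
size_t utf8_count_code_points(const char *s)
{
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; ++p) {
        if ((*p & 0xC0) != 0x80)
            ++count;
    }
    return count;
}

/* Hypothetical sketch: return a pointer to the start of the next code point
 * by stepping past the lead byte and any continuation bytes after it.
 * At the terminating NUL it returns s unchanged. */
const char *utf8_next(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    if (*p == '\0')
        return s;
    ++p;                             /* skip the lead (or ASCII) byte */
    while ((*p & 0xC0) == 0x80)      /* skip continuation bytes */
        ++p;
    return (const char *)p;
}
```

For the string `"caf\xC3\xA9"` ("café"), `strlen` reports 5 bytes while `utf8_count_code_points` reports 4 code points, and `utf8_next` on the `\xC3` byte jumps two bytes ahead to the terminator.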
 ---
 ## Summary
 This module provides essential UTF-8 primitives for userland applications.
 It should be used whenever an application needs to safely:
 - decode UTF-8
 - encode Unicode code points
 - traverse text
 - handle user input correctly