From 8d0e744991ae399c8f3ec197864055e37a5891aa Mon Sep 17 00:00:00 2001
From: Lluciocc <114759545+Lluciocc@users.noreply.github.com>
Date: Sat, 25 Apr 2026 00:51:54 +0200
Subject: [PATCH] doc: Add UTF-8 byte structure section and resources (#10)

Add a section on UTF-8 byte structure with a diagram, and replace the Limitations, Best Practices, and Summary sections with a link to a recommended video.
---
 docs/appdev/inputs_api_(utf8).md | 35 +++++++++++---------------------
 1 file changed, 12 insertions(+), 23 deletions(-)
diff --git a/docs/appdev/inputs_api_(utf8).md b/docs/appdev/inputs_api_(utf8).md
index 0c267a7..0a5dc63 100644
--- a/docs/appdev/inputs_api_(utf8).md
+++ b/docs/appdev/inputs_api_(utf8).md
@@ -176,6 +176,15 @@ Invalid sequences are replaced with:
 - code point: `0xFFFD`
 - UTF-8 encoding: `0xEF 0xBF 0xBD`
 
+---
+### UTF-8 Byte Structure
+
+The following diagram shows how UTF-8 bytes are structured, covering ASCII
+bytes, continuation bytes, and the lead bytes of multi-byte sequences:
+
+<img width="815" height="1003" alt="image" src="https://github.com/user-attachments/assets/0d289a94-6037-4039-87a3-125c0c0e83d0" />
+<sub>Source: <a href="https://www.youtube.com/watch?v=vpSkBV5vydg">Nic Barker — "UTF-8, Explained Simply"</a> (YouTube)</sub>
+
 ---
 
 ### Control Signals
@@ -233,30 +242,10 @@ Example:
 
 ---
 
-## Limitations
+## Also worth watching
 
-- No full UTF-8 validation (overlong, surrogates not fully rejected)
-- No grapheme cluster handling
-- No Unicode normalization
+For a deeper or more intuitive understanding of UTF-8, the video below is highly recommended:
 
----
+[Nic Barker — "UTF-8, Explained Simply"](https://www.youtube.com/watch?v=vpSkBV5vydg)
 
-## Best Practices
 
-- Never iterate UTF-8 strings byte-by-byte
-- Always use provided helpers for navigation
-- Separate byte length from character count
-- Handle invalid sequences safely
-
----
-
-## Summary
-
-This module provides essential UTF-8 primitives for userland applications.
-
-It should be used whenever an application needs to safely:
-
-- decode UTF-8
-- encode Unicode
-- traverse text
-- handle user input correctly