From 8d0e744991ae399c8f3ec197864055e37a5891aa Mon Sep 17 00:00:00 2001
From: Lluciocc <114759545+Lluciocc@users.noreply.github.com>
Date: Sat, 25 Apr 2026 00:51:54 +0200
Subject: [PATCH] doc: Add UTF-8 byte structure section and resources (#10)
Add a section on UTF-8 byte structure with a diagram, and recommend a video for further understanding. Replace the Limitations, Best Practices, and Summary sections with an "Also worth watching" section.
---
docs/appdev/inputs_api_(utf8).md | 35 +++++++++++---------------------
1 file changed, 12 insertions(+), 23 deletions(-)
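Note on the new "UTF-8 Byte Structure" section: the three byte categories it names (ASCII, continuation bytes, multi-byte sequence headers) can be told apart from the leading bits alone. The following is a minimal sketch in Python, not code from this module; the function name `classify_utf8_byte` and the returned labels are hypothetical and chosen only for illustration.

```python
def classify_utf8_byte(b: int) -> str:
    """Classify a single byte (0..255) by its leading bit pattern.

    Illustrative helper only (hypothetical name, not part of the documented API).
    It looks at the bit pattern alone and does not reject overlong or
    out-of-range lead bytes such as 0xC0, 0xC1, or 0xF5..0xF7.
    """
    if not 0 <= b <= 0xFF:
        raise ValueError("expected a byte value in 0..255")
    if (b & 0x80) == 0x00:   # 0xxxxxxx: single-byte ASCII
        return "ascii"
    if (b & 0xC0) == 0x80:   # 10xxxxxx: continuation byte
        return "continuation"
    if (b & 0xE0) == 0xC0:   # 110xxxxx: header of a 2-byte sequence
        return "lead-2"
    if (b & 0xF0) == 0xE0:   # 1110xxxx: header of a 3-byte sequence
        return "lead-3"
    if (b & 0xF8) == 0xF0:   # 11110xxx: header of a 4-byte sequence
        return "lead-4"
    return "invalid"         # 11111xxx never appears in UTF-8


# The replacement character U+FFFD encodes as 0xEF 0xBF 0xBD:
# one 3-byte header followed by two continuation bytes.
labels = [classify_utf8_byte(b) for b in "\ufffd".encode("utf-8")]
```

Applied to the bytes `0xEF 0xBF 0xBD` listed in the context above, this yields one `lead-3` label followed by two `continuation` labels, which matches the layout the diagram is meant to show.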
diff --git a/docs/appdev/inputs_api_(utf8).md b/docs/appdev/inputs_api_(utf8).md
index 0c267a7..0a5dc63 100644
--- a/docs/appdev/inputs_api_(utf8).md
+++ b/docs/appdev/inputs_api_(utf8).md
@@ -176,6 +176,15 @@ Invalid sequences are replaced with:
- code point: `0xFFFD`
- UTF-8 encoding: `0xEF 0xBF 0xBD`
+---
+### UTF-8 Byte Structure
+
+The following diagram illustrates how UTF-8 bytes are structured, including
+ASCII, continuation bytes, and multi-byte sequence headers:
+
+
+Source: Nic Barker — "UTF-8, Explained Simply" (YouTube)
+
---
### Control Signals
@@ -233,30 +242,10 @@ Example:
---
-## Limitations
+## Also worth watching
-- No full UTF-8 validation (overlong, surrogates not fully rejected)
-- No grapheme cluster handling
-- No Unicode normalization
+If you want to dive deeper or simply get a better intuitive understanding of UTF-8, the video below is highly recommended:
----
+[Nic Barker — "UTF-8, Explained Simply"](https://www.youtube.com/watch?v=vpSkBV5vydg)
-## Best Practices
-- Never iterate UTF-8 strings byte-by-byte
-- Always use provided helpers for navigation
-- Separate byte length from character count
-- Handle invalid sequences safely
-
----
-
-## Summary
-
-This module provides essential UTF-8 primitives for userland applications.
-
-It should be used whenever an application needs to safely:
-
-- decode UTF-8
-- encode Unicode
-- traverse text
-- handle user input correctly