Normally, if the game's current language is English, ASCII is good enough to cover all characters and symbols, since its codes fall in the range 0-127. The advantage of ASCII is that it saves space: each character takes only one byte.
Unicode is a standard that maps characters from languages all over the world to unique numbers, called code points. To accommodate more complex scripts, it needs a much larger range than ASCII: most commonly used characters have code points that fit in two bytes, and the full range extends to 0x10FFFF. For example, the code point of the letter “A” is 0x0041, and the code point of the Chinese character “严” is 0x4E25.
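As a small illustration of the two code points mentioned above, the sketch below looks them up in Python. The helper name code_point_label is made up for this example; ord() and chr() are Python built-ins.

```python
def code_point_label(ch: str) -> str:
    """Return the Unicode code point of a single character as 'U+XXXX'.

    code_point_label is a hypothetical helper for this example only.
    """
    # ord() gives the code point as a plain integer.
    return f"U+{ord(ch):04X}"


for ch in ("A", "严"):
    print(ch, code_point_label(ch))

# chr() goes the other way, from a code point back to the character.
print(chr(0x4E25))
```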
UTF-8 is one way to represent Unicode code points as bytes in a computer. In the example above, if the letter “A” were stored in a fixed two bytes, one byte would be enough and the other would be redundant. UTF-8 is more flexible than such a fixed-width encoding because it is variable-length. If the highest bit of a byte is 0, the character occupies a single byte and its value is stored in bits 0-6. In contrast, if the highest bit of the first byte is 1, the character uses 2 to 4 bytes: the number of leading 1s before the first 0 tells how many bytes are used, and every following byte starts with 10.
In summary, the UTF-8 byte patterns are:
0XXXXXXX: 1 byte
110XXXXX 10XXXXXX: 2 bytes
1110XXXX 10XXXXXX 10XXXXXX: 3 bytes
11110XXX 10XXXXXX 10XXXXXX 10XXXXXX: 4 bytes
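The four patterns above can be sketched as a hand-written encoder. This is a minimal illustration, not a replacement for a real library: the function name utf8_encode is an assumption of this example, and the code point ranges used to pick a pattern (up to 0x7F, 0x7FF, 0xFFFF and 0x10FFFF) are the standard UTF-8 limits, which the text above does not spell out.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point by filling the X bits of a pattern.

    utf8_encode is a hypothetical name used only in this sketch.
    """
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        # Values outside the Unicode range, and surrogates, are not encodable.
        raise ValueError(f"not a valid code point: {code_point:#x}")

    if code_point <= 0x7F:
        # 0XXXXXXX
        return bytes([code_point])
    if code_point <= 0x7FF:
        # 110XXXXX 10XXXXXX
        return bytes([
            0b11000000 | (code_point >> 6),
            0b10000000 | (code_point & 0b00111111),
        ])
    if code_point <= 0xFFFF:
        # 1110XXXX 10XXXXXX 10XXXXXX
        return bytes([
            0b11100000 | (code_point >> 12),
            0b10000000 | ((code_point >> 6) & 0b00111111),
            0b10000000 | (code_point & 0b00111111),
        ])
    # 11110XXX 10XXXXXX 10XXXXXX 10XXXXXX
    return bytes([
        0b11110000 | (code_point >> 18),
        0b10000000 | ((code_point >> 12) & 0b00111111),
        0b10000000 | ((code_point >> 6) & 0b00111111),
        0b10000000 | (code_point & 0b00111111),
    ])


# Compare the sketch with Python's built-in encoder.
for ch in ("A", "é", "严"):
    print(ch, utf8_encode(ord(ch)) == ch.encode("utf-8"))
```

In practice, calling str.encode("utf-8") is the sensible choice; the manual version is only meant to show where each bit ends up.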
For instance, the Chinese character “严” has the code point 0x4E25, which is 100111000100101 in binary (15 bits). UTF-8 stores it in 3 bytes, so the first byte starts with three 1s followed by a 0 that ends the length marker (1110), and each of the next two bytes starts with 10. The code point's bits, padded with a leading 0 to 16 bits, then fill the X positions from left to right:
Unicode: 0100 111000 100101
UTF-8: 11100100 10111000 10100101
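The worked example above can be checked with Python's built-in encoder. The sketch prints the three bytes in binary and then strips the marker bits again to recover the code point; the variable names are chosen for this example only.

```python
encoded = "严".encode("utf-8")

# Show each byte as 8 binary digits.
bits = " ".join(f"{byte:08b}" for byte in encoded)
print(bits)  # 11100100 10111000 10100101

# Undo the encoding by hand: drop the 1110 marker from the first byte
# and the 10 marker from the two following bytes, then join the payloads.
code_point = (
    ((encoded[0] & 0b00001111) << 12)
    | ((encoded[1] & 0b00111111) << 6)
    | (encoded[2] & 0b00111111)
)
print(hex(code_point))  # 0x4e25
```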