UTF8 Encoding

Normally, if the game's current language is English, ASCII is good enough to cover all characters and symbols, since its code range is 0-127. The advantage of ASCII is that it saves space: each character takes only one byte.
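As a quick illustration of the one-byte claim, here is a minimal Python sketch. The helper name is_ascii and the sample string are made up for this example and are not part of any game code.

```python
def is_ascii(text: str) -> bool:
    """Return True if every character has a code point in the range 0-127."""
    return all(ord(ch) < 128 for ch in text)


sample = "Hello, world!"          # hypothetical in-game English string
encoded = sample.encode("ascii")  # one byte per character

print(is_ascii(sample))           # True
print(len(sample), len(encoded))  # 13 13
```

A string containing “严” would fail this check, which is where Unicode comes in.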

Unicode is a standard table that maps the characters of all the world's languages to numbers (code points). To accommodate more complicated characters, it needs more than one byte per character: two bytes cover the most commonly used characters, and the full range extends up to 0x10FFFF. For example, the Unicode code point of the letter “A” is 0x0041, and that of the Chinese character “严” is 0x4E25.
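The two code points can be checked with Python's built-in ord. This is only a small sketch, and code_point_hex is a helper name invented for the example.

```python
def code_point_hex(ch: str) -> str:
    """Format the Unicode code point of a single character as 0xNNNN."""
    return f"0x{ord(ch):04X}"


for ch in ("A", "严"):
    print(ch, code_point_hex(ch))
# A 0x0041
# 严 0x4E25
```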

UTF8 is a way to represent Unicode in a computer. In the above example, if the letter “A” is stored as a two-byte Unicode value, one byte is enough and the other is redundant. UTF8 is more flexible than such a fixed-width representation because it is variable-length. If the highest bit of a byte is 0, the character takes only one byte and its value is stored in bits 0-6. In contrast, if the highest bit of the first byte is 1, the character takes 2 to 4 bytes. The number of leading 1s before the first 0 in the first byte tells how many bytes are used, and every following byte starts with 10.
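The rule about leading 1s can be sketched as a small function that inspects only the first byte. The name utf8_sequence_length is an assumption made for this example, not a standard library call.

```python
def utf8_sequence_length(lead_byte: int) -> int:
    """Return how many bytes a UTF8 character uses, judged from its first byte."""
    if (lead_byte & 0b10000000) == 0b00000000:  # 0XXXXXXX
        return 1
    if (lead_byte & 0b11100000) == 0b11000000:  # 110XXXXX
        return 2
    if (lead_byte & 0b11110000) == 0b11100000:  # 1110XXXX
        return 3
    if (lead_byte & 0b11111000) == 0b11110000:  # 11110XXX
        return 4
    # 10XXXXXX is a continuation byte, and 11111XXX is not used by UTF8
    raise ValueError(f"not a valid first byte: {lead_byte:#04x}")


print(utf8_sequence_length(0x41))  # 1, the letter "A"
print(utf8_sequence_length(0xE4))  # 3, the first byte of "严"
```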

In sum, the byte layouts of UTF8 are below:

0XXXXXXX:                               1 byte

110XXXXX 10XXXXXX:                      2 bytes

1110XXXX 10XXXXXX 10XXXXXX:             3 bytes

11110XXX 10XXXXXX 10XXXXXX 10XXXXXX:    4 bytes
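The four layouts can be turned into a hand-written encoder. The following is a minimal sketch for illustration only: encode_utf8 is a made-up name, the range limits (0x80, 0x800, 0x10000) follow from how many X bits each layout offers, and the sketch does not reject surrogate code points as a strict encoder would. In real code, str.encode("utf-8") does this job.

```python
def encode_utf8(code_point: int) -> bytes:
    """Encode one Unicode code point into UTF8 bytes by hand (illustrative sketch)."""
    if code_point < 0x80:        # fits in 7 bits: 0XXXXXXX
        return bytes([code_point])
    if code_point < 0x800:       # fits in 11 bits: 110XXXXX 10XXXXXX
        return bytes([
            0b11000000 | (code_point >> 6),
            0b10000000 | (code_point & 0b00111111),
        ])
    if code_point < 0x10000:     # fits in 16 bits: 1110XXXX 10XXXXXX 10XXXXXX
        return bytes([
            0b11100000 | (code_point >> 12),
            0b10000000 | ((code_point >> 6) & 0b00111111),
            0b10000000 | (code_point & 0b00111111),
        ])
    if code_point <= 0x10FFFF:   # fits in 21 bits: 11110XXX 10XXXXXX 10XXXXXX 10XXXXXX
        return bytes([
            0b11110000 | (code_point >> 18),
            0b10000000 | ((code_point >> 12) & 0b00111111),
            0b10000000 | ((code_point >> 6) & 0b00111111),
            0b10000000 | (code_point & 0b00111111),
        ])
    raise ValueError("code point is outside the Unicode range")


# Compare the hand-written result with Python's built-in encoder.
print(encode_utf8(0x4E25) == "严".encode("utf-8"))  # True
```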

For instance, take the Chinese character “严”, which is 0x4E25 (100111000100101 in binary) in Unicode. UTF8 uses 3 bytes to store it, so the first byte starts with three 1s followed by a 0 as the terminating mark, and each of the two following bytes starts with 10. The bits of the code point, padded with a leading 0 to 16 bits, fill the remaining X positions from left to right. The value is:

Unicode:      0100   111000   100101
UTF8:     11100100 10111000 10100101
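To double-check the example, the three bytes can be printed in binary and then decoded back by masking off the marker bits. This is a rough sketch, and decode_three_byte is a hypothetical helper that only handles the 3-byte layout.

```python
def decode_three_byte(data: bytes) -> int:
    """Recover the code point from a 3-byte UTF8 sequence (1110XXXX 10XXXXXX 10XXXXXX)."""
    b0, b1, b2 = data
    return (
        ((b0 & 0b00001111) << 12)   # 4 payload bits from the first byte
        | ((b1 & 0b00111111) << 6)  # 6 payload bits from the second byte
        | (b2 & 0b00111111)         # 6 payload bits from the third byte
    )


utf8_bytes = "严".encode("utf-8")
print(" ".join(f"{b:08b}" for b in utf8_bytes))  # 11100100 10111000 10100101
print(hex(decode_three_byte(utf8_bytes)))        # 0x4e25
```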
