unicode - Reading a file in C as UTF-8 and iterating through each character?


How do I go about reading a file in C and iterating through each character so that I can evaluate it? For instance, if I give it an input file containing 5 ≠ 10, it should evaluate whether 5 is not equal to 10 and print out the result. I can do the evaluation part, but I'm unsure how to approach reading Unicode characters in C. I'm asking this question because I've written a larger lexer and want it to support Unicode, but I wanted to try this out on a smaller-scale project first to see how it goes.

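UTF-8 is an encoding format for Unicode. You're interested in parsing the text and separating it into characters, not just individual bytes: a single character can occupy one to four bytes. You then need to calculate the Unicode code point from those bytes to determine which character it is.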

Ultimately, you need:

  1. A parser that can distinguish UTF-8 character boundaries.
  2. A translator to convert data encoded as UTF-8 into Unicode code points.
  3. A reference list of code points and their semantic meanings.
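The first two items can be sketched in a few lines of C. This is only a minimal illustration, not a full validator: the function name utf8_next and its return conventions (-1 for end of file, -2 for malformed input) are hypothetical, and it does not reject overlong encodings or surrogate code points, which a production decoder should do.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reads one UTF-8 encoded code point from fp.
 * Returns the code point, -1 at end of file, or -2 on malformed input.
 * Sketch only: overlong forms and surrogates are not rejected. */
long utf8_next(FILE *fp)
{
    assert(fp != NULL);

    int c = fgetc(fp);
    if (c == EOF)
        return -1;

    long cp;    /* code point being assembled */
    int extra;  /* number of continuation bytes still expected */

    /* The lead byte tells us where the character boundary is. */
    if (c < 0x80)                { cp = c;        extra = 0; }
    else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }
    else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
    else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; }
    else
        return -2;  /* stray continuation byte or invalid lead byte */

    /* Each continuation byte has the form 10xxxxxx and adds 6 bits. */
    for (int i = 0; i < extra; i++) {
        c = fgetc(fp);
        if (c == EOF || (c & 0xC0) != 0x80)
            return -2;
        cp = (cp << 6) | (c & 0x3F);
    }
    return cp;
}
```

A lexer could call this in a loop until it returns -1 and compare each result against the code points it cares about, for example 0x2260 for the not-equal sign. The file should be opened in binary mode ("rb") so that no byte translation happens on platforms that distinguish text and binary streams.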

The not-equal sign is Unicode code point U+2260. It is encoded in UTF-8 as 0xE2 0x89 0xA0.
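To see where those three bytes come from, here is a small encoder sketch. The name utf8_encode and its convention of returning 0 for values that cannot be encoded are assumptions made for this example. For U+2260 the 16 bits 0010 0010 0110 0000 are split 4 + 6 + 6 and placed into the pattern 1110xxxx 10xxxxxx 10xxxxxx.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes the UTF-8 encoding of cp into out and
 * returns the number of bytes written (1 to 4), or 0 if cp is a
 * surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);

    if (cp < 0x80) {                      /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                     /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000) {                  /* 11110xxx followed by three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                             /* outside the Unicode range */
}
```

Calling utf8_encode(0x2260, buf) fills buf with 0xE2, 0x89, 0xA0 and returns 3, matching the bytes quoted above.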

Edit: You should be using a library for parsing UTF-8 text. You should be focusing on finding the code points relevant to your application and interpreting their meaning within your application.

