Reading a file in C as UTF-8 and iterating through each character?
How would I go about reading a file in C and iterating through each character so that I can evaluate it? For instance, if I give an input file of: 5 ≠ 10
it would evaluate that 5 is not equal to 10, and print out false. I can do the evaluation part, but I'm unsure how to approach reading Unicode characters in C. I'm asking this question because I've written a larger lexer and want it to support Unicode, but I wanted to try it out on a smaller-scale project first to see how it goes.
UTF-8 is an encoding format for Unicode. You're interested in parsing the text and separating out each byte. You then need to calculate the Unicode code point from those bytes to determine the character.
Ultimately you need:
- A parser that can distinguish UTF-8 character boundaries.
- A translator to convert data encoded as UTF-8 into a Unicode code point.
- A reference list of code points and their semantic meanings.
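The first two items can be sketched as a single function. This is a minimal illustration rather than a production decoder; the name `utf8_decode` and its signature are my own assumptions, not part of any standard library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 sequence starting at s, with n bytes
 * available. Stores the code point in *cp and returns the number of bytes
 * consumed, or 0 if the input is empty, truncated, or not valid UTF-8. */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    assert(s != NULL && cp != NULL);
    if (n == 0)
        return 0;

    size_t len;
    uint32_t min;               /* smallest code point allowed for this length */
    unsigned char b0 = s[0];

    /* The lead byte marks the character boundary and gives the length. */
    if (b0 < 0x80)                { *cp = b0; return 1; }
    else if ((b0 & 0xE0) == 0xC0) { len = 2; min = 0x80;    *cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; min = 0x800;   *cp = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; min = 0x10000; *cp = b0 & 0x07; }
    else return 0;              /* stray continuation byte or invalid lead byte */

    if (n < len)
        return 0;               /* sequence is cut off */

    /* Each continuation byte has the form 10xxxxxx and adds six bits. */
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        *cp = (*cp << 6) | (uint32_t)(s[i] & 0x3F);
    }

    /* Reject overlong forms, surrogates, and values beyond U+10FFFF. */
    if (*cp < min || *cp > 0x10FFFF || (*cp >= 0xD800 && *cp <= 0xDFFF))
        return 0;
    return len;
}
```

To iterate over a file, you could read it into a buffer (for example with `fread`), call this function in a loop, and advance the read position by the returned length until it reaches the end or the function returns 0.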
The not-equal sign is Unicode code point U+2260. It is encoded in UTF-8 as 0xE2 0x89 0xA0.
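To see where those three bytes come from, here is a small encoder sketch. The function name `utf8_encode` is hypothetical; the bit layout it follows is the standard UTF-8 one. For U+2260, the top four bits (0x2) go into the lead byte 1110xxxx, and the remaining twelve bits are split into two 10xxxxxx continuation bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode code point cp as UTF-8 into out, which must
 * have room for 4 bytes. Returns the number of bytes written, or 0 if cp
 * is a surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp < 0x80) {                        /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                       /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                     /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                       /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* 11110xxx followed by 3 bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```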
Edit: You should be using a library for parsing UTF-8 text. You should be focusing on finding the code points relevant to your application, and interpreting their meaning within your application.
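As a rough sketch of that last step, once something has handed you a code point, the application-specific part can be a simple lookup. The names `op_kind`, `classify_operator`, and `apply_operator` are made up for this illustration, and the set of operators is only an assumption about what a small comparison language might support.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical token kinds for a tiny comparison language. */
typedef enum { OP_UNKNOWN, OP_EQ, OP_NE, OP_LT, OP_GT, OP_LE, OP_GE } op_kind;

/* Map a decoded code point to the meaning it has in this application. */
op_kind classify_operator(uint32_t cp)
{
    switch (cp) {
    case 0x003D: return OP_EQ;  /* =  EQUALS SIGN */
    case 0x003C: return OP_LT;  /* <  LESS-THAN SIGN */
    case 0x003E: return OP_GT;  /* >  GREATER-THAN SIGN */
    case 0x2260: return OP_NE;  /* ≠  NOT EQUAL TO */
    case 0x2264: return OP_LE;  /* ≤  LESS-THAN OR EQUAL TO */
    case 0x2265: return OP_GE;  /* ≥  GREATER-THAN OR EQUAL TO */
    default:     return OP_UNKNOWN;
    }
}

/* Apply the comparison to two operands. Returns false if op is unknown,
 * otherwise stores the outcome of the comparison in *result. */
bool apply_operator(op_kind op, long lhs, long rhs, bool *result)
{
    assert(result != NULL);
    switch (op) {
    case OP_EQ: *result = (lhs == rhs); return true;
    case OP_NE: *result = (lhs != rhs); return true;
    case OP_LT: *result = (lhs <  rhs); return true;
    case OP_GT: *result = (lhs >  rhs); return true;
    case OP_LE: *result = (lhs <= rhs); return true;
    case OP_GE: *result = (lhs >= rhs); return true;
    default:    return false;
    }
}
```

Keeping the lookup separate from the decoding means you can swap in a library decoder later without touching the evaluation code.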