Tuesday, January 28, 2014

ASCII vs Unicode


  • ASCII
On both Windows/DOS and Unix systems, the 128 ASCII characters (letters, digits, punctuation and control codes) are each represented by a sequence of 7 bits known as the character’s ASCII code.
They are traditionally stored one per byte (8 bits), i.e. the 7-bit ASCII code plus a leading zero bit.
http://www.itk.ilstu.edu/staff/drathke/277web/WebContent/reading/asciiprint.html
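
To make the "7 bits plus a leading zero" idea concrete, here is a small Python sketch. The sample characters are my own choice for illustration, not anything from the linked page.

```python
# Show the 7-bit ASCII code of a few characters and the 8-bit byte stored for each.
sample = "Az0"  # arbitrary characters picked for illustration

for ch in sample:
    code = ord(ch)                       # numeric ASCII code, in the range 0..127
    seven_bits = format(code, "07b")     # the 7-bit ASCII code
    stored_byte = format(code, "08b")    # the same bits with a leading zero bit
    print(ch, code, seven_bits, stored_byte)

encoded = sample.encode("ascii")         # one byte per character
print(len(encoded))
```

Every byte produced this way is below 128, which is just another way of saying its top bit is zero.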

  • Unicode
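Java uses Unicode. A Java char is a 16-bit (2-byte) UTF-16 code unit, so most common characters take 2 bytes; characters beyond the first 65,536 code points are stored as a pair of 16-bit units.
16 bits allow 65,536 different values, and Unicode as a whole has room for 1,114,112 code points, thereby allowing it to be a truly international character set.
The first 128 Unicode characters are the same as the ASCII characters; in UTF-16 each one is stored as its ASCII byte plus an extra zero byte (in front of it in big-endian order, after it in little-endian order).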
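
As a rough illustration of how the first 128 characters relate to ASCII, the Python sketch below encodes one letter three ways. It assumes the 16-bit encoding in question is UTF-16 (the encoding Java strings use); the sample characters are my own.

```python
text = "A"  # arbitrary ASCII character picked for illustration

ascii_bytes = text.encode("ascii")     # single byte 0x41
utf16_be = text.encode("utf-16-be")    # zero byte first: 0x00 0x41
utf16_le = text.encode("utf-16-le")    # zero byte second: 0x41 0x00
print(ascii_bytes.hex(), utf16_be.hex(), utf16_le.hex())

# A character beyond the first 65,536 code points needs two 16-bit units.
clef = "\U0001D11E"                    # MUSICAL SYMBOL G CLEF
units = len(clef.encode("utf-16-be")) // 2
print(units)
```

So whether the zero byte comes before or after the ASCII byte depends on the byte order, and a few characters need more than one 16-bit unit.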


Unicode Test:
The file is called "testing1.txt" and was created in Notepad on Win2K.
The 15 indicates the file is 15 bytes long, i.e. one byte per character.

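Now I have saved the file as a "Unicode" file.
The file is called "testing2.txt".
Notepad's "Unicode" option writes UTF-16 (little-endian) with a 2-byte byte-order mark at the start, so every character now takes 2 bytes.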
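
The same test can be sketched in Python rather than Notepad. The sample text below is hypothetical (I only made it 15 characters long to match the 15-byte file), and I am assuming that Notepad's "Unicode" option corresponds to UTF-16 with a byte-order mark.

```python
import os
import tempfile

# Hypothetical 15-character text; not the content of the original testing1.txt.
sample = "Hello, Unicode!"

with tempfile.TemporaryDirectory() as folder:
    ansi_path = os.path.join(folder, "testing1.txt")
    unicode_path = os.path.join(folder, "testing2.txt")

    # One byte per character.
    with open(ansi_path, "w", encoding="ascii", newline="") as f:
        f.write(sample)

    # Python's "utf-16" codec writes a 2-byte byte-order mark,
    # then 2 bytes per character.
    with open(unicode_path, "w", encoding="utf-16", newline="") as f:
        f.write(sample)

    ansi_size = os.path.getsize(ansi_path)
    unicode_size = os.path.getsize(unicode_path)

print(ansi_size, unicode_size)
```

With this sample the sizes come out as 15 and 32 bytes: twice the character count, plus 2 bytes for the byte-order mark.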


http://www.itk.ilstu.edu/staff/drathke/277web/WebContent/reading/AsciiandUnicode.html
