Xtreme .Net Talk


  • *Experts*
Posted

Generally files of a certain type contain a header. For example, a .GIF file contains a header that says "GIF87a" or "GIF89a" so that programs can determine that it is a GIF.
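As a minimal sketch of that kind of header check, assuming Python and a hypothetical helper named `looks_like_gif`, a program could compare the start of the data against the two signatures. A match only means the data claims to be a GIF; it does not validate the rest of the file.

```python
# The two GIF signatures mentioned above, as raw bytes.
GIF_SIGNATURES = (b"GIF87a", b"GIF89a")


def looks_like_gif(data: bytes) -> bool:
    """Return True if the data starts with one of the two GIF signatures.

    Hypothetical helper for illustration only.
    """
    # bytes.startswith accepts a tuple and matches any of its members.
    return data.startswith(GIF_SIGNATURES)
```

In practice you would read just the first six bytes of the file (opened in binary mode) and pass them in.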

 

All files can be interpreted as text or interpreted as binary.

  • *Gurus*
Posted

You can't know for sure; that's the whole point. Each encoding has its own byte-level conventions, and some offer byte order marks, but nothing restricts those byte sequences to text files. This is part of the reason file extensions exist: to indicate what type of file is being dealt with.

 

For example:

 

  • ASCII uses 7 bits to represent one character.
  • UTF-8 uses one to four octets per character (the original design allowed up to six), with the initial octet serving as both an indicator of the number of subsequently used octets and a portion of the character value. A UTF-8 file may also be marked with an opening byte sequence of EF BB BF, though this marker is optional.
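
To make the UTF-8 point concrete, here is a rough Python sketch with two hypothetical helpers: one reads the sequence length from the initial octet, the other checks for the EF BB BF marker. It assumes the modern four-octet limit, so lead bytes that would start five- or six-octet sequences are reported as invalid.

```python
import codecs


def utf8_sequence_length(lead: int) -> int:
    """Return the number of octets in a UTF-8 sequence starting with `lead`.

    Returns 0 if the byte cannot start a sequence. Hypothetical helper.
    """
    if lead < 0x80:
        return 1  # 0xxxxxxx: a single octet, same as ASCII
    if 0xC2 <= lead <= 0xDF:
        return 2  # 110xxxxx (C0 and C1 are excluded as overlong forms)
    if 0xE0 <= lead <= 0xEF:
        return 3  # 1110xxxx
    if 0xF0 <= lead <= 0xF4:
        return 4  # 11110xxx, capped at U+10FFFF
    return 0  # continuation byte (80-BF) or invalid lead byte


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the bytes EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)
```

A missing marker proves nothing, since many UTF-8 files are written without one.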

 

And while UTF-8 is relatively easy to spot, there's nothing to distinguish ASCII other than checking whether the file contains bytes that don't map to ASCII characters. Any byte above 7F is a sure sign that the file is not ASCII-encoded. A null value, 00, is technically a valid ASCII control character, but it almost never appears in real text, so it suggests the file is binary rather than text.
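
That byte-level check could look something like the following Python sketch. The function name and the three result labels are made up for illustration, and the answer is only ever a guess: passing the check means the bytes are consistent with ASCII, not that the author intended ASCII.

```python
def classify_bytes(data: bytes) -> str:
    """Make a rough guess at whether the data could be ASCII text.

    Hypothetical helper; the labels are illustrative, not a standard.
    """
    # ASCII is a 7-bit code, so any byte with the high bit set rules it out.
    if any(b > 0x7F for b in data):
        return "not-ascii"
    # 00 is inside the 7-bit range, but real text files rarely contain it.
    if b"\x00" in data:
        return "suspicious"
    return "ascii-compatible"
```

For a large file it is usually enough to run this over the first few kilobytes instead of the whole thing.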
