Extensible Markup Language (XML) Tutorial

27 Jun 2003 | by Gez Lemon

Character Sets

A character set determines which characters are allowed within a document. A restrictive character set allows only certain types of characters; for example, it may allow only uppercase letters. A broad character set, as its name suggests, allows many characters; for example, it may include Arabic characters.

ASCII

ASCII is a widely used character set. Each character in the ASCII character set is represented by a numeric code value. The ASCII code for an uppercase "A" is 65, and the ASCII code for a lowercase "a" is 97. Pure ASCII is a 7-bit encoding scheme, allowing 128 different values. The ANSI character sets extend ASCII to 8 bits, using the full range of 256 values available in a byte.
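
The code values quoted above can be checked with a short script. This is a minimal sketch using Python's built-in `ord()` and `chr()`; the helper name `is_pure_ascii` is made up for this illustration.

```python
# Look up character code values with the built-ins ord() and chr().
upper_a = ord("A")        # code value of uppercase "A"
lower_a = ord("a")        # code value of lowercase "a"
back_to_char = chr(97)    # code value converted back to a character

# Pure ASCII is 7-bit, so it has 2**7 values; a full byte has 2**8.
ascii_size = 2 ** 7
byte_size = 2 ** 8


def is_pure_ascii(text):
    """Return True when every character has a code value below 128.

    Hypothetical helper, named for this example only.
    """
    return all(ord(ch) < ascii_size for ch in text)
```

For instance, `is_pure_ascii("XML")` is true, while a string containing an accented letter such as "é" falls outside the 7-bit range.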

Unicode

The designated character set for XML documents is Unicode, which includes characters from around the world. The Universal Character Set (UCS) is an ISO standard that encompasses most of the world's writing systems. UCS uses multi-octet characters, which are not compatible with many current applications and protocols. The UCS Transformation Formats (UTF) were developed to overcome this compatibility issue. The two most widely used encoding schemes for Unicode are UTF-8 and UTF-16. UTF-8 uses 8-bit units and is compatible with 7-bit ASCII: ASCII characters take a single byte, and other characters are represented by sequences of two or more bytes. UTF-16 uses 16-bit units; a single unit can represent 65,536 possible values, and characters beyond that range are encoded as a pair of units.
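
To make the difference between the two encodings concrete, the following sketch counts how many bytes a single character occupies in each. The function name `encoded_lengths` is an assumption for this example; `str.encode` is standard Python, and the `utf-16-be` codec is used so that no byte order mark is added to the count.

```python
def encoded_lengths(ch):
    """Return (UTF-8 byte count, UTF-16 byte count) for one character.

    Hypothetical helper, named for this example only.
    """
    utf8_bytes = ch.encode("utf-8")
    utf16_bytes = ch.encode("utf-16-be")  # big-endian, no byte order mark
    return len(utf8_bytes), len(utf16_bytes)


# An ASCII letter is one byte in UTF-8, identical to its ASCII encoding.
ascii_compatible = "A".encode("utf-8") == "A".encode("ascii")

# Number of values a single 16-bit unit can represent.
sixteen_bit_values = 2 ** 16

samples = {
    "A": encoded_lengths("A"),                    # ASCII letter
    "\u00e9": encoded_lengths("\u00e9"),          # e with acute accent
    "\u20ac": encoded_lengths("\u20ac"),          # euro sign
    "\U0001F600": encoded_lengths("\U0001F600"),  # beyond the 16-bit range
}
```

The last sample lies outside the range of a single 16-bit unit, so UTF-16 needs two units (four bytes) for it.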

Specifying a Character Set

The markup and the character data for the actual text of the document are both written in Unicode by default. This enables XML documents to be created with plain text editors.

The XML declaration may optionally include the character encoding to be used. This allows you to specify an encoding other than UTF-8. Notepad for Windows in the UK uses windows-1252 encoding by default. As not all XML parsers understand windows-1252, it is better to use the standard encoding ISO-8859-1, which is similar to the encoding used by Notepad (the two differ only in the range 0x80 to 0x9F). Notepad for Windows 2000 and XP can save documents in Unicode, allowing the encoding attribute to be omitted from the declaration. The following example specifies an encoding of ISO-8859-1.

<?xml version="1.0" encoding="ISO-8859-1"?>
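
As a rough illustration of how a parser uses this declaration, the sketch below builds a small ISO-8859-1 document in memory and parses it with Python's standard `xml.etree.ElementTree` module. The `price` element and its content are invented for this example and are not part of the tutorial.

```python
import xml.etree.ElementTree as ET

# A tiny document with the declaration shown above.
# The <price> element is hypothetical, used only for illustration.
text = '<?xml version="1.0" encoding="ISO-8859-1"?><price>\u00a310</price>'

# Encode the text as ISO-8859-1 bytes, as an editor saving in that
# encoding would. The pound sign becomes the single byte 0xA3.
data = text.encode("iso-8859-1")

# The parser reads the declaration and decodes the bytes accordingly.
root = ET.fromstring(data)
parsed_text = root.text

# For comparison: the same character takes two bytes in UTF-8.
pound_utf8 = "\u00a3".encode("utf-8")
```

If the bytes were saved as ISO-8859-1 but the declaration were omitted, a parser would assume UTF-8 and would typically reject the lone 0xA3 byte as malformed.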

About the author

Gez Lemon

I'm available for contract work. Please visit Juicify for details.

www.juicystudio.com
