XML: which characters need to be escaped?
The safe way is to escape all five special characters in attribute values. HTML has its own, much larger set of named escape codes. In SGML, HTML and XML documents, the logical constructs known as character data and attribute values consist of sequences of characters. Each character can either appear directly (representing itself) or be represented by a series of characters called a character reference. There are two types of character reference: a numeric character reference and a character entity reference.
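As a quick check of the two kinds of reference, here is a minimal Python sketch using only the standard library's `xml.etree.ElementTree`. The sample document is made up for illustration: it writes the same character, `<`, as a decimal numeric reference, a hexadecimal numeric reference and a predefined entity reference.

```python
import xml.etree.ElementTree as ET

# The same character "<" written three ways:
#   &#60;  decimal numeric character reference
#   &#x3C; hexadecimal numeric character reference
#   &lt;   predefined character entity reference
doc = "<r>&#60;&#x3C;&lt;</r>"

root = ET.fromstring(doc)
print(root.text)  # <<<
```

All three references are replaced by the parser, so the element's text is just three literal `<` characters.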
According to the specifications of the World Wide Web Consortium (W3C), there are five characters that should not appear in their literal form in an XML document, except when used as markup delimiters or within a comment, a processing instruction, or a CDATA section. In all other cases, these characters should be replaced with either the corresponding entity reference or a numeric character reference, according to the following table:
Character          Entity reference   Numeric reference
" (double quote)   &quot;             &#34;
' (apostrophe)     &apos;             &#39;
< (less than)      &lt;               &#60;
> (greater than)   &gt;               &#62;
& (ampersand)      &amp;              &#38;
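In Python, the table above can be applied with `xml.sax.saxutils.escape()`, which always escapes `&`, `<` and `>` and accepts a dictionary of extra replacements. The sketch below passes the two quote characters as extras; `escape_all_five` is a hypothetical helper name, not a library function.

```python
from xml.sax.saxutils import escape

# escape() handles &, < and > itself (replacing & first, so nothing is
# double-escaped); the two quote characters are added as extra entities.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}

def escape_all_five(text: str) -> str:
    """Replace all five XML special characters with predefined entities."""
    return escape(text, QUOTE_ENTITIES)

print(escape_all_five('Tom & "Jerry" <it\'s>'))
# Tom &amp; &quot;Jerry&quot; &lt;it&apos;s&gt;
```

Escaping all five everywhere is more than the specification strictly requires, but the result is safe both in element content and in attribute values delimited by either quote.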
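Esoterica. From the "Character Data and Markup" section of the XML specification (abridged): the ampersand (&) and the left angle bracket (<) must not appear in their literal form, except when used as markup delimiters or within a comment, a processing instruction, or a CDATA section; the right angle bracket (>) must be escaped when it appears in the string "]]>" in content; and, to allow attribute values to contain both single and double quotes, the apostrophe may be represented as &apos; and the double quote as &quot;. (Abridged from: XML, Escaping.) So what needs escaping depends on the context, as mentioned in another question.
Most of the control characters and some other Unicode ranges are specifically excluded from the valid XML characters, meaning they can't occur either escaped or directly. In XML 1.0, the valid characters are tab (#x9), line feed (#xA), carriage return (#xD), and the ranges #x20 to #xD7FF, #xE000 to #xFFFD and #x10000 to #x10FFFF. This also means that a numeric character reference to an excluded character, such as a control character other than tab, line feed or carriage return, is forbidden. Escaping only the five special characters is therefore not enough to make arbitrary text safe to put in an XML document.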
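The XML 1.0 character ranges above can be checked directly. The following sketch assumes XML 1.0 (XML 1.1 is more permissive and allows references to most control characters); `is_xml10_char` and `strip_invalid_xml10` are hypothetical helper names. It also shows that Python's expat-based parser rejects a numeric reference to U+0003.

```python
import xml.etree.ElementTree as ET

def is_xml10_char(ch: str) -> bool:
    """True if ch is allowed by the XML 1.0 'Char' production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF        # excludes other C0 controls
        or 0xE000 <= cp <= 0xFFFD      # skips surrogates, U+FFFE, U+FFFF
        or 0x10000 <= cp <= 0x10FFFF
    )

def strip_invalid_xml10(text: str) -> str:
    """Drop characters that cannot appear in XML 1.0, even as references."""
    return "".join(ch for ch in text if is_xml10_char(ch))

print(repr(strip_invalid_xml10("a\x03b\tc")))  # 'ab\tc'

# Escaping does not help: a reference to U+0003 is rejected too.
try:
    ET.fromstring("<r>&#3;</r>")
except ET.ParseError as err:
    print("rejected:", err)
```

Dropping invalid characters is only one policy; depending on the application, rejecting the input or substituting a placeholder may be more appropriate.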
In CSS, a character can be escaped as a backslash followed by the hexadecimal code point of the character. If the next character is not a hexadecimal digit, it is clear where the escape ends. If, on the other hand, the next character is one that can be used in hexadecimal numbers, it won't be clear where the number ends. In these cases there are two options. The first is to add a space after the escape. This space is part of the escape syntax and does not remain after the escape is parsed.
Alternatively, you can use a 6-digit hexadecimal number, with or without a space. Because a single white-space character following the hexadecimal number is swallowed up as part of the escape, if you actually want a space to appear after the escaped character, you will need to add two spaces after a hexadecimal number of any length.
In CSS, an identifier cannot start with an ASCII digit, although digits can be used after the first character. However, escaped characters of any type can appear in any position.
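The two options, and the rule that exactly one following white-space character is consumed, can be illustrated with a small Python sketch. `css_escape_char` and `css_unescape` are hypothetical helpers; the decoder is simplified (it ignores `\r\n` and form feed as terminators) and is not a full CSS tokenizer.

```python
import re

def css_escape_char(ch: str, pad: bool = False) -> str:
    """Escape one character as a CSS hexadecimal escape."""
    digits = format(ord(ch), "x")
    if pad:
        # Six digits: the parser stops after six, so no terminator is needed.
        return "\\" + digits.rjust(6, "0")
    # Short form: a trailing space marks the end of the escape.
    return "\\" + digits + " "

# 1 to 6 hex digits plus at most ONE white-space character (consumed),
# or a backslash before any other character on the same line.
_ESCAPE = re.compile(r"\\([0-9a-fA-F]{1,6})[ \t\n]?|\\(.)")

def css_unescape(s: str) -> str:
    def repl(m):
        if m.group(1) is not None:
            cp = int(m.group(1), 16)
            # Zero, surrogates and out-of-range values become U+FFFD.
            if cp == 0 or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
                return "\ufffd"
            return chr(cp)
        return m.group(2)
    return _ESCAPE.sub(repl, s)

text = css_escape_char("é") + "a"
print(text, "->", css_unescape(text))   # \e9 a -> éa
print(css_unescape("\\0000e9a"))        # éa
print(repr(css_unescape("\\e9  a")))    # 'é a'
```

Without the space, `\e9a` would be read as the single code point U+0E9A, which is exactly the ambiguity the space or the six-digit form avoids.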
So if the class name you want to refer to happens to begin with a digit, you will need to escape that digit. For example, to select an HTML element whose class name starts with a digit, you would write the leading digit as a hexadecimal escape.
Note the use of the space to separate the escaped part of the class name from the remainder, so that it's clear where the escape ends.
There is no need to also escape the '23' part of the identifier, since digits are allowed after the first position. The same sequence of characters, such as the sequence of Egyptian hieroglyphs meaning 'voice', can be escaped in several equally valid ways.
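To illustrate the leading-digit rule, here is a simplified Python sketch that builds a CSS identifier, assuming for illustration a class name such as "123". `css_escape_ident` is a hypothetical helper, similar in spirit to the browser's `CSS.escape()`: it hex-escapes a leading ASCII digit and puts a plain backslash before ASCII punctuation so that it is not read as CSS syntax. It does not handle every edge case (for example, a leading hyphen followed by a digit).

```python
import string

def css_escape_ident(name: str) -> str:
    """Escape a string for use as a CSS identifier (simplified sketch)."""
    out = []
    for i, ch in enumerate(name):
        if i == 0 and ch in string.digits:
            # Leading digit: hex escape, then a space to end the escape.
            out.append("\\" + format(ord(ch), "x") + " ")
        elif ch in string.punctuation and ch not in "-_":
            # Syntax characters only need a backslash in front.
            out.append("\\" + ch)
        else:
            out.append(ch)
    return "".join(out)

print("." + css_escape_ident("123"))  # .\31 23
print("#" + css_escape_ident("a:b"))  # #a\:b
```

Only the first digit is escaped; the remaining digits are legal after the first position, just as the text describes.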
The backslash can also be used in CSS before a syntax character to prevent it from being read as part of the code.
It is almost always preferable to use a character encoding, such as UTF-8, that lets you represent characters in their normal form, rather than using named or numeric character references.
Using escapes can make source code difficult to read and maintain, and can also significantly increase file size. Many English-speaking developers expect other languages to make only occasional use of non-ASCII characters, but this is wrong.
If you were to require numeric character references for all non-ASCII characters, a passage of text in such a language would become unreadable, difficult to maintain and much longer.
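The growth in size is easy to measure. This sketch uses a sample Czech sentence chosen only for illustration, and Python's built-in `xmlcharrefreplace` error handler, which replaces every character the target encoding cannot represent with a decimal numeric character reference.

```python
# Sample sentence (illustrative only) with many non-ASCII letters.
sample = "Příliš žluťoučký kůň úpěl ďábelské ódy."

utf8_bytes = sample.encode("utf-8")
# Force ASCII output: every non-ASCII character becomes &#NNN;
escaped = sample.encode("ascii", errors="xmlcharrefreplace")

print(len(utf8_bytes), len(escaped))
print(escaped.decode("ascii"))
```

The escaped form is both longer and noticeably harder to read than the UTF-8 original.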
The effect would, of course, be much worse for a language that does not use Latin characters at all.
Using named character references in a document that is parsed as XML can be problematic if the entities are defined outside the document and the tools that process the XML do not read the external files. In such cases the entity references will not be replaced by characters. For this reason, if you need to use escapes, it may be safer to use numeric character references, or to define the character entities you need inside the document.
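The difference is visible with Python's expat-based parser, which does not load external DTDs. In this sketch (the sample documents and the helper `try_parse` are hypothetical), `&nbsp;` is an HTML entity rather than one of XML's five predefined ones, so it fails unless it is declared inside the document; the numeric reference always works.

```python
import xml.etree.ElementTree as ET

def try_parse(doc: str) -> str:
    """Return the root element's text, or the parser's error message."""
    try:
        return ET.fromstring(doc).text
    except ET.ParseError as err:
        return f"error: {err}"

# Undeclared named entity: reported as an undefined-entity error.
print(try_parse("<p>a&nbsp;b</p>"))
# Numeric character reference: always understood.
print(repr(try_parse("<p>a&#160;b</p>")))  # 'a\xa0b'
# Entity declared in the internal DTD subset: also understood.
print(repr(try_parse('<!DOCTYPE p [<!ENTITY nbsp "&#160;">]><p>a&nbsp;b</p>')))
```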
Syntax characters. There are three characters that should always appear in content as escapes, so that they do not interact with the syntax of the markup: < (&lt;), > (&gt;) and & (&amp;). You may also need to represent the double quote (") as &quot; and the apostrophe (') as &apos;. This would certainly be the case in attribute text when you need to use the same type of quote as the one that surrounds the attribute value.
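For attribute text, Python's standard library has `xml.sax.saxutils.quoteattr()`, which escapes `&`, `<` and `>` and then chooses the surrounding quotes: double quotes by default, single quotes if the value contains a double quote, and `&quot;` when the value contains both kinds. The element name `e` and the sample values below are made up for illustration.

```python
from xml.sax.saxutils import quoteattr

for value in ['a<b', 'say "hi"', 'it\'s "x"']:
    print(f"<e title={quoteattr(value)}/>")
# <e title="a&lt;b"/>
# <e title='say "hi"'/>
# <e title="it's &quot;x&quot;"/>
```

Only the quote character that matches the delimiter ever needs escaping, which is why the second value can keep its double quotes literally.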
These are the characters used to mark up XML syntax; when they appear as part of a document's content rather than as markup, they need to be escaped appropriately. These characters are <, >, &, " and '.
All text that is not markup constitutes the character data of the document. Comments can appear anywhere in a document outside other markup.
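Within comments, none of the five special characters needs to be escaped or encoded; entity and character references are not recognized inside a comment anyway. A comment must not, however, contain the string "--" (two consecutive hyphens). Likewise, none of the five special characters needs to be encoded within processing instructions (PIs).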
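These rules can be observed with Python's `xml.etree.ElementTree`, which drops comments and PIs by default; the sketch keeps them via `TreeBuilder(insert_comments=True, insert_pis=True)` (Python 3.8+). The sample documents and the helper name `parse_keeping_comments_and_pis` are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

def parse_keeping_comments_and_pis(doc: str) -> ET.Element:
    """Parse doc, keeping comments and PIs in the tree."""
    builder = ET.TreeBuilder(insert_comments=True, insert_pis=True)
    return ET.fromstring(doc, parser=ET.XMLParser(target=builder))

root = parse_keeping_comments_and_pis(
    "<r><!-- 1 < 2 & &amp; stays as written --><?demo a < b & c?></r>"
)
comment, pi = root[0], root[1]
print(repr(comment.text))  # ' 1 < 2 & &amp; stays as written '
print(repr(pi.text))       # contains the target name and 'a < b & c'

# CDATA sections also hold the special characters unescaped.
print(ET.fromstring("<r><![CDATA[x < y && y > z]]></r>").text)  # x < y && y > z

# A double hyphen inside a comment is not well-formed.
try:
    ET.fromstring("<r><!-- a -- b --></r>")
except ET.ParseError as err:
    print("rejected:", err)
```

Note that `&amp;` inside the comment is kept as the literal five characters, because references are not expanded there.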