#1
I have a character counter for a textarea that counts the characters typed.
A special character takes up the same space as two normal characters because of 16-bit encoding. The counter decreases by 2 when a special character, such as a language-specific one, is added. How can I count special characters as 1 character? Thanks.
#2
majna wrote:
> I have character counter for textarea wich counting the characters.
> Special character needs same place as two normal characters because of
> 16-bit encoding.

It doesn't.

> Counter is counting -2 when special character is added like some
> language specific char.

"€".length === 1

> How to count specials like 1 char?

The same way. ECMAScript 3 implementations use UTF-16 encoded strings.
RTFM.

PointedEars
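The claim above can be checked with a minimal sketch, assuming an engine that stores strings as UTF-16 code units as ECMAScript requires. The sample characters and the helper name `codeUnitLengths` are illustrative and not from the thread; escapes are used so that the file encoding cannot distort the result.

```javascript
// Each sample lies in the Basic Multilingual Plane (at or below U+FFFF),
// so each is stored as exactly one UTF-16 code unit.
var bmpSamples = ["\u20AC", "\u00E9", "\u0161", "\u4E2D"]; // €, é, š, 中

// Hypothetical helper: returns the .length of every string in the array.
function codeUnitLengths(strings) {
  var lengths = [];
  for (var i = 0; i < strings.length; i++) {
    lengths.push(strings[i].length);
  }
  return lengths;
}

var bmpLengths = codeUnitLengths(bmpSamples);
console.log(bmpLengths.join(",")); // 1,1,1,1
```

If a counter reports 2 for such a character, the cause is more likely to lie in how the script or page was encoded and decoded than in `String.prototype.length` itself.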
#3
Thomas 'PointedEars' Lahn wrote:
> majna wrote:
>> I have character counter for textarea wich counting the characters.
>> Special character needs same place as two normal characters because of
>> 16-bit encoding.
>
> It doesn't.
>
>> Counter is counting -2 when special character is added like some
>> language specific char.

Should have been -1. But even if most implementations were not UTF-16 safe, that would not have sufficed. UTF-16 does not mean that the representation of a glyph in that encoding always requires only 16 bits:

http://www.unicode.org/faq/utf_bom.html#6

> "€".length === 1

Windows(-1252). Hmpf. Make that "€" any Unicode glyph (such as "₭") and it is still true.

PointedEars
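The point that UTF-16 does not always fit a character into 16 bits can be illustrated with a short sketch. The G clef character (U+1D11E) and the helper name `codeUnitsHex` are chosen here for illustration and do not appear in the thread.

```javascript
var euroSign = "\u20AC";     // U+20AC, inside the BMP
var gClef = "\uD834\uDD1E";  // U+1D11E, outside the BMP: a surrogate pair

// Hypothetical helper: lists the UTF-16 code units of a string in hex.
function codeUnitsHex(s) {
  var units = [];
  for (var i = 0; i < s.length; i++) {
    units.push(s.charCodeAt(i).toString(16));
  }
  return units;
}

console.log(euroSign.length, codeUnitsHex(euroSign).join(" ")); // 1 20ac
console.log(gClef.length, codeUnitsHex(gClef).join(" "));       // 2 d834 dd1e
```

So `length` reports 2 for a single character only when that character lies above U+FFFF, which is not the case for typical language-specific letters.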
#4
Thomas 'PointedEars' Lahn :
> "€".length === 1

Should be, since '€' (U+20AC) is represented as a single UTF-16 code point, but it is not, e.g., in spidermonkey, which obviously uses UTF-8:

js> e = "€"
€
js> e.length
3
js> for (i = 0; i < e.length; i++) {print(e.charCodeAt(i).toString(16))}
e2
82
ac

But then, OP mentions UTF-8 in the subject line.

>> How to count specials like 1 char?
> The same way. ECMAScript 3 implementations use UTF-16 encoded strings.
> RTFM.

Hmmm. Is there *any* implementation that actually respects the requirement of UTF-16?

Besides, even assuming UTF-16, some "language specific" characters (whatever that means...) take up more than one code point. Some characters may even use one or more code points according to whether one uses decomposition or not, e.g., 'é' is either U+00E9 or U+0065 U+0301.

Short of testing each successive octet (if the implementation uses UTF-8) or code point (if the implementation is correct according to the specs) to see what kind of character it is, I have so far been unable to answer the OP's question.
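The two spellings of 'é' mentioned above can be compared in a small sketch. It assumes an engine with `String.prototype.normalize`, which was standardized in ECMAScript 2015 and therefore postdates this thread; the helper name `nfcLength` is hypothetical.

```javascript
var precomposed = "\u00E9"; // é as the single code point U+00E9
var decomposed = "e\u0301"; // U+0065 followed by U+0301 COMBINING ACUTE ACCENT

console.log(precomposed.length);         // 1
console.log(decomposed.length);          // 2
console.log(precomposed === decomposed); // false: different code unit sequences

// Hypothetical helper: length after canonical composition (NFC).
// Assumption: normalize() is available (ES2015 or later).
function nfcLength(s) {
  return s.normalize("NFC").length;
}

console.log(nfcLength(decomposed));      // 1
```

This only helps for sequences that have a precomposed form. A base letter with a combining mark that has no precomposed equivalent still counts as more than one after normalization.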
#5
Johannes Baagoe wrote:
> Thomas 'PointedEars' Lahn :
>> "€".length === 1
>
> Should be, since '€' (U+20AC) is represented as a single UTF-16 code
> point,

You mean code *unit*, _not_ code point. The latter is a completely different thing, the *position* of a Unicode character in the definition tables. Et non sequitur, as I have encoded my first followup accidentally with Windows-1252, that is not the real code point of that character (it is 0x80). With UTF-16, you are correct, except that characters beyond code point U+FFFF, which would require more code units, are seldom used.

> but it is not, e.g., in spidermonkey, which obviously uses UTF-8:
>
> js> e = "€"
> €
> js> e.length
> 3
> js> for (i = 0; i < e.length; i++) {print(e.charCodeAt(i).toString(16))}
> e2
> 82
> ac

Probably due to your SpiderMonkey build. It works just fine since Mozilla/4.0.

> But then, OP mentions UTF-8 in the subject line.

Doesn't matter. The document encoding used is transparent to the application. The `value' property of a HTMLTextAreaElement object is of type DOMString, which is fully compatible with ECMAScript (UTF-16) strings.

>>> How to count specials like 1 char?
>> The same way. ECMAScript 3 implementations use UTF-16 encoded strings.
>> RTFM.
>
> Hmmm. Is there *any* implementation that actually respects the requirement
> of UTF-16?

Most would nowadays. Even Netscape 4.78 yields 1 for "€".length.

> Besides, even assuming UTF-16, some "language specific" characters (whatever
> that means...) take up more than one code point. Some characters may even
> use one or more code points according to whether one uses decomposition
> or not, e.g., 'é' is either U+00E9 or U+0065 U+0301.

No unique Unicode glyph has more than one code point; that would be a major flaw in the standard (one that does not exist).

A glyph may, however, be represented by more than one code unit, either due to the mere necessity of its higher code point (position), surrogates, or composition (and in the latter case it consists of several glyphs, each with its own code point, with their code units concatenated according to the encoding used). However, that does not matter for implementations of ECMAScript 3. In particular, glyph composition is transparent to the application, if it supports it.

http://www.unicode.org/faq/char_combmark.html#2

PointedEars
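The distinction between code units and code points drawn in this post can be turned into a counter. The following is a minimal sketch in ECMAScript 3 style, not something proposed in the thread: the function name `countCodePoints` is hypothetical, and combining marks are deliberately still counted separately.

```javascript
// Hypothetical helper: counts code points instead of UTF-16 code units
// by treating a high surrogate followed by a low surrogate as one unit.
function countCodePoints(s) {
  var count = 0;
  for (var i = 0; i < s.length; i++) {
    var unit = s.charCodeAt(i);
    if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < s.length) {
      var next = s.charCodeAt(i + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) {
        i++; // skip the low surrogate of a valid pair
      }
    }
    count++; // lone surrogates are counted as one each
  }
  return count;
}

var mixed = "a\u20AC\uD834\uDD1E"; // "a", "€", and U+1D11E
console.log(mixed.length);           // 4 code units
console.log(countCodePoints(mixed)); // 3 code points
```

For a textarea counter, such a function would be applied to the element's `value` in place of reading `value.length` directly.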
#6
Thomas 'PointedEars' Lahn wrote:
> Johannes Baagoe wrote:
>> Thomas 'PointedEars' Lahn :
>>>> How to count specials like 1 char?
>>> The same way. ECMAScript 3 implementations use UTF-16 encoded strings.
>>> RTFM.
>> Hmmm. Is there *any* implementation that actually respects the requirement
>> of UTF-16?
>
> Most would nowadays. Even Netscape 4.78 yields 1 for "€".length.

One might argue then that Netscape 4.78 evaluates the Windows-1252 encoded version of the respective currency mark which is one byte, and that it does not support Unicode. However, "€".charCodeAt() yields 8364 (not 128), String.fromCharCode(8365) yields "₭", and both "\u20AC".length and String.fromCharCode(8365).length yield 1.

PointedEars
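These checks can be repeated in any conforming engine. The sketch below also relates the result to the `e2 82 ac` output quoted earlier in the thread by listing the UTF-8 bytes of the same character; the helper name `utf8BytesHex` is hypothetical, and it is only meant for strings made up entirely of non-ASCII characters.

```javascript
var euroChar = "\u20AC";

console.log(euroChar.charCodeAt(0));                 // 8364 (0x20AC), not 128
console.log(String.fromCharCode(8364) === euroChar); // true
console.log(String.fromCharCode(8365) === "\u20AD"); // true: U+20AD is the kip sign
console.log(String.fromCharCode(8365).length);       // 1

// Hypothetical helper: encodeURIComponent percent-encodes the UTF-8 bytes
// of non-ASCII characters, so splitting on "%" yields those bytes in hex.
function utf8BytesHex(s) {
  return encodeURIComponent(s).toLowerCase().split("%").slice(1);
}

console.log(utf8BytesHex(euroChar).join(" "));       // e2 82 ac
```

One UTF-16 code unit (0x20AC) therefore corresponds to three UTF-8 bytes, which is consistent with a shell that read the UTF-8 input byte by byte reporting a length of 3.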
#7
Thomas 'PointedEars' Lahn :
[My version of SpiderMonkey uses UTF-8]

> Probably due to your SpiderMonkey build. It works just fine since
> Mozilla/4.0.

It does indeed in my version of Firefox. Serves me right for sticking with obsolete command-line tools :-) It would appear that if I want a good stand-alone ECMAScript interpreter, I have to compile it myself.

> [..]

Excellent, thanks a lot.