Swift - What does it mean that two strings have the same linguistic meaning?


In the Swift documentation on comparing strings, I found the following:

Two String values (or two Character values) are considered equal if their extended grapheme clusters are canonically equivalent. Extended grapheme clusters are canonically equivalent if they have the same linguistic meaning and appearance, even if they're composed from different Unicode scalars behind the scenes.

The documentation then proceeds with the following example, which shows two strings that are "canonically equivalent":

For example, LATIN SMALL LETTER E WITH ACUTE (U+00E9) is canonically equivalent to LATIN SMALL LETTER E (U+0065) followed by COMBINING ACUTE ACCENT (U+0301). Both of these extended grapheme clusters are valid ways to represent the character é, and so they're considered to be canonically equivalent:
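A minimal sketch of how that claim can be checked. The constant names are hypothetical; only the code points come from the passage above.

```swift
// Hypothetical constant names; the code points are the ones quoted above.
let precomposedWord = "caf\u{00E9}"        // é as the single scalar U+00E9
let combiningWord = "caf\u{0065}\u{0301}"  // e (U+0065) + combining acute (U+0301)

// String equality compares extended grapheme clusters, not raw scalars.
print(precomposedWord == combiningWord)            // true
print(precomposedWord.count, combiningWord.count)  // 4 4
```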

OK. Somehow e and é look the same and also have the same linguistic meaning. Sure, I'll give them that. I took a Spanish class at some point, and the professor wasn't strict about which form of e we used, so I'm guessing that is what they are referring to. Fair enough.

The documentation goes further and shows two strings that are not canonically equivalent:

Conversely, LATIN CAPITAL LETTER A (U+0041, or "A"), as used in English, is not equivalent to CYRILLIC CAPITAL LETTER A (U+0410, or "А"), as used in Russian. The characters are visually similar, but don't have the same linguistic meaning:
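A short sketch of that comparison, assuming the two code points named above. The constant names are hypothetical.

```swift
// Hypothetical constant names; the code points are the ones quoted above.
let latinCapitalA: Character = "\u{0041}"     // A, Latin
let cyrillicCapitalA: Character = "\u{0410}"  // А, Cyrillic

// Unicode defines no canonical equivalence between these two scalars,
// so the comparison fails even though the glyphs look alike.
print(latinCapitalA == cyrillicCapitalA)      // false
```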

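Now here is where the alarm bells go off and I decide to ask this question. It seems that appearance has nothing to do with it, because the two strings look exactly the same, and they admit that in the documentation. So it seems that the String class is really looking at linguistic meaning?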

This is why I'm asking what it means for strings to have the same or different linguistic meaning. e is the form of e that I know is used in English, while I have seen é used in languages like French or Spanish. So why does the fact that А is used in Russian and A is used in English cause the String class to say they are not equivalent?

I hope I was able to walk you through my thought process. My question is: what does it mean for two strings to have the same linguistic meaning (in code, if possible)?

You said:

Somehow e and é look the same and also have the same linguistic meaning.

No. You have misread the document. Here's the document again:

LATIN SMALL LETTER E WITH ACUTE (U+00E9) is canonically equivalent to LATIN SMALL LETTER E (U+0065) followed by COMBINING ACUTE ACCENT (U+0301).

Here's U+00E9: é
Here's U+0065: e
Here's U+0301:  ´
Here's U+0065 followed by U+0301: é

So U+00E9 (é) looks the same and means the same as U+0065 U+0301 (é). Therefore they must be treated as equal. The documentation is not comparing é with a plain e.
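To see this in code, here is a small sketch that looks at the scalars behind each form. The constant names are made up for illustration.

```swift
// Hypothetical constant names, used only for this illustration.
let precomposed = "\u{00E9}"         // é as one Unicode scalar
let decomposed = "\u{0065}\u{0301}"  // e followed by a combining acute accent

// The scalar sequences differ...
print(precomposed.unicodeScalars.count)     // 1
print(decomposed.unicodeScalars.count)      // 2

// ...but each string is a single extended grapheme cluster, and the two are equal.
print(precomposed.count, decomposed.count)  // 1 1
print(precomposed == decomposed)            // true

// A plain e is a different character from é.
print(precomposed == "e")                   // false
```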

So why is Cyrillic А different from Latin A? UTN #26 gives several reasons. Here are some:

  • “Traditional graphology has always treated them as distinct scripts, …”

  • “Literate users of Latin, Greek, and Cyrillic alphabets do not have cultural conventions of treating each other's alphabets and letters as part of their own writing systems.”

  • “Even more significantly, from the point of view of the problem of character encoding for digital textual representation in information technology, the preexisting identification of Latin, Greek, and Cyrillic as distinct scripts was carried over into character encoding, from the very earliest instances of such encodings.”

  • “[A] unified encoding of Latin, Greek, and Cyrillic would make casing operations an unholy mess, …”
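The casing point can be illustrated with three look-alike capitals. This is my own sketch rather than an example from the tech note, and the constant names are hypothetical.

```swift
// Three capitals that look alike but belong to different scripts.
let latinB = "\u{0042}"      // B, Latin capital letter B
let greekBeta = "\u{0392}"   // Β, Greek capital letter beta
let cyrillicVe = "\u{0412}"  // В, Cyrillic capital letter ve

// Each one lowercases to a different letter within its own script.
print(latinB.lowercased())      // b (U+0062)
print(greekBeta.lowercased())   // β (U+03B2)
print(cyrillicVe.lowercased())  // в (U+0432)
```

If the three capitals shared one code point, lowercasing could not know which of the three results to produce without extra context.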

Read the tech note for full details.

