[Discuss] emoji in my url
Mike Small
smallm at sdf.org
Thu Mar 23 11:16:29 EDT 2017
Eric Chadbourne <sillystring at protonmail.com> writes:
> I just noticed that you can have an emoji URL. Am I just old or is this moronic?
>
> The url bar should contain plain text and obscure nothing, else how do you know where you are?
Is this a URL with UCS characters? This is what RFC 3986 has to say:
When a new URI scheme defines a component that represents textual
data consisting of characters from the Universal Character Set
[UCS], the data should first be encoded as octets according to the
UTF-8 character encoding [STD63]; then only those octets that do
not correspond to characters in the unreserved set should be
percent-encoded. For example, the character A would be
represented as "A", the character LATIN CAPITAL LETTER A WITH
GRAVE would be represented as "%C3%80", and the character KATAKANA
LETTER A would be represented as "%E3%82%A2".
This is what it considers unreserved:
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
It also says this:
A URI is a sequence of characters from a
very limited set: the letters of the basic Latin alphabet, digits,
and a few special characters.
So I'd say the URI with the emoji is supposed to be encoded (assuming
it's a standard UCS emoji).
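To make that concrete, here is a minimal sketch of the encoding step the RFC describes, using Python's urllib.parse.quote. Python is just my choice for illustration, and encode_component is a made-up helper name, not anything from the RFC. It reproduces the RFC's own examples and then does the same for an emoji (U+1F63C, assuming that is the sort of character in question):

```python
from urllib.parse import quote, unquote

def encode_component(text):
    # Hypothetical helper: UTF-8 encode the text, then percent-encode
    # every octet that is not in the RFC 3986 unreserved set
    # (ALPHA / DIGIT / "-" / "." / "_" / "~").
    return quote(text, safe="", encoding="utf-8")

print(encode_component("A"))           # A
print(encode_component("\u00C0"))      # %C3%80  (LATIN CAPITAL LETTER A WITH GRAVE)
print(encode_component("\u30A2"))      # %E3%82%A2  (KATAKANA LETTER A)
print(encode_component("\U0001F63C"))  # %F0%9F%98%BC  (an emoji)

# Going the other way is what a browser would do to show the glyph.
print(unquote("%F0%9F%98%BC") == "\U0001F63C")  # True
```

Note that the emoji takes four octets in UTF-8, so it becomes four percent-encoded triplets.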
But which is more obscure, %F0%9F%98%BC or a little cat face with a wry
smile? I might like a way to get the UCS code point and long description
from the glyph, but I think I'd rather see the kitty by default even if
the character in the actual HTTP stream has to be encoded. Actually,
there is a way outside the browser to find out the code point. You could
copy and paste the glyph to the command line and run a command named uni
(included with the Perl module App::Uni on CPAN) on it. So yeah, if your
browser gets %F0%9F%98%BC in a URI and shows you a face instead of the
standard URI encoding, I think that's great (if there aren't security
implications from doing that, and if it lets you set this to your
preference). But if it's some stupid thing like what Pidgin does to
certain character pairs then I'm with you. That would be awful.
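If you don't have uni handy, something similar can be sketched with Python's standard unicodedata module. This is only a rough stand-in, not how uni itself works, and describe is a name I made up for the example. It prints the code point and long description for each character in a pasted string:

```python
import unicodedata

def describe(glyphs):
    # Hypothetical helper: return a (code point, name) pair for each
    # character; characters without a Unicode name get a placeholder.
    return [("U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed>"))
            for ch in glyphs]

for code_point, name in describe("\U0001F63C"):
    print(code_point, name)  # U+1F63C CAT FACE WITH WRY SMILE
```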
--
Mike Small
smallm at sdf.org