Thursday, April 20, 2017

The аррӏе bites back

I've received a number of inquiries about whether TenFourFox will follow the same (essentially wontfix) approach of Firefox for dealing with those internationalized domain names that happen to be whole-script homographs. The matter was forced recently by one enterprising sort who created just this sort of double using Cyrillic characters for https://www.аррӏе.com/, which, depending on your font and your system setup, may look identical to its Latin-script counterpart (the site is a proof of concept only).
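To make "whole-script homograph" concrete, here is a minimal Python sketch that lists the code points in the spoofed label and compares them with the Latin spelling. The `scripts_in` helper is my own illustration, not anything from Firefox or TenFourFox, and it guesses the script from the first word of each character's Unicode name, which is only a rough heuristic.

```python
import unicodedata

# The spoofed label from the post, written with escapes so the code points are explicit.
SPOOF = "\u0430\u0440\u0440\u04cf\u0435"
REAL = "apple"


def scripts_in(label):
    """Guess each character's script from the first word of its Unicode name.

    This is a rough heuristic for illustration, not the Unicode Script property.
    """
    return {unicodedata.name(ch).split()[0] for ch in label}


if __name__ == "__main__":
    # Show what each character in the spoofed label really is.
    for ch in SPOOF:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
    # Every character comes from a single non-Latin script,
    # and the string does not compare equal to the Latin one.
    print(scripts_in(SPOOF), scripts_in(REAL), SPOOF == REAL)
```

Because every character belongs to one script, a check that only looks for mixed scripts within a label would not flag this one.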

The circulating advice is to force all IDNs to be displayed in punycode by setting network.IDN_show_punycode to true. This is probably acceptable for most of our users (the vast majority of TenFourFox users operate with a Latin character set), but I agree with Gerv's concern in that Bugzilla entry that doing so disadvantages all other writing systems that are not Latin, so I don't feel this should be the default. That said, I also find the current situation unacceptable, and doing nothing (or worse, relying on DNS registrars who so far don't really care about anything but getting your money) is similarly unacceptable. While the number of domains that could be spoofed in this fashion is probably small, it is certainly greater than one, and don't forget that they let the proof-of-concept author register his spoof!
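For a sense of what that pref changes on screen, here is a rough Python sketch that rewrites each non-ASCII label of a hostname in its `xn--` form. The `to_display` function is a hypothetical helper, not browser code, and it is a simplification: it uses Python's raw `punycode` codec and skips the normalization and validation steps that real IDNA processing performs.

```python
def to_display(hostname):
    """Return the hostname with every non-ASCII label shown in its xn-- form.

    Simplified on purpose: real IDNA processing also normalizes and
    validates each label before encoding it.
    """
    shown = []
    for label in hostname.split("."):
        if label.isascii():
            # Plain ASCII labels are displayed unchanged.
            shown.append(label)
        else:
            # Punycode-encode the label and add the ACE prefix.
            shown.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(shown)


if __name__ == "__main__":
    print(to_display("www.apple.com"))
    print(to_display("www.\u0430\u0440\u0440\u04cf\u0435.com"))
```

The Latin hostname passes through untouched while the Cyrillic one becomes an obviously different ASCII string. That is exactly why the setting protects Latin-script users and why it makes every legitimate non-Latin domain unreadable.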

Meanwhile, I'm not sure what the solution right now should be other than "not nothing." Virtually any approach, including the one Google Chrome has decided to take, will disadvantage non-Latin scripts (and the Chrome approach has its own deficiencies and is not IMHO a complete solution to the problem, nor was it designed to be). For consistency, it would be optimal to adopt whatever solution Firefox eventually decides upon, if they decide on one, but this is not an issue I'd like to sit on indefinitely. If you use a Latin character set as your default language, and/or you don't care if all domains will appear in either ASCII or punycode, then go ahead and set that pref above; if you don't, or consider this inappropriate, stay tuned. I'm thinking about this in issue 384.
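To illustrate the trade-off, here is a toy Python sketch of one conceivable check: flag a label when every character in it has a Latin lookalike. This is not what Firefox, Chrome or TenFourFox do. The `LOOKALIKES` table is hypothetical and deliberately tiny, and both function names are made up for the example.

```python
# Hypothetical, deliberately tiny table of Cyrillic letters that resemble Latin ones.
LOOKALIKES = {
    "\u0430": "a",
    "\u0435": "e",
    "\u043e": "o",
    "\u0440": "p",
    "\u0441": "c",
    "\u0443": "y",
    "\u0445": "x",
    "\u0455": "s",
    "\u0456": "i",
    "\u0458": "j",
    "\u04cf": "l",  # palochka
}


def latin_skeleton(label):
    """Replace each known lookalike with the Latin letter it resembles."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in label)


def is_whole_label_lookalike(label):
    """True when every character in the label has a Latin lookalike."""
    return bool(label) and all(ch in LOOKALIKES for ch in label)


if __name__ == "__main__":
    labels = [
        "\u0430\u0440\u0440\u04cf\u0435",  # the spoofed label from the post
        "\u043f\u043e\u0447\u0442\u0430",  # an ordinary Cyrillic word
        "\u0441\u043e\u0440",              # a short Cyrillic word made only of lookalikes
    ]
    for label in labels:
        print(latin_skeleton(label), is_whole_label_lookalike(label))
```

The third label shows the problem: a legitimate Cyrillic word built only from lookalike letters gets flagged as well, so even a narrow check like this one penalizes non-Latin scripts to some degree.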

By the way, TenFourFox "FPR0" has been successfully uploaded to GitHub. Build instructions to follow and the first FPR1 beta should be out in about two to three weeks. I'm also cogitating over a blog post discussing not only us but other Gecko forks (SeaMonkey, Pale Moon, etc.) which for a variety of reasons don't want to follow Mozilla into the unclear misty haze of a post-XUL world. To a first approximation our reasons are generally technical and theirs are primarily philosophical, but we both end up doing some of the same work and we should talk about that as an ecosystem. More later.


  1. A few possible approaches:

    1: Display non-ASCII characters in one font and ASCII characters in another.
    2: If certain characters are in the address, switch to an ASCII-only display.
    3: Ship a modified font that makes the various versions of the characters visibly distinct, and make it part of the TenFourFox distribution.

    1. 1 has potential, but is complicated because the address bar is a multi-function control, and could have subtle bugs with fallback fonts (since I couldn't necessarily guarantee which fonts are on all systems and which fonts would therefore actually be used). That said, there are probably fonts that have distinctly different forms for Cyrillic characters, which would simplify this considerably. (see 3)

      2 is the IDN character blacklist. This already exists and is used in Firefox/TenFourFox to deal with clearly problematic characters, but the characters in question here are perfectly cromulent Cyrillic letters. See the Mozilla bug for why using this approach to block every possible character that could be part of a homograph isn't sustainable in the long run.

      I don't want to be in the font maintenance business, so if such a font does not exist, 3 is out. If there is one, it might be worth distributing it.

  2. I wonder if a binary-translation scheme could work, configured in a 'from' and 'to' way, dependent on the fonts installed? What I mean by that is if the incoming code can be quickly detected and translated to the output language/character set (Latin, or vice versa; perhaps the output can key in on server location?), then it could take most inputs and still fulfill server demands, especially since it is just translating characters and not phrases. I ran into this when trying to localize PPC Media Center to Russian and Greek, where system code went through just fine, but the bash calls couldn't be compiled on my system since bash only uses Latin, regardless of whatever fonts are installed.

  3. Things like this make me hate the English language and wish we'd all set aside our differences and make a new universal language for us humans, much like some computers. Another part of me worries it will be just as bad as or worse than English, and that we'll have to create new puns for our dirty jokes in this language. Anyways, I hope you (and Mozilla) solve this issue soon and wish you luck.

  4. How about a simple mouse over on the URL bar to display the punycode expansion? (On my phone so can't go into detail with a tiny keyboard)

  5. Would it be feasible for the address bar to possibly change color (perhaps with some kind of tooltip or mouseover explaining what the color change means) whenever a Unicode URL is being displayed?

    Possible drawback: if the site is an attack site, a mere color change and non-modal dialog probably come too late.

    Random question that has absolutely nothing to do with the post: does anyone know the exact characteristics of the heatsink screws on a dual processor 1 GHz MDD G4? Someone gave me one and I'd love to use it with TFF, but that person also lost several of those screws and it overheats almost immediately.

  6. What about the Waterfox developer trying to keep XUL up:

    This is worth some support I'd say.

  7. A little off-topic ramble.

    In my office job I'm forced to use Windows, as many of us are. At least my computer is a quite capable 14-month-old 3.4 GHz quad-core i5, needed to run medical database applications that can be quite taxing. I use Firefox as my main browser. During the last year I've intentionally been using the corresponding ESR versions in order to be as close to TenFourFox as possible. This will now change, as for security reasons I'm moving to ESR 52. Time for a summary.

    So I've been switching between Firefox on Win7 and TenFourFox on Mac OS X 10.5 on a daily basis. The quad-core machine is faster, by pure processor speed, and also because a lot of what I see in the browser window is hardware-accelerated. There is no question about it and it's not a miracle.

    On the other hand I can say that my 17" 1.67 GHz G4 PowerBook isn't so slow as to be unusable on the modern web. Quite the opposite: using TenFourFox, on many sites I don't notice much of a difference, if any. I can use TenFourFox at home with about the same *practical* speed as Firefox at work on most sites. I've never once thought, coming home from work and switching on the PowerBook, 'This is so slow!' If I go mad on Facebook it's not because of the speed. Also I have to say that Firefox 45 ESR on Windows 7 crashes quite a lot more often than TenFourFox 45 on 10.5, which is exceptionally stable and reliable.

    My PowerBook has turned twelve years old this February and I can still use it just like a recent computer on the web. Thank you Cameron!

