2011-05-03

How Not To Localize Your Software

Software localization is difficult. The natural languages we use differ in so many ways. In this post, I want to write about how not to localize your software. That is, there are many programs and libraries that try to deal with localization, badly. Their authors believe their work helps localization. But in reality, it just makes things worse.

First, and most important: collation is a bad idea. I have no idea what collation is all about, because Japanese doesn't have that kind of concept. So if your software deals with collation, you're doing something that isn't generic across most natural languages in the world.

Just look at the Wikipedia entry for Collation.

Collation - Wikipedia, the free encyclopedia

See Languages on the left side. If collation is so generic and matters in all languages, then why are there only 19 languages with a corresponding entry in the list? Why is there no Japanese, Chinese, or Korean? Because we don't have such a concept.

Case, accents, character ordering: these things either do not exist or simply don't work at all in Japanese.

Plural handling. Please, don't. Just don't do that. If you think plural handling is important for localization, you should not implement a localization library in the first place.

For ease of localization, software should implement only the most generic features, the ones that make sense in all natural languages. Plural handling is not a generic concept. We should rather change our natural language to make it easier to process.
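One way to read this is that messages should be worded so the count is just a parameter, and the library never has to pick between singular and plural forms. Here is a minimal sketch in Python. The `CATALOG` table, the `files_deleted` key, and the `translate` function are all hypothetical names made up for illustration, not part of any real library.

```python
# Hypothetical message catalog: exactly one template per message and locale.
# There are no singular/plural variants; the count is only a parameter.
CATALOG = {
    "en": {"files_deleted": "Files deleted: {count}"},
    "ja": {"files_deleted": "削除したファイル: {count}"},
}


def translate(locale, key, **params):
    """Look up the template for a locale and fill in its parameters."""
    template = CATALOG[locale][key]
    return template.format(**params)


print(translate("en", "files_deleted", count=1))  # Files deleted: 1
print(translate("ja", "files_deleted", count=3))  # 削除したファイル: 3
```

The English wording reads the same for 1 and for 3, so the translator of any other language only has to supply one string.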

English. If a localization library requires knowledge of English, then it sucks. Really. It's horrible. The reason we need localization is that not all people understand a single language. Software developers are no exception.

You may argue that programmers should understand English and that localization is for users. That doesn't work at all. Programmers are, at the same time, users. If you can hire enough programmers who know English well in a country, then you don't need to localize English software for that country.

If you're living in a country where English is not an official language, but you and all other citizens can understand English, then English has defeated your language. What language do they use to teach students? English? Then your language is dead. You don't need localization at all.

We need localization because we don't know that language. If your software does any of the above in the name of localization, please don't. It just doesn't work at all.

I bet if you insist on English, you also insist on plural handling. Consider: how can we completely abandon our language when you can't even abandon one silly grammar rule of yours? Nobody can.

These closed-minded, non-generic implementations almost always make assumptions about how language should be processed. For other languages, those assumptions don't make sense and must be modified. So we need to patch the source code or the binary. In that case, the localization library becomes just another obstacle to localization. The cost of localization is the same as for localizing hard-coded software.

5 comments:

Anonymous said...

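Have you never looked at a paper dictionary? (It wouldn't be surprising if you hadn't; in Japan too, that field has been almost entirely replaced by electronic dictionaries.)
ああ>アあ>あい
かがく>がかく>かき
Collation is what you need in order to sort like this.
Japanese may not need collation as complex as that of Western languages, but saying that the concept of collation does not exist is clearly wrong. If the concept of collation does not exist in Japanese, then what is JIS X 4061?
The Japanese Wikipedia has no entry simply because the Japanese Wikipedia is a lost cause, and relying on such a thing is the real mistake.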
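
The ordering in the two examples above can be reproduced with a two-level sort key. What follows is only a rough sketch and not an implementation of JIS X 4061: `kana_sort_key` is a hypothetical helper, and it assumes that folding katakana to hiragana, dropping voicing marks for the primary comparison, and breaking ties by the original string is enough for these particular words.

```python
import unicodedata

# Combining dakuten and handakuten, as produced by NFD decomposition.
VOICING_MARKS = {"\u3099", "\u309a"}


def kana_sort_key(text):
    """Hypothetical two-level key: (folded kana, original string)."""
    primary = []
    for ch in unicodedata.normalize("NFD", text):
        if ch in VOICING_MARKS:
            continue  # ignore voicing at the primary level
        if "\u30a1" <= ch <= "\u30f6":
            ch = chr(ord(ch) - 0x60)  # fold katakana to hiragana
        primary.append(ch)
    # Ties on the folded form are broken by plain code point order.
    return ("".join(primary), text)


words = ["あい", "アあ", "ああ"]
print(sorted(words))                     # ['ああ', 'あい', 'アあ']
print(sorted(words, key=kana_sort_key))  # ['ああ', 'アあ', 'あい']
```

Plain code point order puts all hiragana before all katakana, which is why the first result differs from the dictionary-style order in the second.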

江添亮 said...

Kanji have multiple readings, so how do you intend to sort by reading?

Egtra said...

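I see, the representative reading set of JIS X 4061 is interesting. It defines a "representative reading" (one specific kana reading) for each kanji and determines the order of kanji by that representative reading (this is one method in X 4061; it is not all there is to X 4061).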

Another example of sorting text mixed with kanji that comes to mind is that Windows lets you choose between "XJIS" (apparently X 4061) and "radical/stroke count".
http://blogs.msdn.com/b/sqljapan/archive/2008/06/16/japanese-japanese-xjis-100-japanese-bushu-kakusu-100.aspx
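
The representative-reading idea can be sketched as a lookup table used as a sort key. The table below is purely hypothetical: the readings are real readings of these kanji, but which reading JIS X 4061 actually designates for each one is not something this sketch claims to know, and `REPRESENTATIVE_READING` and `reading_key` are made-up names.

```python
# Hypothetical table: one chosen kana reading per kanji.
# These choices are for illustration only and are NOT taken from JIS X 4061.
REPRESENTATIVE_READING = {
    "山": "やま",
    "川": "かわ",
    "海": "うみ",
}


def reading_key(text):
    """Replace each known kanji with its chosen reading; keep other characters."""
    return "".join(REPRESENTATIVE_READING.get(ch, ch) for ch in text)


words = ["山", "川", "海"]
print(sorted(words, key=reading_key))  # ['海', '川', '山']
```

Because every kanji maps to exactly one reading, the order is deterministic even though each kanji has several readings in real text.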

江添亮 said...

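That kind of sorting may work for Japanese, but it is not generic enough to apply to all languages.
If you are serious about localization, you should strip away every concept that cannot be applied to all languages.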

Egtra said...

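Yes, if language-specific processing can be avoided, nothing could be better. That said, my impression is that software that has been internationalized to a certain degree then goes on to build in processing specific to particular languages and regions, which I find interesting. In my experience, the features of MS Office and the like differ by language (though I have only tried Japanese and English).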