
An Introduction to Collation

By Cathy Wissink & Michael S. Kaplan - Windows Globalization, Microsoft Corporation
Published on 25th October 2003

Collation is an integral part of a well-globalized product. It is so pervasive in software and the real world that it is necessary to provide users with a means to search and order data in a way that makes sense in their particular culture. Unfortunately, there is no shortcut when it comes to implementing collation: sorts cannot simply be grouped by region or writing system.

It is necessary to take each individual language into account in order to get culturally accepted results. That said, there are many tools (e.g. ICU, the C runtime and its comparison functions) and APIs (e.g. lstrcmp, lstrcmpi, and CompareString) available to help developers and designers plan and implement a well-globalized product. The most important thing one can do is to stop and consider what users will expect before doing a lot of work that will confuse them. When collation is implemented thoughtfully and with prior consideration for the users, it can be a powerful force in making the software easier to use.

 

Read more on: "Collation in action"

Features of a language and their influence on collation

While the speaker of a language may not have conscious or overt knowledge of the different phenomena that influence linguistic sorting for his or her culture, these phenomena do indeed exist and must be taken into account when creating a collation.

Linguistic elements that influence sorting:

Myths about collation

It is necessary to debunk some of the untrue statements that we have heard about collation over the last few years. While these myths may be obvious to you, we still hear them quite often from customers, developers and users, and you may hear them as well. There are many misconceptions about what software globalization entails, especially at higher management levels, where the technical difficulties may not be well understood or investigated.

Setting User preferences

  • In a file system, one might need to compare two file names in a case-insensitive manner. This means that two file names that differ only by case cannot coexist; they would conflict. Because sorting rules (and likewise casing rules) could vary from user to user, you could end up with a set of files that is valid on one user's machine and invalid on another. This is a bad thing.
  • In applications like Excel, the columns are identified by letters. These letters are used in combination with the numbered rows so that cells can be identified as A1 or B12 or FD45. If the ordering of those letters changed, then VBA code and formulas in the spreadsheet would break.
  • In this multilingual and multicultural society, your user might be Swedish but looking at German data. The user's expectations for collation are generally based not on the data but on his or her native language.
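The Swedish and German case in the last bullet can be made concrete with a small sketch. The two key functions below are hypothetical, heavily simplified stand-ins for real collation tables (they ignore accent and case levels entirely); they assume only that Swedish places å, ä and ö after z, while a German dictionary-style order treats ä, ö and ü as variants of a, o and u.

```python
# Simplified, illustrative sort keys. Real collation is multi-level and
# should come from a library or OS API; these tables are assumptions made
# only to show that the same data orders differently per language.

SWEDISH_RANK = {ch: i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyzåäö")}
GERMAN_FOLD = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})


def swedish_key(word):
    """Rank each letter by Swedish alphabet position (å, ä, ö after z)."""
    unknown = len(SWEDISH_RANK)  # anything else sorts after ö
    return [SWEDISH_RANK.get(ch, unknown) for ch in word.lower()]


def german_key(word):
    """Treat umlauts as their base letters; break ties on the original."""
    return (word.lower().translate(GERMAN_FOLD), word)


words = ["Zebra", "Äpfel", "Apfel", "Ofen", "Öl"]

swedish_order = sorted(words, key=swedish_key)
german_order = sorted(words, key=german_key)

print(swedish_order)
print(german_order)
```

The data is identical in both calls; only the key, which stands for the user's language, changes the result. A Swedish user would look for "Äpfel" after "Zebra" even though the words themselves are German.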

In cases where a linguistically appropriate collation is not wanted, insisting on linguistic correctness can itself be a source of confusion and errors. For example, one of the biggest reasons a file system cannot support Turkish casing rules is that .GIF, .Gif, and .gif all need to be seen as the same file type so that a graphics program can find the file. Under Turkish rules, the uppercase of i is İ (a dotted capital I) rather than I, so .gif and .GIF would no longer match.
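A minimal sketch of the file extension problem follows. Python's built-in str.upper() is not locale-sensitive, so the Turkish behavior is simulated here with a hypothetical helper, upper_turkish, which applies only the two Turkish-specific mappings (i to İ, ı to I) before the default uppercasing. The function names are illustrative and are not part of any file system API.

```python
# Turkish-specific uppercase mappings: dotted i -> dotted capital İ,
# dotless ı -> plain capital I. Everything else uses default casing.
TURKISH_UPPER = str.maketrans({"i": "İ", "ı": "I"})


def upper_turkish(text):
    """Hypothetical helper: uppercase text the way Turkish rules would."""
    return text.translate(TURKISH_UPPER).upper()


def same_extension_invariant(a, b):
    """Compare extensions with language-independent casing."""
    return a.upper() == b.upper()


def same_extension_turkish(a, b):
    """Compare extensions with Turkish casing (for illustration only)."""
    return upper_turkish(a) == upper_turkish(b)


for ext in (".GIF", ".Gif", ".gif"):
    print(ext,
          same_extension_invariant(ext, ".GIF"),
          same_extension_turkish(ext, ".GIF"))
```

With the invariant comparison all three spellings match ".GIF". With the simulated Turkish rules, ".Gif" and ".gif" uppercase to ".GİF" and stop matching, which is exactly the kind of machine-dependent result a file system has to avoid.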

Obviously there is no one simple rule to follow, given how much variation there is. The designers of the application need to decide what the users will be expecting, whether it is necessary to have consistent (and potentially linguistically inaccurate) results, and whether these two requirements overlap at all.

Note that there are two separate issues here: expecting consistent results independent of language, and giving users a culturally correct experience. In the former case, one wants an invariant collation, and in the latter, one wants to follow the user's expectations. For example, the comparison of two file names (for the sake of absolute equality) cannot change from machine to machine, whereas the ordering of those files when one is looking at a directory on one's own machine can usually be culturally valid without causing any problems.
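One way to keep the two issues apart in code is to use one fixed key for identity and a separate, replaceable key for display order. The sketch below is an assumption about how an application might be structured, not a description of any real file system: identity_key, add_file and listing are hypothetical names, and str.casefold stands in for whatever invariant comparison the platform actually provides.

```python
def identity_key(name):
    """Invariant key: must give the same answer on every machine."""
    return name.casefold()


def add_file(directory, name):
    """Store a name, rejecting any that differs from an existing one only by case."""
    key = identity_key(name)
    if key in directory:
        raise FileExistsError(f"{name!r} conflicts with {directory[key]!r}")
    directory[key] = name


def listing(directory, display_key):
    """Order names for display; display_key may follow the user's culture."""
    return sorted(directory.values(), key=display_key)


directory = {}
add_file(directory, "Report.TXT")
add_file(directory, "notes.txt")

try:
    add_file(directory, "report.txt")
    conflict_detected = False
except FileExistsError:
    conflict_detected = True

# Any culture-aware key could be passed here; str.casefold is a stand-in.
shown = listing(directory, display_key=str.casefold)
print(conflict_detected, shown)
```

Swapping display_key changes only what the user sees in the listing. Whether two names conflict is decided by identity_key alone, so that decision does not vary with the user's language settings.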

©2003-2007 Microsoft Corporation. All rights reserved.