
Globalization Basics

Sharon Vandana, Technical Writer, Vishwak Solutions Pvt. Ltd.

Published on 26th June 2004

Introduction

This paper discusses basic aspects of globalization and localization: the importance of Indic computing, the differences between Unicode and non-Unicode applications, multilingual support in Windows 2000 and Windows XP, the Multilingual User Interface, date, number, and currency formatting, language groups, and locales.

  Background

Globalization and localization are no longer new concepts, and the demand for localization is growing. Earlier, computing in a local language was an out-of-the-ordinary task because of issues such as the unavailability of fonts, English-dominant operating systems and applications, and browsers without language support. The market for local language applications is now coming into its own.

Indic development is improving rapidly. With a vast population conversing in a multitude of languages, many with their own scripts, the problem of translation and transliteration [1] from English to these languages, and from one local language to another, is daunting. It is not surprising that many language researchers and developers are grappling with this problem. Some of the solutions are quite mature and available as commercial software offerings. Many of these permit interaction with the computer in a local language, using keyboards designed explicitly for that language.

The IT industry in India is realizing that the next spurt of growth will stem from IT services reaching the masses. To make this happen, support for local Indian languages is imperative. Everything from simple word processing activities, such as preparing, editing, and printing documents, to complex processes, such as data management and business processes, can be supported with local language displays.

The Indian constitution recognizes 18 official languages [2], namely Assamese, Bengali, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Malayalam, Manipuri, Marathi, Nepali, Oriya, Punjabi, Sanskrit, Sindhi, Tamil, Telugu, and Urdu. Almost every one of these 18 languages has different dialects [3] or variations. The Indian constitution uses the term 'mother tongue' instead of language or dialect. Each language includes many mother tongues. The Indian census records over 200 different mother tongues. The most commonly spoken language in India is Hindi.


  Enabling Indic support in software Applications

In India, English-dominant software applications will have to be replaced with applications that support local languages, since the people who actually speak and use English are still less than 10 percent of a population of more than 900 million. A much larger number perhaps understand some English but cannot communicate adequately in it.

Indian languages are entirely different from English. Some of the oldest Indic languages [4] date back to 800 B.C. The syllables in English are pronounced in a strict left-to-right sequence of consonants and vowels, whereas in Indic scripts the visual pronunciation indicators in a syllable do not always occur from left to right. This behavior creates specific problems in the creation of computing solutions for these languages. Another difficulty is the lack of a standard definition for the behavior of Indic languages.

The basic phonetic components of Indic languages are vowels (called 'swar' in Hindi) and consonants (called 'vyanjan' in Hindi). These form the basis of Indic alphabets. There are also dependent characters, which are used in conjunction with the letters of the alphabet to modify the sounds they represent. We call these 'special modifiers'. The number of characters in the alphabet varies with different scripts [5].

A number of encoding schemes are used to encode Indic languages, but each remains confined to the environment in which it was deployed, so they are not interoperable. The most established encoding, ISCII (Indian Script Code for Information Interchange), plays a vital part in Indic computing. The Unicode Standard and its ISO counterpart (ISO/IEC 10646) were created as a global standard to address the needs of most major languages. When Unicode was extended to Indic languages, it drew from the ISCII standard.

  Unicode and Non-Unicode applications

Unicode gives operating systems a single way to support many languages. Although the Unicode Standard aims to cover the scripts of all the world's languages, operating systems such as Windows and Linux have not yet implemented support for all of them.

Software providers utilize Unicode encoding for Web-based applications, making it the ideal choice.

Unicode makes it easier to deploy applications because they need not be modified to match the local (native) code pages [6]. Unicode groups the characters of each script into its own range, and every character in a script is assigned a code point. Because Unicode covers the scripts of most of the world's languages, applications can create multinational and multilingual documents. All modern technologies (Java, XML, .NET, etc.) either support or are based on Unicode.
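As a small illustration of code points versus code pages, the Python sketch below prints the Unicode code points of a Hindi word and checks that the word survives a round trip through UTF-8 but cannot be stored in the Western European code page cp1252. Python, the sample word, and the helper name code_points are assumptions made for this example only.

```python
# Sketch: Unicode code points versus a legacy single-byte code page.
# The sample word and the function name are illustrative assumptions.

def code_points(text):
    """Return the Unicode code points of text in U+XXXX notation."""
    return ["U+%04X" % ord(ch) for ch in text]

word = "\u0939\u093f\u0902\u0926\u0940"  # the word "Hindi" in Devanagari

print(code_points(word))

# UTF-8 can represent every code point, so the text round-trips intact.
utf8_bytes = word.encode("utf-8")
assert utf8_bytes.decode("utf-8") == word

# A Western European code page has no slots for Devanagari characters.
try:
    word.encode("cp1252")
    fits_cp1252 = True
except UnicodeEncodeError:
    fits_cp1252 = False
print("Fits in cp1252:", fits_cp1252)
```

The same string therefore needs no per-market changes when it is kept in Unicode, whereas a code-page-based program would need a different code page for each script.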

If you have developed an application that is not using Unicode, you might need to change the language for non-Unicode programs. In fact, in Windows XP the System locale is called "Language for non-Unicode programs."

To change the language for non-Unicode programs in Windows XP, do the following:

  1. In Control Panel, click Regional and Language Options.
  2. Click the Advanced tab, and then under Language for non-Unicode programs, select the language for which the application was developed.



  Definitions

Internationalization

Internationalization is the process of creating a world-ready, single-binary application that is ready for use in many different markets, or high-quality foreign-language editions of a product. It covers generic coding and design issues and comprises two major areas: globalization and localizability.

Globalization

Globalization is the first step towards internationalization. It is the process of creating program code that is not based solely on a single language or locale. A globalized application can correctly accept, process, and display a worldwide assortment of scripts, data formats, and languages, but the language of its user interface remains unchanged. To change the user interface language, you need to localize the application for the specific culture or locale.

Localizability

Localizability is an intermediate step prior to localization. It is the process of readying software to be localized into different languages without changing the source code. In other words, the core application has no dependencies on a specific language or culture. In this step you ensure that the application’s resources that require translation are separated from the rest of the application’s code.
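The separation of translatable resources from code can be sketched as follows. This is a minimal Python illustration, not a description of any Windows mechanism; the resource keys, the sample translations, and the names RESOURCES, get_string, and greet are all assumptions made for the example.

```python
# Sketch: UI strings kept in per-language resource tables, apart from code.
# Keys, translations, and function names are illustrative assumptions.

RESOURCES = {
    "en": {"greeting": "Welcome, {name}!", "exit": "Exit"},
    "hi": {"greeting": "स्वागत है, {name}!", "exit": "बाहर निकलें"},
}

DEFAULT_LANGUAGE = "en"

def get_string(key, language):
    """Look up a UI string, falling back to the default language."""
    table = RESOURCES.get(language, {})
    if key in table:
        return table[key]
    return RESOURCES[DEFAULT_LANGUAGE][key]

def greet(name, language):
    # The code never embeds display text; it only refers to resource keys.
    return get_string("greeting", language).format(name=name)

print(greet("Asha", "en"))
print(greet("Asha", "hi"))
print(get_string("exit", "ta"))  # no Tamil table yet, so English is used
```

Adding a new language then means adding a new resource table; the functions themselves stay untouched.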

Localization

Localization involves translating and customizing a product for a specific market, for example by modifying user interface (UI) elements, translating text, and standardizing terminology.

NOTE: The terms are often abbreviated as G11n, I18n, L10n, with the intervening number referring to the number of letters between the first and last of each word. You may also see "translation" abbreviated Tr8n, and all four terms together abbreviated as GILT.
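The counting rule behind these abbreviations can be checked with a few lines of Python. The function name numeronym is an assumption for this sketch; it keeps the first and last letters and replaces the letters between them with their count.

```python
def numeronym(word):
    """Abbreviate word as first letter + count of inner letters + last letter."""
    if len(word) < 3:
        return word  # nothing between the first and last letters to count
    return "%s%d%s" % (word[0], len(word) - 2, word[-1])

for term in ("globalization", "internationalization", "localization"):
    print(term, "->", numeronym(term))
```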

  Multilingual support in Windows operating systems

Your OS needs to support different languages, empowering users with the flexibility to easily communicate across cultures. This avoids maintenance of dedicated operating systems and applications for each language.

Windows 2000

English Version
  Multilingual features for users: User interface (menus, help files, dialog boxes and folder names) in English language. Users can input, edit, view and print in hundreds of different languages.
  Multilingual benefits for IT professionals: Provides support for working with documents in other languages.
  Ideal if you: Do not have significant need for a UI in a language other than English.

Localized Versions
  Multilingual features for users: User interface (menus, help files, dialog boxes and folder names) in the language of the particular localized version (Japanese, for example). Users can input, edit, view and print in hundreds of different languages.
  Multilingual benefits for IT professionals: Provides support for working with documents in other languages. Provides fully localized user interface.
  Ideal if you: Need to deploy and support only one localized language UI (Japanese, for example) in your environment.

Multilanguage Versions
  Multilingual features for users: User interface (menus, help files, dialog boxes and folder names) in English language. Users can input, edit, view and print in hundreds of different languages.
  Multilingual benefits for IT professionals: Provides support for working with documents in other languages. Allows you to manage a single operating system code base for the entire enterprise.
  Ideal if you: Need to deploy and support UIs in more than one language in your environment. Need to reduce the TCO [7] of deploying and maintaining multiple language versions of your operating system.

Source: Microsoft.com

Each version provides its own level of multilingual support. The English Version enables working in both English and other languages. It supports multilingual viewing and editing features that allow users to read and write documents in hundreds of different languages. The UI language is English, but documents can be read and written in additional languages.

The localized versions provide the user interface in a single language other than English. Documents in languages other than the one supported by the localized version can still be handled and processed. These versions include multilingual viewing and editing features, and administrators or users can install the character sets for other languages, including letters and numbers, local currency symbols, and other settings.

The Windows 2000 Professional, Multilanguage Version provides an extra level of multilingual capability by allowing users to change the language of the operating system user interface. This means a user can log on to a workstation and use Windows 2000 Professional in any of 24 languages, provided an administrator has installed the appropriate language files. Additionally, the user will be able to edit and view documents in the hundreds of languages supported in all Windows 2000 Professional versions.

The Windows 2000 Localized and Multilanguage Versions are available in 24 languages, but fully localized versions in Indic languages are not available.

The base character encoding of Windows 2000 is Unicode version 2.1. National Language Support (NLS) in Windows 2000 provides a set of system tables containing locale information (such as date, time, number, and currency formats and the localized names of countries, languages, days, and months), character mapping tables that convert between local character encodings (ANSI or OEM) and Unicode, keyboard layout information, character typing information, and sorting information.

The Multilingual API (MLAPI) allows applications to handle keyboard input and fonts from different language versions. For example, it changes the keyboard layout tables or the fonts used to display text. It also handles text layout issues.

Language-specific information is stored in separate resource files. This includes information such as text for menus, dialog boxes, and Help files. Separating text allows system code to be shared by all language versions of Windows 2000. The changes can be made in the language-specific resource files.

The National Language Support API and the Multilingual API can be used to write generic code to handle data input, storage, and display for a large number of languages.

Windows XP

Windows XP Professional has multilingual support built into the operating system. Windows XP lets you enter, edit, and view data in many languages, but you can only change the language used for menus and dialog boxes by installing the Multilingual User Interface Pack (MUI), an add-on to the English version of Windows XP Professional.

Windows XP lets you display, input, edit, and print documents, including e-mail and Web pages, in many languages. You can specify the fonts, keyboard layouts, sort orders, date formats, and input methods for many languages from Regional options in Windows XP. Windows XP is available in 24 different language versions, called localized versions, in addition to English.

The technology used by Windows for multilingual computing is Unicode. If your application is a Unicode application in a supported language, it will run on both the English and localized versions of Windows XP. You can switch between user interface languages only if you have installed the Multilingual User Interface Pack (MUI).

  Installing MUI

Multilingual User Interface Pack is a set of language specific resource files that allows the user interface language of the operating system to be changed according to the preferences of individual users to one of the 33 supported languages.

MUI also allows different language users to share the same workstation with their own localized user interface. MUI is not supported on Windows 9x, Windows Me, and Windows XP Home Edition. MUI runs on top of the English version of Windows.

Windows 2000

Windows 2000 has the Windows 2000 Professional Multilanguage Version.

Windows XP

You can install the Windows XP Professional MUI Pack to enable localization of your user interface. In addition, the Office XP Multilingual User Interface Pack is available for localizing Microsoft Office features such as online help, wizards, and templates. If you install the Office XP Multilingual User Interface Pack on a computer running the Windows XP Professional MUI Pack, Office XP detects the default user interface language of the Windows XP Professional MUI Pack and sets that as the default for all Office programs.

Related Links:

Windows XP Language Interface Pack

Based on MUI technology, the Language Interface Pack (LIP) provides the desktop user with an approximately 80% localized user experience by translating a reduced set of user interface elements.

Read More : 
   Windows XP Hindi Interface Pack
   Windows MUI/LIP Knowledge Center
   Windows XP Professional Language Interface Pack (LIP) FAQ

  Formatting Dates, Time, number and currency

Regional options determine how the operating system and most applications format dates, times, numbers, and currency, including which currency symbol is used and whether it appears before or after the number. Regional settings also determine the order in which the operating system and applications sort lists, since different world regions have different conventions for sorting lists. The date format may also change according to the calendar followed by the country.

Date and Time can be formatted differently based on the locale. Most countries use the Gregorian calendar. The Date tab is used to set the date and the Time tab lets you change the time display format. This is fairly universal, except that if you are using AM/PM time and not 24 hour time, you may want to change the AM and PM symbols.

The international standard notation for the date is YYYY-MM-DD, where YYYY is the year in the usual Gregorian calendar, MM is the month of the year between 01 (January) and 12 (December), and DD is the day of the month between 01 and 31. The international time format is hh:mm:ss, where hh is the number of complete hours that have passed since midnight (00-24), mm is the number of complete minutes that have passed since the start of the hour (00-59), and ss is the number of complete seconds since the start of the minute (00-60). In local formats, the colon, period, and space are examples of valid separators for hours, minutes, and seconds, and the letter h can separate hours and minutes. Both 12-hour and 24-hour notations are in use; in 12-hour notation, the AM or PM symbol is typically separated from the time by a space.
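The international notation described above can be produced with Python's standard datetime module. The sample moment is an arbitrary value chosen for this sketch, and the 12-hour form is built by hand so that the AM/PM symbol does not depend on the system's regional settings.

```python
from datetime import datetime

# Arbitrary sample moment: 4 August 1992, 17:05:09.
moment = datetime(1992, 8, 4, 17, 5, 9)

iso_date = moment.strftime("%Y-%m-%d")  # YYYY-MM-DD
iso_time = moment.strftime("%H:%M:%S")  # hh:mm:ss in 24-hour notation

# 12-hour notation: hour 0 and hour 12 are both written as 12.
hour12 = moment.hour % 12 or 12
marker = "AM" if moment.hour < 12 else "PM"
twelve_hour = "%02d:%02d:%02d %s" % (hour12, moment.minute, moment.second, marker)

print(iso_date)
print(iso_time)
print(twelve_hour)
print(moment.isoformat())  # date and time joined by the letter T
```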

Separators can be different in different locales or left out altogether. The hyphen, comma, period, space, and slash are all examples of valid separators for the day, month, and year. In numeric date formats, the month and day fields can be reversed, and, in some cases, the year field can come first. For example, the 4th of August 1992 can be written as either 4/8/92 or 8/4/92 depending on locale. In addition, users in other countries sometimes place the year first, so June 11, 1992 could be 920611 or 921106.
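The ambiguity of numeric dates is easy to reproduce. In the Python sketch below, the same text is read once with the day first and once with the month first; the two format strings are assumptions standing in for settings that a real program would take from the user's locale.

```python
from datetime import date, datetime

# Illustrative field orders; real applications should read them from the
# user's regional settings rather than hard-coding them.
DAY_FIRST = "%d/%m/%y"
MONTH_FIRST = "%m/%d/%y"

text = "4/8/92"
as_day_first = datetime.strptime(text, DAY_FIRST).date()
as_month_first = datetime.strptime(text, MONTH_FIRST).date()

print(as_day_first.isoformat())    # read as 4 August 1992
print(as_month_first.isoformat())  # read as 8 April 1992

# Year-first forms without separators for June 11, 1992.
june_11 = date(1992, 6, 11)
year_month_day = june_11.strftime("%y%m%d")
year_day_month = june_11.strftime("%y%d%m")
print(year_month_day, year_day_month)
```

Because the text alone cannot tell the two readings apart, a date is safer to store in an unambiguous form and to format for display only at the last moment.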

In Windows 2000,

  1. Click Start, point to Settings, and then click Control Panel
  2. Click Regional Options
  3. Select the Date tab
  4. Select the locale or location whose settings you want to use. For example, to format numbers and dates appropriately for a Tamil user, select Tamil.
  5. Click Apply, and then click OK

The Numbers tab will let you alter the way numbers are displayed. You can change the 'Measurement system:' option, i.e. Metric/U.S. The comma, period, space, and apostrophe are examples of valid separators for units of thousands. The period, comma, and the center dot are examples of valid separators for decimal fractions. Grouping may not be restricted to thousands separators.
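To make the separator and grouping options concrete, here is a minimal Python sketch of a number formatter with configurable separators and group sizes. The function format_number and its parameters are hypothetical, written for this illustration; it is not how Windows implements the Numbers tab. The (3, 2) grouping shows the Indian convention, where only the rightmost group has three digits.

```python
def format_number(value, group_sizes=(3,), group_sep=",", decimal_sep=".",
                  decimals=2):
    """Format a non-negative number with configurable digit grouping.

    group_sizes lists the group sizes from right to left; the last size
    repeats for all remaining digits.
    """
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    fixed = "%.*f" % (decimals, value)
    whole, _, fraction = fixed.partition(".")
    groups = []
    index = 0
    while whole:
        size = group_sizes[min(index, len(group_sizes) - 1)]
        groups.append(whole[-size:])  # take the rightmost group
        whole = whole[:-size]
        index += 1
    result = group_sep.join(reversed(groups))
    if decimals > 0:
        result += decimal_sep + fraction
    return result

print(format_number(1234567.891))
print(format_number(1234567.891, group_sep=".", decimal_sep=","))
print(format_number(1234567.891, group_sizes=(3, 2)))
```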

Various countries indicate positive and negative values differently. The symbols + (plus) and - (minus) can appear either before or after the number. Negative numbers can be enclosed in parentheses in applications such as a spreadsheet.

Currency formats differ among various countries. The comma, period, and colon are examples of valid separators for currency. There can be one or no space between the currency symbol and the amount. The currency symbol can be up to four characters. The Currency tab will give you the option to change the Dollar sign to other currency symbols, say, Rs, as well as several options on how values are displayed.
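The currency and sign options can be sketched in the same spirit. The Python function below is a hypothetical helper, and its parameter names are assumptions for this example; it places the symbol before or after the amount, with or without a space, and optionally shows negative amounts in parentheses.

```python
def format_currency(amount, symbol, symbol_first=True, space=True,
                    negative_in_parentheses=False):
    """Place a currency symbol and a sign around an amount.

    All parameters are illustrative; a real program would take these
    choices from the user's regional settings.
    """
    digits = "{:,.2f}".format(abs(amount))
    gap = " " if space else ""
    if symbol_first:
        body = symbol + gap + digits
    else:
        body = digits + gap + symbol
    if amount < 0:
        if negative_in_parentheses:
            return "(" + body + ")"
        return "-" + body
    return body

print(format_currency(1234.5, "$", space=False))
print(format_currency(1234.5, "Rs"))
print(format_currency(-1234.5, "Rs", negative_in_parentheses=True))
print(format_currency(99, "kr", symbol_first=False))
```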

  Language group/Language collection

All the languages and scripts that an operating system can support are organized into language groups. A language group is added when the user installs it by selecting it in Regional Options. A language group includes code-page information, keyboard layouts, and fonts. Installing a language group adds support for its locales.

Windows 2000

The different languages can be grouped under three headings:

European languages – installed by default on XP English, supports Baltic, Cyrillic, Greek and Turkish Languages

East Asian languages – Chinese, Japanese, Korean

Indic Languages – Complex scripts, RTL scripts; support for Thai and all Indian Languages

The functionality for all supported scripts is available on all language versions of the Windows 2000 operating system. If additional languages are needed, you can install them separately using the Windows 2000 CD during or after setup. In Windows 2000, the languages are not grouped together as Language groups. They are represented separately as Indic, Japanese etc under the Language Settings in the Control panel.



Windows XP

In Windows XP, the ‘Language group’ is known as ‘Language collection’. After installing a language group or collection, you may have to restart the computer.

  1. Go to the Languages tab in the Regional and Language options window.
  2. Under Supplemental language support, select the check box beside the appropriate language collection or group, for example 'Install files for complex script and right-to-left languages'.

NOTE: The complex script and right-to-left languages include Arabic, Armenian, Georgian, Hebrew, the Indic languages, Thai, and Vietnamese.



  Locales

A locale is a collection of operating system settings that reflects a specific country's/region's language and cultural conventions.

Different countries may share a common language, say English, but their currencies, dialects, and other conventions vary with each country’s culture. For example, the English (Canadian), English (United Kingdom), and English (United States) locales reflect different countries/regions, but they have the same language, English.
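One way to picture a locale is as a named bundle of settings. The Python sketch below uses a small hand-written table for two English locales; the table, its keys, and the function describe are assumptions for illustration and are not read from the operating system.

```python
from datetime import date

# Hand-written sample data for illustration, not operating system data.
LOCALES = {
    "en-US": {"short_date": "%m/%d/%Y", "currency": "$"},
    "en-GB": {"short_date": "%d/%m/%Y", "currency": "£"},
}

def describe(locale_name, day, amount):
    """Format a date and an amount using the named locale's settings."""
    settings = LOCALES[locale_name]
    formatted_date = day.strftime(settings["short_date"])
    return "%s %s%.2f" % (formatted_date, settings["currency"], amount)

sample_day = date(1992, 8, 4)
for name in ("en-US", "en-GB"):
    print(name, describe(name, sample_day, 10))
```

Both locales share the English language, yet the same date and amount are displayed differently.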

Types of Locales

1. User Locale

The user locale contains the standard regional settings of the country or region to which the user belongs; it does not set the language chosen by the user. The user locale determines the formats used to display dates, times, currency, and numbers, and the sorting order of text. The formats are based on the Standards and Formats settings of your computer. Setting the user locale does not affect the language settings, other than the language used to display the names of days and months and the time and date formats. This setting affects virtually all software you run on your computer. If you use Microsoft Outlook and set your user locale to Hindi, you will note that dates are displayed in Hindi. Changing the user locale does not require restarting the computer.

Windows 2000

After you set your Standards and formats on the Regional Options control panel tab to Hindi, look at how the Numbers, Currency, Time, Short Date, and Long Date are displayed. They are now set to Hindi format.



Windows XP



2. Input Locale/Input Language

The Input Language/Input Locale is the combination of the language being entered and the keyboard layout, IME or any other device being used to enter text. Each user can add multiple input languages to create multilingual documents.

Each input language has a default keyboard layout associated with it. Some languages also have alternative keyboard layouts.

Windows 2000

To select the default input language/input locale and Keyboard layout, go to Control Panel => Regional Options => Input Locales tab.

NOTE: You can change the key sequence for switching between input locales by clicking the Change Key Sequence button.

Ensure that the Enable indicator on taskbar check box is selected. Click the Add... button, and then select the desired input locale and keyboard layout/IME.

Windows XP

Under "Text services and input languages," click on the "Details..." button. Under Installed Services, click "Add...". In the Add Input Language dialog box, click the input language and keyboard layout or Input Method Editor (IME) [8] you want to add and then click OK.



3. System Locale

This setting enables programs that do not support Unicode to display menus and dialog boxes in their native language by installing the necessary code pages and fonts. However, programs designed for other languages may not display text correctly.

Only applications that do not use Unicode as their default character-encoding mechanism are affected by this setting; therefore, applications that are already Unicode-encoded can safely ignore the value and functionality of this setting.
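The reason non-Unicode programs depend on this setting is that the same bytes stand for different characters under different code pages. The Python sketch below decodes one byte sequence with three Windows code pages; the byte values are arbitrary samples chosen for the illustration.

```python
# The same three bytes, as a non-Unicode program might store them.
raw = bytes([0xE0, 0xE1, 0xE2])

# Decode the bytes under three different Windows code pages.
decoded = {}
for code_page in ("cp1252", "cp1251", "cp1253"):
    decoded[code_page] = raw.decode(code_page)
    print(code_page, decoded[code_page])

# Unicode text carries its own identity, so there is no such ambiguity.
cyrillic = decoded["cp1251"]
print(["U+%04X" % ord(ch) for ch in cyrillic])
```

Under cp1252 (Western European) the bytes are accented Latin letters, under cp1251 they are Cyrillic letters, and under cp1253 they are Greek letters, which is why the wrong system locale produces unreadable text.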

Windows 2000

On the General tab, click on the "Set Default..." button. From the drop-down list, select the language that your program needs, and click OK.



Windows XP

Go to Control Panel => Regional and Language Options. On the Advanced tab, select the language from the drop-down list under Language for non-Unicode programs.



  Annexe

Table 1: Table of languages spoken in various states of India

Indian States | Language | Script
Karnataka | Kannada | Kannada
Rajasthan, Haryana, Delhi, Uttaranchal, Himachal Pradesh, Uttar Pradesh, Bihar, Jharkhand, Madhya Pradesh, Chhattisgarh | Hindi | Devanagari
Gujarat / Daman, Diu, Dadra & Nagar Haveli | Gujarati | Gujarati
Maharashtra | Marathi | Devanagari
Goa | Konkani | Devanagari
West Bengal | Bengali | Bengali
Orissa | Oriya | Oriya
Jammu & Kashmir | Kashmiri | Sharada/Urdu/Devanagari
Assam | Assamese | Assamese
Arunachal Pradesh | Nissi/Daffla | -
Nagaland, Manipur, Meghalaya | Ao, Manipuri, Khasi & Garo (respectively) | Assamese
Tamilnadu / Puducherry (Pondicherry), Karaikal (Pondicherry) | Tamil | Tamil
Kerala / Lakshadweep, Mahe (Pondicherry) | Malayalam | Malayalam
Punjab | Punjabi | Gurmukhi/Urdu
Andhra Pradesh / Yanam (Pondicherry) | Telugu | Telugu
Mizoram | Mizo | -


Table 2: Table of the oldest Indic languages

Indic language | Dialect
Vedic (sometimes called Vedic Sanskrit) | North-western
Sanskrit | Western Central India (eastern Punjab)
Pali | North-central
Prakrit | North-eastern
Magadhi | Bihar
Shauraseni | Developed from Pali
Maharashtri | South-Eastern

  Conclusion

The main objective of this paper is to bring out the basics that ensure world readiness of a product. It does not intend to cover all information regarding the topic.

  References

MSDN: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vsent7/html/vxoriplanningglobalreadyapplications.asp
Unicode: http://www.unicode.org/



1. Transliteration is merely redisplaying the same text in a different script in such a way that pronunciation is not affected. This is not the same as 'translation', where the language itself changes.
2. See Annexe: Table 1
3. Spoken in a certain geographical area
4. See Annexe: Table 2
5. A collection of symbols used to represent textual information in one or more writing systems
6. A codepage is a list of selected character codes. Codepages are usually defined to support specific languages or groups of languages which share common writing systems
7. Total cost of ownership
8. The IME interprets the keystrokes as characters, and then gives the user the opportunity to insert the correct interpretation into the program being worked in

©2003-2007 Microsoft Corporation. All rights reserved.