Indian Languages

Working in Multiple Indian Languages on your computer.

Name: Editor
Location: India

Saturday, October 08, 2005

The Indian Languages -

Hey guys, did you know that it is now possible to run a spell check in all the Indian languages using software developed by CDAC, Pune?

Yes, it is now very much possible to do a spell check in Microsoft Word in all the Indian languages.

Not only this, CDAC's "iLeap" software is the first indigenously developed software meant for Indian language word processing. The speciality of this software is that it can "transliterate" between the various Indian language scripts as well as "Roman English". By "transliteration" we mean character replacement between the Indian scripts, while the 'sense' and 'grammar' remain those of the original script in which the text was entered or typed. Unique, isn't it! Be sure to catch such surprising announcements, and do keep reading these pages!

Tuesday, October 04, 2005

Gist Cards - The Technology (Part III)

Coming directly from C-DAC (or through the authorized C-DAC GIST Hardware Dealer), the pioneer of the GIST technology, C-DAC GIST Cards incorporate all the latest developments in the product (both hardware and firmware) that result from the continuous and dedicated R&D efforts going on in C-DAC. These latest technological advancements are based on the feedback of our customers and dealers, and on our own research and study of the new/additional requirements and applications of the customers.

C-DAC GIST Cards now come with the latest Firmware version 8.041, unique to C-DAC GIST Cards, and offer some new and advanced features like Landscape printing on HP LaserJet (A4 & A3) printers; VL-VGA/PCI-VGA/Integrated VGA compatibility through improved Device Driver (GISTDRV.SYS V1.31); a customized, WordStar-compatible multilingual word processor (ALP For GIST Card Ver 3.0) etc.

C-DAC has also developed drivers for UNIX SVR 3.0 & 4.x and SCO UNIX to facilitate use of GIST Card on UNIX consoles. These latest features are unique to the GIST Cards manufactured by C-DAC in Pune. The latest, unique features and utilities that come bundled only with C-DAC GIST Card are:
1. GIST Firmware Ver 8.041 - with facility for Landscape printing on HP LaserJet (A4 & A3) printers.
2. GISTDRV.SYS Ver 1.31a - DOS Device Driver with support for GIST Card (on ISA Bus) on VL-VGA/PCI-VGA/Integrated VGA motherboard.
3. ‘ALP For GIST Card’ Ver 3.0 - WordStar compatible multilingual word processor with Manual.
4. INSCRIPT Keyboard Tutor - Online Tutor to facilitate ease of learning to type in Indian languages with INSCRIPT Keyboard Manual.
5. Separate Boot ROMs for GIST 9000 and GIST 9001 - the Firmware for GIST 9000 and 9001 based cards have different signatures so that one does not download incorrect image file by mistake.
6. Tell Code Ver 1.1 - for instant offline information about file type (ISCII-8/ISCII-7/PC-ISCII/ASCII).

C-DAC GIST Card is now able to co-exist with an Ethernet Card on LAN (Novell NetWare V3.x & 4.x) even when Remote Boot ROM is enabled.

Indian Languages -- Gist Cards

Salient Features of Gist Cards :

* CDAC GIST Cards for Indian Scripts come with the latest Firmware version 8.041.
* GIST Cards support printer drivers for all Epson & IBM compatible 9-Pin and 24-Pin Dot matrix printers, HP Laser Jet / Desk Jet and Mannesmann Tally Dot matrix line Printer.
* Landscape printing on HP Laser Jet (A4 & A3) printers.
* CDAC GIST Card generates a high resolution, monochrome bit-mapped graphics display for neat rendition of complex GIST scripts.
* VL-VGA / PCI-VGA / Integrated VGA compatibility through improved Device Driver (GISTDRV.SYS).
* Inbuilt WordStar-compatible multilingual word processor (ALP For GIST Card Ver 3.0)
* It is designed to support proportionally spaced scripts, as the Indian scripts are. Other than these it supports Perso-Arabic, Sinhalese, Tibetan / Bhutanese, Thai and Russian scripts.
* ALP-G has spellcheckers for Hindi, Marathi and Gujarati (optional)
* It has mail merge facility.
* Online Keyboard tutor.
* Background printing of files.

The Current version of C-DAC GIST Card is V1.44 - for VGA. C-DAC GIST Cards support two modes to display GIST Scripts, namely DATA Page and SCRIPT Page. DATA Page is compatible with the IBM screen display and SCRIPT Page is specially created for multiscript operation, considering the variable width of the characters of the various Indian Language Scripts. The C-DAC GIST Card is the only reliable add-on card available for data processing and word processing applications under the DOS environment for Indian Languages.

Gist Cards for UNIX / XENIX Operating Systems

The second part of our details on the Gist Cards used for DOS based Indian Language Computing:
GLINK: The GIST Terminal Emulation software GLINK enables a PC with a GIST Card to behave like a GIST Terminal when connected to a multi-user system, like UNIX, XENIX etc., over the RS232 serial port. It provides the full functionality of a GIST Terminal, and supports DEC VT52, VT100, VT220, VT320 and ANSI standards. It supports local script printing through the Centronics compatible parallel printer port on the GIST Card. GLINK allows the user to do data as well as word processing in a UNIX/XENIX environment in Indian scripts and languages.

Indian Language Computing - Products Details

In all our future posts we are going to discuss in detail the various products developed by M/s. CDAC, Pune for Indian Language Computing. Step by step we shall discuss the following products, starting from the DOS based products right up to WINDOWS and LINUX.

Initially we shall talk about the GIST Cards :-

C-DAC GIST Cards : An Introduction (Part I)
C-DAC GIST Cards are PC add-on cards that allow the use of Indian & other scripts along with English in character-oriented packages like dBase, Lotus 1-2-3, WordStar, FoxPro, FoxBASE, QBasic etc., and compilers like C, C++, Clipper etc. on MS-DOS. A software driver allows the user to type in English and other scripts using the INSCRIPT Phonetic keyboard. The firmware for downloading the scripts is provided along with the GIST Card. Printing through the packages is done using the normal print commands of that package.

Support for GIST Card is also available on PC-based UNIX (Ver 3.2 & SVR4) platform. The Terminal Emulation software GLINK enables a PC with a GIST Card to emulate a GIST Terminal. GIST Card is machine (node) specific, and can also be used in a LAN (Novell NetWare 3.x) environment. GIST Card, however, does not support Windows environment and graphics applications. C-DAC GIST Cards have a tremendous potential of use for various applications, especially where databases have to be compiled and maintained and reports / forms / documents have to be generated in the local/regional Indian language. Similarly, C-DAC GIST Card has penetrated the lower end of the business sector where it is being used for various low-level applications like accounting, payroll, inventory etc.

Independent software developers have already developed application packages in FoxPro, QBasic etc. with the help of C-DAC GIST Card which give the facility of using the local/Indian language on computers to the users of these packages.

Monday, October 03, 2005

Terminology used in GIST products - Configurable Keyboard Overlay

Configurable Keyboard Overlay :- With all the GIST packages except LEAP, one more DOS based utility is provided for configuring user defined keyboard layouts. The name of this utility is 'CFGKBD' (ConFiGure KeyBoarD), and it is available in the respective GIST Package directory. See its help file for more details. Also read the GIST Package release note to know about the usage of a configured keyboard overlay.
Though the INSCRIPT keyboard layout is standardised for all the Indian Scripts, the Configurable Keyboard Overlay is useful for customised keyboard layouts/typing depending upon the user.

Terminology used in GIST products - Phonetic English Keyboard Overlay

PHONETIC English Keyboard Overlay:- This keyboard overlay has the Indian script alphabets phonetically assigned to the English alphabets on the IBM-PC QWERTY overlay. It is drawn upon the simple rule that all the Indian languages/scripts are based on 'phonetics', meaning that the pronunciation of a particular sound is the same irrespective of the Indian language/script.

Terminology used in GIST products - Inscript Keyboard Overlay

INSCRIPT Keyboard Overlay :- The Inscript (Indian Script) keyboard overlay was standardized by the DOE in 1986 with a subsequent revision in 1998. This keyboard overlay is phonetic in nature and is common for all the scripts provided with this software.
The Inscript overlay contains characters required for all the Indian scripts, as defined by the ISCII character set. The Indian script alphabet has a logical structure derived from the phonetic properties. The Inscript overlay mirrors this logical structure. Due to the phonetic nature of the keyboard, a person who knows typing in one Indian script can type in any other Indian script. The logical structure allows ease in learning. Please refer to the Inscript keyboard manual provided with the package for details.

Terminology used in GIST products - Keyboard and Character Blocks

Keyboard and Character Blocks :- Common keyboard for all the Indian scripts. As the character set is common, the keyboard layout for all the scripts is also the same. The keyboard layout is logical and very simple to learn. The vowels are on the left part of the keyboard and the consonants are on the right side. The vowels are kept on the SHIFT position of the corresponding vowel matra. The aspirated consonants are on the shift position of their un-aspirated counterpart. Consonants in one Varg are kept in one vertical column.
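
The shift-pair logic described above can be sketched in a few lines of Python. The key assignments below are only a small subset of the commonly published Devanagari Inscript layout, written as Unicode characters rather than ISCII codes. Treat them as an assumption to be checked against the Inscript keyboard manual. The names `INSCRIPT_SUBSET` and `type_inscript` are purely illustrative and are not part of any GIST product.

```python
# Partial Devanagari Inscript map (assumed subset; verify against the manual).
# Unshifted key -> matra or un-aspirated consonant,
# shifted key   -> full vowel or aspirated consonant.
INSCRIPT_SUBSET = {
    "e": "\u093e", "E": "\u0906",  # aa matra / vowel AA
    "f": "\u093f", "F": "\u0907",  # i matra  / vowel I
    "r": "\u0940", "R": "\u0908",  # ii matra / vowel II
    "d": "\u094d", "D": "\u0905",  # halant   / vowel A
    "k": "\u0915", "K": "\u0916",  # ka / kha
    "i": "\u0917", "I": "\u0918",  # ga / gha
    "y": "\u092c", "Y": "\u092d",  # ba / bha
    "l": "\u0924", "L": "\u0925",  # ta / tha
    "j": "\u0930",                 # ra
}


def type_inscript(keys: str) -> str:
    """Convert key presses to Devanagari text; unknown keys pass through unchanged."""
    return "".join(INSCRIPT_SUBSET.get(key, key) for key in keys)


# Y, e, j, l -> bha, aa matra, ra, ta (the word "Bharat")
print(type_inscript("Yejl"))
```

Because every script shares the same layout, the same key sequence would produce the corresponding word in another script once the table is swapped for that script's characters.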

The way you pronounce is the way you type it. Typing is very easy : just key in the characters in the same sequence you pronounce them in a word. GIST will take care of displaying the matras at their proper position and creating the conjuncts. The language specific rules to do these things are known to the system. (e.g. vidya, kra, gra, tra, dra, sra, hra, kSa, pra-rpa).
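
To illustrate that only the typed sequence is stored, and that the conjunct shape is left to the display system, here is a minimal sketch using Python's standard `unicodedata` module. It uses Unicode rather than ISCII, which is an assumption made for convenience (Unicode also stores these scripts in phonetic order), and takes the conjunct kSa as the example.

```python
import unicodedata

# kSa is stored as three codes in typing order: ka, halant (virama), Sa.
# The single conjunct shape appears only when the text is rendered.
ksha = "\u0915\u094d\u0937"

names = [unicodedata.name(ch) for ch in ksha]
for ch, name in zip(ksha, names):
    print(f"U+{ord(ch):04X}  {name}")
```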

If you know typing in one Indian script, you can type all Indian scripts. Characters which are specific to a script are provided in the same layout, for example Tamil Za, the short "a" sound and the Marathi half "ra". In the Script Page the cursor moves from one char block to another, skipping the intermediate characters in that block.

It is not worth comparing the mechanical typewriter keyboard layout with the Inscript layout. There is a basic difference in their operation. On the mechanical typewriter, each conjunct is created by typing its parts. In Inscript, you type the sequence of the chars and the system creates the conjuncts. The keyboard layout is logical. It is the fastest layout for implementation of the Indian scripts on computers. The best example would be newspapers like Sakal, Loksatta, and Free-Press Journal, which have been using it for their daily editions.
THE CHARACTER CODE AND KEYBOARD LAYOUT ARE COMMON TO ALL THE GIST PRODUCTS

Terminology used in GIST products - Transliteration

Transliteration :- The origin of Indian scripts is the Brahmi script, which is a phonetic script. The basic sounds which are "depicted" using the script characters are the same. That is, the character remains the same but its representation changes from one script to another. Thus transliteration is easily possible; the same characters are to be displayed using the rules of some other script.

Transliteration between Indian scripts is possible because a common character code set is used to represent all the scripts. Transliteration between English and Indian scripts is NOT possible because:
* The character codes are not the same (not compatible).
* Roman script is not based on sound (put/cut), therefore mapping between English and Indian scripts cannot be done.

Some related points:
* Proper nouns can be transliterated between English and Indian scripts using the Ntrans software package.
* Translation is a HUGE TASK which is not as simple as transliteration, as the translation system must know the language, the grammar, the words and their usage and, moreover, the context of the text being translated. (e.g. kutte ne billi ke sath jhagda kiya: various meanings in Hindi. I eat mango: translate from English to Hindi - Gender problem).
* Transliteration is a common feature on all the CHARACTER-based GIST products for Indian languages and scripts on computers.
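
The idea of a common code set can be sketched as follows. This is not the GIST implementation: it uses Unicode instead of ISCII, relying on the fact that the Unicode blocks for these scripts were modelled on the ISCII arrangement, so a given letter sits at the same offset in each block. The names `BLOCK_START` and `transliterate` are hypothetical, and characters that exist in only one script are not handled.

```python
# Start of each script's Unicode block; the letters sit at parallel offsets
# because the Unicode layout for these scripts was modelled on ISCII.
BLOCK_START = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "gujarati": 0x0A80,
}


def transliterate(text: str, source: str, target: str) -> str:
    """Shift every character of the source script into the target script's block."""
    src, dst = BLOCK_START[source], BLOCK_START[target]
    out = []
    for ch in text:
        offset = ord(ch) - src
        # Only characters inside the 128-code source block are moved.
        out.append(chr(dst + offset) if 0 <= offset < 0x80 else ch)
    return "".join(out)


bharat = "\u092d\u093e\u0930\u0924"  # "Bharat" typed in Devanagari
print(transliterate(bharat, "devanagari", "bengali"))
print(transliterate(bharat, "devanagari", "gujarati"))
```

The stored "sound" stays the same; only the block used for display changes, which is why no dictionary or grammar is needed for this step.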

Terminology used in GIST products - Script Page

Script Page :- is a bitmap graphics page that provides word-processing support for the proportionally spaced scripts. Depending on the resolution of the adapter 14 or 9 rows can be displayed on the screen. As the characters are proportionally spaced, the number of characters per line depends on the individual character.
On the Script Page, during horizontal cursor movement the cursor "jumps" from one character block to another, skipping the chars in that block. Ready-made text-oriented packages cannot be used in the Script Page. Text in multiple Indian scripts/languages can be used with various styles using the script and display attributes.
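
The cursor "jump" can be pictured as moving over blocks rather than over individual codes. The sketch below is a simplified assumption of how such blocks might be found, using Unicode character categories instead of the GIST firmware's own rules: a block is a base character, plus any matras or signs, plus any consonant joined through a halant. The function name `character_blocks` is illustrative only.

```python
import unicodedata

VIRAMA = "\u094d"  # Devanagari halant


def character_blocks(text: str) -> list[str]:
    """Split text into blocks: a base character plus its matras and conjunct parts."""
    blocks: list[str] = []
    for ch in text:
        is_mark = unicodedata.category(ch) in ("Mn", "Mc")
        joins_previous = bool(blocks) and blocks[-1].endswith(VIRAMA)
        if blocks and (is_mark or joins_previous):
            blocks[-1] += ch   # stays inside the current block
        else:
            blocks.append(ch)  # a new cursor stop begins here
    return blocks


vidya = "\u0935\u093f\u0926\u094d\u092f\u093e"  # "vidya": six codes
print(len(vidya), len(character_blocks(vidya)))
```

For "vidya" the six stored codes form two blocks (vi and dya), so the cursor would stop twice rather than six times.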

Terminology used in GIST products - Data Page

Data Page :- is identical to the standard IBM screen (display page) of 80 characters and 25 rows; in addition, Indian script/language characters can be used along with the English characters. 8 bit PC-ISCII and 7 bit ISCII codes can be used here.

Terminology used in GIST products - 7 Bit ISCII

7 Bit ISCII :- Uses only lower half of the 8-bit code.
The words beginning with the char 'x' are displayed in Indian script. For English words starting with an 'x', an extra 'x' should be given. Thus English and Indian script can co-exist within 7-bit code.
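
The 'x' prefix rule can be sketched as a small classifier. This is only an illustration of the rule as stated above. How the remaining 7-bit codes map to script characters is not covered here, and the function name `classify_word` is hypothetical.

```python
def classify_word(word: str) -> tuple[str, str]:
    """Return (script, payload) for one word under the 'x' prefix rule."""
    if word.startswith("xx"):
        # Doubled 'x': an English word that really begins with 'x'.
        return ("english", word[1:])
    if word.startswith("x"):
        # Single 'x': the rest of the word holds 7-bit Indian-script codes.
        return ("indian", word[1:])
    return ("english", word)


for w in ["hello", "xray", "xxray"]:
    print(w, "->", classify_word(w))
```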

PC-ISCII and 7 bit ISCII codes can be used in the Data Page of the GIST card. ANY text-oriented package can be used for Multilingual or Indian Language processing using the GIST card. The 7-bit code is needed for applications such as WS, which use a 7-bit code to represent the characters.
Only the basic alphabets needed in Indian scripts are defined in the character code set. No conjuncts are defined here. The CHARACTER SET DOES NOT DEFINE THE SHAPES of the CHARACTERS. In general, it can be said that the character set defines the "sound". All the characters in the character set are kept in the proper sequence so as to make sorting possible without modifying the sorting algorithm to define the collating sequence.

Terminology used in GIST products - 8 Bit ISCII

8 Bit ISCII :- The standard code conforming to the recommendations set by ISO.
* Contiguous character codes.
* Allows script and display attribute code mechanism.
* Can be used under the ScriptPage of the GIST card.
* Existing text-oriented packages don't use this code: Only custom designed applications such as ALP, MDP etc. can use it. A generic name is given to this char code set: ACII (Alphabetic Code for Information Interchange). Character codes are defined for non-Indian scripts.

Friday, September 30, 2005

Terminology used in GIST Products - PC ISCII

PC ISCII :- An 8-bit code to be used along with the text-oriented packages.

* The line-drawing chars are not used: most of the existing packages use these for displaying menus/ boxes.

* The character set gets divided into two parts as the line-drawing chars are not assigned any code.

* NOT as per the recommendations defined by the ISO for a char set: the char set should be contiguous.

* Editors such as NE, MS, and database packages like Dbase, FoxBASE etc. can use this code.

INSCRIPT KEYBOARD Layout

Mechanical limitations in composition on Typewriters

Scattered shapes versus Inscript layout

Data Processing and standards for Indian scripts

The advantage of Spell-checkers

Resistance to change from Typewriter

English Phonetic typing for QWERTY users

Learning Inscript Keyboard

Conclusion

Data input through the keyboard forms an important part of the computer usage. And, for using Indian languages on computers we must review the options. The two options are the layout of mechanical typewriters and the Inscript (Indian script) layout.

Mechanical limitations in composition on Typewriters:
As many are aware, Typewriters were made at a time when mechanical restrictions forced the poor design of the QWERTY keyboard for English, ensuring slower typing for all. Our Indian languages being more complex, the situation is worse, as different companies tried different keyboard layouts to manage the composition using character slices. There have been changes, and experienced typists tell us that they have changed layout at least six times. There is no standard typewriter layout for each Indian script, and it is different for various scripts.

In contrast, INSCRIPT is only one layout, and like QWERTY you can use it with any standard software that is available on computers. In school we were taught to write the consonant first and then put the matra, since our languages are phonetic. To type the short-ii sound, I need not type it before the consonant. Uniformity and ease of composition are the two strong points of the INSCRIPT keyboard. A person who follows this intuitive method will make fewer spelling mistakes than one who treats the script as a collection of character slices.

Scattered shapes versus Inscript layout:
For a learner the typewriter keyboard is a nightmare, since it requires memorizing the positions of shapes. The logic of placement is not visible. On the Inscript layout there is no need for mugging-up, as anyone can see that the vowels are on the left side and the consonants on the right. The various "vargs" and "matras" are kept together. This road-map is very intuitive and easy to learn, and therefore requires less practice in order to start typing and achieve a good speed. It is ideal for fast touch typing and good even for the infrequent user.

Data Processing and standards for Indian scripts:
New technologies bring new demands such as sorting a database. This is impossible using the typewriter keyboard layout, since sorting would give wrong results. To take an example, certain matras like "small-ii" are typed before the consonant, so instead of sorting on "Kii" the sort would be on "Ik".
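
The sorting problem can be reproduced with a small sketch. It uses Unicode code points instead of ISCII (an assumption for convenience; both keep the consonants in alphabetical sequence), and the helper `to_visual` is a hypothetical stand-in for typewriter-style storage in which the short-i matra is keyed before its consonant.

```python
KA, KHA, GA = "\u0915", "\u0916", "\u0917"
I_MATRA, AA_MATRA = "\u093f", "\u093e"

# Phonetic (typed) order: consonant first, then matra.
words = [GA + I_MATRA, KHA + AA_MATRA, KA + I_MATRA]  # gi, khaa, ki


def to_visual(word: str) -> str:
    """Mimic typewriter-style storage: move each short-i matra before its consonant."""
    chars = list(word)
    for i in range(1, len(chars)):
        if chars[i] == I_MATRA:
            chars[i - 1], chars[i] = chars[i], chars[i - 1]
    return "".join(chars)


phonetic_sorted = sorted(words)               # ki, khaa, gi (alphabetical)
visual_sorted = sorted(words, key=to_visual)  # khaa, ki, gi (matra sorts first)
print(phonetic_sorted)
print(visual_sorted)
```

With phonetic storage a plain code-order sort gives the alphabetical result, while the visual-order keys push every word carrying the matra out of place.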

In ISCII it has been easier to achieve basic data processing operations like alphabetic sorting only because of the phonetic approach. And typing is closely connected to storage as well as intuitive editing. The Bureau of Indian Standards has made ISCII and Inscript a national standard (IS 13194:1991) to be followed on Computers and other electronic media, so that multilingual data entry, storage, processing and exchange are possible and easier.

Amongst the software which allows typing through a Typewriter layout, there is a lot of inconsistency. None actually conforms to any of the typewriters. They have provided retrofits named "XYZ-like keyboard", and most users experience stress and actually have to change their habits to type on them. On certain software a configurable keyboard was provided. Since composition in Indian scripts is complex, unlike English it cannot be re-configured satisfactorily to meet specific needs. It can only be an approximation and help the users in transition.

The advantage of Spell-checkers:
There are other problems of quality of composition when typed through the typewriter layout. Tools like Spell-checkers and thesaurus can do a poor pattern matching since they cannot interpret the language constructs. Without a proper Spell-checker any Word-processor is incomplete and the power of computer is not harnessed at all. A Spell-checker provides the confidence for increased usage of Indian languages in correspondence and publishing.

Moreover the typewriter keyboard is cumbersome and does not really allow for exact reproduction of characters e.g. the quarter "Ra" in Marathi as in "VARYA" is just a hyphen added before the character. A Spell-checker can't distinguish this as "Ra". This has been one of the major reasons for poor response to Vernacular typing.

Actual reports from the Department of Official Languages say that very few people complete the typing training amongst the ones who are requested or deputed. People find it complex to compose mentally and type. It takes them longer to learn. Many people give up early and most prefer English for this reason. The Inscript Keyboard is easy to learn, and attaining a fair speed of 40-60 w.p.m. does not take much time. Also, the pressure to attain higher speed was greater in the case of Typewriters, since every small correction required one to retype the entire page. With powerful editing features on Computers, one is not affected much by speed even in English, as productivity is higher.

Based on ISCII storage, powerful Spell-checkers have been made in many languages like Hindi, Gujarati, Marathi, etc. and are being enriched further.

Resistance to change from Typewriter:
It takes some time before people learn a new thing which is more productive. The chief stumbling block is their fear of losing their speed on the Typewriter due to the expected unlearning of an earlier skill. Actual cases have shown that it is not so. Our brain is very versatile. It can co-ordinate the same hands and feet in different ways to allow us to ride a bicycle or drive a scooter or even a car, all on the same occasion by choice. We use the same hands to write with a pen or to type. Similarly, we can type on a Typewriter and can also "write" with the INSCRIPT keyboard.

The other fear is whether they will be able to master it or not. And this is genuine, since they carry an impression of great difficulty in learning vernacular typing. They feel that phonetic typing will involve mugging and extensive practice. But it is not so. We prefer to call it phonetic writing as opposed to typing. It is closer to what we learnt at a young age and still do: write. People who took months to get started on a typewriter got going on the phonetic keyboard in a few days once their worries were taken care of.

The users of typewriters pose a question repeatedly for providing their layouts in the popular multilingual solutions. They must understand these issues before asking for change in layout in software. These facts will make them aware of the reasons and help them in learning Inscript layout for the future ease on computers. It is beneficial to undergo this transition as early as possible.

English Phonetic typing for QWERTY users:
GIST has now introduced another concept, English phonetic typing, on certain products. This is useful for persons who are not familiar with the script and who wish to get started using the QWERTY layout itself. However, it is advisable to learn the Inscript keyboard itself for maximum efficiency. English Phonetic typing may not be efficient for high speed use, which is done best on the Inscript layout directly.

Learning Inscript Keyboard:
To facilitate the learning process, an Inscript Keyboard Tutor software is now available free from CDAC or with any GIST product. This is preferred since most persons do not have time to attend specialized full-time courses. And above all, once the Inscript keyboard is mastered, the same typist can type in any Indian language, whether Hindi, Marathi, Gujarati, Malayalam or Tamil, provided of course that he can read the script. Multi-lingual typists can be more versatile and hence have better job opportunities.
For the new users it is important that they find it easy to use our languages. If it is difficult on typewriter, the resistance is so strong that even the Department of Official Languages finds it difficult to make more persons attend the typing training courses. It should be as easy or even easier to use than QWERTY layout for English. They should be able to do it themselves. This is the idea behind the simple and well laid out Inscript layout which is common for all Brahmi based scripts.

Conclusion:
Without trying to under-rate the Typewriter keyboard in any way, its many disadvantages rule it out as an ideal solution for adoption on the Computer. The Inscript layout is the logical choice, one that will harness the power of computers for the benefit of our languages into the future.

Monday, September 26, 2005

Development of Indian Languages in Information Technology

After the Govt. of India was impressed by the work done by 4 Kanpur IITians in developing the GIST Technology for Indian Language computing, the Department of Electronics (DOE), Govt. of India, gave them space and funds to further develop the technology at CDAC (Centre for Development of Advanced Computing) Pune. Incidentally, CDAC has also created the Indian Super Computer called "Param" that utilises parallel processing and has been installed at several universities and organisations across the world.

Advancement of Indian Language computing

After the start of GIST Technology, with its then DOS/Unix based hardware and software, the next step was the introduction of ISM, which stands for ISFOC Script Manager, where ISFOC stands for Indian Scripts FOnt Code. This software is a font package of all the Indian Scripts for use under the Windows/Linux environment. Apart from having a collection of aesthetic fonts in all the various Indian Scripts, one can also take advantage of various word processing and editing features which support Indian Language Scripts. Presently the following Indian Language Scripts are supported: Assamese, Bengali, Devanagari (Hindi, Marathi, Sanskrit), Gujarati, Kannada, Malayalam, Oriya, Punjabi, Tamil, Telugu. Among the foreign languages the package supports Thai, Sinhalese and Nepalese. A separate pack is available for Perso-Arabic scripts: Arabic, Urdu, Sindhi and Persian.

GIST Technology for Indian Languages on computers

The acronym "GIST" stands for "Graphics and Intelligence based Script Technology". This technology was developed by 4 passionate and patriotic IITians at Kanpur in the year 1986. After the project was successful, the Govt. of India funded it for further development, which finally led to the development and productionisation of the "good old" GIST Card. The Gist Card is an add-on card which allows users to work in any Indian script of their choice for their DOS based data/text processing applications. The Gist Card is available for both ISA and PC (16bit/32Bit) slots, and can be easily installed on the motherboard of the Computer.

Then came the GIST Terminals, which provided a Multilingual or Indian Languages interface for the UNIX environment.

Both the Gist Cards and Terminals are extensively used by Nationalised Banks and National Informatics Centre for their data processing in Regional and Indian Languages.

Sunday, September 25, 2005

Indian languages on your computer

This blog will discuss the technology used for displaying the Indian languages on your computer. We shall also talk about saving the data in your favourite word processing program in your favourite language.