Someone has volunteered to work on the UK dialect Weblate translation for Gramps, which is currently at 9%.
Gramps is nominally defined as being natively American. However, there are some British strings already in the master code. It would be fine if everything was one dialect or the other. But Gramps is inconsistent in its dialect. (For example, the About Gramps button to show the GPL uses the British “Licence” in 5.1.0 where Americans would expect “license”. By 5.1.5, that was changed to “License”. And there are instances of “colour” and the z/s differences.)
Will Americanizing the anglicisms in the master strings, so that they are uniformly US English, create new translation work for the more than 40 translators of other languages? If so, is there a way to eliminate that 40x+ extra work?
(In a completely separate vein, the Windows AIO installer has the UK dictionary as mandatory. Yet it does NOT install the US dictionary on a US English OS by default. See bug 11060. This is on a system where About Gramps reports: “LANG: en_US.UTF-8”. Are the Mac & Linux installers similarly affected? Or do none of the installers adapt the dictionary selection to the OS language?)
Note: the 9% may simply be the share of strings that differ between American and British, rather than 91% being untranslated…
Keep in mind that American is (generally) the fallback language.
If strings are fuzzy or untranslated, then Gramps will display the hardcoded message ID in US (or UK…) English. So:
msgstr = msgid
If there is no variation between British and American, leaving the ‘msgstr’ field empty will do the job and may avoid extra work or typos.
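The fallback rule can be sketched in a few lines of Python. This only illustrates the behaviour described above and is not the actual gettext or Gramps code; the `lookup` function, the `catalog` dictionary and the `fuzzy` set are hypothetical names chosen for the example.

```python
def lookup(catalog, msgid, fuzzy=frozenset()):
    """Return the translation of msgid, or msgid itself as the fallback.

    catalog maps msgid -> msgstr; an empty msgstr means "untranslated".
    fuzzy is the set of msgids whose translation is flagged as fuzzy.
    """
    msgstr = catalog.get(msgid, "")
    if not msgstr or msgid in fuzzy:
        # Untranslated or fuzzy: show the hardcoded message ID.
        return msgid
    return msgstr


# Hypothetical en_GB catalog: only strings that differ need a msgstr.
en_gb = {"Color": "Colour", "Family": ""}

print(lookup(en_gb, "Color"))   # Colour
print(lookup(en_gb, "Family"))  # Family
```

With this rule, an en_GB file only has to carry the entries where the two dialects actually differ.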
You can quickly test massive changes on the translation files without generating a new template and merging the modified entries into every file. For hardcoded typos in the code (Python, Glade files, etc.), you will have to fix them properly later, to avoid any overwrite after the next ‘gramps.pot’ regeneration. That is, as a quick workaround and temporary change on the translation files:
$ sed -Ei 's/5\.1\.0/5.1.5/g' *.po
This should replace every occurrence of the string ‘5.1.0’ with ‘5.1.5’ in all translation files.
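The same quick test can be sketched in Python with a literal (non-regex) replacement. In a regular expression an unescaped dot matches any character, so a pattern written as 5.1.0 would also match a sequence such as 5x1y0; `str.replace` avoids that. The function names below are hypothetical, and the sketch assumes UTF-8 encoded .po files sitting in a single directory.

```python
from pathlib import Path


def replace_literal(text, old, new):
    """Replace every literal occurrence of old with new (no regex involved)."""
    return text.replace(old, new)


def rewrite_po_files(directory, old, new):
    """Apply replace_literal() to each *.po file in directory.

    Returns the names of the files that were actually modified.
    Assumption: the files are UTF-8 encoded.
    """
    changed = []
    for path in sorted(Path(directory).glob("*.po")):
        original = path.read_text(encoding="utf-8")
        updated = replace_literal(original, old, new)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(path.name)
    return changed
```

For example, `rewrite_po_files("po", "5.1.0", "5.1.5")` would touch only the files that contain the literal string; running it on a copy of the directory first is the safer option.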
If you modified the “msgid” too, then just update the translation files once more by merging them with the Gramps template (gramps.pot), which should be ‘safe’. Of course, the code needs to be fixed first (consistent wording for “licence/license”).
msgattrib - attribute matching and manipulation on message catalog
msgcat - combines several message catalogs
msgcmp - compare message catalog and template
msgcomm - match two message catalogs
msgconv - character set conversion for message catalog
msgen - create English message catalog
msgexec - process translations of message catalog
msgfilter - edit translations of message catalog
msgfmt - compile message catalog to binary format (.po->.mo)
msggrep - pattern matching on message catalog
msginit - initialize a message catalog
msgmerge - merge message catalog and template
msgunfmt - uncompile message catalog from binary format
msguniq - unify duplicate translations in message catalog
For example, the About Gramps button to show the GPL uses the British “Licence” in 5.1.0 where Americans would expect “license”. By 5.1.5, that was changed to “License”.
Oh, I need to look at “Licence” vs “License”…
Sorry!
In French (France, Belgium, Switzerland, etc.), “Licence” will be translated as “Licence”!
Does Canadian English use “License” instead?
If you can write a rigorous specification of the changes between UK and US (e.g. color vs. colour), I have a macro-generator I can run against the various .po files to “normalise” (UK spelling) the msgid. That way the translations need not be reviewed, provided, of course, that the changes in the .pot are limited to US-ifying it, without additional messages or other significant changes.
The specification is needed to write the macros.
EDIT: as an afterthought, the macro-generator can be run against all .py files (and perhaps also against .glade ones) to automatically normalise/normalize the _(…) strings to US spelling. This would be nearly the same job as in the .po files. All that is needed is a bash script wrapper to process all the .py files. Not very hard.
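To make the idea concrete, here is a minimal Python sketch of such a normalisation applied to the msgid lines of a .po file. It is not the macro-generator mentioned above, only an illustration with plain regular expressions. The `UK_TO_US` word list is a hypothetical, deliberately tiny stand-in for the specification that still has to be written, and the sketch only handles single-line `msgid` entries (not `msgid_plural` or continuation lines).

```python
import re

# Hypothetical, deliberately tiny UK -> US word list; the real
# specification would have to be agreed on and be far more complete.
UK_TO_US = {"colour": "color", "licence": "license", "normalise": "normalize"}

# Whole words only, case-insensitive. Plurals and derived forms
# ("colours", "normalised") would need their own entries.
_WORD = re.compile(
    r"\b(" + "|".join(map(re.escape, UK_TO_US)) + r")\b", re.IGNORECASE
)


def _match_case(source, target):
    """Give target the same capitalisation style as source."""
    if source.isupper():
        return target.upper()
    if source[0].isupper():
        return target.capitalize()
    return target


def americanize(text):
    """Replace the listed UK spellings in text with their US forms."""
    return _WORD.sub(
        lambda m: _match_case(m.group(0), UK_TO_US[m.group(0).lower()]), text
    )


def americanize_msgids(po_text):
    """Apply americanize() to single-line msgid entries; leave msgstr alone."""
    out = []
    for line in po_text.splitlines(keepends=True):
        if line.startswith("msgid "):
            line = "msgid " + americanize(line[len("msgid "):])
        out.append(line)
    return "".join(out)
```

Because `americanize()` works on any string, the same function could in principle be pointed at the `_(…)` strings in the .py files, so that the code and the .po files are changed by one and the same rule set.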
That sounds great. There are opportunities to normalize some variants and reduce the size of the PO files. (There are a few typos, capitalization & grammar errors that could be cleaned too.)
I’m about 15% through the Weblate list of strings.
I have already done this kind of capitalization on a 19th-century Norwegian novel converted to Bokmål Norwegian, with simultaneous spelling changes to match current dictionaries. There was also punctuation normalisation to fix OCR misinterpretations. And it is very fast: 4 seconds on my CPU for a ~800k chars file. The macros can be adapted to Gramps, provided you give a strict specification (in English; don’t try to design macros; I use non-standard principles to get context-sensitive triggering, though the patterns are quite similar to regular expressions).
I don’t think the normalisation can systematically reduce the size of the PO files. Having made my own translation (because I disagree with some translators’ decisions), I noticed that the gettext framework can lead to erroneous or ambiguous “categorisation” because of insufficient context data. The same word may need different translations in different contexts, and this context data is not always provided because coders could not foresee the need for it. Also, some strings are shared between the CLI and the GUI parts. The CLI will usually be rendered with monospaced fonts while fonts in the GUI are proportional. This becomes important when messages need to be formatted in columns where alignment is done with spaces. I didn’t find any satisfactory solution to this dilemma.
One of the most embarrassing problems is message collection before translation. For a reason unknown to me, messages are not collected in the same order from one run to the next, even when there are no new files, only some changes here and there. This rules out using diff or other comparison utilities to quickly see what changed. Therefore, everything must be re-read, which is not user-friendly when you have more than 8000 messages.
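One possible way around the ordering problem is to compare the two templates as sets of msgids instead of line by line. The Python sketch below is only an illustration under simplifying assumptions: the function names are hypothetical, `msgctxt` and `msgid_plural` are ignored, and escape sequences inside the strings are compared as written rather than decoded.

```python
def collect_msgids(pot_text):
    """Return the set of msgids found in a .pot/.po text.

    Continuation lines ("..." on their own line) are joined to the msgid.
    The empty msgid of the header entry is skipped.
    """
    msgids = set()
    current = None  # quoted pieces of the msgid being read, or None

    def flush():
        if current is not None:
            # Strip the surrounding double quotes of each piece and join.
            msgid = "".join(piece[1:-1] for piece in current)
            if msgid:
                msgids.add(msgid)

    for raw in pot_text.splitlines():
        line = raw.strip()
        if line.startswith("msgid "):
            flush()
            current = [line[len("msgid "):]]
        elif current is not None and line.startswith('"'):
            current.append(line)
        else:
            flush()
            current = None
    flush()
    return msgids


def compare_templates(old_text, new_text):
    """Return (added, removed) msgids as sorted lists, ignoring entry order."""
    old, new = collect_msgids(old_text), collect_msgids(new_text)
    return sorted(new - old), sorted(old - new)
```

Given the old and new gramps.pot contents as strings, `compare_templates(old, new)` lists only the messages that appeared or disappeared, however the entries were reordered during collection.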