Translations/gettext - make context the norm?

I reviewed one of the .PO translations recently. It turned out the only way to understand it properly was to grep through the code and see how some terms are used. Most one-word terms are ambiguous.
In some cases, a word is used as an action in one place and as a noun in another - e.g. “Report”. While this English word can be both a noun and a verb, many other languages have distinct words for these.

Also, translations of menu items and button text should be handled differently, but the only way to check what they are is to look at the source.

So the issue I’m seeing with .PO/.POT translations is that the strings lack context. And there is a feature to fix that - gettext does support passing context, and a small percentage of strings already use it.

What I’d propose is to make context the norm, rather than an exception for special cases - mark all menu strings and button strings, as well as all one-word strings.

What do you think?

This would be a reasonable rule for new strings, particularly short strings without both subject and verb.

Attempting to go through the current code to add context would be a very large job in itself. I suppose it could be done by a very motivated person.

But worse, it would invalidate translations for the affected strings, adding another very large amount of work for all the translators.

I’m sure that this would result in a much improved experience for non-English speaking users, but I’m not at all sure we could get such a project done in a reasonable time with the current number of volunteer developers and translators.

Should a first pass be to consolidate the redundancies where possible?

Each variant spelling or eliminated homonym saves 40-some people work if a new approach requires strings to be re-visited.

And maybe it reduces the overhead too?

Is there a way to harmonize this stuff efficiently?

And is there a tool for developers to run their proposed strings against the existing database for similar (NOT exact matching) strings? It makes their choices easier if they can see “n instances of the ‘share’ tooltip in m modules” versus “x instances of the ‘from existing <object type>’ tooltip in y modules”.
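I don't know of a ready-made tool, but a rough sketch of such a check is easy with Python's standard difflib. The function name similar_strings, the cutoff value and the sample list are assumptions for illustration; a real version would read the msgids from the POT file.

```python
import difflib


def similar_strings(candidate, existing, cutoff=0.6, limit=5):
    """Return existing strings that resemble candidate, best match first.

    Exact matches are dropped, since the point is to spot near-duplicates.
    """
    pool = [s for s in existing if s != candidate]
    return difflib.get_close_matches(candidate, pool, n=limit, cutoff=cutoff)


# Hypothetical sample of already-translated strings.
existing = [
    "From existing source",
    "From an existing source",
    "Share",
    "Shared",
    "Report",
]

print(similar_strings("Share", existing))
print(similar_strings("From existing sources", existing))
```

A developer could run a proposed string through this before committing it and reuse an existing wording where one is close enough.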

If all strings have context, this increases the total number of strings to be translated - strings which are identical in English but have multiple contexts appear multiple times.

So this doesn’t aim to decrease the number of strings, but the amount of time you need to spend on each string to translate it correctly. When I reviewed a recent commit with a Polish translation, I had to search for many phrases in the code to be sure what they are (and btw many were incorrect, as most translators just go with what they have on the screen without deep verification). Providing context would give more information to the translators.

Introducing context first requires agreeing on what context strings should look like. Currently they are sometimes treated as a kind of keyword, e.g. _("Link", "notetype") or _('Select_a_media_selector', 'manual'), and sometimes as a verbose text description, e.g. _("%", "percent sign or text string") or _('Min. Conf.', 'Citation: Minimum Confidence').

Automating translation - I don’t think it is related to this subject. This is a question for translation helper software, e.g. Weblate (which, as the latest commits reveal, is used by Gramps) or Poedit.

Example shell command to print translations with context in the current code:

find ./gramps -type f | xargs sed -n 's/^.*\(_[\(]['\''"][^,\)]\+['\''"],[^,\)]\+[\)]\).*$/\1/p' | sort -u
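
A rough Python alternative to the sed one-liner, in case the regex misses calls (for example a message containing a comma, or a call split over several lines). This is only a sketch: the function name calls_with_context is made up, and it only looks at bare `_(...)` calls with two string literals, not at variants such as `self._(...)`.

```python
import ast


def calls_with_context(source):
    """Return sorted (message, context) pairs for every _("msg", "ctx") call."""
    pairs = set()
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "_"
            and len(node.args) == 2
            and all(
                isinstance(arg, ast.Constant) and isinstance(arg.value, str)
                for arg in node.args
            )
        ):
            pairs.add((node.args[0].value, node.args[1].value))
    return sorted(pairs)


# Small inline sample; in practice you would feed it each .py file's text.
SAMPLE = '''
labels = [
    _("Link", "notetype"),
    _("Report"),
    _("%", "percent sign or text string"),
]
'''

for message, context in calls_with_context(SAMPLE):
    print(f"{message!r} [{context}]")
```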

Not a problem. The POT file can be generated from the code with some more contexts introduced; then, when the POs are updated to the recent POT, the related translations will just be marked as ‘fuzzy’. They will still work.

And if somehow that doesn’t work, it can be easily fixed with a few simple shell commands.

Here is an example for the decision about what to use as context. I took the word “Report” I talked about before. I fix it below in two ways - with a short context, and with a longer, more structured one.

Fix 1. Short/minimalistic context, generating two messages. It doesn’t tell the translators where the string is used; it only resolves the ambiguity of the word:

diff --git a/gramps/gen/lib/notetype.py b/gramps/gen/lib/notetype.py
index 7cef15989..b7d0092cc 100644
--- a/gramps/gen/lib/notetype.py
+++ b/gramps/gen/lib/notetype.py
@@ -73,4 +73,4 @@ class NoteType(GrampsType):
         (CITATION, _('Citation'), "Citation"),
-        (REPORT_TEXT, _("Report"), "Report"),
+        (REPORT_TEXT, _("Report", "as noun"), "Report"),
         (HTML_CODE, _("Html code"), "Html code"),
         (TODO, _("To Do", "notetype"), "To Do"),
diff --git a/gramps/gen/plug/_pluginreg.py b/gramps/gen/plug/_pluginreg.py
index 10c08db4d..bfa1514a8 100644
--- a/gramps/gen/plug/_pluginreg.py
+++ b/gramps/gen/plug/_pluginreg.py
@@ -76,7 +76,7 @@ RULE = 13
 PTYPE = [REPORT, QUICKREPORT, TOOL, IMPORT, EXPORT, DOCGEN, GENERAL,
          MAPSERVICE, VIEW, RELCALC, GRAMPLET, SIDEBAR, DATABASE, RULE]
 PTYPE_STR = {
-        REPORT: _('Report') ,
+        REPORT: _('Report', 'as noun') ,
         QUICKREPORT: _('Quickreport'),
         TOOL: _('Tool'),
         IMPORT: _('Importer'),
diff --git a/gramps/gui/logger/_errorview.py b/gramps/gui/logger/_errorview.py
index 210a7762a..f7dcbd810 100644
--- a/gramps/gui/logger/_errorview.py
+++ b/gramps/gui/logger/_errorview.py
@@ -164,3 +164,3 @@ class ErrorView(ManagedWindow):
         self.top.add_button(_('_Cancel'), Gtk.ResponseType.CANCEL)
-        self.top.add_button(_("Report"), Gtk.ResponseType.YES)
+        self.top.add_button(_('Report', 'as verb'), Gtk.ResponseType.YES)
         self.top.add_button(_('_Help'), Gtk.ResponseType.HELP)

Fix 2. Structured context, generating three messages. The structure of the context is “Category: specifics” - it provides the translator with full information about where the string will appear. Categories could be e.g. Title, Menu, Button, Enumeration.

diff --git a/gramps/gen/lib/notetype.py b/gramps/gen/lib/notetype.py
index 7cef15989..b7d0092cc 100644
--- a/gramps/gen/lib/notetype.py
+++ b/gramps/gen/lib/notetype.py
@@ -73,4 +73,4 @@ class NoteType(GrampsType):
         (CITATION, _('Citation'), "Citation"),
-        (REPORT_TEXT, _("Report"), "Report"),
+        (REPORT_TEXT, _("Report", "Enumeration: type of note"), "Report"),
         (HTML_CODE, _("Html code"), "Html code"),
         (TODO, _("To Do", "notetype"), "To Do"),
diff --git a/gramps/gen/plug/_pluginreg.py b/gramps/gen/plug/_pluginreg.py
index 10c08db4d..bfa1514a8 100644
--- a/gramps/gen/plug/_pluginreg.py
+++ b/gramps/gen/plug/_pluginreg.py
@@ -76,7 +76,7 @@ RULE = 13
 PTYPE = [REPORT, QUICKREPORT, TOOL, IMPORT, EXPORT, DOCGEN, GENERAL,
          MAPSERVICE, VIEW, RELCALC, GRAMPLET, SIDEBAR, DATABASE, RULE]
 PTYPE_STR = {
-        REPORT: _('Report') ,
+        REPORT: _('Report', 'Enumeration: type of plugin') ,
         QUICKREPORT: _('Quickreport'),
         TOOL: _('Tool'),
         IMPORT: _('Importer'),
diff --git a/gramps/gui/logger/_errorview.py b/gramps/gui/logger/_errorview.py
index 210a7762a..f7dcbd810 100644
--- a/gramps/gui/logger/_errorview.py
+++ b/gramps/gui/logger/_errorview.py
@@ -164,3 +164,3 @@ class ErrorView(ManagedWindow):
         self.top.add_button(_('_Cancel'), Gtk.ResponseType.CANCEL)
-        self.top.add_button(_("Report"), Gtk.ResponseType.YES)
+        self.top.add_button(_('Report', 'Button: send a report'), Gtk.ResponseType.YES)
         self.top.add_button(_('_Help'), Gtk.ResponseType.HELP)
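
If the “Category: specifics” structure from Fix 2 were adopted, it could be checked automatically. Below is a minimal sketch; the category list is just the four examples named above, and the helper name is_structured_context is hypothetical.

```python
import re

# Assumption: the allowed categories are only the examples from this post.
CATEGORIES = ("Title", "Menu", "Button", "Enumeration")

# "Category: specifics" - a known category, a colon, a space, then some text.
CONTEXT_RE = re.compile(r"^(?:%s): \S.*$" % "|".join(CATEGORIES))


def is_structured_context(context):
    """Return True if context follows the 'Category: specifics' convention."""
    return CONTEXT_RE.match(context) is not None


for ctx in ("Button: send a report", "Enumeration: type of note",
            "notetype", "as noun"):
    print(ctx, "->", is_structured_context(ctx))
```

Something like this could run as part of string extraction, so that keyword-style contexts get flagged before they reach translators.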

The idea in my head evolved a bit - I no longer think all strings should have context. The long messages, being multi-line / multi-sentence, are always used as either labels (description area within some window), messages or tooltips. Distinguishing between these types doesn’t influence translation. Also, being long, these messages provide enough context by themselves. So I think only single-line / single-sentence messages should have contexts.
