Placeholders in Hebrew

Hi guys,
I’m helping with the Hebrew localization of Gramps. I must say you are doing a wonderful job :slight_smile:

In Hebrew we have some requirements regarding placeholders which are a bit unique:
Instead of using a whole word to translate terms like “In”, we use a prefix letter, “ב”. When the text is completely in Hebrew this doesn’t matter, but when the prefix letter comes before a Latin (or any other script) word or a digit, we add a supporting character called Maqaf between the prefix letter and the word. It looks like this: ־ (a high hyphen).
There’s also a special case: when any prefix letter except Vav (addition) comes before a word starting with Vav, the Vav of that word should be doubled (reflected in the MediaWiki sample).

I read about the Date Handler but I think this issue is a bit wider.

How does it look?

  • Jacob died on 5th of January - יעקב נפטר ב־5 בינואר
  • Jacob buried in October - יעקב נקבר באוקטובר (note that there is no Maqaf)
  • Jacob buried in Barcelona - יעקב נקבר ב־Barcelona (in case the name of the place is not translated)
  • Jacob buried in Petach Tikva - יעקב נקבר בפתח תקווה
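
The rule behind these four examples can be sketched in a few lines of Python. This is only an illustration, not Gramps code: the names MAQAF, is_hebrew_letter and attach_prefix are hypothetical, and the sketch assumes that "Hebrew" means the letters alef to tav (U+05D0 to U+05EA).

```python
MAQAF = "\u05BE"  # Hebrew maqaf, the "high hyphen"


def is_hebrew_letter(ch):
    """Return True for the Hebrew letters alef..tav (U+05D0..U+05EA)."""
    return "\u05D0" <= ch <= "\u05EA"


def attach_prefix(prefix, word):
    """Attach a one-letter prefix, adding a maqaf before non-Hebrew text."""
    if word and not is_hebrew_letter(word[0]):
        return prefix + MAQAF + word
    return prefix + word


# The four cases from the list above: digit, Hebrew month, Latin place, Hebrew place
for word in ("5", "אוקטובר", "Barcelona", "פתח תקווה"):
    print(attach_prefix("ב", word))
```

Only the first character of the word is inspected, which is enough for the cases listed here. A value that starts with punctuation or a directional mark would need extra handling.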

This is an explanation about the prefixes:

This is an explanation about Maqaf:

This is a code example from MediaWiki:
First Comment

This is the code from MediaWiki as promised:

All these types of string can be found in the libnarrate.py file.

The locale code can be identified using self._locale.locale_code() == 'he'.

Running the detailed ancestral report in Hebrew using the example database gives results like:

%(male_name)s was born on %(birth_date)s in %(birth_place)s.
%(male_name)s לידה ב־%(birth_date)s ב־%(birth_place)s.
Lewis Anderson לידה ב־1855-06-21 ב־Great Falls, MT, USA.

My first reaction was to use regular expressions to add or remove the Maqaf character as required. Unfortunately the Python re module didn’t work as I expected. I’ll have to look into this further when I have more time.

I also couldn’t find any RTL or LTR marks in the strings, but they do react in the editor as if they are RTL.

Perhaps someone can point me in the right direction.

Hi Nick and thank you for the quick response (and sorry for my slow response).

Well, if the field is Right-to-Left in the first place, there’s no need to add LRM or RLM characters in such cases.
Regex is a very good solution in that case, but I’m not sure it is feasible.

Regarding the examples you’ve mentioned, let’s take the last part of the sentence:
ב־Great Falls, MT, USA.
In Great Falls, MT, USA.

Currently the Maqaf is part of the template, as the string in the “po” file dictates. It should be conditional, so that it is omitted when the name of the place is in Hebrew:
בחיפה
In חיפה.

Another case is doubling an initial Vav when the name of the place starts with Vav and doesn’t already begin with a double Vav. For example:
The name of the place is ורדיה
בוורדיה
In ורדיה

But, if the name of the place is וולדורף:
בוולדורף
In וולדורף
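
As a rough sketch of the Vav rule from these two examples (the function name prefix_with_vav_rule is hypothetical, and the sketch deliberately ignores the Maqaf rule so it shows only the doubling):

```python
VAV = "\u05D5"  # Hebrew letter vav


def prefix_with_vav_rule(prefix, word):
    """Attach a prefix letter, doubling an initial single vav of the word.

    The vav is left alone when the prefix itself is vav or when the word
    already starts with a double vav.
    """
    starts_with_single_vav = word.startswith(VAV) and not word.startswith(VAV * 2)
    if prefix != VAV and starts_with_single_vav:
        word = VAV + word
    return prefix + word


print(prefix_with_vav_rule("ב", "ורדיה"))    # single vav gets doubled
print(prefix_with_vav_rule("ב", "וולדורף"))  # already doubled, unchanged
```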

I was looking into reading the prefix (משהוכלב). If the prefix letter is any of these and the first character of the following word is either a Latin letter or a digit, I would add the Maqaf. The other case, with the double Vav, is very rare, but it’s not so complicated to handle once the Maqaf handling is implemented.

Thank you :slight_smile:

I have created pull request #1496: Add support for Hebrew prefixes.

It will take a narrated string like “In %(place)s” and assume it is translated with a prefix to “ב%(place)s”. Then, for Hebrew only, I have created a function to modify the substitution variable. It will double the Vav if not already doubled, prefix a Maqaf for non-Hebrew words and numbers, and remove the leading He.

At the moment I don’t detect whether the substitution variable is actually prefixed. It may not be necessary for the strings in the libnarrate module.

@avma we need to make sure we adjust the translation accordingly.
We need to remove all the Maqaf characters and remove any space between the prefix and the placeholder.
So it should look like: נולד ב(תאריך) ב(מקום).
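
A possible way to do that cleanup on a translated string is sketched below. It is only an assumption about how the search and replace could work: the helper strip_maqaf_before_placeholders is hypothetical, it treats the letters משהוכלב as the prefix set, and it only touches a prefix letter that starts a word and sits directly before a %(name)s placeholder.

```python
import re

# A prefix letter at the start of a word, optionally followed by a maqaf
# and/or spaces, immediately before a %(name)s placeholder.
_PREFIX_BEFORE_PLACEHOLDER = re.compile(
    r"(?<![\u05D0-\u05EA])([משהוכלב])\u05BE?\s*(?=%\()"
)


def strip_maqaf_before_placeholders(msgstr):
    """Remove the maqaf and any spaces between a prefix and a placeholder."""
    return _PREFIX_BEFORE_PLACEHOLDER.sub(r"\1", msgstr)


old = "%(male_name)s נולד ב\u05BE%(birth_date)s ב\u05BE%(birth_place)s."
print(strip_maqaf_before_placeholders(old))
```

The lookbehind keeps ordinary words that merely end in one of these letters (for example של followed by a placeholder) untouched.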
Thank you both :slight_smile:

You are perfectly right Yaron, there is a way to deal with calendars and holidays in different languages for different cultures, as well as what you’ve just pointed out: “it should look like: נולד ב(תאריך) ב(מקום)”. Please see Translating Gramps - Gramps.
Why don’t you go ahead and take a shot at it? This will surely make reports look much better in Hebrew!

It is not about time/date placeholders but much more general.
But since you already asked: we need this implementation for dates as well, because a date can be represented either as Month, Year or as a full date starting with a number.
In Hebrew it would look like:
נפטר ב־1 באוקטובר, 2010
נפטר באוקטובר 2020

Keeping the Maqaf in that case could be problematic, because the placeholder could begin with the Hebrew name of the month (second case) or with the day of the month as a number (first case). If these are interchangeable (use the same placeholder), we need this implementation to determine whether the Maqaf should be added or not.

@Nick-Hall @yaron
A lot of that is done with regex strings that turn date strings into verbal (text) date strings such as:
ב־1 באוקטובר 2010
and
באוקטובר 2020
or
אוק 2020
…

This, however, will not handle the choice between “נפטר ב־” and “נפטר ב”, which depends on the character set of the place name. On top of that we have the gender form issue as well (נקבר, נקברה, … נפטר, נפטרה).

I’d rather have one bird in the hand and deal with the date issue first, which in my completely unprofessional opinion is within reach, while turning Gramps into complete bi-di support might be just a bit more challenging.
I’ve seen a few regex manipulations in the _date_xx.py date handlers (ru, hu, hr, ca…), so maybe it is possible to do something like:

import re
from datetime import datetime

# Each input regex is paired with the strptime format that parses its matches
date_formats = [
    (r'\d{4}/\d{2}/\d{2}', '%Y/%m/%d'),
    (r'\d{2}\.\d{2}\.\d{4}', '%d.%m.%Y'),
    (r'\d{2}-\d{2}-\d{4}', '%m-%d-%Y'),
]

# Map Gregorian month numbers to the Hebrew names of the Gregorian months
hebrew_months = {
    1: "ינואר", 2: "פברואר", 3: "מרץ", 4: "אפריל", 5: "מאי", 6: "יוני",
    7: "יולי", 8: "אוגוסט", 9: "ספטמבר", 10: "אוקטובר", 11: "נובמבר", 12: "דצמבר"
}

# Convert a date to the verbal format in Hebrew, e.g. "ב־1 באוקטובר 2010"
def convert_to_hebrew_verbal(date):
    year, month, day = date.year, date.month, date.day
    hebrew_month = hebrew_months[month]
    verbal_date = f"ב־{day} ב{hebrew_month} {year}"
    return verbal_date

# Define your input text
input_text = "Your input text containing dates like 1955/12/23, 23.12.1955, 12-23-1955 and so on."

# Replace every date pattern found in the input text with its Hebrew verbal form
for regex_pattern, strptime_format in date_formats:
    for match in list(re.finditer(regex_pattern, input_text)):
        matched_date = match.group(0)
        date_obj = datetime.strptime(matched_date, strptime_format)
        hebrew_verbal_date = convert_to_hebrew_verbal(date_obj)
        input_text = input_text.replace(matched_date, hebrew_verbal_date)

print(input_text)

This can then probably be refined further.

Let’s discuss the RTL work in the relevant thread, unless you think we should open another one to discuss RTL in general and not only for the Narrative Web Report.
This Maqaf handler should be an easy fix; RTL is a long-term mission and requires a lot of QA from our end.

I posted an update on my progress in the RTL thread. It seems possible, yet requires some more work.
Regarding the desktop app - we need to see how we can load the UI using GNOME Builder/Anjuta/whatever they’re calling it these days, and see whether forcing RTL is efficient or requires manual work, and how much of it.
We’ll continue over there.

Yes, it’s a good idea, I think we should. There is quite a lot of RTL-related stuff I neglected to raise fix requests for; it would be nice to see it all in the same place from now on…

Regarding your fix: it’s a good one, but it should be more general and cover both places and dates. I thought about adding a function that detects the Maqaf in the translation, removes it where unnecessary and adds it where needed. This way, whether the translation includes or omits the Maqaf becomes irrelevant, and the text will read just fine in all cases.
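
One way such a function could look, working on the already formatted text rather than on the substitution variable, is sketched below. Everything in it is an assumption for illustration: normalize_maqaf is a hypothetical name, the prefix set is taken to be משהוכלב, and a prefix is recognised only when it is the first letter of a word.

```python
import re

HEB = r"\u05D0-\u05EA"   # Hebrew letters alef..tav, as a regex range
PREFIXES = "משהוכלב"     # one-letter prefixes
MAQAF = "\u05BE"

# Word-initial prefix + maqaf + Hebrew letter: the maqaf is not needed
_extra = re.compile(rf"(?<![{HEB}{MAQAF}])([{PREFIXES}]){MAQAF}(?=[{HEB}])")
# Word-initial prefix directly followed by a Latin letter or digit: maqaf missing
_missing = re.compile(rf"(?<![{HEB}{MAQAF}])([{PREFIXES}])(?=[A-Za-z0-9])")


def normalize_maqaf(text):
    """Drop unnecessary maqafs after prefixes and add the missing ones."""
    text = _extra.sub(r"\1", text)
    return _missing.sub(rf"\1{MAQAF}", text)


print(normalize_maqaf("נפטר ב\u05BEאוקטובר 2020"))
print(normalize_maqaf("נקבר בBarcelona"))
```

A maqaf inside a compound such as תל־אביב is left alone, because the letter before it does not start the word. Post-processing like this cannot tell a real prefix from a one-letter word, so it would need testing against real report output.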

Specifying the Hebrew dates by their Hebrew names is only part of the solution. BTW, is there Hebrew calendar support? Should we add that as part of the RTL thread, or open another thread about all the possible calendars?

Let me know of any RTL issues in the GUI. I’ll try to fix them.

Our dates support the Hebrew calendar. However, the calendar reports and the “On this day” report probably only support the Gregorian calendar at the moment.

Are we ready to merge PR #1496 - Add support for Hebrew prefixes?

Will you need any help with the new translated strings? I could do a search and replace on the po file.

Sure, let’s have @yaron take a look at it too…
Thanks for offering. I think I’ll manage with the translation part, but I do need help with the Hebrew date handler. I’ve got all the pieces working separately, but I’m having some problems putting it all together.

What issues are you having with the date handler?

@Nick-Hall
The current “as is” situation is that both date quality and date type are not functioning. If the date is not a “normal” date, it cannot be handled while running in Hebrew.

As I mentioned before, I copied one of the recent (new format) handlers, so now I can get one or the other, type (“about”) or quality (“anything other than normal”), but not both. Span and range are broken too.

@avma You will need to write a Hebrew date handler because we don’t have one at the moment. The easiest way to do this is to copy a simple existing one like Spanish.

So you would copy gramps/gen/datehandler/_date_es.py to _date_he.py. Then rename DateParserES to DateParserHE and DateDisplayES to DateDisplayHE.

Next change the registration at the bottom of the file to:

register_datehandler(
    ("he_IL", "he", "hebrew", "Hebrew", ("%d/%m/%Y",)), DateParserHE, DateDisplayHE
)

A line will also need to be added to gramps/gen/datehandler/__init__.py:

from . import _date_he

Then it is just a matter of translation. Update modifier_to_int, calendar_to_int and quality_to_int. Keep at least one entry for each type.

The lists _span_1, _span_2, _range_1 and _range_2 are used to detect “from … to …” and “between … and …”.
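
To show how these lists end up in the span and range patterns, here is a standalone sketch that can be run outside Gramps. The Hebrew keywords are assumptions chosen for illustration, and the module-level names span and date_range are hypothetical (in a handler they would be self._span and self._range).

```python
import re

# Assumed Hebrew keywords, for illustration only
_span_1 = ["מ"]      # "from"
_span_2 = ["עד"]     # "to"
_range_1 = ["בין"]   # "between"
_range_2 = ["לבין"]  # "and"

span = re.compile(
    r"(%s)\s+(?P<start>.+)\s+(%s)\s+(?P<stop>.+)"
    % ("|".join(_span_1), "|".join(_span_2)),
    re.IGNORECASE,
)
date_range = re.compile(
    r"(%s)\s+(?P<start>.+)\s+(%s)\s+(?P<stop>.+)"
    % ("|".join(_range_1), "|".join(_range_2)),
    re.IGNORECASE,
)

match = span.match("מ 1900 עד 1910")
print(match.group("start"), match.group("stop"))
match = date_range.match("בין 1900 לבין 1910")
print(match.group("start"), match.group("stop"))
```

Both patterns expect whitespace after the first keyword, so a keyword written together with the date (for example with a Maqaf and no space) would need its own entry or a different pattern.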

Then there is the date displayer class. It looks fairly straightforward to translate.

Ask if you have any questions.

Thanks @Nick-Hall, basically I did exactly that: I copied _date_ca.py and worked from there. But even if I strip it down to the bare minimum, it does what I’ve described in the previous post.

Let me throw the code in here (I will dump it in later) so you can have a look:

# -------------------------------------------------------------------------
#
# Python modules
#
# -------------------------------------------------------------------------
import re

# -------------------------------------------------------------------------
#
# Gramps modules
#
# -------------------------------------------------------------------------
from ..lib.date import Date
from ._dateparser import DateParser
from ._datedisplay import DateDisplay
from ._datehandler import register_datehandler


# -------------------------------------------------------------------------
#
# Hebrew parser
#
#
# -------------------------------------------------------------------------

class DateParserHE(DateParser):
    calendar_to_int = {
        "גרגוריאני": Date.CAL_GREGORIAN,
        "אזרחי": Date.CAL_GREGORIAN,
        "יוליאני": Date.CAL_JULIAN,
        "י": Date.CAL_JULIAN,
        "עברי": Date.CAL_HEBREW,
        "ע": Date.CAL_HEBREW,
        "מוסלמי": Date.CAL_ISLAMIC,
        "מ": Date.CAL_ISLAMIC,
        "המהפכה הצרפתית": Date.CAL_FRENCH,
        "צ": Date.CAL_FRENCH,
        "פרסי": Date.CAL_PERSIAN,
        "פ": Date.CAL_PERSIAN,
        "שוודי": Date.CAL_SWEDISH,
        "ש": Date.CAL_SWEDISH,
    }

    modifier_to_int = {
        "לפני": Date.MOD_BEFORE,
        "לפני ה־": Date.MOD_BEFORE,
        "לפ.": Date.MOD_BEFORE,
        "אחרי": Date.MOD_AFTER,
        "אחרי ה־": Date.MOD_AFTER,
        "אח.": Date.MOD_AFTER,
        "בסביבות": Date.MOD_ABOUT,
        "סביב": Date.MOD_ABOUT,
        "בערך ב־": Date.MOD_ABOUT,
        "בערך בשנת": Date.MOD_ABOUT,
        "בקירוב": Date.MOD_ABOUT,
        "מיום": Date.MOD_FROM,
        "מה־": Date.MOD_FROM,
        "מ־": Date.MOD_FROM,
        "מ": Date.MOD_FROM,
        "מ ": Date.MOD_FROM,
        "עד": Date.MOD_TO,
        "עד יום": Date.MOD_TO,
        "עד ה־": Date.MOD_TO,
        "ועד יום": Date.MOD_TO,
    }

    quality_to_int = {
        "מוערך": Date.QUAL_ESTIMATED,
        "מחושב": Date.QUAL_CALCULATED,
    }

    bce = [
        "לפני הספירה",
        "לפני עידן זה",
        "לפנה\"ס",
        "לפני ספירת הנוצרים",
        "לספירתם",
    ] + DateParser.bce

    def init_strings(self):
        """
        Compile regular expression strings for matching dates.
        """
        DateParser.init_strings(self)
        # Numeric and text dates are left to the patterns that
        # DateParser.init_strings compiles; only span and range are localized.
        _span_1 = ["מ־", "מ"]
        _span_2 = ["עד"]
        _range_1 = ["בין"]
        _range_2 = ["לבין"]
        self._span = re.compile(
            r"(%s)\s+(?P<start>.+)\s+(%s)\s+(?P<stop>.+)"
            % ("|".join(_span_1), "|".join(_span_2)),
            re.IGNORECASE,
        )
        self._range = re.compile(
            r"(%s)\s+(?P<start>.+)\s+(%s)\s+(?P<stop>.+)"
            % ("|".join(_range_1), "|".join(_range_2)),
            re.IGNORECASE,
        )


# -------------------------------------------------------------------------
#
# Hebrew display
#
# -------------------------------------------------------------------------
class DateDisplayHE(DateDisplay):
    """
    Hebrew language date display class.
    """

    _bce_str = "%s לספירה"

    long_months = (
        "",
        "ינואר",
        "פברואר",
        "מרץ",
        "אפריל",
        "מאי",
        "יוני",
        "יולי",
        "אוגוסט",
        "ספטמבר",
        "אוקטובר",
        "נובמבר",
        "דצמבר",
    )

    short_months = (
        "",
        "ינו",
        "פבר",
        "מרץ",
        "אפר",
        "מאי",
        "יונ",
        "יול",
        "אוג",
        "ספט",
        "אוק",
        "נוב",
        "דצמ",
    )

    hebrew = (
        "",
        "תשרי",
        "חשוון",
        "כסלו",
        "טבת",
        "שבט",
        "אדר",
        "אדר א'",
        "ניסן",
        "אייר",
        "סיוון",
        "תמוז",
        "אב",
        "אלול",
    )

    # this must agree with DateDisplayEn's "formats" definition
    # (since no locale-specific _display_gregorian exists, here)
    formats = (
        "YYYY-MM-DD (ISO)",
        "סיפרתי",
        "חודש יום, שנה",
        "חודש יום, שנה",
        "יום חודש, שנה",
        "יום חודש, שנה",
    )

    def display(self, date):
        """
        Return a text string representing the date.
        """
        mod = date.get_modifier()
        cal = date.get_calendar()
        qual = date.get_quality()
        start = date.get_start_date()
        newyear = date.get_new_year()

        qual_str = self._qual_str[qual]

        if mod == Date.MOD_TEXTONLY:
            return date.get_text()
        elif start == Date.EMPTY:
            return ""
        elif mod == Date.MOD_SPAN:
            d1 = self.display_cal[cal](start)
            d2 = self.display_cal[cal](date.get_stop_date())
            scal = self.format_extras(cal, newyear)
            return "%s%s %s %s %s%s" % (qual_str, "מ", d1, "עד", d2, scal)
        elif mod == Date.MOD_RANGE:
            d1 = self.display_cal[cal](start)
            d2 = self.display_cal[cal](date.get_stop_date())
            scal = self.format_extras(cal, newyear)
            return "%s%s %s %s %s%s" % (qual_str, "בין", d1, "לבין", d2, scal)
        else:
            text = self.display_cal[date.get_calendar()](start)
            scal = self.format_extras(cal, newyear)
            return "%s%s%s%s" % (qual_str, self._mod_str[mod], text, scal)

# -------------------------------------------------------------------------
#
# Register classes
#
# -------------------------------------------------------------------------
register_datehandler(
    ("he_IL", "he", "Hebrew", "Ivrit", "עברית", ("%d-%m-%Y",)),
    DateParserHE,
    DateDisplayHE,
)

There might be stuff in there which is no longer required, since it is now taken care of in the Gramps .po file.