Placeholders in Hebrew

yaron · July 30, 2023, 9:51am

Hi guys,
I’m helping with the Hebrew localization of Gramps, and I must say you are doing a wonderful job.

In Hebrew we have some requirements regarding placeholders which are a bit unique:
Instead of using a whole word to translate terms like “In”, we use a prefix letter, “ב”. When the text is completely in Hebrew this doesn’t matter, but when the prefix letter comes before a Latin (or any other script) word or a digit, we add a supporting character between the prefix letter and the word. It is called Maqaf and looks like this: ־ (a high hyphen).
There’s also a special case: when any prefix letter except Vav (the “and” prefix) comes before a word starting with Vav, the Vav of that word should be doubled (this is reflected in the MediaWiki sample).

I read about the Date Handler but I think this issue is a bit wider.

How does it look?

Jacob died on 5th of January - יעקב נפטר ב־5 בינואר
Jacob was buried in October - יעקב נקבר באוקטובר (notice there is no Maqaf)
Jacob buried in Barcelona - ‫‏יעקב נקבר ב־Barcelona‏‬ (in case the name of the place is not translated)
Jacob buried in Petach Tikva - יעקב נקבר בפתח תקווה

This is an explanation about the prefixes:

This is an explanation about Maqaf:

This is a code example from MediaWiki:
First Comment

yaron · July 30, 2023, 9:52am

This is the code from MediaWiki as promised:

github.com

wikimedia/mediawiki/blob/2a0de02aabec6adc00606acce7372b38956e08d1/resources/lib/jquery.i18n/src/languages/he.js

/**
 * Hebrew (עברית) language functions
 */
( function ( $ ) {
	'use strict';

	$.i18n.languages.he = $.extend( {}, $.i18n.languages[ 'default' ], {
		convertGrammar: function ( word, form ) {
			switch ( form ) {
				case 'prefixed':
				case 'תחילית': // the same word in Hebrew
					// Duplicate prefixed "Waw", but only if it's not already double
					if ( word.slice( 0, 1 ) === 'ו' && word.slice( 0, 2 ) !== 'וו' ) {
						word = 'ו' + word;
					}

					// Remove the "He" if prefixed
					if ( word.slice( 0, 1 ) === 'ה' ) {
						word = word.slice( 1 );
					}

This file has been truncated.

Nick-Hall · July 30, 2023, 3:42pm

All these types of string can be found in the libnarrate.py file.

The locale code can be identified using self._locale.locale_code() == 'he'.

Running the detailed ancestral report in Hebrew using the example database gives results like:

%(male_name)s was born on %(birth_date)s in %(birth_place)s.
%(male_name)s לידה ב־%(birth_date)s ב־%(birth_place)s.
Lewis Anderson לידה ב־1855-06-21 ב־Great Falls, MT, USA.

My first reaction was to use regular expressions to add or remove the Maqaf character as required. Unfortunately the Python re module didn’t work as I expected. I’ll have to look into this further when I have more time.

I also couldn’t find any RTL or LTR marks in the strings, but they do react in the editor as if they are RTL.

Perhaps someone can point me in the right direction.

yaron · August 6, 2023, 11:08am

Hi Nick and thank you for the quick response (and sorry for my slow response).

Well, if the field is Right-to-Left in the first place, there’s no need to add LRM or RLM characters in such cases.
Regex is a very good solution in that case, but I’m not sure it is feasible.

Regarding the examples you’ve mentioned, let’s take the last part of the sentence:
ב־Great Falls, MT, USA.
In Great Falls, MT, USA.

Currently the Maqaf is part of the template, as the string in the “po” file dictates. It should be conditional instead, and be dropped when the name of the place is in Hebrew:
בחיפה
In חיפה.

Another case would be to double the initial Vav when a place name starts with Vav, unless the name already begins with a double Vav. For example:
The name of the place is ורדיה
בוורדיה
In ורדיה

But, if the name of the place is וולדורף:
בוולדורף
In וולדורף

I was thinking of reading the prefix letter (one of משהוכלב): if the prefix is any of these and the first character of the following word is either Latin or a digit, I would add the Maqaf. The double Vav case is very rare, but it’s not so complicated to handle once the Maqaf handling is already implemented.

Thank you

Nick-Hall · August 6, 2023, 4:53pm

I have created pull request #1496: Add support for Hebrew prefixes.

It will take a narrated string like “In %(place)s” and assume it is translated with a prefix to “ב%(place)s”. Then, for Hebrew only, I have just created a function to modify the substitution variable. It will double the Vav if not already double, prefix a maqaf for non-Hebrew words and numbers, and remove the leading He.
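
A minimal sketch of the three adjustments described above. This is not the code from the pull request: the name `adjust_for_prefix` and the helper `is_hebrew_letter` are hypothetical, and treating every character outside the range alef to tav as "non-Hebrew" is an assumption.

```python
HEBREW_FIRST, HEBREW_LAST = "\u05d0", "\u05ea"  # alef .. tav
MAQAF = "\u05be"
VAV = "\u05d5"
HE = "\u05d4"


def is_hebrew_letter(char):
    """Return True for a letter in the basic Hebrew alphabet range."""
    return HEBREW_FIRST <= char <= HEBREW_LAST


def adjust_for_prefix(value):
    """Adjust a value that will directly follow a one-letter Hebrew prefix."""
    if not value:
        return value
    first = value[0]
    if not is_hebrew_letter(first):
        # Latin text, digits and so on: separate the prefix with a maqaf
        return MAQAF + value
    if first == VAV and not value.startswith(VAV * 2):
        # Double an initial Vav unless it is already doubled
        return VAV + value
    if first == HE:
        # Drop the leading He, as the MediaWiki snippet quoted earlier does
        return value[1:]
    return value


# The translated template is assumed to be "ב%(place)s", with no maqaf or space
for place in ("Barcelona", "חיפה", "ורדיה", "וולדורף"):
    print("ב%(place)s" % {"place": adjust_for_prefix(place)})
```

The function only looks at the substituted value, so it assumes the caller already knows the placeholder follows a prefix letter.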

At the moment I don’t actually detect whether the substitution variable is prefixed. It may not be necessary for the strings in the libnarrate module.
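
If that detection turns out to be needed, one possible approach is to scan the translated template for placeholders that directly follow a prefix letter. The sketch below is only an illustration: `prefixed_placeholders` is a hypothetical helper, and the prefix list is the one yaron gave (משהוכלב).

```python
import re

# One-letter Hebrew prefixes listed earlier in the thread
PREFIX_LETTERS = "משהוכלב"

# A prefix letter that starts a word (no Hebrew letter before it),
# optionally followed by a maqaf, then a %(name)s placeholder.
PREFIXED_PLACEHOLDER = re.compile(
    r"(?<![\u05d0-\u05ea])[" + PREFIX_LETTERS + r"]\u05be?%\((\w+)\)s"
)


def prefixed_placeholders(template):
    """Return the names of placeholders directly preceded by a prefix letter."""
    return PREFIXED_PLACEHOLDER.findall(template)


template = "%(male_name)s נולד ב%(birth_date)s ב־%(birth_place)s."
print(prefixed_placeholders(template))
```

The lookbehind keeps the last letter of an ordinary word from being mistaken for a prefix, and the optional maqaf lets the check work on both the old and the adjusted translations.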

yaron · August 7, 2023, 7:44am

@avma we need to make sure we adjust the translation accordingly.
We need to remove all the Maqaf characters and remove any space between the word and the placeholder.
So it should look like: נולד ב(תאריך) ב(מקום).
Thank you both

avma · August 7, 2023, 1:05pm

You are perfectly right, Yaron. There is a way to deal with calendars and holidays in different languages for different cultures, as well as with what you’ve just pointed out (“it should look like: נולד ב(תאריך) ב(מקום)”). Please see Translating Gramps - Gramps.
Why don’t you go ahead and take a shot at it? This will for sure make reports look much better in Hebrew!

yaron · August 11, 2023, 7:33am

It is not about time/date placeholders but much more general.
But since you already asked, we need this implementation for dates as well, because a date can be represented either as month and year or as a full date starting with a number.
In Hebrew it would look like:
נפטר ב־1 באוקטובר, 2010
נפטר באוקטובר 2020

Keeping the Maqaf in the template could be problematic here, because the placeholder value could begin either with the Hebrew name of the month (second case) or with the day of the month as a number (first case). If these are interchangeable (use the same placeholder), we need this implementation to determine whether the Maqaf should be added or not.

avma · August 13, 2023, 12:37pm

@Nick-Hall @yaron
A lot of that is done with regex strings that turn date strings into verbal (text) date strings such as:
ב־1 באוקטובר 2010
and
באוקטובר 2020
or
אוק 2020
…

This, however, will not handle “נפטר ב־” versus “נפטר ב”, which depends on the character set of the place name. On top of that we have the gender form issue as well (נקבר, נקברה, נפטר, נפטרה, …).

I’d rather have one bird in the hand and deal with the date issue first, which in my completely unprofessional opinion is within reach, while turning Gramps into complete bi-di support might be just a bit more challenging.
I’ve seen a few regex manipulations in the _date_xx.py handlers (ru, hu, hr, ca…), so maybe it is possible to do something like:

import re
from datetime import datetime

# Pair each input pattern with the strptime format that parses it
date_formats = [
    (r'\d{4}/\d{2}/\d{2}', '%Y/%m/%d'),
    (r'\d{2}\.\d{2}\.\d{4}', '%d.%m.%Y'),
    (r'\d{2}-\d{2}-\d{4}', '%m-%d-%Y'),
]

# Map Gregorian month numbers to their Hebrew names
# (the input dates are Gregorian, so Hebrew calendar months do not apply)
hebrew_months = {
    1: "ינואר", 2: "פברואר", 3: "מרץ", 4: "אפריל", 5: "מאי", 6: "יוני",
    7: "יולי", 8: "אוגוסט", 9: "ספטמבר", 10: "אוקטובר", 11: "נובמבר", 12: "דצמבר"
}

# Convert a date to the verbal format in Hebrew, e.g. ב־1 באוקטובר 2010
def convert_to_hebrew_verbal(date):
    return f"ב־{date.day} ב{hebrew_months[date.month]} {date.year}"

# Replace every match of one pattern with its Hebrew verbal date
def replace_dates(text, pattern, fmt):
    def to_hebrew(match):
        try:
            date_obj = datetime.strptime(match.group(0), fmt)
        except ValueError:
            # Looks like a date but is not a valid one: leave it untouched
            return match.group(0)
        return convert_to_hebrew_verbal(date_obj)
    return re.sub(pattern, to_hebrew, text)

# Define your input text
input_text = "Your input text containing dates like 1955/12/23, 23.12.1955, 12-23-1955 and so on."

for pattern, fmt in date_formats:
    input_text = replace_dates(input_text, pattern, fmt)

print(input_text)

This can then probably be refined further.

yaron · August 13, 2023, 12:56pm

Let’s discuss the RTL work in the relevant thread, unless you think we should open another one to discuss RTL in general and not only for the Narrative Web Report.
This Maqaf handler should be an easy fix; RTL is a long-term mission and requires a lot of QA from our end.

I posted an update about my progress in the RTL thread. It seems possible, yet requires some more work.
Regarding the desktop app: we need to see how we can load the UI using GNOME Builder/Anjuta/however they’re calling it these days, and see whether forcing RTL works well or requires manual work, and how much of it.
We’ll continue over there.

avma · August 13, 2023, 1:15pm

Yes, it’s a good idea, I think we should. There is quite a lot of RTL-related stuff I neglected to raise fix requests for; it would be nice to see it all in the same place from now on…

yaron · August 13, 2023, 1:30pm

Regarding your fix: it’s a good one, but it should be more general and cover both places and dates. I thought about adding a function that detects the Maqaf in the translation, removes it if unnecessary and adds it where needed. This way, adding or omitting the Maqaf in the translation becomes irrelevant and the text will read just fine in all cases.
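
One way such a function could work on the final, already substituted text is sketched here. The name `normalize_maqaf` is hypothetical, "Latin or digit" is approximated by `[A-Za-z0-9]`, and the Vav doubling is deliberately left out.

```python
import re

MAQAF = "\u05be"
PREFIX = "[משהוכלב]"            # one-letter prefixes from this thread
HEBREW = "[\u05d0-\u05ea]"      # alef .. tav

# Prefix letter + maqaf + Hebrew letter: the maqaf is not needed
_EXTRA = re.compile(r"(?<!" + HEBREW + r")(" + PREFIX + r")" + MAQAF + r"(?=" + HEBREW + r")")
# Prefix letter directly followed by a Latin letter or a digit: maqaf is missing
_MISSING = re.compile(r"(?<!" + HEBREW + r")(" + PREFIX + r")(?=[A-Za-z0-9])")


def normalize_maqaf(text):
    """Remove a maqaf before Hebrew words and add one before Latin text or digits."""
    text = _EXTRA.sub(r"\1", text)
    return _MISSING.sub(r"\1" + MAQAF, text)


print(normalize_maqaf("נפטר ב־אוקטובר 2020"))     # maqaf removed
print(normalize_maqaf("נפטר ב1 באוקטובר, 2010"))  # maqaf added before the digit
print(normalize_maqaf("נקבר בBarcelona"))         # maqaf added before Latin text
```

Both patterns require that no Hebrew letter precedes the prefix letter, so a maqaf inside a compound such as תל־אביב is left alone.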

Specifying the Hebrew dates in their Hebrew name is only part of the solution. BTW, is there Hebrew calendar support? Should we discuss that as part of the RTL thread, or in another thread about all the possible calendars?

Nick-Hall · August 13, 2023, 4:16pm

Let me know of any RTL issues in the GUI. I’ll try to fix them.

Nick-Hall · August 13, 2023, 4:18pm

Our dates support the Hebrew calendar. However the calendar reports and “On this day” report probably only support the Gregorian calendar at the moment.

Nick-Hall · August 13, 2023, 4:36pm

Are we ready to merge PR #1496 - Add support for Hebrew prefixes?

Will you need any help with the new translated strings? I could do a search and replace on the po file.

avma · August 15, 2023, 8:38am

Sure, let’s have @yaron take a look at it too…
Thanks for offering. I think I’ll manage with the translation part, but I do need help with the Hebrew date handler. I’ve got all the pieces working separately, but I’m having some problems putting it all together.

Nick-Hall · August 15, 2023, 2:31pm

What issues are you having with the date handler?

avma · August 15, 2023, 3:46pm

@Nick-Hall
The current “as is” situation is that both date quality and date type are not functioning. If the date is not a “normal” date, it is not possible to handle it while working in Hebrew.

As I mentioned before, I copied one of the recent (new format) handlers, so now I can get either the type (“about”) or the quality (“anything other than normal”), but not both. Span and range are broken too.

Nick-Hall · August 15, 2023, 4:54pm

@avma You will need to write a Hebrew date handler because we don’t have one at the moment. The easiest way to do this is to copy a simple existing one like Spanish.

So you would copy gramps/gen/datehandler/_date_es.py to _date_he.py. Then rename DateParserES to DateParserHE and DateDisplayES to DateDisplayHE.

Next change the registration at the bottom of the file to:

register_datehandler(
    ("he_IL", "he", "hebrew", "Hebrew", ("%d/%m/%Y",)), DateParserHE, DateDisplayHE
)

A line will also need to be added to gramps/gen/datehandler/__init__.py:

from . import _date_he

Then it is just a matter of translation. Update modifier_to_int, calendar_to_int and quality_to_int. Keep at least one entry for each type.

The lists _span_1, _span_2, _range_1 and _range_2 are to detect “from … to …” and “between … and …”.
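
As a standalone illustration of how such keyword lists can be turned into regular expressions with named groups, here is a small sketch. The Hebrew keywords and the variable names are assumptions for the example; the real lists belong in the `init_strings` method of the parser class.

```python
import re

# Hebrew keywords (assumed): "from ... until ..." and "between ... and ..."
span_1 = ["מ־", "מ"]   # longer alternative first, so the maqaf is consumed
span_2 = ["עד"]
range_1 = ["בין"]
range_2 = ["לבין"]

span = re.compile(
    r"(%s)\s*(?P<start>.+)\s+(%s)\s+(?P<stop>.+)"
    % ("|".join(span_1), "|".join(span_2))
)
range_ = re.compile(
    r"(%s)\s+(?P<start>.+)\s+(%s)\s+(?P<stop>.+)"
    % ("|".join(range_1), "|".join(range_2))
)

for pattern, text in ((span, "מ־1850 עד 1860"), (range_, "בין 1850 לבין 1860")):
    match = pattern.match(text)
    print(match.group("start"), match.group("stop"))
```

The span pattern uses `\s*` after the first keyword because a prefix with a maqaf attaches directly to the date, while the range keywords are separate words.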

Then there is the date displayer class. It looks fairly straightforward to translate.

Ask if you have any questions.

avma · August 15, 2023, 5:08pm

Thanks @Nick-Hall, basically I did exactly that: I copied _date_ca.py and worked from there, but even if I strip it to the bare minimum, it does what I described in the previous post.

Let me throw the code in here (I will dump it in later) so you can have a look:

# -------------------------------------------------------------------------
#
# Python modules
#
# -------------------------------------------------------------------------
import re

# -------------------------------------------------------------------------
#
# Gramps modules
#
# -------------------------------------------------------------------------
from ..lib.date import Date
from ._dateparser import DateParser
from ._datedisplay import DateDisplay
from ._datehandler import register_datehandler


# -------------------------------------------------------------------------
#
# Hebrew parser
#
#
# -------------------------------------------------------------------------

class DateParserHE(DateParser):
    calendar_to_int = {
        "גרגוריאני": Date.CAL_GREGORIAN,
        "אזרחי": Date.CAL_GREGORIAN,
        "יוליאני": Date.CAL_JULIAN,
        "י": Date.CAL_JULIAN,
        "עברי": Date.CAL_HEBREW,
        "ע": Date.CAL_HEBREW,
        "מוסלמי": Date.CAL_ISLAMIC,
        "מ": Date.CAL_ISLAMIC,
        "המהפכה הצרפתית": Date.CAL_FRENCH,
        "צ": Date.CAL_FRENCH,
        "פרסי": Date.CAL_PERSIAN,
        "פ": Date.CAL_PERSIAN,
        "שוודי": Date.CAL_SWEDISH,
        "ש": Date.CAL_SWEDISH,
    }

    modifier_to_int = {
        "לפני": Date.MOD_BEFORE,
        "לפני ה־": Date.MOD_BEFORE,
        "לפ.": Date.MOD_BEFORE,
        "אחרי": Date.MOD_AFTER,
        "אחרי ה־": Date.MOD_AFTER,
        "אח.": Date.MOD_AFTER,
        "בסביבות": Date.MOD_ABOUT,
        "סביב": Date.MOD_ABOUT,
        "בערך ב־": Date.MOD_ABOUT,
        "בערך בשנת": Date.MOD_ABOUT,
        "בקירוב": Date.MOD_ABOUT,
        "מיום": Date.MOD_FROM,
        "מה־": Date.MOD_FROM,
        "מ־": Date.MOD_FROM,
        "מ": Date.MOD_FROM,
        "מ ": Date.MOD_FROM,
        "עד": Date.MOD_TO,
        "עד יום": Date.MOD_TO,
        "עד ה־": Date.MOD_TO,
        "ועד יום": Date.MOD_TO,
    }

    quality_to_int = {
        "מוערך": Date.QUAL_ESTIMATED,
        "מחושב": Date.QUAL_CALCULATED,
    }

    bce = [
        "לפני הספירה",
        "לפני עידן זה",
        "לפנה\"ס",
        "לפני ספירת הנוצרים",
        "לספירתם",
    ] + DateParser.bce

    def init_strings(self):
        """
        Compile regular expression strings for matching dates.
        """
        # Must be a method of DateParserHE (indented inside the class),
        # otherwise the Hebrew span and range patterns are never installed.
        # The numeric and text date patterns are inherited from DateParser.
        DateParser.init_strings(self)
        # "from ... until ..." spans and "between ... and ..." ranges
        _span_1 = ["מ־", "מ"]
        _span_2 = ["עד"]
        _range_1 = ["בין"]
        _range_2 = ["לבין"]
        # \s* after the first keyword: a prefix with a maqaf attaches to the date
        self._span = re.compile(
            r"(%s)\s*(?P<start>.+)\s+(%s)\s+(?P<stop>.+)"
            % ("|".join(_span_1), "|".join(_span_2)),
            re.IGNORECASE,
        )
        self._range = re.compile(
            r"(%s)\s+(?P<start>.+)\s+(%s)\s+(?P<stop>.+)"
            % ("|".join(_range_1), "|".join(_range_2)),
            re.IGNORECASE,
        )


# -------------------------------------------------------------------------
#
# Hebrew display
#
# -------------------------------------------------------------------------
class DateDisplayHE(DateDisplay):
    """
    Hebrew language date display class.
    """
    # "before the common era"; the plain "לספירה" would read as CE
    _bce_str = "%s לפני הספירה"

    long_months = (
        "",
        "ינואר",
        "פברואר",
        "מרץ",
        "אפריל",
        "מאי",
        "יוני",
        "יולי",
        "אוגוסט",
        "ספטמבר",
        "אוקטובר",
        "נובמבר",
        "דצמבר",
    )

    short_months = (
        "",
        "ינו",
        "פבר",
        "מרץ",
        "אפר",
        "מאי",
        "יונ",
        "יול",
        "אוג",
        "ספט",
        "אוק",
        "נוב",
        "דצמ",
    )

    hebrew = (
        "",
        "תשרי",
        "חשוון",
        "כסלו",
        "טבת",
        "שבט",
        "אדר",
        "אדר א'",
        "ניסן",
        "אייר",
        "סיוון",
        "תמוז",
        "אב",
        "אלול",
    )

    formats = (
        "YYYY-MM-DD (ISO)",
        "סיפרתי",
        "חודש יום, שנה",
        "חודש יום, שנה",
        "יום חודש, שנה",
        "יום חודש, שנה",
    )
    # this must agree with DateDisplayEn's "formats" definition
    # (since no locale-specific _display_gregorian exists, here)

    def display(self, date):
        """
        Return a text string representing the date.
        """
        mod = date.get_modifier()
        cal = date.get_calendar()
        qual = date.get_quality()
        start = date.get_start_date()
        newyear = date.get_new_year()

        qual_str = self._qual_str[qual]

        if mod == Date.MOD_TEXTONLY:
            return date.get_text()
        elif start == Date.EMPTY:
            return ""
        elif mod == Date.MOD_SPAN:
            d1 = self.display_cal[cal](start)
            d2 = self.display_cal[cal](date.get_stop_date())
            scal = self.format_extras(cal, newyear)
            return "%s%s %s %s %s%s" % (qual_str, "מ", d1, "עד", d2, scal)
        elif mod == Date.MOD_RANGE:
            d1 = self.display_cal[cal](start)
            d2 = self.display_cal[cal](date.get_stop_date())
            scal = self.format_extras(cal, newyear)
            return "%s%s %s %s %s%s" % (qual_str, "בין", d1, "לבין", d2, scal)
        else:
            text = self.display_cal[date.get_calendar()](start)
            scal = self.format_extras(cal, newyear)
            return "%s%s%s%s" % (qual_str, self._mod_str[mod], text, scal)

# -------------------------------------------------------------------------
#
# Register classes
#
# -------------------------------------------------------------------------
register_datehandler(
    ("he_IL", "he", "Hebrew", "Ivrit", "עברית", ("%d-%m-%Y",)),
    DateParserHE,
    DateDisplayHE,
)

There might be stuff in there which is no longer required, since it is now taken care of in the Gramps .po file.
