Context
I am working on a specific-contents Note of type LINK. These Notes parallel the Repository-Source-Citation hierarchy to dynamically generate a full URL from a fragment provided at each level.
When the mouse hovers over a “fragment”, a tooltip showing the full URL is displayed and the user can Ctrl-click to launch a web browser and jump to the target.
It works fine most of the time, but I have hit two problems.
Identification of a "fragment"
A fragment is recognised through a regexp which is passed to StyledTextEditor during __init__(). Thus, every time the cursor is in a “fragment” or the mouse is over one, the method do_match_changed(…) is triggered and an action can be performed.
But, when I dump the match in do_match_changed(…), I find that GTK merges adjacent matches. My pattern is supposed to match within a line (it does not span line end). Some notes have several “fragments” following each other on separate lines. GTK gives me all these in a single block.
I understand such an optimisation as being legitimate when it involves “decorations” like bold or italic because the visual effect is not interrupted by line wrap. But as I want to create a tooltip, I need to know exactly the target I am addressing.
I checked that this merge also occurs on “standard” Gramps patterns like GENURL or LINK but handling is different and it does not matter.
Could some coder more familiar with GTK (I am a complete newbie here) tell me how to get only the exact line where the cursor or mouse is?
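As a possible workaround that does not depend on GTK at all, the merged block can be cut back to one line after the fact. The sketch below assumes the callback can obtain the merged text and the cursor (or pointer) position as a character offset relative to the start of that text; the helper name line_at_offset is hypothetical and not part of Gramps or GTK.

```python
def line_at_offset(block: str, offset: int) -> str:
    """Return the single line of `block` that contains character index `offset`."""
    # Clamp the offset so an out-of-range position still selects a line.
    offset = max(0, min(offset, len(block)))
    # Start of line: just after the previous newline (0 if there is none).
    start = block.rfind("\n", 0, offset) + 1
    # End of line: the next newline at or after the offset, else end of text.
    end = block.find("\n", offset)
    if end == -1:
        end = len(block)
    return block[start:end]


# Three adjacent fragments, delivered as one merged block.
merged = (
    ">>>+a First => one.html\n"
    ">>>+b Second => two.html\n"
    ">>>+c Third => three.html"
)
print(line_at_offset(merged, 30))  # offset 30 lies inside the second line
```

An offset that falls exactly on a newline character is treated as the end of the line before it, which matches a cursor parked at the end of a line.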
Freezing
Sometimes, when I type my fragments, Gramps suddenly freezes and I must kill the application to regain control. After some trial and error, I found that it occurs “randomly” while entering new fragments and also on existing Notes when the cursor or mouse is on an invalid fragment.
I now have a Note which freezes every time. This allowed me to add debugging prints and experiment. I have narrowed down the problem to a call to re.match(str(fragment)) which never returns.
The weird thing is that match() should fail because the fragment text does not pass (or should not pass) the regexp. I am at a complete loss as to why match() does not return None.
Does anyone know in which circumstances match() could loop forever?
Since match() is a library method which has been tested and proven countless times, the fault is necessarily on my side. For what it's worth, here is my regexp:
import re

FRAG_SENTINEL = '>>>'
FRAG_YES = r'+'
FRAG_NO = r'-'
FRAG_SP = r'[ \t]*'  # spaces and tabs only, never a newline
FRAG_ID = r"(?P<id>\w*)?" + FRAG_SP
FRAG_PREV = r"(?:\{" + FRAG_SP + r"(?P<prev_type>[RS*])" \
            + FRAG_SP + r"(?P<prev_id>\w*)" \
            + FRAG_SP + r"\})?" + FRAG_SP
FRAG_URL = FRAG_SP + r"(?P<url>\S*)?"
FRAG_REGEXP = re.compile(FRAG_SENTINEL + FRAG_SP
                         + r"(?P<flag>[" + FRAG_YES + FRAG_NO + r"])"
                         + FRAG_ID
                         + FRAG_PREV
                         + r"(?P<label>(?:=[^>]|[^=]+))+" + FRAG_SP
                         + r"=>" + FRAG_URL  # + r"(?=" + FRAG_SP + r"(?:\n|$))"
                         )
Just in case GTK runs regexp in multiline mode, I explicitly replaced \s with FRAG_SP in order to exclude newline from spacers (but this does not prevent merging adjacent matching lines).
EDIT 2026-03-02
Freezing
I wonder if I fell into the Explosive Quantifier Trap.
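The trap can be reproduced in isolation. The sketch below keeps only the label part of the original regexp followed by the mandatory "=>", and runs it on text that contains no "=>" at all, so the match has to fail. The name NESTED and the test strings are made up for the demonstration. Because [^=]+ sits inside a group that is itself repeated with +, the engine can split a run of n characters in about 2^(n-1) ways before giving up, so the time should roughly double with each extra character.

```python
import re
import time

# Label part of the original regexp, followed by the mandatory "=>".
NESTED = re.compile(r"(?:=[^>]|[^=]+)+=>")

# Keep n small: each extra character roughly doubles the work.
for n in (12, 14, 16, 18):
    text = "x" * n  # no "=>" anywhere, so the match must fail
    t0 = time.perf_counter()
    result = NESTED.match(text)
    elapsed = time.perf_counter() - t0
    print(n, result, f"{elapsed:.4f}s")
```

A half-typed fragment of a few dozen characters is therefore enough to make match() appear to hang, even though it would eventually return None.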
Freezing occurs when the regexp should fail. Following the advice in the link above, I modified the quantifiers in the “label” part for stricter control, as follows:
FRAG_LABEL = r"(?P<label>(?:[^=]++|=(?!>)))++"
The first ++ is probably overkill. It apparently cured my problematic Note. I am continuing to experiment with real use cases.
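Possessive quantifiers are only accepted by Python's re module from version 3.11 onwards, so here is a minimal sketch of an alternative that avoids nesting one quantifier inside another and works on older versions too. It is an assumption-laden rewrite, not a drop-in replacement: the names SP, LABEL and FRAG_SAFE and the sample fragments are invented for illustration. Two further changes are worth noting. The label class excludes "\n", because [^=] matches a newline and would let an unfinished fragment reach the "=>" of the next line. The + is placed inside the named group, because a repeated group in Python keeps only its last repetition.

```python
import re

SP = r"[ \t]*"  # spaces and tabs only

# One character per repetition: either a character that is neither "=" nor a
# newline, or an "=" that is not the start of "=>". The two alternatives never
# overlap, so there is only one way to split a given label.
LABEL = r"(?P<label>(?:[^=\n]|=(?!>))+)"

FRAG_SAFE = re.compile(
    r">>>" + SP
    + r"(?P<flag>[+-])"
    + r"(?P<id>\w*)" + SP
    + r"(?:\{" + SP + r"(?P<prev_type>[RS*])"
    + SP + r"(?P<prev_id>\w*)" + SP + r"\})?" + SP
    + LABEL + SP
    + r"=>" + SP + r"(?P<url>\S*)"
)

complete = ">>>+src1 {R repo1} Parish register => registers/1850.html"
incomplete = ">>>+src1 " + "typing in progress " * 20  # no "=>" yet

m = FRAG_SAFE.match(complete)
# The label keeps its trailing blank, hence the strip().
print(m.group("id"), "|", m.group("label").strip(), "|", m.group("url"))

# Fails promptly instead of exploring every split of the label.
print(FRAG_SAFE.match(incomplete))
```

With this form, a line that is still being typed simply does not match, and it cannot borrow the "=>" from the following line.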
However, this does not explain the first problem (multiline match instead of single line match). The regexp is used twice:
- it is passed to StyledTextEditor to detect the construct
- it is used again in the “highlight action” to extract the components
In the latter case, since I noticed the matched text extends over several lines, I apply .splitlines() to process lines separately. Because of this, the “typing-in-progress” data (which can no longer rely on a valid matching end from next line) should have failed but the complexity of the regexp sent it into backtracking limbo.