Link feature bug? (hotlink markup positions offset by 1 character)

This happens with 5.1.2 under Fedora 40.

I manually add notes to my Sources to describe each source. Each note contains a short description and the (physical) catalog index in the repository.

I select the index and press the “link” button so that I can specify “Internet Link” and the URL to the source. After OK, the index is underlined in blue, exactly under the selection. The note type is changed to Link and everything is saved.

However, randomly (I could not see any pattern), when I reopen some notes, the starting position of the underline is shifted one position to the right, so the index is no longer fully highlighted. Which notes are affected seems random, but once a note is affected, opening it always shows the problem.

What is weird is the reaction of Gramps: pressing Cancel triggers a dialog asking whether I want to save modifications, although I didn’t change anything in the note.

If I answer “Yes”, reopening the note displays a very small Note window. Enlarging it shows nothing, though the content summary is still displayed in the Notes list window.

Since I want to convert all Html code notes to Link, I exported the whole DB to XML. The XML is processed and the conversion runs fine. Since I have the XML, I looked at the offending notes and saw that in <range start="x" end="y">, x is indeed one position away from what I manually specified. So Gramps correctly displays what is stored when the note is reopened. But this does not explain why Gramps considers there has been a change.
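For anyone who wants to check an export for the same symptom, here is a rough Python sketch that lists ranges falling outside their note text. Only the `range` element with `start` and `end` is quoted above; the `note`, `text` and `id` names, the sample data and the helper names are my assumptions about the export layout, so adjust them to what your file actually contains.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Drop any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def find_bad_ranges(xml_text):
    """Return (note id, start, end, text length) for ranges outside the text."""
    bad = []
    root = ET.fromstring(xml_text)
    for note in root.iter():
        if local(note.tag) != "note":
            continue
        text = ""
        ranges = []
        for child in note.iter():
            name = local(child.tag)
            if name == "text":
                text = child.text or ""
            elif name == "range":
                ranges.append((int(child.get("start")), int(child.get("end"))))
        for start, end in ranges:
            # Offsets are compared against the character count, not bytes.
            if start < 0 or end > len(text) or start >= end:
                bad.append((note.get("id"), start, end, len(text)))
    return bad


# Hypothetical sample: the text has 7 characters but the range ends at 8.
SAMPLE = """<database><notes>
<note id="N0001" type="Link"><text>abc def</text>
<style name="link" value="http://example.org/"><range start="4" end="8"/></style></note>
</notes></database>"""

print(find_bad_ranges(SAMPLE))
```

This only catches ranges that run past the end of the text. A range that is shifted by one but still inside the text needs a visual check.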

The only thing I didn’t try was to delete the note and recreate it from scratch because I’d have to rechain it to the referencing records.

Does anybody have an idea how I could fix this? This concerns only 4 notes out of a total of ~3100 notes containing links.

I can provide an XML copy of the notes for you to try if needed.

I tried wiping everything in the affected notes and retyping what I need, to no avail.

Do you mean 5.2.2? I saw this in that version, not in 5.1.

An example would be helpful.


I really mean 5.1.2 because this is the base version I patch. I have not yet rebased on a more recent version such as 5.1.4 or 5.2.2. Only two versions are installed: 5.2.2 and 5.1.2. I tested against 5.1.2 to make sure my modifications were not responsible.

But it also occurs in other versions.

This is the test case I start with: note.gramps (1.1 KB). This is uncompressed XML. You can open it in any text editor. Note that the <range start= … are correct.

After importing into 5.2.2, I get export.gramps (1.1 KB) where the underlines don’t display correctly. Note that the <range …/> are not listed in ascending order but seem “technically” correct.

In 5.1.2, this is export1.gramps (1.1 KB), where the <range …/> are in ascending order and apparently correct, but don’t display as expected.

Sometimes the one-position offset is written into <range …/>, but correcting it manually has no effect.

Open the note, then close it. Do you get the “not saved modifications” warning although you didn’t modify the note?

Yes, for all 3 files. Tested with 5.1.7.

I wonder what makes Gramps think there is a change in the note without user action. It is certainly related to the offset error but how?

There is a high probability of isolating the cause now that you have provided a reproducible case with test data.

It looks like a bug that will require a code change. A bug report needs to be generated now [13319] so the issue will appear in the Roadmap and Change List.

If you are curious about the underlying code problem, follow the bug report. It will have a link to the “commit” on GitHub. And you can patch the change on your installation of Gramps if you cannot wait for the next release.

The text in your example note consists of 130 characters. One of the links is defined to end at offset 131 which is beyond the end of the buffer. The editor will truncate the link at 130 and this is the change that you are being warned about when you try to quit without saving.
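A minimal model of that effect, in Python. This is not the Gramps editor code, just a sketch of the logic: the stored range is clamped to the buffer length, and the clamped result no longer equals what was stored, so the note looks modified. The function name and the start offset 100 are made up for illustration; only the 130 and 131 come from the example note.

```python
def clamp_ranges(text, ranges):
    """Clamp (start, end) character ranges so they fit inside the text."""
    limit = len(text)
    clamped = []
    for start, end in ranges:
        start = max(0, min(start, limit))
        end = max(start, min(end, limit))
        clamped.append((start, end))
    return clamped


text = "x" * 130          # stand-in for the 130-character note
stored = [(100, 131)]     # hypothetical start, end taken from the report
shown = clamp_ranges(text, stored)

print(shown)              # the range the editor can actually display
print(shown != stored)    # True means "note has unsaved changes"
```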

Is it possible that the automatic white space trimming trimmed a leading linefeed (or reduced a double space to a single space) without updating the markup positions?

I can understand the last part, meaning that the truncation is a change. I also read that Patrick selected the link, which suggests that he did not type the numbers in the link definitions. And if that’s the case, and they were generated by Gramps, there must be a bug that does that.

@pgerlier is there anything special about these 4 notes, in contrast to the other ones? Are these the only ones where there is no text following the last link?

When we save the XML we strip out control characters from the text, but the styled text ranges are not adjusted. My guess is that the notes contained control characters other than CR, LF and Tab.
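To illustrate the guess, here is a small Python sketch of what happens when a control character is removed from the text while the ranges stay untouched, next to a version that shifts the ranges. It is not the Gramps save code. The regular expression, the function name and the sample text (with a vertical tab standing in for the stray character) are assumptions made for the example.

```python
import re

# Control characters other than Tab (\x09), LF (\x0a) and CR (\x0d).
CONTROL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_controls(text, ranges):
    """Remove control characters and shift the ranges to match."""
    removed = [m.start() for m in CONTROL.finditer(text)]

    def shift(pos):
        # Each removed character located before pos moves pos left by one.
        return pos - sum(1 for r in removed if r < pos)

    clean = CONTROL.sub("", text)
    return clean, [(shift(s), shift(e)) for s, e in ranges]


text = "Intro\x0b\nA-123"      # hypothetical note with a stray vertical tab
ranges = [(7, 12)]             # covers "A-123" in the original text

naive = CONTROL.sub("", text)  # text stripped, ranges left as they were
print(repr(naive[7:12]))       # link starts one character too late

clean, fixed = strip_controls(text, ranges)
print(fixed, repr(clean[fixed[0][0]:fixed[0][1]]))
```

In the unadjusted case the range both starts one position to the right and ends one past the end of the text, which matches the two symptoms reported in this thread.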


When I looked at the Note content pasted into GVim, the cursor position readout was strange.

Instead of showing a simple “row,character position” readout, the status line had an extra “-character position” because of the diacritics.

So for row one of Note N0032 in note.gramps, which has an acute “é” (at 1,16) and a circumflex “ê”, the offset increased twice.
With the cursor on the “r” in row 1 (the 17th character), it shows “1,18-17”, and when moved to the “m” following the “ê”, it shows “1,32-30”.

Permalinks différents mais même document

Could the Note Editor be having problems with the diacritics?
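The two readouts can be reproduced by comparing UTF-8 byte offsets with character offsets. Below is a short Python sketch, assuming the quoted line above is row one of the note and that both accented letters are stored as single precomposed code points; the helper name is made up.

```python
# Row one of the note, with the accented letters written as escapes
# so that they are certain to be single precomposed code points.
line = "Permalinks diff\u00e9rents mais m\u00eame document"


def columns(text, char_index):
    """Return 1-based (byte column, character column) for a 0-based index."""
    byte_col = len(text[:char_index].encode("utf-8")) + 1
    return byte_col, char_index + 1


r_index = line.index("\u00e9") + 1   # the "r" right after the e-acute
m_index = line.index("\u00ea") + 1   # the "m" right after the e-circumflex

print(columns(line, r_index))
print(columns(line, m_index))
```

The first number of each pair is the byte column and the second is the character column, so the extra number in GVim only reflects the 2-byte UTF-8 encoding of the accented letters.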


That was weird. My posting changed the posting person and converted a quoted section.

Here is a combined answer to the preceding replies.

  • 130/131 characters:
    Text was entered in the GUI, no tweaks on the XML to modify the coordinates in <range … />.

  • note specificity:
    These are not the only Link notes containing text preceding the links. They (with initial text) are manual notes. The other notes (more than 3000) are automatic conversions from Html code notes with <a href=…> links. They don’t seem affected but I have not checked them all yet.

  • diacritics:
    I feared something like this, essentially for my automatic conversion process because I count codepoints, not characters. However, the only letters with diacritics used fall into the Latin-1 Supplement block and consequently require only a single codepoint (no combining diacritics).
    But, this should not matter for manually crafted notes.

    Regarding @emyoulation’s test in GVim, the outcome is “as expected”. “é” and “ê” are both in Latin-1, which means they need 2 bytes for their UTF-8 encoding.
    Nevertheless, <range …/> counts characters, not bytes (I checked this on examples). I have not verified this statement on complex diacritics combinations which use separate “combining marks”, but it should be valid if the library references the “standard” ICU utilities, notably those computing visual glyph boundaries.

  • control characters:
    I doubt my text contains control characters other than CR, LF and HT.
    My workflow is:

    1. export “production” DB to uncompressed XML
    2. process the XML to convert Html code notes to Link ones
    3. import the modified XML into a new blank tree
      I temporarily had problems with this last step because the process inserted a NUL at the end of the XML file, which was rejected by Gramps as “invalid token”. This was the only spurious control character.

The manual notes initially contained a weird end-of-line control (not the usual one) at the end of the first line (brought in by copy’n’paste). This caused unwanted display in the notes list and was manually corrected by deleting the EOL and pressing Enter. This was rechecked by visual inspection of the XML.
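Visual inspection can miss invisible characters, so a small scan may be a useful double check. Here is a Python sketch that reports control characters and Unicode line or paragraph separators other than Tab, LF and CR. The choice of categories, the function name and the sample string are mine; this is not something Gramps provides.

```python
import unicodedata

ALLOWED = {"\t", "\n", "\r"}


def suspicious_chars(text):
    """List (index, code point, category) for unexpected invisible characters."""
    hits = []
    for index, ch in enumerate(text):
        category = unicodedata.category(ch)
        # Cc = control, Zl = line separator, Zp = paragraph separator.
        if ch not in ALLOWED and category in ("Cc", "Zl", "Zp"):
            hits.append((index, f"U+{ord(ch):04X}", category))
    return hits


# Hypothetical first line ending with U+2028 instead of a normal newline.
sample = "Intro\u2028A-123"
print(suspicious_chars(sample))
```

Running this on the text of each affected note (for example on the `<text>` content taken from the XML export) would show whether a leftover character is still present.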

