Generated reports and Note Cleanup tool

I am using Gramps 5.1.3 on Windows 10.

I have noticed something odd about URL links in notes and the generated narrated web report.

It happens after I have run the Note Cleanup tool.

Here is an Event note against an event:
[Screenshot: the event note]

Here is the output from the Note Link Check report. For some reason there are two lines referring to the note above:
[Screenshot: Note Link Check report output]

Here is the generated Narrated Web Report source:
[Screenshot: generated HTML source]

It is a bit hard to read the code above, but there are three occurrences of the href, and the third is actually nested inside the second.

Here is the view in the browser (Chrome):
[Screenshot: the note as rendered in Chrome]

It looks OK in the browser, and if I click the link, then the page opens as expected.

The report was generated using the Mainz CSS.

Questions

  • Why are there double rows in the Note Link Check report, and why does the generated HTML have multiple hrefs?

  • Should I avoid using the Note Cleanup tool?

  • I am not sure of the purpose of the Note Links Report.

  • How could I fix the above (if it needs fixing)? I am guessing that the note content is stored somewhere internally, inaccessible to the average user, where its text is annotated with details about colours, where italics start and stop, bolding, links, etc.

Regards
John

Could you provide a small example of your problem? I think something is wrong in your database.
Perhaps you have a double link, which could explain the HTML code, or something else.

The best way would be to create a bug report and attach your example.
Then we could (I hope) reproduce your problem and see what produces what you see.

Thanks, I have taken up your suggestion and created a bug report in the bug tracker:

  • 0012355: Generated reports and Note Cleanup tool

I did a new install of Gramps on another PC, and the problem is there and easy to reproduce. I have included a sample database in the bug report; it has just one person and one note, which is enough to show the problem.

I wonder if the Note Cleanup tool creates an explicit link (to reinforce the Note Editor’s implicit one) when it recognizes a URL in the text?

And then the Narrated Web Site might create a redundancy by doing the same?

A quick way to make that determination would be to highlight the link in Note #2 and use the “Clear Markup” button at the far right of the Note Editor’s toolbar.

The Web was intended for URLs to be invisible. Hotlinks are supposed to be human-readable labels, with the browser transparently handling the hotlinked anchor reference (<a href=...>). A bare URL used as user-readable content confuses both the intention and the automation.

In your note, you have the link twice, and the Narrative Web report shows both links.
See below:

    <note handle="_eca6cc99c38cc152c755059638" change="1626257235" id="N0000" type="Person Note">
      <text>here is a link to https://twitter.com/</text>
      <style name="link" value="https://twitter.com/">
        <range start="18" end="38"/>
      </style>
      <style name="link" value="https://twitter.com/">
        <range start="18" end="38"/>
      </style>
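In a note this short the duplicate is easy to spot by eye; across many notes a small script helps. A minimal Python sketch, assuming a single note element like the one above is available as a string (shortened here and closed with </note>). The function names are hypothetical and not part of Gramps. It counts link styles by (URL, start, end) and reports any that occur more than once:

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix, if present, so plain and namespaced XML both work."""
    return tag.rsplit("}", 1)[-1]


def duplicate_link_styles(note_xml: str) -> list:
    """Return (url, start, end) for every link style that occurs more than once in a note."""
    note = ET.fromstring(note_xml)
    counts = Counter()
    for style in note:  # direct children of <note>: <text>, <style>, ...
        if local_name(style.tag) != "style" or style.get("name") != "link":
            continue
        for rng in style:
            if local_name(rng.tag) == "range":
                counts[(style.get("value"), rng.get("start"), rng.get("end"))] += 1
    return [key for key, n in counts.items() if n > 1]


NOTE = """<note handle="_eca6cc99c38cc152c755059638" id="N0000" type="Person Note">
  <text>here is a link to https://twitter.com/</text>
  <style name="link" value="https://twitter.com/"><range start="18" end="38"/></style>
  <style name="link" value="https://twitter.com/"><range start="18" end="38"/></style>
</note>"""

print(duplicate_link_styles(NOTE))  # [('https://twitter.com/', '18', '38')]
```

The range 18 to 38 covers exactly the 20 characters of https://twitter.com/ after the 18-character "here is a link to ", so both styles mark the same span.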

I understand how you did that: you created the link twice.
To solve the problem, select the URL and remove the link.
After that, recreate the link.

I don’t know how we can avoid this.

I selected all text in the note and clicked clear markup. After regenerating the narrated web site report, the link was gone, as expected.

When I then ran Note Cleanup, and regenerated the report, the triple href returned.

SNoiraud, I am not sure where your code comes from; in my web page I just see:

				<div class="grampsstylednote">
					<p>
					here is a link to <a href="https://twitter.com/"></a><a href="https://twitter.com/"><a href="https://twitter.com/">https://twitter.com/</a></a>
					</p>

I use the Cleanup tool mainly for its ability to convert URLs to links in bulk, as I don’t usually select a bit of text and then apply ‘create link’ to it when I am writing individual notes.

Perhaps the Note Cleanup tool should have an option to not add markup to bare URLs?

From NoteCleanupTool:

The tool also searches for “links” to web sites and sets them to “Styled Text Links”, so that they work properly in reports such as the Gramps Narrative Web Site.

Or, you can stop putting bare URLs in notes. Either: 1) use the Note Editor’s linking feature on a label; or 2) add bare URLs as Internet tab objects.

The code comes from your .gramps file. If I look at the link, we have:

here is a link to <a href="https://twitter.com/"></a>
                         <a href="https://twitter.com/">
                              <a href="https://twitter.com/">https://twitter.com/</a>
                         </a>

This code is generated from the link included in the note.

We have a first link without text: <a href="https://twitter.com/"></a>

If you insert "xxx " before that "</a>", you get a link in which "xxx" is clickable.

Then we have a second link that contains the same link again (the third <a>) instead of text.
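This breakdown can be checked mechanically on a generated page. A minimal sketch using Python's standard html.parser; the class and function names are hypothetical, and the input is the paragraph quoted earlier from the generated page. It lists each <a> with its nesting depth and the text directly inside it, so the empty first link and the nested third link stand out:

```python
from html.parser import HTMLParser


class AnchorAudit(HTMLParser):
    """Record every <a> element: its href, how deeply it is nested, and its own text."""

    def __init__(self):
        super().__init__()
        self.open_stack = []  # indexes (into self.anchors) of <a> tags not yet closed
        self.anchors = []     # one dict per <a>, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchors.append({"href": dict(attrs).get("href"),
                                 "depth": len(self.open_stack), "text": ""})
            self.open_stack.append(len(self.anchors) - 1)

    def handle_endtag(self, tag):
        if tag == "a" and self.open_stack:
            self.open_stack.pop()

    def handle_data(self, data):
        if self.open_stack:  # text is credited to the innermost open anchor only
            self.anchors[self.open_stack[-1]]["text"] += data


def audit_anchors(html: str) -> list:
    parser = AnchorAudit()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.anchors


GENERATED = ('here is a link to <a href="https://twitter.com/"></a>'
             '<a href="https://twitter.com/"><a href="https://twitter.com/">'
             'https://twitter.com/</a></a>')

for a in audit_anchors(GENERATED):
    print(a["depth"], repr(a["text"]), a["href"])
```

On that input it reports three anchors: two at depth 0 with no text of their own, and one at depth 1 whose text is the URL, which matches the explanation above.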

It is impossible for the cleaning tool to detect such an anomaly.

The problem is the tool.


SNoiraud, thanks for the info about the code in the .gramps file. I also think there is a problem with the tool.

Emyoulation, thanks for the hint about bare URLs. Many of my notes contain text that I copy from the citation field on a web page. The citation might include a bare URL; here is an example:

The Note Cleanup tool mangles these too, leaving extraneous characters showing in the web page, e.g. in the Dynamic Web report.

These are good ‘catches’ of special-case flaws in the Note Cleanup Tool.

Please DO file bug reports for that tool.

Paul has a great sense of duty with regard to code quality. There is little doubt that he will be interested. And patches to smaller add-ons tend to be distributed far more quickly than patches to built-ins or more complex add-ons.

Maybe we could use #012355 to make this feature request a bug.

Now that the problem is isolated, it might be better to refile a fresh bug report… so that the issue can be more quickly recognized in the Summary and Description. The feature could be associated & closed as a duplicate.

As a Developer, which would you prefer to see?

(Note: the title & summary are probably all that would be seen when composing things like the Release Notes, although add-ons don’t have the benefit of Release Notes publication.)

Repairing Note Cleanup
As I had about 1300 notes with duplicated links, I did not want to edit them by hand, and looked into regular expressions, using Notepad++, to find and delete the extra links.

I have not written regular expressions before, so I used the regex tester at https://regex101.com/ to help build my search string.

A Stack Exchange question helped me; see “How to find and mark all duplicate paragraphs using Notepad ++?” on Super User.

Regex in Notepad++

It seems to me that the following search/replace in the Notepad++ Replace dialog has deleted all the duplicate links that were created by Note Cleanup.

  • tick Wrap around
  • select Regular expression
  • tick Matches newline
  • The search string is
    • (<style name="link".*?<\/style>)\s+\1+
  • The replace string is
    • \1

[Screenshot: Notepad++ Replace dialog with the settings above]
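The same search/replace can be reproduced outside Notepad++. A minimal Python sketch, not the method used above: re.DOTALL plays the role of "Matches newline", and the pattern is a tightened variant of the search string (an assumption, not the original) so that one match cannot run past the first </style>, and so that three or more identical copies collapse in a single pass:

```python
import re

# Tightened variant of the Notepad++ search string (assumption, not the original):
#   (?:(?!</style>).)*  keeps each match inside a single <style> element
#   (?:\s+\1)+          removes any number of identical adjacent copies in one pass
# re.DOTALL corresponds to Notepad++'s "Matches newline" option.
DUPLICATE_LINK_STYLE = re.compile(
    r'(<style name="link"(?:(?!</style>).)*</style>)(?:\s+\1)+',
    re.DOTALL,
)


def remove_duplicate_link_styles(xml_text: str) -> str:
    """Keep only the first of each run of identical, adjacent link styles."""
    return DUPLICATE_LINK_STYLE.sub(r"\1", xml_text)


SAMPLE = """<style name="link" value="https://twitter.com/">
        <range start="18" end="38"/>
      </style>
      <style name="link" value="https://twitter.com/">
        <range start="18" end="38"/>
      </style>"""

print(remove_duplicate_link_styles(SAMPLE).count('<style name="link"'))  # 1
```

Only copies that are character-for-character identical (including indentation) are merged; link styles with a different URL or range are left alone.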

Fixing the tree

  1. In Gramps create a backup of your tree without media

  2. In file explorer (I’m on Windows) right-click the .gramps backup file you just created and open it with 7-Zip (I have 7-Zip installed on my PC but another tool would do the same).

  3. In the 7-Zip window, right-click the .gramps file, choose Open, and select Notepad++ to open it; it takes a while to open, as it is a big file.

  4. In Notepad++, open the Replace dialog, set it up as above, and click Replace All.

  5. Save the fixed file – with a new name just in case.
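Steps 2 to 5 can also be scripted. A minimal sketch, assuming the backup is gzip-compressed UTF-8 XML (which is why 7-Zip can open it); it falls back to plain XML if the gzip signature is missing. The function name and the file names in the usage note are hypothetical, and the pattern is the same tightened variant of the Notepad++ search string, not the exact original:

```python
import gzip
import re
from pathlib import Path

# Tightened variant of the Notepad++ search string (assumption, not the original).
DUPLICATE_LINK_STYLE = re.compile(
    r'(<style name="link"(?:(?!</style>).)*</style>)(?:\s+\1)+', re.DOTALL
)
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def fix_gramps_backup(src: Path, dst: Path) -> int:
    """Write a copy of src with duplicate link styles removed; return how many runs were fixed.

    Refuses to overwrite src, mirroring step 5 ("save with a new name").
    """
    src, dst = Path(src), Path(dst)
    if src.resolve() == dst.resolve():
        raise ValueError("write the fixed tree to a new file name")
    raw = src.read_bytes()
    compressed = raw[:2] == GZIP_MAGIC
    text = (gzip.decompress(raw) if compressed else raw).decode("utf-8")
    fixed, runs = DUPLICATE_LINK_STYLE.subn(r"\1", text)
    data = fixed.encode("utf-8")
    dst.write_bytes(gzip.compress(data) if compressed else data)
    return runs
```

For example, fix_gramps_backup(Path("my-tree-backup.gramps"), Path("my-tree-fixed.gramps")) returns the number of duplicate runs removed; the fixed file can then be imported into a new, empty family tree, keeping the original backup untouched.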

Note
I have not checked everything yet, but it seems to have worked. I think it saved time, and at least by automating I did not overlook some duplicates if I stepped through it all by hand.
I am using Notepad++ v7.9.5.
There are some other link anomalies that Note Cleanup seems to have created, which I mentioned earlier, but I will have to find them again. I guess a regex could fix them too.
And thanks again SNoiraud - I had never looked into the .gramps file.



I have only just noticed that the regular expression in my final post about cleaning up Note Cleanup duplicates is incorrect (it must be due to my formatting). I don’t seem to be able to edit that post, so I created this new one.

The search string should not use open/closing double quotes, it should of course use ordinary double quotes.

Search for
(<style name="link".*?)\s+\1+
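Why the quotes matter: the attributes in the .gramps XML use plain ASCII double quotes, so a pattern containing curly quotes never matches anything. A small Python check, using a made-up one-line style element (the URL and range are illustrative only):

```python
import re

# A one-line link style in the form the .gramps XML uses (made-up URL and range).
STYLE = '<style name="link" value="https://example.org/"><range start="0" end="5"/></style>'
TEXT = STYLE + "\n" + STYLE  # the same style duplicated, as Note Cleanup left it

STRAIGHT_PATTERN = r'(<style name="link".*?)\s+\1+'
CURLY_PATTERN = r'(<style name=\u201clink\u201d.*?)\s+\1+'  # curly quotes, as the forum displayed them

print(bool(re.search(STRAIGHT_PATTERN, TEXT, re.DOTALL)))  # True
print(bool(re.search(CURLY_PATTERN, TEXT, re.DOTALL)))     # False
```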