GRAMPS: AIO64-5.1.4-1 on W10pro. I need a bit of help taming the Python implementation of regex within Gramps.
I am trying to set up a filter to find (in notes) particular data in title case, i.e. lower case with an initial capital. For certain instances, I want to reformat these to all upper case. I do not want to do it globally, but at the moment I cannot get it to work for any single string. I want to set up a custom filter into which I enter my target string, and can then work through the results to tidy up the data so there is consistent case-formatting.
It seems that the routines called by Gramps parse all text on a case-insensitive basis, so that if I search for e.g. "\bSmith", with "use regular expressions" checked, I still get results containing all the instances of "SMITH", which are already the vast majority of instances within my data, and also why I want a filter to find the very much smaller number of title-case instances so they can be fixed.
Even if I construct a regex using \u0000 to define each of the characters I am looking for, in its appropriate case, Gramps still returns all instances on a case-insensitive basis.
Is there some way of forcing a Gramps regex to be case-sensitive?
It is perfectly understandable that some Gramps filtering (e.g. on names of people) needs to be case insensitive.
In the case of names of people, there are preferences to display names in a variety of formats, where the available name formats not only consist of various name elements presented in a user-editable sequence, but where also the elements can be set to appear in uppercase, or [according to the Display name editor] otherwise appear literally as entered into the database. So it is very likely that many databases will contain name data in a variety of input case formats.
So the name display options enable consistent output of names, whatever the case formatting of the name elements at the time of entry of data for different people. Which for that purpose is a good thing.
But it is not at all clear why there would be no option for case sensitive filtering within the strings of descriptive fields (such as an event description), or of the text in notes.
If nothing else, Gramps needs to provide a practical mechanism for a user to format the names that appear within event descriptions and notes consistent with whatever name format they have selected for display purposes.
I do not know my way around Python code, and I certainly have not worked out which bits of code do the regex filtering (from the sidebar view) for the Events or Notes views. But I do notice that some .py files within my Gramps installation, apparently linked to filtering, include keywords such as "case insensitive" or "ignorecase", and they contain a number of variables such as case_sensitive which can (presumably) be set true or false (e.g. in C:\Program Files\GrampsAIO64-5.1.4\gramps\gen\lib\baseobj.py ).
It is not obvious to me that Gramps was deliberately engineered to be completely case-insensitive for all filtering. It occurs to me there might be a bug, but it is perhaps more likely that inadvertent choices made for the majority of filtering instances have had the effect of disabling case sensitivity elsewhere (perhaps everywhere)?
Unless someone who is more adept at understanding Python code than I am (I am not setting the bar very high!) can verify there is a bug somewhere in the Gramps code relating to forcing all filtering to case-insensitivity (in which case I will file a bug report), I propose to file an enhancement request to enable case-sensitivity for at least the description field associated with events, and for notes. Any other comments?
There is another thread started in February that is trying to explore which dialect of RegEx is supported by Gramps:
And this forum thread was mentioned there. But maybe you can enlist a developer's help to try using the 3rd Party regex library instead of Python's native re library?
It would need some performance testing. And verification that it actually expands the case-sensitivity controls for Unicode pattern matching.
The SuperTool add-on might also be an alternative for a deeper level of control than the re library for Python offered.
Thanks Brian, I have already looked at the pypi regex package, which does look promising. I have commenced working out how I might test it. But I need to proceed with great caution.
I only noticed your earlier post about which flavour of regex Gramps was currently using after I had posted my original request. As an aside, the Discourse suggestion mechanism about existing "similar" posts is not at all impressive: despite the explicit keyword regex in both our posts, yours was not offered as being similar, but a list of completely and inexplicably irrelevant ones was pushed at me nevertheless, which at the time was a major distraction.
It seems that despite advice to the contrary, the Gramps regex implementation is different from most recent Python ones, in that Gramps appears to default to case insensitivity, which as far as I can tell is the reverse of the usual default.
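For comparison, here is a minimal sketch of how Python's standard `re` module behaves outside Gramps. The sample strings are made up for illustration, and nothing here calls Gramps code.

```python
import re

pattern = r"\bSmith\b"
samples = ["John Smith", "John SMITH", "john smith"]

# Default: matching is case-sensitive, so only the title-case form is found.
default_hits = [s for s in samples if re.search(pattern, s)]

# With re.IGNORECASE (alias re.I), every case variant is found.
ignorecase_hits = [s for s in samples if re.search(pattern, s, re.IGNORECASE)]

print(default_hits)
print(ignorecase_hits)
```

If Gramps returns the second kind of result for a pattern like this, that suggests the flag is being applied somewhere before the search is run.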
Knowing about the perhaps idiosyncratic Gramps behaviour does help, but unfortunately invoking case sensitivity in Gramps currently seems to be completely inaccessible.
I will also undertake some more testing of the Supertool addon, but I still have a big learning curve ahead of me on that front. I doubt that it will help much, since as far as I can tell it does not assemble a list from which the Gramps event or note editor can be invoked iteratively. The nature of the changes I need to make to the case-formatting of various parts of the text data in descriptions or notes is such that it is not likely the process could be automated (even if I had the regex experience to code for replacement and then to commit changes). Rather, I need a mechanism to locate objects in which at least one (and often many more than one) instance of wrongly-formatted names exist. Locating the object is the first step, then all the many (likely different) names in that text object can be edited in a single though inevitably manual pass. A typical example is a transcript of a long and potentially complicated newspaper article which was entered into Gramps at a time when I was using a different Gramps name display format, and with data entry formatting matching the name display format of name objects at that time. But I have subsequently changed my preferred name display format. Now that I have settled on a preferred name format, I need to update the formatting of text within the descriptions and the notes so they are consistent with the display formatting of names of people in the db.
At the very least it would be helpful if the Gramps documentation stated that some "standard" Python regex behaviour such as switching case sensitivity on or off is NOT accessible in Gramps! I have wasted a huge amount of time fruitlessly trying to get it to work; I just hope others don't have to go through all that again. When I have a bit more clarity about the current limits, I will have a go at updating that part of the wiki to at least give people a warning.
I suspect an enhancement request might be needed to flag this, as it doesn't look like case-sensitivity will be enabled very quickly.
Improving the RegEx documentation is why the thread was started. I lack the interest to explore that feature in Gramps. (There are MANY other features that beckon more stridently.) But I wanted the Wiki to provide better leads on its use. So questions were asked instead of exploring aimlessly.
@SNoiraud pointed out that enabling the Regular Expression option converts the Filter Gramplet's Name search into looking for a "phrase" instead of the much slower "all words". This is not something the typical user would suspect.
When we use regexp, if you use for example (a|b)
a means only a and not A.
If you want to use a or A, you must use: ([aA]|[bB])
This is how lexeme research works.
If you are looking for Axel: axel and AXEL do not match and this is normal.
If you really want this, the only solution would be to use re.IGNORECASE but this must be an additional option because not everyone wants to use it.
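A small sketch of the three variants described above, using Python's standard `re` module on its own (not through Gramps). The helper `matches` and the sample list are hypothetical names used only for this illustration.

```python
import re

letters = ["a", "A", "b", "B"]


def matches(pattern, flags=0):
    """Return the sample strings that the whole pattern matches."""
    return [s for s in letters if re.fullmatch(pattern, s, flags)]


lower_only = matches(r"(a|b)")                  # plain alternation
spelled_out = matches(r"([aA]|[bB])")           # both cases written by hand
with_flag = matches(r"(a|b)", re.IGNORECASE)    # both cases via the flag

print(lower_only)
print(spelled_out)
print(with_flag)
```

The character-class form and the flag give the same result here; the difference is that the class form lets the pattern author choose case handling letter by letter.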
But Serge, at least on my install (AIO64-5.1.4-1 on W10pro), that is NOT how Gramps actually behaves!
Filtering for Axel or AXEL or axel all produce identical results!
If (in the People view sidebar) I enter "axel" or "AXEL", with "use regular expressions" enabled, all the result instances I get are as "Axel" (and "Axelina"), and in my data, I do not find a single instance of "axel" or "AXEL" (but all filters using different case patterns give me the same "Axel" & "Axelina" results).
In the people view, this is potentially complicated by the existence of name display formatting which might override the input case of the entries.
In the description field of events, or in the text of notes, exactly the same behaviour is evident, so for the same filter as for the people view, and again with "use regular expressions" enabled, all result instances are as "Axel" (and in my particular Event.Description data, also of "WAXELL"), but again none of my results is as "axel" or "AXEL", regardless of the case of the filter pattern.
So the evidence appears to be that Gramps does something like an upper() on both the pattern and the target before it conducts a regex, or else the regex is always case-insensitive.
I would love to know how a filter in either the Events Description field, or the Notes text, can be made actually selective as to case(s) of the target string.
No: if my pattern is Axel, I want only "Axel" on a case-sensitive basis, not AXEL, not axel, not aXEL, not aXel, etc.
I can exclude e.g. WAxell or Axellina, which I also do not want, if I use a word boundary "\b" before & after my pattern string, as in "\bAxel\b".
But my main problem is that I cannot currently filter on a case-sensitive basis to distinguish between "Axel" and "AXEL".
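To make the wanted behaviour concrete, here is a sketch in plain Python (standard `re` module, outside Gramps). The event descriptions are invented sample data; the three patterns correspond to the wanted case-sensitive search, the same search with case ignored, and a bare search with case ignored and no word boundaries.

```python
import re

descriptions = [
    "Baptism of Axel Berg",
    "Burial of AXEL BERG",
    "Marriage of Axelina Berg",
    "Census entry for WAXELL household",
]


def hits(pattern, flags=0):
    """Return the descriptions in which the pattern is found."""
    rx = re.compile(pattern, flags)
    return [d for d in descriptions if rx.search(d)]


# Wanted: title-case "Axel" as a whole word only.
wanted = hits(r"\bAxel\b")

# Word boundaries still work with case ignored, but "AXEL" comes back too.
bounded_ignorecase = hits(r"\bAxel\b", re.IGNORECASE)

# No boundaries and case ignored: every sample description matches.
loose = hits(r"axel", re.IGNORECASE)

print(wanted)
print(bounded_ignorecase)
print(loose)
```

Only the first list is the short work list of title-case entries that still need fixing by hand.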
Gentlemen;
Gramps is using regex, and as Serge points out is using the case insensitive option "re.I". Deleting that bit in the requestprepare method of _rule.py would make all the filters that use the rule.requestprepare without overriding them to become sensitive to case. I for one would not want that as for most uses the searches should return results regardless of case. And I think we already have too many options...
If you want to create a specialized filter that is case sensitive, you could create an addon filter that overrides the requestprepare method, leaving out the "re.I" option. Maybe something like CaseSensRegExpName.
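The shape of that idea, as a rough sketch only. `StandInRule` below is a hypothetical stand-in written for this illustration, NOT the real Gramps Rule class from _rule.py; the real method's signature, attributes and body may well differ, so an actual addon would have to be written against the Gramps source.

```python
import re


class StandInRule:
    """Hypothetical stand-in for a Gramps filter rule (not the real class)."""

    def __init__(self, patterns):
        self.patterns = patterns  # regex strings entered by the user
        self.regex = []

    def requestprepare(self):
        # Mirrors the behaviour described above: patterns compiled with re.I.
        self.regex = [re.compile(p, re.I) for p in self.patterns]

    def matches(self, text):
        """True if every compiled pattern is found in the text."""
        return all(rx.search(text) for rx in self.regex)


class CaseSensStandInRule(StandInRule):
    """Same rule, but the override leaves out re.I."""

    def requestprepare(self):
        self.regex = [re.compile(p) for p in self.patterns]


default_rule = StandInRule([r"\bAxel\b"])
default_rule.requestprepare()

case_sens_rule = CaseSensStandInRule([r"\bAxel\b"])
case_sens_rule.requestprepare()

print(default_rule.matches("Burial of AXEL BERG"))
print(case_sens_rule.matches("Burial of AXEL BERG"))
print(case_sens_rule.matches("Baptism of Axel Berg"))
```

Overriding only the preparation step leaves the default rule, and every filter built on it, case-insensitive as before.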
As to why we do it this way, you would have to ask the authors from 11 years ago...
Maybe a separate case-sensitive RegEx Filter tool that (quickly) Tags? Maybe based on the Addon:AddRemoveTagTool?
(If the option was a Gramplet, like a modified RegEx version of the Filter Gramplet, it could filter the View results. But its extra functionality could not be extended to filters in the export/reports/etc. unless the view results are Tagged. But tagging an extended selection in the results of a view has excess refreshes, making the tagging process incredibly slow. The Addon:AddRemoveTagTool bypasses the extraneous refreshes.)
Paul, I agree that the default insensitive behaviour should not be removed.
You overlook how you & Serge have already contributed to better documentation of Gramps, because your expertise was needed to confirm that indeed the current Gramps behaviour is always case insensitive. I will update the wiki so at least other people are made aware of that, and don't waste time attempting case-sensitive filtering!