Making Gramps Filters Faster, and Then Superfast

I thought I’d describe the status of the work on JSON and on speeding up Gramps’ filters. None of these steps have been incorporated into the code yet; they are still being reviewed. The steps so far:

  1. Convert internal database format to JSON, an expressive format - under review
  2. Create a method for developers to write fast code without changing core code - under review
  3. Refactor the filter system - under review
  4. Optimize the filter system - under review (included in 3)
  5. Add “business logic” where appropriate - not started

I want to describe steps (3) and (4). First, some background: the filter system is very flexible and well designed, and is at least 18 years old. At its heart is simplicity. Here is some pseudo-code to show how it works. Every filter is composed of a set of rules:

results = []
for person in db.get_all_people():    # builds a full Person() object for every record
    if all(rule.apply(person) for rule in rules):
        results.append(person.handle)
return results

That’s the basics anyway. One of the most expensive parts is hidden in get_all_people(): turning Gramps’ raw JSON data from the database into a full Person() object is expensive, and it is better if we can avoid it. So, the main part of step (3) is to remove that conversion, using the new JSON data directly instead of objects, changing the code to something like:

results = []
for data in db.get_all_people_data():    # raw JSON data, no Person() created
    if all(rule.apply(data) for rule in rules):
        results.append(data["handle"])
return results

That seems like a very small change, but the original filter system called the above code recursively, so the Person() object was getting re-created over and over. With just this refactoring step, every filter will run anywhere from about 30% to 200% faster. (Some rules still need a Person() object, and they will create one, but only if they need it, and steps are taken so that it is never re-created once made.)
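
To make that last point concrete, here is a minimal sketch of the lazy, create-once idea. make_person() and LazyPerson are illustrative names standing in for whatever converts the raw JSON data into a full Person() object; this is not the actual Gramps API.

class LazyPerson:
    """Wrap raw JSON data; build the heavy Person() only on first use."""

    def __init__(self, data):
        self.data = data              # cheap: the raw dict from the database
        self._person = None           # heavy object, not built yet

    @property
    def person(self):
        if self._person is None:      # build at most once
            self._person = make_person(self.data)   # hypothetical converter
        return self._person

# A rule that can work from raw data uses wrapper.data and never pays the
# conversion cost; a rule that needs the full object uses wrapper.person
# and pays it only once per person.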

But we can do even better. Consider that we want to filter on all people tagged with the “ToDo” tag. There are only 5 people tagged “ToDo” but we have to go through all of the people to find them. Let’s say there are 10,000 people in the database.

Gramps has already addressed this scenario in the filters. Some rules, like isAncestorOf(), pre-compute the ancestors, and so the rule merely checks whether a person’s handle is in the pre-computed set of handles (called rule.map by convention). The big insight was to use these maps rather than going through the entire database.
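
As a rough sketch of that convention (simplified, not the exact Gramps rule interface), a rule can fill its map once in a prepare step and then make apply() a cheap membership test. db.get_handles_for_tag() below is a hypothetical helper.

class HasTagRule:
    """Simplified sketch of a rule that pre-computes rule.map."""

    def __init__(self, tag_name):
        self.tag_name = tag_name
        self.map = set()                   # pre-selected handles

    def prepare(self, db):
        # One pass over the small tag table, instead of per-person work.
        self.map = set(db.get_handles_for_tag(self.tag_name))

    def apply(self, data):
        return data["handle"] in self.map  # cheap set-membership test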

The optimized version now looks something like:

results = []
handles = get_maps(rules) # recursively searches filters/rules
for handle in handles: # only 5 handles in the HasTag example
    data = db.get_person_data_from_handle(handle)
    if all(rule.apply(data) for rule in rules): # check all rules
        results.append(data["handle"])
return results

If rules like HasTag() or isAncestorOf() have the appropriate map (pre-selected handles), then the loop goes from having to look at 10,000 people to looking at only 5. Superfast!

Step (4) is a bit more complex because, in Gramps, a rule can actually be another filter with its own rules, and so on; but at the very bottom is a set of rules. Also, some of these filters can be inverted: that is, the map is actually a set of handles that should not be considered. In addition, rules can be combined using “and”, “or”, or “only one”. So the optimization has to pick the right maps, otherwise it could produce incorrect results, and wrong is worse than slow! Lots of testing remains to be done to prove this code is correct.
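
To make the “pick the right maps” problem concrete, here is a hedged sketch (not the final implementation) of how candidate handle sets might be combined. It assumes an invert flag on inverted rules; since an inverted rule’s map lists handles to exclude, it cannot safely shrink the candidate set on its own. Note that the main loop still applies every rule to each candidate, which is what keeps the “and” case correct even when some rules have no map.

def candidate_handles(rules, logical_op, all_handles):
    """Sketch: choose the smallest safe set of handles to scan."""
    maps = [rule.map for rule in rules
            if getattr(rule, "map", None) and not getattr(rule, "invert", False)]
    if not maps:
        return all_handles                    # no usable map: full scan
    if logical_op == "and":
        # Every rule must match, so intersecting the available maps is safe;
        # the main loop still applies every rule to each candidate.
        return set.intersection(*maps)
    if logical_op == "or":
        # A union is only safe if *every* rule contributed a map; otherwise
        # an unmapped rule could match someone outside the union.
        if len(maps) == len(rules):
            return set.union(*maps)
        return all_handles
    # "only one" (exactly one rule matches) is trickier: fall back for now.
    return all_handles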

There is also a caveat: creating maps takes memory, so we have to be careful about how and when we create them. It is a trade-off of memory for speed. It may not be appropriate for all uses, so we may want a checkbox to turn it on and off.

Finally, why did the above code go for 18 years without being revised? The code is fairly complicated, but it worked. I suspect that most developers don’t have trees with 10,000 or more people in them, so the old system was fast enough. Also, it wasn’t clear what the problem was. Step (3) fixes the implementation, and step (4) does something new.

I hope that this was useful. Do you have any questions?


Thank you for looking at these internals of Gramps.
I hope there is someone who could speed up the way screens are loaded when they are opened. One that bugs me every day is the loading of all the sources/citations when the window is opened.

Case: new event is created, new citation added.

Problem: When you select the sources tab, your entire source/citation database table is loaded. For me that takes 2 seconds, and it is getting longer as more are added.

Fix: As a minimum, only the sources that will fill the screen should be loaded, with more loaded as the screen is scrolled. If a search is used, search the table in the DB using your new methods and return the results. If the user selects a source in the window, they can expand it and the screen will populate with the required citation rows.
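
For illustration only, a rough sketch of the kind of paged loading meant here; get_source_handles() and get_source_data() are hypothetical helpers, and the sketch assumes the handle list is already in display order:

PAGE_SIZE = 50   # roughly one screenful of rows

def load_source_page(db, page=0):
    """Fetch only enough source rows to fill the visible screen."""
    handles = db.get_source_handles()        # lightweight handle list, assumed sorted
    start = page * PAGE_SIZE
    return [db.get_source_data(handle)
            for handle in handles[start:start + PAGE_SIZE]]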

@Davesellers, I feel your pain. In fact, this was the first thing I looked at earlier this year to try to speed up Gramps. But this is almost impossible to fix with the current architecture. Why? Because the full objects are needed (in many cases) in order to sort them. You can’t load just a page unless the items have already been sorted. New implementations of Gramps (like gramps-web) can re-think this basic operation (get page of sorted data) and operate as you describe.

Not to say that there aren’t gains to be made, but they are unlikely to be fast like the ones for filters.

Kari created an experimental Recent Items library addon. It adds a small (6 item) table at the top of the object selector dialog.

This addon does not reduce the re-populate latency for the dialog. But it does dramatically reduce the navigation time when repeatedly selecting the same Citation or Place during data entry.


I don’t think this would help me much. I use the clipboard as much as possible. However, I haven’t found a way to use it when adding citations because they are so diverse.
Instead of all the sources and citations being loaded when the window opens, why not load just the sources (for me this is about 200)? These can be sorted quickly. Then, if you expand a source, those rows are loaded and sorted. A search should only return a few rows and can be sorted quickly. This needs to be broken down into smaller chunks as DBs grow in size.

I guess I should say that in my professional life I was used to working with SQL data and indexed tables, and Gramps is not there yet. :weary:

It isn’t exactly that. A lot of code has been written in order to be able to sort on complicated criteria. Take names, for example. How do you sort a name? There are many issues, like whether you have prefixes, suffixes, and name preference types. Does “von Trapp” appear under T or under V? Many people care about such details. The only way to do that in SQL (or another backend) is to re-do that logic in SQL.
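
As a toy illustration of the kind of logic that would have to be re-done in SQL (the field names are made up, not the real Gramps name model):

def name_sort_key(name, prefix_as_primary=False):
    """Sort key for a name dict like {"prefix": "von", "surname": "Trapp", "first": "Georg"}."""
    if prefix_as_primary:
        primary = f'{name["prefix"]} {name["surname"]}'.strip()   # "von Trapp" files under V
    else:
        primary = name["surname"]                                 # "von Trapp" files under T
    return (primary.lower(), name["first"].lower())

people = [{"prefix": "von", "surname": "Trapp", "first": "Georg"},
          {"prefix": "", "surname": "Smith", "first": "Ann"}]
people.sort(key=name_sort_key)    # Smith sorts before Trapp with the default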

Anyway, this thread is about filter optimization.
