I thought I’d describe the status of work on JSON and speeding Gramps’ filters up. None of these steps have been incorporated into the code yet and are still being reviewed. The steps so far:
- Convert internal database format to JSON, an expressive format - under review
- Create a method for developers to write fast code without changing core code - under review
- Refactor the filter system - under review
- Optimize the filter system - under review (included in 3)
- Add “business logic” where appropriate - not started
I want to describe steps (3) and (4). First, the filter system is very well designed in its flexibility, and is at least 18 years old. At its heart is simplicity. Here is some pseudo-code to show how it works. Every filter is composed of a set of rules:
results = []
for person in db.get_all_people():
    if all(rule.apply(person) for rule in rules):
        results.append(person.handle)
return results
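To make the rule idea concrete, here is a minimal runnable sketch. The class names and the dict-shaped person records are illustrative stand-ins, not the actual Gramps API:

```python
# Hypothetical rules: each one just implements apply(person).
class IsFemale:
    """Matches people whose gender field is "F"."""
    def apply(self, person):
        return person["gender"] == "F"

class HasSurname:
    """Matches people with a given surname."""
    def __init__(self, surname):
        self.surname = surname

    def apply(self, person):
        return person["surname"] == self.surname

# A filter is just a list of rules; a person matches when all rules do.
people = [
    {"handle": "a1", "gender": "F", "surname": "Smith"},
    {"handle": "b2", "gender": "M", "surname": "Smith"},
    {"handle": "c3", "gender": "F", "surname": "Jones"},
]
rules = [IsFemale(), HasSurname("Smith")]
results = [p["handle"] for p in people if all(r.apply(p) for r in rules)]
print(results)  # ['a1']
```

The flexibility comes from the fact that any object with an `apply()` method can slot into the rule list.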
That’s the basics anyway. One of the most expensive parts is hidden in “get_all_people”: turning Gramps’ raw JSON data from the database into a full Person() object is expensive, and it is better if we don’t have to do it. So, the main part of step (3) is to remove that step, using the new JSON data instead of objects, changing the code to something like:
results = []
for data in db.get_all_people_data():
    if all(rule.apply(data) for rule in rules):
        results.append(data["handle"])
return results
That seems like a very small change, but the original filter system called the above code recursively, so the Person() object was getting re-created over and over. With just this refactoring step, every filter will run anywhere from about 30% to 200% faster. (Some rules still need a Person() object, and they’ll make one, but only if they need to, and steps are taken to never re-create it once made.)
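The “make the Person() only if needed, and never twice” idea can be sketched with a lazy wrapper around the raw data. `Person` and `RowContext` here are stand-ins I made up for illustration, not the real Gramps classes:

```python
build_count = 0  # track how often the expensive construction runs

class Person:
    """Stand-in for the expensive-to-build Gramps object."""
    def __init__(self, data):
        global build_count
        build_count += 1
        self.handle = data["handle"]

class RowContext:
    """Carries the raw JSON data plus a lazily built Person object."""
    def __init__(self, data):
        self.data = data
        self._person = None

    @property
    def person(self):
        # Build the heavy object at most once, then reuse it.
        if self._person is None:
            self._person = Person(self.data)
        return self._person

row = RowContext({"handle": "a1"})
_ = row.person   # first access builds the Person
_ = row.person   # second access reuses the cached one
print(build_count)  # 1
```

Rules that can work from the raw data never trigger the build at all; rules that need the full object share a single construction.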
But we can do even better. Consider that we want to filter on all people tagged with the “ToDo” tag. There are only 5 people tagged “ToDo” but we have to go through all of the people to find them. Let’s say there are 10,000 people in the database.
Gramps has already addressed this scenario in the filters. Some rules, like isAncestorOf(), pre-compute the ancestors, so the rule merely checks whether a person’s handle is in the pre-computed set of handles (called rule.map by convention). The big insight was to use these maps rather than going through the entire database.
The optimized version now looks something like:
results = []
handles = get_maps(rules)  # recursively searches filters/rules
for handle in handles:  # only 5 handles in the HasTag example
    data = db.get_person_data_from_handle(handle)
    if all(rule.apply(data) for rule in rules):  # check all rules
        results.append(data["handle"])
return results
If rules like HasTag() or isAncestorOf() have the appropriate map (pre-selected handles), then the loop goes from having to look at 10,000 people to looking at only 5. Superfast!
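Here is a small sketch of how a map-bearing rule might look. The HasTag implementation and the dict-of-dicts database are assumptions for illustration, not the actual Gramps code:

```python
# Toy "database": handle -> raw person data.
db = {
    "a1": {"handle": "a1", "tags": ["ToDo"]},
    "b2": {"handle": "b2", "tags": []},
    "c3": {"handle": "c3", "tags": ["ToDo", "Done"]},
}

class HasTag:
    """Hypothetical rule that pre-computes its matches into rule.map."""
    def __init__(self, db, tag):
        # Pre-compute once: the set of handles carrying this tag.
        self.map = {h for h, data in db.items() if tag in data["tags"]}

    def apply(self, data):
        return data["handle"] in self.map

rule = HasTag(db, "ToDo")
# Instead of scanning the whole database, iterate only over the
# pre-selected handles from the map:
results = [db[h]["handle"] for h in rule.map if rule.apply(db[h])]
print(sorted(results))  # ['a1', 'c3']
```

The loop now touches only as many rows as there are handles in the map, regardless of the size of the database.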
Step (4) is a bit complex because in Gramps, a rule can actually be another filter with rules, and so on; but at the very bottom is a set of rules. Also, some of these filters can be inverted; that is, the map is actually a set of handles that should not be considered. In addition, rules can be combined using “and”, “or”, or “only one”. So the optimization has to get the right maps, otherwise it could lead to incorrect results, and wrong is worse than slow! Lots of testing needs to be done to prove this code is correct.
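One way to picture how the maps combine under those operators is plain set algebra. This is a sketch of the idea using Python sets, not the actual Gramps implementation, and the two-rule “only one” case is shown as a symmetric difference:

```python
map_a = {"a1", "b2", "c3"}   # handles matched by rule A
map_b = {"b2", "c3", "d4"}   # handles matched by rule B

and_candidates = map_a & map_b   # "and": intersection of the maps
or_candidates = map_a | map_b    # "or": union of the maps
only_one = map_a ^ map_b         # "only one" (two rules): symmetric difference

# An inverted map lists handles to *exclude*, so it can only shrink a
# candidate set; it can never supply the candidates on its own.
inverted = {"c3"}
candidates = and_candidates - inverted

print(sorted(candidates))  # ['b2']
```

Getting the combination wrong (say, taking a union where an intersection is needed, or treating an inverted map as a source of candidates) is exactly how the optimization could return incorrect results.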
There is also a caveat: creating maps takes memory, so we have to be careful about how and when we create them. There is a trade-off of speed for memory. It may not be appropriate for all uses, and we may want to add a checkbox to turn it on or off.
Finally, why did the above code go for 18 years without being revised? The code is fairly complicated, but it worked. I suspect that most developers don’t have trees with 10,000 or more people in them, and the old system was fast enough. Also, it wasn’t clear what the problem was. Step (3) fixes the implementation, and step (4) does something new.
I hope that this was useful. Do you have any questions?