Developers,
I think the most important question for the next version of Gramps is: how can Gramps take advantage of JSON for large database speedups? Now that the JSON format is settled in Gramps 6.0, I think it is time to start planning.
This question implies the following constraints:
- The code needs to allow implementations in non-SQL backends
- It should be robust and maintainable, and not too much code
- It should not introduce non-Pythonic interfaces
- It should not “look like SQL” to the developer
(This list is a summary of discussions about this topic over at least the last 7 years. These are not just my preferences, but those from the developer community.)
I proposed the idea in a bit of detail (shown below) in January 2025 on the Gramps Developers mailing list.
There wasn’t much feedback, but Tim Lyons asked:
“I’m not entirely sure how much something like this would actually be used in the main Gramps code.”
I’ve been working on this for the last few weeks, and I think I have figured out the tricky parts, not just in implementation, but in how it must be properly used in the Gramps environment. The first big integration hurdle is that:
- You can’t use low-level database access for speed, if you are using a Gramps proxy
A “proxy” in Gramps terminology is a Python wrapper around a database. It does things like remove sensitive detail if an object is marked Private.
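To make the hurdle concrete, here is a minimal sketch of stacked proxies. The class names (RawDb, PrivateProxy, HandleFilterProxy) and the raw_rows() method are hypothetical stand-ins for illustration, not the actual Gramps classes; people are plain dicts to keep it short.

```python
class RawDb:
    """Hypothetical stand-in for a database backend."""

    def __init__(self, people):
        self._people = {p["handle"]: p for p in people}

    def get_person_from_handle(self, handle):
        return self._people.get(handle)

    def raw_rows(self):
        """Low-level access: returns stored data with no proxy filtering."""
        return list(self._people.values())


class PrivateProxy:
    """Hypothetical proxy that hides people marked private."""

    def __init__(self, db):
        self.db = db

    def get_person_from_handle(self, handle):
        person = self.db.get_person_from_handle(handle)
        if person is None or person["private"]:
            return None
        return person


class HandleFilterProxy:
    """Hypothetical proxy that only exposes an allowed set of handles."""

    def __init__(self, db, allowed):
        self.db = db
        self.allowed = set(allowed)

    def get_person_from_handle(self, handle):
        if handle not in self.allowed:
            return None
        return self.db.get_person_from_handle(handle)


people = [
    {"handle": "h1", "private": False},
    {"handle": "h2", "private": True},
    {"handle": "h3", "private": False},
]
raw = RawDb(people)
# Proxies stack: each one wraps the layer below it.
stacked = HandleFilterProxy(PrivateProxy(raw), allowed={"h1", "h2"})
```

Going through the stack, h2 is hidden because it is private and h3 because it is not in the allowed set. Calling raw.raw_rows() still returns all three people, which is why low-level access and proxies don't mix.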
On the other hand, an IsPrivate filter is very simple. Here is a simplified example:
def apply_to_one(self, db, person) -> bool:
    return person.private
It says “include the object in the selected items if person.private is True”, and it works even if there are proxies in place (proxies can stack on top of one another). So, how can we make that code any faster? Well, the filter requires that you go through each and every Person in the database to check whether they are private. That is slow because it happens in Python. If we could move the selection into, say, SQL, then the sweep over the database would happen in SQL rather than in Python (fast!).
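As a rough illustration of the difference, here is a sketch using Python's built-in sqlite3 module. The table layout (a person table with handle and json_data columns) is an assumption for this example, not necessarily the real Gramps schema, and it relies on SQLite's JSON functions being available in your Python build.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed layout for illustration: one JSON document per person.
conn.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, json_data TEXT)")
people = [
    {"handle": "h1", "private": False},
    {"handle": "h2", "private": True},
    {"handle": "h3", "private": False},
    {"handle": "h4", "private": True},
]
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [(p["handle"], json.dumps(p)) for p in people],
)


def sweep_in_python(conn):
    """Load and decode every row, then test each object in Python."""
    selected = []
    for handle, raw in conn.execute("SELECT handle, json_data FROM person"):
        if json.loads(raw)["private"]:
            selected.append(handle)
    return sorted(selected)


def select_in_sql(conn):
    """Let SQLite test the JSON field; only matching handles come back."""
    rows = conn.execute(
        "SELECT handle FROM person "
        "WHERE json_extract(json_data, '$.private') = 1"
    )
    return sorted(handle for (handle,) in rows)
```

Both functions return the same handles, but the second never decodes a JSON document in Python, and that is where the time goes on a large tree.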
Ok, here is the IsPrivate filter using the proposed API (simplified for demonstration):
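def prepare(self, db):
    # Ask the backend for the matching handles once, up front
    if db.can_use_fast_selects():
        self.selected_handles = db.select_from_person(
            what="person.handle",
            where="person.private"
        )

def apply_to_one(self, db, person) -> bool:
    if db.can_use_fast_selects():
        # Fast path: membership test against the preselected handles
        return person.handle in self.selected_handles
    else:
        # Fallback: the original per-object check
        return person.private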
Notes:
- The db.can_use_fast_selects() checks two things: is the select_from_table implemented in this database backend, and are we not using a proxy? If both are true, then we can use select_from_person()
- If it can’t use fast selects, then it defaults to the old implementation
- The what and where parameters are strings of Python code. You can use most simple Python syntax expressions that you would use regularly.
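The notes above can be sketched as a small runnable model. Everything here is a hypothetical stand-in: FastDb, ProxyDb, iter_people(), run_filter() and the free function can_use_fast_selects() are illustrative only, and the toy backend only understands dotted paths like "person.private", not general Python expressions.

```python
from types import SimpleNamespace


def resolve(path, person):
    """Follow a dotted path such as 'person.handle' (all this toy supports)."""
    root, *attrs = path.split(".")
    if root != "person":
        raise ValueError(f"unsupported path: {path}")
    value = person
    for attr in attrs:
        value = getattr(value, attr)
    return value


class FastDb:
    """Hypothetical backend that implements select_from_person."""

    def __init__(self, people):
        self.people = list(people)

    def iter_people(self):
        return iter(self.people)

    def select_from_person(self, what, where):
        return [resolve(what, p) for p in self.people if resolve(where, p)]


class ProxyDb:
    """Hypothetical proxy: wraps a database, offers no fast selects."""

    def __init__(self, db):
        self.db = db

    def iter_people(self):
        return self.db.iter_people()


def can_use_fast_selects(db) -> bool:
    # Assumed check: the method exists and we are not behind a proxy.
    implemented = callable(getattr(db, "select_from_person", None))
    return implemented and not isinstance(db, ProxyDb)


class IsPrivate:
    def prepare(self, db):
        self.selected_handles = None
        if can_use_fast_selects(db):
            self.selected_handles = set(
                db.select_from_person(what="person.handle", where="person.private")
            )

    def apply_to_one(self, db, person) -> bool:
        if self.selected_handles is not None:
            return person.handle in self.selected_handles
        return person.private  # old implementation


def run_filter(rule, db):
    rule.prepare(db)
    return sorted(p.handle for p in db.iter_people() if rule.apply_to_one(db, p))


people = [
    SimpleNamespace(handle="h1", private=False),
    SimpleNamespace(handle="h2", private=True),
]
```

Running the rule against FastDb(people) and against ProxyDb(FastDb(people)) gives the same answer; only the path taken differs.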
But how does this avoid sweeping over the entire database? Because in Gramps 6.0 we introduced the filter optimizer, and it knows that if you set selected_handles then you only need to look at those handles, and nothing more. For many filters that only match a few objects, the speedups are large. And by moving the computation of selected_handles into SQL, the speedup is truly amazing.
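Here is a toy model of that idea. It is not the actual Gramps optimizer, just a sketch under the assumption that a rule may expose a selected_handles attribute; CountingDb, IsPrivateRule and apply_filter are hypothetical names.

```python
class CountingDb:
    """Hypothetical database that counts how many objects get loaded."""

    def __init__(self, private_by_handle):
        self._people = dict(private_by_handle)  # handle -> private flag
        self.fetches = 0

    def get_person_handles(self):
        return list(self._people)

    def get_person_from_handle(self, handle):
        self.fetches += 1
        return {"handle": handle, "private": self._people[handle]}


class IsPrivateRule:
    def __init__(self, selected_handles=None):
        # None means "no preselection": the driver must look at everyone.
        self.selected_handles = selected_handles

    def apply_to_one(self, db, person) -> bool:
        return person["private"]


def apply_filter(db, rule):
    """Only visit the preselected handles when the rule provides them."""
    if rule.selected_handles is not None:
        candidates = rule.selected_handles
    else:
        candidates = db.get_person_handles()
    matches = [
        handle
        for handle in candidates
        if rule.apply_to_one(db, db.get_person_from_handle(handle))
    ]
    return sorted(matches)
```

With 1000 people of whom 3 are private, the full sweep loads 1000 objects, while a rule that arrives with the 3 preselected handles loads only those 3.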
One additional nice feature: because the arguments what, where, and order_by can all take strings of Python syntax, it is very easy to use in Gramps Web. In fact, with this, Gramps Web can be much faster because it, too, would not need to sweep over the entire set of objects. (Care has been taken to support different dialects of SQL, including PostgreSQL.)
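For a feel of how a Python expression string could become SQL, here is a minimal sketch built on the standard ast module. It is not the proposed implementation: the to_sql name, the json_data column, and the SQLite-style json_extract output are assumptions, and it handles only attribute paths, simple comparisons, and and/or/not. A real version would need one translator per SQL dialect.

```python
import ast

_COMPARE = {
    ast.Eq: "=", ast.NotEq: "!=",
    ast.Lt: "<", ast.LtE: "<=",
    ast.Gt: ">", ast.GtE: ">=",
}


def to_sql(expr, table="person"):
    """Translate a small subset of Python expressions to (sql, params)."""
    params = []

    def json_path(node):
        # person.a.b  ->  $.a.b
        parts = []
        while isinstance(node, ast.Attribute):
            parts.append(node.attr)
            node = node.value
        if not (isinstance(node, ast.Name) and node.id == table):
            raise ValueError("expression must start with the table name")
        return "$." + ".".join(reversed(parts))

    def visit(node):
        if isinstance(node, ast.Attribute):
            return f"json_extract(json_data, '{json_path(node)}')"
        if isinstance(node, ast.Constant):
            params.append(node.value)  # bound as a parameter, never inlined
            return "?"
        if isinstance(node, ast.BoolOp):
            joiner = " AND " if isinstance(node.op, ast.And) else " OR "
            return "(" + joiner.join(visit(v) for v in node.values) + ")"
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            return f"(NOT {visit(node.operand)})"
        if (
            isinstance(node, ast.Compare)
            and len(node.ops) == 1
            and type(node.ops[0]) in _COMPARE
        ):
            op = _COMPARE[type(node.ops[0])]
            return f"({visit(node.left)} {op} {visit(node.comparators[0])})"
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    return visit(ast.parse(expr, mode="eval").body), params
```

Anything outside the supported subset (function calls, for instance) raises ValueError, so a caller could catch that and fall back to the ordinary Python filter.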
@Nick-Hall, there are a lot of details to discuss, but please keep an open mind on the basic proposal. Any detail can of course be changed. In addition, I think “business logic” functions can be implemented using the select_from_table API.
Please let me know if you have questions or comments!