Gramps Web API, list performance

This post is about performance issues in gramps-web-api, specifically in the list views.

First, my perspective: there is nothing in gramps-web-api that should make it work slower than Gramps Desktop, and in fact there is no reason why gramps-web-api can’t scale up to efficiently handle hundreds of thousands of records (or more).

With that in mind, I’ve been testing both Gramps Desktop and Gramps Web on a test dataset of 100k people. That may be larger than many family trees, but is very helpful in finding places that are too slow in general. So all of the stats below are based on this dataset. Also, I’m going to use the term “Gramps 7” below to refer to a future version of Gramps that could select data using SQL directly.

Let’s start easy: how much time do you think it will take just to list the people marked as “female” (e.g., using the IsFemale filter)?

| Scenario | IsFemale filter (seconds) |
|---|---|
| Gramps 6 | 52 |
| Gramps 7 | 0.63 |
| Gramps Web API | 104 |

This simple query exposes the issues. First, a direct SQL query can perform the task in under a second. Second, gramps-web-api is twice as slow as Gramps 6.

We could work on fine-tuning gramps-web-api to get it closer to Gramps 6, or we could start thinking ahead to how to make it comparable to the Gramps 7 scenario. And because Gramps Web is free of the constraints of Gtk, it could be even faster.
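
To make the "Gramps 7" row concrete, here is a hypothetical sketch of what pushing the IsFemale filter down into SQL could look like. It assumes a `person` table with a `json_data` column holding the serialized person (as newer Gramps SQLite backends store), and the Gramps convention that `Person.FEMALE == 0`; the schema here is a toy stand-in, not the real one.

```python
import sqlite3

# Gramps convention: Person.FEMALE == 0 (assumption stated in the lead-in).
FEMALE = 0

# Toy stand-in for the real Gramps schema: person(handle, json_data).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, json_data TEXT)")
con.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [
        ("h1", '{"gender": 0, "handle": "h1"}'),
        ("h2", '{"gender": 1, "handle": "h2"}'),
        ("h3", '{"gender": 0, "handle": "h3"}'),
    ],
)

# The whole IsFemale filter collapses into one SQL query: the database
# scans (or indexes) the rows instead of Python unpickling every object.
handles = [
    row[0]
    for row in con.execute(
        "SELECT handle FROM person "
        "WHERE json_extract(json_data, '$.gender') = ?",
        (FEMALE,),
    )
]
print(handles)  # → ['h1', 'h3']
```

With an expression index on `json_extract(json_data, '$.gender')`, this stays fast even at hundreds of thousands of rows.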

This first post identifies a problem that I believe can be fixed, and is worth fixing. In my next post I’ll describe my setup for testing this, and why gramps-web-api is even slower than Gramps Desktop.


Here is my setup. It is specific to my machine, because I want to remove as many moving parts as possible: no Docker, and my own SQLite database. PostgreSQL may behave differently (if so, I’d like to know), but I’m assuming it performs about the same.

I’m going to use the following Python packages:

  1. gramps-bench - for creating the test dataset of 100k people
  2. gramps-web-desktop - for easily getting gramps-web running
  3. gramps-api-client - for easily running and timing of the endpoints

Creating a common test set of 100k people:

pip install gramps-bench

gramps-database-generator 100000 --seed 42

This will create a Gramps family tree database called “gen-100000, random-42, version-1”.

Starting gramps-web and gramps-web-api directly:

pip install gramps-web-desktop

gramps-web-desktop "gen-100000, random-42, version-1" username password

Timing tests:

pip install gramps-api-client

Create the API:

from gramps_api_client.api import API

api = API("http://localhost:5000", "username", "password")

Timing tests (in ipython):

%%time
people = api.get_people(page=1)
# Wall time: 8.26 s

%%time
people = api.get_people(rules={"rules": [{"name": "IsFemale"}]}, page=1)
# Wall time: 1min 47s
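
Outside IPython, the same measurements can be reproduced with a small helper around `time.perf_counter`. This is a sketch; the `timed` helper is mine, not part of gramps-api-client, and the `api.get_people` call shown in the comment assumes the client created above.

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# With the client above you would call, e.g.:
#   people, secs = timed(api.get_people, page=1)
# Demonstrated here with a stand-in function:
result, secs = timed(sum, range(1_000_000))
print(result, f"{secs:.3f}s")  # result → 499999500000
```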

Gramps 6 is just the latest released version of Gramps, and Gramps 7 is this branch:

Next, a description of the problem, and possible solutions.


The root of the problem is this code:

It does the following, in order:

  1. Load all of the people objects into memory
  2. Build an index dict
  3. Sort the objects
  4. Apply the rules to the given handles
  5. Possibly sort the results again
  6. Select just one page of objects
  7. Fetch additional data for each matching object
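
The steps above can be sketched with plain dicts in place of Gramps objects. The names (`get_all_people`, `matches_rules`, `list_people`) are hypothetical stand-ins, not the real gramps-web-api code; the point is the shape of the work, which is O(N) in the tree size even though only one page is returned.

```python
def get_all_people():
    # Step 1: load *every* object into memory -- O(N) regardless of page size.
    return [{"handle": f"h{i}", "surname": f"S{i % 7}", "gender": i % 2}
            for i in range(1000)]

def matches_rules(person, rules):
    # Step 4: apply filter rules in Python, one object at a time.
    return all(person.get(k) == v for k, v in rules.items())

def list_people(rules, page, pagesize=20, sort_key="surname"):
    people = get_all_people()                                 # 1. load all
    index = {p["handle"]: p for p in people}                  # 2. index dict
    people.sort(key=lambda p: p[sort_key])                    # 3. sort
    matched = [p for p in people if matches_rules(p, rules)]  # 4. filter
    matched.sort(key=lambda p: p[sort_key])                   # 5. sort again
    start = (page - 1) * pagesize
    return matched[start:start + pagesize]                    # 6. one page
    # 7. (profile/extended data would be fetched here, per object)

page = list_people({"gender": 0}, page=1)
print(len(page))  # → 20
```

Every step before 6 touches all N objects; a SQL-side `WHERE` + `ORDER BY` + `LIMIT` would touch only the page.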

I don’t yet understand why all of these steps are needed, but I do know that their combined cost is twice that of Gramps Desktop.


Thanks a lot for this initiative!

I think it’s really time to improve Gramps Web’s performance. Incidentally, the next release of Gramps Web API (to be released next week) will contain a script with some “typical” API calls like the ones the frontend issues, so we can compare performance across deployments (database backends, tree sizes, etc.). We can also add more benchmarks to this script as we learn which ones are relevant.

A general comment on the list endpoints: yes, making them faster would definitely be great, but we should also keep in mind that there are few occasions where a web app would request all objects, especially if there are thousands of them. The list views in Gramps Web are paged and only fetch 20 items at a time. Slow sorting is of course still an issue in this case, but what I recommend is focusing on reducing the time for sorting and filtering (which is what you are after anyway, I believe), and not focusing too much on overhead that comes from API specifics (like adding the “profile” and “extended” data, which is very slow but happens after pagination). For those reasons, I also don’t think it’s very useful to compare Gramps Desktop with Gramps Web.

A general thought about caching: what I realized some time ago was that Gramps Web was incredibly slow for certain things like relationship calculations, especially when using PostgreSQL (so I think it’s very important not to benchmark with SQLite only), because there is code in Gramps core that contains for loops with a database query in each iteration, sometimes thousands of iterations. This can be acceptable with SQLite on disk, but even a local PostgreSQL server adds an unacceptable penalty, and over a network it becomes deadly. That’s why I added explicit caching for some calls here: Add cached people/families proxy DB to speed up relationship calculation by DavidMStraub · Pull Request #598 · gramps-project/gramps-web-api · GitHub
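
A simplified sketch of the kind of read-through cache that PR adds (not its actual code): a proxy remembers per-handle lookups, so a loop that revisits the same handles costs one database round trip per distinct handle instead of one per iteration. `CountingDB` is a toy stand-in for a real database handle.

```python
class CountingDB:
    """Toy stand-in for a DB where every lookup is a network round trip."""
    def __init__(self):
        self.calls = 0

    def get_person_from_handle(self, handle):
        self.calls += 1  # pretend each call crosses the network
        return {"handle": handle}

class CachedPeopleProxy:
    """Read-through cache in front of per-handle person lookups."""
    def __init__(self, db):
        self._db = db
        self._cache = {}

    def get_person_from_handle(self, handle):
        if handle not in self._cache:
            self._cache[handle] = self._db.get_person_from_handle(handle)
        return self._cache[handle]

db = CountingDB()
proxy = CachedPeopleProxy(db)
# A loop that hits the same handles repeatedly (as relationship
# calculation does) now costs one query per distinct handle:
for _ in range(1000):
    proxy.get_person_from_handle("h1")
    proxy.get_person_from_handle("h2")
print(db.calls)  # → 2
```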

If we can get rid of this again by removing the bottlenecks in Gramps itself, that would of course be much better.

Finally, I think fast filtering (ideally JSON-based) would be super useful for one of the big missing features of Gramps Web API, namely filter-based permissions, see these issues:
