Gramps Performance Testing

As we begin to explore new Gramps Enhancement Proposals (GEPS), it is important to measure the impact that various changes might have on performance.

To that end, I’ve started a new project that will live outside of Gramps core and measure the timing of various database functions. (It should live outside of Gramps because the test suite will be versioned independently of Gramps itself.)

I haven’t seen overall comparisons of particular functions across various versions of Gramps. Here is one showing the 6.0 filter optimizer compared to 5.1 and 5.2.

And here is a test of the speed of transactions (I don’t know what work made such a big difference in 5.2, but nice job, team!):

Another filter optimizer example (one can’t necessarily tell which filters have been optimized, but is-descendant-of has been):

Of course, not everything has gotten faster (this is getting person objects from random handles):

I’ll make a repo soon, keep up-to-date stats on recent versions, and allow developers and others to run the tests themselves. It can also provide data on what needs to be improved.


Introducing gramps-bench: gramps-bench · PyPI, with the source code here: GitHub - dsblank/gramps-bench: Scripts to measure and record Gramps genealogy program benchmarks. This project is designed especially for developers who want to test how different backend versions and settings affect the performance of Gramps’ backends.

Developers: check out the tests in gramps-bench/gramps_bench/performance_tests.py at main · dsblank/gramps-bench · GitHub
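
To give a flavor of what those tests look like, here is a minimal sketch in the style of pytest-benchmark. The db and handles fixtures are hypothetical stand-ins for an opened Gramps database and a list of person handles, not the actual fixtures from performance_tests.py:

import random

def test_get_person_by_handle(benchmark, db, handles):
    # pytest-benchmark's `benchmark` fixture calls the function
    # repeatedly and records min/mean/stddev timing statistics.
    def lookup():
        db.get_person_by_handle(random.choice(handles))
    benchmark(lookup)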

We’ll also have to decide which Gramps family tree to use for standard benchmarks (or create one randomly).

The idea is:

  1. Run gramps-bench to get a baseline
  2. Make some changes to the installed gramps code (let’s call this 6.0.4-t1 for test 1)
  3. Record the benchmark
  4. Display the results

Basically, this works like:

pip install gramps-bench --upgrade

gramps-bench path/to/a/family-tree-file
# make your changes to code
gramps-bench path/to/a/family-tree-file --version 6.0.4-t1

This will create a folder called .benchmarks in the current directory (a standard format used by pytest-benchmark). To see the results:

gramps-bench

and look for a PDF in the folder. There are more details in the README.md. Let me know if there are any issues… this is the initial version.
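
If you want to poke at the raw numbers yourself, the .benchmarks folder holds pytest-benchmark’s standard JSON files. A quick sketch, assuming the default directory layout:

import glob
import json

# pytest-benchmark saves each run as .benchmarks/<machine-id>/<run>.json;
# every file contains a "benchmarks" list with per-test statistics.
for path in sorted(glob.glob(".benchmarks/*/*.json")):
    with open(path) as f:
        data = json.load(f)
    for bench in data["benchmarks"]:
        print(path, bench["name"], bench["stats"]["mean"])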

Next up for me: add the standard benchmark results to the repo so users can easily see and compare them.


If you would like to make a benchmark for your machine/os/version of Python to share with others do this:

git clone git@github.com:dsblank/gramps-bench.git
cd gramps-bench
git checkout -b your-branch-name
cd benchmarks
gramps-bench-all path/to/gramps/example.gramps path/to/gramps_source

and then make a PR for the changes.

In testing some of the suggestions made by AI, I started with a database config for setting SQLite pragma values (called “config” in the chart below). One interesting, big difference is in the db.get_person_by_handle() method:

As you can see, it reduced the time by 50%, back down to version-5 times. However, it doesn’t seem to reduce the time of other lookups. Why? I don’t know yet. They all seem to have indices. Perhaps it is in the ordering of the tests…
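
For context, the kind of thing the config sets looks like this. An illustrative sketch using standard SQLite pragmas; the specific values here are mine, not necessarily the ones tested:

import sqlite3

conn = sqlite3.connect("tree.db")
# Negative cache_size is in KiB, so this is ~64 MB of page cache.
conn.execute("PRAGMA cache_size = -65536")
# Write-ahead logging; also relevant to the parallel-read idea below.
conn.execute("PRAGMA journal_mode = WAL")
# Fewer fsyncs; generally considered safe in combination with WAL.
conn.execute("PRAGMA synchronous = NORMAL")
# Keep temporary tables and indices in RAM.
conn.execute("PRAGMA temp_store = MEMORY")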


Yes, it seems like the SQLite cache space is filling up. I’ll see if there is a solution.


@dsblank Are you using the example database (Garner/Zielinski) for performance testing?

Yes, right now I am using the example.gramps file:

cd gramps-bench/benchmarks
gramps-bench ~/gramps/gramps/example/gramps/example.gramps --version TESTNAME

But we do need to find a larger one and decide on that going forward.

I was thinking about writing code to generate one; that way we could always add the newest objects and object attributes. Or if there is a modern Gramps-specific one that has close to 100k people, that could work.
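
Roughly this kind of thing, as a sketch with plain data structures (not the real Gramps API; the handles and fields here are made up):

import random

def make_random_tree(n_people, n_roots=100, seed=42):
    # Everyone after the first n_roots people gets two parents chosen
    # from earlier people, which keeps the graph acyclic.
    rng = random.Random(seed)
    people = [{"handle": f"P{i:06d}", "parents": []} for i in range(n_people)]
    for i in range(n_roots, n_people):
        father, mother = rng.sample(range(i), 2)
        people[i]["parents"] = [people[father]["handle"],
                                people[mother]["handle"]]
    return people

tree = make_random_tree(100_000)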

We already have several large databases:

https://gramps-project.org/wiki/index.php/Gramps_Performance


That’s true, but:

  1. They are all private and can’t be shared; we need a public tree.
  2. They are all pretty old and don’t contain modern Gramps items.

I think we can algorithmically create a large tree and continue to add new things to it over time. We want to performance-test many methods.

Thanks for pointing to that page! I knew I had done some performance testing in the past. This time we’ll do it right: automate it!


One thing I stumbled upon is that if we use PRAGMA journal_mode = WAL; in the SQLite layer, then we can use parallel processing in places like finding ancestors and descendants.

My AI coding environment says:

WAL mode allows multiple concurrent readers to access the database simultaneously. While SQLite doesn’t support concurrent writes, it does support:

  • Multiple readers at the same time

  • One writer at a time (with readers still allowed)

This could have a very large impact on performance.

Indeed, it can have a large impact. Here is the example Gramps family tree (2,157 people). Finding descendants of random people using a parallel search (using 4 CPUs) is about 40 times faster than in 5.1.
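
The shape of the idea is something like this sketch (not the gramps-bench code; the child_links table is hypothetical, and the database is assumed to already be in WAL mode, which is a persistent setting). Each worker thread opens its own connection; under WAL the readers don’t block each other, and sqlite3 releases the GIL while a query runs:

import sqlite3
from concurrent.futures import ThreadPoolExecutor

DB = "tree.db"

def descendants(handle):
    # One connection per thread, created and used in that thread only.
    conn = sqlite3.connect(DB)
    found, stack = set(), [handle]
    while stack:
        h = stack.pop()
        rows = conn.execute(
            "SELECT child FROM child_links WHERE parent = ?", (h,))
        for (child,) in rows:
            if child not in found:
                found.add(child)
                stack.append(child)
    conn.close()
    return found

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(descendants, ["h1", "h2", "h3", "h4"]))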


Should multiprocessing threads reserve the main core for parts of Gramps that are blind to extra cores?

I don’t know what this means. I’m using Python’s threads. Maybe I should have written “4 threads” rather than “4 CPUs” to be more precise.

From the documentation, it looks like it ensures consistent data at the transaction level.

For example, if the timeline is

Thread 1: read a Person record
Thread 2: delete the Person and events referenced by the Person
Thread 1: read the events referenced by the Person

then we need the reads in thread 1 to happen inside a single transaction to guarantee consistent data. Am I right in thinking we only use transactions for writes today?

I believe that is correct.
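
For what it’s worth, wrapping reads in an explicit transaction is cheap in SQLite, and under WAL a read transaction sees a stable snapshot even while another connection writes. A sketch, with hypothetical table and column names rather than the real Gramps schema:

import sqlite3

# isolation_level=None puts the connection in autocommit mode, so we
# control the transaction boundaries ourselves.
conn = sqlite3.connect("tree.db", isolation_level=None)

conn.execute("BEGIN")  # start a read transaction: one consistent snapshot
person = conn.execute(
    "SELECT data FROM person WHERE handle = ?", ("h1",)).fetchone()
events = conn.execute(
    "SELECT e.data FROM event e JOIN person_event pe ON e.handle = pe.event "
    "WHERE pe.person = ?", ("h1",)).fetchall()
conn.execute("COMMIT")  # release the snapshot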


gramps-bench now has up-to-date stats on performance, including current (proposed) work on 6.1:

gramps-bench/README.md at main · dsblank/gramps-bench · GitHub
