Yet another query language for gramps-web?

I recently saw that @DavidMStraub had created a project called Gramps Query Language (GQL), and it is even included in gramps-web. A query language is a good idea, but I found the Jira-inspired GQL not very intuitive: it has a lot of built-in conventions (strings and field names look the same), and many loose ends.

I wondered: why not use the language that many Gramps developers already use, Python? Many devs would then be able to help construct queries.

I made a drop-in replacement for GQL. I call it “Pythonish QL” because I thought we might want (or need) to alter the Python syntax a little. But after a few revisions, no syntax needed to be changed.

So I made a system that uses Gramps objects, with some clever items pre-defined in the query environment. I liked @emyoulation's suggestion to include the SimpleAccess API, and I also used many of @DavidMStraub's ideas and test suite (adapted).

So here are some examples:

Select all people married to someone named “Donna”:

sa.first_name(sa.spouse(person)) == "Donna"

Select all people with a note:

person.get_note_list()

A complicated one: find all of the people who have the word “vote” in a note:

any([('vote' in str(get_note(handle).text)) for handle in person.get_note_list()])
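As a side note, the same predicate also works as a generator expression, which lets any() stop at the first matching note instead of building the whole list first. A self-contained sketch, with stand-in FakeNote/FakePerson classes and a fake get_note (all hypothetical, just to make it runnable outside Gramps):

```python
# Hypothetical stand-ins for the Gramps objects used in the query above.
class FakeNote:
    def __init__(self, text):
        self.text = text

class FakePerson:
    def __init__(self, note_handles):
        self._notes = note_handles
    def get_note_list(self):
        return self._notes

# A fake note table keyed by handle, standing in for the real database.
notes = {"h1": FakeNote("Registered to vote in 1920"),
         "h2": FakeNote("Baptism record")}

def get_note(handle):
    return notes[handle]

person = FakePerson(["h1", "h2"])

# The list-comprehension form from the post:
match_list = any([('vote' in str(get_note(h).text)) for h in person.get_note_list()])

# Equivalent generator form; short-circuits on the first match:
match_gen = any('vote' in str(get_note(h).text) for h in person.get_note_list())

print(match_list, match_gen)  # True True for this fake person
```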

It should be possible to access all data in Gramps, because we use the standard Gramps Python API.

It might be a little faster than GQL because the raw Gramps data is not converted into objects, then dictionaries, and then back to objects (if needed). The query is also compiled once, and evaluated with eval().

I don’t think there are any security issues using eval() on these expressions. But if there are, we can do a couple of additional things to lock that down.
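To illustrate the compile-once, evaluate-per-object idea, here is a toy sketch (not the actual pythonish-ql internals; the rows and field names are made up):

```python
# Toy data standing in for Gramps rows; the real system iterates
# over objects from the database.
rows = [
    {"name": "Donna", "gender": "F"},
    {"name": "Bob", "gender": "M"},
]

query = "person['name'] == 'Donna'"

# Compile the expression once up front ...
code = compile(query, "<query>", "eval")

# ... then evaluate it against each object, with an empty __builtins__
# as a (minimal) restricted namespace.
matches = [
    person
    for person in rows
    if eval(code, {"__builtins__": {}}, {"person": person})
]

print(matches)  # [{'name': 'Donna', 'gender': 'F'}]
```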

You can install it, or check it out:

  1. GitHub - dsblank/pythonish-ql: A Python-like language for querying JSON data
  2. pip install pythonish-ql

Let me know what you think.

6 Likes

If it is ok, I’ll make PRs for gramps-web, and gramps-web-api.

If it is useful, it might also be a nice way to filter in the main Gramps desktop app too. I don’t think the Gramps app has a free-form query system.

3 Likes

@DavidMStraub , I see a couple of items on the GQL road map:

  1. Better support for dates, e.g. comparing a string to a date
  2. Performance improvements. Currently, the whole database needs to be read even for a simple query like class=tag.

I’d like to address these in Pythonish QL. For number 2, I’m imagining a pql.get_tables(query) that would return None (for all tables) or a list of Gramps table names (e.g., ["person", "note"]) if the query only applies to those tables. Shall I make a PR for that on gramps-web-api?

3 Likes

Hi Doug, great to see you contributing to Gramps-related stuff again! I think your approach is an interesting alternative, and it’s good that it uses ast and not exec.

At this point I think I should also point out that, when I wrote GQL, I did not take into account the existence of SuperTool, as @kku pointed out in this thread. Am I right in assuming that that tool’s syntax is more similar to yours than to GQL?

(Btw, GQL is more SQLish than Pythonic and I wrote it after using Jira Query Language at work.)

3 Likes

I don’t want to spoil this interesting discussion, but I wonder if it wouldn’t be better in the long run to focus the development effort on allowing Gramps to use modern database backends (e.g. PostgreSQL)? This would not only allow using plain vanilla SQL as the query language, together with all the tools available on the market for those databases, but it would also eliminate the terrible performance issues with large trees (e.g. 50k+). I have such a tree and, as a matter of fact, I’m forced to use BSDDB since it has slightly better performance than SQLite and PostgreSQL. And as an additional goodie: any discussion about the performance of queries would become obsolete.

Just thinking …

Thanks, I’m glad to be able to start digging into gramps-web!

It looks like SuperTool might have some related support. But still one would need all of the surrounding infrastructure to make it work as a web-based API.

And I found the following timing results startling. I ran equivalent queries from PQL and GQL on the same data (the Gramps example data):

=========================
Pythonish QL
    Query: 'person'
    Matches: 2157
    Time: 0.27110815048217773 (seconds)
Gramps QL
    Query: 'class = person'
    Matches: 2157
    Time: 2.1478145122528076 (seconds)
=========================
Pythonish QL
    Query: 'all([child.get_gender() == Person.FEMALE for child in sa.children(family)]) and len(sa.children(family)) == 3'
    Matches: 8
    Time: 0.31479573249816895 (seconds)
Gramps QL
    Query: 'class = family and child_ref_list.all.ref.get_person.gender = 0 and child_ref_list.length = 3'
    Matches: 8
    Time: 5.750255107879639 (seconds)

They give the same results, which is good! I haven’t really done any optimizations yet, but it looks like GQL is very expensive.

This discussion is about query languages that have the power of forms (QBE), but via text. SQL is a possibility, but it is pretty painful to write against NoSQL-like data such as Gramps objects, and it wouldn’t work on BSDDB.

Have you tried gramps-web on tables with 50k rows? If it doesn’t work well yet, I think it holds the promise to do so.

Quite, but I’d like to point out the major strategic problem Gramps currently has, and that is the missing support for modern database backends. I’ve had a lot of discussions here about that; I fully accept the arguments raised in those discussions and I fully understand the historical background, but I feel that this is the “mother of all problems” from a software strategy point of view. So developing a query language is an interesting thing to do, but to be honest, I do not see how it would give Gramps a strategic push and advantage on the market.

I know from discussions with @DavidMStraub that it holds the promise, but since I’m on Windows, I’ve never been able to get a Gramps-Web container up and running. So no, I haven’t tried it on my own machine. I tried it on a DigitalOcean droplet for a short time and was able to import a large tree (90k) with a lot of help from @DavidMStraub (since the import ran into a time-out problem), but the performance using the tree was simply horrible. I guess the droplet I used simply did not have enough power, but as a conclusion, I cannot see that the promise you mentioned is currently realized.

There is a good chance that you don’t know who I am :smile: I led the move to more modern databases here many years ago. It took quite some effort to make the first steps of abstracting the database in order to do that. Today you can run Gramps on all kinds of engines, including MongoDB, PostgreSQL, SQLite, BSDDB, and probably many others. Evolution takes some time.

You are not alone. The complexity of gramps-web is overwhelming. No, the promise is not yet realized. Work in progress.

2 Likes

@DavidMStraub , I implemented the get_tables() API mentioned above (it returns all table names if none are explicitly used). For example:

pql.get_tables("[person for person in note]")

returns ["note"]
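For the curious, here is a rough sketch of how such table detection can be done with the ast module (my guess at the approach, not the actual implementation; the TABLES set is assumed). Note that comprehension variables like person above must be excluded, since they are bound locally rather than referring to a table:

```python
import ast

# Hypothetical set of Gramps table names exposed in the query namespace.
TABLES = {"person", "family", "event", "place", "source",
          "citation", "repository", "media", "note", "tag"}

def get_tables(query):
    """Return the table names a query reads from, or None if no table
    name appears (meaning: scan all tables)."""
    tree = ast.parse(query, mode="eval")
    # Names assigned inside the query (e.g. comprehension targets)
    # shadow table names and must not count as table references.
    bound = {node.id for node in ast.walk(tree)
             if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store)}
    found = sorted({node.id for node in ast.walk(tree)
                    if isinstance(node, ast.Name)
                    and isinstance(node.ctx, ast.Load)
                    and node.id in TABLES
                    and node.id not in bound})
    return found or None

print(get_tables("[person for person in note]"))  # ['note']
print(get_tables("person"))                       # ['person']
print(get_tables("1 + 1"))                        # None
```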

The timing results for PQL are now about twice as fast as before (not surprisingly).

=========================
Pythonish QL
    Query: 'person'
    Matches: 2157
    Time: 0.12743520736694336 (seconds)
Gramps QL
    Query: 'class = person'
    Matches: 2157
    Time: 2.400017023086548 (seconds)
=========================
Pythonish QL
    Query: 'all([child.get_gender() == Person.FEMALE for child in sa.children(family)]) and len(sa.children(family)) == 3'
    Matches: 8
    Time: 0.11871480941772461 (seconds)
Gramps QL
    Query: 'class = family and child_ref_list.all.ref.get_person.gender = 0 and child_ref_list.length = 3'
    Matches: 8
    Time: 6.958705902099609 (seconds)
1 Like

Yep, I can confirm Gramps QL is very slow at the moment. This is particularly true for all queries involving get_. In the first implementation (which is as far as I’ve gotten so far), I focused on getting all the syntax I wanted to work. So all profiling will be embarrassing at this point :blush:

1 Like

Bingo … Great to meet you!

As I said, I’ve had a lot of discussions here and my impression was that my arguments were not shared by everyone. So maybe I’ve been discussing with the wrong people. But if you say that Gramps now supports a lot of modern database backends (which of course is correct), I’d like to point out that this is not the kind of support I have in mind. Support, in my understanding, means that Gramps fully leverages the performance of the database backend, which is definitely not the case with SQLite and PostgreSQL in v5.2. My understanding from the discussions was that fully leveraging the database backend (including indexes and all that stuff) would make it necessary to modify the existing Gramps object model in a significant way (or basically rewrite it), and nobody is particularly keen to open this Pandora’s box.

At least this was the explanation for why I do not see any performance improvement when using SQLite or PostgreSQL in v5.2 with my 90k tree. The performance was so bad in my tests (query run times of 60 sec instead of 15 sec on v5.1.6/BSDDB) that I was forced to go back to v5.1.6/BSDDB (the first v5.2.x releases did not support BSDDB as a database backend, which v5.2 now does).

Maybe I got all those discussions wrong, but that is my understanding of my current situation, being stuck with v5.1.6/BSDDB. To be able to work with my 90k tree, I wrote a PostgreSQL application that is basically a port of the good ole SQLite export from Gramps. I use it to run regex queries for names, and where I need some 15 sec for such a query in “native” Gramps, PostgreSQL needs a fraction of a second if the tables are indexed in a proper way. This is what I’d understand as “full support of a modern database backend”, so I guess there is still a significant way to go.

I fully agree, and having worked in IT and business management for a lot of years, I know that it is quite important to stay focused on the strategic goal and not get distracted by other interesting things … :wink:

1 Like

From what I have seen in the git history (since I joined contributing to Gramps after the fact), indeed without your contributions @dsblank, none of the projects like Gramps Web building on Gramps “core” would be possible. And indeed we have had modern database backends as an option for many years, and SQLite has finally become the default, which is great!

@UlrichDemlehner you do have a point that there is still a lot of room for improvement; just a few things I would like to point out:

  • Some of the things in Gramps Web are slow because they are using code that was never intended to run in a multi-user network environment; the overall Gramps project shouldn’t be blamed for that.
  • Some of the things in Gramps Web are slow because of poor/incomplete implementation (see GQL, but so far I thought I was the only one using that feature :wink:)
  • Some of the things in Gramps Desktop are slow over the network because they have been developed/tested assuming a desktop application accessing an SQLite database on a local disk. Of these bottlenecks, some have been patched in Gramps Web, but not yet on desktop. See e.g. Sorting and date filtering is too slow · Issue #433 · gramps-project/gramps-web-api · GitHub.

So yes, there is lots to do, but I am optimistic an incremental improvement approach will work.

1 Like

BTW: the problem was not Gramps-Web. The problem is Docker on Windows. I really doubt that there is a significant user base for Docker on Windows, and the documentation doesn’t help very much to tackle the problems. Between the lines you can always read: “Why waste your time with Windows? We don’t waste our time providing any useful documentation for that” …

1 Like

Yes, you hit the nail on the head. A solution that I have been thinking about for many years is complex, but I think worth it. Briefly, I’m imagining a system that directly translates the kinds of Pythonish QL queries above into native DB access, if the backend supports it. If not, it carries out the query as it does today. But if it can run a low-level direct DB query (for example, in SQL), then it should be able to handle 50k records with ease.
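To make the idea concrete, here is a toy sketch of pushing one trivial predicate shape down to SQL. The predicate_to_sql helper, the table layout, and the in-memory SQLite backend are all hypothetical; a real translator would cover far more shapes and fall back to row-by-row evaluation otherwise:

```python
import ast
import sqlite3

def predicate_to_sql(query):
    """Translate a trivial `person.field == "value"` predicate into a
    parameterized SQL query. Returns None if the shape isn't handled,
    signalling a fallback to in-Python evaluation."""
    tree = ast.parse(query, mode="eval").body
    if (isinstance(tree, ast.Compare)
            and len(tree.ops) == 1
            and isinstance(tree.ops[0], ast.Eq)
            and isinstance(tree.left, ast.Attribute)
            and isinstance(tree.left.value, ast.Name)
            and tree.left.value.id == "person"
            and isinstance(tree.comparators[0], ast.Constant)):
        sql = f"SELECT * FROM person WHERE {tree.left.attr} = ?"
        return sql, (tree.comparators[0].value,)
    return None  # not translatable: evaluate row-by-row instead

# Demo against an in-memory SQLite table standing in for the backend.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE person (handle TEXT, gramps_id TEXT)")
con.execute("INSERT INTO person VALUES ('h1', 'I0044'), ('h2', 'I0045')")

sql, params = predicate_to_sql('person.gramps_id == "I0044"')
rows = con.execute(sql, params).fetchall()
print(rows)  # [('h1', 'I0044')]
```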

But I’m not thinking about that yet. I’m just getting my feet wet in gramps-web to see where I can help for now.

1 Like

Please don’t misunderstand me: I do not in any way blame anybody for anything. As I said, I fully understand the historical background and why Gramps developed in the direction it did. That’s perfectly fine for me. But as I tried to explain, the currently most significant IT strategic issue is, in my point of view, how Gramps will be able to fully leverage the modern database backends we now have (and did not have 15 years ago). And what I currently see is that there is a lot of room for improvement. Somebody pointed out in a different discussion that incremental improvement is quite limited in software development, and maybe Gramps is in such a situation now.

Of course I’ve always been talking about local database backends. Everything (Gramps, SQLite, PostgreSQL, BSDDB) runs on my local machine.

1 Like

Quite – that’s my point.

Just a very vague idea: Gramps already has the SQLite export which obviously is able to “translate” between the native Gramps objects and the tables of a SQL database. Shouldn’t it be possible to use this code base for the system you’re thinking of?

I guess you will understand that I don’t appreciate your approach :wink:

Yeah, I wrote that. But it converts data at the low-level, not objects.

We’re on the same page, but this is a hard problem. We can do it. :smile_cat:

1 Like

Sure. My way of thinking was: an object is filled with data (an empty object is basically not worth being stored in a database), so if the SQLite export is able to translate those data into the rows and columns of a SQL database, it basically translates between objects and a database backend. Consequently, that code base should be able to throw some low-level SQL statements at the database backend when confronted with the high-level request to, e.g., list all individuals with a certain name born between this and that date. And if it is able to throw SQL statements at the database backend, then it will be able to leverage the full performance of the backend.

object_ql now has some security restrictions and resource protections. You can no longer access any item that starts with an underscore, and the user can’t accidentally, or purposely, create a query that locks up the server (current execution time limit per expression evaluation: 1 second).
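For context, a minimal sketch of what the underscore restriction could look like at the AST level (my guess; the real object_ql code may differ, and the time limit would need a separate mechanism):

```python
import ast

def check_query(query):
    """Reject any attribute or name access that starts with an
    underscore, keeping dunders like __class__ out of reach
    before the expression is ever evaluated."""
    tree = ast.parse(query, mode="eval")
    for node in ast.walk(tree):
        if isinstance(node, ast.Attribute) and node.attr.startswith("_"):
            raise ValueError(f"access to {node.attr!r} is not allowed")
        if isinstance(node, ast.Name) and node.id.startswith("_"):
            raise ValueError(f"access to {node.id!r} is not allowed")
    return tree

check_query("person.get_note_list()")  # fine, returns the parsed tree
try:
    check_query("person.__class__")
except ValueError as exc:
    print(exc)  # access to '__class__' is not allowed
```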

@DavidMStraub , where should high-level and specific discussions related to gramps-web development happen?