Error using PostgreSQL as database backend with Gramps v5.2.2

I installed Gramps v5.2.2 on my Win10/64-bit machine with the goal of using PostgreSQL as the database backend. So I selected the Shared PostgreSQL option in the settings and created a new database. No problem with that, but when I tried to open the new database, I was greeted with this window:

[Screenshot: German error dialog reporting low-level database corruption]

Sorry for the German; it basically says that the database is damaged “at a low level”. Clicking the Schließen (Close) button brings up the next error message with the following details and my system information:

Error Details: 
===================

893732: ERROR: dbloader.py: line 107: 'utf-8' codec can't decode byte 0xfc in position 97: invalid start byte
Traceback (most recent call last):
  File "C:\Program Files\GrampsAIO64-5.2.2\gramps\gui\dbloader.py", line 198, in read_file
    db.load(
  File "C:\Program Files\GrampsAIO64-5.2.2\gramps\gen\db\generic.py", line 647, in load
    self._initialize(directory, username, password)
  File "C:\Users\Ulrich Demlehner\AppData\Roaming\gramps\gramps52\plugins\SharedPostgreSQL\sharedpostgresql.py", line 105, in _initialize
    self.dbapi = Connection(uuid=uuid, **dbkwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Ulrich Demlehner\AppData\Roaming\gramps\gramps52\plugins\SharedPostgreSQL\sharedpostgresql.py", line 117, in __init__
    self.__connection = psycopg2.connect(*args, **kwargs)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "AIO/psycopg2/__init__.py", line 122, in connect
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 97: invalid start byte

1040140: ERROR: dbloader.py: line 107: 'utf-8' codec can't decode byte 0xbb in position 79: invalid start byte
Traceback (most recent call last):
  File "C:\Program Files\GrampsAIO64-5.2.2\gramps\gui\dbloader.py", line 198, in read_file
    db.load(
  File "C:\Program Files\GrampsAIO64-5.2.2\gramps\gen\db\generic.py", line 647, in load
    self._initialize(directory, username, password)
  File "C:\Users\Ulrich Demlehner\AppData\Roaming\gramps\gramps52\plugins\SharedPostgreSQL\sharedpostgresql.py", line 105, in _initialize
    self.dbapi = Connection(uuid=uuid, **dbkwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Ulrich Demlehner\AppData\Roaming\gramps\gramps52\plugins\SharedPostgreSQL\sharedpostgresql.py", line 117, in __init__
    self.__connection = psycopg2.connect(*args, **kwargs)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "AIO/psycopg2/__init__.py", line 122, in connect
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbb in position 79: invalid start byte


System Information: 
===================

Gramps version: AIO64-5.2.2-r1-f905d14 
Python version: 3.11.8 
BSDDB version: 6.2.9 (6, 0, 30) 
sqlite version: 3.45.2 (2.6.0) 
LANG: de_DE.UTF-8
OS: Windows 10/64 bit

GTK version    : 3.24.41
gobject version: 3.46.0
cairo version  : (1, 26, 0)

As far as I understand it, the details basically repeat the information from the popup, once with byte 0xfc at position 97 and once with byte 0xbb at position 79. The traceback messages that follow appear to be a consequence of this “I cannot read the database” failure.

What can I do? Thanks for your help!

Ulrich

When I look up the offending character codes, I see a suggestion that you may have some Latin-1 in your data. Can that be the case? Here’s what I found:

A start byte beginning with F implies a character that needs 4 bytes, and that would be a very exotic character for Europe. And even then, 0xFC is illegal, because the first byte of a UTF-8 sequence starts with as many binary 1s as there are bytes in the sequence, followed by a binary 0; the exception is a length of 1, where the start byte begins with a binary 0, meaning that it’s plain ASCII.

For a 4-byte code, this means that it starts with four 1s, i.e. the nibble F, followed by a binary 0, so the next nibble can be no higher than 7.

Code 0xBB is wrong too, because it starts with a single binary 1 followed by a 0, which marks a continuation byte; a UTF-8 start byte must begin with two to four 1s, not one.

I recommend that you look up the offending codes in the Latin-1 character set (there, 0xFC is ‘ü’ and 0xBB is ‘»’), and see if you have these characters somewhere in your tree.
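To illustrate with the two bytes from your tracebacks (nothing here touches your actual data): both are invalid as the first byte of a UTF-8 sequence, but decode to ordinary printable characters under Latin-1:

```python
# 0xFC and 0xBB cannot start a UTF-8 sequence, but both are
# plain printable characters in Latin-1 (ISO 8859-1).
for b in (0xFC, 0xBB):
    raw = bytes([b])
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        print(f"0x{b:02X}: UTF-8 -> {err.reason}")
    print(f"0x{b:02X}: Latin-1 -> {raw.decode('latin-1')!r}")
```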

In my earlier message, I was assuming that Gramps was sending UTF-8 to the database, but that assumption is probably wrong for pickled Python objects, which are BLOBs. This suggests that the problem may disappear if you switch the backend to binary, because in that case it will not try to interpret any encoding at all.
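For what it’s worth, the same error shape appears whenever text produced in a legacy Windows encoding is run through a UTF-8 decoder; a minimal sketch with a made-up German word (not your actual data):

```python
# 'ü' is encoded as the single byte 0xFC in the Windows-1252 codepage,
# which is an invalid start byte in UTF-8.
msg = "für".encode("cp1252")
print(msg)  # b'f\xfcr'
try:
    msg.decode("utf-8")
except UnicodeDecodeError as err:
    print(err)  # 'utf-8' codec can't decode byte 0xfc in position 1: invalid start byte
```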

This is a guess based on experience with other software, and on what I read on stack overflow. I’ve never used or configured PostgreSQL myself.

Hi Enno,

The error message appears with a completely new and empty database, as I described, so the content of my tree cannot be the problem. It must come from a systemic issue, perhaps a PostgreSQL server configuration issue. Any idea?

Thx Ulrich

How did you create your database?
What encoding did you use? Mine is UTF8.

Try PostgreSQL rather than SharedPostgreSQL.

Right, it’s UTF8 as well.

I would have but there was no PostgreSQL, only SharedPostgreSQL.

The PostgreSQL addon can be found in the addons repository.

My database is on a postgres server and is configured as SharedPostgreSQL.

What is the name of your database?

Check all the ini files for the database in the Gramps db folder. I have had some problems on Windows where there were or were not double quotes (“”) or single quotes (‘’) around the database name etc.

If there are single or double quotes around the database name and connection string, remove them; if there are none, add them.
If there are single quotes, try double quotes.

If I remember right, you also need to do this change to the connection string etc.

I had a lot of problems with this for both MongoDB and PostgreSQL on my Windows installations, and when trying to connect to a native PostgreSQL from WSL and Hyper-V clients.

I think I wrote something about it a few years ago, but do not use it anymore and don’t remember exactly what I had to change.
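A throwaway script along these lines can help spot the quoting differences (the folder path, file pattern and database name below are assumptions; adjust them to your installation):

```python
# Scan ini-style files under an assumed Gramps database folder and print
# every line that mentions the database name, so quoting differences
# (single quotes, double quotes, none) stand out at a glance.
from pathlib import Path

def find_db_references(root, db_name):
    hits = []
    for ini in Path(root).rglob("*.ini"):
        for lineno, line in enumerate(
            ini.read_text(errors="replace").splitlines(), 1
        ):
            if db_name in line:
                hits.append((str(ini), lineno, line.strip()))
    return hits

# Hypothetical path; replace <you> with your Windows user name.
for path, lineno, line in find_db_references(
    r"C:\Users\<you>\AppData\Roaming\gramps\grampsdb", "HABGRI3D"
):
    print(f"{path}:{lineno}: {line}")
```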


Hi Ulrich,

It looks like I missed the word ‘new’ in your message. Could it be that there are weird characters in the connection string for that database? I assume that the uuid is plain text, but I am a bit worried about those args. It looks like some binary data is being passed where your database doesn’t expect it.

Can you add print statements to the code, to figure out what it’s doing? And is it possible to switch the database to binary mode?

A lot of interesting and quite diverse ideas were brought up by @StoltHD @ennoborg @Nick-Hall @SNoiraud – thanks! I would conclude that Gramps running with a PostgreSQL backend on Win10 is not something you find very often :wink:

I’ve done some other tests and would like to share the results:

(1) After the add-on update this morning I now have both PostgreSQL and SharedPostgreSQL to choose from in the configuration, so following @Nick-Hall, I chose PostgreSQL. It appears that the AIO only includes SharedPostgreSQL, and PostgreSQL has to be added by an add-on update.

(2) Then I created a new database in my PostgreSQL 16 instance running on localhost:5432 with pgAdmin 4. The database has the name HABGRI3D and its encoding is set to UTF8. The tablespace is pg_default (the sole option).

I don’t see any way to switch to “binary” as @ennoborg proposed.

(3) Then I created a new tree in Gramps with PostgreSQL as the database backend. The tree also has the name HABGRI3D, following the Linux implementation description. No problem with that.

(4) Then I tried to open this new tree in Gramps. The authorization popup asking for user id and password for the PostgreSQL database appears

[Screenshot: authorization popup asking for user id and password]

and I intentionally entered a wrong user id and a wrong password. The low-level corruption error message pops up

[Screenshot: German error dialog reporting low-level database corruption]

This time both the byte (0xfc instead of 0xbb) and position (97 instead of 79) are different from the first screenshot.

(5) After clicking on Schließen (Close) the usual Gramps error/traceback appears:

6407145: ERROR: dbloader.py: line 107: 'utf-8' codec can't decode byte 0xfc in position 97: invalid start byte
Traceback (most recent call last):
  File "C:\Program Files\GrampsAIO64-5.2.2\gramps\gui\dbloader.py", line 198, in read_file
    db.load(
  File "C:\Program Files\GrampsAIO64-5.2.2\gramps\gen\db\generic.py", line 647, in load
    self._initialize(directory, username, password)
  File "C:\Users\Ulrich Demlehner\AppData\Roaming\gramps\gramps52\plugins\PostgreSQL\postgresql.py", line 105, in _initialize
    self.dbapi = Connection(**dbkwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Ulrich Demlehner\AppData\Roaming\gramps\gramps52\plugins\PostgreSQL\postgresql.py", line 118, in __init__
    self.__connection = psycopg2.connect(*args, **kwargs)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "AIO/psycopg2/__init__.py", line 122, in connect
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 97: invalid start byte

I do not see any differences from the “old” error/traceback message above.

(6) Checking the database HABGRI3D in pgAdmin 4, I cannot identify any changes, exactly as it should be with wrong credentials.

(7) Now I repeat (3) and (4), but this time with correct credentials. And voilà, no error message appears. I import my Gramps XML file with its 96k individuals and 238k events, and I see in pgAdmin that the tables I’d expect have been created. The import appears to be significantly slower than the import into the good old BSDDB, but I have no exact figures.

(8) I repeat (3) and (4) with a database and tree called HABGRI 3D (notice the blank!) and correct credentials. No error messages, everything is fine.

(9) Now the same story again (actually with HABGRI3D2 as name of database and tree) but with SharedPostgreSQL as database backend and correct credentials. And now the error is back.

Obviously SharedPostgreSQL is not a good idea whatever the reason may be.

But the really bad news is that working with PostgreSQL appears to be much slower than BSDDB. A regex query in the Person view that used to run in some 10–15 sec on BSDDB now needs 50 sec. Even the queries in the Citation and Places views that used to finish in 1–2 sec now need 7–8 sec. And a query in the Event view runs nearly 1 min. These observations fit with my playing around with SQLite as a backend, since SQLite also appears to be much slower than BSDDB. OK, not everybody has a database with nearly 100k individuals, but it seems a bit counterintuitive that a modern database like PostgreSQL is left in the dust by an oldie like BSDDB.

Thanks for your help!

Ulrich


The slowness of PostgreSQL and SQLite is because of the use of BLOBs, which can’t be indexed by either SQLite or PostgreSQL…

I bet it will be a lot better when those are changed to JSON objects with some indexes and maybe even some database views.

Or when the full functionality of tables, relations, views and indexes is utilized in the relational databases.
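The indexing idea can be sketched with SQLite’s built-in JSON functions; the table and column names here are made up for illustration, not the actual Gramps schema:

```python
# Store objects as JSON text and put an expression index on one JSON
# field, so equality queries can use the index instead of scanning
# and deserialising every row (the way BLOB storage forces).
import json
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, data TEXT)")
con.execute(
    "CREATE INDEX idx_surname ON person (json_extract(data, '$.surname'))"
)
people = [
    ("h1", json.dumps({"surname": "Demlehner", "given": "Ulrich"})),
    ("h2", json.dumps({"surname": "Borg", "given": "Enno"})),
]
con.executemany("INSERT INTO person VALUES (?, ?)", people)
rows = con.execute(
    "SELECT handle FROM person WHERE json_extract(data, '$.surname') = ?",
    ("Demlehner",),
).fetchall()
print(rows)  # [('h1',)]
```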

That was my assumption (or better: my fear) as well… Therefore I’ve built a small app that extracts my data from the XML file and feeds it into a PostgreSQL database with a few basic indexes. As a consequence, I can experience on a daily basis what PostgreSQL would be able to do once those BLOBs are gone. BLOBs were probably a very good idea 10 or 15 years ago, but today they simply hamper progress.


BSDDB is actually a very efficient key-value store. We also optimised the database environment for use with Gramps.

With SQLite, you could try increasing the cache size. In the BSDDB backend we specified 64M. After line 103 in the gramps/plugins/db/dbapi/sqlite.py file, try adding:

self.__connection.execute("PRAGMA cache_size = -64000")
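As a sanity check outside Gramps, the pragma round-trips on any SQLite connection; a negative value is interpreted as a size in KiB, so -64000 means roughly 64 MB:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA cache_size = -64000")  # negative = size in KiB
size = con.execute("PRAGMA cache_size").fetchone()[0]
print(size)  # -64000
```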

Our current filters loop through all records in a table, so you won’t get any performance increase by adding indexes.

We already extract some data from the BLOBs into separate columns so that they can be indexed.

If we move the raw data storage of objects from BLOBs to JSON this will avoid the duplication of data.


OK, I see. If I use DBeaver as a frontend for the PostgreSQL database table person, a regex query on the indexed column surname finishes in 0.15 sec. Just to define an objective :wink:

Yes. We create the person_surname index on the surname. I would expect it to be quick.

Some years ago I wrote a prototype that passed a filter rule hint to the database. The idea was that the database could quickly reduce the number of objects using an index before looping through the remaining objects.
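That hint idea could look like the following hypothetical sketch (SQLite stands in for the backend, and the schema and function names are invented): the database narrows the candidates with an indexed prefix match, and only the survivors go through the full Python regex:

```python
# Hypothetical "filter rule hint": extract a literal prefix from the
# regex, let the indexed surname column prefilter via LIKE, then run
# the real regex only over the small candidate set.
import re
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE person (handle TEXT, surname TEXT)")
con.execute("CREATE INDEX person_surname ON person (surname)")
con.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [("h1", "Demlehner"), ("h2", "Demler"), ("h3", "Borg")],
)

def regex_filter(pattern):
    rx = re.compile(pattern)
    prefix = re.match(r"\w+", pattern)  # leading literal part, if any
    if prefix:
        cur = con.execute(
            "SELECT handle, surname FROM person WHERE surname LIKE ?",
            (prefix.group() + "%",),
        )
    else:
        cur = con.execute("SELECT handle, surname FROM person")
    return [h for h, s in cur if rx.match(s)]

print(regex_filter("Demle[hn]ner"))  # ['h1']
```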