Tab Separated Value parser to supplement CSV?

I was experimenting with transferring my Place hierarchy using the CSV view export & the Import Text Gramplet. But one common step is to clean up the data a bit in Excel.

I was hoping to be able to just copy filtered chunks of the spreadsheet & have them parsed by a variant of the Import Text Gramplet without messing with double-quoting Place Titles & names. And without applying an Excel formula to concatenate each row with comma delimiters or having to write the rows to a subsequently importable CSV file.

But the Import Text Gramplet chokes on tab-separated text. And our importer doesn’t recognize the .tsv file extension or tab-delimited content.

There’s a thread on StackOverflow that says the Python csv module can split on tabs instead of commas.

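import csv

# newline="" is what the csv module docs recommend when opening files for it
with open("file.tsv", newline="") as fd:
    rd = csv.reader(fd, delimiter="\t", quotechar='"')
    for row in rd:
        print(row)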

Is there a way to support a different delimiter in our text parser?

It looks like you have found such a way. Since the Import Text Gramplet is apparently an addon, why don’t you try modifying it to work the way you want, and then submit the changes as a PR?

Thanks for the encouragement. (That’s genuine, not sarcasm.) I’ve been slogging through doing that since posting.

Forked the add-on and doing a first pass as brute force… with a dedicated TSV version that ignores commas as the delimiter. Then I’ll try to figure out how to integrate a delimiter selector.
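
For anyone following along, here is a rough sketch of what the delimiter selection might look like, assuming the pasted text arrives as one string. The function names guess_delimiter and parse_rows are made up for illustration and are not part of the Import Text Gramplet; the guess is deliberately naive (tab wins if the first non-empty line contains one).

```python
import csv
import io


def guess_delimiter(text):
    """Return a tab if the first non-empty line contains one, else a comma."""
    for line in text.splitlines():
        if line.strip():
            return "\t" if "\t" in line else ","
    return ","


def parse_rows(text, delimiter=None):
    """Parse pasted text into a list of rows, guessing the delimiter if none is given."""
    if delimiter is None:
        delimiter = guess_delimiter(text)
    # newline="" leaves line endings untouched so the csv module can handle them
    buffer = io.StringIO(text, newline="")
    reader = csv.reader(buffer, delimiter=delimiter, quotechar='"')
    return [row for row in reader]
```

With a tab as the delimiter, a Place Title such as "Paris, France" should come through as a single field without any double-quoting, which is the whole point of pasting straight from the spreadsheet.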

I think I’ve gotten most of the way to having Gramps recognize a .tsv mime type for import too.

If I get REALLY ambitious, I’ll find a way to have the Text Importer apply a selected Tag or Citation too. I really found those features useful for cleaning up a GEDCOM import. Then discover how to interface that too. (Arrghh!)

I did discover that the CSV from the Export View and what the Import Text will accept are slightly different.

The view export labels the ID column as ‘ID’, whereas the CSV exporter, the importer and the Import Text Gramplet all expect the ID column to be labeled with the primary object type (Person, Marriage, Event) instead.

I wonder which should be preferred. Either way, it seems like they should be compatible.
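
As a stopgap, the header row could be patched before it reaches the parser. This is only a sketch of the idea: normalize_id_header is a hypothetical helper, not anything in Gramps, and it assumes the caller already knows which object type the rows describe.

```python
def normalize_id_header(header, object_type):
    """Return a copy of the header row with any 'ID' column renamed to object_type."""
    # Only an exact 'ID' label (ignoring surrounding spaces) is renamed;
    # every other column label is passed through unchanged.
    return [object_type if name.strip() == "ID" else name for name in header]
```

For example, calling it with the header row ID, Name, Birth Date and the object type Person would hand the importer a first column labeled Person instead of ID.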