Tab Separated Value parser to supplement CSV?

emyoulation · September 15, 2021, 3:23pm

I was experimenting with transferring my Place hierarchy using the CSV view export & the Import Text Gramplet. But one common step is to clean up the data a bit in Excel.

I was hoping to be able to just copy filtered chunks of spreadsheet & have it be parsed in a variant of the Import Text Gramplet … without messing with double-quoting Place Titles & names. And without applying an Excel formula to concatenate each row with comma delimiters or having to write the rows to a subsequently importable CSV file.

But the Import Text Gramplet chokes on tab-separated text. And our importer doesn’t recognize the .tsv file extension or tab-delimited content.

There’s a thread on StackOverflow that says the Python csv module can split on tabs instead of commas.

import csv

# newline="" lets the csv module handle line endings itself
with open("file.tsv", newline="") as fd:
    rd = csv.reader(fd, delimiter="\t", quotechar='"')
    for row in rd:
        print(row)

Is there a way to support a different delimiter in our text parser?
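For the copy-and-paste case described above, a minimal sketch of parsing a pasted chunk of text (rather than a file) with a selectable delimiter might look like the following. The function name `parse_delimited` and the sample columns are made up for illustration; this is not how the Import Text Gramplet is actually written.

```python
import csv
import io

def parse_delimited(text, delimiter="\t"):
    """Parse pasted text into a list of rows, splitting on `delimiter`.

    Hypothetical helper for illustration only.
    """
    # newline="" passes line endings through untouched so csv can handle them
    source = io.StringIO(text, newline="")
    reader = csv.reader(source, delimiter=delimiter, quotechar='"')
    # Skip completely empty lines (csv yields [] for them)
    return [row for row in reader if row]

# Tab-separated sample: the comma inside the title needs no quoting
sample = "Place\tTitle\tType\nP0001\tBoston, MA\tCity\n"
rows = parse_delimited(sample)
for row in rows:
    print(row)
```

Because the tab is the delimiter here, a Place Title containing a comma goes through without any double-quoting.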

prculley · September 17, 2021, 2:40pm

It looks like you have found such a way. Since the Import Text Gramplet is apparently an addon, why don’t you try modifying it to work the way you want, and then submit the changes as a PR?

emyoulation · September 17, 2021, 4:08pm

Thanks for the encouragement. (That’s genuine, not sarcasm.) I’ve been slogging through doing that since posting.

Forked the add-on and doing a first pass as brute force… with a dedicated TSV version that ignores commas as the delimiter. Then I’ll try to figure out how to integrate a delimiter selector.

I think I’ve gotten most of the way to having Gramps recognize a .tsv mime type for import too.

If I get REALLY ambitious, I’ll find a way to have the Text Importer apply a selected Tag or Citation too. I really found those features useful for cleaning up a GEDCOM import. Then discover how to interface that too. (Arrghh!)

emyoulation · September 17, 2021, 8:13pm

I did discover that the CSV header labels from the Export View and what the Import Text will accept are slightly different. (The labels are more compatible when Exporting a Tree than a View.)

When Gramps exports a view, the ID column is labeled ‘ID’, whereas the CSV exporter, the CSV importer and the Import Text Gramplet all expect the ID column to be labeled with the primary object type (Person, Marriage, Event) instead.

I wonder which should be preferred. Either way, it seems like they should be compatible.
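Until the labels are reconciled, one workaround is to rename the header before importing. This is only a sketch: `normalize_id_header` is a hypothetical helper, and the column names in the example are placeholders rather than the exact labels Gramps writes.

```python
def normalize_id_header(header, object_type):
    """Return a copy of `header` with a bare 'ID' label replaced by `object_type`.

    Hypothetical helper; the comparison ignores case and surrounding spaces.
    """
    return [
        object_type if label.strip().lower() == "id" else label
        for label in header
    ]

# Placeholder header as it might come from a view export
view_header = ["ID", "Name", "Birth date"]
import_header = normalize_id_header(view_header, "Person")
print(import_header)
```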

emyoulation · January 7, 2023, 7:01pm

Patching eight 5.1.x files with Serge’s changes for Gramps 5.2 gave the TSV functionality for the Text Import Gramplet and CSV Import.

See:

emyoulation · September 2, 2024, 3:36pm

The labelings on the CSV dialect options are ambiguous to me.

excel

excel-tab

unix

Custom
– ,
– ;
– :
– |
– Tab

Here are some more explicit labels. But are they correct/accurate? And can we use an MS-branded product name without trademark infringement?

Excel comma separated values (CSV)

Excel tab separated values (TSV)

Unix (CSV with LF end-of-line)

Custom
– , (Comma; aka CSV)
– ; (Semi-colon)
– : (Colon)
– | (Vertical Bar aka pipe-delimited)
– Tab (TSV)
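The names excel, excel-tab and unix match the dialects that Python's csv module registers by default, so (assuming the dialog simply exposes those) their settings can be inspected directly to check the proposed labels:

```python
import csv

# Collect the settings of the three dialects the csv module registers by default
summary = {}
for name in ("excel", "excel-tab", "unix"):
    dialect = csv.get_dialect(name)
    summary[name] = (dialect.delimiter, dialect.lineterminator, dialect.quoting)
    print(name, repr(dialect.delimiter), repr(dialect.lineterminator), dialect.quoting)
```

This shows excel as comma-delimited with CRLF line endings, excel-tab as the same with a tab delimiter, and unix as comma-delimited with LF line endings and every field quoted (`csv.QUOTE_ALL`). Note that the csv reader ignores `lineterminator`; it only matters when writing.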

As for the Unix option, here’s how Perplexity AI describes the difference:

The main differences between a CSV file created by Microsoft Excel on Windows and a Unix CSV file are:

Line endings:

Windows Excel CSV files typically use CRLF (\r\n) for line endings

Unix CSV files typically use just LF (\n) for line endings

Default field delimiter:

Windows Excel often uses semicolon ( ; ) as the default field delimiter, especially in regions that use comma as the decimal separator

Unix CSV files typically use comma ( , ) as the field delimiter

Character encoding:

Windows Excel CSV files are often saved with Windows-1252 or UTF-8 with BOM encoding

Unix CSV files typically use UTF-8 without BOM encoding

Quoting:

Excel may add quotes around fields containing commas or line breaks inconsistently

Unix CSV files tend to follow RFC 4180 more strictly for quoting

Handling of special characters:

Excel may handle certain special characters differently than standard Unix tools

Metadata:

Excel CSV files may contain hidden metadata or formatting information

Unix CSV files are typically plain text without extra metadata

Decimal separator:

Excel CSV files may use comma or period as decimal separator depending on regional settings

Unix CSV files typically use period as decimal separator

To ensure compatibility when working with CSV files across platforms, it’s often recommended to use a standardized format like RFC 4180 and explicitly specify encoding, delimiters, and line endings when creating or processing CSV files.
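As a rough illustration of "explicitly specify encoding, delimiters, and line endings", here is a sketch using only the standard library. The helper names `write_rows` and `read_bytes` are invented for this example, and decoding with `utf-8-sig` is one possible way to tolerate a BOM, not necessarily what Gramps does.

```python
import csv
import io

def write_rows(rows, delimiter=",", lineterminator="\n"):
    """Serialize rows with an explicit delimiter and line ending."""
    buf = io.StringIO(newline="")  # no newline translation on write
    writer = csv.writer(buf, delimiter=delimiter,
                        lineterminator=lineterminator,
                        quoting=csv.QUOTE_MINIMAL)
    writer.writerows(rows)
    return buf.getvalue()

def read_bytes(data, delimiter=","):
    """Decode raw bytes and parse them with an explicit delimiter."""
    # utf-8-sig strips a leading BOM if present and is harmless otherwise
    text = data.decode("utf-8-sig")
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))

# LF-terminated, comma-delimited output; only the field with a comma is quoted
unix_text = write_rows([["P0001", "Boston, MA"]])

# Semicolon-delimited input with a BOM and CRLF line endings
excel_rows = read_bytes(b"\xef\xbb\xbfa;b\r\nc;d\r\n", delimiter=";")
```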
