Sure, deprecated does not mean discontinued (as in no longer available), but I suppose it means that future updates to their API might block the transcription?
I was just wondering how to use such services without internet access… Running LLMs as a local service makes sense if you have restrictions on web traffic or simply do not have access to the web.
TrOCR (Microsoft) seems to share some ‘generic’ models, but you can find others that are more specific, like:
Looking at this video (YouTube), we can see that processing more than 30,000 documents could be really expensive via specialized services like Transkribus. It looks like the open-source ecosystem (LLMs, APIs) fits better?
As for running models created in Transkribus, I do not know. I just see that Arkindex can import Transkribus data and metadata. I was thinking of public models, like maybe the German 17th-century one. Most of them are PyTorch compatible (e.g., the PyLaia and public model references above).
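
To make the offline idea a bit more concrete, here is a minimal sketch of running a TrOCR model locally with the Hugging Face `transformers` library, assuming the model files were already saved to a local folder on a machine that had internet access (for example with `save_pretrained`). The function name `transcribe_line` and the paths in the usage note are hypothetical, and I have not tested this on historical documents. Note that TrOCR expects an image of a single text line, so full pages would need line segmentation first.

```python
import os

# Ask the Hugging Face libraries not to contact the network at all.
# This is set before transformers is imported.
os.environ["HF_HUB_OFFLINE"] = "1"


def transcribe_line(image_path, model_dir):
    """Transcribe one cropped text-line image with a locally stored TrOCR model.

    model_dir is a hypothetical local folder holding the processor and model
    files, e.g. a copy of a public TrOCR checkpoint saved beforehand.
    """
    # Imported lazily so the heavy dependencies are only needed when called.
    from PIL import Image
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    # local_files_only=True makes loading fail rather than try a download.
    processor = TrOCRProcessor.from_pretrained(model_dir, local_files_only=True)
    model = VisionEncoderDecoderModel.from_pretrained(model_dir, local_files_only=True)

    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    generated_ids = model.generate(pixel_values)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

Usage would look something like `transcribe_line("line_0001.png", "./models/trocr-base-handwritten")`, where both paths are placeholders. Whether a 'generic' checkpoint is good enough for a given script or period is a separate question; a more specific model can be dropped in by pointing `model_dir` at it.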