Policy for training AI models with our data?

Hello!

Following on from the post “GrampsChat gramplet for gramps 6.0”, I wonder whether AI tools should have access to our local media objects. I am looking at one AI in particular, but I suppose there is a policy (fair use?) that limits training models on our data?

Back on the user side, with a lazy prompt…

Why should I try to upload an image into Le chat from Mistral AI (IA) ?

Uploading an image into Le Chat from Mistral AI can be beneficial for several reasons:

  1. Image Analysis: Le Chat is equipped with advanced image understanding capabilities, allowing users to analyze, interpret, and extract information from images seamlessly. This can include inquiring about the image’s content, details, or extracting relevant information. For example, you can upload an image and ask questions about its content, such as identifying objects or describing the scene [1].
  2. Document Processing: Le Chat can process large, complex PDF documents and images for analysis and summarization. This means you can upload documents containing graphs, tables, diagrams, text, formulas, and equations, and get insights or summaries of the content [2].
  3. Optical Character Recognition (OCR): The platform uses high-accuracy OCR models to extract text from images, making it useful for digitizing printed documents or extracting text from scanned files [3].
  4. Multiple Image Analysis: You can upload several images for comparison, trend identification, or detailed analysis. This feature is particularly useful for tasks that require analyzing multiple visual data points [1].
  5. Integration with Other Features: Le Chat integrates image understanding with other features like internet search and PDF analysis, providing a comprehensive AI experience that can enhance various tasks, from research to creative projects [4].

These capabilities make Le Chat a versatile tool for both personal and professional use, offering advanced image and document processing features that can simplify complex tasks.

Why should I not try to upload an image into Le chat from Mistral (AI) IA ?

There are several reasons why you might choose not to upload an image into Le Chat from Mistral AI:

  1. Privacy Concerns: Uploading images to any online service involves sharing your data with the service provider. If you have sensitive or personal images, you might prefer to keep them private rather than uploading them to a platform [5].
  2. Usage Limits: Le Chat has daily usage limits for image uploads in its free tier. If you need to analyze a large number of images, you might hit these limits quickly, which could be a limitation for extensive use [6].
  3. Content Restrictions: Mistral AI has certain restrictions on the type of content that can be generated or analyzed. For example, it refuses to produce images of video game characters. If your use case involves such restricted content, you might find the platform limiting [7].
  4. Accuracy and Reliability: While Le Chat offers advanced image understanding capabilities, the accuracy of the analysis can vary. For critical applications where precision is essential, relying solely on AI-generated insights might not be sufficient. Human verification may still be necessary [8].
  5. Alternative Tools: Depending on your specific needs, there might be other tools or services that offer more specialized or advanced features for image analysis. For instance, if you require highly detailed or industry-specific image processing, a dedicated tool might be more appropriate [9].
  6. Cost Considerations: Although Le Chat offers many features for free, there are premium tiers with higher service guarantees and limits. If you require these advanced features, you might incur additional costs, which could be a consideration depending on your budget [10].

These factors might influence your decision to upload images to Le Chat, depending on your specific requirements and concerns.


Jérôme

Hello.

After testing (I have not tried a fine-tuned AI model yet), it seems that mixing calculation and language still produces some mistakes. Anyway, with a limited number of technical media objects in a specific domain (architecture), location and period (Alsace, 1905-1910), the generated answers were very good! The context still needs to be made clear at the beginning. For example, I do not know whether the answers would be the same if I ran the test with the same sources and text for another part of the world…

So the transcription (the original was handwritten “old German”), the translation (technical and cultural terms, surnames, places and old, dated words), then the summary, the filtering/matching and the output (support, quality of text) were impressive for a free version. The time and size limitations (4 media, then a 3-hour gap) were not a problem in a quick conversation.
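For a larger set of media, the limit mentioned above could be planned around with a small script. This is only a minimal sketch: it assumes the free tier behaves as I observed it (4 media, then a 3-hour wait), which is my reading and not a documented quota. The function name `plan_upload_batches` and the file names are hypothetical, and the script does not call any Mistral API; it only computes when each batch could be sent.

```python
from datetime import datetime, timedelta


def plan_upload_batches(media_paths, start, batch_size=4, gap=timedelta(hours=3)):
    """Split media into batches and give each batch an earliest upload time.

    batch_size and gap default to the limits observed in the test above
    (assumption: 4 media per window, 3 hours between windows).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    schedule = []
    for index in range(0, len(media_paths), batch_size):
        batch = media_paths[index:index + batch_size]
        # Batch 0 goes at `start`, batch 1 one gap later, and so on.
        when = start + gap * (index // batch_size)
        schedule.append((when, batch))
    return schedule


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    scans = [f"plan_{n:02d}.jpg" for n in range(1, 11)]
    for when, batch in plan_upload_batches(scans, datetime(2025, 1, 1, 9, 0)):
        print(when.strftime("%Y-%m-%d %H:%M"), ", ".join(batch))
```

With ten scans this gives three batches, so the whole set would take at least six hours of waiting under that assumption.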

Some years ago I tested Transkribus, which is still very good for transcribing this type of document (source). It is often better and more accurate than the popular AI tools.

As for self-hosting and privacy, maybe one (or two) old offline PCs might do the job!

regards,
Jérôme

Someone is also starting a git repository for using these OCR features.