AI Development Environment

I watched an AI Development example demo at work today. Just posting it here to give you an idea of how development is being done today by some companies.

  1. The user asked the system (Cursor with custom rules) to find candidate issues to fix (it has access to our bug tracker)
  2. It came back with a list of items
  3. The user picked one
  4. The system read the issue description and created a plan (replicate, fix, etc.)
  5. The system created a GitHub branch
  6. The system created a fix, and a test, and then ran the test
  7. The system ran linters and other syntactic checks
  8. The system submitted the PR for the fix and new test file (with description, see below)
  9. The system updated the issue ticket with a link to the PR
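
The nine steps above can be sketched as a small orchestration loop. A minimal sketch only: every name here (`Tracker`, `Repo`, `fix_one_issue`, the in-memory fakes) is hypothetical and invented for illustration; the real demo wires these steps into Cursor, the company's bug tracker, and GitHub.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for the real tracker / Git / agent integrations
# shown in the demo; none of these names come from the actual system.

@dataclass
class Issue:
    key: str
    description: str
    comments: list = field(default_factory=list)

@dataclass
class PullRequest:
    branch: str
    title: str
    url: str

class Tracker:
    def __init__(self, issues):
        self.issues = issues

    def find_candidate_issues(self):          # steps 1-2: surface candidates
        return self.issues

    def add_comment(self, issue, text):       # step 9: link the PR back
        issue.comments.append(text)

class Repo:
    def __init__(self):
        self.branches, self.prs = [], []

    def create_branch(self, name):            # step 5
        self.branches.append(name)
        return name

    def open_pull_request(self, branch, title):   # step 8
        pr = PullRequest(branch, title, f"https://example.com/pr/{len(self.prs) + 1}")
        self.prs.append(pr)
        return pr

def fix_one_issue(tracker, repo, pick):
    issue = pick(tracker.find_candidate_issues())          # step 3: human picks one
    plan = f"replicate, fix, and test: {issue.description}"  # step 4: make a plan
    branch = repo.create_branch(f"fix/{issue.key}")
    # Steps 6-7 would go here: generate the fix and test, run them, run linters.
    pr = repo.open_pull_request(branch, plan)
    tracker.add_comment(issue, f"Fix proposed: {pr.url}")
    return pr

if __name__ == "__main__":
    tracker = Tracker([Issue("BUG-1", "validation error in stream endpoint")])
    pr = fix_one_issue(tracker, Repo(), pick=lambda issues: issues[0])
    print(pr.url)
```

The interesting design point is that step 3 stays a human decision (the `pick` callback), while everything around it is automated.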

It doesn’t always work 100%, but this is definitely a glimpse into what may be ahead. (We’re not necessarily using this for all issues, but we are actively testing this flow.)

Result: [OPIK-2242] [BE] Fix validation error handling in stream endpoints by andrescrz · Pull Request #3197 · comet-ml/opik · GitHub


@dsblank Thanks for sharing. I’m aware of some folks using a similar pipeline.

While not automated end-to-end as outlined, I picked a bug today and asked AI to fix the bug. The result is in addons PR 765. It took AI just a minute or so to propose a resolution, I asked a follow-up question which made it change the code slightly, and then I tested manually in Gramps which took most of the time. Done.

This bug is one I had looked at previously, but I wasn’t familiar with the code, so I left it open. Once AI resolved it and provided an explanation, it seemed like a simple fix, but digging into unfamiliar code would have taken me much longer.

I’ll keep experimenting and ask the tools I’m using to do the commit/branching/PR next time. Adding unit tests would be a good experiment for a fix in core code.


I asked Perplexity to rate Python AI development tools. (I’ve tried Perplexity, ChatGPT, Codex, Copilot, and Cursor so far. I can’t recommend the first two. Cursor was amazing but too opaque, and it also becomes utterly unresponsive, with no warning, when you hit the trial limit. It won’t even respond to questions about how to subscribe, which leads me to think the company won’t value its subscribers’ UX.)

Top AI systems for Python coding prioritize fewest errors from vague (“wish-coder”) specs first, then maximum autonomy.
These rankings draw from recent benchmarks and developer tests up to 2026, focusing on Python accuracy and independent task handling.

Lowest Error Rates

Cursor leads with 95% accuracy in multi-file refactors and vague spec handling, excelling at NumPy, Pandas, and framework tasks without precise prompts.
GitHub Copilot follows at 89% accuracy for completions from natural language, minimizing bugs in routine Python like Django queries.
Tabnine (local mode) hits 78% with fast, privacy-focused suggestions, strong on asyncio and SQLAlchemy from loose descriptions.

Highest Autonomy

Aider tops autonomous coding via terminal integration with models like Claude 3.7 Sonnet, auto-mapping codebases, making Git commits, and performing full edits from high-level instructions.
Cursor shines in agent mode for end-to-end tasks, self-correcting errors and querying codebases independently.
Zencoder automates full apps with multi-file awareness, bug fixes, and workflows across Python projects with minimal oversight.

Comparison Table

| AI System | Error Accuracy (Vague Specs) | Autonomy Level | Python Strengths | Pricing (2026) |
|---|---|---|---|---|
| Cursor | 95% | End-to-end agent | Refactors, frameworks, debugging | $20+/month |
| GitHub Copilot | 89% | Inline completions | Pandas, NumPy, Django | $10/month |
| Aider | High (model-dependent) | Full codebase edits | Terminal, Git, multi-language | Free + LLM costs |
| Tabnine | 78% | Local completions | Asyncio, privacy-focused | $12/month |
| Zencoder | Strong refactoring | App automation | Security, multi-file tasks | Enterprise |