I Built the Same App in Windsurf, Cursor, Lovable & Copilot - Here’s What Actually Worked

Introduction

In this blog, I'm sharing my first-hand experience testing four of the most talked-about AI coding tools on the market: Windsurf, Cursor, Lovable, and GitHub Copilot. The goal? To determine which one is best suited for developers who want to:

·       Build projects from scratch

·       Iterate quickly and efficiently

·       Maintain full developer control

·       Maximize bang for the buck

·       Run and test code easily

·       Integrate with external tools and LLMs

·       Handle version control (like pushing to Azure DevOps)

For this exercise, I’m not just comparing chat experiences, but the real outcomes: code that runs, is version controlled, and can be extended securely and efficiently.

Setup

To do this, I created the exact same chatbot in all four tools to keep the comparison grounded. This doesn't ensure perfect fairness, but it gives a common baseline: a simple, realistic application to test how well these tools perform in a real-world scenario. Here are the steps I executed with each tool (a minimal sketch of the LLM and feedback-logging steps follows the list):

·       Build a basic chatbot UI, inspired by an image reference (see below)

·       Add UI elements like feedback buttons, editable feedback inputs, and source mentioning

·       Connect the bot to an LLM like GPT-4.1

·       Log user feedback (I chose Azure CosmosDB for simplicity)

·       Manage the project with Git and Azure DevOps for version control
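
To make the LLM and feedback-logging steps concrete, here is a minimal sketch of the two backend pieces each tool had to produce in some form: a call to GPT-4.1 and a feedback write to CosmosDB. This is my own illustration rather than output from any of the tools; the environment variable names, database name, and container name are placeholders, and it assumes the official openai and azure-cosmos Python SDKs with credentials loaded from a local .env file.

```python
# Minimal sketch of the GPT-4.1 call and the CosmosDB feedback write.
# All names (env vars, "chatbot-db", "feedback") are placeholders and not
# the code any of the four tools actually generated.
import os

from dotenv import load_dotenv          # pip install python-dotenv
from openai import OpenAI               # pip install openai
from azure.cosmos import CosmosClient   # pip install azure-cosmos

load_dotenv()  # reads OPENAI_API_KEY, COSMOS_URL and COSMOS_KEY from a local .env

llm = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
cosmos = CosmosClient(os.environ["COSMOS_URL"], credential=os.environ["COSMOS_KEY"])
feedback = (
    cosmos.get_database_client("chatbot-db")
    .get_container_client("feedback")   # assumed to be partitioned on /id
)


def ask_bot(question: str) -> str:
    """Send the user's question to GPT-4.1 and return the answer text."""
    response = llm.chat.completions.create(
        model="gpt-4.1",
        messages=[
            {"role": "system", "content": "You are a helpful chatbot."},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content


def log_feedback(message_id: str, rating: str, comment: str = "") -> None:
    """Store a thumbs-up/down plus the optional edited comment as a CosmosDB item."""
    feedback.create_item({
        "id": message_id,    # CosmosDB requires an 'id' field on every item
        "rating": rating,    # e.g. "up" or "down" from the feedback buttons
        "comment": comment,  # text from the editable feedback input in the UI
    })
```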

Ranking

While using the tools, I made notes on the experience of working with each one, the quality of the output, the pricing, and any other pros and cons. Based on those notes, I ranked them from best to worst.

🏆 The Winner: Windsurf

Output quality: For each tool, I wanted to know how well the AI agent would manage to build the chatbot in a single try, based on the image alone. This matters because it defines the starting point from which you begin iterating: the better the starting point, the more time you potentially save. Windsurf nailed this from the get-go. After just one prompt, the generated chatbot looked very close to the provided example.

It also performed very well on follow-up tasks. For example, when asked to integrate GPT-4.1, Windsurf did it perfectly. More than that, it also automatically created a .env file to safely store my credentials and asked me to insert them before running the code. I didn’t even have to think about how to do these things securely myself. This shows a built-in awareness of security best practices, something not all tools delivered.
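
To illustrate the pattern (the variable names here are mine, not necessarily the ones Windsurf generated): the credentials live in a git-ignored .env file and are read at runtime instead of being hard-coded, exactly like the load_dotenv() call in the sketch above.

```
# .env  (listed in .gitignore, never committed)
OPENAI_API_KEY=<your OpenAI key>
COSMOS_URL=https://<your-account>.documents.azure.com:443/
COSMOS_KEY=<your CosmosDB key>
```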

Pros:

  • Very easy to set up. I was onboarded in a couple of minutes

  • The agent came up with a clear folder structure and handled security concerns without me having to prompt for them

  • UI generation from the image was impressively close to the reference

  • Asks your permission to “Apply changes” before actually changing the files

  • Relatively budget-friendly: $15/month for 250 Claude Sonnet 4 requests**

  • OpenAI is showing interest in buying this company, which might only amplify its capabilities

** As Claude Sonnet 4 is currently widely regarded as the best model for writing code, it is often used as the common reference when comparing these tools

Cons:

  • Free trial is low-priority and rate-limited, making it feel sluggish

  • The “hourly rate limit” in the free trial was already reached after the third prompt

  • Some integrations needed extra prompting and manual coding to fix issues

Pricing: $15/month (Pro) for 500 credits (= 250 Claude Sonnet 4 requests)

Final Verdict: Windsurf gives you the feeling that you're working with a stable, intelligent copilot that’s in control. It’s fast, efficient, and easily the most reliable choice for developers who want to get things done.

End Result with Windsurf:

🥈 The Runner-Up: Cursor

Output Quality: While Windsurf aced it from the get-go, Cursor got off to a false start in this particular test. Given only the reference image and a very basic prompt, Cursor created an application that rendered as a blank white screen. While trying to fix that, it first produced an app that only displayed "Welcome to React". After a couple more iterations (and credits spent), the problem was solved. After that, the OpenAI and CosmosDB integrations went smoothly, albeit not as smoothly as with Windsurf. A general feeling I had when using Cursor was that the output quality was unpredictable. Sometimes I added extra context to the same prompts I had used for Windsurf, just to keep Cursor from making yet another mistake. Eventually it worked, but it needed more babysitting. Of course, I only tested one use case, so this might also come down to luck of the draw.

Pros:

  • Asks your permission to execute important steps

  • Automatically installs dependencies and runs the code for you

  • Helpful prompts and step-by-step clarifications

  • Lets you debug and iterate like in a real IDE

Cons:

  • Often gets stuck in a loop trying to fix things, which can be really frustrating and time-consuming

  • Sometimes breaks previously working code when you prompt for additional features

  • Folder structure was messy in my case: frontend and backend were scattered instead of logically grouped.

  • Less bang for more buck! 225 Claude Sonnet 4 requests for $20/month

Pricing: $20/month for 225 Sonnet 4 requests

Final Verdict: Cursor has potential, but feels less reliable and more error-prone than Windsurf. It’s a good tool for developers who are comfortable jumping in to fix what the AI breaks, but it's not the best for getting clean output from the start. If you’re currently working in Cursor, it’s a no-brainer to switch to Windsurf as it is cheaper per request and the output is more reliable.

End Result with Cursor:

🥉 The Honorable Mention: Lovable

Output Quality: In terms of output quality, Lovable was okay but not impressive. It always rendered something without errors, but the first version looked the least like the reference image compared to Windsurf and Cursor. On top of that, the styling needed a couple of iterations before it actually looked nice. When I asked it to integrate GPT-4.1 and CosmosDB, though, I ran into bigger problems. First of all, Lovable doesn’t support .env files, so there is no native way to store credentials safely unless you go with Supabase, its only supported data store. Speaking of database connections: when I tried to connect CosmosDB to my Lovable project, it first complained and tried to push me toward Supabase instead. When I insisted on CosmosDB anyway, it generated code that attempted to write to my CosmosDB container but never succeeded.
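
For contrast, the path Lovable does support looks roughly like the snippet below: feedback goes into a Supabase table rather than CosmosDB. This is only an illustration (the table and column names are mine, and I'm using the Python client for consistency with the earlier sketches, whereas Lovable itself generates TypeScript); the point is that anything outside this pattern meant fighting the tool.

```python
# Rough illustration of the Supabase-only path Lovable pushes you toward.
# Table name and columns are placeholders; Lovable's own code is TypeScript.
import os

from supabase import create_client  # pip install supabase

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])


def log_feedback(message_id: str, rating: str, comment: str = "") -> None:
    """Store chatbot feedback in a Supabase table instead of CosmosDB."""
    supabase.table("feedback").insert({
        "message_id": message_id,
        "rating": rating,
        "comment": comment,
    }).execute()
```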

Pros:

  • Live preview is visible directly in the editor. No need for a separate browser tab

  • Past prompt results can be restored easily

  • UI is beginner-friendly and clean

  • You can export the codebase and edit the code manually

Cons:

  • No .env file support is a major security flaw when integrating APIs or services like OpenAI and CosmosDB

  • Supabase is the only officially supported backend. It typically fails when you try to use alternatives

  • Feels more like a no-code tool than a developer IDE (for some a plus, but for me a big minus)

  • Most expensive: $25/month for 100 credits

Pricing: $25/month for 100 credits

Final Verdict: Lovable is appealing for beginners, but it’s less suitable for professional developers who care about security, flexibility, and backend extensibility. If you want to go beyond Supabase or integrate with Azure securely, this tool isn’t enough.

End Result with Lovable:

💀 The Loser: GitHub Copilot

Output Quality: This is actually an unfair comparison as GitHub Copilot is not able to create a full project. It can assist you while writing code, but it doesn’t scaffold projects, generate folder structures, or integrate tools for you. You have to build the architecture yourself.

Pros:

  • Fast autocomplete for existing code

  • Helpful when making small additions or modifications

Cons:

  • Cannot build a full codebase from scratch

Pricing: $10–$19/month

Final Verdict: Copilot is fine inside an existing project, but for just $5 more you could use Windsurf, which can build an entire codebase. In this comparison, Copilot just doesn’t compete.

End Result with GitHub Copilot:

There is no demo for GitHub Copilot, as Copilot didn’t create a full executable project.

Afterthoughts

I compared the four tools on a very simple chatbot project. Even so, a couple of things stood out to me that applied to all of them. Here are the most important ones.

Each tool chooses the folder structure and naming conventions of the project for you, which can make it hard to standardize across teams and projects. If you're planning to switch tools or collaborate with others, you may want to define your own starter repository and let the tool start writing from there.

Another important point: I only generated a very basic chatbot, so the question remains how well these tools perform on bigger codebases. Say you are building a multi-agent project with a complex folder structure: are these tools going to keep up, or are they going to crumble under large context windows and challenging requirements? More rigorous field tests are needed here.

Next, these tools aren’t cheap, but their productivity potential is massive. What used to take days or weeks to develop now takes minutes or hours, a leap that cannot be ignored. And if you’re in the market for an AI coding editor: Windsurf will give you the most “bang” for the least “buck”. However, these tools are evolving fast, so pick the one that suits your workflow and don’t be afraid to experiment. The ROI is real.

If you have any questions regarding your own Generative AI journey, feel free to reach out!

Author: Joran Vergauwen


Hi, I’m Joran. I’m the AI Lead at Plainsight and one of the co-founders of the Generative AI Belgium meetup group, a vibrant community of over 3,000 AI enthusiasts. I have over five years of hands-on experience in AI, including time spent building my own GenAI startup. On that journey, I’ve learned that chasing the latest tech trends isn’t enough. The real value comes from focusing on real problems and solving them effectively.

 

I’m always experimenting, trying out new tools, and exploring what’s possible with AI. My background has taught me to stay curious and practical, balancing innovation with impact.

 

When I’m not deep in code or strategy, you’ll find me reading, running, or playing the piano. Want to connect or chat about AI? Feel free to reach out!

 

Curious to know what Plainsight could mean for you?
