AI Programming Tools Cross-Review: Which Ones Can Really Go from Prompt to High-Fidelity Prototype?

2025-06-29

AI programming has become one of the most crowded tracks in the current wave of AI. From Cursor and Windsurf to v0 by Vercel, numerous programming agents have sprung up. Behind their rise is the leap in code generation capability of the underlying large models, such as Anthropic Claude, OpenAI GPT, and Google Gemini.

However, as of June 2025, what are the true capabilities of these AI programming tools? How big is the gap in code generation quality between different models? This review will conduct a side-by-side comparison of the mainstream AI programming products on the market and their integrated models through a unified, real-world development requirement, in order to provide an intuitive and informative observation.

Benchmark task: an app called "My MBTI AI Circle of Friends"

In order to effectively test the combined capabilities of these tools, we designed a task of moderate complexity: generating a high-fidelity design prototype of an application called "My MBTI AI Circle of Friends".

The core concept of the app is to provide users with emotional companionship. Users record life moments as cards on a timeline, much like keeping a diary, and a series of built-in AI friends with different MBTI personalities respond to the user's posts according to their own personas. This task not only tests the AI's understanding of functional logic, but also places clear demands on UI design, code structure, and front-end engineering capabilities.

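Here is the core Prompt used for this review (translated from the original Chinese):

I want to develop a Chinese-language emotional companionship app called "My MBTI AI Circle of Friends" (我的MBTI AI朋友圈). The functional requirements are as follows:
1.  **Core functionality**: Users can record thoughts, plans, to-dos, moods, links, and other everyday moments through a card + timeline interaction. In essence, it is first and foremost an AI diary app.
2.  **AI interaction**: The system comes preset with a series of AI Agents with different MBTI personalities. These Agents respond differently to the user's entries according to their own personality traits.
3.  **Social relationships**: Users can choose and follow different AI Agents.
4.  **Sharing**: Users can share their own entries along with the AI Agents' responses.
5.  **Core value**: Provide emotional companionship to users through the AI responses and the sharing feature.
Now, output high-fidelity prototypes of the app. Please complete the prototype design for all screens as follows, and make sure the prototypes can be used directly for front-end development:
1.  **User experience analysis**: Analyze the app's main features and user needs, and determine the core interaction logic.
2.  **Product interface planning**: As a product manager, define the key screens and ensure a sound information architecture.
3.  **High-fidelity UI design**: As a UI designer, design interfaces that closely follow real iOS/Android design guidelines, using modern UI elements to deliver a good visual experience.
4.  **HTML prototype implementation**: Use HTML + Tailwind CSS (or Bootstrap) to generate all prototype screens, and use FontAwesome (or another open-source UI component library) to make the interface more polished. Split the code into separate files and keep the structure clear.
5.  **File structure requirements**:
-   Store each screen as a separate HTML file, for example `home.html`, `profile.html`, `settings.html`.
-   `index.html` serves as the main entry point. Do not write all the screens' HTML directly into it; instead, embed the other HTML pages using `iframe`, and display all pages tiled on the `index` page rather than navigating between them via links.
6.  **Enhanced realism**:
-   Size the interface to simulate an iPhone 15 Pro, with rounded corners.
-   Use real UI images (for example from Unsplash, Pexels, or similar libraries) rather than placeholders.
-   Add a simulated iOS top status bar and bottom TabBar navigation.
Please generate the complete HTML code in the `design-trae-DeepSeekR1` folder according to the above requirements.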

The challenge of this Prompt is that it requires the AI to be more than a code generator: it must play three roles at once, as product manager, UI/UX designer, and front-end engineer. In particular, the requirements for a tiled `iframe` presentation and a separate file for each screen directly test the AI's ability to organize code at the project level.
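To make the tiling requirement concrete, here is a minimal Python sketch that builds the kind of `index.html` the Prompt asks for: one `iframe` per screen, laid out together instead of linked. It is an illustration rather than the output of any reviewed tool. The function name `build_index`, the 393x852 frame size (an approximation of the iPhone 15 Pro viewport), and the 48px corner radius are all assumptions.

```python
from html import escape

# Assumption: 393x852 CSS pixels approximates the iPhone 15 Pro viewport.
PHONE_WIDTH = 393
PHONE_HEIGHT = 852


def build_index(pages: list[str]) -> str:
    """Return an index.html document that tiles every page in its own iframe."""
    frames = "\n".join(
        f'    <iframe src="{escape(page, quote=True)}" title="{escape(page, quote=True)}"></iframe>'
        for page in pages
    )
    return f"""<!DOCTYPE html>
<html lang="zh-CN">
<head>
  <meta charset="utf-8">
  <title>Prototype overview</title>
  <style>
    /* Flex-wrap places the frames next to each other; no links are used. */
    .gallery {{ display: flex; flex-wrap: wrap; gap: 24px; padding: 24px; }}
    iframe {{ width: {PHONE_WIDTH}px; height: {PHONE_HEIGHT}px; border: 0; border-radius: 48px; }}
  </style>
</head>
<body>
  <div class="gallery">
{frames}
  </div>
</body>
</html>
"""
```

Calling `build_index(["home.html", "profile.html", "settings.html"])` returns a page with three frames visible at once, which is the behavior several tools in this review failed to produce.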

Round 1: Local IDE-Integrated Tool Review

We first looked at AI programming tools integrated into local IDEs, mainly Cursor and Trae. The advantage of such tools is their deep integration into the developer's existing workflow.

Cursor

Cursor + Claude 3.5 Sonnet

Score: 60
Evaluation: Passable bare shell

The combination of Cursor and Claude 3.5 Sonnet basically accomplished the task. Functionally, it accurately implemented most of the core requirements, but it also generated one redundant page that cannot be opened. On the UI side, the interface skeleton is complete, but the icons failed to load from the network as required, and the overall style is plain. Despite these flaws, as a first-version prototype it reaches the passing line.

Cursor + Claude 4 Sonnet

Score: 90
Evaluation: Delivered fully finished

This is an impressive combination. Claude 4 Sonnet far exceeded expectations: the functions were implemented accurately, and it even added interactive elements such as post type and mood selection to the "Create" page. The UI design is beautiful, the imagery is filled in, and the visual effect is excellent. The degree of completion is so high that it could almost be handed directly to front-end developers for follow-up work.

Cursor + Gemini 2.5 Pro

Score: 59
Evaluation: Basic fit-out with a critical flaw

Gemini 2.5 Pro's performance was disappointing. While the basic functionality and UI elements are fine, it failed to achieve the "direct tiling" effect explicitly called for in the Prompt and instead generated pages that require manual clicks to switch between them. This is a critical functional flaw that severely impacts the usability of the prototype, causing it to fail.
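The tiling failure described here can be checked mechanically. The sketch below uses Python's standard `html.parser` to test whether every expected screen is embedded in `index.html` through an `iframe`. The helper name `is_tiled` and the sample markup are hypothetical; the review itself judged the outputs by eye.

```python
from html.parser import HTMLParser


class _IframeCollector(HTMLParser):
    """Collect the src attribute of every iframe in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.iframe_srcs: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "iframe" and attributes.get("src"):
            self.iframe_srcs.append(attributes["src"])


def is_tiled(index_html: str, expected_pages: list[str]) -> bool:
    """True when every expected page is embedded in index.html via an iframe."""
    collector = _IframeCollector()
    collector.feed(index_html)
    return set(expected_pages) <= set(collector.iframe_srcs)
```

An index page that shows a single frame and switches its content through navigation links would not pass this check, because only one of the expected pages appears as an `iframe` source.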

Cursor + GPT-4o

Score: 70
Evaluation: Clean and simple

The output of GPT-4o (sic: o3) accurately implements the functional requirements; the interface is complete and the icons display properly. Its design style is clean and minimalist, and the overall result is better than that of Claude 3.5 Sonnet, making it a good-quality prototype at the basic fit-out level.

Trae

Trae + Claude 3.5 Sonnet

Score: 80
Evaluation: Refined and simple

Interestingly, the same Claude 3.5 Sonnet model performed better on the Trae platform than on Cursor. Not only did it implement all the features accurately, but the UI is also much more attractive and richer in imagery. The design is slightly less impressive than the Claude 4 Sonnet version, but it is already a quality delivery.

Trae + Claude 3.7 Sonnet

Score: 59
Evaluation: Same critical flaw

Surprisingly, the newer Claude 3.7 Sonnet regressed. It made the same mistake as Gemini 2.5 Pro: it failed to implement the tiled `iframe` display, which is likewise a fatal functional flaw. Despite the aesthetically pleasing UI design, failing the core requirement puts it on the same level as the Gemini version.

Trae + Claude 4 Sonnet

Score: 90
Evaluation: Delivered fully finished, again

The combination of Trae and Claude 4 Sonnet again shows that the model is the decisive factor. Its output is as excellent as the version on the Cursor platform, with full functionality, a beautiful UI, and even an attachment upload detail added to the create function. This once again demonstrates Claude 4 Sonnet's lead in tasks of this kind.

Trae + Gemini 2.5 Pro

Score: 50
Evaluation: Crude and flawed bare shell

Gemini 2.5 Pro performed even worse on Trae than on Cursor. In addition to the core flaw of failing to achieve a tiled display, the interface suffers from missing icons, and the overall style is very plain.

Trae + DeepSeek R1

Score: 40
Evaluation: Abandoned bare shell

DeepSeek R1 performed the worst in the local IDE group. Not only did it fail to tile the pages, but even basic page switching is buggy: an error page appears after clicking a tab. The functionality and UI are incomplete, and the prototype is basically unusable.
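Error pages after a tab click usually mean that a generated file points at a page that was never written. A minimal sketch of a check for that, assuming a flat output folder, relative references, and no query strings or fragments in the URLs (the function name `find_broken_refs` and the sample file contents are hypothetical):

```python
from html.parser import HTMLParser


class _RefCollector(HTMLParser):
    """Collect href/src values that point at local .html pages."""

    def __init__(self) -> None:
        super().__init__()
        self.refs: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value and value.endswith(".html"):
                self.refs.append(value)


def find_broken_refs(files: dict[str, str]) -> list[tuple[str, str]]:
    """Return (source file, missing target) pairs for a {filename: markup} mapping."""
    broken = []
    for filename, markup in files.items():
        collector = _RefCollector()
        collector.feed(markup)
        for ref in collector.refs:
            if ref not in files:
                broken.append((filename, ref))
    return broken
```

Running this over a generated folder before opening it in a browser would flag both kinds of defect noted in this round: a tab that leads to an error page and a page that cannot be opened.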

Round 2: Cloud-based AI Programming Platform Review

Next, we turn to cloud-based products. These are tools that run directly in the browser without local installation and represent another AI programming paradigm.

Replit

Score: 50
Evaluation: Flawed basic fit-out

The veteran cloud IDE Replit (model unknown) was mediocre in this test. It also failed to meet the core requirement of a tiled display. Although the basic framework and icons of the interface are present, the overall design is rudimentary, and there is a clear gap between it and the generation quality of the mainstream AI tools.

Lovable

Score: 95
Evaluation: Stunning custom finish

Lovable's result is eye-opening, making it the biggest dark horse of this review. It perfectly implemented all the requirements in the Prompt, including the tiled `iframe` display. Functionally, it not only implemented all the basic features but also added advanced options such as privacy settings on the create page. The UI is beautifully designed and rich in imagery, and the visual experience even surpasses that of the Claude 4 Sonnet versions. This suggests that an AI product deeply optimized for a specific task (e.g., front-end prototype generation) can outperform a general-purpose model.

v0

v0, launched by Vercel, is a tool that focuses on front-end component generation. We tested two of its models: v0-1.5-md and v0-1.5-lg.

v0 + v0-1.5-md

Score: 55
Evaluation: Flawed bare shell

The v0-1.5-md model's performance was average. It also failed to achieve a tiled display, and interface icons are missing. Although the design is minimalist, its functional flaws and lack of content diminish its value as a prototype.

v0 + v0-1.5-lg

Score: 65
Evaluation: Passable bare shell

The larger v0-1.5-lg model performed better. This time it correctly implemented the tiled display, and the functionality is largely accurate. The UI is complete, but icons still need to be filled in manually. The overall result is comparable to the Cursor + Claude 3.5 Sonnet combination: a passable first-version prototype awaiting further finishing.

Bolt.new

Score: 40
Evaluation: Abandoned bare shell

Bolt.new performed just as badly as DeepSeek R1. Not only did it fail to achieve a tiled presentation, but the generated UI elements also suffer from severely squeezed layouts and missing content, making the prototype basically unusable.

Review scorecard

Although this review is based on a single use case and the results involve some randomness, it clearly reveals the current landscape of the AI programming tools market.

Product	Model	Score	Evaluation
Cursor	`Claude 3.5 Sonnet`	60	Passable bare shell
Cursor	`Claude 4 Sonnet`	90	Delivered fully finished
Cursor	`Gemini 2.5 Pro`	59	Has a critical flaw
Cursor	`GPT-4o`	70	Clean and simple
Trae	`Claude 3.5 Sonnet`	80	Refined and simple
Trae	`Claude 3.7 Sonnet`	59	Has a critical flaw
Trae	`Claude 4 Sonnet`	90	Delivered fully finished
Trae	`Gemini 2.5 Pro`	50	Crude and flawed
Trae	`DeepSeek R1`	40	Abandoned bare shell
Replit	Unknown	50	Flawed basic fit-out
Lovable	Unknown	95	Stunning custom finish
v0	`v0-1.5-md`	55	Flawed bare shell
v0	`v0-1.5-lg`	65	Passable bare shell
Bolt.new	Unknown	40	Abandoned bare shell
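The scorecard can also be summarized programmatically. The sketch below copies the scores from the table and computes a per-model average across platforms, plus the list of combinations that reach the passing line. The threshold of 60 is an assumption inferred from the wording of the reviews (60 is described as passing, 59 as failing), and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# (product, model, score) rows copied from the scorecard.
# "Unknown" marks rows whose underlying model the review does not name.
SCORES = [
    ("Cursor", "Claude 3.5 Sonnet", 60),
    ("Cursor", "Claude 4 Sonnet", 90),
    ("Cursor", "Gemini 2.5 Pro", 59),
    ("Cursor", "GPT-4o", 70),
    ("Trae", "Claude 3.5 Sonnet", 80),
    ("Trae", "Claude 3.7 Sonnet", 59),
    ("Trae", "Claude 4 Sonnet", 90),
    ("Trae", "Gemini 2.5 Pro", 50),
    ("Trae", "DeepSeek R1", 40),
    ("Replit", "Unknown", 50),
    ("Lovable", "Unknown", 95),
    ("v0", "v0-1.5-md", 55),
    ("v0", "v0-1.5-lg", 65),
    ("Bolt.new", "Unknown", 40),
]

PASSING_SCORE = 60  # Assumption: inferred from the 60 = pass / 59 = fail wording.


def mean_by_model(rows):
    """Average score per named model across all platforms it was tested on."""
    grouped = defaultdict(list)
    for _product, model, score in rows:
        if model != "Unknown":
            grouped[model].append(score)
    return {model: mean(scores) for model, scores in grouped.items()}


def passing(rows, threshold=PASSING_SCORE):
    """Product/model combinations that reach the passing threshold."""
    return [(product, model) for product, model, score in rows if score >= threshold]
```

On these numbers, Claude 4 Sonnet averages 90 across both IDEs, Gemini 2.5 Pro averages 54.5, and 7 of the 14 combinations reach the assumed passing line.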

The final evaluation results lead to a clear conclusion: in tasks like front-end prototyping, the choice of model is central to the quality of the final output. Claude 4 Sonnet demonstrated remarkable all-round ability, while vertical products such as Lovable offered the best experience through deep optimization.

May not be reproduced without permission: Chief AI Sharing Circle