Windows-MCP: Open Source Tool for Lightweight AI Control of Windows Systems

2025-07-06

https://github.com/CursorTouch/Windows-MCP

Windows-MCP is a lightweight open source project that lets AI agents control the Windows operating system directly through a large language model (LLM). It simplifies setup by removing the need for traditional computer vision techniques or a specific model. Through a small set of tools, users can perform keyboard and mouse operations and capture window state for tasks such as file navigation, application control, and UI interaction. The project is released under the MIT license, and the code is open and easy to extend, which suits developers and AI enthusiasts. Typical latency is about 1.5-2.3 seconds between actions, and system resource usage is low, making it suitable for running locally.

Features

Supports any large language model (LLM); no specific model or traditional computer vision technique is required.
Provides keyboard and mouse tools that simulate user input.
Captures window and UI state and retrieves screen content for the AI to analyze.
Executes PowerShell commands for system-level operations.
Supports file navigation and application control to automate everyday tasks.
Offers low-latency interaction, with about 1.5-2.3 seconds between actions.
Open source and lightweight: few dependencies, easy to install and extend.

Usage Guide

Installation

Installing Windows-MCP on Windows is straightforward. The steps are as follows:

Clone the repository
Open a terminal or command prompt and run the following commands to clone the project repository:
```
git clone https://github.com/CursorTouch/Windows-MCP.git
cd Windows-MCP
```
Install dependencies
The project requires a Python environment and a handful of libraries. Make sure Python 3.8 or later is installed. From the project directory, run the following command to install the dependencies:
```
pip install -r requirements.txt
```
Configure the environment
If you use a hosted LLM (e.g. Google Gemini), you need to configure its API key. Create a .env file and add your API key, for example:
```
GOOGLE_API_KEY=your_api_key_here
```
Use load_dotenv() to load the environment variables; refer to the project documentation for details.
Run the project
Run the main script from the project directory:
```
python main.py
```
When the project starts, it initializes the AI agent and waits for the user to enter commands.
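
For the configuration step, it can help to see what loading a .env file amounts to. The sketch below is a standard-library stand-in for load_dotenv(), not the python-dotenv implementation: the helper names parse_env_text and require_key are hypothetical, and real .env parsing handles more cases (export prefixes, escapes, multi-line values).

```python
import os
from typing import Dict


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; skip blanks and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def require_key(env: Dict[str, str], name: str) -> str:
    """Return the named key or fail early with a readable error."""
    value = env.get(name, "")
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value


if __name__ == "__main__":
    sample = "# local secrets\nGOOGLE_API_KEY=your_api_key_here\n"
    # Values already in the process environment take precedence.
    env = {**parse_env_text(sample), **os.environ}
    require_key(env, "GOOGLE_API_KEY")
    print("GOOGLE_API_KEY is configured")
```

Failing early with a clear message is a design choice here: a missing key otherwise tends to surface later as a less readable authentication error from the LLM client.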

Main Features

The core function of Windows-MCP is controlling a Windows system through an AI agent. The main features work as follows:

1. Controlling the system with an LLM

Windows-MCP supports any LLM; users only need to specify the model in code. For example, to use the Google Gemini model:

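```
from langchain_google_genai import ChatGoogleGenerativeAI

# Agent is the agent class provided by the project; import it as described in the project documentation.
llm = ChatGoogleGenerativeAI(model='gemini-2.0-flash')
agent = Agent(llm=llm, use_vision=True)
```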

The user enters a natural language command (e.g., "open notepad"), and the AI agent parses the command and performs the corresponding action. The operation returns text or the screen state as its result.
Procedure:

Enter a command in the terminal, such as "Open File Explorer".
The AI parses the command and calls the system API to open the specified application.
Check the returned result to confirm that the operation succeeded.
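
The three steps above can be sketched as a small loop. This is an illustrative outline only, not the project's code: plan_action stands in for the LLM call, and launch_app, handle_command, and the tool name "launch" are all hypothetical.

```python
from typing import Callable, Dict, Tuple

# Hypothetical tools: each takes one string argument and returns a text result.
Tool = Callable[[str], str]


def launch_app(name: str) -> str:
    # A real tool would start the application; this stub only reports it.
    return f"launched {name}"


def plan_action(command: str) -> Tuple[str, str]:
    """Stand-in for the LLM: map a command to (tool name, argument)."""
    text = command.strip()
    if text.lower().startswith("open "):
        return "launch", text[5:]
    return "unknown", text


def handle_command(command: str, tools: Dict[str, Tool]) -> str:
    """Parse the command, call the chosen tool, and return its result."""
    tool_name, argument = plan_action(command)
    tool = tools.get(tool_name)
    if tool is None:
        return f"no tool available for: {command}"
    return tool(argument)


if __name__ == "__main__":
    print(handle_command("Open File Explorer", {"launch": launch_app}))
```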

2. Keyboard and mouse operation

Windows-MCP provides tools that simulate keyboard input and mouse clicks. For example, after opening an application, the AI can enter text or click a button.
Example:

Command: "Type Hello World in Notepad".
The AI invokes the keyboard tool, opens Notepad, and enters the text.
Users can review the operation details in the logs to verify accuracy.
Note: Mouse operations take about 1.5-2.3 seconds each, depending on system load. Clearer commands improve the success rate.
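
Since the note above ties delay to system load, one way to see where the time goes is to time each action and log it. This is a minimal sketch using only the standard library; timed_action and the placeholder action are hypothetical names, not part of Windows-MCP.

```python
import logging
import time
from typing import Callable, Tuple

logger = logging.getLogger("actions")


def timed_action(name: str, action: Callable[[], str]) -> Tuple[str, float]:
    """Run an action, log its duration, and return (result, seconds)."""
    start = time.perf_counter()
    result = action()
    elapsed = time.perf_counter() - start
    logger.info("%s finished in %.3f s: %s", name, elapsed, result)
    return result, elapsed


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # The lambda is a placeholder for a real keyboard or mouse tool call.
    timed_action("type_text", lambda: "typed 'Hello World'")
```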

3. Capturing window and UI state

Windows-MCP can capture the current window or screen content for the AI to analyze, for example to check whether a certain button appears in the interface.
Procedure:

Enter a command such as "Check desktop for Chrome icon".
The AI captures the screen state, determines whether the icon exists, and returns the result.
If vision mode is enabled (use_vision=True), the AI also uses image analysis to give more precise feedback.
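
Captured UI state can be thought of as a list of named elements that the agent searches. The structure below is an assumption made for illustration: UIElement, its fields, and find_elements are hypothetical and do not reflect how Windows-MCP actually represents the screen.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class UIElement:
    name: str          # visible label, e.g. "Google Chrome"
    control_type: str  # e.g. "Button", "ListItem"
    x: int             # center coordinates in screen pixels
    y: int


def find_elements(state: List[UIElement], query: str) -> List[UIElement]:
    """Return elements whose name contains the query, ignoring case."""
    needle = query.lower()
    return [element for element in state if needle in element.name.lower()]


if __name__ == "__main__":
    desktop = [
        UIElement("Recycle Bin", "ListItem", 40, 40),
        UIElement("Google Chrome", "ListItem", 40, 130),
    ]
    matches = find_elements(desktop, "chrome")
    print("Chrome icon found" if matches else "Chrome icon not found")
```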

4. Executing PowerShell commands

The Shell-Tool allows users to run PowerShell commands, for example to list the contents of a folder.
Example:

Command: "List files in the root directory of the C drive".
The AI runs the dir C:\ command, which returns a list of files.
Note: Use PowerShell commands with caution to avoid compromising system security. Running them in a test environment is recommended.
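
One way to act on the caution above is to screen commands before they reach PowerShell. The sketch below is an assumption, not the project's Shell-Tool: BLOCKED_TOKENS, is_allowed, and run_powershell are hypothetical, and a blocklist like this is easy to bypass, so it complements a test environment rather than replacing one.

```python
import subprocess

# Hypothetical blocklist of cmdlets considered high risk.
BLOCKED_TOKENS = ("remove-item", "format-volume", "stop-computer", "restart-computer")


def is_allowed(command: str) -> bool:
    """Reject commands that contain any blocked token (case-insensitive)."""
    lowered = command.lower()
    return not any(token in lowered for token in BLOCKED_TOKENS)


def run_powershell(command: str, timeout: float = 30.0) -> str:
    """Run an allowed command in PowerShell and return its standard output."""
    if not is_allowed(command):
        raise PermissionError(f"blocked command: {command}")
    completed = subprocess.run(
        ["powershell", "-NoProfile", "-Command", command],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout
```

On a Windows machine, run_powershell("dir C:\\") would return the directory listing as text, while a command containing Remove-Item is rejected before any process is started.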

5. File navigation and application control

Windows-MCP supports file operations and application management, such as opening specific folders or launching programs.
Example:

Command: "Open the Documents folder on the D drive".
The AI invokes the File Navigator tool to open the specified path.
The user can then enter further commands, such as "New Text File".
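
Before opening a path such as the one in the example, it is worth checking that it exists. This is a minimal sketch using the standard library only; resolve_folder and open_folder are hypothetical helpers and not the project's File Navigator tool. Note that os.startfile is available on Windows only.

```python
import os
import sys
from pathlib import Path


def resolve_folder(path_text: str) -> Path:
    """Return the path as a Path object, or raise if it is not a folder."""
    folder = Path(path_text).expanduser()
    if not folder.is_dir():
        raise NotADirectoryError(f"not a folder: {folder}")
    return folder


def open_folder(path_text: str) -> None:
    """Open the folder in File Explorer (Windows only)."""
    folder = resolve_folder(path_text)
    if sys.platform != "win32":
        raise OSError("os.startfile is only available on Windows")
    os.startfile(folder)  # opens the folder with its default handler
```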

Featured Capabilities

Low-latency real-time interaction

With an action interval as low as 1.5 seconds, Windows-MCP is suitable for quick tasks. Users can enter commands one after another, and the AI executes them in sequence. Example:

Command 1: "Open Browser".
Command 2: "Search for AI tools".
The AI completes the operations in order, keeping the experience smooth.
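
Executing commands in sequence can be sketched as a simple queue that stops at the first failure. The names run_in_order and the placeholder executor are hypothetical; in practice the executor would be the call into the agent.

```python
from collections import deque
from typing import Callable, Iterable, List


def run_in_order(commands: Iterable[str], execute: Callable[[str], bool]) -> List[str]:
    """Run commands first-in, first-out; stop at the first one that fails."""
    pending = deque(commands)
    completed = []
    while pending:
        command = pending.popleft()
        if not execute(command):
            break  # later commands usually depend on earlier ones
        completed.append(command)
    return completed


if __name__ == "__main__":
    # Placeholder executor that pretends every command succeeds.
    done = run_in_order(["Open Browser", "Search for AI tools"], lambda c: True)
    print(done)
```

Stopping on failure is a deliberate choice in this sketch: searching in a browser that never opened would only produce a second, more confusing error.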

Open source extensions

Users can modify the code as needed, for example to add custom tools or support other LLMs. The project provides an extension guide in its CONTRIBUTING document.
Procedure:

Open the tools directory and add custom scripts.
Update agent.py to integrate the new tools.
Test the modifications to ensure compatibility.
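
Adding a custom tool often follows a registry pattern. The sketch below is a generic illustration, not the layout of the Windows-MCP code base: TOOLS, register_tool, and word_count are hypothetical names.

```python
from typing import Callable, Dict

ToolFunc = Callable[[str], str]

# Hypothetical registry mapping tool names to callables.
TOOLS: Dict[str, ToolFunc] = {}


def register_tool(name: str) -> Callable[[ToolFunc], ToolFunc]:
    """Decorator that adds a function to the registry under the given name."""
    def decorator(func: ToolFunc) -> ToolFunc:
        if name in TOOLS:
            raise ValueError(f"tool already registered: {name}")
        TOOLS[name] = func
        return func
    return decorator


@register_tool("word_count")
def word_count(text: str) -> str:
    """Example custom tool: count whitespace-separated words."""
    return str(len(text.split()))


if __name__ == "__main__":
    print(TOOLS["word_count"]("Hello World"))
```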

Precautions

Ensure a stable network connection, especially when using an online LLM.
Check system privileges; some operations require administrator rights.
Check the GitHub repository regularly for updates and new features.

Application Scenarios

Office automation
Windows-MCP can open office software, enter data, and organize files automatically, for example batch-renaming files or auto-filling Excel sheets. Suitable for administrators and data analysts.
UI testing
Developers can use Windows-MCP to test application interfaces by simulating user clicks and input and verifying that features work. Suitable for QA engineers.
AI development experiments
AI enthusiasts can use Windows-MCP to test how well LLMs perform at system control and to explore how AI interacts with the operating system.
Simplifying everyday tasks
Ordinary users can complete complex operations, such as moving files in bulk or changing system settings, with natural language commands, which lowers the barrier to use.

Q&A

Which LLMs does Windows-MCP support?
It supports any LLM, such as Google Gemini or OpenAI GPT. Users only need to configure the corresponding model and API key in code.
Is computer vision required?
No. Windows-MCP simplifies setup by performing control through system APIs, with an optional vision mode.
How do I ensure safe operation?
Run it in a test environment and avoid executing high-risk PowerShell commands directly. Review the code and keep commands clear and unambiguous.
What if latency is high?
Latency is typically 1.5-2.3 seconds. If it is higher, check the system load or the LLM's inference speed, and refine how commands are phrased.


Source: Chief AI Sharing Circle, "Windows-MCP: Open Source Tool for Lightweight AI Control of Windows Systems", posted on 2025-07-06.
