Windows-MCP is a lightweight open-source project that lets AI agents control the Windows operating system directly through a large language model (LLM). It simplifies setup by removing the need for traditional computer vision techniques or a specific model. Through a small set of tools, users can perform keyboard and mouse operations and capture window state for tasks such as file navigation, application control, and UI interaction. The project is released under the MIT license, and its code is open and easy to extend, which suits developers and AI enthusiasts. Latency is low (about 1.5-2.3 seconds between actions), which keeps interaction close to real time, and system resource usage is modest, making the project suitable for running locally.
Function List
- Supports any large language model (LLM), with no need for a specific model or traditional computer vision techniques.
- Provides keyboard and mouse tools that simulate user input.
- Captures window and UI state and retrieves screen content for AI analysis.
- Executes PowerShell commands for system-level operations.
- Supports file navigation and application control to automate daily tasks.
- Offers low-latency, near-real-time interaction, with about 1.5-2.3 seconds between actions.
- Open source and lightweight, with few dependencies; easy to install and extend.
Usage Guide
Installation process
Windows-MCP has a simple installation process for Windows users. The following are the detailed steps:
- Clone the repository
Open a terminal or command prompt and enter the following commands to clone the project repository:
git clone https://github.com/CursorTouch/Windows-MCP.git
cd Windows-MCP
- Install dependencies
The project relies on a Python environment and a handful of libraries. Make sure that Python 3.8 or above is installed. From the project directory, run the following command to install the dependencies:
pip install -r requirements.txt
- Configure the environment
If you use a hosted LLM (e.g. Google Gemini), you need to configure its API key. Create a .env file and add your API key, for example:
GOOGLE_API_KEY=your_api_key_here
Use load_dotenv() to load the environment variables; refer to the project documentation for details.
- Run the project
Run the main script in the project directory:
python main.py
When the project starts, it initializes the AI agent and waits for the user to enter commands.
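The configuration step can be pictured with a small standard-library sketch. In the project itself, load_dotenv() (from the python-dotenv package) does this work; the `parse_env_text` and `require_key` helpers below are hypothetical stand-ins that only illustrate what a .env line such as GOOGLE_API_KEY=... turns into and how a missing key could be caught early.

```python
def parse_env_text(text):
    """Parse KEY=value lines, skipping blanks and # comments (simplified stand-in for load_dotenv)."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_key(env, name):
    """Return the value for `name`, or raise a clear error if it is missing or empty."""
    value = env.get(name, "")
    if not value:
        raise KeyError(f"{name} is not set; add it to your .env file")
    return value


# Placeholder content, matching the example .env line above.
env = parse_env_text("# LLM credentials\nGOOGLE_API_KEY=your_api_key_here\n")
print(require_key(env, "GOOGLE_API_KEY"))
```

Failing fast on a missing key gives a clearer error than a rejected request from the LLM provider later on.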
Main Functions
The core function of Windows-MCP is to control a Windows system through an AI agent. The main functions are used as follows:
1. Controlling the system with an LLM
Windows-MCP works with any LLM; users only need to specify the model in the code. For example, to use the Google Gemini model:
from langchain_google_genai import ChatGoogleGenerativeAI
# Agent is the project's agent class; its import is omitted here, see the repository for the exact path.
llm = ChatGoogleGenerativeAI(model='gemini-2.0-flash')
agent = Agent(llm=llm, use_vision=True)
The user enters a natural language command (e.g., "open notepad"), and the AI agent parses the command and performs the corresponding action. The operation returns text or the screen state.
Procedure:
- Enter a command in the terminal, such as "Open File Explorer".
- The AI parses the command and calls the system API to open the specified application.
- Check the returned result to confirm that the operation succeeded.
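The parse, call, and check cycle above can be sketched with a stub. In the real project the LLM decides which tool to call; here a hypothetical keyword-based `plan` function stands in for it, and the `launch_app` tool and `ActionResult` type are invented for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ActionResult:
    success: bool
    message: str


def launch_app(name):
    """Hypothetical tool: pretend to launch an application and report what happened."""
    return ActionResult(True, f"launched {name}")


TOOLS = {"launch_app": launch_app}


def plan(command):
    """Stand-in for the LLM: map a natural-language command to (tool_name, argument)."""
    text = command.strip()
    if text.lower().startswith("open "):
        return "launch_app", text[5:].strip()
    return None, text


def handle(command):
    """Parse the command, call the matching tool, and return a checkable result."""
    tool_name, argument = plan(command)
    if tool_name not in TOOLS:
        return ActionResult(False, f"no tool for: {command}")
    return TOOLS[tool_name](argument)


result = handle("Open File Explorer")
print(result.success, result.message)
```

Returning a structured result rather than plain text makes the final "check the result" step something a script can do as well as a person.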
2. Keyboard and mouse operation
Windows-MCP provides tools to simulate keyboard input and mouse clicks. For example, after opening an application, the AI can enter text or click a button.
Example:
- Instruction: "Type Hello World in Notepad".
- The AI opens Notepad, then invokes the keyboard tool to enter the text.
- Users can review the details of each operation in the logs to verify accuracy.
Note: Each mouse operation takes about 1.5-2.3 seconds, depending on system load. Clearer commands improve the success rate.
3. Capturing window and UI states
Windows-MCP can capture the current window or screen content for AI analysis, for example to check whether a certain button appears in the interface.
Procedure:
- Enter the command "Check desktop for Chrome icon".
- The AI captures the screen state, determines whether the icon exists, and returns the result.
- If vision mode is enabled (use_vision=True), the AI combines this with image analysis to give more precise feedback.
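One way to picture the captured state is as a tree of UI elements that can be searched by name. The snapshot layout (`name`, `type`, `children` keys) and the `find_elements` helper are assumptions made for illustration, not the project's actual data format.

```python
def find_elements(node, query):
    """Depth-first search of a UI snapshot for elements whose name contains `query`."""
    matches = []
    if query.lower() in node.get("name", "").lower():
        matches.append(node)
    for child in node.get("children", []):
        matches.extend(find_elements(child, query))
    return matches


# Hypothetical snapshot of the desktop; a real capture would come from the agent's state tool.
desktop = {
    "name": "Desktop",
    "type": "pane",
    "children": [
        {"name": "Recycle Bin", "type": "icon"},
        {"name": "Google Chrome", "type": "icon"},
    ],
}

found = find_elements(desktop, "chrome")
print("Chrome icon present:", bool(found))
```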
4. Execute PowerShell commands
The Shell-Tool allows users to run PowerShell commands, for example to list the contents of a folder.
Example:
- Command: "List files in the root directory of the C drive".
- The AI executes the dir C:\ command, which returns a list of files.
Note: Use PowerShell commands with caution to avoid compromising system security. Running them in a test environment is recommended.
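The caution above could be enforced in code with an allowlist placed in front of the shell call. This is a minimal sketch, not the project's Shell-Tool: the `SAFE_COMMANDS` set and the helper names are hypothetical, while `-NoProfile`, `-NonInteractive`, and `-Command` are standard powershell.exe parameters.

```python
import subprocess

# Hypothetical allowlist: only read-only commands may run without review.
SAFE_COMMANDS = {"dir", "get-childitem", "get-process", "get-date"}


def is_allowed(command):
    """Allow a command only if its first word is allowlisted and it chains nothing else."""
    stripped = command.strip()
    if not stripped or any(symbol in stripped for symbol in (";", "|", "&", "`", "$(")):
        return False
    return stripped.split()[0].lower() in SAFE_COMMANDS


def build_argv(command):
    """Build the argument list for a non-interactive PowerShell invocation."""
    return ["powershell", "-NoProfile", "-NonInteractive", "-Command", command]


def run_powershell(command, timeout=30):
    """Run an allowlisted command and return its standard output (Windows only)."""
    if not is_allowed(command):
        raise PermissionError(f"command not allowlisted: {command}")
    completed = subprocess.run(
        build_argv(command), capture_output=True, text=True, timeout=timeout
    )
    return completed.stdout


print(is_allowed("dir C:\\"))
print(is_allowed("Remove-Item C:\\data -Recurse"))
```

An allowlist is deliberately conservative: anything not recognized is refused, so destructive commands have to be approved explicitly.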
5. File navigation and application control
Windows-MCP supports file operations and application management, such as opening specific folders or launching programs.
Example:
- Command: "Open the Documents folder on the D drive".
- The AI invokes the File Navigator tool to open the specified path.
- The user can then enter further commands such as "New Text File".
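The two steps in this example (open a folder, then create a text file in it) can be approximated with the standard library. This is a sketch under assumptions: `open_folder` and `new_text_file` are hypothetical helpers rather than the project's File Navigator tool, and `os.startfile` is available only on Windows, so the sketch falls back to returning False elsewhere.

```python
import os
import tempfile
from pathlib import Path


def open_folder(path):
    """Open a folder in the file manager on Windows; return False if it cannot be opened."""
    folder = Path(path)
    if not folder.is_dir():
        return False
    startfile = getattr(os, "startfile", None)  # os.startfile exists only on Windows
    if startfile is None:
        return False
    startfile(str(folder))
    return True


def new_text_file(folder, name="New Text Document.txt"):
    """Create an empty text file in `folder` without overwriting an existing one."""
    target = Path(folder) / name
    target.touch(exist_ok=False)
    return target


# Demonstrate on a temporary directory so no real user folder is touched.
with tempfile.TemporaryDirectory() as tmp:
    created = new_text_file(tmp)
    print(created.name, created.exists())
```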
Featured Functions
Low-latency real-time interaction
With action intervals as low as about 1.5 seconds, Windows-MCP suits tasks made up of several quick steps. Users can enter commands one after another, and the AI executes them in order. Example:
- Instruction 1: "Open Browser".
- Instruction 2: "Search for AI tools".
The AI will complete the operations sequentially to maintain a smooth experience.
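Sequential execution with per-action timing can be sketched as follows. The `fake_execute` function is a stand-in for the agent (a real call would involve the LLM and a desktop action), so the timings printed here say nothing about actual latency; the point is how an interval such as the 1.5-2.3 seconds quoted above could be measured.

```python
import time


def run_in_order(commands, execute):
    """Execute commands one after another and record how long each one took."""
    log = []
    for command in commands:
        started = time.perf_counter()
        result = execute(command)
        elapsed = time.perf_counter() - started
        log.append({"command": command, "result": result, "seconds": elapsed})
    return log


def fake_execute(command):
    """Stand-in for the agent: a real call would involve the LLM and a desktop action."""
    return f"done: {command}"


for entry in run_in_order(["Open Browser", "Search for AI tools"], fake_execute):
    print(f"{entry['command']!r} -> {entry['result']} ({entry['seconds']:.4f}s)")
```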
Open Source Extensions
Users can modify the code as needed, for example to add custom tools or to support other LLMs. The project documentation provides an extension guide, located in the CONTRIBUTING document.
Procedure:
- Open the tools directory and add custom scripts.
- Update agent.py to integrate the new tools.
- Test the modifications to ensure compatibility.
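As an illustration of how scripts in a tools directory could be picked up, the sketch below imports every .py file in a folder and collects a function named `run` from each. This convention is an assumption for the example; the project integrates tools through agent.py, and its actual mechanism may differ.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_tools(directory):
    """Import every .py file in `directory` and collect its `run` function by file name."""
    tools = {}
    for path in sorted(Path(directory).glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        run = getattr(module, "run", None)
        if callable(run):
            tools[path.stem] = run
    return tools


# Demonstrate with a throwaway directory holding one hypothetical custom tool.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "shout.py").write_text("def run(text):\n    return text.upper()\n")
    tools = load_tools(tmp)
    print(sorted(tools), tools["shout"]("hello"))
```

Because loading a script executes it, only scripts you trust belong in such a directory.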
Precautions for use
- Ensure a stable network connection, especially when using an online LLM.
- Check system privileges; some operations require administrator rights.
- Check the GitHub repository regularly for updates to get the latest features.
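The privilege check in the list above can be done before starting the agent. A minimal sketch: `is_admin` is a hypothetical helper that calls the Windows shell32 function IsUserAnAdmin through ctypes, with a root check as a fallback so the example also runs on other systems.

```python
import ctypes
import os
import sys


def is_admin():
    """Return True when the current process has administrator (or root) rights."""
    if sys.platform == "win32":
        try:
            return bool(ctypes.windll.shell32.IsUserAnAdmin())
        except (AttributeError, OSError):
            return False
    # Non-Windows fallback so the sketch also runs elsewhere.
    return os.geteuid() == 0


print("elevated:", is_admin())
```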
Application Scenarios
- Automated office work
Windows-MCP can automatically open office software, enter data, or organize files, for example batch-renaming files or auto-filling Excel sheets. Suitable for administrators and data analysts.
- UI testing
Developers can use Windows-MCP to test an application's interface, simulate user clicks and input, and verify that the functionality works. Suitable for QA engineers.
- AI development experiments
AI enthusiasts can use Windows-MCP to test how LLMs perform at system control and explore how AI interacts with the operating system.
- Simplifying daily tasks
Ordinary users can complete complex operations, such as moving files in bulk or changing system settings, with natural language commands, which lowers the barrier to use.
FAQ
- Which LLMs does Windows-MCP support?
It supports any LLM, such as Google Gemini or OpenAI GPT. Users only need to configure the corresponding model and API key in the code.
- Are computer vision techniques required?
No. Windows-MCP simplifies setup by performing control through system APIs, with an optional vision mode.
- How do I ensure safe operation?
Run it in a test environment and avoid executing high-risk PowerShell commands directly. Review the code and make sure commands are unambiguous.
- What if latency is high?
Latency is typically 1.5-2.3 seconds. If it is higher, check the system load or the LLM's inference speed, and phrase instructions more precisely.