Alternate URL: www.kdjingpai.com
Ctrl + D Favorites
Current Position:fig. beginning " AI Tool Library

Ichigo (llama3-s): local real-time voice AI assistant, open source version of Siri

2024-11-12 1.2 K

General Introduction

Ichigo is an open source, real-time speech AI project that aims to extend text-based language models with native "listening" capabilities. The project uses early fusion techniques inspired by Meta's Chameleon paper.Ichigo aims to be an open-source data, open-weighted, native-device voice assistant, similar to Siri.The project is open for partners to join in the crowdsourcing of speech datasets.

Ichigo (llama3-s): local real-time voice AI assistant, open source version of Siri

 

Function List

  • Real-time speech recognition: The ability to process and understand user voice input in real time.
  • multicast dialogue capability: Supports multiple rounds of dialog and is able to maintain context in a conversation.
  • noise management: The ability to refuse to process non-speech audio inputs through training improves the user experience.
  • Open source and scalable: The project code and model weights are completely open source and users are free to download and extend them.
  • local deployment: Supports deployment on local devices to protect user privacy.

 

Using Help

Installation process

  1. environmental preparation ::
    • Ensure that Python 3.8 or above is installed.
    • Install the necessary dependency libraries:pip install -r requirements.txtThe
  2. Download model ::
    • Use the following command to download the Ichigo model:
      git clone https://github.com/homebrewltd/ichigo.git
      cd ichigo
      pip install -e .
      
  3. Configuring the dataset ::
    • Download the required dataset from HuggingFace and set the dataset path in the configuration file.
  4. Launch Demo ::
    • Start the local Gradio Demo with the following command:
      python demo.py --use-4bit --use-8bit
      

Usage Process

  1. Starting services ::
    • After running the above command, visit the locally provided URL to access Ichigo's Web UI interface.
  2. voice input ::
    • In the Web UI interface, click the microphone icon to start recording, and the system will process and display the speech recognition results in real time.
  3. many rounds of dialogue ::
    • The system supports multiple rounds of dialog, where the user can continuously input speech and the system will maintain the context to understand and respond.
  4. noise management ::
    • The system is trained to recognize and reject the processing of non-speech audio inputs to ensure the accuracy of the recognition results.
  5. Custom extensions ::
    • Users can modify the code and model as needed to add new features or improve existing ones.

Detailed Operation Procedure

  1. Download and Installation ::
    • Visit Ichigo's GitHub page and follow the installation process to download and install the necessary dependencies and models.
  2. Configuration and startup ::
    • According to the configuration file provided by the project, set the dataset path and model parameters to start the local service.
  3. Using the Web UI ::
    • Experience Ichigo's real-time speech recognition and multi-round dialog features by performing voice input and interaction through the Web UI interface.
  4. Extension and customization ::
    • Understand the architecture and workings of the system based on project documentation and code comments for custom extensions.

Recommended

Can't find AI tools? Try here!

Just type in the keyword Accessibility Bing SearchYou can quickly find all the AI tools on this site.

Scan the code to follow

qrcode

Contact Us

Top

en_USEnglish