
Petals: run and fine-tune large language models on distributed, shared GPUs, pooling GPU resources like a BitTorrent network

2024-11-20

General Introduction

Petals is an open-source project developed by the BigScience Workshop for running Large Language Models (LLMs) in a distributed fashion. Users can run and fine-tune LLMs such as Llama 3.1, Mixtral, Falcon, and BLOOM at home on consumer-grade GPUs or in Google Colab. Petals takes a BitTorrent-like approach: different parts of a model are distributed across many users' devices, enabling efficient inference and fine-tuning.


 

Function List

  • Run large language models: supports Llama 3.1 (up to 405B), Mixtral (8x22B), Falcon (40B+), and BLOOM (176B).
  • Distributed inference: runs models over a distributed network, with single-batch inference speeds of up to 6 tokens/sec for Llama 2 (70B) and 4 tokens/sec for Falcon (180B).
  • Fast fine-tuning: lets users quickly fine-tune models for a variety of tasks.
  • Community-driven: relies on a community of users sharing GPU resources; anyone can contribute a GPU to increase the computing power of the Petals network.
  • Flexible API: offers a PyTorch- and Transformers-like API, with support for running custom paths through the model and inspecting hidden states (see the sketch after this list).
  • Privacy: by default, data passes through a public network; users can set up a private swarm to protect sensitive data.
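As a rough illustration of the generation side of this API, the sketch below reuses a single distributed inference session for interactive generation. The model.inference_session() context manager and the session= argument to generate() follow the examples in the Petals repository; treat them as assumptions and verify against the version you install.

    from transformers import AutoTokenizer
    from petals import AutoDistributedModelForCausalLM

    model_name = "meta-llama/Meta-Llama-3.1-405B-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

    # Reuse one distributed inference session so the remote servers keep their
    # attention caches between calls instead of re-processing the whole prefix.
    with model.inference_session(max_length=128) as session:
        inputs = tokenizer("A cat sat", return_tensors="pt")["input_ids"]
        outputs = model.generate(inputs, max_new_tokens=16, session=session)
        print(tokenizer.decode(outputs[0]))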

 

Usage Help

Installation and use

  1. Install dependencies:
    • Linux + Anaconda:
      conda install pytorch pytorch-cuda=11.7 -c pytorch -c nvidia
      pip install git+https://github.com/bigscience-workshop/petals
      python -m petals.cli.run_server meta-llama/Meta-Llama-3.1-405B-Instruct
      
    • Windows + WSL: please refer to the project Wiki.
    • Docker:
      sudo docker run -p 31330:31330 --ipc host --gpus all --volume petals-cache:/cache --rm \
      learningathome/petals:main \
      python -m petals.cli.run_server --port 31330 meta-llama/Meta-Llama-3.1-405B-Instruct
      
    • macOS + Apple M1/M2 GPUs:
      brew install python
      python3 -m pip install git+https://github.com/bigscience-workshop/petals
      python3 -m petals.cli.run_server meta-llama/Meta-Llama-3.1-405B-Instruct
      
  2. Run a model:
    • Select any of the available models, for example:
      from transformers import AutoTokenizer
      from petals import AutoDistributedModelForCausalLM

      # Pick any model served by the public Petals swarm
      model_name = "meta-llama/Meta-Llama-3.1-405B-Instruct"
      tokenizer = AutoTokenizer.from_pretrained(model_name)
      # Embeddings run locally; transformer blocks run on the distributed swarm
      model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

      inputs = tokenizer("A cat sat", return_tensors="pt")["input_ids"]
      outputs = model.generate(inputs, max_new_tokens=5)
      print(tokenizer.decode(outputs[0]))
      
  3. Contribute a GPU:
    • Users can connect their own GPU to increase the computing power of the Petals network; see the Model Hub for the models being served (a hedged server example follows this list).
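For illustration, the commands below sketch how a contributor might serve only a slice of a model from one consumer GPU. The --num_blocks flag is taken from the Petals server documentation but should be confirmed with the server's --help output, and gated models such as Llama 3.1 also require a Hugging Face access token.

    # Log in once so the server can download gated Llama 3.1 weights
    # (requires access to the model on Hugging Face).
    huggingface-cli login

    # Serve only a slice of the model's transformer blocks from this GPU.
    # --num_blocks is assumed from the Petals server docs; confirm with:
    #   python -m petals.cli.run_server --help
    python -m petals.cli.run_server meta-llama/Meta-Llama-3.1-405B-Instruct --num_blocks 4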

Main function operation flow

  1. Select a model: visit the Petals website and choose the desired model.
  2. Load the model: load and run the model following the installation steps above.
  3. Fine-tune the model: use the API provided by Petals to fine-tune the model for a variety of tasks (a hedged sketch follows this list).
  4. Generate text: generate text over the distributed network for chatbots and interactive applications.
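As a minimal sketch of step 3, the snippet below shows prompt-style fine-tuning, assuming the tuning_mode="ptune" and pre_seq_len arguments used in the Petals fine-tuning examples; the model ID, toy data, step count, and learning rate are placeholders, not recommendations.

    import torch
    from transformers import AutoTokenizer
    from petals import AutoDistributedModelForCausalLM

    model_name = "meta-llama/Meta-Llama-3.1-405B-Instruct"  # any model served by the swarm
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # tuning_mode="ptune" (assumed from the Petals fine-tuning examples) trains a small
    # set of prompt parameters locally while the remote transformer blocks stay frozen.
    model = AutoDistributedModelForCausalLM.from_pretrained(
        model_name, tuning_mode="ptune", pre_seq_len=16
    )
    optimizer = torch.optim.AdamW(
        [p for p in model.parameters() if p.requires_grad], lr=3e-4
    )

    batch = tokenizer("A cat sat on a mat.", return_tensors="pt")["input_ids"]
    for step in range(10):  # toy loop with placeholder data, not a real training schedule
        loss = model(input_ids=batch, labels=batch).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        print(f"step {step}: loss {loss.item():.3f}")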
