HuggingFace • ModelScope • ✡️ WiseModel

Discord • Twitter • WeChat

Paper • FAQ • Learning Hub

Intro

Yi-1.5 is an upgraded version of Yi. It is continually pre-trained from Yi on a high-quality corpus of 500B tokens and fine-tuned on 3M diverse samples.

Compared with Yi, Yi-1.5 delivers stronger performance in coding, math, reasoning, and instruction-following capability, while still maintaining excellent capabilities in language understanding, commonsense reasoning, and reading comprehension.

Yi-1.5 comes in 3 model sizes: 34B, 9B, and 6B. For model details and benchmarks, see Model Card.

News

2024-05-13: The Yi-1.5 series models are open-sourced, further improving coding, math, reasoning, and instruction-following abilities.

Requirements

Make sure Python 3.10 or a later version is installed.

Set up the environment and install the required packages.

    pip install -r requirements.txt

Download the Yi-1.5 model from Hugging Face, ModelScope, or WiseModel.
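Before installing the packages, it can help to confirm that the interpreter meets the Python 3.10 minimum stated above. The following is a minimal sketch using only the standard library; the helper name `python_ok` is hypothetical and not part of the Yi-1.5 repository.

```python
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the requirements above


def python_ok(version_info=None, minimum=MIN_PYTHON):
    """Return True if the given (or current) interpreter version meets the minimum."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); patch level does not matter here.
    return tuple(version_info[:2]) >= tuple(minimum)


if not python_ok():
    found = sys.version.split()[0]
    print(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]} or later is required, found {found}")
```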

Quick Start

This tutorial runs Yi-1.5-34B-Chat locally on an A800 (80G).

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_path = '<your-model-path>'

    tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)

    # Since transformers 4.35.0, the GPT-Q/AWQ model can be loaded using AutoModelForCausalLM.
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        device_map="auto",
        torch_dtype='auto'
    ).eval()

    # Prompt content: "hi"
    messages = [
        {"role": "user", "content": "hi"}
    ]

    input_ids = tokenizer.apply_chat_template(conversation=messages, tokenize=True, add_generation_prompt=True, return_tensors='pt')
    output_ids = model.generate(input_ids.to('cuda'))
    response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)

    # Model response: "Hello! How can I assist you today?"
    print(response)
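The snippet above slices `output_ids` by `input_ids.shape[1]` before decoding. For decoder-only models, `generate` returns the prompt tokens followed by the newly generated ones, so the slice drops the echoed prompt. The sketch below illustrates this with plain Python lists; the token ids are made up, and `strip_prompt` is a hypothetical helper, not a transformers function.

```python
def strip_prompt(output_ids, prompt_length):
    """Return only the tokens that come after the first prompt_length tokens."""
    return output_ids[prompt_length:]


# Made-up ids standing in for the chat-templated prompt.
prompt_ids = [101, 7, 42]
# A decoder-only generate() call returns the prompt followed by the new tokens.
generated = prompt_ids + [9, 15, 2]

new_tokens = strip_prompt(generated, len(prompt_ids))
print(new_tokens)  # [9, 15, 2]
```

Without the slice, the decoded response would begin with the prompt text itself.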

Deployment

Prerequisites: Before deploying Yi-1.5 models, make sure you meet the software and hardware requirements.
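As a rough guide to the hardware side, the memory needed just to hold the weights can be estimated from the parameter count. The sketch below assumes 16-bit weights (2 bytes per parameter) and the nominal sizes listed above (34B, 9B, 6B). It deliberately ignores the KV cache, activations, and framework overhead, so real usage will be higher. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(params_billion, bytes_per_param=2):
    """Approximate weight memory in GB (10^9 bytes).

    Assumes every parameter is stored with bytes_per_param bytes;
    the KV cache and activations are NOT included.
    """
    return params_billion * bytes_per_param


for size in (34, 9, 6):
    print(f"{size}B model, 16-bit weights: about {weight_memory_gb(size)} GB")
```

Under these assumptions the 34B model needs about 68 GB for weights alone, which is consistent with the Quick Start running it on a single 80G device.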

vLLM

Prerequisites: Download the latest version of vLLM.

Start the server with a chat model.

    python -m vllm.entrypoints.openai.api_server --model 01-ai/Yi-1.5-9B-Chat --served-model-name Yi-1.5-9B-Chat

Use the chat API.

HTTP

    curl http://localhost:8000/v1/chat/completions \
        -H "Content-Type: application/json" \
        -d '{
            "model": "Yi-1.5-9B-Chat",
            "messages": [
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": "Who won the world series in 2020?"}
            ]
        }'

Python client

    from openai import OpenAI

    # Set OpenAI's API key and API base to use vLLM's API server.
    openai_api_key = "EMPTY"
    openai_api_base = "http://localhost:8000/v1"

    client = OpenAI(
        api_key=openai_api_key,
        base_url=openai_api_base,
    )

    chat_response = client.chat.completions.create(
        model="Yi-1.5-9B-Chat",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Tell me a joke."},
        ]
    )
    print("Chat response:", chat_response)
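The curl and Python client examples above send the same JSON body to the same endpoint. If the `openai` package is not installed, a standard-library client may be enough. The following is a minimal sketch, assuming the server is reachable at the address used above and returns an OpenAI-style chat completion payload. The helper names (`build_chat_request`, `extract_reply`, `chat`) are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # address used in the examples above


def build_chat_request(model, messages, base_url=BASE_URL):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_json):
    """Return the assistant text from an OpenAI-style chat completion payload."""
    return response_json["choices"][0]["message"]["content"]


def chat(model, messages, base_url=BASE_URL, timeout=60):
    """Send the request to a running server and return the assistant's reply text."""
    request = build_chat_request(model, messages, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return extract_reply(json.load(resp))
```

Once the vLLM server is running, `chat("Yi-1.5-9B-Chat", [{"role": "user", "content": "Tell me a joke."}])` should return the reply text only. With the `openai` client shown earlier, the same text is available as `chat_response.choices[0].message.content`.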

Web Demo

python demo/web_demo.py -c <your-model-path>

Fine-tuning

You can fine-tune Yi-1.5 with LLaMA-Factory, Swift, XTuner, or Firefly. All of these frameworks support fine-tuning the Yi series models.

API

Yi APIs are OpenAI-compatible and available on Yi Platform. Sign up to get free tokens; you can also pay as you go at a competitive price. Yi APIs are also deployed on Replicate and OpenRouter.

License

The code and weights of the Yi-1.5 series models are distributed under the Apache 2.0 license.
