This is a detailed tutorial on how to install the Mistral 7B model locally on AWS, Linux, Windows, or anywhere else you like.
Commands Used:
# Install the optimum library (required for GPTQ inference with transformers)
pip3 install optimum
# Install transformers from a pinned commit that includes Mistral support
pip3 install git+https://github.com/huggingface/transformers.git@72958fcd3c98a7afdc61f953aa58c544ebda2f79
# Build and install AutoGPTQ v0.4.2 from source
git clone https://github.com/PanQiWei/AutoGPTQ
cd AutoGPTQ
git checkout v0.4.2
pip3 install .
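Before loading the model, it is worth verifying that the pinned transformers build is installed and that PyTorch can see a GPU. The quick check below is an illustration and was not part of the original command list:

import torch
import transformers

# Show which transformers version is installed
print(transformers.__version__)

# GPTQ inference needs a CUDA-capable GPU
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))

Once that looks right, the Python code below downloads the quantised model and runs inference.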
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
model_name_or_path = "TheBloke/SlimOpenOrca-Mistral-7B-GPTQ"
# To use a different branch, change revision
# For example: revision="gptq-4bit-32g-actorder_True"
# Download the quantised model from the Hugging Face Hub; device_map="auto" places it on the available GPU
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             device_map="auto",
                                             trust_remote_code=False,
                                             revision="main")
# Load the matching tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
system_message = "You are an expert at bathroom renovations."
prompt = """
Renovate the following old bathroom:
I have a 25 year old house with an old bathroom. I want to renovate it completely.
Think about it step by step, and give me the steps to renovate the bathroom. Also give me the cost of each step in Australian dollars.
"""
# Assemble the full prompt in ChatML format, which this model expects
prompt_template = f'''<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
'''
print("\n\n*** Generate:")
input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512)
print(tokenizer.decode(output[0]))
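Note that decode(output[0]) prints the prompt as well, since generate() returns the input tokens followed by the newly generated ones. To print only the model's reply, you can slice off the prompt first; this is an optional convenience, not part of the original snippet:

# Keep only the tokens produced after the prompt and hide special tokens such as <|im_end|>
new_tokens = output[0][input_ids.shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))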
# Inference can also be done using transformers' pipeline
print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1
)
print(pipe(prompt_template)[0]['generated_text'])
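By default the pipeline's generated_text echoes the prompt followed by the completion. If you only want the completion, the text-generation pipeline also accepts return_full_text=False, shown here as an optional variation:

# Return just the model's reply, without repeating the prompt
print(pipe(prompt_template, return_full_text=False)[0]['generated_text'])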
1 comment:
Which CUDA version have you loaded?
I am getting a RuntimeError: "The detected CUDA version (12.3) mismatches the version that was used to compile PyTorch (11.8). Please make sure to use the same CUDA versions."
Thanks.
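That error means the CUDA toolkit found on the system differs from the CUDA version PyTorch was compiled against; AutoGPTQ's extension has to be built with the same CUDA version PyTorch uses. A quick way to compare the two (a diagnostic sketch assuming nvcc is on your PATH, not part of the original post):

import subprocess
import torch

# CUDA version PyTorch was built against
print("PyTorch CUDA:", torch.version.cuda)

# CUDA toolkit version installed on the system, as reported by nvcc
print(subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout)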