[gemma2-2b] Model Training and huggingface/streamlit Deployment

오구의코딩모험

오구.cpp 2024. 10. 3. 23:25

I fine-tuned Google's Gemma 2, a groundbreaking new open model, on financial data, pushed the trained model to Hugging Face, and ran it as a Web app with Streamlit!

Training ran on Colab (GPU: A100), using the Hugging Face dataset linked below.

https://huggingface.co/datasets/nayohan/finance-alpaca-ko


The dataset has about 70,000 rows and is a Korean translation of the financial data at the link below.

https://huggingface.co/datasets/gbharti/finance-alpaca

First, load the dataset as shown below, then reshape it to match the prompt format used for training.

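from datasets import load_dataset

dataset = load_dataset("nayohan/finance-alpaca-ko")

# Build the 'prompt' field in Gemma's chat-turn format
def format_instruction(example):

    # When extra context (the input field) is present
    if example['input'] and len(example['input']) > 0:
        text = f"""<start_of_turn>user\n{example["instruction"]}\n{example["input"]}<end_of_turn>\n<start_of_turn>model\n{example["output"]}<end_of_turn>"""
    # When there is no input field
    else:
        text = f"""<start_of_turn>user\n{example["instruction"]}<end_of_turn>\n<start_of_turn>model\n{example["output"]}<end_of_turn>"""

    return {'prompt': text}

# Add the prompt field to every row of the dataset
dataset = dataset.map(format_instruction)

# train_data / test_data are used by the trainer later but were never defined.
# The 90/10 split and the seed are assumptions, not settings from the original post.
split = dataset["train"].train_test_split(test_size=0.1, seed=42)
train_data, test_data = split["train"], split["test"]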
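To see what the formatter produces without downloading anything, here is a minimal, self-contained sketch that applies the same logic to a made-up record. The sample text is hypothetical and not taken from the dataset.

```python
def format_instruction(example):
    """Build a Gemma-style chat prompt from an Alpaca-style record."""
    user_turn = example["instruction"]
    if example.get("input"):  # append the optional extra context
        user_turn += "\n" + example["input"]
    text = (
        f"<start_of_turn>user\n{user_turn}<end_of_turn>\n"
        f"<start_of_turn>model\n{example['output']}<end_of_turn>"
    )
    return {"prompt": text}


# Hypothetical record, for illustration only
sample = {"instruction": "What is an ETF?", "input": "", "output": "An exchange-traded fund."}
print(format_instruction(sample)["prompt"])
```

The user turn and the model turn each sit between `<start_of_turn>` and `<end_of_turn>`, which is the layout the model is expected to reproduce at inference time.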

To load the modules needed for fine-tuning and the Gemma-2 model from Hugging Face, log in from the notebook.

The notebook login requires an API key (access token), which can be issued under Settings > Access Tokens on Hugging Face.

import torch

from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig, get_peft_model
from trl import SFTTrainer
from huggingface_hub import notebook_login

notebook_login()

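Even with the 2b model, the Colab GPU does not have enough resources to train the model as-is.

Quantization reduces the memory and compute needed for training, so the model is loaded in 4-bit.

If GPU memory still runs out with quantization, adjust the r value in LoraConfig (a smaller r means fewer trainable adapter parameters).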

bnb_config = BitsAndBytesConfig(load_in_4bit=True,
                                bnb_4bit_quant_type="nf4",
                                bnb_4bit_compute_dtype=torch.bfloat16)


model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b-it",
                                             quantization_config=bnb_config,
                                             device_map={"":0})


tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it", add_eos_token=True)
tokenizer.pad_token = tokenizer.eos_token

torch.cuda.empty_cache()
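As a rough back-of-the-envelope check on why 4-bit loading helps, the sketch below estimates the memory taken by the weights alone at different precisions. The parameter count of about 2.6 billion is an approximation assumed here, and the estimate ignores activations, gradients, optimizer state, and quantization overhead.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


# Assumption: roughly 2.6e9 parameters for the 2b model (approximate figure)
NUM_PARAMS = 2.6e9

for name, bits in [("float32", 32), ("bfloat16", 16), ("nf4 (4-bit)", 4)]:
    print(f"{name:>12}: {weight_memory_gib(NUM_PARAMS, bits):.2f} GiB")
```

The weights shrink by a factor of four going from bfloat16 to 4-bit, which leaves more room for the training-time tensors that this estimate leaves out.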

lora_config = LoraConfig(
    r=32,
    target_modules=['o_proj', 'q_proj', 'up_proj', 'v_proj', 'k_proj', 'down_proj', 'gate_proj'],
    lora_dropout=0.05,
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, lora_config)
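To make the effect of r concrete: for a linear layer with d_in inputs and d_out outputs, LoRA adds two matrices of shape (r, d_in) and (d_out, r), so r * (d_in + d_out) trainable parameters. The layer shapes below are assumed from the published gemma-2-2b configuration (hidden size 2304, MLP size 9216, 8 query heads and 4 key/value heads of dimension 256, 26 layers); treat them as illustrative and check them against the model's config.json.

```python
# Assumed (d_in, d_out) shapes of the targeted projections in one decoder layer
LAYER_SHAPES = {
    "q_proj": (2304, 2048),
    "k_proj": (2304, 1024),
    "v_proj": (2304, 1024),
    "o_proj": (2048, 2304),
    "gate_proj": (2304, 9216),
    "up_proj": (2304, 9216),
    "down_proj": (9216, 2304),
}
NUM_LAYERS = 26  # assumed number of decoder layers


def lora_trainable_params(r: int) -> int:
    """Count LoRA parameters: each adapted layer adds r * (d_in + d_out)."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in LAYER_SHAPES.values())
    return per_layer * NUM_LAYERS


for r in (8, 16, 32):
    print(f"r={r:>2}: {lora_trainable_params(r):,} trainable parameters")
```

The count scales linearly with r, so halving r halves the adapter parameters along with their gradients and optimizer state.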

trainer = SFTTrainer(
    model=model,
    train_dataset=train_data,
    eval_dataset=test_data,
    dataset_text_field="prompt",
    peft_config=lora_config,
    args=TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        warmup_steps=10,
        max_steps=100,
        learning_rate=2e-4,
        bf16=True,  # match the bfloat16 compute dtype set in bnb_config (the A100 supports bf16)
        logging_steps=10,
        output_dir="outputs",
        optim="paged_adamw_8bit",
    ),
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)

trainer.train()
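The TrainingArguments above imply a fairly small training budget. As a quick sanity check, the sketch below works out how many examples the run actually sees. The dataset size of 70,000 is the rounded figure quoted earlier, and a single GPU is assumed.

```python
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
max_steps = 100
num_devices = 1          # assumption: a single Colab GPU
dataset_rows = 70_000    # rounded figure from the dataset description

# One optimizer step consumes this many examples
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)
examples_seen = effective_batch_size * max_steps
fraction_of_dataset = examples_seen / dataset_rows

print(f"effective batch size: {effective_batch_size}")
print(f"examples seen: {examples_seen}")
print(f"fraction of dataset: {fraction_of_dataset:.2%}")
```

Under these settings the run covers well under 1% of the data, so raising max_steps (or switching to num_train_epochs) is the natural knob for a longer fine-tune.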

Once the trained model is pushed to Hugging Face, you can see it published under your own account.

model.push_to_hub("gemma-2-finance")
tokenizer.push_to_hub("gemma-2-finance")

Next, I'll use a Hugging Face Space to build a Web app with Streamlit.

First, write app.py so that it sets up the page UI, downloads the model locally, and runs inference.

import streamlit as st
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "kyungbae/gemma-2-finance"

st.title("I want to become a stock market expert.")


@st.cache_resource  # load the model once and reuse it across Streamlit reruns
def load_model():
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    # No add_eos_token here: an EOS after the prompt would end the turn before generation
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    tokenizer.padding_side = 'right'
    return model, tokenizer


def get_completion(query: str, model, tokenizer) -> str:
    # Same turn format as the training prompts, without extra indentation
    prompt = f"<start_of_turn>user\n{query}<end_of_turn>\n<start_of_turn>model\n"
    encodeds = tokenizer(prompt, return_tensors="pt", add_special_tokens=True)
    # Follow the device chosen by device_map instead of hard-coding "cuda:0"
    model_inputs = encodeds.to(model.device)
    # Generate the response with repetition penalty and sampling strategies
    with torch.no_grad():
        generated_ids = model.generate(
            **model_inputs,
            max_new_tokens=256,
            do_sample=True,          # needed for top_k/top_p/temperature to take effect
            repetition_penalty=1.2,  # Apply repetition penalty
            top_k=50,                # Use top-k sampling
            top_p=0.9,               # Use nucleus sampling
            temperature=0.7,         # Control randomness
        )

    # Decode only the newly generated tokens, not the echoed prompt
    new_tokens = generated_ids[0][model_inputs["input_ids"].shape[1]:]
    decoded = tokenizer.decode(new_tokens, skip_special_tokens=True)

    return decoded


def main():
    model, tokenizer = load_model()

    # Streamlit reruns the script on every interaction, so no explicit loop is needed
    # (the original `while():` tests an empty tuple, which is falsy, so its body never ran)
    input_text = st.text_input("Enter some text 👇")

    if input_text:
        with st.spinner('Searching...'):
            output_text = get_completion(query=input_text, model=model, tokenizer=tokenizer)

        st.write("answer", output_text)


if __name__ == "__main__":
    main()
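One detail worth checking in an app like this is that the inference prompt matches the format the model was trained on. For example, a triple-quoted template indented inside a function keeps its leading spaces, so the resulting prompt no longer lines up with the training text. The sketch below shows the difference in plain Python; `build_inference_prompt` and `build_training_text` are hypothetical helper names, not part of any library.

```python
def build_inference_prompt(query: str) -> str:
    """Open a user turn and leave the model turn open for generation."""
    return f"<start_of_turn>user\n{query}<end_of_turn>\n<start_of_turn>model\n"


def build_training_text(query: str, answer: str) -> str:
    """Full training example in the same turn format."""
    return build_inference_prompt(query) + f"{answer}<end_of_turn>"


# An indented triple-quoted template carries its indentation into the prompt
indented_template = """<start_of_turn>user
    {query}
    <end_of_turn>
    <start_of_turn>model
    """

query = "What is a dividend?"  # hypothetical question
training_text = build_training_text(query, "A share of profit paid to shareholders.")

# The clean prompt is an exact prefix of the training text
print(training_text.startswith(build_inference_prompt(query)))          # True
# The indented template is not, because of the extra spaces and newline
print(training_text.startswith(indented_template.format(query=query)))  # False
```

Keeping the inference prompt an exact prefix of the training text means the model continues from a context it has actually seen during fine-tuning.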