What is Fine-tuning?
Fine-tuning is a process in Machine Learning in which we take a model that has already been trained and continue training it on a smaller or more specific dataset. The goal is to adapt the model to a new task so that it performs well on new data, without having to train a model from scratch.
Reasons to use Fine-tuning:
- Saves training resources: Fine-tuning does not need nearly as much compute or data as training a model from the beginning (see the sketch after this list).
- Works with limited data: you can adapt the model using only a small dataset and still get good results on the new or specific task.
- Faster: Fine-tuning is much quicker than training from scratch, because the model only has to adjust to the new, smaller dataset.
- Efficient: it delivers good results for comparatively little effort.
Note: training a model from scratch takes far more time and requires a very large dataset.
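As a rough illustration of the resource savings mentioned above, a common trick is to freeze most of the pre-trained weights and update only the top layers, so far fewer parameters need gradients. This is only a sketch and is not used in the full example below:
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")

# Freeze every parameter first...
for param in model.parameters():
    param.requires_grad = False

# ...then unfreeze only the final transformer block, so training
# updates a small fraction of the model's weights.
for param in model.transformer.h[-1].parameters():
    param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable:,} / {total:,}")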
Why is Fine-tuning important?
- Saves time: it takes much less time than training a brand-new model.
- Builds on prior knowledge: the model reuses what it already learned during pre-training on large datasets and applies it to the new or specific task.
- Needs less data: it can work well even with small datasets.
- Better results: it usually performs better than a model trained only on a small amount of data.
Fine-tuning is widely used across fields such as Natural Language Processing (NLP), for tasks like text classification, machine translation, and summarization.
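For a feel of what those tasks look like, the Transformers library ships ready-made pipelines built on pre-trained models, each of which can itself be fine-tuned on domain-specific data; the snippet below just uses the library's default models and is purely illustrative:
from transformers import pipeline

# Text classification (sentiment) with the library's default pre-trained model
classifier = pipeline("sentiment-analysis")
print(classifier("Fine-tuning made this model much more useful!"))

# Summarization with the library's default pre-trained model
summarizer = pipeline("summarization")
print(summarizer(
    "Fine-tuning adapts a pre-trained model to a new task by continuing "
    "training on a smaller, task-specific dataset instead of starting over.",
    max_length=20,
    min_length=5,
))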
Practical example
- In this example we show how to Fine-tune the GPT-2 model on a small dataset using the Hugging Face Transformers library.
- We use the small "emotion" dataset from the Hugging Face datasets library; it contains short texts labelled with the type of emotion they express (a quick look at a few rows is sketched right after the install commands below).
- First, install the required libraries:
!pip install transformers datasets torch
!pip install transformers[torch] -U
!pip install accelerate -U
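Before fine-tuning, it can help to look at a few rows of the "emotion" dataset so you know what the text and labels look like (a quick optional check, separate from the training script below):
from datasets import load_dataset

dataset = load_dataset("emotion", split="train")
print(dataset)                              # number of rows and column names
print(dataset[0])                           # one example, e.g. {'text': ..., 'label': ...}
print(dataset.features["label"].names)      # the names of the emotion labels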
Below is the Python code that walks through the Fine-tuning process.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer, DataCollatorForLanguageModeling
from transformers import Trainer, TrainingArguments
from datasets import load_dataset
# Load pre-trained model and tokenizer
model_name = "gpt2"
model = GPT2LMHeadModel.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
# Set padding token
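# GPT-2's tokenizer has no dedicated padding token, so we reuse the end-of-sequence token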
tokenizer.pad_token = tokenizer.eos_token
# Load and preprocess the dataset
dataset = load_dataset("emotion", split="train")
def preprocess_function(examples):
    return tokenizer([f"Emotion: {text}" for text in examples["text"]], truncation=True, padding="max_length", max_length=64)
tokenized_dataset = dataset.map(preprocess_function, batched=True, remove_columns=dataset.column_names)
# Convert to PyTorch tensors
tokenized_dataset.set_format("torch")
# Create DataCollator
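# mlm=False means causal (next-token) language modeling rather than masked language modeling, which is what GPT-2 needs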
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False
)
# Define training arguments
training_args = TrainingArguments(
    output_dir="./gpt2-emotion-finetuned",
    overwrite_output_dir=True,
    num_train_epochs=3,
    per_device_train_batch_size=4,
    save_steps=10_000,
    save_total_limit=2,
)
# Create Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=tokenized_dataset,
)
# Fine-tune the model
trainer.train()
# Save the fine-tuned model
model.save_pretrained("./gpt2-emotion-finetuned")
tokenizer.save_pretrained("./gpt2-emotion-finetuned")
After Fine-tuning, you can test the model by generating text like this:
# Load the fine-tuned model and tokenizer
fine_tuned_model = GPT2LMHeadModel.from_pretrained("./gpt2-emotion-finetuned")
fine_tuned_tokenizer = GPT2Tokenizer.from_pretrained("./gpt2-emotion-finetuned")
# Generate text from a prompt
prompt = "Emotion: i didnt feel well"
input_ids = fine_tuned_tokenizer.encode(prompt, return_tensors="pt")
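# no_repeat_ngram_size=2 blocks repeated 2-grams, which reduces the repetitive loops GPT-2 generations are prone to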
output = fine_tuned_model.generate(input_ids, max_length=100, num_return_sequences=1, no_repeat_ngram_size=2)
generated_text = fine_tuned_tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)
Output
Emotion: i didnt feel well enough to go to the doctor but i was feeling good enough that i could go for a walk and not feel so bad about it all the time and that was good for me too because i had been feeling pretty good about my health for the past week and a half and i just needed to get through it without feeling like i got a bad grade or something and then i would be fine again and again until i went to my doctor and got my results and it was
This example shows how to fine-tune GPT-2 on the "emotion" dataset and then use it to generate text. Note that the results can vary depending on the size and quality of the dataset, and that training can take a long time and may require substantial data and compute.
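If you want a rough, quantitative check of how well the fine-tuned model fits the emotion texts, rather than only eyeballing a few generations, one option is to measure the loss on the validation split. This is just a sketch and assumes the fine_tuned_model, preprocess_function, and data_collator from the script above are still in memory:
import math
from datasets import load_dataset
from transformers import Trainer, TrainingArguments

# Tokenize the validation split exactly the same way as the training data
val_dataset = load_dataset("emotion", split="validation")
tokenized_val = val_dataset.map(preprocess_function, batched=True, remove_columns=val_dataset.column_names)
tokenized_val.set_format("torch")

# Reuse the fine-tuned model and data collator to compute the evaluation loss
eval_trainer = Trainer(
    model=fine_tuned_model,
    args=TrainingArguments(output_dir="./eval-tmp", per_device_eval_batch_size=4),
    data_collator=data_collator,
    eval_dataset=tokenized_val,
)
metrics = eval_trainer.evaluate()
print(metrics["eval_loss"], math.exp(metrics["eval_loss"]))  # loss and perplexity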
References
1. Hugging Face Transformers Library Documentation:
- Main documentation: https://huggingface.co/transformers/
- Fine-tuning tutorial: https://huggingface.co/transformers/training.html
2. GPT-2 Model:
- Model card: https://huggingface.co/gpt2
3. Datasets Library:
- Main documentation: https://huggingface.co/docs/datasets/
- Emotion dataset: https://huggingface.co/datasets/emotion
4. Hugging Face's language modeling example:
- https://github.com/huggingface/transformers/tree/main/examples/pytorch/language-modeling
5. Fine-tuning GPT-2 for text generation tutorial:
- https://huggingface.co/blog/how-to-generate