# SFT and PEFT with NeMo 1.0

Start the NeMo 24.07 container:
```bash
docker run -d --gpus all -it --rm \
  --shm-size=16g \
  --ulimit memlock=-1 --ulimit stack=67108864 \
  nvcr.io/nvidia/nemo:24.07
```
Note: make sure there is enough writable space inside the container, or mount a host volume with `-v <host/path>:<container/path>`.
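For example, here is a variant of the launch command that mounts a writable host directory and then attaches a shell to the detached container; the host path `/data/nemo-ws` and the container name `nemo-sft` are assumptions, not part of this repo:

```bash
# Sketch: same container as above, plus a name and a mounted work directory.
# /data/nemo-ws and the name "nemo-sft" are placeholders; adjust to your setup.
docker run -d --gpus all -it --rm \
  --name nemo-sft \
  --shm-size=16g \
  --ulimit memlock=-1 --ulimit stack=67108864 \
  -v /data/nemo-ws:/workspace/nemo-ws \
  nvcr.io/nvidia/nemo:24.07

# Attach a shell to the running container; the remaining commands in this
# README are run inside it.
docker exec -it nemo-sft bash
cd /workspace/nemo-ws
```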
Inside the container, clone this repo and run the scripts:

```bash
git clone https://huggingface.co/vuiseng9/nemo1-sft-peft-gemma-7b
cd nemo1-sft-peft-gemma-7b

# PEFT:
./run_peft.sh

# Eval:
./run_eval.sh

# SFT:
./run_sft.sh
```
## Dataset (output of this section has been pushed to this repo)
```bash
git clone https://huggingface.co/datasets/databricks/databricks-dolly-15k

# preprocessing script shipped in the container
python3 /opt/NeMo-Framework-Launcher/launcher_scripts/nemo_launcher/collections/dataprep_scripts/dolly_dataprep/preprocess.py \
    --input databricks-dolly-15k/databricks-dolly-15k.jsonl

# split into train/val/test
python3 split_train_val.py
```
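A quick sanity check on the resulting splits can help before moving on. The `training.jsonl` name is the one used later in this README; `validation.jsonl` and `test.jsonl` are assumptions about what `split_train_val.py` writes, so adjust them if the script uses different names:

```bash
# Count records in the assumed split files produced by split_train_val.py.
wc -l databricks-dolly-15k/training.jsonl \
      databricks-dolly-15k/validation.jsonl \
      databricks-dolly-15k/test.jsonl
```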
## Prepare Packed Dataset (optional)

Note: expect an error about a token tensor not being on the CUDA device; the workaround is to edit the script and move the offending tensor to CUDA yourself.
```bash
# the Gemma tokenizer is needed
git clone https://huggingface.co/google/gemma-7b

HYDRA_FULL_ERROR=1 python /opt/NeMo/scripts/nlp_language_modeling/prepare_packed_ft_dataset.py \
    model.data.train_ds.file_names=[databricks-dolly-15k/training.jsonl] \
    model.data.train_ds.max_seq_length=2048 \
    +tokenizer_path=gemma-7b/tokenizer.model \
    +output_dir=databricks-dolly-15k/ \
    +pack_sizes=[2048,4096,8192]
```
Tail of the log:

```
[NeMo I 2025-08-17 03:02:05 prepare_packed_ft_dataset:148] Done, output written to databricks-dolly-15k/packed_8192_seed0.npy
[NeMo I 2025-08-17 03:02:05 prepare_packed_ft_dataset:150]
    ✅ Packed datasets with pack sizes [2048, 4096, 8192] are prepared successfully.
    To train with packed sequences, you need to change three things in the SFT/PEFT config file
    1. Turn on the packed_sequence flag
       > +model.data.train_ds.packed_sequence=True
    2. Use the new dataset file instead of the original jsonl file
       > model.data.train_ds.file_names=/path/to/packed_dataset.npy
    3. Specify the packed sequence length. This should be one of the ``pack_sizes`` you specified during data preparation.
       > model.data.train_ds.max_seq_length=<pack_size>
    4. Adjust the batch sizes.
       Micro batch size has to be set to 1 as a nominal constraint. This is because batches are now concatenated
       in the preprocessing step. You can increase the pack_size to achieve the same purpose of increasing micro batch size.
       Global batch size has to be reduced by the average number of sequences per pack `n`,
       where n = total number of sequences / total number of packs. This ensures that each gradient iteration
       sees (on average) the same number of sequences so that the recipe is maintained.
       Please scroll up to see the value of n for each of your pack sizes.
       > model.micro_batch_size=1
       > model.global_batch_size=<previous GBS divided by n>
```
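As a hedged illustration of applying those changes: assuming `run_peft.sh` wraps NeMo's `megatron_gpt_finetuning.py` entry point, the extra Hydra overrides would look roughly like the block below. The pack size of 2048, the packed file name, and the batch sizes are placeholders; for example, if the original recipe used a global batch size of 128 and the preparation log reports n ≈ 8 sequences per pack, the new global batch size would be 128 / 8 = 16.

```bash
# Sketch only: the real training command lives in run_peft.sh / run_sft.sh.
# Paths and numeric values below are illustrative placeholders, not measured.
python /opt/NeMo/examples/nlp/language_modeling/tuning/megatron_gpt_finetuning.py \
    model.restore_from_path=gemma-7b.nemo \
    +model.data.train_ds.packed_sequence=True \
    model.data.train_ds.file_names=databricks-dolly-15k/packed_2048_seed0.npy \
    model.data.train_ds.max_seq_length=2048 \
    model.micro_batch_size=1 \
    model.global_batch_size=16  # previous GBS (128) divided by n (8)
```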
## Convert the Hugging Face Gemma model to a .nemo model (done)
```bash
git clone https://huggingface.co/google/gemma-7b

python3 /opt/NeMo/scripts/checkpoint_converters/convert_gemma_hf_to_nemo.py \
    --input_name_or_path gemma-7b/ \
    --output_path gemma-7b.nemo \
    --tokenizer_path gemma-7b/tokenizer.model
```
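A `.nemo` checkpoint is a tar archive bundling the model config and weights, so a quick way to sanity-check the conversion (assuming the output path used above) is to list its contents:

```bash
# List the archive contents and size of the converted checkpoint.
tar -tvf gemma-7b.nemo | head
ls -lh gemma-7b.nemo
```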