
It is possible to fine-tune CodeGen using Hugging Face Transformers! That would let you fine-tune it on your own code and use the resulting model. However, training is more demanding than inference -- you'd need an A6000 or better to train the 6B model. Something like the following should work:

    deepspeed --num_gpus 1 --num_nodes 1 run_clm.py \
        --model_name_or_path=Salesforce/codegen-6B-multi \
        --per_device_train_batch_size=1 \
        --learning_rate 2e-5 \
        --num_train_epochs 1 \
        --output_dir=./codegen-6B-finetuned \
        --dataset_name your_dataset \
        --tokenizer_name Salesforce/codegen-6B-multi \
        --block_size 2048 \
        --gradient_accumulation_steps 32 \
        --do_train --fp16 --overwrite_output_dir \
        --deepspeed ds_config.json
Where run_clm.py is this script: https://github.com/huggingface/transformers/blob/main/exampl...
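
Since that command points at a ds_config.json, here's a rough sketch of what the DeepSpeed config could look like -- untested, ZeRO stage 2 with optimizer offload to CPU, and the "auto" values filled in from the Trainer arguments:

    {
        "fp16": { "enabled": "auto" },
        "optimizer": {
            "type": "AdamW",
            "params": { "lr": "auto", "betas": "auto", "eps": "auto", "weight_decay": "auto" }
        },
        "zero_optimization": {
            "stage": 2,
            "offload_optimizer": { "device": "cpu", "pin_memory": true },
            "overlap_comm": true,
            "contiguous_gradients": true,
            "reduce_bucket_size": 2e8,
            "allgather_bucket_size": 2e8
        },
        "gradient_accumulation_steps": "auto",
        "train_micro_batch_size_per_gpu": "auto",
        "gradient_clipping": "auto"
    }
If memory is still tight, bumping the ZeRO stage to 3 and adding parameter offload to CPU should help, at the cost of speed.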

It might be doable to set this up on an AWS machine with a beefy GPU or two. I haven't tried it yet though.

Once you have a model trained in Hugging Face Transformers, you'd be able to convert it using this script:

https://github.com/moyix/fauxpilot/blob/main/converter/huggi...
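
As a quick sanity check before converting, you can load the fine-tuned checkpoint back in Transformers and generate from it (rough sketch -- adjust the paths and prompt to your setup):

    # load the fine-tuned checkpoint and generate a short completion
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("./codegen-6B-finetuned")
    model = AutoModelForCausalLM.from_pretrained("./codegen-6B-finetuned").half().cuda()

    prompt = "def hello_world():"
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    out = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.2)
    print(tokenizer.decode(out[0], skip_special_tokens=True))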



I train models 24/7 right now and PLEASE do not use AWS for it. You're going to pay out of your backside for it.

Better alternatives: Google Colab, Paperspace Gradient, Lambdalabs Cloud, Vultr GPU instances

- Colab will give you a T4, K80, V100 or P100 (or their own TPUs) for free, or $50 for 24h uninterrupted background jobs.
- Gradient will give you free A6000s and sometimes even free A100s on a $40 subscription, in 6-hour sessions (repeatable ad infinitum).
- Lambdalabs gives you an RTX 6000 for $0.50/hour and an A6000 for $0.80/hour.
- Vultr GPU will give you 1/7th of an A100 for $0.37/hour.


Thank you for sharing the fine-tuning command! Would you be able to share your ds_config.json? I tried to fine-tune the 2B model on an A100 (40 GB) using your command, but got a CUDA out-of-memory error. The ds_config I used was the one from Hugging Face (https://github.com/huggingface/transformers/blob/main/tests/...).


A friend of mine runs Sushi Cloud (https://www.sushi.cloud/), which could work out cheaper than AWS for training.


I can't see how this is relevant to the discussion. There is no mention of GPU instances in the first place.


How do I create a dataset?


Have a look at the datasets library [1], but as a shortcut you can just create a file named "my_code.json" in JSON Lines format, with one line per source file, like:

    {"text": "contents_of_source_file_1"}
    {"text": "contents_of_source_file_2"}
    ...
And then pass that my_code.json to run_clm.py -- for a local file like this, use --train_file my_code.json rather than --dataset_name.
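
If you want to build that file from an existing codebase, a few lines of Python will do it (sketch -- assumes Python source files, adjust the glob for your language):

    # walk a repo and dump each source file as one JSON Lines record
    import json
    from pathlib import Path

    with open("my_code.json", "w") as out:
        for path in Path("path/to/your/repo").rglob("*.py"):
            try:
                text = path.read_text(encoding="utf-8")
            except UnicodeDecodeError:
                continue  # skip files that aren't valid UTF-8
            out.write(json.dumps({"text": text}) + "\n")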

[1] https://github.com/huggingface/datasets



