BigCode StarCoder Series Models

StarCoderBase

HF: https://huggingface.co/bigcode/starcoderbase
Training dataset: The Stack v1.2
Orchestration: bigcode/Megatron-LM
Neural networks: PyTorch
#Parameters: 15.5B
#TrainingTokens: 1T
#ContextWindow: 8192 tokens
#GPUs: 512 NVIDIA A100
#TrainingTime: 24 days
Language: 80+ Programming languages
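With an 8192-token context window, long prompts must be trimmed before generation. A minimal sketch of a hypothetical truncation helper (not part of the StarCoder codebase) that keeps the most recent tokens while reserving room for the model's output:

```python
def truncate_to_context(token_ids, max_context=8192, reserve_for_output=256):
    """Keep only the most recent token IDs so that prompt + generated
    tokens fit within the model's context window (8192 for StarCoderBase).

    This is an illustrative helper; in practice the token IDs would come
    from the model's own tokenizer."""
    budget = max_context - reserve_for_output
    return token_ids[-budget:] if len(token_ids) > budget else token_ids


# Example with dummy integer token IDs standing in for real ones:
long_prompt = list(range(10_000))
kept = truncate_to_context(long_prompt)
print(len(kept))  # 7936 tokens kept: 8192 minus the 256 reserved
```

Truncating from the left keeps the code nearest the cursor, which is usually the most relevant context for completion.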

StarCoder

Fine-tuned from StarCoderBase on 35B Python tokens from the same dataset (The Stack), for 2 epochs.
HF: https://huggingface.co/bigcode/starcoder
Language: 80+ Programming languages
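StarCoder supports fill-in-the-middle (FIM) prompting via special sentinel tokens, so it can complete code between a prefix and a suffix rather than only left-to-right. A sketch of how such a prompt is assembled (token names as documented on the StarCoder model card; verify against the tokenizer before relying on them):

```python
def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a fill-in-the-middle prompt for StarCoder.

    The model generates the missing middle after <fim_middle>.
    Sentinel token names follow the StarCoder model card."""
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"


# The model would be asked to fill in the body between these two pieces:
prompt = build_fim_prompt(
    prefix="def fibonacci(n):\n    ",
    suffix="\n    return a",
)
```

The resulting string is then tokenized and passed to the model like any ordinary prompt; generation stops when the model emits its end-of-text token.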

StarCoder-Megatron

The Megatron-LM version of StarCoder.
HF: https://huggingface.co/bigcode/starcoder-megatron
Language: 80+ Programming languages

StarCoder Plus

Fine-tuned from StarCoderBase on 600B tokens from the English web dataset RefinedWeb, combined with StarCoderData from The Stack (v1.2) and a Wikipedia dataset.
HF: https://huggingface.co/bigcode/starcoderplus
Language: English & 80+ Programming languages

StarChat Alpha

Fine-tuned from StarCoderBase, on a blend of oasst1 and databricks-dolly-15k datasets.
HF: https://huggingface.co/HuggingFaceH4/starchat-alpha
GitHub: https://github.com/bigcode-project/starcoder
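As a chat-tuned model, StarChat expects conversations wrapped in dialogue tokens rather than raw text. A sketch of a single-turn prompt builder using the dialogue tokens described for StarChat (<|system|>, <|user|>, <|assistant|>, <|end|>); check the tokenizer config on the Hub before depending on the exact format:

```python
def build_chat_prompt(system: str, user: str) -> str:
    """Format a single-turn StarChat prompt.

    Dialogue token names are assumptions based on the model's
    published chat template; the model continues generating the
    assistant turn after the trailing <|assistant|> marker."""
    return (
        f"<|system|>\n{system}<|end|>\n"
        f"<|user|>\n{user}<|end|>\n"
        f"<|assistant|>"
    )


prompt = build_chat_prompt(
    system="You are a helpful coding assistant.",
    user="Write a function that reverses a string in Python.",
)
```

At inference time, <|end|> is typically registered as a stop token so generation halts at the end of the assistant's reply.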

StarChat Beta

Fine-tuned from StarCoderPlus, on an "uncensored" variant of the openassistant-guanaco dataset.
HF: https://huggingface.co/HuggingFaceH4/starchat-beta

posted @ 2023-07-03 16:54  LexLuc