    Port pipeline from GPT-2 to DeepSeek-R1-Distill-Llama-8B · ae1e33f2
    Jacek Strzalkowski authored
    - Rewrite for Llama architecture: RMSNorm, GQA (32Q/8KV heads), RoPE, SwiGLU
    - Separate Q/K/V/output/gate/up/down projections (7 per block, was 4)
    - No biases on linear layers, no position embeddings
    - Add tokenizer_gguf.py: BPE tokenizer extracted from GGUF metadata
    - Fix 64-bit offset in llama_set_ptr (needed for the 8 GB+ weights file)
    - Fix _ftelli64 portability (MSVC vs GCC/Clang)
    - Size KV cache at N_KV_DIM=1024 (8 KV heads x 128 head dim), 4x smaller than caching at the full 4096 embedding width
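
    The two offset fixes and the KV cache sizing above can be illustrated with a minimal C sketch. It is not the code from this commit: the helper names (file_tell64, file_seek64, byte_offset, kv_dim) are hypothetical, and the hidden size of 4096 is the usual Llama-8B value, inferred here from N_KV_DIM=1024 and the 4x figure rather than stated in the commit message. It assumes MSVC provides _ftelli64/_fseeki64 and that other toolchains provide POSIX ftello/fseeko.

    ```c
    /* Illustrative sketch only; helper names are hypothetical. */
    #if !defined(_MSC_VER)
    #define _FILE_OFFSET_BITS 64      /* make off_t 64-bit on 32-bit POSIX targets */
    #define _POSIX_C_SOURCE 200809L   /* expose ftello/fseeko under strict -std=c99 */
    #endif

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Current file position as a 64-bit value on MSVC and on GCC/Clang. */
    static int64_t file_tell64(FILE *f) {
    #if defined(_MSC_VER)
        return _ftelli64(f);
    #else
        return (int64_t)ftello(f);
    #endif
    }

    /* Seek with a 64-bit offset; returns 0 on success like fseek. */
    static int file_seek64(FILE *f, int64_t off, int whence) {
    #if defined(_MSC_VER)
        return _fseeki64(f, off, whence);
    #else
        return fseeko(f, (off_t)off, whence);
    #endif
    }

    /* Byte offset of an element, computed entirely in 64-bit arithmetic so the
     * product cannot wrap the way a 32-bit int or long would past 2 or 4 GiB. */
    static uint64_t byte_offset(uint64_t elem_index, uint64_t elem_size) {
        return elem_index * elem_size;
    }

    /* Width of one cached K (or V) vector under grouped-query attention:
     * head_dim * n_kv_heads, where head_dim = n_embd / n_q_heads. */
    static uint32_t kv_dim(uint32_t n_embd, uint32_t n_q_heads, uint32_t n_kv_heads) {
        assert(n_q_heads > 0 && n_embd % n_q_heads == 0);
        assert(n_kv_heads > 0 && n_q_heads % n_kv_heads == 0);
        return (n_embd / n_q_heads) * n_kv_heads;
    }
    ```

    With the assumed values, kv_dim(4096, 32, 8) gives 1024, one quarter of the 4096-wide vector a cache without GQA would store per token and layer, which matches the 4x figure in the last bullet.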