Summary:
CUDA Refresher: The CUDA Programming Model https://developer.nvidia.com/blog/cuda-refresher-cuda-programming-model/ To execute any CUDA program, there ...
Summary:
vLLM https://github.com/vllm-project/vllm https://docs.vllm.ai/en/latest/ Covers both inference and serving, with more emphasis on inference. vLLM is a fast and easy-to-use library for LLM inference and ...