| | Where to Buy or Rent GPUs for LLM Inference: The 2026 GPU Procurement Guide (bentoml.com) |
| 3 points by sherlockxu 10 days ago | past | discuss |
|
| | ChatGPT Usage Limits: What They Are and How to Get Rid of Them (bentoml.com) |
| 1 point by bbzjk7 18 days ago | past |
|
| | LLM Benchmark and Optimization Explorer (bentoml.com) |
| 1 point by tanelpoder 60 days ago | past |
|
| | AMD Data Center GPUs Explained: MI250X, MI300X, MI350X and Beyond (bentoml.com) |
| 1 point by djhu9 67 days ago | past |
|
| | Nvidia Data Center GPUs Explained: From A100 to B200 and Beyond (bentoml.com) |
| 4 points by bbzjk7 74 days ago | past |
|
| | Benchmarks Show Speculative Decoding Needs the Right Draft Model for 3× Gains (bentoml.com) |
| 1 point by bbzjk7 3 months ago | past |
|
| | LLM Inference Handbook (bentoml.com) |
| 366 points by djhu9 4 months ago | past | 26 comments |
|
| | What Is InferenceOps (bentoml.com) |
| 2 points by sherlockxu 4 months ago | past |
|
| | The Shift to Distributed LLM Inference (bentoml.com) |
| 4 points by djhu9 5 months ago | past |
|
| | How to Beat the GPU CAP theorem in AI Inference (bentoml.com) |
| 3 points by sherlockxu 6 months ago | past |
|
| | Cold-Starting LLMs on Kubernetes in Under 30 Seconds (bentoml.com) |
| 2 points by djhu9 7 months ago | past |
|
| | Six Infrastructure Pitfalls Slowing Down Your AI Progress (bentoml.com) |
| 2 points by djhu9 7 months ago | past |
|
| | The Complete Guide to DeepSeek Models: From V3 to R1 and Beyond (bentoml.com) |
| 2 points by bbzjk7 8 months ago | past |
|
| | Survey shows 65% of organizations are still establishing their AI foundations (bentoml.com) |
| 1 point by sherlockxu 8 months ago | past | 1 comment |
|
| | 2024 State of AI Inference Infrastructure Survey Results (bentoml.com) |
| 2 points by bbzjk7 8 months ago | past |
|
| | Secure and Private DeepSeek Deployment (bentoml.com) |
| 1 point by djhu9 8 months ago | past |
|
| | A Guide to ComfyUI Custom Nodes (bentoml.com) |
| 1 point by bbzjk7 10 months ago | past |
|
| | A List of Top Open-Source Embedding Models (bentoml.com) |
| 5 points by bbzjk7 on Oct 30, 2024 | past | 1 comment |
|
| | Top Open-Source Vision Language Models (bentoml.com) |
| 1 point by sherlockxu on Oct 11, 2024 | past |
|
| | Exploring the World of Open-Source Text-to-Speech Models (bentoml.com) |
| 2 points by sherlockxu on Sept 20, 2024 | past |
|
| | Tuning TensorRT-LLM for Optimal Serving (bentoml.com) |
| 1 point by djhu9 on Sept 20, 2024 | past |
|
| | Compound AI Systems (bentoml.com) |
| 1 point by AnhTho_FR on Aug 24, 2024 | past |
|
| | Why should you care about compound AI? (bentoml.com) |
| 1 point by bbzjk7 on Aug 16, 2024 | past |
|
| | From Ollama to OpenLLM: Running LLMs in the Cloud (bentoml.com) |
| 3 points by sherlockxu on July 18, 2024 | past |
|
| | Benchmarking LLM Inference Back Ends: VLLM, LMDeploy, MLC-LLM, TensorRT-LLM, TGI (bentoml.com) |
| 15 points by chaoyu on July 5, 2024 | past | 1 comment |
|
| | Is LMDeploy the Ultimate Solution? Why It Outshines VLLM, TRT-LLM, TGI, and MLC (bentoml.com) |
| 16 points by helloericsf on June 20, 2024 | past | 8 comments |
|
| | Stable Diffusion 3: Text Master, Prone Problems? (bentoml.com) |
| 3 points by sherlockxu on June 18, 2024 | past |
|
| | Building a RAG App with BentoCloud and Milvus Lite (bentoml.com) |
| 1 point by fzliu on June 14, 2024 | past |
|
| | Comparing LLM Optimization Tools: VLLM, LMDeploy, MLC-LLM, TensorRT-LLM, and TGI (bentoml.com) |
| 2 points by bbzjk7 on June 8, 2024 | past |
|
| | Benchmarking LLM Inference Back Ends: VLLM, LMDeploy, MLC-LLM, TRT-LLM, and TGI (bentoml.com) |
| 12 points by sherlockxu on June 6, 2024 | past | 2 comments |
|