Local Models | LocalLLaMA (07/04) | 凱凱的技術筆記

GLM5.2 on 5x Pro 6000s and a 5090, an expensive journey

🔥 Upvotes: 821 | 📂 Subreddit: r/LocalLLaMA
🔗 Original post: click here

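This post documents a hardcore enthusiast who built a lavish rig to run the GLM5.2 model locally: five RTX Pro 6000 cards plus one RTX 5090. The extravagant build shows how much compute local deployment of a large language model can demand, and it sparked heated discussion in the community about cost-effectiveness and return on hardware investment. Many commenters marveled at the system's cost and were curious about its real-world inference speed and performance.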
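To put the build in perspective, here is a minimal sketch of the combined VRAM. The per-card figures (96 GB for an RTX Pro 6000, 32 GB for an RTX 5090) are assumptions taken from the cards' published specifications, not from the post, and the helper name `total_vram_gb` is hypothetical.

```python
# Assumed per-card memory in GB (published specs, not stated in the post).
VRAM_GB = {"RTX Pro 6000": 96, "RTX 5090": 32}


def total_vram_gb(cards):
    """Sum VRAM over a {card_name: count} mapping."""
    return sum(VRAM_GB[name] * count for name, count in cards.items())


# The rig described in the post: five Pro 6000s and one 5090.
rig = {"RTX Pro 6000": 5, "RTX 5090": 1}
total = total_vram_gb(rig)
print(f"Total VRAM: {total} GB")  # 5 * 96 + 32 = 512
```

Usable memory will be lower in practice, since each card also holds activations, KV cache, and runtime overhead.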

Deepseek drops another HUGE breakthrough - DSpark. Waaay faster than MTP [Video explaining it]

🔥 Upvotes: 527 | 📂 Subreddit: r/LocalLLaMA
🔗 Original post: click here

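DeepSeek has announced another major advance, a new technique called DSpark. According to the community video and test results, DSpark is much faster than the earlier MTP (multi-token prediction) approach, which would make local model responses smoother and more immediate. The community sees it as an important milestone for the local AI ecosystem because it narrows the performance gap between locally deployed models and large cloud models, letting users get high-quality output at lower latency.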

Mistral released Leanstral-1.5-119B-A6B

🔥 Upvotes: 438 | 📂 Subreddit: r/LocalLLaMA
🔗 Original post: click here

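Mistral AI has officially released the new Leanstral-1.5-119B-A6B model. It is a large language model with 119 billion parameters, designed for high performance with a light footprint; the "A6B" suffix suggests roughly 6 billion parameters are active per token, following the usual mixture-of-experts naming convention. The release adds to the open-source model collection on Hugging Face and gives developers and researchers a strong new tool, particularly for local deployments that need to handle complex tasks without an excessive hardware burden.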
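As a rough illustration of what those numbers mean for local deployment, the sketch below estimates weight storage at a few precisions and the share of parameters active per token. It assumes the "A6B" in the model name means about 6 billion active parameters, which the post does not confirm; `weight_gb` is a hypothetical helper, and the estimate ignores KV cache, activations, and quantization overhead.

```python
def weight_gb(params_billion, bits_per_weight):
    """Approximate weight storage in GB (1 GB = 1e9 bytes), ignoring overhead."""
    return params_billion * bits_per_weight / 8


TOTAL_B = 119  # total parameters in billions, from the model name
ACTIVE_B = 6   # ASSUMPTION: active parameters per token, read from "A6B"

active_fraction = ACTIVE_B / TOTAL_B

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_gb(TOTAL_B, bits):.1f} GB")
print(f"Active per token: {active_fraction:.1%}")
```

All 119 billion parameters still have to be stored somewhere, so the memory requirement follows the total count, while per-token compute follows the much smaller active count.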

Palantir is a free org on HF with 0 open-source models and 0 public datasets shared

🔥 Upvotes: 427 | 📂 Subreddit: r/LocalLLaMA
🔗 Original post: click here

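This chart prompted the community to poke fun at Palantir's presence on Hugging Face. Although Palantir is a well-known data analytics and AI company, its official organization page on HF shows zero open-source models and zero public datasets. Commenters joked that the company "only takes and never gives" and used it to mock corporate free-riding in the open-source community, which led to a lively discussion about how much companies actually contribute to open source.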

llamacpp patch - DeepSeek V4 Flash running with full 1M token context locally on RTX 5090

🔥 Upvotes: 364 | 📂 Subreddit: r/LocalLLaMA
🔗 Original post: click here

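With a key patch to llamacpp, a user managed to run the DeepSeek V4 Flash model locally on a single RTX 5090 with the full 1-million-token context window. The result shows how an optimized inference engine can unlock hardware potential, letting a consumer-grade card handle memory and analysis tasks over extremely long text. It is good news for anyone who works with long documents, codebases, or extended conversations.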
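To see why a 1-million-token context on one card is notable, here is a minimal sketch of the standard KV-cache size formula for plain grouped-query attention. Every architecture number below (layer count, KV heads, head dimension) is hypothetical and chosen only for illustration; the post does not give DeepSeek V4 Flash's actual architecture or say how the patch fits the context into memory, and the model may well use a compressed cache that this formula does not describe.

```python
def kv_cache_gb(tokens, layers, kv_heads, head_dim, bytes_per_elem=2):
    """KV cache size in GB (1 GB = 1e9 bytes) for standard attention.

    The leading factor of 2 accounts for storing both keys and values.
    """
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_elem / 1e9


# HYPOTHETICAL architecture, not DeepSeek V4 Flash's real configuration.
size = kv_cache_gb(tokens=1_000_000, layers=48, kv_heads=8, head_dim=128)
print(f"Uncompressed fp16 KV cache at 1M tokens: ~{size:.1f} GB")
```

Under these made-up numbers the uncompressed cache alone would far exceed a single consumer card's memory, which is why long-context setups generally depend on some mix of cache compression, quantization, or offloading.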

Follow-up: DeepSeek V4 Flash on 2x RTX PRO 6000 finishes real coding tasks faster than Sonnet and Opus, at about Sonnet quality

🔥 Upvotes: 222 | 📂 Subreddit: r/LocalLLaMA
🔗 Original post: click here

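This is a follow-up report on how DeepSeek V4 Flash performs on two RTX PRO 6000 cards. According to the tests, the local model completed real coding tasks faster than Anthropic's Sonnet and Opus models, with quality roughly on par with Sonnet. The result suggests that locally deployed models can be competitive on specific tasks such as code generation and can even outpace the major cloud providers on speed, which makes private on-premises deployment an attractive option for companies.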
