We’re living in interesting times. Traveled ~300km from home. Left a Claude Code session running on my M3 Ultra to test continuous batching across all models (2TB of weights) and check for regressions. Overnight the M3 Ultra auto-updated, restarted, and killed both my sessions… https://twitter.com/Prince_Canuma/status/2045781748571681231/photo/1
RT @ActuallyIsaak: Introducing the MLX-Benchmark Suite!! https://github.com/Goekdeniz-Guelmez/MLX-Benchmark The first comprehensive benchmark for evaluating LLMs on…
Unlocking even more perf 😤🚢 https://twitter.com/Prince_Canuma/status/2045098671008575691/photo/1
RT @N8Programs: Qwen3.6 4bit DWQ now up on MLX, uses custom quantization scheme (4bit MLP 8bit everything else) + DWQ for additional gains.…
My wife: “You should journal more, our kids will benefit from your life lessons while they are fresh.” Me: “Yeah, I will do it. I don’t have time…” then tweets and validates wife requests
RT @jelveh: Fantastic work by @Prince_Canuma - you rock!!
RT @neural_avb: MLX bros and sises - DON’T miss this guy’s next post! You’ll be able to do parallel and async requests to mlx vlm server af…
The cat is out of the bag. Dflash + continuous batching is coming as well. The current draft models work best with text-only inputs. https://twitter.com/Prince_Canuma/status/2044912770718511486/photo/1
When y’all realise how much I cooked here, this will blow up
RT @dreamworks2050: @Prince_Canuma This is too much bro. 😭
RT @dreamworks2050: MLX-VLM FEELS ILLEGAL to use 🔥 💀 🔥
Y'all ain’t ready for local multimodal coding agents on your Mac! Coming to a Mac near you tomorrow :) https://twitter.com/Prince_Canuma/status/2044883144982028292/photo/1
Next mlx-vlm release will ship with continuous batching support on the server 🚀 What's coming: → Continuous batching — new requests join the active batch immediately, no waiting. Mixed image + text batches supported → OpenAI-compatible API — field-for-field match with… https://twitter.com/Prince_Canuma/status/2044882569020518746/video/1
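The continuous-batching idea above can be sketched with a toy scheduler. This is a minimal illustration of the scheduling policy only, not mlx-vlm's actual implementation; the function name and request shape are made up for the example.

```python
from collections import deque

def continuous_batch(requests, max_batch=4):
    """Toy continuous-batching scheduler.

    Each request is (arrival_step, n_tokens). New requests join the
    active batch as soon as a slot frees up, instead of waiting for the
    whole batch to drain (static batching). Returns {request_id: step
    at which the request finished decoding}.
    """
    queue = deque(sorted(requests, key=lambda r: r[0]))
    active = {}       # request id -> tokens still to decode
    finished_at = {}
    step = 0
    next_id = 0
    while queue or active:
        # admit newly arrived requests into free slots immediately
        while queue and len(active) < max_batch and queue[0][0] <= step:
            _, n_tokens = queue.popleft()
            active[next_id] = n_tokens
            next_id += 1
        # decode one token for every active request this step
        for rid in list(active):
            active[rid] -= 1
            if active[rid] == 0:
                finished_at[rid] = step
                del active[rid]
        step += 1
    return finished_at
```

With static batching, a request arriving at step 1 would wait until the entire first batch drained; here it is admitted the moment the first request completes and frees a slot.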
Great job guys, I expected as much!
RT @Ridheshdabhi: Cracked devs be like…
Awesome release, congrats to the @PrismML team! It comes with day-0 support on MLX thanks to some of the work we did with bitnet-1.58 kernels a year ago. https://huggingface.co/collections/prism-ml/ternary-bonsai
👀
❤️
Haha, it's not that easy! I have my skills and they save me time but they are far from replacing me.
Congrats to @pcuenq, it was awesome to collaborate and trade notes on this! One of the most exciting ideas that came up thanks to the level of quality coding agents have achieved. I have been running a similar workflow since earlier this year and it saves me hours to sometimes…
When I get too comfortable with a model, then boom 💥 provider drops a new iteration 🙌🏽
This is the most detailed benchmark I have seen of my recent implementations of TriAttention + TurboQuant
Coming to MLX 🚀
RT @elliotarledge: @Prince_Canuma wow! i'll be coming back to this from time to time to refresh myself. i think the education thing is pret…
RT @_karthik: > filling forms on the web sucks! you're usually giving the same info over and over again > so i made clacky, inspired by cl…
TriAttention MLX benchmark run on the full MATH500 is done after ~30h. We ran Gemma4-26B (5-bit) on M3 Ultra with KV cache budgets of 512, 1024, and 2048: → TA-2048: 76.6% vs 77.4% baseline — 4 problems lost out of 500 (-0.8%) → TA-1024: 75.6% — 9 problems lost (-1.8%) →… https://twitter.com/Prince_Canuma/status/2044539341762933040/photo/1
RT @kyutai_labs: We're releasing OVIE, a novel view generation model trained entirely on single images. No multi-view datasets needed. Giv…
Makes me happy to hear this! ❤️ Reminds me of how great fastMLX would have been. We are cooking something new…
I will replace my dash cam analysis with Gemma 4 + Falcon Perception 🔥🙌🏽
Well done to the z-lab team 🔥🚀
❤️
This is running using MLX-VLM 😎 https://github.com/Blaizzy/mlx-vlm/pull/926
I have friends in high places 😎
In case you missed it 😎
Well said!
👀
RT @WolframRvnwlf: My @aiDotEngineer Europe 2026 Highlight Reel - personal impressions from 3 days at the world's best AI conference: https…
RT @ivanfioravanti: @Prince_Canuma This is pure power!!!
RT @ClementDelangue: Introducing Kernels on the Hugging Face Hub ✨ What if shipping a GPU kernel was as easy as pushing a model? - Pre-co…
Woohoo! Congratulations to my brother for this awesome release 🚀
📊 TriAttention perplexity results on Gemma4-31B (bf16, wikitext-2) using MLX-VLM TA-2048 is lossless at 1K–2K context when it activates, then degrades gracefully: • +0.46 PPL at 4K • +1.25 at 8K • +1.95 at 50K — and stabilizing, not blowing up Important nuance:… https://twitter.com/Prince_Canuma/status/2044043971391893765/photo/1
🧮 MATH 500 results for TriAttention on Gemma4-26B-A4B-it (5-bit quantized, M3 Ultra 512GB) using MLX-VLM TA-2048 preserves 96% of baseline accuracy (22/30 vs 23/30) with KV cache capped at 2048 tokens, regardless of reasoning length. Throughput stays rock-solid at ~77 tok/s… https://twitter.com/Prince_Canuma/status/2044040708571410763/photo/1
RT @NarayanSanath: Our Falcon Perception with Gemma4 prompting for open-vocabulary segmentation, running locally on MLX
RT @neural_avb: Got to try out this VoxCPM2 model locally. Was trying out some voice cloning with the Pytorch as well as the 4-bit MLX ve…
RT @altryne: Our world is changing. I spent the last week listening to, chatting, dining, dancing with and interviewing the top AI Enginee…
One of the videos I featured at my @aiDotEngineer talk 🔥
RT @MaziyarPanahi: Gemma 4 sees a kid and three dogs. Decides what matters. Calls SAM 3.1 Mask and bounding box. Spotlight on subjects. Ba…
Absolute killer use case for MLX-VLM! 🔥🙌🏽
Local grounded reasoning using MLX will power a whole new generation of use cases that were previously only available on the cloud! From satellite imagery analysis, security systems all the way to robotics. I’m really excited for the latter. I spoke at length about these… https://twitter.com/Prince_Canuma/status/2042761667017105517/video/1
RT @osanseviero: Our first successful Gemma 4 Runtime in London with @swyx @patloeber @nick_kango @cormacb and others! 💎Great to go out for…
RT @adrgrondin: I’m excited to announce that I’ve joined @lmstudio 👾 The team behind the app is amazing and I couldn’t be more proud. I’l…
❤️
🚀🔥
Woohoo, congratulations @adrgrondin! I couldn’t imagine a better match 🚀
Just implemented TriAttention in MLX and the results are wild! You can get up to 81% KV compression at 60K tokens for Gemma-4-31B-IT in BF16 🔥 Unlike TurboQuant, which quantizes KV cache values, TriAttention prunes low-importance tokens entirely by scoring keys using… https://twitter.com/Prince_Canuma/status/2042021304270819394/photo/1
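The pruning idea described above (drop low-importance cached tokens by scoring keys, in contrast to quantizing them) can be sketched as a generic top-k eviction pass. This is not the actual TriAttention code; the function name, cache layout, and importance-score input are assumptions for illustration.

```python
def prune_kv(cache, scores, budget):
    """Keep only the `budget` highest-scoring cached tokens.

    cache:  list of (key, value) entries, one per cached token
    scores: per-token importance, e.g. attention mass each cached
            key has received from recent queries (assumed input)
    Evicted tokens are dropped entirely, so KV memory is capped at
    `budget` tokens regardless of how long generation runs.
    """
    if len(cache) <= budget:
        return cache
    # indices of the top-`budget` scores, restored to original order
    keep = sorted(sorted(range(len(cache)), key=lambda i: scores[i])[-budget:])
    return [cache[i] for i in keep]
```

In the TA-2048 runs above, `budget` would correspond to the 2048-token KV cache cap.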
I’m behind them chatting with @altryne and @marlene_zw 😂🙌🏽
🚀
RT @ClementDelangue: Anthropic had the most powerful cyber-security model in the history of this world and their internal code base still…
RT @julien_c: We are giving away Safetensors to the @pytorch foundation (shepherded by the Linux Foundation) Our shared goal is to make th…
Ask Mythos to leak its own weights 😂 https://twitter.com/Prince_Canuma/status/2041839027217641750/photo/1
RT @angeloskath: A long time coming but new mlx-lm is here with better batching support in the server and Gemma 4. pip install -U mlx-lm…
I love the internet! 😂 For me the most important part was the OSS attempt (humbling experience) and seeing my childhood fav actress show up in an unexpected place. It’s obvious our beloved Milla knows nothing about the space, and honestly didn’t expect her to. Two things that… https://twitter.com/Prince_Canuma/status/2041612468208988354/photo/1
Ain’t no way your name is…
RT @OlivierBachem: Our goal in the Gemma team is to ship models that are useful by generalizing to unseen tasks. Hence, we are extremely s…
I have my visa for the UK for 6 months. If you would like me to speak at your event, DMs are open 🚀
Literally got this at 3pm today and have to fly tomorrow. Thank God!
I got my UK visa 😭❤️🙌🏽 UK and @aiDotEngineer here comes the King! https://twitter.com/Prince_Canuma/status/2041574009284767856/photo/1
👀 will you donate the Mac Mini for the cause https://twitter.com/Prince_Canuma/status/2041524355083989460/photo/1
Medium was once a great place... I wrote my best articles there back in 2018
Well done guys ❤️🚀🔥
My favourite action actress from Resident Evil and many awesome movies is doing open source ❤️ First, never saw that coming! Second, what a time to be alive and doing open source! Open source for the win 🚀
If this works well, we are looking into a new era! Well done Anemll 🔥🙌🏽
This example has so much alpha! You can now literally generate vision agent traces and train a smaller VLM on it completely on-device 🔥 cc: @TheZachMueller @MaziyarPanahi @ivanfioravanti @ActuallyIsaak https://twitter.com/Prince_Canuma/status/2041286374431633886/photo/1
The best ideas are the simplest, thank you @dahou_yasser! "the idea: Gemma4 looks at the image, decides what to segment, Falcon Perception returns pixel-accurate masks + metadata (centroid, area_fraction, bbox), Gemma4 reasons on the numbers and calls the next tool or answers."…
RT @nibzard: added @cohere transcribe to my small transcribing cli running natively on Apple Silicon via MLX-audio from @Prince_Canuma http…
🫡❤️
Awesome work by @no_stp_on_snek 🔥
RT @roboflow: here's what you can build for $0.00 with 3 open source models token cost breakdown below and the company getting rich off i…
RT @TheZachMueller: Well, that's kinda cool https://twitter.com/TheZachMueller/status/2041139872849690789/photo/1
Woohoo 🎉
RT @MaziyarPanahi: https://x.com/i/article/2041078649185591296
Gemma 4 26B A4B IT (4bit) + M5 Max + MLX-VLM 🚀
We exist in a corner of X 🫡
Have a new label for certain types of PRs 😤 https://twitter.com/Prince_Canuma/status/2040902161555464504/photo/1
Awesome work! 🔥
Hopefully this shines a light 💡 for anyone who’s trying to benchmark TBQ on MLX but doesn’t know how
@zigelbaum @GoogleDeepMind “Via codex” It’s hard to test something if you don’t know how to test it yourself. I put up a benchmark script in the thread that you can use to test and have codex interpret the results for you. But before you run it, ask codex to install the changes in this branch.…
This benchmark is multimodal (images + text). It has from 1 up to 26 images per prompt. Use this PR; it has a patch to enable Gemma 4 to support multiple images: https://github.com/Blaizzy/mlx-vlm/pull/938
Why TBQ only quantizes full-attention layers in Gemma 4 31B, not the sliding-window ones: TL;DR, it’s a bad idea because the sliding layers are already memory-efficient by design. 😂 → 50 sliding layers hold a fixed ~400MB regardless of context length → 10 full-attention…
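The "sliding layers hold a fixed ~400MB" claim above can be sanity-checked with back-of-the-envelope arithmetic. A minimal sketch, assuming 8 KV heads, a 256 head dim, and bf16 (2 bytes per element); those shape numbers are illustrative guesses, not values taken from the actual Gemma 4 config.

```python
def kv_bytes(n_tokens, n_kv_heads=8, head_dim=256, bytes_per_el=2):
    """Bytes of KV cache for ONE layer: keys + values, bf16 by default.
    Head counts and dims are assumed, not the published config."""
    return 2 * n_tokens * n_kv_heads * head_dim * bytes_per_el

def total_kv_mb(context, window=1024, sliding_layers=50, full_layers=10):
    """Return (sliding_MiB, full_MiB) for the whole model.
    Sliding-window layers cap at `window` tokens, so their cost is
    constant; full-attention layers grow linearly with context."""
    sliding = sliding_layers * kv_bytes(min(context, window))
    full = full_layers * kv_bytes(context)
    return sliding / 2**20, full / 2**20
```

Under these assumed shapes the 50 sliding layers cost a flat 400 MiB at any context length, while at 128K the 10 full-attention layers would hold ~10 GiB, which is why compressing only the full-attention layers captures nearly all the win.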
Alongside MM-NIAH I’m also running LongBench-V2 to truly showcase where TurboQuant shines, which is large context (above 60K). Running will take around 24h to complete. Meanwhile, here is a sneak peek of 6 samples across different context sizes. See you in a day or two 🫡 https://twitter.com/Prince_Canuma/status/2040881635449598238/photo/1
TurboQuant: Open Evals on MLX 🔥 Yesterday I launched mlx-vlm v0.4.4 with major TurboQuant performance improvements. Today, the open benchmark results on MM-NIAH (val, 520 samples) using Gemma 4 26B IT by @GoogleDeepMind on M3 Ultra: → 0 quality loss — 78% accuracy for both… https://twitter.com/Prince_Canuma/status/2040877782922649865/photo/1
RT @jtdavies: TurboQuant from mlx-vlm seems to help with larger context (64k and above). I ran the full 4, 8-bit and bf16 of the Gemma 4 26…
RT @osanseviero: See you there! Excited to share about Gemma 4 and what the team has been cooking for the last few months
Falcon Perception by @TIIuae on MLX-VLM 🚀
RT @osanseviero: Gemma 4 is now in Android Studio! You can use Android Studio Agent mode to develop features, vibe code Android apps, refa…
Another really awesome visual grounding example powered by a couple vision language models (Gemma 4 + Falcon Perception) on an M1 Pro with 32GB using mlx-vlm 🚀 Well done @korale77
Goals 🤣🙌🏽
RT @ivanfioravanti: I spent 3 hours this morning working with coding agents on MacBook 16" M5 Max in LOW POWER mode! 😱 I noticed 0 differe…
❤️
😂 oh my
❤️
❤️ https://x.com/Prince_Canuma/creator-subscriptions/subscribe
RT @awnihannun: Because of AI people are starting to value experience in a domain more than they used to. It feels short sighted. - Many (…
First time lapse of Gym Geeks 🤣 Where wifey and I train hard, and maybe discuss latest updates in the AI space. @MaziyarPanahi https://twitter.com/Prince_Canuma/status/2040564084576281022/video/1
Awesome work by Yasser from @TIIuae 🚀
This demo is such a powerful example of what’s possible on your Mac using MLX-VLM! It joins two of my favourite latest releases from @GoogleDeepMind and @TIIuae 🚀
If you quantize the model you get even more memory savings and speedups 🚀 Thanks to @jtdavies for testing it out!
Woohoo Gemma 4 in your pocket thanks to @adrgrondin MLX-Swift port 🚀 Download and try it out on his @LocallyAIApp
RT @Prince_Canuma: @ptremblay I know you weren’t, we have interacted before 😊 Just been a long day for me since none of the other guys wan…
RT @phonezawphyo: @Erik0XAi @Prince_Canuma This is what I’m running python3 -m mlx_vlm.server --model gemma-4-26b-a4b-it-4bit --port 8086…
@ptremblay I know you weren’t, we have interacted before 😊 Just been a long day for me since none of the other guys wanted to reason about this. In short and simple terms, I think current models have significantly higher usable context, most around 128K to 256K. But we are now seeing…
RT @ollama: @Prince_Canuma @spark_arena @WesEklund @Prince_Canuma Thank you for all the work you do! Here to just give you ❤️❤️❤️❤️
This is incredible work by my great friend Zach 🔥 Generating synthetic data using OSS models and an agent harness. The data is all open too. Check it out!
Don’t understand people that try to gaslight others when they start losing an argument, it rarely works… One of my mentors and former boss always used to ask me: “Is that opinion backed by data or intuition? It’s ok if you don’t have data, just make sure you don’t make…
🤣
Lol I’ve been doing ML research for a decade and helped the field progress fam… https://twitter.com/Prince_Canuma/status/2040489309040423100/photo/1
Thanks Rojan! More public tests on the improved TurboQuant 🚀 You should see improvements across the board. Here you see a slight improvement in speed and peak memory even at lower context settings between v0.4.3 and v0.4.4. It should be much larger as context grows
Can’t wait 🚀
RT @ivanfioravanti: BOOM! Let's test this magic!
Do your homework before speaking or forming opinions…
@spark_arena @WesEklund I have been a contributor on MLX-LM since 2024. I know everything there is to know about that project. It has real benchmarks that work and it’s the inspiration for MLX-VLM and all my projects. You’re confusing the model tests that guard against regressions with benchmarks. Those tests are…
Well done 🔥🙌🏽
My brother is pushing his Mac to the max using MLX and torch 🚀 Don’t know why he is using torch 😭 when the entire pipeline exists in MLX-VLM: Sam 3 ✅ RF-DETR ✅
I hope I can make it in time for AI Engineer next week ❤️ Still waiting for the visa
Haha 😎 I will share a detailed post later about all the improvements
Awesome testimony 🫡 It makes me happy to hear stories like this
Now let’s pull some heavy weights. Back day 🏋🏽‍♂️ https://twitter.com/Prince_Canuma/status/2040467019879854104/photo/1
You can’t fake what you care about! I truly care about helping people through technology. That’s my life’s mission and motto. We will win 🏆
More public tests coming through 🚀
The hardest part was benchmarking with one machine 🥲 Each iteration takes 30m–1h to validate, so I lost sleep trying to land this ahead of the Gemma 4 launch, but failed. I’ll need one more maxed-out Mac Studio to help me ship faster and test distributed. That’s why I could… https://twitter.com/Prince_Canuma/status/2040463753536327897/photo/1
For the MatFormer variants (E2B and E4B) I don’t see memory savings but do see faster generation
Will test compressing RotatingKVCache later today and see if it yields better performance overall 🚀 If it works, we might see massive improvements, potentially unlocking 1M context in the 50–100B param range
Correction: Device: M3 Ultra 512GB
Gemma 4 31B-IT gets 1.4GB memory savings with TurboQuant on MLX-VLM v0.4.4 💾 This one’s a 59GB dense model — 50 of its 60 layers use RotatingKVCache with a fixed 1024-token window. TBQ only compresses the 10 full-attention layers (every 6th),… https://twitter.com/Prince_Canuma/status/2040456230737453301/photo/1
Shout out to @no_stp_on_snek for his awesome llama.cpp TurboQuant implementation and the tip to skip QJL. One of the many improvements in the latest release was skipping QJL, and it worked well with no noticeable loss in coherence 🚀
Gemma 4 26B-A4B is now ~2x faster at 375K context with TurboQuant on MLX-VLM v0.4.4 🚀 The model's official max context is 262K but I pushed it to 375K anyway. That's roughly 5–6 full novels (the entire LOTR trilogy + The Hobbit). Up to ~20K tokens they're neck and neck, but… https://twitter.com/Prince_Canuma/status/2040454774357676344/photo/1
mlx-vlm v0.4.4 is out 🚀🔥 New models: 🦅 Falcon-Perception 300M by @TIIuae Highlights: ⚡️ TurboQuant Metal kernels optimized — up to 1.90x decode speedup over baseline on longer context with 89% KV cache savings. 👀 VisionFeatureCache — multi-turn image caching so you don’t… https://twitter.com/Prince_Canuma/status/2040451789363851350/photo/1
Well, if this trend continues most open-source projects will become invite-only for contributions. I’m seeing the same issues, my friend @ngxson 😄 https://twitter.com/Prince_Canuma/status/2040366474036936865/photo/1
RT @MLStreetTalk: I couldn't find any benchmarks of folks running the Gemma models on an M4 Max (with Ollama 0.20 and mlx-vlm), so I just g…
RT @Karmedge: It’s happening. 6:30 presidio https://twitter.com/Karmedge/status/2040201718986944528/photo/1
Pretty cool, well done 👏🏽
🚀
Guess they called it "Turbo" for a reason 👀 Model: Gemma-4-26B-A4B-it Precision: BF16 Device: M3 Max 96GB https://twitter.com/Prince_Canuma/status/2040260062963286051/photo/1
Success is about the reps and dedication to the craft.
My wife says I should post time lapses of me working. What do you think?
Gemma 4 31B running with TurboQuant KV cache on MLX 🔥 128K context: → KV Memory: 13.3 GB → 4.9 GB (63% reduction) → Peak Memory: 75.2 GB → 65.8 GB (-9.4 GB) → Quality preserved TurboQuant compression scales with sequence length, so the longer the context, the bigger the…
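A back-of-the-envelope estimator for KV-quantization savings like the numbers above. This is generic arithmetic, not TurboQuant's actual scheme; the fraction-quantized parameter is an illustrative assumption, and scale/zero-point overhead is ignored.

```python
def quantized_kv_gb(kv_gb_bf16, frac_quantized=1.0, bits=4):
    """Estimate KV cache size after quantizing `frac_quantized` of a
    bf16 (16-bit) cache down to `bits`-bit storage.
    Ignores per-group scale/zero-point metadata, so real caches land
    slightly above this estimate."""
    kept = kv_gb_bf16 * (1.0 - frac_quantized)       # portion left in bf16
    quant = kv_gb_bf16 * frac_quantized * bits / 16.0  # quantized portion
    return kept + quant
```

Quantizing everything to 4-bit gives a clean 4x cut (16 GB → 4 GB). The reported 13.3 GB → 4.9 GB is a smaller effective ratio, which is consistent with only part of the cache being quantized plus metadata overhead.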
You can now run Ollama using MLX as a backend 🚀