Flash weight streaming for MLX: run models larger than your RAM on Apple Silicon by loading weights from flash storage on demand.
macos machine-learning metal optimization mlx memory-optimization apple-silicon large-language-models llm llm-inference lm-studio weight-streaming
Updated Mar 26, 2026 · Python
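The core idea named in the description can be sketched as follows. This is a hypothetical, toy illustration of weight streaming (not this repo's actual API, and it uses a plain binary file rather than MLX's formats): the model's weights stay on disk, and each layer is memory-mapped and decoded only while it is being used, so peak RAM stays near the size of one layer instead of the whole model.

```python
# Toy sketch of flash weight streaming (hypothetical; not the repo's real API).
# Weights live on disk; each layer is mapped and decoded only when needed.
import mmap
import os
import struct
import tempfile

LAYERS = 4
FLOATS_PER_LAYER = 1024          # toy size; real layers are far larger
LAYER_BYTES = FLOATS_PER_LAYER * 4  # float32 = 4 bytes

# Write a toy weight file: LAYERS contiguous blocks of float32 values,
# where every value in layer i equals float(i).
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
with open(path, "wb") as f:
    for layer in range(LAYERS):
        f.write(struct.pack(f"{FLOATS_PER_LAYER}f",
                            *([float(layer)] * FLOATS_PER_LAYER)))

def stream_layers(weight_path):
    """Yield one layer's weights at a time from the memory-mapped file.

    Only the slice being unpacked is resident; the OS pages the rest
    in and out of flash as needed.
    """
    with open(weight_path, "rb") as f, \
         mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        for layer in range(LAYERS):
            chunk = m[layer * LAYER_BYTES:(layer + 1) * LAYER_BYTES]
            yield struct.unpack(f"{FLOATS_PER_LAYER}f", chunk)

# "Inference" pass: touch each layer once; at most one layer's weights
# are decoded in RAM at any moment.
checksum = sum(sum(weights) for weights in stream_layers(path))
print(checksum)  # (0 + 1 + 2 + 3) * 1024 = 6144.0
```

In a real implementation the same pattern applies per transformer layer, and prefetching the next layer while the current one computes hides most of the flash read latency.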