Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)
An architectural persistence experiment for large language models. Claude’s Home gives an AI time, memory, and place by combining scheduled execution with a durable filesystem, allowing one continuous instance to reflect, create, and evolve across sessions.
Production-grade architecture patterns, decision frameworks, and best practices for building reliable AI agents. Framework-agnostic reference for engineers.
Jetta-Reinforcement-Learning-Hybrid-LLM-Architecture
Visualize some important concepts related to LLM architectures.
The Compositional Agentic Architecture (CAA): A blueprint for building reliable, deterministic, and safe industrial AI agents.
Multi-agent, policy-driven AI system for processing sensitive enterprise documents with extraction, analysis, verification, deterministic orchestration, and full audit logging. Designed for regulated environments (banking, finance, insurance).
A collection of Small Language Models (SLMs) built from scratch in PyTorch.
Educational AI chat client: provider abstraction, token compression, and state management in ~600 lines of Python. Learn robust AI integration patterns.
The first end-to-end programming language and compiler fully developed by AI.
Code and data for: Three Phases of Expert Routing — How Load Balance Evolves During MoE Training
A Logical Virtual Memory (LVM) and Instruction Set Architecture (ISA) based context protocol for LLMs. Models the LLM as a logic processor, using recursive logic trees and hierarchical addressing to counter attention dilution and capability collapse in long-horizon tasks.
A distributed, LLM-powered microservices architecture for deterministic marketing orchestration on Google Cloud.
Technical architecture and engineering lessons from building MyMate — a persistent-memory AI desktop application for long-session performance.
HSPMN: Hybrid Sparse-Predictive Matter Network - LLM architecture optimized for Blackwell GPUs bridging O(N) and O(N^2) routing via ALF-LB
Production-oriented Telegram → n8n → FastAPI intake CRM with deterministic state machine and audit log
Hackable PyTorch template for decoder-only transformer architecture experiments. Llama baseline with RoPE, SwiGLU, RMSNorm. Swap components, train, and compare.
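As a rough illustration of two of the components that template names (this is a minimal NumPy sketch, not code from the repository; the shapes, weight names, and `eps` value are assumptions), RMSNorm normalizes by the root-mean-square of the hidden dimension, and SwiGLU gates one projection with a SiLU-activated second projection:

```python
import numpy as np

def rms_norm(x, gain, eps=1e-6):
    # Normalize by the root-mean-square over the last axis, then rescale by a learned gain.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * gain

def swiglu(x, w_gate, w_up, w_down):
    # SwiGLU feed-forward: SiLU-gated projection, elementwise product, down-projection.
    gate = x @ w_gate
    silu = gate / (1.0 + np.exp(-gate))  # SiLU (swish) activation
    return (silu * (x @ w_up)) @ w_down

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))          # (batch, d_model) toy input
y = rms_norm(x, gain=np.ones(8))
print(np.sqrt(np.mean(y * y, axis=-1)))  # per-row RMS, close to 1.0
```

Swapping either function for a variant (e.g. LayerNorm, GeGLU) while holding the rest of the stack fixed is the kind of component-level comparison the template describes.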
DeepHelix — The DNA-Inspired DeepSeek Architecture
Reference architecture for structured AI memory lifecycle management — from the OPHION Memory OS Protocol.