Generative AI
Published 16th September 2025
📣 Headlines
• Anthropic lets Claude remember previous interactions, adding persistent memory, an optional incognito mode, and cross-export to rival assistants to streamline enterprise workflows.
• OpenAI's first AI chip could launch in 2026 to reduce reliance on Nvidia/AMD, while a new Microsoft–OpenAI deal hints at IPO prospects and deeper collaboration.
• Oracle's new product is power, unveiling cloud hardware and AI chips to scale training/inference via OCI and OpenAI-aligned partnerships.
• On-device AI accelerated with Arm's Lumex compute subsystem for smartphones/PCs, Firefox for iOS summarization (local on A17 Pro, cloud on older devices), and Apple's iPhone Air/17 lineup with AI-forward hardware.
• Agentic AI moved from concept to practice as enterprises explore delegating multistep workflows with expert oversight (https://news.crunchbase.com/ai/agentic-ai-evolution-wong-hron-thomson-reuters/). Startups launched security agents: AegisAI to neutralize email threats in real time, Lookout's Smishing AI for mobile social engineering, and Miru's unified cyber-investigations copilot.
• U.S. AI policy heated up: regulators are probing AI companionship platforms, California advanced frontier-model risk-disclosure rules, and a proposal seeks a multi-year federal regulatory waiver and sandbox for AI firms.
• Microsoft expanded Fabric with a native graph database and real-time geospatial maps powered by LinkedIn tech, integrated with OneLake for unified analytics.
• RL training markets surged as Mercor targets a $10B+ valuation on a $450M run rate, linking model providers with domain experts for reinforcement-learning workflows.
🔧 Company Engineering Blogs
Jupyter Agents: training LLMs to reason with notebooks (huggingface.co). Jupyter Agent builds a data science workflow inside notebooks using Qwen models, scaffolding, QA generation, and E2B execution pipelines
Accelerating scientific discovery with AI-powered empirical software (research.google). Google Research presents an AI-powered system, built on Gemini, that writes, optimizes, and empirically evaluates scientific software across genomics, public health, geospatial analysis, neuroscience, and time-series forecasting
Scientific frontiers of agentic AI (amazon.science). Agentic AI explores embedding languages, context, negotiation, common sense, and privacy with embeddings, context windows, and behavioral economics insights
🧠 Model Architecture & Optimization: Qwen3-Next, MoE, Tokenization, Test-Time Compute
Qwen3-Next-80B-A3B: 🐧🦩 Who needs legs?! (simonwillison.net). Qwen3-Next-80B-A3B-Instruct and Thinking models; 80B parameters with 3B active per token; OpenRouter deployment; llm-openrouter plugin; pelican SVG prompt; performance claims
lecture three (aarnphm.xyz). Lecture three on tokenizers, LLMs, alignment, sparse autoencoders, residual streams, and speculative decoding for efficient inference
assignment three reports. (aarnphm.xyz). Discussion of replacing one-hot cross-entropy, 2D GEMMs, batching, tokenization, and optimization techniques for large vocabulary sizes V
Qwen 3 Next (sibellavia.lol). Qwen3-Next-80B models with hybrid Gated DeltaNet, ultra-sparse MoE (512 experts), YaRN context up to 1,000,000 tokens, and multi-token prediction
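The ultra-sparse MoE design above activates only a handful of the 512 experts per token. A minimal, stdlib-only sketch of top-k routing (the expert count matches the summary; the top-k value, the random "router", and the scalar stand-in experts are illustrative assumptions, not Qwen3-Next's actual configuration):

```python
import math
import random

random.seed(0)

NUM_EXPERTS = 512   # total experts, as reported for Qwen3-Next
TOP_K = 10          # illustrative count of active experts (assumption)

# Toy router: one score per expert for a single token.
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

# Keep only the TOP_K highest-scoring experts.
top_k = sorted(range(NUM_EXPERTS), key=lambda e: logits[e])[-TOP_K:]

# Softmax over the selected experts only; all others contribute nothing.
m = max(logits[e] for e in top_k)
gates = {e: math.exp(logits[e] - m) for e in top_k}
z = sum(gates.values())
gates = {e: g / z for e, g in gates.items()}

# Stand-in expert outputs: each "expert" is just a scalar here, where the
# real model runs a full feed-forward block per selected expert.
expert_out = {e: random.gauss(0.0, 1.0) for e in top_k}
output = sum(gates[e] * expert_out[e] for e in top_k)

print(f"{len(gates)} of {NUM_EXPERTS} experts active; gate mass = {sum(gates.values()):.6f}")
```

The point of the sparsity: compute scales with the handful of selected experts, not with all 512, which is how an 80B-parameter model can run with only ~3B parameters active.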
LLM-driven Evolutionary Search to squeeze even more value out of Test-Time Compute (alexdong.com). LLM-driven evolutionary search uses islands, contextual feedback, and critique through role separation to optimize test-time compute
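The island mechanics behind such a search can be sketched without any LLM in the loop; below, a toy numeric objective stands in for LLM generation, and mutation stands in for asking the model for a variant. The objective, mutation operator, and all constants are invented for illustration, and the post's critique/role-separation step is omitted:

```python
import random

random.seed(1)

def fitness(x):
    # Stand-in objective with a single optimum at x = 3; in the post,
    # fitness would come from scoring LLM-generated candidates.
    return -(x - 3.0) ** 2

def mutate(x):
    # Stand-in for "ask the LLM for a variant of this candidate".
    return x + random.gauss(0.0, 0.5)

NUM_ISLANDS, POP, GENS = 4, 8, 30
islands = [[random.uniform(-10, 10) for _ in range(POP)]
           for _ in range(NUM_ISLANDS)]

for gen in range(GENS):
    for i, pop in enumerate(islands):
        pop.sort(key=fitness, reverse=True)  # keep the elite half...
        islands[i] = pop[:POP // 2] + [mutate(p) for p in pop[:POP // 2]]
    if gen % 10 == 9:  # ...and occasionally migrate between islands
        for i in range(NUM_ISLANDS):
            islands[i][-1] = max(islands[(i + 1) % NUM_ISLANDS], key=fitness)

best = max((p for pop in islands for p in pop), key=fitness)
print(round(best, 3))
```

Islands let sub-populations explore independently, with occasional migration spreading strong candidates, which is the exploration/exploitation balance the post applies to test-time compute.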
⚡ Deterministic & Efficient LLM Inference and Serving
Defeating Nondeterminism in LLM Inference (simonwillison.net). Nondeterminism in LLM inference arises mainly from varying load and batch size; the paper proposes batch-invariant kernels in PyTorch to achieve determinism
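The core observation is that floating-point addition is not associative, so changing the reduction order (which varying batch sizes and kernel choices do on a GPU) changes the result. A stdlib-only illustration of the effect:

```python
# The same values summed in two orders give different answers: floating-point
# addition is not associative, which is why batch-size-dependent reduction
# orders make logits (and thus sampled tokens) vary from run to run.
vals = [0.1] * 10 + [1e16, -1e16]

forward = 0.0
for v in vals:
    forward += v          # the accumulated 0.1s are absorbed when 1e16 arrives

backward = 0.0
for v in reversed(vals):
    backward += v         # 1e16 and -1e16 cancel first, so the 0.1s survive

print(forward, backward)  # 0.0 vs ~1.0
```

Batch-invariant kernels fix the reduction order regardless of batch size, trading some throughput for bitwise-reproducible outputs.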
Speculative cascades – A hybrid approach for smarter, faster LLM inference (research.google). Speculative cascades combine cascades and speculative decoding with a deferral rule to speed LLM inference and improve cost-quality trade-offs
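Stripped of the token-level machinery, a cascade's deferral rule is "answer cheaply when confident, escalate otherwise". A toy sketch of that rule (both models and the 0.6 threshold are invented for illustration; the paper's rule operates on draft tokens interleaved with speculative decoding, not on whole answers):

```python
def small_model(x):
    # Stand-in drafter: returns an answer plus a confidence score.
    conf = 0.9 if x < 100 else 0.4
    return ("even" if x % 2 == 0 else "odd", conf)

def large_model(x):
    # Stand-in verifier: slower but trusted.
    return "even" if x % 2 == 0 else "odd"

def answer(x, threshold=0.6):
    # Deferral rule: accept the cheap draft when confident, else escalate.
    y, conf = small_model(x)
    return y if conf >= threshold else large_model(x)

print(answer(10), answer(101))  # the drafter handles 10; 101 is deferred
```

The cost-quality trade-off lives entirely in the threshold: lower it and more queries stay on the cheap path.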
Paper Review: Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing (andlukyane.com). Decentralized RL post-training with SAPO sharing rollouts across a swarm for LM fine-tuning and reward-based learning
The Rise of Multimodal LLMs and Efficient Serving with vLLM (pyimagesearch.com). Multimodal LLMs (LLaVA, GPT-4V, BakLLaVA) and vLLM enable OpenAI-compatible vision-language inference and efficient deployment
Defeating Nondeterminism in LLM Inference – Thinking Machines Lab (jmason.ie). Defeating nondeterminism in LLM inference by examining sampling, temperature effects, and deterministic behavior across stacks and libraries
Not for the Faint-Hearted: Diving Deep into GPT-OSS (visokio.com). GPT-OSS 20B & 120B open-weight models tested across llama.cpp, vLLM, HuggingFace, and lmstudio from MacBooks to H100 GPUs in Omniscope workflows
🤖 Agentic Systems & RL: Frameworks, Evals, and Enterprise Patterns
Exploring Active Agent, or can we build AI features the Rails way? (evilmartians.com). Rails-style AI abstractions with Active Agent: agents, prompts, callbacks, templates, and battle-tested Rails examples
Lessons learned from 100 blog posts on AI (frontierai.substack.com). Big-picture AI trends: economics of inference, token costs vs. volume, open-loop agents, evals, data quality, context management, and UX in AI apps
Generalists Can Also Dig Deep (towardsdatascience.com). Generalist Ida Silfverskiöld on AI agents, RAG, evals, and design choices in agentic systems
Verlog: A Multi-turn RL framework for LLM agents (blog.ml.cmu.edu). Verlog introduces multi-turn RL for long-horizon LLM agents with turn-level abstraction, fixed-turn batching, dual discounting GAE, and critic pre-training
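For reference, standard single-discount generalized advantage estimation (GAE) looks like the sketch below; per the summary, Verlog's dual-discounting variant applies separate discount factors within and across turns, which this sketch does not attempt to reproduce (the reward and value numbers are made up):

```python
def gae(rewards, values, gamma=0.99, lam=0.95):
    # values carries one extra entry: the bootstrap value after the last step.
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # TD error at step t, then the exponentially weighted backward sum.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages

adv = gae([1.0, 0.0, 1.0], [0.5, 0.4, 0.6, 0.0])
print([round(a, 4) for a in adv])
```

The backward recursion is why long-horizon agents are sensitive to discounting choices: every step's advantage accumulates discounted TD errors from all later steps.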
Beyond the Chatbot: What Actually Works in Enterprise AI (thedataexchange.media). RAG systems evolution, evaluation as IP, embeddings, enterprise security, agent workflows, multi-modality, small models, and AI-enabled coding tools
🛠️ Applied LLMs: RAG, Data Pipelines, and AI in Science
Text analytics in Data Pipelines using AI (medium.com/@ed.bullen). Databricks AI Query workflows for ETL pipelines; using LLMs to classify, rate sentiment, and justify results on Amazon Reviews data
Single-cell analysis and infectious disease forecasting: Google's new AI scientist (blog.stephenturner.us). AI systems generate and test new methods for single-cell RNA-seq batch integration and COVID-19 forecasting, surpassing some benchmarks
Stumbling into AI: Part 3 – RAG (rmoff.net). Explains Retrieval-Augmented Generation (RAG) using embeddings, vector stores (ChromaDB), Ollama, and Llama models with Kafka release notes as an example
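The retrieval half of RAG can be shown with nothing but the standard library; here a bag-of-words count vector stands in for the dense embeddings that the post's ChromaDB/Ollama stack would produce, and the release-note snippets and query are invented examples:

```python
import math
from collections import Counter

def embed(text):
    # Toy "embedding": a bag-of-words vector. Real RAG stacks use learned
    # dense embeddings from a model, not word counts.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

# The "vector store": release-note snippets indexed by their embeddings.
docs = [
    "Kafka 3.7 adds JBOD support in KRaft mode",
    "The consumer rebalance protocol was reworked",
    "Tiered storage moved to early access",
]
index = [(d, embed(d)) for d in docs]

# Retrieval step: pick the snippet closest to the question; the generation
# step (handing it to the LLM as context) is omitted here.
query = embed("what changed in the rebalance protocol")
best = max(index, key=lambda pair: cosine(query, pair[1]))
print(best[0])
```

Swap `embed` for a real embedding model and `index` for a vector store and the shape of the pipeline is the same: embed, search, stuff the hit into the prompt.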
Benchmarking AI & ML on local CPU/GPUs: an end-to-end Python project (allaboutdata.substack.com). Benchmarking AI/ML on local CPU/GPU with Python: XGBoost, Ollama, CUDA, uv, Altair, Streamlit dashboard and Docker-free workflow
📄 Academic Research
Inpainting-Guided Policy Optimization for Diffusion Large Language Models (arxiv:cs). Inpainting-guided RL for diffusion LLMs improves exploration, using partial ground-truth reasoning to boost GRPO, with synthetic traces and entropy filtering
Can Understanding and Generation Truly Benefit Together -- or Just Coexist? (arxiv:cs). Unified multimodal learning: encoder-decoder paradigm with long-context captions, UAE framework, Unified-GRPO RL, and Unified-Bench benchmark
AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning (arxiv:cs). AgentGym-RL trains LLM agents for multi-turn decision making using RL, ScalingInter-RL for exploration-exploitation balance across diverse environments
Multipole Semantic Attention: A Fast Approximation of Softmax Attention for Pretraining (arxiv:cs). MuSe: efficient multipole-based attention for transformers via dual semantic clustering and dipole corrections
RewardDance: Reward Scaling in Visual Generation (arxiv:cs). RewardDance: scalable reward modeling for visual generation using yes-token probability, enabling large RMs and CoT integration
👋 Before you go
I've got a big favor to ask: keeping Blaze running isn't expensive, but it all adds up, so I'm asking readers like you to help if you can.
That's why I'm launching a Patreon page! Nothing flashy, just a way for folks who find value in these newsletters to chip in a little each month. In return, you'll get:
- Real say in how Blaze evolves: vote on new topics, features, and curation ideas
- First dibs on merch (details still cooking)
- That warm fuzzy feeling knowing you're supporting something that saves you time and keeps you plugged into great tech writing
If you're getting value from Blaze, checking this out would mean the world. And if you can't contribute, no worries; the newsletters keep coming either way, and you can follow along on Patreon for free.
Thanks for reading and being part of this nerdy corner of the internet. All the best - Alastair.
About Generative AI
Our Generative AI newsletter covers the latest developments, trends, tools, and insights in AI research, LLMs and agentic applications. Each week, we curate the most important content from over 50,000 blogs and news sites so you don't have to spend hours searching.
Whether you're a beginner or expert in generative AI, our newsletter provides valuable information to keep you informed and ahead of the curve in this rapidly evolving field.
Subscribe now to join thousands of professionals who receive our weekly updates!