Generative AI

Tuesday 25th March, 2025

Subscribe to this newsletter!

Newsletters sent once a week, unsubscribe anytime.

In the news

⚙️ LLM Internals & Development

PyTorch Internals: Ezyang's Blog (blog.ezyang.com, 2025-03-22). An overview of PyTorch internals, focusing on tensors, automatic differentiation, strides, and extension points to aid contributions to the complex C++ codebase
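
As a quick illustration of the strides concept the post covers, here is a minimal PyTorch snippet (mine, not from the article) showing how a transpose changes strides without copying data:

```python
import torch

# A contiguous 2x3 tensor: moving one row skips 3 elements, one column skips 1.
x = torch.arange(6, dtype=torch.float32).reshape(2, 3)
print(x.stride())         # (3, 1)

# Transposing just swaps the strides; the underlying storage is untouched.
y = x.t()
print(y.stride())         # (1, 3)
print(y.is_contiguous())  # False

# .contiguous() materializes a copy laid out to match the new logical shape.
z = y.contiguous()
print(z.stride())         # (2, 1)
```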

Inside ChatGPT: How AI Understands and Generates Language (louisbouchard.ai, 2025-03-22). Explains how ChatGPT works, covering next-token prediction, LLMs, neural networks, transformers, attention mechanisms, tokenization, embeddings, training phases, and methods like reinforcement learning for enhanced accuracy
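
To make the next-token-prediction idea concrete, here is a toy sketch (my own, with made-up logits standing in for a real model's output) of how raw scores become a sampled token:

```python
import torch

vocab = ["the", "cat", "sat", "on", "mat"]
# Pretend these are the model's raw scores (logits) for the next token.
logits = torch.tensor([2.0, 0.5, 1.5, 0.1, 0.3])

# Softmax turns logits into a probability distribution over the vocabulary.
probs = torch.softmax(logits, dim=-1)

# Sampling (rather than always taking the argmax) is what makes output varied.
next_id = torch.multinomial(probs, num_samples=1).item()
print(vocab[next_id], probs.tolist())
```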

Writing an LLM from scratch, part 10 -- dropout (gilesthomas.com, 2025-03-19). Dropout in LLM training forces knowledge to spread across the model rather than concentrate in a few weights; the post implements it with PyTorch's Dropout class, typically at a 10-15% rate on attention scores, to improve generalization
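
A minimal sketch of the idea, assuming the common pattern of applying nn.Dropout to attention weights after the softmax (details will differ from the series' own code):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.1)   # ~10% of attention weights zeroed during training

q = torch.randn(1, 4, 8)   # (batch, seq_len, head_dim)
k = torch.randn(1, 4, 8)
v = torch.randn(1, 4, 8)

scores = q @ k.transpose(-2, -1) / (8 ** 0.5)
weights = torch.softmax(scores, dim=-1)

drop.train()                # dropout is active only in training mode
weights = drop(weights)     # surviving weights are rescaled by 1/(1-p)
context = weights @ v
print(context.shape)        # torch.Size([1, 4, 8])
```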

Dynamic Token Merging for Efficient Byte-level Language Models with Julie Kallini - #724 (twimlai.com, 2025-03-24). Julie Kallini discusses her papers on MrT5's dynamic token merging, highlighting its efficiency in byte-level language modeling and solutions for under-resourced languages, alongside insights into impossible language model architectures

How I force LLMs to generate correct code (claudio.uk, 2025-03-20). Claudio Santini introduces Unvibe, a Python library that pairs unit tests with LLMs to generate valid code within complex codebases, optimizing for correctness via Monte Carlo Tree Search and fitting into existing developer workflows

The logit lens can be deceptive if not used properly (soniajoseph.ai, 2025-03-19). The logit lens, while a convenient tool for investigating neural network representations, can mislead researchers due to alignment issues with output space, emphasizing the importance of linear probes for accurate internal representation analysis
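
Roughly, the logit lens projects an intermediate hidden state through the model's own output head, while a trained linear probe learns its own projection. A schematic sketch with invented shapes (not the post's code):

```python
import torch
import torch.nn as nn

d_model, vocab = 64, 1000
hidden = torch.randn(d_model)             # hidden state at some middle layer

# Logit lens: reuse the model's unembedding matrix on an intermediate layer.
# This assumes intermediate representations already live in the output space --
# the alignment issue the post warns about.
W_unembed = torch.randn(vocab, d_model)
logit_lens_preds = W_unembed @ hidden

# Linear probe: a separate projection trained on (hidden state, label) pairs,
# so it can adapt to whatever basis the intermediate layer actually uses.
probe = nn.Linear(d_model, vocab)
probe_preds = probe(hidden)
print(logit_lens_preds.shape, probe_preds.shape)
```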

⚡ Performance & Optimization

Jagged Flash Attention Optimization (shaped.ai, 2025-03-18). Jagged Flash Attention combines jagged tensors and flash attention, achieving up to 9x speedup and 22% memory reduction in recommendation systems, utilizing TorchRec for efficient sparse data handling and dynamic tensor operations

Speculative Decoding - Deep Dive (rocm.blogs.amd.com, 2025-03-24). Performance improvements utilizing speculative decoding with Llama models on AMD MI300X GPUs, showing up to 2.31x speedup across various input sizes and datasets in LLM online serving
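
The accept/reject loop behind speculative decoding fits in a few lines; this toy version (token IDs and "models" are placeholders, not the AMD setup) just shows the draft-then-verify control flow:

```python
def speculative_step(draft_model, target_model, prefix, k=4):
    """Draft k tokens with a small model, then verify them against the large one."""
    ctx = list(prefix)
    drafts = []
    for _ in range(k):
        tok = draft_model(ctx)              # cheap model proposes tokens
        drafts.append(tok)
        ctx.append(tok)

    # Verification: in a real system the target model scores every drafted
    # position in a single forward pass; here we just compare token by token.
    accepted = []
    for tok in drafts:
        expected = target_model(list(prefix) + accepted)
        if expected != tok:
            accepted.append(expected)       # first mismatch: keep the target's token
            break
        accepted.append(tok)                # match: the drafted token is "free"
    return accepted

# Toy stand-ins: the target is ground truth, the draft is right most of the time.
target = lambda ctx: (sum(ctx) + 1) % 10
draft = lambda ctx: (sum(ctx) + 1) % 10 if len(ctx) != 5 else 3
print(speculative_step(draft, target, prefix=[1, 2, 3]))  # [7, 4, 8]: two drafts accepted
```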

How FlashMLA Cuts KV Cache Memory to 6.7% (louisbouchard.ai, 2025-03-19). FlashMLA by DeepSeek uses low-rank compression for the KV cache, shrinking it to 6.7% of its original size on Hopper GPUs and improving memory efficiency in large language models without sacrificing speed
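
The low-rank compression idea is roughly: cache a small latent per token instead of full keys and values, and expand on the fly. A shape-only sketch (dimensions invented for illustration, not DeepSeek's actual configuration):

```python
import torch
import torch.nn as nn

d_model, d_latent, seq_len = 1024, 64, 512          # illustrative sizes only

down = nn.Linear(d_model, d_latent, bias=False)     # compress per-token state
up_k = nn.Linear(d_latent, d_model, bias=False)     # re-expand to keys
up_v = nn.Linear(d_latent, d_model, bias=False)     # re-expand to values

h = torch.randn(seq_len, d_model)

# Cache only the latent: d_latent floats per token instead of 2 * d_model.
kv_cache = down(h)
k, v = up_k(kv_cache), up_v(kv_cache)

full = seq_len * 2 * d_model
compressed = seq_len * d_latent
print(f"cache size: {compressed / full:.1%} of the full KV cache")
```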

4 Learnings From Load Testing LLMs (blog.christianposta.com, 2025-03-19). Load testing LLMs reveals key concepts: using real prompts for accuracy, implementing a ramp-up period for realism, adjusting concurrency for GPU usage, and capturing performance metrics like Time To First Token (TTFT)
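
For TTFT in particular, the measurement is just a timestamp before the request and one at the first streamed chunk. A hedged sketch using the OpenAI-compatible streaming interface (endpoint and model name are placeholders, not the post's setup):

```python
import time
from openai import OpenAI  # assumes an OpenAI-compatible endpoint is running

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")  # placeholder

start = time.perf_counter()
stream = client.chat.completions.create(
    model="my-model",  # placeholder model name
    messages=[{"role": "user", "content": "Summarize RAG in one sentence."}],
    stream=True,
)

ttft = None
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if ttft is None:
            ttft = time.perf_counter() - start   # time to first token
total = time.perf_counter() - start

print(f"TTFT: {ttft:.3f}s, total latency: {total:.3f}s")
```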

🎨 LLM Applications

ByteCraft: Generating video games and animations through bytes (emygervais.github.io, 2025-03-19). ByteCraft utilizes a 7B parameter LLM trained to generate executable video game and animation files from text prompts, leveraging Byte-Pair Encoding for effective byte management and promising significant potential for future development

Qwen2.5-VL-32B: Smarter and Lighter (qwenlm.github.io, 2025-03-24). Qwen2.5-VL-32B is a 32B-parameter vision-language model refined with reinforcement learning. It outperforms comparable models on major multimodal benchmarks and shows strong pure-text capabilities as well

Enhancing Text-to-SQL With Synthetic Summaries (saeedesmaili.com, 2025-03-18). Using synthetic summaries and retrieval-augmented in-context learning, LLMs can efficiently enhance Text-to-SQL capabilities, enabling better understanding of database structures for generating SQL queries based on user questions
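
The retrieval-augmented prompting pattern described can be as simple as retrieving the most relevant synthetic table summaries and prepending them to the question; a schematic sketch (the summaries and retrieval function here are stand-ins, not the post's):

```python
# Hypothetical pre-generated summaries of each table, keyed by table name.
table_summaries = {
    "orders": "orders(id, customer_id, total, created_at): one row per purchase.",
    "customers": "customers(id, name, country): one row per registered customer.",
}

def retrieve(question: str, k: int = 2) -> list[str]:
    """Stand-in retriever: a real one would rank summaries by embedding similarity."""
    return list(table_summaries.values())[:k]

question = "Total revenue from German customers last month?"
context = "\n".join(retrieve(question))

prompt = (
    "You write SQLite queries.\n"
    f"Relevant tables:\n{context}\n\n"
    f"Question: {question}\nSQL:"
)
print(prompt)  # this prompt would then be sent to the LLM
```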

🏭 Industry & Production

Bridging the AI Agent Prototype-to-Production Chasm (thedataexchange.media, 2025-03-20). Ilan Kadar discusses IntellAgent, an open-source platform using synthetic data, knowledge graphs, and reinforcement learning to enhance AI agent deployment, ensuring reliable performance and user trust in high-stakes domains like customer service and finance

Vector Podcast: Adding ML layer to Search: Hybrid Search Optimizer (dmitry-kan.medium.com/vector-podcast-adding-ml-layer-to-search-hybrid-search-optimizer-1f15e43ecc81, 2025-03-21). Dmitry Kan discusses the resurgence of hybrid search in 2025, highlighting the role of machine learning in optimizing search parameters and balancing keyword and neural search techniques in collaboration with Daniel Wrigley and Eric Pugh

Building AI Agents with LLMs: The Future of Autonomous AI Systems (soulpageit.com, 2025-03-19). AI agents utilizing Large Language Models (LLMs) are enhancing business efficiency through intelligent automation, dynamic decision-making, and integration with APIs for real-time financial analysis and risk assessment

🖥 Deployment & Infrastructure

LLM Hardware Calculators (nextbigfuture.com, 2025-03-24). An LLM Hardware Calculator helps determine the required GPUs, memory, and specifications for running local LLM models, emphasizing the limitations of current hardware against desired Grok-level performance
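
The back-of-the-envelope arithmetic such calculators use is straightforward: weights take parameters × bytes per parameter, plus a KV cache that grows with context length. A rough sketch with illustrative numbers (layer and head sizes are assumptions, not a specific model's):

```python
def llm_memory_gb(params_b, bits=4, layers=32, kv_heads=8, head_dim=128,
                  context=8192, kv_bits=16):
    """Very rough VRAM estimate: weights + KV cache, ignoring activations/overhead."""
    weights = params_b * 1e9 * bits / 8                          # bytes for weights
    # KV cache: 2 (K and V) * layers * kv_heads * head_dim * context * bytes per value
    kv = 2 * layers * kv_heads * head_dim * context * kv_bits / 8
    return (weights + kv) / 1e9

# e.g. a 70B-parameter model quantized to 4 bits with an 8k context
print(f"{llm_memory_gb(70):.1f} GB")   # ~35 GB of weights plus ~1 GB of KV cache
```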

Thinking Different, Thinking Slowly: LLMs on a PowerPC Mac (theresistornetwork.com, 2025-03-24). The piece details running large language models (LLMs) like UllmLlama2 on PowerPC Macs, showcasing code improvements, callback mechanisms for output, and performance statistics for specific prompts

I Just Wanted Pretty Spans: A Rust OpenTelemetry Story (joshkasuboski.com, 2025-03-21). Josh Kasuboski explores integrating OpenTelemetry with Rust, specifically focusing on using spans for debugging LLMs and optimizing logging using the tracing crate, along with Jaeger for tracing visualization

🗣 Opinion & Commentary

My Thoughts on the Future of "AI" (simonwillison.net, 2025-03-19). Nicholas Carlini discusses the future potential of LLMs, predicting significant advancements or stagnation while acknowledging unknown limitations. The article highlights the R1 training method with DeepSeek v3 and its implications for cognitive tasks

How to Dismantle Knowledge of an Atomic Bomb (shkspr.mobi, 2025-03-21). Meta's legal struggles with AI training on pirated data highlight concerns about the proliferation of dangerous knowledge related to CBRNE materials, including atomic and biological weapon creation methods accessible through modern AI tools

LLMs, But Only Because Your Tech SUCKS (aartaka.me, 2025-03-23). Tech limitations drive reliance on LLMs. Embracing tools like Clojure, Lisp, Emacs, REPLs, and keyboard macros can automate tasks, reduce boilerplate, and improve documentation without needing AI-centric solutions

What do we mean when we talk about ‘openness’ in (generative) AI? (dougbelshaw.com, 2025-03-21). Doug Belshaw explores the nuances of 'openness' in generative AI, discussing the importance of model weights, the Open Weight Definition, and the costs associated with developing open-source AI tools

The future of AI is Ruby on Rails (seangoedecke.com, 2025-03-20). Large language models excel at generating code, but struggle with larger codebases. Ruby on Rails is posited as an ideal language for its brevity and elegance, catering to AI-assisted programming needs

AI isn’t always the answer: When Generative AI is not it (cevo.com.au, 2025-03-19). Generative AI excels in many areas but is often less suitable for tasks like time-series forecasting, fully autonomous agents, and high-stakes environments, where traditional machine learning models outperform in accuracy and reliability

💻 Developer Workflow

Are we there yet? (davidvujic.blogspot.com, 2025-03-23). Exploring interactive Python development through REPL Driven Development, integrating Jupyter kernels, and using AI for data generation to streamline workflows, enhancing the overall developer experience similar to Clojure's innovative tools

LLM Assisted Fuzzing (elijahpotter.dev, 2025-03-21). Elijah Potter explores 'LLM Assisted Fuzzing', utilizing Ollama for generating responses and Harper for identifying grammatical errors, aiming to reduce false positives in software testing through iterative LLM output correction

How scientists learn computing and use LLMs to program (wiki.alcidesfonseca.com, 2025-03-18). Scientists leverage LLMs to navigate programming languages like Python and R, focusing on practical scripts rather than maintainable software, potentially leading to undetected bugs that affect research results

🚀 RAG Tutorials

Build a custom RAG AI agent in TypeScript and Jupyter (deno.com, 2025-03-18). Learn to build a custom Retrieval-Augmented Generation (RAG) AI agent using TypeScript, Jupyter, Deno, Ollama, Deepseek R1, and Llama 3.2 for processing confidential documents effectively

Building a PDF RAG Chatbot: Langchain, OpenAI, PGVector, RediStore, and Streamlit (levelup.gitconnected.com, 2025-03-19). Build a PDF RAG chatbot using Langchain, OpenAI, PGVector, RediStore, and Streamlit to enhance document interactions and insights with real-time data processing

Local LLM with Retrieval-Augmented Generation (tonisagrista.com, 2025-03-24). Build a simple RAG application using a local LLM via Ollama, leveraging context information stored in a vector database to enhance chatbot capabilities with dynamic, custom datasets
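
The pattern in miniature: embed the documents, embed the query, pick the closest chunk, and stuff it into the prompt. A hedged sketch assuming the ollama Python client's embeddings() and chat() calls and locally pulled models (model names are placeholders; a real setup would use a proper vector database rather than an in-memory list):

```python
import ollama  # assumes a local Ollama server with the models pulled

docs = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support hours are 9am-5pm CET, Monday to Friday.",
]

def embed(text):
    return ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm

question = "When can I get a refund?"
q_emb = embed(question)
# "Vector database" stand-in: rank the documents by cosine similarity in memory.
best = max(docs, key=lambda d: cosine(q_emb, embed(d)))

answer = ollama.chat(
    model="llama3.2",  # placeholder local model
    messages=[{"role": "user", "content": f"Context: {best}\n\nQuestion: {question}"}],
)
print(answer["message"]["content"])
```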

How to Build an LLM Agent With AutoGen: Step-by-Step Guide (neptune.ai, 2025-03-20). LLM agents enhance reasoning and decision-making by integrating tools like RAG, memory, and APIs, utilizing Microsoft's AutoGen and Azure OpenAI for building conversational agents in Python

Step by Step RAG (tersesystems.com, 2025-03-24). Implementing retrieval augmented generation (RAG) for LLMs to enhance information accuracy, using tools like HayHooks, Tavily, and AWS documentation integration, alongside coding examples and pipeline setup for efficient data retrieval

Where Did Retrieval Augmented Generation Come From — And Where Is It Going? (medium.com/building-the-open-data-stack, 2025-03-19). Retrieval-augmented generation (RAG) combines generative models with external data retrieval, enhancing AI responses without training limitations. Key techniques include backpropagation and embeddings for optimizing relevance, cost, and performance in data systems

“Understanding Retrieval-Augmented Generation (RAG) and Vector Databases for Not-Quite Dummies” on the Pure AI Web Site (jamesmccaffrey.wordpress.com, 2025-03-20). RAG enhances AI-generated responses by integrating specific content from vector databases, enabling detailed answers tailored to user queries in natural language systems, exemplified through the Acme blood analysis machine scenario

🌐 Graph & RAG Techniques

Beyond Vectors - Knowledge Graphs & RAG Using GenAI (digitalocean.com, 2025-03-20). Learn to build a graph-based Retrieval-Augmented Generation (RAG) agent using Named Entity Recognition, Neo4j for data management, and an OpenAI-compatible API for structured and unstructured data integration

Graphiti: Knowledge Graph Memory for a Post-RAG Agentic World (medium.com/neo4j, 2025-03-24). Graphiti enhances AI's potential by offering a real-time, dynamic knowledge graph memory, utilizing Neo4j and features like a bi-temporal model and hybrid indexing for efficient data retrieval and management

Modeling Agent Memory (medium.com/neo4j, 2025-03-20). Explore how agent memory can be structured using graph databases like Neo4j, highlighting long-term and short-term memory, as well as types such as semantic, episodic, procedural, and temporal memory

Enhancing GenAI with GraphRAG: A Smarter Approach to Retrieval-Augmented Generation (soulpageit.com, 2025-03-20). GraphRAG enhances Generative AI by integrating knowledge graphs, improving retrieval accuracy, contextual understanding, and transparency, addressing limitations of traditional RAG models reliant on vector-based systems

📊 Evaluation & Metrics

The New Gold Standard in AI Evaluation: How “Agent-as-a-Judge” Changes Everything (levelup.gitconnected.com, 2025-03-20). The 'Agent-as-a-Judge' framework lets AI agents evaluate other agents, bringing a more nuanced understanding of decision-making to the assessment process, reaching 90% agreement with human reviewers while cutting evaluation time drastically

Misinformation in LLMs—Causes and Prevention Strategies (promptfoo.dev, 2025-03-19). Misinformation in LLMs can lead to security risks and legal issues, particularly in regulated sectors. Technical strategies include fine-tuning models, retrieval augmented generation, and using Promptfoo to assess factual consistency and perplexity

Beyond the Scoreboard: Rethinking AI Benchmarks for True Innovation (eliza-ng.me, 2025-03-23). Explores the limitations of ML benchmarks, emphasizing Goodhart's Law, biases in language models, and the need for diverse evaluations that prioritize reasoning and problem-solving over mere metric optimization

📚 Academic Insights

Understanding R1-Zero-Like Training: A Critical Perspective (github.com, 2025-03-22). Critical examination of R1-Zero-like training focusing on base models, reinforcement learning, and minimalist techniques using the Dr. GRPO algorithm for optimizing model performance on MATH level questions with specific frameworks

HELM Capabilities: Evaluating LMs Capability by Capability (crfm.stanford.edu, 2025-03-20). HELM Capabilities benchmarks language models via curated scenarios focusing on core abilities, utilizing techniques like Chain-of-Thought prompting and postprocessing approaches for accurate assessments of performance across various tasks

The Helix and the Bit: How Unlikely Ideas Forged Today’s Large Language Models (LLMs) (medium.com/intuitionmachine, 2025-03-22). The mid-20th century's collision of DNA, digital computation, cybernetics, and information theory shaped today's Large Language Models, intertwining life’s code with machines for adaptive intelligence, echoing feedback loops and algorithms

The First LLM (thundergolfer.com, 2025-03-23). Exploring the evolution of language models, the article discusses GPT-1, ULMFiT, and their contributions to AI. It defines LLMs and their architecture, highlighting self-supervised learning and the transition to multimodal systems

Research spotlight: is long chain-of-thought structure all that matters when it comes to LLM reasoning distillation? (snorkel.ai, 2025-03-19). Research explores the significance of long chain-of-thought structures in reasoning distillation for LLMs, demonstrating that structural integrity is more crucial than accuracy of individual reasoning steps

Paper Review: RWKV-7 Goose with Expressive Dynamic State Evolution (andlukyane.com, 2025-03-24). RWKV-7 Goose introduces a sequence modeling architecture with a generalized delta rule, achieving multilingual state-of-the-art performance using fewer training tokens and emphasizing efficient state evolution through vector gating and adaptive learning rates

Multimodal Transformers: AI Foundation Models, Part 1 (blogs.sas.com, 2025-03-21). Explore the unique capabilities of multimodal transformers, including self-attention, sequence prediction, and zero-shot learning. Understand their significance in various data types, such as text, images, and audio, and foundation models

DAPO: Enhancing GRPO For LLM Reinforcement Learning? (aipapersacademy.com, 2025-03-21). DAPO enhances GRPO for LLM reinforcement learning, employing techniques like Clip-Higher and dynamic sampling to outperform DeepSeek-R1, achieving 50 points on the AIME 2024 benchmark with optimized training methods

📑 Academic Papers

ATTENTION2D: Communication Efficient Distributed Self-Attention Mechanism (arxiv:cs, 2025-03-20). ATTENTION2D enhances self-attention in transformer models by enabling efficient parallelism across two dimensions, achieving up to 5x and 9.4x performance boosts on GPU clusters without additional computational overhead

A Review on Large Language Models for Visual Analytics (arxiv:cs, 2025-03-19). Comprehensive review on Large Language Models and visual analytics, evaluating tools like LIDA, Chat2VIS, and ChartLlama; exploring strengths, weaknesses, opportunities, and threats in enhancing data interpretation and visualization techniques

Learning on LLM Output Signatures for gray-box LLM Behavior Analysis (arxiv:cs, 2025-03-18). Proposes leveraging LLM Output Signatures (LOS) through a transformer-based approach for advanced gray-box analysis, improving detection of hallucinations and data contamination, and outperforming existing baselines across datasets and models

SuperARC: A Test for General and Super Intelligence Based on First Principles of Recursion Theory and Algorithmic Probability (arxiv:math, 2025-03-20). A test based on algorithmic probability evaluates AGI and ASI claims, revealing LLMs' limitations and inconsistent progress while outperforming LLMs through a hybrid neurosymbolic approach grounded in Kolmogorov complexity and optimal Bayesian inference

Survey on Evaluation of LLM-based Agents (arxiv:cs, 2025-03-20). A survey analyzing evaluation methodologies for LLM-based agents, focusing on capabilities like planning and memory, and benchmarks in various applications while identifying trends and gaps in cost-efficiency, safety, and robustness

XAttention: Block Sparse Attention with Antidiagonal Scoring (arxiv:cs, 2025-03-20). XAttention introduces a framework enhancing long-context Transformer models' efficiency by using antidiagonal scoring for block importance, achieving up to 13.5x acceleration in attention computation while maintaining accuracy on various benchmarks

LLM Braces: Straightening Out LLM Predictions with Relevant Sub-Updates (arxiv:cs, 2025-03-20). LLMBRACES optimizes sub-update contributions in Transformer-based LLMs by computing relevance scores in FFN layers, enabling controlled generation and sentiment modulation while outperforming baseline methods with fewer tunable parameters

Don't miss next week's newsletter!

Newsletters sent once a week, unsubscribe anytime.