Generative AI: 2nd September 2025

Published 2nd September 2025

📣 Headlines

• Anthropic settled a class-action copyright lawsuit with authors over Claude training data, potentially avoiding billions in damages and reshaping AI liability for training on copyrighted content.

• Security researchers warn that "vibe-hacking" has become a top AI threat, with hackers using Anthropic's AI to write malicious code for cyber-attacks and Grok providing dangerous instructions including assassination plots.

• AI agents remain "science fiction not yet ready for primetime" despite early deployment efforts, though enterprise platforms are showing agents are here to stay across customer support, cybersecurity, and workflow automation.

• Meta is burning billions chasing AI superintelligence while facing brain drain at its Superintelligence Labs and cracks forming in its partnership with Scale AI over data quality concerns.

• Medical AI systems can go "dangerously haywire" from simple typos in patient records, while US Attorneys General warn AI companies about child safety failures following reports of harmful chatbot interactions.

• Enterprise AI faces a 95% failure rate prompting new solutions, while Google strikes demand-response deals with utilities to manage AI's massive energy consumption at data centers.

• OpenAI plans a 1GW data center in India as part of its Stargate expansion, highlighting the global infrastructure race for AI computing power.

• More than 10 European startups became unicorns this year, including AI companies like Lovable and Parloa, signaling strong growth in European AI innovation.

🔧 Company Engineering Blogs

Moving ahead faster with fallbacks (booking.ai). Fallbacks in ranking service enable fast experimentation, reliability, and ML-induced innovation without outages

Breaking AI Testing Barriers: Dynamic Assertions and AI Automation Deliver 1000%+ Productivity Gains (engineering.salesforce.com). AI quality testing, dynamic semantic assertions, AI evaluation pipelines, N-minus-one validation, LiveKit audio virtualization, prompt engineering, Gemini LLM prompts, Agentforce/Prompt Builder/Data Cloud integration

Engineering stories behind the Medium Daily Digest Algorithm: Part 1 (medium.engineering). How Apple Mail Privacy Protection and filtering adjustments boosted digest quality and engagement through adjusted filtering rules and A/B testing

How Google’s AI can help transform health professions education (research.google). Generative AI for medical education, LearnLM and Gemini for Learning, UX co-design, LearnLM 2.5 Pro, qualitative/quantitative studies, AI tutor for clinical reasoning

Simplifying Large-Scale LLM Processing across Instacart with Maple (tech.instacart.com). Maple: Instacart’s batch LLM processing service for scalable, cost-efficient, auditable prompts across catalogs, fulfillment, and search

📈 Industry Trends, Evaluation & Development

LLM System Design and Model Selection (oreilly.com). Choosing the right LLM amid rising costs, open vs closed models, inference scaling, and practical design steps

Model inference, model products, and AI applications (frontierai.substack.com). OpenAI GPT-5 adoption, inference vs. product, open-weight models, OpenAI’s UX, reasoning models, and decentralized inference implications

Re:BEAM #4 - A New Direction (notyourlanguage.com). Re:BEAM #4 explores Python integration with BEAM-like ergonomics, Elixir, Pythonx, NIFs, actor model, and inter-process message passing

LLM Validation Test: Output Similarity (barnesanalytics.com). LLM validation via 15-generation embeddings, cosine similarity, Jaccard tokens, prompts vs. unrelated prompts, CG plotting, ECDF, and statistical comparisons

Import AI 427: ByteDance’s scaling software; vending machine safety; testing for emotional attachment with Intima (jack-clark.net). ByteDance HeteroScale for LLMs over 10,000 GPUs; vending machine safety tests; INTIMA companionship benchmark; AI alignment via meditation-inspired approaches

I was wrong about tidymodels and LLMs (simonpcouch.com). Databot and Predictive: tidymodels usage, run_r_code, run_experiment, evaluative findings, and model performance across Claude Sonnet 4 and Gemini Pro 2.5

🤖 AI Agents & Enterprise Applications

Thinking About Thinking (funcall.blogspot.com). Thinking about thinking: LLM ‘thinking’ mode, Cursor token billing, agent mode prompts, newer models vs. older, enhanced outputs

How executives select GenAI vendors (shekhargulati.com). Memory-centric GenAI product design; executive vendor selection; memory architectures; MSFT talks; LLMs; enterprise GenAI products

Roadmap: Developer Tooling for Software 3.0 (nextbigteng.substack.com). From Hello, World to Hello, AI: AI-native development, memory management, MCP, agent experience, and prompts-as-programs shaping Software 3.0

Building Agents for Small Language Models: A Deep Dive into Lightweight AI (msuiche.com). Lightweight open-source AI agents for 270M-32B parameter LLMs; externalized logic, minimal context, robust safety, batch optimization, multi-layer safety, micro-agent prompting, tool-calling with structured outputs, UTF-8 streaming safety, and model-specific config for Gemma, Qwen, TinyLlama

⚡ Deployment, Infrastructure & Optimization

Deploying DeepSeek on 96 H100 GPUs (lmsys.org). Deploying DeepSeek with PD Disaggregation and Large-Scale Expert Parallelism on 96 H100 GPUs

P (tonybaloney.github.io). Benchmarking LLMs for latency, throughput, and prompts using llm CLI, llm-profile, graph outputs, and embedding benchmarks

The Parallelism Mesh Zoo (blog.ezyang.com). Overview of device mesh concepts and parallelism strategies: DP, FSDP, HSDP, TP, SP, Ulysses, CP, PP, EP across multi-dimensional device meshes

Draft - Efficient RL Training - Optimizing Weight Sync in slime (hebiao064.github.io). Weight synchronization in slime: CUDA IPC, asynchronous tensor gathering, tensor bucketing, SGLang server calls, and 120s→7s optimizations for RL training with Megatron, PPO/GRPO

quantisation basics (aarnphm.xyz). Quantization, uniform/non-uniform, MSQE; KV cache pruning; KV quantization (KVQuant, SKVQ, KIVI, AdaKV, PyramidKV); multi-head attention, per-token KV; RoPE conflicts; DeepSeek KV compression; two-batch overlap (TBO); RDMA/NIXL; KV-aware routing; prefill/decode timing

🔧 RAG Systems & Context Engineering

Context Engineering Series: Building Better Agentic RAG Systems (jxnl.co). Context engineering for agentic RAG systems: tool portfolios, slash commands, AGENT.md patterns, persistent agent loops with SearchTool, vector_search, and multi-turn tool use

23 RAG Pitfalls and How to Fix Them (nb-data.com). RAG pitfalls: data chunking, outdated knowledge, embedding choices, metadata use, retrieval mix, prompt design, multi-part queries, evaluation, latency, and attribution safeguards

Why Embedding Models Cannot Scale to All Retrieval Tasks, A Comprehensive Analysis of LLM-based Reranking Methods, and More! (recsys.substack.com). Theoretical limits of embedding-based retrieval, random truncation resilience, ultra-long attention, conditional retrieval, and LLM-based reranking

Agentic RAG and Context Engineering for Agents (vincirufus.com). Agentic RAG, retrieval agent networks, dynamic context management, hierarchical context, adaptive summarization, multi-modal integration, feedback-driven refinement, use cases in research, software development, and customer support

🧠 LLM Architecture & Fundamentals

From Multi-Head to Latent Attention: The Evolution of Attention Mechanisms (vinithavn.medium.com). Overview of attention mechanisms (MHA, MQA, GQA, MHLA) and their memory and speed trade-offs for LLMs

A Brief History of GPT Through Papers (towardsdatascience.com). Transformer origins, attention mechanism, GPT-1 to GPT-3, RLHF, tool use, and ChatGPT milestones in translation, scaling, and few-shot learning

Positional Embeddings in Transformers: A Math Guide to RoPE & ALiBi (towardsdatascience.com). Mathematical deep dive into RoPE, ALiBi, APE: sinusoidal embeddings, rotary position embeddings, PyTorch code, TinyStories experiments

What AI chatbots are actually doing under the hood (gilesthomas.com). Plain-English overview of LLMs, GPT/Transformers, tokenizers, tokens, logits, next-token prediction, transcripts, pre-training, and chat-based prompting mechanics

Writing an LLM from scratch, part 19 -- wrapping up Chapter 4 (gilesthomas.com). Wrapping up Chapter 4: coding a GPT-like LLM from scratch, with gpt.py, shortcut connections, and a roadmap to training

🎨 Diffusion Models & Multimodal AI

Paper Review: Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning (andlukyane.com). Pref-GRPO uses pairwise preferences, with UniGenBench and Flow Matching GRPO, to stabilize T2I RL and curb reward hacking

Reading list: diffusion models (konradb.substack.com). Reading list on diffusion models: tutorials, papers, and implementations—DDPM, VAEs, score-based models, Stable Diffusion, Diffusers, InST, cross-attention, and diffusion courses

Diffusion models: the intro (konradb.substack.com). Diffusion models explained: forward/reverse noising, DDPM, latent space, U-Net, attention, CLIP/text encoder, cross-attention, schedulers, image/video/3D/audio diffusion, inpainting/outpainting, editing via inversion, efficiency & future directions

📚 Academic Research

rStar2-Agent: Agentic Reasoning Technical Report (arxiv:cs). Microsoft Research's 14B parameter model achieves frontier-level math reasoning performance using agentic reinforcement learning, demonstrating advanced cognitive behaviors like code reflection and autonomous problem-solving. This represents a major breakthrough in combining RL with reasoning capabilities at relatively small scale

Mixture of Contexts for Long Video Generation (arxiv:cs). Stanford and ByteDance introduce a learnable sparse attention mechanism that enables practical long-context video generation by treating it as an information retrieval problem. This breakthrough addresses the fundamental memory and computational challenges that have limited video generation to short clips

One More Glance with Sharp Eyes: Rethinking Lightweight Captioning as a Practical Visual Specialist (arxiv:cs). CMU and KAIST researchers demonstrate that a 125M parameter model (56x smaller than LLaMA-7B) can achieve performance comparable to large multimodal models on captioning tasks. This work provides a pathway for deploying capable vision-language models on resource-constrained devices

Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning (arxiv:cs). Researchers from Munich, Cambridge, and HKU develop an RL framework enabling LLMs to actively manage external memory through specialized agents that learn when to store, update, or retrieve information. This addresses LLMs' fundamental stateless limitation and enables long-horizon reasoning with minimal supervision

Igniting Creative Writing in Small Language Models: LLM-as-a-Judge versus Multi-Agent Refined Rewards (arxiv:cs). HKU and Huawei introduce a rigorous benchmark exposing critical limitations in current multimodal browsing agents, where even the strongest model (o3) achieves only 36% accuracy. This benchmark highlights the gap between current capabilities and practical web navigation requirements

Youtu-GraphRAG: Vertically Unified Agents for Graph Retrieval-Augmented Complex Reasoning (arxiv:cs). Tencent's unified GraphRAG framework achieves up to 90.71% token cost savings while improving accuracy by 16.62% over state-of-the-art baselines. This addresses domain transfer challenges and knowledge organization in retrieval-augmented generation systems

R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning (arxiv:cs). Tencent develops an auto-thinking multimodal LLM that adaptively decides when complex reasoning is needed based on problem difficulty, achieving comparable performance to larger models with lower computational cost. This addresses the inefficiency of always using step-by-step thinking for simple problems

👋 Before you go

I've got a big favor to ask - keeping Blaze running isn't expensive, but it does all add up, so I'm asking readers like you to help, if you can.
That's why I'm launching a Patreon page! Nothing flashy, just a way for folks who find value in these newsletters to chip in a little each month. In return, you'll get:

• Real say in how Blaze evolves: vote on new topics, features, and topic curation ideas
• First dibs on merch (details still cooking)
• That warm fuzzy feeling knowing you're supporting something that saves you time and keeps you plugged into great tech writing

If you are getting value from Blaze, checking this out would mean the world. And if you can't contribute, no worries: the newsletters keep coming either way, and you can follow along on Patreon for free.
Thanks for reading and being part of this nerdy corner of the internet. All the best - Alastair.

About Generative AI

Our Generative AI newsletter covers the latest developments, trends, tools, and insights in AI research, LLMs and agentic applications. Each week, we curate the most important content from over 50,000 blogs and news sites so you don't have to spend hours searching.

Whether you're a beginner or expert in generative AI, our newsletter provides valuable information to keep you informed and ahead of the curve in this rapidly evolving field.