Generative AI: 21st October 2025
Published 21st October 2025
📣 Headlines
• AI compute build-out accelerated: a Microsoft/Nvidia-backed consortium agreed to acquire Aligned Data Centers for $40B, the UK's Nscale will deploy ~200,000 Nvidia GB300 GPUs for Microsoft via a deal with Dell (Nscale deal), and Google pledged $15B for a 1-GW AI hub and subsea-linked data center in India (Google India).
• IBM deepened its enterprise AI stack, partnering with Groq to bring LPU-accelerated inference into watsonx (IBM–Groq), while showcasing an Anthropic tie-up, Project Bob, and Infragraph at TechXchange 2025 (IBM TechXchange).
• Data plumbing for AI advanced: Informatica added MCP (Model Context Protocol) support, no-code connectors, and MDM on OCI to streamline model context and governance (Informatica–Oracle), and NetApp launched an AI Data Engine with a disaggregated AFX appliance for resilient, simplified AI data flows (NetApp AI Data Engine).
• Web infrastructure pushed back on AI scraping: Cloudflare introduced a Content Signals Policy that extends robots.txt controls to influence Google's AI Overviews and RAG consumption across millions of sites (Cloudflare vs AI Overviews).
• Anthropic released Claude Haiku 4.5 as a cheaper, faster variant alongside a Deloitte enterprise deal, with the company projecting tripled revenues by 2026 (Anthropic Haiku 4.5).
• AI funding stayed hot but divisive: active investors drove an AI-centric quarter with mega-rounds like Anthropic's $13B Series F (Active investors), even as VCs debated whether current valuations reflect an AI bubble.
• Copyright and provenance stayed in the spotlight: Vermillio claimed to trace "neural fingerprints" to quantify copyrighted image usage in AI outputs (Vermillio analysis), while tests showed it is still easy to generate protected characters in ChatGPT, raising rights concerns around video models (Copyright challenges).
• An AWS outage disrupted major apps and reignited questions about cloud redundancy, multi-region design, and resilience practices (AWS outage).
🔧 Company Engineering Blogs
How a Gemma model helped discover a new potential cancer therapy pathway (deepmind.google). C2S-Scale 27B uses Gemma models for single-cell language understanding to identify a CK2 inhibitor–interferon synergy boosting antigen presentation
How Meta Is Leveraging AI To Improve the Quality of Scope 3 Emission Estimates for IT Hardware (engineering.fb.com). Meta uses AI, NLP, LLMs (Llama 3.1) and PCFs to estimate IT hardware Scope 3 emissions, building a component-level taxonomy for data centers
Building Data Cloud’s New Unstructured Data Governance: Automated PII Detection at Enterprise Scale (engineering.salesforce.com). Automated PII detection across hundreds of gigabytes of unstructured data using Spark pipelines, Microsoft Presidio integration, and policy-driven masking
Google Cloud C4 Brings a 70% TCO improvement on GPT OSS with Intel and Hugging Face (huggingface.co). Google Cloud C4 with Intel Xeon Granite Rapids improves GPT OSS MoE inference throughput, delivering 1.7x TCO savings over C3 via Intel–Hugging Face optimizations
Using AI to identify genetic variants in tumors with DeepSomatic (research.google). DeepSomatic uses CNNs to identify somatic variants in tumor genomes across Illumina, PacBio, and ONT data with CASTLE training data
🌐 Open Models & Community
The State of Open Models (interconnects.ai). Reflections on a year in open models, covering DeepSeek, Llama’s fade, Qwen, GPT-OSS, and next steps
AI (jankythoughts.com). Discusses AI, ML, LLMs, AGI terminology, hallucinations, meta-model concepts, and investment angles on Nvidia and LLM startups
The Compounding Stack ↗️ (aspiringforintelligence.substack.com). Micro improvements in data, model logic, and UX form a defensible, compounding stack above foundation models
Why Large Language Models Won’t Replace Engineers Anytime Soon (fastcode.io). LLMs mimic text patterns, not real engineering causality; RL, RLHF limits, gradient descent, and human judgment protect engineering work
We are in the "gentleman scientist" era of AI research (seangoedecke.com). Gentleman scientist era of AI: amateurs vs professionals, GRPO, RL baselines, LLM tools, and informal research
🤖 Agents & Tooling
An agent-coded search reranker (softwaredoug.com). Agent-guided code generation to build a generalized search reranker using search_esci, guardrails, and iterative patching for improved NDCG
Building an AI Agent for Order Management Systems (OMS) in Finance – A Step-by-Step Guide (blog.devgenius.io). Step-by-step guide to building an AI agent for OMS with FIX parsing, structured JSON IO, ReAct/LangChain, memory (RAG), multi-agent design, and UI dashboards
How to Train your Chatbot - Chapter Zero (blog.apiad.net). Building an autonomous LLM agent from scratch using ARGO, BeaverDB, and Streamlit, step by step
How RecSys & LLMs Will Converge: Architecture of Hybrid RecoAgents (medium.com/criteo-engineering). Hybrid Reco Agents combine scalable RecSys retrieval with LLM reasoning and explanations for trustworthy recommendations
From Monolithic Prompts to LangChain Agents: A Practical Migration Guide (pedroalonso.net). Migrates from a single monolithic prompt to a LangChain agent with tools, state, and human-in-the-loop patterns
📚 Retrieval & RAG
Production RAG: what I learned from processing 5M+ documents (blog.abdellatif.io). Open-source RAG stack lessons from processing 5M+ docs with Langchain, LlamaIndex, Usul AI, and a legal enterprise dataset
How to Evaluate Retrieval Quality in RAG Pipelines: Precision@k, Recall@k, and F1@k (towardsdatascience.com). Precision@k, Recall@k, and F1@k for evaluating retrieval quality in RAG pipelines using FAISS, OpenAI embeddings, CrossEncoder reranking, and Ground Truth chunks
“RAG (Retrieval-Augmented Generation) for Not-Quite-Dummies” on the Pure AI Web Site (jamesmccaffreyblog.com). Overview of RAG with vector stores, embeddings, and LLM integration, plus a demo and security notes
Go-Specific Code Assistant (blog.arcbjorn.com). Go-focused fine-tuning of Qwen2.5-Coder-7B-Instruct with Go code curation, LoRA, compiler feedback, and RAG deployment
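The retrieval metrics covered above (Precision@k, Recall@k, F1@k) reduce to a few lines of code once you have a ranked list of retrieved chunk IDs and a ground-truth relevant set. A minimal sketch, with invented chunk IDs for illustration:

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved chunks that are relevant."""
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / k

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant chunks that appear in the top-k."""
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / len(relevant)

def f1_at_k(retrieved, relevant, k):
    """Harmonic mean of Precision@k and Recall@k."""
    p = precision_at_k(retrieved, relevant, k)
    r = recall_at_k(retrieved, relevant, k)
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)

# Example: 5 retrieved chunk IDs, 3 ground-truth relevant chunks
retrieved = ["c1", "c7", "c3", "c9", "c2"]
relevant = {"c1", "c2", "c4"}
print(precision_at_k(retrieved, relevant, 5))  # → 0.4 (2 of 5 relevant)
print(recall_at_k(retrieved, relevant, 5))     # ≈ 0.667 (2 of 3 found)
```

In a real pipeline the `retrieved` list would come from a vector search (e.g. FAISS over OpenAI embeddings) and `relevant` from labeled ground-truth chunks, as the towardsdatascience.com piece describes.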
⚡ Inference & Training
NVIDIA DGX Spark + Apple Mac Studio = 4x Faster LLM Inference with EXO 1.0 (simonwillison.net). EXO 1.0 pairs an NVIDIA DGX Spark (prefill) with an M3 Ultra Mac Studio (decode), streaming the KV cache over 10Gb Ethernet for ~4x faster Llama-3.1 8B inference
Alibaba Cloud says it cut Nvidia AI GPU use by 82% with new pooling system (tomshardware.com). Alibaba Cloud’s Aegaeon pooling cuts Nvidia GPU use by 82%, enabling up to 9x output with 213 GPUs for multiple LLMs
I blew through 24 million tokens in a day (blog.kronis.dev). Explores using Cerebras for massive token processing, AI tool cost, rate limits, and implications on coding workflows and energy use
How to scale RL (interconnects.ai). RL scaling laws for LLMs: predicting final performance from early RL runs, with ScaleRL, TIS, CISPO, GSPO, PipeLine RL, and continuous batching
Why do the AI/LLM folks hate people that run Macs so much? (markjgsmith.com). GPU-accelerated local LLMs on Mac, PyTorch vs TensorFlow, Vulkan backends, and containerized AI workflows
Writing an LLM from scratch, part 22 -- finally training our LLM! (gilesthomas.com). Training an LLM from scratch, comparing GPT-2 weights, using AdamW, temperature and top-k sampling, and cost considerations
supplement to 0.430 (aarnphm.xyz). DeepSeek MLA: multi-head latent attention, KV compression, and on-device serving with vLLM and RoPE-enhanced queries
modded-nanogpt medium world record: Re-using intermediate activations in the output latents (snimu.github.io). Modded-nanogpt medium record reuses layer-11 activations in output latents with learned weights, exploring backout hypothesis and multi-layer skip experiments
🧪 Evaluation & Debugging
Tracking Down Mysterious ML Training Stalls (medium.com/@Pinterest_Engineering). Pinterest's ML training stalls traced to torch.compile interactions and a Ray monitoring task, resolved by removing a psutil memory_full_info call
Why AI Systems Can’t Catch Their Own Mistakes – And What to Do About It (novaspivack.com). Conformal prediction, thermometer calibration, multi-agent evaluation, CRITIC tool-critique, and human-in-the-loop for reliable AI deployment
Evaluating Long Context (Reasoning) Ability (nrehiew.github.io). LongCodeEdit and PSRBench benchmarks critique long-context reasoning; introduces LongCodeEdit-Hard for code repair tasks
Recursive Language Models (alexzhang13.github.io). Recursive Language Models (RLMs) decompose long contexts via a Python REPL environment, enabling recursive LM calls to process unbounded input and output lengths
🧠 Research & Architectures
BERT Is Just a Single Text Diffusion Step (nathan.rs). RoBERTa diffusion: finetuning RoBERTa for text generation via discrete diffusion with variable masking, diffusion_collator, and iterative denoising steps
The case for the return of fine-tuning (welovesota.com). Fine-tuning resurges with LoRA, Tinker, PEFT, and open-weight ecosystems enabling modular, controlled AI with personal hardware touches
Faísca: The modern LLM stack in a single script (duarteocarmo.com). Faísca: a 1000-line uv-runner implementing pre-training, SFT, and RL for a 13M-parameter Portuguese headlines LLM
O(N) the Money: Scaling Vulnerability Research with LLMs (noperator.dev). Scaling vulnerability research with LLMs via Raink and Slice for target selection using listwise ranking and convergence
Exploring the Potential of Large Language Models in Generating Saturated DAGs for Causal Inference (lucymcgowan.com). LLMs explored for generating saturated DAGs to map all causal pathways and identify backdoor paths
Three AI customisation concepts (anna.kiwi). Three AI customization concepts explained via embeddings, RAG vs fine-tuning, and LoRA with developer and non-technical metaphors
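The iterative denoising loop behind the BERT-as-text-diffusion item above is easy to sketch: start from an all-[MASK] sequence and unmask a growing fraction of positions each step, re-predicting the rest. A minimal sketch with a stand-in `predict_fn` (a real version would call a RoBERTa fill-mask head; the schedule and names here are illustrative):

```python
import random

def diffusion_generate(length, predict_fn, steps=4, seed=0):
    """Discrete text diffusion, sketched: begin fully masked and
    unmask a fraction of the remaining [MASK] tokens each step."""
    rng = random.Random(seed)
    tokens = ["[MASK]"] * length
    for step in range(steps, 0, -1):
        masked = [i for i, t in enumerate(tokens) if t == "[MASK]"]
        if not masked:
            break
        # Unmask 1/step of what remains, so the final step clears everything
        n_unmask = max(1, len(masked) // step)
        for i in rng.sample(masked, n_unmask):
            tokens[i] = predict_fn(tokens, i)  # fill-mask prediction at position i
    return tokens
```

With `predict_fn` wired to a masked-LM trained under variable masking rates (as the nathan.rs post describes), this loop turns a single BERT-style denoising step into a multi-step generator.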
📚 Academic Research
Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization (arxiv:cs). Reveals a 'preplan-and-anchor' attention rhythm in LLMs and defines metrics to identify key tokens. Proposes targeted RL credit-assignment to boost reasoning performance and interpretability
Glyph: Scaling Context Windows via Visual-Text Compression (arxiv:cs). Glyph renders long text as images processed by VLMs to compress contexts ~3–4× and enable extreme, million-token workloads. Delivers large speedups in prefill/decoding and document understanding
xLLM Technical Report (arxiv:cs). xLLM is an enterprise-grade inference framework optimizing scheduling, KV cache, and accelerator utilization to yield ~1.7–2.2× throughput gains. Practical for large-scale LLM/MLLM production deployment
Vision-Centric Activation and Coordination for Multimodal Large Language Models (arxiv:cs). VaCo activates and coordinates vision foundation models via Modular Task Queries and Visual Alignment Layers, aligning task-aware visual signals with MLLM training for stronger visual comprehension
MCA: Modality Composition Awareness for Robust Composed Multimodal Retrieval (arxiv:cs). Introduces modality composition awareness to make composed multimodal retrieval (queries that combine image and text) more robust to how the modalities are mixed
👋 Before you go
Blaze newsletters will soon be moving to Substack as the main email delivery service, primarily to streamline subscription management, email delivery, and newsletter archives. Nothing changes for you: the newsletters remain completely free, and you will be able to subscribe and unsubscribe just as easily as before.
Blaze's sister site https://blognerd.app, a search engine for blogs and posts, has had a major makeover, and is a good place to search for smart, independent writing.
Finally, if you get value from this newsletter, please consider supporting me by joining the Patreon at patreon.com/blazeemail. Becoming a patron helps cover my costs and keeps Blaze going so everyone can enjoy the newsletters for free.
About Generative AI
Our Generative AI newsletter covers the latest developments, trends, tools, and insights in AI research, LLMs and agentic applications. Each week, we curate the most important content from over 50,000 blogs and news sites so you don't have to spend hours searching.
Whether you're a beginner or expert in generative AI, our newsletter provides valuable information to keep you informed and ahead of the curve in this rapidly evolving field.
Subscribe now to join thousands of professionals who receive our weekly updates!