Generative AI: 14th October 2025

Published 14th October 2025

📣 Headlines

Anthropic unveiled Petri, an autonomous-agent safety tool that audits 14 leading LLMs for behaviors like deception and power-seeking, providing reproducible evaluations and risk benchmarks.

Samsung introduced a 7M-parameter Tiny Recursive Model that leverages recursive reasoning and adaptive halting to outperform much larger models on puzzles like Sudoku, Maze-Hard, and ARC-AGI.

Google committed $15B to an AI infrastructure hub in India, building a 1-GW data center in Visakhapatnam with subsea cables and TPU capacity to expand regional AI compute.

California passed SB 243, imposing safety requirements on AI chatbots (e.g., suicide-prevention referrals, age checks) and penalties for profiting from deepfakes targeting minors.

Informatica expanded Oracle Cloud integrations with Model Context Protocol (MCP) support, no-code connectors, MDM on OCI, and Dedicated Region Cloud@Customer to streamline enterprise AI data pipelines.

NetApp launched an AI Data Engine and AFX disaggregated appliance to orchestrate high-throughput AI data pipelines with ransomware resilience, isolated recovery, and simplified data movement.

The UK CMA gave Google “strategic market status” for search and ads, opening the door to mandated choice screens and controls over AI-generated responses in search results.

Wiley launched an AI-native research platform that unifies access to peer-reviewed content and integrates assistants like Claude, Le Chat, and Perplexity on AWS for AI-powered literature workflows.

🔧 Company Engineering Blogs

About Palantir (blog​.palantir​.com). Palantir explains its approach to data ownership, privacy-by-design, governance, and ethics in AI, with context on its ICE contract and European data sovereignty

Introducing the Gemini 2.5 Computer Use model (deepmind​.google). Gemini 2.5 Computer Use model enables UI-interacting agents via Gemini API with low latency for web and mobile tasks

From Single-Node to Multi-GPU Clusters: How Discord Made Distributed Compute Easy for ML Engineers (discord​.com). Discord details building a Ray-based ML platform with CLI, Dagster + KubeRay orchestration, and X-Ray observability for multi-GPU training

Engineering Real-Time Multimodal AI Pipelines: Scaling File Processing to 50M Daily Uploads (engineering​.salesforce​.com). Real-time multimodal AI pipelines for 50M daily uploads: file processing, validation, base64 grounding, and cross-platform prompts

How to build reliable AI workflows with agentic primitives and context engineering (github​.blog). Three-layer agentic framework using Markdown prompts, agentic primitives, and context engineering to build reliable AI workflows with Copilot CLI and APM

🗺️ Ecosystem & Strategy

The Infrastructure for Production AI (thedataexchange​.media). Zhen Lu discusses AI-first clouds, production use cases, GPU reliability, and agent-driven software at The Data Exchange

AI Apps -> Agent Labs (akashbajwa​.co). Agent Labs vs Model Labs: product-first AI apps, RL incentives, data moat, and developers' shift toward vertical integration

dead framework theory (aifoc​.us). Explores how React dominates as platform, LLMs and training data create a dead framework effect, and implications for new frameworks, tools, and browser features

🧭 Agentic & Applied AI

CS Researchers at SOSP 2025 (cs​.columbia​.edu). DAPLab at Columbia presents SOSP 2025 workshops on agentic infrastructure, self-tuning kernels, and safe LLM agent operation

Exploring the Potential of Large Language Models in Generating Saturated DAGs for Causal Inference (lucymcgowan​.com). LLMs explored for generating saturated DAGs to map all causal pathways and identify backdoor paths

I'm Writing a Book on Production-Grade Agentic AI (And You Can Read It Now) (aroussi​.com). Explores production-grade agentic AI, memory management, orchestration, observability, and deployment patterns with LeanPub chapter-by-chapter release

Book Review: Time Series Forecasting using Foundation Models (sujitpal​.blogspot​.com). Book review surveys seven Foundation Models for time series forecasting, with zero-shot, fine-tuning, probabilistic forecasts, anomaly detection, and a capstone project

🧪 Open-source LLM Stacks

nanochat (simonwillison​.net). nanochat offers a full ChatGPT-style LLM pipeline: training, inference, and a web UI in a small PyTorch codebase, CUDA-focused with CPU fallbacks

Running Llama 3.1 8B Locally (LangChain and SQLite) (confessionsofadataguy​.com). Local Llama 3.1 8B with Ollama, LangChain, SQLite; Python uv toolchain; RAG indexing with FAISS; terminal chatbot on a laptop
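As a sketch of how the SQLite piece of such a local chatbot might look (the table name and schema here are my assumptions, not the author's actual code), a minimal chat-history store:

```python
import sqlite3

# Minimal chat-history store for a terminal chatbot; an in-memory DB is used
# here, but a file path would persist the conversation across sessions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, role TEXT, content TEXT)"
)

def add_message(role, content):
    conn.execute("INSERT INTO messages (role, content) VALUES (?, ?)", (role, content))
    conn.commit()

def history(limit=10):
    """Return the most recent turns oldest-first, ready to prepend to a prompt."""
    rows = conn.execute(
        "SELECT role, content FROM messages ORDER BY id DESC LIMIT ?", (limit,)
    ).fetchall()
    return rows[::-1]

add_message("user", "What is RAG?")
add_message("assistant", "Retrieval-augmented generation.")
print(history())
```

The `limit` keeps the prompt context bounded, which matters when the history is fed back into a local 8B model with a finite context window.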

R port of llama2.c (thierrymoudiki​.github​.io). R port of llama2.c with Shiny app, installation steps, and API access for educational use

Cross Talk (joecooper​.me). Markov text generation, DeBERTa-based reranking, and OCR-like text sorting for multiturn chat on a 3090, using OpenSubtitles data and bespoke quality-control models
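The Markov text generation mentioned here can be sketched in a few lines (word-level, order-1; the post's actual pipeline and quality-control models are not shown):

```python
import random
from collections import defaultdict

def build_chain(text, order=1):
    """Map each n-gram of words to the list of words that follow it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return chain

def generate(chain, seed, length=10, rng=None):
    """Walk the chain from a seed tuple, sampling successors at random."""
    rng = rng or random.Random(0)
    out = list(seed)
    for _ in range(length):
        successors = chain.get(tuple(out[-len(seed):]))
        if not successors:
            break
        out.append(rng.choice(successors))
    return " ".join(out)

corpus = "the cat sat on the mat and the cat ran"
chain = build_chain(corpus)
print(generate(chain, ("the",), length=5))
```

Duplicates in the successor lists encode the transition probabilities, so frequent continuations are sampled proportionally more often.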

The Best Choice for AI Inference: vLLM (terrytangyuan​.github​.io). vLLM enables open-source, memory-efficient LLM inference with PagedAttention KV-cache management and multi-parallelism; llm-d orchestrates distribution on OpenShift AI
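The bookkeeping behind PagedAttention can be illustrated with a toy block-table allocator (sizes and structure simplified; this is not vLLM's implementation):

```python
BLOCK_SIZE = 16  # tokens per physical block; vLLM commonly defaults to 16

class PagedKVCache:
    """Toy block-table allocator illustrating the PagedAttention idea:
    logical token positions map to fixed-size physical blocks, so a sequence
    grows on demand instead of reserving max-length contiguous memory."""
    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))   # pool of physical block ids
        self.block_tables = {}                # seq_id -> list of block ids
        self.lengths = {}                     # seq_id -> tokens written

    def append_token(self, seq_id):
        table = self.block_tables.setdefault(seq_id, [])
        n = self.lengths.get(seq_id, 0)
        if n % BLOCK_SIZE == 0:               # current block full: grab a new one
            table.append(self.free.pop())
        self.lengths[seq_id] = n + 1

    def physical_slot(self, seq_id, pos):
        """Translate a logical token position to (block_id, offset)."""
        table = self.block_tables[seq_id]
        return table[pos // BLOCK_SIZE], pos % BLOCK_SIZE

cache = PagedKVCache(num_blocks=8)
for _ in range(20):
    cache.append_token("seq0")
print(cache.physical_slot("seq0", 17))
```

Because blocks are allocated per token batch rather than per maximum sequence length, fragmentation drops and many more sequences fit in the same GPU memory.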

modded-nanogpt world record: Decoupling embedding size from model dimension (snimu​.github​.io). Modded-NanoGPT uses multiple input embeddings with learned layer-wise weights to decouple embedding size from model dimension
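A rough numpy sketch of the idea as summarized (sizes and the exact mixing scheme are my assumptions, not the record-setting code):

```python
import numpy as np

# Several input embedding tables, mixed with learned per-layer weights, so the
# total embedding capacity is decoupled from the model dimension d_model.
vocab, d_model, n_embeds, n_layers = 100, 32, 4, 6
rng = np.random.default_rng(0)
tables = rng.standard_normal((n_embeds, vocab, d_model)) * 0.02
layer_logits = rng.standard_normal((n_layers, n_embeds))   # learned mixing weights

def mixed_embedding(token_ids, layer):
    """Per-layer convex combination of the embedding tables."""
    w = np.exp(layer_logits[layer])
    w /= w.sum()
    # tables[:, token_ids, :] has shape (n_embeds, seq, d_model); sum over tables
    return np.einsum("e,esd->sd", w, tables[:, token_ids, :])

x = mixed_embedding(np.array([1, 5, 7]), layer=0)
```

Each layer can weight the tables differently, giving the network several "views" of the vocabulary without widening the residual stream.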

📏 Evaluation & Benchmarks

Who watches the watchers? LLM on LLM evaluations (stackoverflow​.blog). LLMs judge LLM outputs at scale using golden datasets, teacher models, and ProLLM; StackOverflow data informs evaluation benchmarks

Importance of offline evaluation to guide model choice (tech​.olx​.com). OLX compares open embedding models with internal Item2Vec using MTEB benchmarks, fine-tuning, and offline evaluation for multilingual recall

Inspect AI (alexdong​.com). Inspect AI: exploring a Petri Alignment plugin, Inspect AI scaffolding, and extending evaluation workflows with typed, well-documented code

Comparison: Qwen3:30b vs GPT-OSS:20b (glukhov​.org). Tech benchmark comparison of Qwen3:30b, Qwen3:30b-instruct, Qwen3:30b-thinking vs GPT-OSS:20b across speed, context windows, and token benchmarks

Debugging DSPy token usage and prompts (danielcorin​.com). Debugging DSPy token usage, prompts and LM configurations across Gemini, GPT-5, and OpenAI APIs

🔎 RAG Optimization

Meta Superintelligence's surprising first paper (paddedinputs​.substack​.com). MSI's REFRAG enables 30x faster TTFT in RAG by using chunk embeddings and a lightweight RL policy to expand select chunks

Why using a reranker? (zansara​.dev). RAG with bi-encoders and cross-encoders, reranking strategies, distillation, late interaction (ColBERT), listwise reranking, caching, and hybrid architectures
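The retrieve-then-rerank pattern the post discusses can be sketched with stand-in scorers (a bag-of-words bi-encoder and a toy joint scorer in place of real models):

```python
from collections import Counter
from math import sqrt

def embed(text):
    """Stand-in bi-encoder: a bag-of-words vector (real systems use a dense model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cross_score(query, doc):
    """Stand-in cross-encoder: scores query and document *jointly* (here a toy
    overlap-plus-phrase bonus; a real reranker runs both texts through one model)."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    phrase_bonus = 1 if query.lower() in doc.lower() else 0
    return len(q & d) + phrase_bonus

def retrieve_then_rerank(query, docs, k=2):
    qv = embed(query)
    # Stage 1: cheap bi-encoder similarity over the whole corpus
    candidates = sorted(docs, key=lambda d: cosine(qv, embed(d)), reverse=True)[:k]
    # Stage 2: expensive joint scoring only over the top-k candidates
    return max(candidates, key=lambda d: cross_score(query, d))

docs = [
    "the quick brown fox",
    "quick sort algorithm explained",
    "brown bears in the forest",
]
print(retrieve_then_rerank("quick sort", docs))
```

The two-stage shape is the point: the bi-encoder keeps retrieval cheap at corpus scale, while the cross-encoder spends its quadratic attention budget only on a handful of candidates.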

Why did Meta’s superintelligence team publish an obscure paper? (tornikeo​.com). Meta's MSI publishes REFRAG, a fast retrieval-augmented generation method that speeds RAG 30x without accuracy loss for business-scale document search

What Problem Is Traditional RAG Solving? (gojiberries​.io). Traditional RAG uses pre-chunked text and embedding-based search for fast, small-evidence reasoning on uniform, time-neutral corpora

🧠 Long-Context & Attention

From 2K to 2M+ Tokens: The Long-Context Frontiers of GenAI (medium​.datadriveninvestor​.com). Long-context LLMs, Lost-in-the-Middle, RAG, position engineering, prompt compression, LIFT, and agentic RAG for reliable reasoning

Recurrence and Attention for Long-Context Transformers with Jacob Buckman - #750 (twimlai​.com). Long-context transformers with Jacob Buckman; windowed attention, grouped query attention, latent space attention, Power Retention, and Vidrial/PowerCoder open-source projects

KV Cache Optimization via Multi-Head Latent Attention (pyimagesearch​.com). KV Cache optimization with Multi-Head Latent Attention (MLA) reduces KV cache memory in transformers for long-context inference
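A minimal numpy illustration of the latent-KV idea (dimensions are hypothetical, and real MLA includes details such as RoPE handling that are omitted here):

```python
import numpy as np

# Instead of caching full per-head K and V, cache one shared low-rank latent
# per token and up-project it to K and V at attention time.
d_model, n_heads, d_head, d_latent, seq_len = 512, 8, 64, 64, 1024

rng = np.random.default_rng(0)
W_down = rng.standard_normal((d_model, d_latent)) * 0.02            # hidden -> latent
W_up_k = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02   # latent -> K
W_up_v = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02   # latent -> V

h = rng.standard_normal((seq_len, d_model))   # hidden states for cached tokens
latent = h @ W_down                           # this is all we cache: (seq, d_latent)
K = latent @ W_up_k                           # reconstructed on the fly
V = latent @ W_up_v

full_cache = 2 * seq_len * n_heads * d_head   # floats for a standard K+V cache
mla_cache = seq_len * d_latent                # floats for the latent cache
print(f"cache reduction: {full_cache / mla_cache:.0f}x")  # prints "cache reduction: 16x"
```

The trade is extra matmuls at decode time for a much smaller cache, which is usually a win when long-context inference is memory-bound.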

🧬 Model Internals & Training

Replacing RL w/ Parameter-based Evolutionary Strategies (lesswrong​.com). Parameter-based evolutionary strategies (ES) scale to billion-parameter models for fine-tuning LLMs, using distributional weight perturbations and reward normalization
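The core ES loop described, distributional weight perturbations plus reward normalization, can be sketched on a toy objective (the quadratic stands in for an LLM reward; hyperparameters are illustrative):

```python
import numpy as np

def evolution_strategies(reward_fn, theta, sigma=0.1, lr=0.02, pop=50, iters=300, seed=0):
    """ES sketch: sample weight perturbations, normalize rewards, and move the
    parameters along the reward-weighted perturbation direction. No backprop
    through the model is needed, which is the appeal for fine-tuning LLMs."""
    rng = np.random.default_rng(seed)
    for _ in range(iters):
        eps = rng.standard_normal((pop, theta.size))                 # perturbations
        rewards = np.array([reward_fn(theta + sigma * e) for e in eps])
        adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)    # reward normalization
        theta = theta + lr / (pop * sigma) * eps.T @ adv             # ES gradient estimate
    return theta

# Toy objective: maximize -||theta - target||^2
target = np.array([1.0, -2.0, 0.5])
theta = evolution_strategies(lambda t: -np.sum((t - target) ** 2), np.zeros(3))
```

Because only reward scalars flow back, the population evaluations are embarrassingly parallel, which is what lets the approach scale to billion-parameter models.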

LLM Poisoning [1/3] - Reading the Transformer's Thoughts (synacktiv​.com). Explores Transformer internals, FFN key–value memory, trigger detection in pre-down MLP activations, and causal tracing for hidden knowledge in LLMs

Revisiting Karpathy’s 'The Unreasonable Effectiveness of Recurrent Neural Networks' (gilesthomas​.com). Explores Karpathy's 2015 RNN post, contrasts vanilla RNNs with LLMs, discusses byte-level inputs, training via truncated BPTT, and PyTorch vs Lua Torch implementations

📚 Academic Research

Scaling Language-Centric Omnimodal Representation Learning (arxiv:cs). Language-centric omnimodal embedding with LCO-Embedding; cross-modal alignment via generative pretraining and GRSL scaling revealed

To Sink or Not to Sink: Visual Information Pathways in Large Vision-Language Models (arxiv:cs). Analyzes ViT attention sinks to reveal high-norm visual tokens guiding LLM reasoning in LVLMs and proposes training-free and training-based utilization methods

Tiny-R1V: Lightweight Multimodal Unified Reasoning Model via Model Merging (arxiv:cs). Tiny-R1V: a 3B lightweight multimodal model using LIPO reinforcement learning and AMM model merging for unified reasoning across tasks

Spotlight on Token Perception for Multimodal Reinforcement Learning (arxiv:cs). Visually-Perceptive Policy Optimization (VPPO) reweights and focuses updates on tokens with high visual dependency for multimodal RLVR in LVLMs

ASPO: Asymmetric Importance Sampling Policy Optimization (arxiv:cs). ASPO corrects importance sampling in OSRL for LLMs by flipping IS ratios of positive-advantage tokens and introducing soft dual-clipping

👋 Before you go

Blaze newsletters will soon move to Substack as the main email delivery service. This is primarily to streamline subscription management, email sending, and newsletter archives. There will be no change to your newsletters: they will remain completely free, and you will be able to subscribe and unsubscribe just as easily as before.

Blaze's sister site https://blognerd.app, a search engine for blogs and posts, has had a major makeover, and is a good place to search for smart, independent writing.

Finally, if you get value from this newsletter, please consider supporting me on Patreon at patreon.com/blazeemail. Becoming a patron helps me cover my costs and keep Blaze going so everyone can enjoy the newsletters for free.

About Generative AI

Our Generative AI newsletter covers the latest developments, trends, tools, and insights in AI research, LLMs and agentic applications. Each week, we curate the most important content from over 50,000 blogs and news sites so you don't have to spend hours searching.

Whether you're a beginner or expert in generative AI, our newsletter provides valuable information to keep you informed and ahead of the curve in this rapidly evolving field.

Subscribe now to join thousands of professionals who receive our weekly updates!