Generative AI: 30th September 2025
📣 Headlines
• Google DeepMind unveiled VLA robotics stacks that fuse perception, language and action with web search and cross-robot knowledge transfer via Gemini Robotics 1.5 and ER 1.5, as investors tout a coming wave of Physical AI spanning autonomous agents and embodied systems.
• Google expanded its cheaper AI Plus plan to 40+ countries with Gemini 2.5 Pro and creative tools, and partners like Qualcomm say they're "incredibly excited" about desktop Android and a unified cross-device AI stack.
• Developers are already shipping on-device experiences using Apple's local Foundation Models on iOS 26, powering apps from image generation to personal finance without cloud calls.
• Opera launched its agentic Neon browser with repeatable Tasks, Neon Do, and code-aware prompts, with an early-access build that emphasizes on-device privacy and card-based workflows.
• OpenAI published a matrix of job-level capabilities, claiming ChatGPT can already perform parts of 44 occupations via its GDPval task list and evaluations.
• AI’s copyright and creative-use tensions sharpened as an Anthropic lyric-lawsuit settlement appeared to stall, spotlighting fair-use and damages math, while startups pushed deeper into Hollywood with generative tools and partnerships.
• AI for software assurance accelerated as Aikido acquired Allseek and Haicker to deliver sub-hour autonomous pen tests, and Greptile raised $25M to take on CodeRabbit and Graphite in AI code validation.
• AI’s climate footprint took center stage, with MIT experts detailing ways to cut operational and embodied carbon in data centers, and Climate Trace using AI to map PM2.5 emissions from 660M sources with city-scale plume visuals.
🔧 Company Engineering Blogs
How Palantir’s Strategic Privacy Investments Enable Future Customer Success (blog​.palantir​.com). Strategic privacy tools like Checkpoints enable governance, e-signature compliance, and human-in-the-loop AI oversight across Foundry and AIP
Gemini Robotics 1.5 brings AI agents into the physical world (deepmind​.google). Gemini Robotics-ER 1.5 enables agentic, multi-step embodied reasoning for physical tasks with tools and safety controls
AssetGen: Generating 3D Worlds With AI (engineering​.fb​.com). Meta reveals AssetGen: a foundation model for AI-driven 3D assets and world generation for VR, with LLMs playing a key role
How AI Tools Cut Customer Escalation Time: From Days of Manual Work to Minutes (engineering​.salesforce​.com). AI-powered agents and automation cut manual data collection, enable pattern-based problem solving, and accelerate escalations for Customer Centric Engineering at Salesforce
GitHub Copilot gets smarter at finding your code: Inside our new embedding model (github​.blog). Copilot adopts a new embedding model for code and documentation to improve code search in VS Code, with faster retrieval and a smaller memory footprint
🔍 Evaluation, safety, and privacy
Membership Privacy Risks in LLMs (brave​.com). CAMIA (Context-Aware Membership Inference Attack) analyzes token-level uncertainty to reveal LLM memorization risks
HELM Long Context (crfm​.stanford​.edu). HELM Long Context leaderboard evaluates modern LLMs on long-context tasks using curated benchmarks with hundreds of thousands of tokens
Comparative Analysis of Black Box Methods for Detecting Evaluation Awareness in LLMs (lesswrong​.com). Systematic comparison of black-box methods for detecting evaluation awareness in LLMs; introduces two new methods and a taxonomy of cues
#521: Red Teaming LLMs and GenAI with PyRIT (talkpython​.fm). Red Teaming LLMs with PyRIT: attacks, defenses, indirect prompt injection, and end-to-end testing in Microsoft’s AI Red Team
🧩 RAG and agentic workflows
RAG Explained: Reranking for Better Answers (towardsdatascience​.com). RAG workflow: two-stage retrieval using cosine similarity and cross-encoder reranking to surface relevant War and Peace chunks for LLM answers
How To Improve Your LLM Question-Answering System (Increasing Levels Of Complexity) (eamag​.me). Steps from OpenAI API use to local RL-tuned Qwen 3, DSPy prompt evolution, and Unsloth fine-tuning for QA on CURE-Bench
Probing LLMs' Knowledge Boundary: Adaptive RAG, Part 3 (blog​.reachsumit​.com). Adaptive RAG: confidence-based, consistency-based, and internal-state methods for deciding when LLMs need external retrieval
How to Build and Optimize AI Research Agents (thedataexchange​.media). Deep research with multi-agent AI, prompt evolution, GEPA, RAG limits, and enterprise foundations and evaluation
A Paradigm Shift: Reasoning at Enterprise Scale (nuit-blanche​.blogspot​.com). Reasoning-first retrieval stack for enterprise-scale documents using late-interaction models, PyLate, ModernBERT, FastPlaid, and PyLate-rs in browser-ready RAG workflows
A Simple Example of RAG (Retrieval-Augmented Generation) Using the OpenAI Responses API (jamesmccaffreyblog​.com). A practical, code-driven walkthrough of RAG using OpenAI Responses API and a demo vector store with MERCURY data
Implementing a fully local AI coding agent is hard (svana​.name). Local embeddings with nomic-embed-text-v1.5, ONNX Runtime, and choices between Mistral, OpenAI, or self-hosted LLMs for FileChat's privacy-focused coding assistant
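For readers new to the two-stage retrieval pattern mentioned in the reranking article above, here is a minimal sketch of the idea. It is not taken from any of the linked posts. The bag-of-words `embed` function and the `toy_rerank_score` function are hypothetical stand-ins for a real embedding model and a real cross-encoder, and the sample chunks are invented for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (assumption: stands in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def toy_rerank_score(query, chunk):
    """Hypothetical stand-in for a cross-encoder: share of distinct query terms found in the chunk."""
    q_terms = set(query.lower().split())
    c_terms = set(chunk.lower().split())
    return len(q_terms & c_terms) / len(q_terms) if q_terms else 0.0


def retrieve(query, chunks, first_stage_k=3, final_k=1):
    """Stage 1: cheap cosine top-k over all chunks. Stage 2: rescore only those candidates."""
    q_vec = embed(query)
    candidates = sorted(chunks, key=lambda c: cosine(q_vec, embed(c)), reverse=True)[:first_stage_k]
    reranked = sorted(candidates, key=lambda c: toy_rerank_score(query, c), reverse=True)
    return reranked[:final_k]


# Invented sample chunks, for illustration only.
chunks = [
    "the army marched to the battle",
    "the guests danced at the ball",
    "the battle of the river began at dawn",
    "the heir received a fortune",
]
top = retrieve("battle of the river", chunks)
```

The design point is that the expensive scorer only sees the handful of candidates that survive the cheap first stage, so its cost does not grow with the size of the corpus.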
🚀 Inference, kernels, and GPUs
We reverse-engineered Flash Attention 4 (modal​.com). Reverse-engineering Flash Attention 4: asynchronous pipelines, warp specialization, and faster softmax on Blackwell GPUs
Meta’s Infrastructure Evolution and the Advent of AI (engineering​.fb​.com). Meta describes infrastructure evolution for AI, GPU clusters, MTIA silicon, 129k H100s, memory disaggregation, and open standards
Compiling Python to Run Anywhere (blog​.codingconfessions​.com). Guest post by Yusuf Olokoba and Abhinav exploring ahead-of-time Python compilation to C++, AST tracing, type propagation, AI-assisted codegen, and runtime performance optimization across ARM, Apple Silicon, and WebAssembly
Deploying DeepSeek on GB200 NVL72 with PD and Large Scale EP (Part II): 3.8x Prefill, 4.8x Decode Throughput (lmsys​.org). DeepSeek on GB200 NVL72: FP8 attention, NVFP4 MoE, large-scale EP, prefill/decode disaggregation, 3.8x/4.8x speedups
SGLang Day 0 Support for DeepSeek-V3.2 with Sparse Attention (lmsys​.org). SGLang adds Day 0 DeepSeek-V3.2 sparse attention support with DSAs, multi-TPU/AMD/NPU deployment options and 128K context efficiency
Serverless Inference with Together AI (debuggercafe​.com). Serverless inference for text, vision, and chat with Together AI using Gradio UI and Python scripts
🧠 Reasoning, RL, and methods
Why We Think (lilianweng​.github​.io). Thinking time, CoT, latent variables, RLHF, ReAct, external tools, PRM, MCTS, RL for reasoning, and test-time computation in LLMs
Game over for pure LLMs. Even Turing Award Winner Rich Sutton has gotten off the bus. (garymarcus​.substack​.com). Rich Sutton and other AI thinkers pivot from pure scaling to world models, neurosymbolic ideas, and mixed approaches
ShinkaEvolve: Evolving New Algorithms with LLMs, Orders of Magnitude More Efficiently (sakana​.ai). ShinkaEvolve uses LLMs for sample-efficient evolutionary programming across circle packing, AIME math, ALE-Bench, and MoE training design
REINFORCE: Easy Online RL for LLMs (cameronrwolfe​.substack​.com). Easy online RL for LLMs using REINFORCE and RLOO as simpler alternatives to PPO for online policy optimization in RLHF/RLVR contexts
supplement for supplement for 0.412 (aarnphm​.xyz). Decoder-only transformers dissected: embeddings, attention, residuals, gating, MOE, and Pareto-sized design considerations
Thoughts on Richard Sutton’s interview on Dwarkesh Podcast (alexdong​.com). Reflections on Sutton's RL perspectives, LLMs as world models, ground truth, knowledge transfer, and leveraging priors for efficient learning
Is OpenAI's Reinforcement Fine-Tuning (RFT) Worth It? (tensorzero​.com). RFT versus SFT: cost, performance, and real-world tasks for data extraction, agentic coding, and customer service
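As a pointer for the REINFORCE/RLOO entry above, the leave-one-out baseline can be sketched in a few lines. This is a minimal illustration of the general idea, not code from the linked article: for several sampled completions of the same prompt, each completion's advantage is its reward minus the mean reward of the other samples. The function name `rloo_advantages` and the example rewards are assumptions made for this sketch.

```python
def rloo_advantages(rewards):
    """Leave-one-out advantages for K sampled completions of a single prompt.

    Each sample's baseline is the mean reward of the other K-1 samples.
    """
    k = len(rewards)
    if k < 2:
        raise ValueError("leave-one-out needs at least two samples per prompt")
    total = sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]


# Hypothetical rewards for four completions of one prompt (e.g. 1.0 = verified correct).
advantages = rloo_advantages([1.0, 0.0, 0.0, 1.0])
```

In a policy-gradient update, these advantages would weight the log-probability of each sampled completion; no learned value model is needed, which is the simplification relative to PPO that the entry refers to.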
📚 Academic Research
Where MLLMs Attend and What They Rely On: Explaining Autoregressive Token Generation (arxiv:cs). EAGLE: a lightweight black-box framework for attributing autoregressive token generation in multimodal LLMs to perceptual regions and language priors
Explaining multimodal LLMs via intra-modal token interactions (arxiv:cs). Enhancing multimodal LLM interpretability with Multi-Scale Explanation Aggregation for vision and Activation Ranking Correlation for text
UniMIC: Token-Based Multimodal Interactive Coding for Human-AI Collaboration (arxiv:cs). UniMIC: token-based multimodal interactive coding for edge-cloud AI collaboration with low-bitrate transmission
Uni-X: Mitigating Modality Conflict with a Two-End-Separated Architecture for Unified Multimodal Models (arxiv:cs). Uni-X uses a two-end-separated architecture with middle-shared layers to mitigate modality conflicts in unified multimodal models
Lavida-O: Elastic Large Masked Diffusion Models for Unified Multimodal Understanding and Generation (arxiv:cs). Elastic-MoT unified diffusion model for multimodal understanding, image editing, and high-resolution text-to-image generation
👋 Before you go
I've got a big favor to ask - keeping Blaze running isn't expensive, but it does all add up, so I'm asking readers like you to help, if you can.
That's why I'm launching a Patreon page! Nothing flashy, just a way for folks who find value in these newsletters to chip in a little each month. In return, you'll get:
- Real say in how Blaze evolves — vote on new topics, features, topic curation ideas
- First dibs on merch (details still cooking)
- That warm fuzzy feeling knowing you're supporting something that saves you time and keeps you plugged into great tech writing
If you are getting value from Blaze, checking this out would mean the world. And if you can't contribute, no worries: the newsletters keep coming either way, and you can follow along on Patreon for free.
Thanks for reading and being part of this nerdy corner of the internet. All the best - Alastair.
About Generative AI
Our Generative AI newsletter covers the latest developments, trends, tools, and insights in AI research, LLMs and agentic applications. Each week, we curate the most important content from over 50,000 blogs and news sites so you don't have to spend hours searching.
Whether you're a beginner or expert in generative AI, our newsletter provides valuable information to keep you informed and ahead of the curve in this rapidly evolving field.
Subscribe now to join thousands of professionals who receive our weekly updates!