🧠

Generative AI: 9th September 2025

Published 9th September 2025

📣 Headlines

• The DuckDuckGo subscription now offers GPT‑4o/GPT‑5, Claude Sonnet 4 and Llama Maverick access for $9.99/month, giving users direct access to multiple cutting‑edge LLMs.

• OpenAI acquired experimentation platform Statsig for $1.1B and named its CEO CTO of Applications, signaling tighter A/B testing and real‑time decisioning integration across OpenAI products.

• Companies are advancing AI doppelgängers and video avatars to scale personal knowledge and meetings, while Synthesia's Express‑2 avatars mimic real speakers and could enable real‑time interactivity, pushing lifelike agents into sales, coaching and media.

• New research and reporting highlight risks: chatbots and AI companions can be manipulative and raise fairness concerns; experts warn of mental‑health harms linked to AI use; and legal scrutiny is following, with calls for better parental controls and concerns about therapists secretly using ChatGPT. Safety researchers also sounded alarms about broader systemic risks (https://www.theguardian.com/technology/2025/sep/08/chatbots-mental-health-warning-super-intelligent-ai-nate-soares).

• Security analysts warn that AI‑driven development could make subtle backdoors in open‑source projects harder to detect, prompting calls for stronger maintainer support and supply‑chain defenses.

• Firefox Nightly added Microsoft Copilot to its sidebar, bringing voice, image and document analysis modes into the browser for users and developers to test.

• VCs are pouring hundreds of millions into AI‑powered customer service startups, accelerating automation of support workflows and the deployment of agentic AI in CX.

• Researchers are weighing the pros and cons of synthetic data for privacy, bias mitigation and model testing, noting toolchains like Synthetic Data Vault and tradeoffs around validation and realism.

🔧 Company Engineering Blogs

Using AI to perceive the universe in greater depth (deepmind.google). Deep Loop Shaping uses reinforcement learning with frequency-domain rewards to reduce control noise in LIGO’s mirror systems, improving gravitational-wave measurement

A New Ranking Framework for Better Notification Quality on Instagram (engineering.fb.com). Diversity-aware notification ranking using multiplicative demotion, MMR-based similarity across content, author, type, and product surface, with adjustable weights and potential for LLM integration

Building Sustainable Enterprise AI Adoption: Cultural Strategies That Achieved 95% Developer Engagement (engineering.salesforce.com). Salesforce shares how to scale AI adoption beyond code generation, tackling monolithic codebases, modular loading, and enterprise-wide cultural change

Spec-driven development with AI: Get started with a new open source toolkit (github.blog). Spec Kit enables spec-driven development with GitHub Copilot, Claude Code, and Gemini CLI to turn specs into executable artifacts

Welcome EmbeddingGemma, Google's new efficient embedding model (huggingface.co). EmbeddingGemma: Google's 308M multilingual on-device text embeddings, MMTEB/MMTEB v2 benchmarks, MRL truncation, 2K context, on‑device RAG, Sentence Transformers, LangChain, LlamaIndex, Haystack, txtai, TEI, ONNX, FAISS
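
For readers unfamiliar with the truncation mentioned in the EmbeddingGemma item: embeddings trained with Matryoshka Representation Learning are generally shortened by keeping the leading components and rescaling to unit length. The sketch below illustrates that idea only. The function name truncate_embedding and the toy vector are assumptions made up for illustration, not output from EmbeddingGemma or any library's API.

```python
import math


def truncate_embedding(vector, dims):
    """Keep the first `dims` components and rescale them to unit length.

    Hypothetical helper for illustration; real libraries expose their own
    options for this.
    """
    if not 0 < dims <= len(vector):
        raise ValueError("dims must be between 1 and len(vector)")
    head = list(vector[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalised; return it unchanged.
        return head
    return [x / norm for x in head]


if __name__ == "__main__":
    toy_vector = [3.0, 4.0, 12.0]  # toy values, not a real model embedding
    print(truncate_embedding(toy_vector, 2))
```

Renormalising after truncation keeps cosine similarity and dot product interchangeable on the shorter vectors, which is usually what a vector index expects.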

🎨 Applied AI: creative, education, and genomics

When Machines that Simulate Intelligence Seemed Like a Summer Project (tensorlabbet.com). Explores Dartmouth 1956 proposal, seven themes, and how early AI ideas compare with modern LLMs, diffusion, and self-improvement concepts

Stumbling into AI: Part 2—Models (rmoff.net). Overview of LLMs, tokens, context windows, weights, clients, tools (MCP), and routers like OpenRouter and Raycast in the AI ecosystem

Conversations with Large Language Models: Battle Decks (aaronland.info). Generative systems in museums: revisiting collections, storytelling, vibes, and playful infrastructure using artifacts, Muppets, and lava-lamp metaphors

DNA Foundation Models and Their Applications (aditharun.com). DNA Foundation Models generate DNA sequences and predict genomic properties; Evo2, AlphaGenome, Caduceus; tissue-specific promoters; in silico mutagenesis; VUS resolution; biosecurity; benchmarking; data quality; RC-equivariance

From Static Textbooks to Living Systems: How I Tried to Turn My Brain into AI Agents (blog.crackinglanguage.com). Living systems for learning: RAG, edge tools, BYOK, Thai syllable analysis, and a dynamic, personalized teaching platform

⚙️ Infra, LLMOps, and hardware trends

A Technical History of Generative Media — with Gorkem and Batuhan from Fal.ai (latent.space). Fal.ai's pivot from a Python cloud runtime to optimized diffusion inference, CUDA kernels, and multi-model hosting for 2M developers and 350 models

AI Operations Under the Hood: Challenges and Best Practices (towardsdatascience.com). A practical framework for LLMOps and GenAI, focusing on data prep, RAG, evaluation, monitoring, and safety

Google’s Nano Banana is the start of a Massive AI Trend [Markets] (artificialintelligencemadesimple.substack.com). Nano Banana diffusion models, four choke points, memory/packaging, HBM/CoWoS, p99 latency, ASICs, porting tax, CUDA moat, deterministic silicon, edge, video, supply chains

Build Production-Ready Agentic-RAG Applications From Scratch Course: What we are going to build (newsletter.theaiedge.io). Hands-on course building production-ready Agentic-RAG apps with LangGraph, FastAPI, React, Pinecone, Langsmith on GCP

📏 Evals, embeddings, and model quality

How big are our embeddings now and why? (newsletter.vickiboykis.com). Trends in embedding sizes from 300 to 1536+; BERT 768 baseline; GPT-3/2/CLIP; HuggingFace; OpenAI matryoshka; vector databases; MTEB benchmarks

llm-eval-simple a simple way to evaluate LLM for your use case (grigio.org). Evaluate OpenAI-compatible APIs with prompts and metrics across models like gemma-3-27b-it-qat-q4_0-q3_k_m, gpt-oss-20b-mxfp4, and Qwen3-4B-IQ4_NL

Gemini AI in Gmail is terrible (nelsonslog.wordpress.com). Gemini-in-Gmail shows limited email access, poor RAG retrieval, and disruptive AI UI in Gmail

In Defense of AI Evals, for Everyone (sh-reya.com). Defends AI evals as systematic, continuous quality measurements across post-training and practical dogfooding, with examples in coding, document processing, and policing data

🧲 RAG engineering and retrieval systems

How Dropbox Built an AI Product Dash with RAG and AI Agents (blog.bytebytego.com). Dropbox Dash uses RAG and AI Agents to unify data across Gmail, Slack, Notion, Jira, and Dropbox with a custom interpreter for safe AI execution

How to Scale Your AI Search to Handle 10M Queries with 5 Powerful Techniques (towardsdatascience.com). Scaling AI search with RAG, contextual retrieval, BM25, router agents, and evaluations for 10M queries

Generate Dataframe Summaries With Python (fundor333.com). Generate dataframe summaries with Python, LangChain, Ollama, Mistral, Pandas, and custom context-driven reports for Cirrhosis patient data analysis

Chroma: RAG is Dead; Long Live Context Engineering (cto4.ai). Chroma shifts focus from RAG to context engineering for grounding AI with embeddings and metadata

The AI Architect's Guide to RAG Debugging: A 3-Step Process to Fix Hallucinations in Minutes, Not Days (mikulskibartosz.name). 3-step RAG debugging guide: retrieval cascade, hybrid search, reranking, prompt engineering, HyDE, RRF, BM25, bi-encoders, cross-encoders, and observability for LLMs
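
Several items in this section mention hybrid search that combines BM25 with embedding retrieval, and the last one names RRF (reciprocal rank fusion). As a rough illustration of that fusion step, here is a minimal sketch. The function name, the document ids, and the two toy rankings are assumptions for illustration and are not taken from any of the linked posts; k=60 is the constant commonly used with RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. Ties are broken by document id.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    bm25_ranking = ["a", "b", "c"]    # hypothetical keyword-search results
    dense_ranking = ["b", "c", "a"]   # hypothetical embedding-search results
    print(reciprocal_rank_fusion([bm25_ranking, dense_ranking]))
    # prints ['b', 'a', 'c']
```

Because RRF uses only rank positions, it avoids having to put BM25 scores and cosine similarities on a common scale before combining them.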

🧠 LLM internals: scaling, training, and architecture

The wall confronting large language models (arxiv.org). Analysis of barriers to scaling LLMs, alignment, safety, computation, data, and governance with practical mitigations

Understanding and Implementing Qwen3 From Scratch (sebastianraschka.com). Hands-on Qwen3 from scratch in PyTorch: architecture, components, and building blocks for open-weight models

Gemma 3 Explained (opencv.org). Gemma 3 introduces multimodal vision, 128k context, GQA, RoPE, local-global attention, and a decoder-only Transformer with post-training and API call capabilities

Online versus Offline RL for LLMs (cameronrwolfe.substack.com). Online vs offline RL for LLMs; analyzes PPO-based RLHF online training, offline DPO, SFT variants, rejection sampling, and semi-online approaches across Llama-2 and SafeRLHF data

The Physics of AI Hallucination: New Research Reveals the Tipping Point for Large Language Models (firstprinciples.org). Physicist Neil Johnson maps tipping point in LLMs, uses spin model, gap cooling, and attention head dynamics to predict hallucinations

📚 Academic Research

The Landscape of Agentic Reinforcement Learning for LLMs: A Survey (arxiv:cs). Survey of Agentic RL for LLMs: planning, tool use, memory, reasoning, self-improvement, perception, POMDPs, benchmarks, open-source frameworks, and five hundred works

OmniActor: A Generalist GUI and Embodied Agent for 2D&3D Worlds (arxiv:cs). OmniActor: Layer-heterogeneity MoE, GUI and embodied data synergy, 2D GUI and 3D embodied worlds, generalist agent, cross-domain training

Symbolic Graphics Programming with Large Language Models (arxiv:cs). RL with verifiable rewards improves SVG generation for symbolic graphics programming using SVGs with SigLIP and DINO encoders

Aligning Large Vision-Language Models by Deep Reinforcement Learning and Direct Preference Optimization (arxiv:cs). Overview of aligning large vision-language models via Deep Reinforcement Learning and Direct Preference Optimization for human-aligned multimodal systems

KVCompose: Efficient Structured KV Cache Compression with Composite Tokens (arxiv:cs). KV cache compression for long-context LLMs using attention-guided composite tokens and layer-adaptive allocation

👋 Before you go

I've got a big favor to ask: keeping Blaze running isn't expensive, but it does all add up, so I'm asking readers like you to help if you can.
That's why I'm launching a Patreon page! Nothing flashy, just a way for folks who find value in these newsletters to chip in a little each month. In return, you'll get:

• Real say in how Blaze evolves: vote on new topics, features, and curation ideas
• First dibs on merch (details still cooking)
• That warm fuzzy feeling knowing you're supporting something that saves you time and keeps you plugged into great tech writing

If you are getting value from Blaze, checking this out would mean the world. And if you can't contribute, no worries: the newsletters keep coming either way, and you can follow along on Patreon for free.
Thanks for reading and being part of this nerdy corner of the internet. All the best - Alastair.

About Generative AI

Our Generative AI newsletter covers the latest developments, trends, tools, and insights in AI research, LLMs and agentic applications. Each week, we curate the most important content from over 50,000 blogs and news sites so you don't have to spend hours searching.

Whether you're a beginner or expert in generative AI, our newsletter provides valuable information to keep you informed and ahead of the curve in this rapidly evolving field.

Subscribe now to join thousands of professionals who receive our weekly updates!