🧠

Generative AI: 23rd September 2025

Published 23rd September 2025

📣 Headlines

• Meta introduced Ray-Ban Display smart glasses with an in-lens screen and AI assistant plus a Neural Band wrist controller, signaling a more capable HUD-class wearable platform.

• Zoom unveiled AI Companion 3.0 with agentic capabilities spanning meetings, CX, marketing, sales, and frontline workflows.

• In healthcare, Akido Labs' ScopeAI runs appointments and drafts diagnoses under physician review, while more clinicians turn to ChatGPT for second opinions, raising benefit and privacy questions.

• As attackers weaponize AI, CrowdStrike pushed to scale defensive AI and backed Terra Security’s agentic offensive platform via its accelerator, which has Nvidia and AWS support.

• To curb GPU dependency, the industry is pursuing alternatives and open networks as firms seek to escape the 'Nvidia tax', highlighted by Upscale AI’s $100M seed for open-standards AI networking.

• Biosecurity spotlight: researchers used AI-designed DNA to create bacteriophages that infected and killed E. coli, demonstrating real-world bioactivity from AI-generated genomes.

• Materials discovery advance: MIT’s SCIGEN steers diffusion models to generate candidate quantum materials with target lattice geometries (e.g., Kagome, Archimedean).

• Microsoft expanded its data stack as Fabric adds a LinkedIn-derived native graph engine and real-time geospatial maps integrated with OneLake.

🔧 Company Engineering Blogs

Gemini achieves gold-level performance at the International Collegiate Programming Contest World Finals (deepmind.google). Gemini 2.5 Deep Think achieves gold-medal level at the 2025 ICPC World Finals, solving 10/12 problems with advanced reasoning and reinforcement learning techniques

Meet the GitHub MCP Registry: The fastest way to discover MCP Servers (github.blog). GitHub introduces the MCP Registry to centralize MCP server discovery for Copilot, agents, and MCP-enabled tools

Scaleway on Hugging Face Inference Providers 🔥 (huggingface.co). Scaleway joins Hugging Face Inference Providers, enabling serverless inference with Scaleway API keys and HF routing

Learn Your Way: Reimagining textbooks with generative AI (research.google). Google Research explores Learn Your Way, using GenAI to generate multimodal, personalized educational materials and measure learning efficacy

🤖 Agentic systems: data, operations, and real workflows

Supporting our AI overlords: Redesigning data systems to be Agent-first (muratbuffalo.blogspot.com). Agent-first data systems: LLM agent workloads, agentic speculation, multi-query optimization, memory stores, and neurosymbolic collaboration in DBMS redesign

Clouded Judgement 9.19.25 - The AI Shift: Static Software vs. Living AI Systems (cloudedjudgement.substack.com). AI products evolve like living systems, requiring continuous evaluation, observability, and hot-swappable models and prompts

Why Digital Work is the Perfect Training Ground for AI Agents (thedataexchange.media). Upwork CTO Andrew Rabinovich explains Uma, RLEF, RAG with knowledge graphs, and human-in-the-loop evaluation for AI agents in digital work

What happens when coding agents stop feeling like dialup? (martinalderson.com). Discusses AI coding agents, reliability, token speeds, OpenRouter data, Claude Code, Cerebras Code, Gemini CLI, and implications for developer workflow and pricing

⚙️ LLM performance engineering: inference, profiling, and embeddings

Lessons from the trenches: why llama.cpp works best (today) (visokio.com). llama.cpp beats vLLM for running GPT-OSS models locally, with reliability and interactive capabilities highlighted

Scaled dot-product attention profiling (aarnphm.xyz). Profiles naive and SDPA attention implementations with TensorBoard tracing, using uv, Modal, and PyTorch on CPU/CUDA
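For readers who want the formula behind that entry, here is a minimal sketch of scaled dot-product attention, softmax(QK^T / sqrt(d)) V, written in pure Python. It is not the linked post's code (which uses PyTorch); the function names `softmax` and `naive_sdpa` are made up for illustration, and the sketch omits masking, batching, and multiple heads.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def naive_sdpa(q, k, v):
    """Compute softmax(Q K^T / sqrt(d)) V for lists of row vectors.

    q: list of query vectors, k: list of key vectors (same width d as q),
    v: list of value vectors (one per key).
    """
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for qi in q:
        # Scaled dot product of this query against every key.
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        weights = softmax(scores)
        # Weighted average of the value vectors, column by column.
        out.append(
            [sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))]
        )
    return out


# A zero query scores every key equally, so the output is the mean of the values.
result = naive_sdpa([[0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])
```

A direct loop like this materializes every query-key score, which is the cost that fused kernels are designed to avoid.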

How to Reduce the costs of Running LLMs by 10-15x [Investigations] (artificialintelligencemadesimple.substack.com). Techniques for cost-efficient LLM inference: batching, compiler graphs, FlashAttention, quantization, KV caches, sparse architectures, MoE, and speculative decoding

Qwen-8B Embeddings: Near-SOTA Performance at 600x the Speed (alexdong.com). Qwen-8B embeddings enable near-SOTA text classification, 600x faster than LLM classifiers, achieving MAP ~0.944 on Kaggle with a simple MLP

🛠️ Hands-on builds and experiments: vLLM, Android RAG, diffusion, and personal projects

Summer 2025 in Review (bengubler.com). Summer 2025 recap of AI projects, tokenizers, and WebGPU shading library shade, plus dataset tooling and a LessWrong piece

How I Built the Database of my Dreams (blog.apiad.net). BeaverDB: a Pythonic, SQLite-backed multi-modal data store for vectors, text, lists, queues, pub-sub and more

Setting Up LLaVA/BakLLaVA with vLLM: Backend and API Integration (pyimagesearch.com). Guide to setting up vLLM with CUDA for LLaVA/BakLLaVA, offline Python inference, and OpenAI-compatible API serving

Running a RAG powered language model on Android using MediaPipe (darrylbayliss.net). Step-by-step guide using MediaPipe to run a RAG-powered language model on Android with Gemma, embeddings, and a local vector store

arkaine - an experiment in AI tooling (hlfshell.ai). Arkaine: an AI tooling framework for agents with tool calling, contexts, PythonEnv backend, Spellbook, and lessons learned

Diffusion models: image generation (konradb.substack.com). DIY diffusion image generation with Flux, Hugging Face diffusers, and prompt automation in Colab

🔎 RAG in production: evaluation, selective retrieval, and vector stores

RAG talk recap from DevConf.US 2025 (major.io). RAG with LLMs explained through a Fellowship metaphor, failures, strategies, and practical lessons for production systems

Evaluating Your RAG Solution (towardsdatascience.com). RAG pipeline construction with OpenAlex abstracts, FAISS vector store, LangChain, and DeepEval for retriever and generator evaluation

Deciding When Not to Retrieve: Adaptive RAG, Part 2 (blog.reachsumit.com). Selective Retrieval in Adaptive RAG: pre-generation decisions using external features and popularity-based triggers
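To make the idea of a popularity-based trigger concrete, here is a minimal sketch, assuming the common framing that retrieval helps most for rare (long-tail) entities the model is less likely to have memorized. The function name `should_retrieve`, the popularity counts, and the threshold are all hypothetical and are not taken from the linked post.

```python
def should_retrieve(query_entities, popularity, threshold=1000):
    """Decide before generation whether to call the retriever.

    query_entities: entity names found in the query (extraction not shown).
    popularity: dict mapping entity name to a popularity count
                (hypothetical numbers, e.g. page views).
    Returns True when any entity falls below the threshold. Unknown entities
    count as popularity 0, so they trigger retrieval. A query with no
    entities returns False.
    """
    return any(popularity.get(e, 0) < threshold for e in query_entities)


# Hypothetical popularity table for illustration only.
popularity = {"Paris": 250_000, "Kagome lattice": 120}

needs_docs = should_retrieve(["Kagome lattice"], popularity)  # rare entity
skip_docs = should_retrieve(["Paris"], popularity)            # popular entity
```

In practice the threshold would be tuned on held-out questions rather than fixed by hand.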

How do vector databases work? (hclimente.github.io). Vector embeddings, cosine similarity, UMAP visualizations, and HNSW-based vector databases (Qdrant) for RAG with LLMs
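As a small illustration of the cosine-similarity search mentioned above, here is a minimal brute-force sketch in pure Python. It scans every stored vector, which is the exact baseline that approximate indexes such as HNSW are built to avoid at scale. The names `cosine_similarity` and `top_k` and the toy vectors are assumptions for illustration; this is not Qdrant's API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Return the names of the k stored vectors most similar to the query.

    vectors: dict mapping a document name to its embedding.
    This is an exhaustive scan, so cost grows linearly with the store size.
    """
    scored = [(cosine_similarity(query, vec), name) for name, vec in vectors.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Toy 2-D "embeddings" for illustration only.
store = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
nearest = top_k([1.0, 0.1], store, k=2)
```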

🧪 Rethinking learning: test-time diffusion, layer-wise decoding, and RL efficiency

Deep researcher with test-time diffusion (research.google). TTD-DR uses test-time diffusion with self-evolution and retrieval-denoising to draft and revise long-form research reports

Making LLMs more accurate by using all of their layers (research.google). SLED decoding uses all LLM layers to align outputs with factual knowledge without external data or fine-tuning

Prediction is hard, especially about the future (strangeloopcanon.com). Forecasting with tiny LLMs: Varro RL environment, GSPO training, semantic similarity, and daily headline predictions

The Extreme Inefficiency of RL for Frontier Models (tobyord.com). New scaling paradigm: RL’s information efficiency vs pre-training; long-horizon tasks, token-entropy, METR/HCAST, o1/o3 models, latency and inference costs

The Shift to Reinforcement Learning Greatly Reduces Learning-Efficiency (tobyord.com). RL training learns far less per hour than pre-training, impacting scalability, generality, and frontier task efficiency in AI systems

📚 Academic Research

LLM-I: LLMs are Naturally Interleaved Multimodal Creators (arxiv:cs). LLMs orchestrate tools like online image search, diffusion generation, code execution, and image editing for interleaved multimodal creation

Understand Before You Generate: Self-Guided Training for Autoregressive Image Generation (arxiv:cs). Self-guided training for autoregressive image generation improves visual understanding and FID for LlamaGen models

Hierarchical Self-Attention: Generalizing Neural Attention Mechanics to Multi-Scale Problems (arxiv:stat). Hierarchical self-attention for multi-scale, multi-modal data using entropy-minimizing mechanics and dynamic-programming-accelerated transformers

MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer (arxiv:cs). Manzano proposes a unified multimodal framework with a hybrid image tokenizer, shared vision encoder, dual adapters, and a unified LLM for text and image token generation

AToken: A Unified Tokenizer for Vision (arxiv:cs). AToken: a unified transformer-based visual tokenizer for images, videos, and 3D with 4D rotary embeddings and adversarial-free training

👋 Before you go

I've got a big favor to ask - keeping Blaze running isn't expensive, but it does all add up, so I'm asking readers like you to help, if you can.
That's why I'm launching a Patreon page! Nothing flashy, just a way for folks who find value in these newsletters to chip in a little each month. In return, you'll get:

• Real say in how Blaze evolves: vote on new topics, features, and curation ideas
• First dibs on merch (details still cooking)
• That warm fuzzy feeling knowing you're supporting something that saves you time and keeps you plugged into great tech writing

If you are getting value from Blaze, checking this out would mean the world. And if you can't contribute, no worries: the newsletters keep coming either way, and you can follow along on Patreon for free.
Thanks for reading and being part of this nerdy corner of the internet. All the best - Alastair.

About Generative AI

Our Generative AI newsletter covers the latest developments, trends, tools, and insights in AI research, LLMs and agentic applications. Each week, we curate the most important content from over 50,000 blogs and news sites so you don't have to spend hours searching.

Whether you're a beginner or expert in generative AI, our newsletter provides valuable information to keep you informed and ahead of the curve in this rapidly evolving field.

Subscribe now to join thousands of professionals who receive our weekly updates!