Generative AI
Published 26th August 2025
📣 Headlines
• OpenAI announced its first India office in New Delhi and subpoenaed Meta in the case over Elon Musk's takeover bid, as the company expands globally while defending against acquisition attempts.
• Google launched its Pixel 10 lineup with Tensor G5 chips and AI-powered features, including Pro Res Zoom up to 100x and a new Gemini smart home speaker with spatial TV audio pairing.
• Enterprise AI agents gained traction with Cohere releasing Command A Reasoning, Druva introducing workload recovery agents, and Salesforce launching Agentforce for Public Sector on AWS with FedRAMP certification.
• AI models showed significant flaws in journalism applications with low overlap in scientific literature summaries, while AI polling diverged from real voter responses especially for demographic minorities.
• Microsoft's AI chief warned about AI potentially demanding rights as consciousness debates intensify, while Senate testimony highlighted the AI race with China and democratic AI governance concerns.
• U.S. billion-dollar startup exits are rebounding in 2025 with AI-focused IPOs expected in a convoy-like pattern, including potential listings from Figma and AI unicorns like Databricks.
• Arm hired Amazon's AI chip director for in-house chipmaking expansion, while SoftBank's AI bets generated $11 billion in two weeks through Nvidia and TSMC holdings.
• MIT researchers developed new AI models for molecular solubility prediction using machine learning and BigSolDB data, while AI agents are being tested as autonomous scientific co-authors in research conferences.
🔧 Company Engineering Blogs
The AI-Native Engineer: How Salesforce’s Next Generation is Redefining Software Development (engineering​.salesforce​.com). AI-native practices at Salesforce: hiring shifts, bi-directional mentorship, RAG/LLMs, prompt techniques, Cursor, PRizm, Investigation Agent, Slack-driven knowledge sharing, Agentforce adoption, metrics on PR velocity and AI-generated code
New Nemotron Nano 2 Open Reasoning Model Tops Leaderboard and Delivers 6x Higher Throughput (huggingface​.co). Nemotron Nano 2 9B delivers 6x throughput on edge reasoning with Hybrid Transformer–Mamba architecture, thinking budget, pruning to 9B, post-training alignment, and vLLM deployment
How we built a high quality Q&A assistant (medium​.com/airtable-eng). Airtable Omni Q&A: LLM-driven multi-step reasoning, contextual schema exploration, planning and replanning, hybrid search with RAG, inline citations, token-efficient ID encoding, eval suites, and production-scale latency/cost optimizations
From massive models to mobile magic: The tech behind YouTube real-time generative AI effects (research​.google). YouTube real-time AI effects on mobile: distilling large generative models with PTI inversion, UNet-MobileNet student, on-device MediaPipe pipelines, 30fps latency, 6–10 ms GPUs, datasets with Monk Skin Tone scaling, and effects like Never Blink, Toon 2, Risen zombie
AI-Driven Development at Instacart: Scaling Impact and Increasing Velocity (tech​.instacart​.com). AI-driven development at Instacart accelerates workflows with Project Tomato and Fizz, using Ava, Cursor, Glean; modular workspaces, prompts, model tuning, RCA automation, UI scaffolding from Figma, and best practices
🏠Industry Analysis & News
Will Giant Companies Always Have a Monopoly on Top AI Models? (aclu​.org). ACLU analysis examines data sourcing, pre-training costs, model scaling, deep learning training stages, data curation, Common Crawl, GDPR-like concerns, RLHF, SFT, retrieval-augmented generation, multi-modal data, distributed training, and emergence in frontier LLMs
Is this the moment when the Generative AI bubble finally deflates? (garymarcus​.substack​.com). Generative AI hype, LLM economics, GPT-5 expectations, Altman imagery, declining market enthusiasm, ROI concerns, gurus' reputations, practical use cases, and real-world value debates
AI #130: Talking Past The Sale (thezvi​.wordpress​.com). DeepSeek v3.1 reception; Meta AI restructuring; US-China chip tensions; GPT-5 perception; AI utility shifts; legal AI risks; OpenAI prompts optimizer; ElevenLabs audio; Claude updates; economic and societal AI impacts
Humans aren't going anywhere (bitsondata​.dev). GenAI, AGI, open models, retrieval augmentation, domain-specific workflows, memory in AI, energy use, Nvidia valuation, OpenAI, LLaMA, Mistral, Phi 2, OpenOrca, RAG, zero-backend memory, enterprise ROI
Import AI 426: Playable world models; circuit design AI; and ivory smuggling analysis (jack-clark​.net). Playable world models, AnalogSeeker circuit-design LLMs, Gemini/Gemma 3 tiny model, and ivory smuggling analysis using MM-Grounding-Dino
GenAI’s Impact: more like the Internet or the Metaverse? (doesitmatter​.ai). GenAI progress from Gemini agentic capabilities and energy use to safety updates, shared conversations risks, MIT insights, and enterprise deployment myths
🛠️ Development & Methodology
How I learn about generative AI (blog​.pamelafox​.org). Pamela Fox details her self-guided learning path for generative AI, citing books and videos by Chip Huyen, Sebastian Raschka, and Andrej Karpathy, plus James Briggs on vector search and practical projects
Systematic LLM Prompt Engineering Using DSPy Optimization (towardsdatascience​.com). Systematic LLM prompt optimization with DSPy: generator/judge prompts, LLM judges, MIPROv2, Signatures, Modules, ParallelProcessor, Gemini 1.5, Claude Opus, OpenAI GPT-3.5, dataset prep, evaluation metrics, logging, reproducibility
#517: Agentic AI Programming with Python (talkpython​.fm). Agentic AI programming shifts from autocomplete to collaboration, with tools like Cursor, Claude Code, and LangChain inside Python projects; emphasizes read-only analysis, plan-based edits, and constrained, testable diffs
Why Machine Learning Challenges Still Matter in the Age of Generative AI (medium​.datadriveninvestor​.com). Timeless ML challenges—data quality, quantity, representativeness, drift; model issues—overfitting, evaluation, compute costs, concept drift; explainability—black-boxes, emergent behavior, trust—applied to Generative AI, LLMs, fine-tuning, LoRA, adapters, factuality
🤖 Models & Technical Guides
DeepSeek 3.1 (simonwillison​.net). DeepSeek 3.1: a 685B hybrid reasoning model; Think variant comparable to DeepSeek-R1-0528 with faster responses; benchmarks include AIME 2025, GPQA Diamond, LiveCodeBench; prompt examples for coding, python, and search agents; pelican drawing via OpenRouter
llama.cpp guide: running gpt-oss with llama.cpp (simonwillison​.net). Guide to running gpt-oss with llama.cpp on macOS using llama-server, including ggml gpt-oss-20b-GGUF, Homebrew setup, model cache, port 8080, and performance notes on M2 Macs
Evaluating LLMs for my personal use case (darkcoding​.net). Personal eval of LLMs for Rust, Python, and Linux queries using 130 prompts; compares Qwen3, Gemini, GLM via the OpenRouter API; latency, cost, and speed
The Illustrated GPT-OSS (newsletter​.languagemodels​.co). GPT-OSS open-source LLM from OpenAI; mixture-of-experts MoE architecture; tokenization notes; reasoning modes (low/medium/high); tool usage, attention visuals, and system/developer messages; comparisons to GPT-2, DeepSeek, Qwen, Kimi; tokenization of emoji, Arabic, etc.; architecture diagrams and course reference
⚙️ Implementation & Architecture
Writing Speed-of-Light Flash Attention for 5090 in CUDA C++ (gau-nernst​.github​.io). Flash Attention 2 in CUDA C++ for 5090 (sm120) using QKV tiling, online softmax, and MMAs; BF16/FP16, scale, and Tensor Core tricks; torch-based reference; Ampere features
"RAG is Dead, Context Engineering is King" — with Jeff Huber of Chroma (latent​.space). Jeff Huber of Chroma discusses modern AI workloads, vector databases, context engineering, Retrieval-Augmented Language Models, Context Rot, Generative Benchmarking, and practical deployment tips for production search systems
Leverage LLM for Next-Gen Recommender Systems: Technical Deep Dive into LLM-Enhanced Recommender Architectures (lfaidata​.foundation). Technical blueprint for embedding LLMs in end-to-end recommender pipelines: uni-/multi-modality, knowledge-aware reasoning, LoRA, fine-tuning, prompt-based learning, ICL, DRDT, embeddings, rerankers, explainability, GenAI Commons
Building a Local AI Environment (0ut3r​.space). Local AI setup with Ollama, Llama3-chatqa, Qwen2.5-coder, custom aliases chat-ai and code-ai, Modelfile configurations, GPU drivers, RAG concepts, and test demos
📚 Academic Research
Intern-S1: A Scientific Multimodal Foundation Model (arxiv:cs). Shanghai AI Laboratory introduces a multimodal MoE model with 28B activated parameters (241B total) specialized for scientific domains, outperforming closed-source models in molecular synthesis and reaction prediction. This is a notable advance in domain-specific foundation models
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR (arxiv:cs). UCLA and Microsoft researchers solve policy entropy collapse in RLVR training, achieving 18-23% improvements on competition-level math benchmarks. This addresses a fundamental limitation in reinforcement learning for LLM reasoning
Efficient Mixed-Precision Large Language Model Inference with TurboMind (arxiv:cs). Shanghai AI Lab achieves up to 61% lower latency and up to 156% higher throughput for LLM inference through mixed-precision optimization. This could meaningfully reduce deployment costs and improve real-world LLM performance
RL Is Neither a Panacea Nor a Mirage: Understanding Supervised vs. Reinforcement Learning Fine-Tuning for LLMs (arxiv:cs). Mila and McGill researchers provide crucial insights into when RL fine-tuning works versus fails, revealing how SFT overfitting affects recovery. This offers practical guidance for choosing between SFT and RL approaches
ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents (arxiv:cs). Tsinghua University achieves new SOTA 48.1% accuracy on computer automation tasks using distributed RL infrastructure and API-GUI paradigm. This advances the frontier of AI agents that can control computers autonomously
AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications (arxiv:cs). Alibaba releases comprehensive framework for building agentic applications with ReAct paradigm, asynchronous design, and runtime sandbox. This provides practical tools for developers to create production-ready AI agents
Universal and Transferable Adversarial Attack on Large Language Models Using Exponentiated Gradient Descent (arxiv:cs). Florida State researchers develop more effective jailbreaking method using exponentiated gradient descent, achieving higher success rates across multiple LLMs. This highlights ongoing security challenges in LLM alignment and safety
👋 Before you go
I've got a big favor to ask - keeping Blaze running isn't expensive, but it does all add up, so I'm asking readers like you to help, if you can.
That's why I'm launching a Patreon page! Nothing flashy, just a way for folks who find value in these newsletters to chip in a little each month. In return, you'll get:
- Real say in how Blaze evolves — vote on new topics, features, topic curation ideas
- First dibs on merch (details still cooking)
- That warm fuzzy feeling knowing you're supporting something that saves you time and keeps you plugged into great tech writing
If you are getting value from Blaze, checking this out would mean the world. And if you can't contribute, no worries: the newsletters keep coming either way, and you can follow along on Patreon for free.
Thanks for reading and being part of this nerdy corner of the internet. All the best - Alastair.
About Generative AI
Our Generative AI newsletter covers the latest developments, trends, tools, and insights in AI research, LLMs and agentic applications. Each week, we curate the most important content from over 50,000 blogs and news sites so you don't have to spend hours searching.
Whether you're a beginner or expert in generative AI, our newsletter provides valuable information to keep you informed and ahead of the curve in this rapidly evolving field.
Subscribe now to join thousands of professionals who receive our weekly updates!