🧠

Generative AI: 29th July 2025

Published 29th July 2025

📣 Headlines

• OpenAI partnered with the UK government to enhance AI infrastructure and explore AI applications in public services including education, defense, and justice, expanding research collaborations and their London office.

• Meta appointed Shengjia Zhao, former OpenAI GPT-4 co-creator, as Chief Scientist of Superintelligence Labs to lead AI innovation efforts.

• Google launched Opal, a vibe-coding tool that enables rapid web app creation through text prompts, targeting a wider audience than traditional coding approaches.

• Amazon acquired wearable AI startup Bee for an undisclosed sum, aiming to expand its personal AI services.

• Experts warn that AI is lowering the barrier for terrorists to develop biological weapons by reducing reliance on traditional labs, raising significant bioterrorism concerns.

• A new AI coding challenge published its first results, with the winner scoring only 7.5% correct, highlighting the difficulty of AI coding benchmarks and the gap between open and proprietary models.

• Global startup funding reached $91 billion in Q2 2025, driven by AI investments, while M&A activity experienced a 155% year-over-year increase with emphasis on cybersecurity and fintech.

• Chinese universities are embracing AI use among students, focusing on education and productivity with initiatives like DeepSeek, contrasting with more restrictive approaches in Western institutions.

🔧 Company Engineering Blogs

Aeneas transforms how historians connect the past (deepmind.google). Aeneas, a groundbreaking AI model, aids historians in interpreting, restoring, and contextualizing ancient inscriptions through advanced analysis of Latin texts

How Meta keeps its AI hardware reliable (engineering.fb.com). Meta's AI hardware reliability is ensured through detecting silent data corruptions and implementing advanced diagnostic strategies across its global infrastructure

How AI Revolutionized Performance Engineering: Hours to Minutes Analysis (engineering.salesforce.com). Salesforce transforms performance engineering with AI, enhancing productivity and optimizing testing processes through tools like Cursor and the MCP server

Solving the inference problem for open source AI projects with GitHub Models (github.blog). GitHub Models offers a free inference API for open source AI projects, eliminating API key barriers and enabling seamless integration with tools like the OpenAI SDK

LLM Embeddings Explained: A Visual and Intuitive Guide (huggingface.co). Explore how LLMs convert text to meaning, covering techniques, embeddings, tools, and visualization in natural language processing
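
If you want to see what those embeddings look like in code, here's a minimal sketch using the sentence-transformers library and the all-MiniLM-L6-v2 checkpoint (both my choices for illustration, not necessarily what the guide uses):

```python
# A minimal sketch of turning text into embedding vectors. Library and model
# are assumptions for illustration, not taken from the guide itself.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "The cat sat on the mat.",
    "A kitten rested on the rug.",
    "Stock prices fell sharply.",
]
embeddings = model.encode(sentences, normalize_embeddings=True)  # shape: (3, 384)

# With normalized vectors, the dot product is cosine similarity: the two
# cat sentences should score much closer to each other than to the third.
print(embeddings @ embeddings.T)
```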

🚀 AI Architecture & Development Trends

Digging in the crates (argmin.net). Exploring the argmin archives, highlighting themes on forecasting, academic gatekeeping, machine learning, and tips for finding older blog content

Output Latent Spaces in Multihead Attention (mccormickml.com). Exploration of shared output latent spaces in Multihead Latent Attention models, enhancing efficiency in deep learning with techniques like SVD and model compression
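
The post goes deep on the architecture, but the basic compression trick is easy to sketch: factor an output projection with a truncated SVD so the heads write through a shared low-rank latent space. A toy version (dimensions and rank are my own, purely illustrative):

```python
# Illustrative only: compressing an attention output projection W_O with a
# truncated SVD, the basic move behind a shared low-rank "output latent space".
import torch

d_model, rank = 512, 64
W_O = torch.randn(d_model, d_model)           # full output projection

U, S, Vh = torch.linalg.svd(W_O)              # W_O = U @ diag(S) @ Vh
A = U[:, :rank] * S[:rank]                    # d_model x rank ("down" into the latent)
B = Vh[:rank, :]                              # rank x d_model ("up" out of the latent)

x = torch.randn(1, d_model)                   # a single attention output vector
full = x @ W_O
low_rank = (x @ A) @ B                        # same map through a rank-64 bottleneck

print(torch.norm(full - low_rank) / torch.norm(full))  # relative reconstruction error
```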

The State-of-the-art in AI Is No Longer About the Models, It Is About the Tooling Available Around the Models (daniel.industries). AI advancement is shifting focus from models to the essential tooling needed for effective management, incorporating techniques like prompt engineering and memory management

LLMs must evolve from scaling to full orchestration (victorwynne.com). LLMs are evolving towards full orchestration for managing complex tasks autonomously, reducing user input and enhancing productivity with tools like LangChain and CrewAI

Attention Regression Parameterized by Embedding Dim and Weights Dim (jamesmccaffrey.wordpress.com). Exploration of an attention regression algorithm using neural attention, PyTorch, with synthetic data for regression problems in machine learning
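
For a rough feel of what an attention-based regressor looks like, here's a minimal PyTorch sketch on synthetic data. It is not McCaffrey's exact architecture; the dimensions, pooling, and training setup are my own assumptions:

```python
# A minimal sketch (not the post's architecture): embed each scalar feature as
# a token, run self-attention over the features, pool, and predict one value.
import torch
import torch.nn as nn

class AttentionRegressor(nn.Module):
    def __init__(self, n_features=6, embed_dim=16, n_heads=2):
        super().__init__()
        self.embed = nn.Linear(1, embed_dim)            # each feature -> a token
        self.attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)
        self.head = nn.Linear(embed_dim, 1)

    def forward(self, x):                               # x: (batch, n_features)
        tokens = self.embed(x.unsqueeze(-1))            # (batch, n_features, embed_dim)
        attended, _ = self.attn(tokens, tokens, tokens) # self-attention over features
        return self.head(attended.mean(dim=1)).squeeze(-1)

# Synthetic regression data: y is a noisy linear combination of the features.
X = torch.randn(256, 6)
y = X @ torch.tensor([0.5, -1.0, 2.0, 0.0, 0.3, -0.7]) + 0.05 * torch.randn(256)

model = AttentionRegressor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(200):
    loss = nn.functional.mse_loss(model(X), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
print(loss.item())
```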

🧠 LLM Capabilities & Understanding

A Practical (and Incomplete) History of Language Models (obrhubr.org). Explores the evolution of language models from Markov Chains to modern LLMs, including techniques like BPE and the function of self-attention in GPTs
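
The starting point of that history is easy to reproduce yourself: a first-order Markov chain (bigram model) that predicts the next word from counts alone. A toy sketch with a made-up corpus:

```python
# A toy first-order Markov chain ("bigram") language model, the kind of model
# the post's history starts from. The corpus is invented for illustration.
import random
from collections import defaultdict

corpus = "the cat sat on the mat and the dog sat on the rug and the cat".split()

# Count word -> next-word transitions.
transitions = defaultdict(list)
for current, nxt in zip(corpus, corpus[1:]):
    transitions[current].append(nxt)

# Generate by repeatedly sampling the next word given only the current word.
random.seed(0)
word, output = "the", ["the"]
for _ in range(8):
    word = random.choice(transitions[word])
    output.append(word)
print(" ".join(output))
```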

When LLMs Try to Reason: Experiments in Text and Vision-Based Abstraction (towardsdatascience.com). Exploring LLM capabilities in abstract reasoning through grid transformations with o3-mini and gpt-4.1 using ARC benchmark datasets

Reviewing emergent computational abilities in Large Language Models (condensedconcepts.blogspot.com). Examines emergent computational abilities in large language models, discussing in-context learning, modular structures, and implications for control and predictability in AI

Introduction to BAGEL: A Unified Multimodal Model (debuggercafe.com). BAGEL is an open-source unified multimodal model that integrates understanding and generation for images, leveraging a unique Mixture-of-Transformers architecture

🛠️ Practical Implementation & Engineering

RAG with Spring AI (golb.hplar.ch). Implement a RAG system with Spring AI, PostgreSQL, and local models via Ollama for efficient retrieval and generation workflows

The Complete 2025 Prompt Engineering Guide: From Prompts to Context (magnus919.com). Master context engineering for AI systems, leveraging techniques like Few-Shot Prompting, Chain-of-Thought, and decomposition for enhanced performance and ROI
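
Two of the techniques the guide covers, few-shot examples and chain-of-thought, boil down to careful prompt assembly. A small sketch of the idea (the template wording is mine, not the guide's):

```python
# Composing a few-shot, chain-of-thought prompt as a plain string.
# The examples and instruction wording are illustrative, not from the guide.
FEW_SHOT_EXAMPLES = [
    ("A shop sells pens at $2 each. How much do 4 pens cost?",
     "Each pen costs $2. 4 pens cost 4 x 2 = $8. Answer: $8"),
    ("A train travels 60 km in 1 hour. How far does it go in 3 hours?",
     "Speed is 60 km/h. In 3 hours it travels 3 x 60 = 180 km. Answer: 180 km"),
]

def build_prompt(question: str) -> str:
    parts = ["Answer the question. Think step by step before giving the answer.\n"]
    for q, a in FEW_SHOT_EXAMPLES:                 # few-shot demonstrations
        parts.append(f"Q: {q}\nA: {a}\n")
    parts.append(f"Q: {question}\nA:")             # the actual task
    return "\n".join(parts)

print(build_prompt("A box holds 12 eggs. How many eggs are in 5 boxes?"))
```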

Context engineering for AI agents (manus.im). Lessons on context engineering for AI agents, focusing on KV-cache optimization and the shift from fine-tuning to in-context learning in AI development

Who needs git when you have gemini-2.5-pro? (alexmolas.com). Developer recovers lost machine learning code using Gemini-2.5-Pro LLM, showcasing its long-context memory benefits after Git missteps

⚡ Performance & Hardware Optimization

Dave Airlie (blogspot): ramalama/mesa : benchmarks on my hardware and open source vs proprietary (airlied.blogspot.com). Benchmarks on RTX, A770, RX7900XT showcase complexities in GPU stacks; NVK aims to bridge gaps with co-op matrix support

Runtime Notes – Qwen 3 Coder 480B A35B Llama.cpp and ik_llama.cpp (digitalspaceport.com). Qwen 3 Coder 480B's performance and configuration insights leveraging Llama.cpp and ik_llama.cpp, discussing GPU optimization and runtime benchmarks

Benchmarking LLM agents for vulnerability research (fuzzinglabs.com). FuzzingLabs benchmarks 12 LLMs for vulnerability detection in code, highlighting performance metrics, accuracy challenges, and the importance of training methodologies

How the KV Cache works (pierce.dev). Explores KV cache optimization in transformers, focusing on reducing computational redundancy in autoregressive token generation using caching techniques and attention mechanisms
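
The core idea is compact enough to show in a few lines: cache each token's keys and values as they are produced, so every decoding step only computes attention for the newest query. A simplified single-head NumPy sketch (not the post's code):

```python
# Simplified single-head KV caching during autoregressive decoding.
import numpy as np

d = 8
W_q, W_k, W_v = (np.random.randn(d, d) for _ in range(3))
k_cache, v_cache = [], []

def decode_step(x_new):
    """x_new: hidden state of the newest token, shape (d,)."""
    k_cache.append(x_new @ W_k)                   # compute K, V once per token
    v_cache.append(x_new @ W_v)
    K, V = np.stack(k_cache), np.stack(v_cache)   # (t, d) each

    q = x_new @ W_q                               # only the newest query is needed
    scores = K @ q / np.sqrt(d)                   # (t,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V                            # attention output, shape (d,)

for t in range(5):                                # fake a 5-token generation
    out = decode_step(np.random.randn(d))
print(out.shape)                                  # (8,)
```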

🔧 Training & Fine-tuning Methods

Lessons From Failing To Fine-tune A Small LLM On My Laptop (blog.codonomics.com). Challenges faced while attempting to fine-tune a small LLM on constrained hardware, including OOM errors and memory-intensive operations
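
If you are attempting the same thing, the usual levers for constrained hardware are LoRA adapters (train only a small fraction of the weights) plus gradient checkpointing and tiny batches. A hedged sketch with a placeholder model name, not the author's setup, and it may still not fit on a modest laptop:

```python
# Common memory-saving configuration for fine-tuning on constrained hardware.
# Model name and hyperparameters are placeholder assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "Qwen/Qwen2.5-0.5B"           # a small model; pick whatever fits in RAM
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

model.gradient_checkpointing_enable()     # trade extra compute for activation memory

lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()        # typically well under 1% of the weights
```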

Direct Preference Optimization (DPO) (cameronrwolfe.substack.com). Direct Preference Optimization (DPO) enhances LLM alignment using gradient descent, reducing complexity compared to RLHF, focusing on human preference data and reward models
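
The heart of DPO fits in a few lines once you have per-sequence log-probabilities from the policy and the frozen reference model. A minimal sketch of the loss (beta is a typical value, not one prescribed by the post):

```python
# The DPO objective, given log-probs of chosen and rejected responses under
# the policy and the frozen reference model.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # Implicit rewards are the log-ratios between policy and reference.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    # Push the chosen-vs-rejected margin up via a logistic loss.
    return -F.logsigmoid(beta * (chosen_reward - rejected_reward)).mean()

# Toy batch of 4 preference pairs (real log-probs would come from the models).
lp = lambda: torch.randn(4)
print(dpo_loss(lp(), lp(), lp(), lp()))
```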

Character.AI Open Sources pipeling-sft: A Scalable Framework for Fine-Tuning MoE LLMs like DeepSeek V3 (blog.character.ai). Character.AI releases pipeling-sft, a framework for fine-tuning Mixture-of-Experts LLMs like DeepSeek V3, enhancing training efficiency and stability

GSPO: Towards Scalable Reinforcement Learning for Language Models (qwenlm.github.io). GSPO enhances reinforcement learning for language models, addressing instability issues in training dynamics with sequence-level optimization over traditional token-level approaches

📚 Academic Research

SESR-Eval: Dataset for Evaluating LLMs in the Title-Abstract Screening of Systematic Reviews (arxiv:cs). Benchmark dataset SESR-Eval evaluates LLMs for title-abstract screening in software engineering systematic reviews, highlighting performance limitations and cost-effectiveness

Collaborative Inference and Learning between Edge SLMs and Cloud LLMs: A Survey of Algorithms, Execution, and Open Challenges (arxiv:cs). Collaborative edge-cloud strategies for LLMs and SLMs enhance inference, training, privacy, personalization, and deployment efficiency through adaptive scheduling and model optimization techniques

Identifying Pre-training Data in LLMs: A Neuron Activation-Based Detection Framework (arxiv:cs). NA-PDD algorithm identifies pre-training data in LLMs using neuron activation patterns, enhancing detection accuracy over existing methods and addressing dataset concerns

Understanding the Supply Chain and Risks of Large Language Model Applications (arxiv:cs). Examines risks in Large Language Model supply chains, focusing on dependencies and vulnerabilities across models, datasets, and libraries to enhance LLM security

VeriMinder: Mitigating Analytical Vulnerabilities in NL2SQL (arxiv:cs). VeriMinder mitigates analytical vulnerabilities in NLIDBs using a semantic mapping framework, guiding users to formulate bias-free analytical questions and improve analysis quality

DistrAttention: An Efficient and Flexible Self-Attention Mechanism on Modern GPUs (arxiv:cs). DistrAttention enhances self-attention with efficient, flexible mechanisms on modern GPUs, achieving 37% faster performance than FlashAttention-2 while maintaining accuracy

Scaling Linear Attention with Sparse State Expansion (arxiv:cs). Proposes Sparse State Expansion for efficient linear attention in Transformers, enhancing long-context modeling via row-sparse updates and achieving state-of-the-art performance in reasoning tasks

Not All Features Deserve Attention: Graph-Guided Dependency Learning for Tabular Data Generation with Language Models (arxiv:cs). GraDe integrates sparse dependency graphs into LLMs' attention for improved tabular data generation, enhancing feature interaction focus and performance on complex datasets

👋 Before you go

I've got a big favor to ask - keeping Blaze running isn't expensive, but it does all add up, so I'm asking readers like you to help, if you can.
That's why I'm launching a Patreon page! Nothing flashy, just a way for folks who find value in these newsletters to chip in a little each month. In return, you'll get:

  • Real say in how Blaze evolves — vote on new topics, features, topic curation ideas
  • First dibs on merch (details still cooking)
  • That warm fuzzy feeling knowing you're supporting something that saves you time and keeps you plugged into great tech writing

If you're getting value from Blaze, checking this out would mean the world. And if you can't contribute, no worries: the newsletters keep coming either way, and you can follow along on Patreon for free.
Thanks for reading and being part of this nerdy corner of the internet. All the best - Alastair.

About Generative AI

Our Generative AI newsletter covers the latest developments, trends, tools, and insights in AI research, LLMs and agentic applications. Each week, we curate the most important content from over 50,000 blogs and news sites so you don't have to spend hours searching.

Whether you're a beginner or expert in generative AI, our newsletter provides valuable information to keep you informed and ahead of the curve in this rapidly evolving field.

Subscribe now to join thousands of professionals who receive our weekly updates!