Generative AI: 29th July 2025
Published 29th July 2025
📣 Headlines
• OpenAI partnered with the UK government to enhance AI infrastructure and explore AI applications in public services including education, defense, and justice, while expanding its research collaborations and its London office.
• Meta appointed Shengjia Zhao, former OpenAI GPT-4 co-creator, as Chief Scientist of Superintelligence Labs to lead AI innovation efforts.
• Google launched Opal, a vibe-coding tool that enables rapid web app creation through text prompts, targeting a wider audience than traditional coding approaches.
• Amazon acquired wearable AI startup Bee for an undisclosed sum to enhance its personal AI services.
• AI technology is enabling terrorists to create biological weapons more easily, reducing reliance on traditional labs and raising significant security concerns about bioterrorism.
• A new AI coding challenge published disappointing first results: the winner achieved only 7.5% correct answers, highlighting the difficulty of AI coding benchmarks and the gap between open and proprietary models.
• Global startup funding reached $91 billion in Q2 2025, driven by AI investments, while M&A activity experienced a 155% year-over-year increase with emphasis on cybersecurity and fintech.
• Chinese universities are embracing AI use among students, focusing on education and productivity with initiatives like DeepSeek, contrasting with more restrictive approaches in Western institutions.
🔧 Company Engineering Blogs
Aeneas transforms how historians connect the past (deepmind​.google). Aeneas, a groundbreaking AI model, aids historians in interpreting, restoring, and contextualizing ancient inscriptions through advanced analysis of Latin texts
How Meta keeps its AI hardware reliable (engineering​.fb​.com). Meta's AI hardware reliability is ensured through detecting silent data corruptions and implementing advanced diagnostic strategies across its global infrastructure
How AI Revolutionized Performance Engineering: Hours to Minutes Analysis (engineering​.salesforce​.com). Salesforce transforms performance engineering with AI, enhancing productivity and optimizing testing processes through tools like Cursor and the MCP server
Solving the inference problem for open source AI projects with GitHub Models (github​.blog). GitHub Models offers a free inference API for open source AI projects, eliminating API key barriers and enabling seamless integration with tools like OpenAI SDK
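If you want a feel for what that looks like in practice, here's a minimal sketch of pointing the OpenAI Python SDK at an OpenAI-compatible endpoint and authenticating with a GitHub token; the base URL and model id below are assumptions, so check the GitHub Models documentation for current values.

```python
# Hedged sketch: using the OpenAI SDK against a GitHub-token-authenticated endpoint,
# as the post describes. The base_url and model id are assumptions, not confirmed values.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://models.github.ai/inference",  # assumed GitHub Models endpoint
    api_key=os.environ["GITHUB_TOKEN"],             # a GitHub token instead of a paid API key
)

resp = client.chat.completions.create(
    model="openai/gpt-4o-mini",                     # illustrative model id
    messages=[{"role": "user", "content": "Summarize what a KV cache does."}],
)
print(resp.choices[0].message.content)
```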
LLM Embeddings Explained: A Visual and Intuitive Guide (huggingface​.co). Explore how LLMs convert text to meaning, covering techniques, embeddings, tools, and visualization in natural language processing
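As a quick taste of the idea, here's a minimal sketch of turning sentences into vectors and comparing them; it assumes the sentence-transformers package and the all-MiniLM-L6-v2 checkpoint, which the guide itself may not use.

```python
# Minimal sketch: text -> embedding vectors -> cosine similarity.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = [
    "The cat sat on the mat.",
    "A kitten rested on the rug.",
    "Quarterly revenue grew 12%.",
]
emb = model.encode(sentences, normalize_embeddings=True)  # shape: (3, 384)

# With normalized vectors, cosine similarity is just a dot product.
sims = emb @ emb.T
print(np.round(sims, 2))  # the two cat sentences should score highest together
```

Similar sentences land close together in the vector space, which is the property the guide's visualizations are built around.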
🚀 AI Architecture & Development Trends
Digging in the crates (argmin​.net). Exploring the argmin archives, highlighting themes on forecasting, academic gatekeeping, machine learning, and tips for finding older blog content
Output Latent Spaces in Multihead Attention (mccormickml​.com). Exploration of shared output latent spaces in Multihead Latent Attention models, enhancing efficiency in deep learning with techniques like SVD and model compression
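The core trick is easy to play with at toy scale: factor the stacked per-head output projection with a truncated SVD so all heads write through a shared low-rank latent space. The shapes and rank below are made up for illustration and are not the post's exact setup.

```python
# Toy sketch of compressing the output projection with a truncated SVD,
# in the spirit of a shared output latent space across heads.
import torch

n_heads, d_head, d_model, rank = 8, 64, 512, 128
W_O = torch.randn(n_heads * d_head, d_model)   # full output projection

U, S, Vh = torch.linalg.svd(W_O, full_matrices=False)
W_down = U[:, :rank] * S[:rank]                # (n_heads*d_head, rank): heads -> shared latent
W_up = Vh[:rank, :]                            # (rank, d_model): shared latent -> model space

approx = W_down @ W_up
print(torch.dist(W_O, approx))                 # reconstruction error at this rank
```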
The State-of-the-art in AI Is No Longer About the Models, It Is About the Tooling Available Around the Models (daniel​.industries). AI advancement is shifting focus from models to the essential tooling needed for effective management, incorporating techniques like prompt engineering and memory management
LLMs must evolve from scaling to full orchestration (victorwynne​.com). LLMs are evolving towards full orchestration for managing complex tasks autonomously, reducing user input and enhancing productivity with tools like LangChain and CrewAI
Attention Regression Parameterized by Embedding Dim and Weights Dim (jamesmccaffrey​.wordpress​.com). Exploration of an attention regression algorithm using neural attention, PyTorch, with synthetic data for regression problems in machine learning
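For a rough idea of the shape of such a model, here's a hedged PyTorch sketch of a tiny attention-based regressor trained on synthetic data; the dimensions and the use of nn.MultiheadAttention are illustrative choices, not the post's exact implementation.

```python
# Illustrative attention regressor on synthetic data (not the post's exact architecture).
import torch
import torch.nn as nn

class AttnRegressor(nn.Module):
    def __init__(self, n_features, embed_dim=8, weights_dim=16):
        super().__init__()
        self.embed = nn.Linear(1, embed_dim)       # lift each scalar feature to an embedding
        self.attn = nn.MultiheadAttention(embed_dim, num_heads=1, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(n_features * embed_dim, weights_dim),
            nn.ReLU(),
            nn.Linear(weights_dim, 1),
        )

    def forward(self, x):                          # x: (batch, n_features)
        t = self.embed(x.unsqueeze(-1))            # (batch, n_features, embed_dim)
        a, _ = self.attn(t, t, t)                  # self-attention over features
        return self.head(a.flatten(1)).squeeze(-1)

X = torch.rand(256, 6)
y = X.sum(dim=1) + 0.1 * torch.randn(256)          # synthetic regression target
model = AttnRegressor(n_features=6)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(X), y)
    loss.backward()
    opt.step()
print(loss.item())
```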
🧠 LLM Capabilities & Understanding
A Practical (and Incomplete) History of Language Models (obrhubr​.org). Explores the evolution of language models from Markov Chains to modern LLMs, including techniques like BPE and the function of self-attention in GPTs
When LLMs Try to Reason: Experiments in Text and Vision-Based Abstraction (towardsdatascience​.com). Exploring LLM capabilities in abstract reasoning through grid transformations with o3-mini and gpt-4.1 using ARC benchmark datasets
Reviewing emergent computational abilities in Large Language Models (condensedconcepts​.blogspot​.com). Examines emergent computational abilities in large language models, discussing in-context learning, modular structures, and implications for control and predictability in AI
Introduction to BAGEL: An Unified Multimodal Model (debuggercafe​.com). BAGEL is an open-source unified multimodal model that integrates understanding and generation for images, leveraging a unique Mixture-of-Transformers architecture
🛠️ Practical Implementation & Engineering
RAG with Spring AI (golb.hplar.ch). Implement a RAG system with Spring AI, PostgreSQL, and local models served via Ollama for efficient retrieval and generation workflows
The Complete 2025 Prompt Engineering Guide: From Prompts to Context (magnus919​.com). Master context engineering for AI systems, leveraging techniques like Few-Shot Prompting, Chain-of-Thought, and decomposition for enhanced performance and ROI
Context engineering for AI agents (manus​.im). Lessons on context engineering for AI agents, focusing on KV-cache optimization and the shift from fine-tuning to in-context learning in AI development
Who needs git when you have gemini-2.5-pro? (alexmolas​.com). Developer recovers lost machine learning code using Gemini-2.5-Pro LLM, showcasing its long-context memory benefits despite Git missteps
⚡ Performance & Hardware Optimization
Dave Airlie (blogspot): ramalama/mesa : benchmarks on my hardware and open source vs proprietary (airlied​.blogspot​.com). Benchmarks on RTX, A770, RX7900XT showcase complexities in GPU stacks; NVK aims to bridge gaps with co-op matrix support
Runtime Notes – Qwen 3 Coder 480B A35B Llama.cpp and ik_llama.cpp (digitalspaceport​.com). Qwen 3 Coder 480B's performance and configuration insights leveraging Llama.cpp and ik_llama.cpp, discussing GPU optimization and runtime benchmarks
Benchmarking LLM agents for vulnerability research (fuzzinglabs.com). FuzzingLabs benchmarks 12 LLMs for vulnerability detection in code, highlighting performance metrics, accuracy challenges, and the importance of training methodologies
How the KV Cache works (pierce​.dev). Explores KV cache optimization in transformers, focusing on reducing computational redundancy in autoregressive token generation using caching techniques and attention mechanisms
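The intuition fits in a few lines: during autoregressive decoding you only compute keys and values for the newest token and append them to a cache, rather than recomputing them for the whole prefix. A minimal single-head sketch with toy dimensions and no batching:

```python
# Minimal single-head KV-cache sketch: each decode step adds one K and one V entry,
# and attention for the new token reads the whole cache.
import torch

d = 64
W_q, W_k, W_v = (torch.randn(d, d) for _ in range(3))
k_cache, v_cache = [], []

def decode_step(x_new):                      # x_new: (1, d) hidden state of the latest token
    q = x_new @ W_q
    k_cache.append(x_new @ W_k)              # cache grows by one entry per step
    v_cache.append(x_new @ W_v)
    K = torch.cat(k_cache, dim=0)            # (t, d): keys for all tokens so far
    V = torch.cat(v_cache, dim=0)
    attn = torch.softmax(q @ K.T / d**0.5, dim=-1)
    return attn @ V                          # (1, d) context vector for the new token

for _ in range(5):                           # five decode steps, old K/V never recomputed
    out = decode_step(torch.randn(1, d))
```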
🔧 Training & Fine-tuning Methods
Lessons From Failing To Fine-tune A Small LLM On My Laptop (blog​.codonomics​.com). Challenges faced while attempting to fine-tune a small LLM on constrained hardware, including OOM errors and memory-intensive operations
Direct Preference Optimization (DPO) (cameronrwolfe​.substack​.com). Direct Preference Optimization (DPO) enhances LLM alignment using gradient descent, reducing complexity compared to RLHF, focusing on human preference data and reward models
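The objective itself is compact enough to show inline: a logistic loss on the gap between policy and reference log-ratios for chosen versus rejected responses. A hedged PyTorch sketch, assuming you already have summed log-probs for each full response:

```python
# Sketch of the DPO objective: prefer the chosen response over the rejected one,
# measured relative to a frozen reference model. beta is the usual temperature.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    return -F.logsigmoid(logits).mean()      # minimized by scoring chosen above rejected
```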
Character.AI Open Sources pipeling-sft: A Scalable Framework for Fine-Tuning MoE LLMs like DeepSeek V3 (blog​.character​.ai). Character.AI releases pipeling-sft, a framework for fine-tuning Mixture-of-Experts LLMs like DeepSeek V3, enhancing training efficiency and stability
GSPO: Towards Scalable Reinforcement Learning for Language Models (qwenlm​.github​.io). GSPO enhances reinforcement learning for language models, addressing instability issues in training dynamics with sequence-level optimization over traditional token-level approaches
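The headline change versus token-level methods like GRPO is where the importance ratio lives: GSPO averages per-token log-ratios over the whole response and clips once per sequence. A rough sketch under those assumptions (tensor names and the clipping range are illustrative):

```python
# Rough sketch of a sequence-level importance ratio and clipped policy-gradient loss,
# in the spirit of GSPO; not the official implementation.
import torch

def gspo_ratio(logp_new, logp_old, mask):
    # logp_*: (batch, seq_len) per-token log-probs; mask: 1 for response tokens
    lengths = mask.sum(dim=1).clamp(min=1)
    log_ratio = ((logp_new - logp_old) * mask).sum(dim=1) / lengths
    return log_ratio.exp()                   # one ratio per sequence, not per token

def gspo_loss(logp_new, logp_old, mask, advantages, eps=0.2):
    r = gspo_ratio(logp_new, logp_old, mask)
    unclipped = r * advantages
    clipped = r.clamp(1 - eps, 1 + eps) * advantages
    return -torch.minimum(unclipped, clipped).mean()
```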
📚 Academic Research
SESR-Eval: Dataset for Evaluating LLMs in the Title-Abstract Screening of Systematic Reviews (arxiv:cs). Benchmark dataset SESR-Eval evaluates LLMs for title-abstract screening in software engineering systematic reviews, highlighting performance limitations and cost-effectiveness
Collaborative Inference and Learning between Edge SLMs and Cloud LLMs: A Survey of Algorithms, Execution, and Open Challenges (arxiv:cs). Collaborative edge-cloud strategies for LLMs and SLMs enhance inference, training, privacy, personalization, and deployment efficiency through adaptive scheduling and model optimization techniques
Identifying Pre-training Data in LLMs: A Neuron Activation-Based Detection Framework (arxiv:cs). NA-PDD algorithm identifies pre-training data in LLMs using neuron activation patterns, enhancing detection accuracy over existing methods and addressing dataset concerns
Understanding the Supply Chain and Risks of Large Language Model Applications (arxiv:cs). Examines risks in Large Language Model supply chains, focusing on dependencies and vulnerabilities across models, datasets, and libraries to enhance LLM security
VeriMinder: Mitigating Analytical Vulnerabilities in NL2SQL (arxiv:cs). VeriMinder mitigates analytical vulnerabilities in NLIDBs using a semantic mapping framework, guiding users to formulate bias-free analytical questions and improve analysis quality
DistrAttention: An Efficient and Flexible Self-Attention Mechanism on Modern GPUs (arxiv:cs). DistrAttention enhances self-attention with efficient, flexible mechanisms on modern GPUs, achieving 37% faster performance than FlashAttention-2 while maintaining accuracy
Scaling Linear Attention with Sparse State Expansion (arxiv:cs). Proposes Sparse State Expansion for efficient linear attention in Transformers, enhancing long-context modeling via row-sparse updates and achieving state-of-the-art performance in reasoning tasks
Not All Features Deserve Attention: Graph-Guided Dependency Learning for Tabular Data Generation with Language Models (arxiv:cs). GraDe integrates sparse dependency graphs into LLMs' attention for improved tabular data generation, enhancing feature interaction focus and performance on complex datasets
👋 Before you go
I've got a big favor to ask: keeping Blaze running isn't expensive, but it does add up, so I'm asking readers like you to help if you can.
That's why I'm launching a Patreon page! Nothing flashy, just a way for folks who find value in these newsletters to chip in a little each month. In return, you'll get:
- Real say in how Blaze evolves: vote on new topics, features, and topic-curation ideas
- First dibs on merch (details still cooking)
- That warm fuzzy feeling knowing you're supporting something that saves you time and keeps you plugged into great tech writing
If you're getting value from Blaze, checking this out would mean the world. And if you can't contribute, no worries: the newsletters keep coming either way, and you can follow along on Patreon for free.
Thanks for reading and being part of this nerdy corner of the internet. All the best - Alastair.
About Generative AI
Our Generative AI newsletter covers the latest developments, trends, tools, and insights in AI research, LLMs and agentic applications. Each week, we curate the most important content from over 50,000 blogs and news sites so you don't have to spend hours searching.
Whether you're a beginner or expert in generative AI, our newsletter provides valuable information to keep you informed and ahead of the curve in this rapidly evolving field.
Subscribe now to join thousands of professionals who receive our weekly updates!