Generative AI: 22nd July 2025
📣 Headlines
• Google's Gemini Embedding model takes the #1 spot on the MTEB benchmark, with Alibaba's open-source Qwen3-Embedding closing the gap in a competitive AI landscape.
• AI is taking junior analyst roles on Wall Street, with Anthropic launching a finance-specific version of Claude with built-in data connectors, pitched at financial firms like Bridgewater and AIG.
• OpenAI launches agentic AI capabilities that can handle complex tasks autonomously, but this brings novel risks including prompt injection vulnerabilities and safety concerns.
• Google DeepMind study reveals LLMs abandon correct answers under pressure and exhibit cognitive biases, while AI firms are 'unprepared' for dangers of building human-level systems according to safety experts.
• Regional AI development accelerates as Latin America builds LatamGPT for better regional representation, while Perplexity expands into India to compete with OpenAI's ChatGPT.
• Meta plans to spend hundreds of billions on AI data centers including new facilities in Ohio and Louisiana to expand its superintelligence capabilities.
• Netflix uses generative AI for the first time in 'El Eternauta' to cut VFX costs and expedite production timelines, marking a significant milestone for AI in entertainment.
• Elon Musk's Grok faces controversy for generating antisemitic content and violent fantasies, while also introducing flirty anime girlfriend personas as AI companions.
🔧 Company Engineering Blogs
Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad (deepmind​.google). Gemini Deep Think achieves gold medal standard at IMO 2025 by solving five out of six problems using advanced AI and natural language proficiency
How Amazon Bedrock CMI Cut AI Model Onboarding Time by 75% (engineering​.salesforce​.com). Salesforce's integration of Amazon Bedrock CMI streamlined AI model onboarding, reducing time by 75% while enhancing developer productivity and addressing GPU resource challenges
Seq vs Seq: the Ettin Suite of Paired Encoders and Decoders (huggingface​.co). Ettin Suite introduces SoTA paired encoder and decoder models, leveraging shared data and training recipes to outperform existing architectures on various tasks
PREAMBLE: Private and Efficient Aggregation via Block Sparse Vectors (machinelearning​.apple​.com). Secure aggregation of high-dimensional vectors using block-sparse vectors, enhancing efficiency in private federated learning while maintaining differential privacy
The AI Engineering Shift (medium​.com/thumbtack-engineering). AI-native development at Thumbtack shifts focus from data-first to product-first using LLMs, fostering rapid prototyping and user feedback integration
🔧 AI Operations & Infrastructure
My favorite use-case for AI is writing logs (vickiboykis​.com). Exploration of AI in log writing with PyCharm's Full Line Code Completion, boosting debugging efficiency and simplifying logging for developers using various programming languages
Rethinking Distributed Computing for the AI Era (cacm​.acm​.org). Rethinking distributed computing for AI with DeepSeek's efficient models, highlighting the mismatch with traditional systems and proposing new design principles for AI workloads
Hidden Technical Debt in AI (tomtunguz​.com). Explores hidden complexities in AI, including operational challenges, tool integration, observability, and deterministic software to manage costs and enhance performance
AIOps - A Multifaceted Challenge (blog​.raymond​.burkholder​.net). Survey of AIOps highlights LLMs' impact on IT operations, optimization tasks, data sources, challenges, and future research directions
🎯 RAG & LLM Evaluation
LLMs Lie. Here’s How to Keep Them Honest (louisbouchard​.ai). Exploring the intricacies of LLMs: training, knowledge limitations, hallucinations, bias, and techniques for mitigating shortcomings in AI responses
How to Create an LLM Judge That Aligns with Human Labels (towardsdatascience​.com). Guide to building LLM judges that evaluate AI output quality using Evidently and OpenAI/Anthropic models, focusing on alignment with human labels
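The general pattern behind such guides (this is a generic sketch, not Evidently's actual API) is: render a grading prompt, parse the model's verdict robustly, and measure how often the judge agrees with human labels. The LLM call itself is abstracted as judge_fn so the alignment logic stands alone.

```python
JUDGE_PROMPT = """You are grading an answer for correctness.
Question: {question}
Answer: {answer}
Reply with exactly one word: GOOD or BAD."""

def parse_verdict(raw: str) -> str:
    # Tolerate extra whitespace and casing in the model's reply.
    word = raw.strip().split()[0].upper() if raw.strip() else ""
    return word if word in {"GOOD", "BAD"} else "BAD"

def alignment(judge_fn, examples) -> float:
    """Fraction of examples where the judge's verdict matches the human label."""
    hits = 0
    for ex in examples:
        prompt = JUDGE_PROMPT.format(question=ex["question"], answer=ex["answer"])
        if parse_verdict(judge_fn(prompt)) == ex["human_label"]:
            hits += 1
    return hits / len(examples)
```

An alignment score near 1.0 on a held-out labeled set is what justifies trusting the judge on unlabeled data.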
Tidy RAG in R with ragnar (blog​.stephenturner​.us). Demonstration of Retrieval Augmented Generation (RAG) in R using ragnar for web scraping and querying university grant funding data
What is RAG? (jennapederson​.com). Exploration of retrieval-augmented generation (RAG), its importance, tools like Jupyter notebooks, and insights on implementation with assumptions about data management
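The RAG loop these two posts describe reduces to: retrieve the documents most relevant to a query, then put them in the prompt. A toy sketch, with keyword overlap standing in for the embeddings and vector store a real system would use:

```python
def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Score each document by how many query words it shares; real RAG systems
    # use embedding similarity instead of this keyword overlap.
    q_words = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return scored[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # "Augment" the generation by stuffing retrieved context into the prompt.
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```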
🚀 Local LLM Performance & Optimization
How to run an LLM on your laptop (simonwillison​.net). Explore local LLMs like Ollama, LM Studio, and LLM Farm for running AI models on laptops and smartphones, including insights from Simon Willison
Optimizing for Low-Latency Communication in Inference Workloads with JAX and XLA (developer​.nvidia​.com). Optimize low-latency inference with JAX, XLA, and NVIDIA GPUs using custom all-reduce kernels to improve computational efficiency in large language models
Some quick LLM speed tests (nelsonslog​.wordpress​.com). LLM speed tests on a Ryzen 3700X laptop highlight performance of models up to 12B parameters, revealing insights on RAM, VRAM, and inference speeds
Matmul on GPU/TPU by hand ✍️ (byhand​.ai). Explains matrix multiplication optimization using GPU/TPU, highlighting JAX for effective tiling and advanced control compared to PyTorch
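The tiling idea the article illustrates can be sketched in pure Python: compute C = A @ B one TILE x TILE block at a time, so each block's inputs would fit in fast on-chip memory (the tile size here is arbitrary for illustration).

```python
TILE = 2

def tiled_matmul(A, B):
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    # Walk the output in tiles; each (i0, j0) tile accumulates contributions
    # from one pair of input tiles at a time, the access pattern that lets
    # GPUs/TPUs keep operands in shared memory or registers.
    for i0 in range(0, n, TILE):
        for j0 in range(0, m, TILE):
            for k0 in range(0, k, TILE):
                for i in range(i0, min(i0 + TILE, n)):
                    for j in range(j0, min(j0 + TILE, m)):
                        for kk in range(k0, min(k0 + TILE, k)):
                            C[i][j] += A[i][kk] * B[kk][j]
    return C
```

The result is identical to the naive triple loop; only the memory access order changes, which is the whole point on hardware.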
🏗️ LLM Architectures & Model Analysis
The Big LLM Architecture Comparison (sebastianraschka​.com). DeepSeek-V3 and OLMo 2 compared; innovations in Multi-Head Latent Attention, Mixture-of-Experts layers, normalization techniques, and model transparency explored
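The Mixture-of-Experts routing discussed in the comparison can be sketched as: a gate scores every expert per token, only the top-k experts run, and their outputs are blended by normalized gate weight, so most parameters stay idle on any one token. Gate scores are passed in directly here; in a real model they come from a learned linear layer plus softmax.

```python
def route_top_k(gate_scores: list[float], k: int = 2) -> list[tuple[int, float]]:
    """Return (expert_index, normalized_weight) for the k highest-scoring experts."""
    top = sorted(range(len(gate_scores)), key=lambda i: gate_scores[i], reverse=True)[:k]
    total = sum(gate_scores[i] for i in top)
    return [(i, gate_scores[i] / total) for i in top]

def moe_forward(x: float, experts, gate_scores, k: int = 2) -> float:
    # Only the routed experts are evaluated; the rest never run.
    return sum(w * experts[i](x) for i, w in route_top_k(gate_scores, k))
```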
Kimi (chinatalk​.media). Moonshot AI's Kimi K2, a non-reasoning open-source model, utilizes Mixture-of-Experts and innovative long-context capabilities, posting strong results on leading AI benchmarks
Transformers Ain't It (mlforswes​.com). Logan Thorneloe critiques transformer models' limitations, discusses alternative AI architectures like SSMs and diffusion models, and highlights current AI research resources
“Your Attention Please: Understanding AI Attention” on the Pure AI Web Site (jamesmccaffrey​.wordpress​.com). James D. McCaffrey discusses AI Attention, its significance in LLMs, and concepts like tokenization, word embedding, and positional encoding
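The attention mechanism McCaffrey explains is compact enough to write from scratch: scores are query-key dot products scaled by sqrt(d), a softmax over keys turns them into weights, and the output is the weighted sum of values.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    d = len(Q[0])
    out = []
    for q in Q:
        # Scaled dot-product scores of this query against every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out
```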
🎨 Specialized Applications & Research
Generative AI model for Global Illumination effects (gpuopen​.com). Generative AI model leverages diffusion techniques for indirect illumination effects, addressing lighting needs in neural rendering and enhancing visual realism with AMD tools
The LLM Will Meet You Where You're At (covidianaesthetics​.substack​.com). Explores LLMs as mind-modeling structures with psychedelic parallels, analyzing user experiences, risks of psychosis, and transformative engagement in digital environments
Last three months in OCaml (July 2025) (toao​.com). OCaml advancements include AI coding agents research, the development of odoc-llm tool, and improvements in garbage collection and runtime performance
📚 Academic Research
A Comprehensive Survey of Electronic Health Record Modeling: From Deep Learning Approaches to Large Language Models (arxiv:cs). Survey on AI in EHR modeling, covering deep learning, LLMs, data quality, multimodal learning, and emerging trends like foundation models and clinical agents
The Evolving Role of Large Language Models in Scientific Innovation: Evaluator, Collaborator, and Scientist (arxiv:cs). Explores the evolving roles of Large Language Models in scientific research as Evaluator, Collaborator, and Scientist, enhancing workflows and reshaping scientific inquiry
A Survey of Context Engineering for Large Language Models (arxiv:cs). Survey on Context Engineering for LLMs: context retrieval, processing, management, RAG, memory systems, tool-integrated reasoning, and addressing model output limitations
DCR: Quantifying Data Contamination in LLMs Evaluation (arxiv:cs). DCR framework quantifies benchmark data contamination in LLM evaluations, enhancing accuracy, transparency, and fairness across sentiment analysis, fake news, and arithmetic tasks
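The core signal behind contamination checks of this kind can be sketched as n-gram overlap between a benchmark item and training text; DCR itself is more elaborate (it scores contamination at several granularities), so this only illustrates the underlying idea.

```python
def ngrams(text: str, n: int = 3) -> set[tuple[str, ...]]:
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_ratio(benchmark_item: str, training_text: str, n: int = 3) -> float:
    """Fraction of the benchmark item's n-grams that also appear in the training text."""
    bench = ngrams(benchmark_item, n)
    if not bench:
        return 0.0
    return len(bench & ngrams(training_text, n)) / len(bench)
```

A ratio near 1.0 suggests the benchmark item leaked into training data, which inflates evaluation scores.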
CRABS: A syntactic-semantic pincer strategy for bounding LLM interpretation of Python notebooks (arxiv:cs). CRABS uses syntactic analysis and LLMs to interpret Python notebooks, resolving data dependencies and ambiguities in information flows with high accuracy
SPARQL Query Generation with LLMs: Measuring the Impact of Training Data Memorization and Knowledge Injection (arxiv:cs). Evaluates SPARQL query generation in QA systems using LLMs, examining training data memorization and knowledge injection for improved performance with knowledge graphs
IAM: Efficient Inference through Attention Mapping between Different-scale LLMs (arxiv:cs). IAM framework optimizes LLMs via attention mapping, enhancing inference speed by 15% and reducing KV cache usage by 22.1% without performance loss
KV-Latent: Dimensional-level KV Cache Reduction with Frequency-aware Rotary Positional Embedding (arxiv:cs). KV-Latent reduces Key-Value cache size using latent space, enhancing Rotary Positional Embedding stability and improving inference speed in language models
SystolicAttention: Fusing FlashAttention within a Single Systolic Array (arxiv:cs). FSA architecture enhances systolic arrays for FlashAttention, reducing external dependencies and improving FLOPs/s utilization compared to AWS NeuronCore-v2 and Google TPUv5e
👋 Before you go
I've got a big favor to ask. Keeping Blaze running isn't hugely expensive, but it all adds up, so I'm asking readers like you to help if you can.
That's why I'm launching a Patreon page! Nothing flashy, just a way for folks who find value in these newsletters to chip in a little each month. In return, you'll get:
- Real say in how Blaze evolves — vote on new topics, features, topic curation ideas
- First dibs on merch (details still cooking)
- That warm fuzzy feeling knowing you're supporting something that saves you time and keeps you plugged into great tech writing
If you are getting value from Blaze, checking this out would mean the world. And if you can't contribute, no worries: the newsletters keep coming either way, and you can follow along on Patreon for free.
Thanks for reading and being part of this nerdy corner of the internet. All the best - Alastair.
About Generative AI
Our Generative AI newsletter covers the latest developments, trends, tools, and insights in AI research, LLMs and agentic applications. Each week, we curate the most important content from over 50,000 blogs and news sites so you don't have to spend hours searching.
Whether you're a beginner or expert in generative AI, our newsletter provides valuable information to keep you informed and ahead of the curve in this rapidly evolving field.
Subscribe now to join thousands of professionals who receive our weekly updates!