Generative AI: 20th May 2025
In the news
- Google DeepMind's AlphaEvolve can invent new algorithms and solve complex math problems, recovering 0.7% of Google's compute resources through improved data-center scheduling while also advancing applications in chip design.
- Microsoft announced over 50 AI tools to build the 'agentic web' at Build 2025, while also releasing NLWeb, an open-source tool for integrating generative AI search into websites, and launching Windows AI Foundry for local model development on AI PCs.
- Nvidia and Microsoft accelerate AI processing on PCs with TensorRT for RTX AI PCs, enhancing performance and simplifying deployment, while Taiwan's new Nvidia-powered supercomputer will advance AI, climate science, and quantum computing research.
- Vectara introduced guardian agents that automatically correct AI hallucinations, reducing their occurrence to below 1% in enterprise AI workflows through a sophisticated system of detection and correction models.
- LangChain's LangGraph Platform allows organizations to deploy stateful agents with one-click deployment and horizontal scaling, offering model choice and open-source flexibility that closed vendors cannot match.
- OpenAI and Google strengthen their positions in the AI market while Anthropic declines, with specialized reasoning models capturing 10% of the market according to Poe's report.
- Cerebras Systems launched Qwen3-32B on its inference platform, serving the open-weight LLM with real-time reasoning responses in under two seconds and outperforming Nvidia-based solutions thanks to its WSE-3 processor architecture.
- You.com's ARI Enterprise outperforms OpenAI in 76% of comparative tests, achieving 80% accuracy on the FRAMES benchmark with enhanced research capabilities through enterprise data integration.
📰 AI News & Policy
AI #116: If Anyone Builds It, Everyone Dies (thezvi.wordpress.com, 2025-05-15). Eliezer Yudkowsky and Nate Soares announce their new book warning against superintelligence risks, alongside commentary on AI policy, upcoming models such as Claude 4, and improvements and regressions in DeepMind's Gemini 2.5 Pro
Spiky Superhuman AI is here — what’s next? (medium.com/@danieldkang, 2025-05-19). AlphaEvolve, a new AI from Google DeepMind, shows superhuman capabilities in places, discovering new algorithms and delivering a substantial training speedup, marking the arrival of spiky superhuman AI (SSAI) with uneven progress across domains
Using Llama Models in the EU (zansara.dev, 2025-05-16). Llama models, particularly Llama 4, are effectively unavailable in the EU because Meta withholds them over EU AI regulations, affecting researchers and companies interested in using these multimodal models
Import AI 413: 40B distributed training run; avoiding the ‘One True Answer’ fallacy of AI safety; Google releases a content classification model (jack-clark.net, 2025-05-19). Google launched ShieldGemma2, an image safety classifier; distributed training advancements include Nous's 40B model built with Psyche; Prime Intellect's training insights underscore evolving AI model safety and benchmarking
Steering AI: New Technique Offers More Control Over Large Language Models (datascience.ucsd.edu, 2025-05-13). Researchers at UC San Diego have developed a nonlinear feature learning technique to precisely steer large language models, enhancing safety, reliability, and efficiency while reducing harmful outputs in AI systems like ChatGPT and Google Gemini
🚀 LLM Use Cases & Field Reports
Takeaways from Field Testing LLMs for Marketing and Advertising: The Jobs They're Stealing and the Jobs They're Not (sharedphysics.com, 2025-05-13). Testing Claude 3.7 and ChatGPT 4o revealed LLMs excel at routine marketing tasks but struggle with nuanced insights—effective as assistants, yet potentially risky as standalone tools
Is AI-assisted coding an incident magnet? (leaddev.com, 2025-05-15). Rapid adoption of AI-assisted coding boosts developer productivity but raises concerns for Site Reliability Engineering (SRE) teams, as increased code volume and flakiness lead to potential incidents and reliance on AI SRE tools for resolution
Beyond Text-only AI (blog.lmorchard.com, 2025-05-19). Exploring on-demand UI generation with LLMs, enhancing user interaction by providing relevant interface components dynamically, improving workflows, and reducing navigation complexity
Out-of-the-box LLMs are not ready for conservation decision making (anil.recoil.org, 2025-05-16). Out-of-the-box LLMs underperform in conservation decision making; grounding them in domain-specific evidence databases improves their capability, approaching expert-level evidence retrieval while being significantly faster
🧠 AI Model Innovations
Diffusion Models Explained Simply (seangoedecke.com, 2025-05-19). Diffusion models, unlike transformer models, utilize noise addition and removal processes for image generation, leveraging techniques like variational auto-encoders and classifier-free guidance. They also explore video and text generation through embeddings
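As a rough illustration of the forward (noising) half of that process, here is a minimal NumPy sketch; the linear beta schedule, step count, and toy image are illustrative assumptions, not code from the post.

```python
import numpy as np

T = 1000                               # number of diffusion steps (assumed)
betas = np.linspace(1e-4, 0.02, T)     # linear noise schedule (assumed)
alphas_bar = np.cumprod(1.0 - betas)   # cumulative signal retention

def noise_image(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps
    return xt, eps                     # a denoiser learns to recover eps from (xt, t)

rng = np.random.default_rng(0)
x0 = np.ones((32, 32))                 # toy "image"
xt, eps = noise_image(x0, t=999, rng=rng)   # near t = T the sample is almost pure noise
```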
The BabyLM Challenge: In search of more efficient learning algorithms, researchers look to infants (thetransmitter.org, 2025-05-19). The BabyLM Challenge seeks to improve language models by using smaller datasets, mimicking infant learning patterns, while exploring techniques like curriculum learning, GPT-BERT mixing, and multimodal approaches to enhance data efficiency
Machine Learning (blog.raymond.burkholder.net, 2025-05-14). Explore deep reinforcement learning, intelligent load balancing systems, TSLAM-Mini with QLoRA, and a graph-based indoor positioning approach utilizing Wi-Fi fingerprint trajectories for enhanced accuracy and robustness in localization tasks
🔧 LLM Tooling & Plugins
LLM 0.26a0 adds support for tools! (simonwillison.net, 2025-05-14). LLM 0.26a0 introduces tool support, enabling users to execute functions like multiplication via the command line and integrate custom tools with a new plugin hook in the LLM Python library
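A minimal sketch of what a custom tool plugin might look like, assuming the new register_tools() plugin hook behaves roughly as the announcement describes; exact names and signatures may differ, so check the LLM release notes before relying on it.

```python
import llm

def multiply(a: int, b: int) -> int:
    """Multiply two integers (the announcement's example use case)."""
    return a * b

# Hook name and registration call are assumptions based on the announcement.
@llm.hookimpl
def register_tools(register):
    register(multiply)   # expose the function so the model can invoke it as a tool
```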
Building software on top of Large Language Models (simonwillison.net, 2025-05-15). PyCon workshop explores building software with Large Language Models, covering model selection, semantic search, RAG, tool usage, prompt injection risks, and evaluation techniques
A Practical Approach to Building LLM Applications with Liron Itzhaki Allerhand (dagshub.com, 2025-05-13). Liron Itzhaki Allerhand discusses productionizing LLMs through data preparation, prompt design, performance evaluation, and managing sensitive data, alongside emerging trends like in-context learning and decoupling foundation models from applications
How I Built a Tool-Calling Llama Agent with a Custom MCP Server (levelup.gitconnected.com, 2025-05-19). Learn to develop a local tool-calling agent using Llama 3.2 and a custom MCP server for connecting to an Obsidian knowledge base, focusing on tool invocation, small language models, and privacy
Using llm to handle large input context (danielcorin.com, 2025-05-16). Using llm, the author addresses challenges faced when analyzing a large codebase of approximately 500,000 tokens, utilizing a bash one-liner to obtain meaningful suggestions for code improvements
Reflecting on FastMCP at 10k stars 🌟 (jlowin.dev, 2025-05-16). FastMCP quickly reached 10,000 stars, revolutionizing developer experience for MCP servers and simplifying interactions with the Model Context Protocol, leading to a robust ecosystem with version 2.0 featuring server proxying and composition
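For flavour, a hello-world server in FastMCP's documented style might look like the sketch below; the server name and tool are illustrative rather than taken from the post.

```python
from fastmcp import FastMCP

mcp = FastMCP("Demo Server")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers and return the sum."""
    return a + b

if __name__ == "__main__":
    mcp.run()   # serves the tool over the Model Context Protocol
```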
🛠️ Building LLM Applications
The Vibes (taoofmac.com, 2025-05-13). Explore coding workflows with LLMs like Claude and Gemini, using tools like Visual Studio Code, Copilot, and aider, focusing on project structure, logging, and context refinement for better outcomes
ODSC East: LLMs that think - Demystifying Reasoning Models (zansara.dev, 2025-05-14). At ODSC East 2025, Sara Zan explored reasoning models in LLMs, addressing their taxonomy, nuances compared to simple prompts, and the balance between effective reasoning and cognitive overthinking
Paper Review: AlphaEvolve: A coding agent for scientific and algorithmic discovery (andlukyane.com, 2025-05-15). AlphaEvolve enhances coding agents' capabilities using iterative edits and evaluator feedback, optimizing tasks like data center scheduling and discovering novel algorithms, including a breakthrough in matrix multiplication efficiency
From Prompts to Policies: How RL Builds Better AI Agents with Mahesh Sathiamoorthy - #731 (twimlai.com, 2025-05-13). Mahesh Sathiamoorthy discusses how reinforcement learning (RL) enhances custom AI agents, emphasizing data curation, evaluation, and error analysis. Tools like Curator, MiniCheck, and MiniChart illustrate RL's advantages over supervised fine-tuning
🔗 RAG Techniques & Tools
RAG's big blindspot (softwaredoug.com, 2025-05-16). RAG apps are hindered by a lack of user engagement signals, leaning on LLM-based evaluations instead of the clicks and conversions that traditional search teams use to understand what users actually prefer
Evaluating RAG Pipelines (neptune.ai, 2025-05-15). Evaluating RAG pipelines involves assessing performance, cost, and latency, utilizing metrics like Recall@k and F1 score, optimizing pre-processing, processing, and post-processing for efficient output generation
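As a worked example of one of those metrics, a minimal Recall@k helper could look like this; the document IDs and relevance judgments are made up for illustration.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    top_k = set(retrieved_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)

retrieved = ["doc7", "doc2", "doc9", "doc4", "doc1"]   # ranked retriever output
relevant = {"doc2", "doc4", "doc8"}                    # ground-truth relevant set
print(recall_at_k(retrieved, relevant, k=3))           # 1/3, only doc2 made the cut
```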
Introducing the Graph RAG Project and GraphRetriever: Layering Connected Knowledge onto Your RAG… (medium.com/building-the-open-data-stack, 2025-05-13). The Graph RAG Project introduces GraphRetriever, enhancing retrieval-augmented generation (RAG) by enabling connected knowledge retrieval through metadata-driven graph traversal, improving AI applications using large language models (LLMs)
Retrieval-Augmented Generation (RAG): Recent Research and Challenges (alimbekov.com, 2025-05-16). Retrieval-Augmented Generation (RAG) enhances AI by merging retrieval with generative models, utilizing tools like Ragas, LongRAG, ChunkRAG, HtmlRAG, and FastRAG, while addressing challenges in accuracy and context preservation
🔍 LLM Theory & Reflection
Why do LLMs work? (birchtree.me, 2025-05-19). A concise exploration of why large language models (LLMs) work as well as they do, highlighting how poorly understood their operation remains even after years of widespread use in tech and intense recent discussion in the field
LLM Memory (grantslatton.com, 2025-05-19). Explores LLM memory challenges in generating coherent narratives, discussing context windows, reference frames, vector embeddings, and knowledge graphs for better management of temporally and spatially organized information
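One of the techniques mentioned, vector embeddings, reduces to nearest-neighbour lookup over stored memories; the toy vectors below are purely illustrative.

```python
import numpy as np

def cosine_top_k(query, memory, k=3):
    """Return indices of the k stored memories most similar to the query."""
    q = query / np.linalg.norm(query)
    m = memory / np.linalg.norm(memory, axis=1, keepdims=True)
    return np.argsort(-(m @ q))[:k]          # highest cosine similarity first

rng = np.random.default_rng(0)
memory = rng.standard_normal((100, 384))     # 100 stored "memories" as embeddings
query = rng.standard_normal(384)             # embedding of the current context
print(cosine_top_k(query, memory))           # candidate memories to pull back into context
```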
AI winter, again (blog.lmorchard.com, 2025-05-19). The utility of LLMs in programming is debated, with developers finding value in reduced research time, but profitability and future improvements remain uncertain amidst potential AI winter and increased operational costs
On LLMs (kaukas.mataroa.blog, 2025-05-16). Discussion on LLMs, their capabilities, and challenges including hallucinations, the need for better systems, and potential improvements like property-based testing and new programming languages suitable for AI-generated code
Maybe another example of my favorite weakness of LLMs? (noswampcoolers.blogspot.com, 2025-05-13). A tech enthusiast discusses experimenting with Llama3.2:1b and codegemma:2B LLMs on a Raspberry Pi 400, noting errors in information about the band Rush and analyzing the model's attention system
LLMs: From having thoughts to managing them (ellyloel.com, 2025-05-17). Large language models embody a cultural obsession with capitalism, reflecting a desire among tech influencers to harness others' creativity for profit, addressing a cultural need for validation and success
AI is not magic (philliprhodes.name, 2025-05-14). AI development requires deep knowledge, expertise, and engineering skill, not merely a simple invocation of 'Use AI'. Building complex systems involves trade-offs in token usage, answer quality, and latency considerations
⚙️ Deep Learning Implementations
Using CUDA Deep Neural Network (cuDNN) in Python (stephendiehl.com, 2025-05-15). Implement scaled dot product attention in Python using cuDNN API, leveraging FlashAttention-2 algorithm. Requires SM80 GPU architecture. Explores tensor operations, memory layout, and optimizing neural network computations
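For reference, the computation those fused kernels accelerate is plain scaled dot-product attention; this NumPy version shows only the math, not the cuDNN graph API used in the post.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q @ K^T / sqrt(d)) @ V, the operation FlashAttention-style kernels fuse."""
    d = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d)      # (..., seq_q, seq_k)
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 128, 64)) for _ in range(3))   # heads, seq, head_dim
out = scaled_dot_product_attention(Q, K, V)                       # shape (8, 128, 64)
```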
Boost 2-Bit LLM Accuracy with EoRA (towardsdatascience.com, 2025-05-15). EoRA enhances 2-bit LLMs by compensating quantization errors, improving performance with tools like GPTQ and AutoRound, and leveraging eigenspace low-rank approximation without requiring training, resulting in significant accuracy gains
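The core idea, compensating the quantization error W - W_q with a low-rank correction, can be sketched with a plain SVD; EoRA itself projects the error into an activation-weighted eigenspace, so treat this as a simplified illustration with toy numbers.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))            # original weight matrix (toy)

# Crude 4-level (2-bit-style) quantization as a stand-in for GPTQ/AutoRound
levels = np.quantile(W, [0.125, 0.375, 0.625, 0.875])
W_q = levels[np.abs(W[..., None] - levels).argmin(-1)]

# Rank-r compensation of the quantization error: W ~ W_q + A @ B, no training needed
U, S, Vt = np.linalg.svd(W - W_q, full_matrices=False)
r = 32
A, B = U[:, :r] * S[:r], Vt[:r]

print(np.linalg.norm(W - W_q))                 # error of the bare quantized weights
print(np.linalg.norm(W - (W_q + A @ B)))       # smaller error after the low-rank fix
```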
Writing an LLM from scratch, part 14 -- the complexity of self-attention at scale (gilesthomas.com, 2025-05-14). Exploring self-attention scaling in LLMs, focusing on O(n^2) complexity regarding context length, space requirements, and computational time for large models like GPT-4.1 and Gemini 1.5 Pro
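A quick back-of-the-envelope calculation makes the quadratic cost concrete; the context lengths and fp16 assumption below are illustrative, not figures from the article.

```python
# Memory for a single head's n x n attention-score matrix in fp16.
BYTES_PER_SCORE = 2   # fp16

for n in (2_048, 32_768, 131_072, 1_048_576):
    gib = n * n * BYTES_PER_SCORE / 2**30
    print(f"context {n:>9,}: {gib:10.2f} GiB per head per layer")

# Doubling the context quadruples this cost, which is why naive
# self-attention becomes impractical at very long context lengths.
```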
Adding a Transformer Module to a PyTorch Regression Network – Classic NLP Positional Encoding (jamesmccaffrey.wordpress.com, 2025-05-14). James D. McCaffrey explores adding a Transformer module and a custom Attention module to a PyTorch regression network, comparing classic and simplified positional encoding techniques
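For reference, the classic sinusoidal encoding compared in the post follows the original Transformer formula; this standalone sketch uses an arbitrary sequence length and model dimension.

```python
import torch

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(same angle)."""
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)   # (seq, 1)
    i = torch.arange(0, d_model, 2, dtype=torch.float32)            # (d/2,)
    angles = pos / torch.pow(10_000.0, i / d_model)                 # (seq, d/2)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(angles)
    pe[:, 1::2] = torch.cos(angles)
    return pe   # added to the inputs before the Transformer module

pe = sinusoidal_positional_encoding(seq_len=16, d_model=8)          # (16, 8)
```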
📚 Scholarly Research
LLMs get lost in multi-turn conversation (arxiv.org, 2025-05-15). Large Language Models (LLMs) struggle with maintaining coherence in multi-turn conversations, impacting their efficacy in natural language processing tasks involving dialogue exchanges
Large Language Models Are More Persuasive Than Incentivized Human Persuaders (arxiv.org, 2025-05-17). Large language models outperform incentivized human persuaders in persuasive tasks, demonstrating their efficacy within the realm of computational linguistics and language processing techniques
Will AI systems perform poorly due to AI-generated material in training data? (cacm.acm.org, 2025-05-16). Concerns arise over model collapse in AI systems as large language models increasingly train on AI-generated content, risking poorer performance and statistical mismatches between training and real-world data distributions
ComplexFormer: Disruptively Advancing Transformer Inference Ability via Head-Specific Complex Vector Attention (arxiv:cs, 2025-05-15). ComplexFormer introduces Complex Multi-Head Attention, enabling independent semantic and positional modeling via complex vectors. It employs per-head Euler transformations and adaptive rotations, outperforming baselines in language tasks with enhanced parameter efficiency
TAIJI: MCP-based Multi-Modal Data Analytics on Data Lakes (arxiv:cs, 2025-05-16). Proposes TAIJI, an MCP-based architecture for multi-modal data analytics in data lakes, integrating semantic operator hierarchies and AI-agent-powered NL2Operator translators to enhance efficiency, accuracy, and data freshness
Delta Attention: Fast and Accurate Sparse Attention Inference by Delta Correction (arxiv:cs, 2025-05-16). Delta Attention corrects distributional shifts in sparse attention, achieving 36% performance gains and 88% accuracy of quadratic attention on the 131K RULER benchmark while maintaining 98.5% sparsity and 32x speedup over Flash Attention 2
Statistical Modeling and Uncertainty Estimation of LLM Inference Systems (arxiv:cs, 2025-05-14). Proposes an Analytical with Learning Augmentation framework that combines analytical modeling and machine learning for robust statistical prediction and uncertainty estimation in diverse LLM inference workloads, achieving low median errors and adaptability
Campus AI vs Commercial AI: A Late-Breaking Study on How LLM As-A-Service Customizations Shape Trust and Usage Patterns (arxiv:cs, 2025-05-15). This study explores how user-salient customizations of Large Language Models as-a-Service affect trust and usage patterns among university users, setting the stage for a larger investigation comparing institutional LLMaaS with ChatGPT
Large Language Models for Cancer Communication: Evaluating Linguistic Quality, Safety, and Accessibility in Generative AI (arxiv:cs, 2025-05-15). This study evaluates five general-purpose and three medical Large Language Models using mixed-methods, revealing tensions between linguistic quality, safety, and accessibility in generating accurate cancer-related information for patient understanding
Do LLMs Memorize Recommendation Datasets? A Preliminary Study on MovieLens-1M (arxiv:cs, 2025-05-15). A study investigates memorization of the MovieLens-1M dataset by Large Language Models (GPT and Llama), analyzing its effect on recommendation performance and identifying the relationship between model architecture and memorization levels
About Generative AI
Our Generative AI newsletter covers the latest developments, trends, tools, and insights in AI research, LLMs and agentic applications. Each week, we curate the most important content from over 50,000 blogs and news sites so you don't have to spend hours searching.
Whether you're a beginner or expert in generative AI, our newsletter provides valuable information to keep you informed and ahead of the curve in this rapidly evolving field.
Subscribe now to join thousands of professionals who receive our weekly updates!