Generative AI: 20th May 2025
In the news
- Google DeepMind's AlphaEvolve can invent new algorithms and solve complex math problems, recovering 0.7% of Google's compute resources through improved data-center scheduling while also advancing applications in chip design.
- Microsoft announced over 50 AI tools to build the 'agentic web' at Build 2025, while also releasing NLWeb, an open-source tool for integrating generative AI search into websites, and launching Windows AI Foundry for local model development on AI PCs.
- Nvidia and Microsoft accelerate AI processing on PCs with TensorRT for RTX AI PCs, enhancing performance and simplifying deployment, while Taiwan's new Nvidia-powered supercomputer will advance AI, climate science, and quantum computing research.
- Vectara introduced guardian agents that automatically correct AI hallucinations, reducing their occurrence to below 1% in enterprise AI workflows through a sophisticated system of detection and correction models.
- LangChain's LangGraph Platform allows organizations to deploy stateful agents with one-click deployment and horizontal scaling, offering model choice and open-source flexibility that closed vendors cannot match.
- OpenAI and Google strengthen their positions in the AI market while Anthropic declines, with specialized reasoning models capturing 10% of the market according to Poe's report.
- Cerebras Systems launched Qwen3-32B on its inference platform, serving the open-weight LLM with real-time reasoning responses in under two seconds and outperforming Nvidia-based solutions thanks to its WSE-3 processor architecture.
- You.com's ARI Enterprise outperforms OpenAI in 76% of comparative tests, achieving 80% accuracy on the FRAMES benchmark with enhanced research capabilities through enterprise data integration.
📰 AI News & Policy
AI #116: If Anyone Builds It, Everyone Dies (thezvi.wordpress.com, 2025-05-15). Eliezer Yudkowsky and Nate Soares announce their new book warning against superintelligence risks, alongside commentary on AI policy, upcoming models such as Claude 4, and improvements and regressions in DeepMind's Gemini 2.5 Pro
Spiky Superhuman AI is here — what’s next? (medium.com/@danieldkang, 2025-05-19). AlphaEvolve, a new AI from Google DeepMind, shows superhuman capabilities in places, discovering new algorithms and delivering a substantial training speedup, marking the arrival of spiky superhuman AI (SSAI) with uneven progress across domains
Using Llama Models in the EU (zansara.dev, 2025-05-16). Llama models, particularly Llama 4, are effectively unavailable in the EU because Meta withholds them over EU AI regulations, affecting researchers and companies interested in using these multimodal models
Import AI 413: 40B distributed training run; avoiding the ‘One True Answer’ fallacy of AI safety; Google releases a content classification model (jack-clark.net, 2025-05-19). Google launched ShieldGemma2, an image safety classifier; distributed training advancements include Nous's 40B model built with Psyche; Prime Intellect's training insights underscore evolving AI model safety and benchmarking
Steering AI: New Technique Offers More Control Over Large Language Models (datascience.ucsd.edu, 2025-05-13). Researchers at UC San Diego have developed a nonlinear feature learning technique to precisely steer large language models, enhancing safety, reliability, and efficiency while reducing harmful outputs in AI systems like ChatGPT and Google Gemini
🚀 LLM Use Cases & Field Reports
Takeaways from Field Testing LLMs for Marketing and Advertising: The Jobs They're Stealing and the Jobs They're Not (sharedphysics.com, 2025-05-13). Testing Claude 3.7 and ChatGPT 4o revealed LLMs excel at routine marketing tasks but struggle with nuanced insights—effective as assistants, yet potentially risky as standalone tools
Is AI-assisted coding an incident magnet? (leaddev.com, 2025-05-15). Rapid adoption of AI-assisted coding boosts developer productivity but raises concerns for Site Reliability Engineering (SRE) teams, as increased code volume and flakiness lead to potential incidents and reliance on AI SRE tools for resolution
Beyond Text-only AI (blog.lmorchard.com, 2025-05-19). Exploring on-demand UI generation with LLMs, enhancing user interaction by providing relevant interface components dynamically, improving workflows, and reducing navigation complexity
Out-of-the-box LLMs are not ready for conservation decision making (anil.recoil.org, 2025-05-16). Out-of-the-box LLMs underperform in conservation decision making; grounding them in domain-specific evidence databases improves their capability, approaching expert-level evidence retrieval while being significantly faster
🧠 AI Model Innovations
Diffusion Models Explained Simply (seangoedecke.com, 2025-05-19). Diffusion models, unlike transformer models, utilize noise addition and removal processes for image generation, leveraging techniques like variational auto-encoders and classifier-free guidance. They also explore video and text generation through embeddings
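As a rough illustration of the forward (noising) half of that process, here is a minimal NumPy sketch; the linear beta schedule, step count, and toy image are illustrative assumptions, not code from the post.

```python
import numpy as np

T = 1000                               # number of diffusion steps (assumed)
betas = np.linspace(1e-4, 0.02, T)     # linear noise schedule (assumed)
alphas_bar = np.cumprod(1.0 - betas)   # cumulative signal retention

def noise_image(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps
    return xt, eps                     # a denoiser learns to recover eps from (xt, t)

rng = np.random.default_rng(0)
x0 = np.ones((32, 32))                 # toy "image"
xt, eps = noise_image(x0, t=999, rng=rng)   # near t = T the sample is almost pure noise
```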
The BabyLM Challenge: In search of more efficient learning algorithms, researchers look to infants (thetransmitter.org, 2025-05-19). The BabyLM Challenge seeks to improve language models by using smaller datasets, mimicking infant learning patterns, while exploring techniques like curriculum learning, GPT-BERT mixing, and multimodal approaches to enhance data efficiency
Machine Learning (blog.raymond.burkholder.net, 2025-05-14). Explore deep reinforcement learning, intelligent load balancing systems, TSLAM-Mini with QLoRA, and a graph-based indoor positioning approach utilizing Wi-Fi fingerprint trajectories for enhanced accuracy and robustness in localization tasks
🔧 LLM Tooling & Plugins
LLM 0.26a0 adds support for tools! (simonwillison.net, 2025-05-14). LLM 0.26a0 introduces tool support, enabling users to execute functions like multiplication via the command line and integrate custom tools with a new plugin hook in the LLM Python library
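A minimal sketch of what a custom tool plugin might look like, assuming the new register_tools() plugin hook behaves roughly as the announcement describes; exact names and signatures may differ, so check the LLM release notes before relying on it.

```python
import llm

def multiply(a: int, b: int) -> int:
    """Multiply two integers (the announcement's example use case)."""
    return a * b

# Hook name and registration call are assumptions based on the announcement.
@llm.hookimpl
def register_tools(register):
    register(multiply)   # expose the function so the model can invoke it as a tool
```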
Building software on top of Large Language Models (simonwillison.net, 2025-05-15). PyCon workshop explores building software with Large Language Models, covering model selection, semantic search, RAG, tool usage, prompt injection risks, and evaluation techniques
A Practical Approach to Building LLM Applications with Liron Itzhaki Allerhand (dagshub.com, 2025-05-13). Liron Itzhaki Allerhand discusses productionizing LLMs through data preparation, prompt design, performance evaluation, and managing sensitive data, alongside emerging trends like in-context learning and decoupling foundation models from applications
How I Built a Tool-Calling Llama Agent with a Custom MCP Server (levelup.gitconnected.com, 2025-05-19). Learn to develop a local tool-calling agent using Llama 3.2 and a custom MCP server for connecting to an Obsidian knowledge base, focusing on tool invocation, small language models, and privacy
Using llm to handle large input context (danielcorin.com, 2025-05-16). Using llm, the author addresses challenges faced when analyzing a large codebase of approximately 500,000 tokens, utilizing a bash one-liner to obtain meaningful suggestions for code improvements
Reflecting on FastMCP at 10k stars 🌟 (jlowin.dev, 2025-05-16). FastMCP quickly reached 10,000 stars, revolutionizing developer experience for MCP servers and simplifying interactions with the Model Context Protocol, leading to a robust ecosystem with version 2.0 featuring server proxying and composition
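For flavour, a hello-world server in FastMCP's documented style might look like the sketch below; the server name and tool are illustrative rather than taken from the post.

```python
from fastmcp import FastMCP

mcp = FastMCP("Demo Server")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers and return the sum."""
    return a + b

if __name__ == "__main__":
    mcp.run()   # serves the tool over the Model Context Protocol
```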
🛠️ Building LLM Applications
The Vibes (taoofmac.com, 2025-05-13). Explore coding workflows with LLMs like Claude and Gemini, using tools like Visual Studio Code, Copilot, and aider, focusing on project structure, logging, and context refinement for better outcomes
ODSC East: LLMs that think - Demystifying Reasoning Models (zansara.dev, 2025-05-14). At ODSC East 2025, Sara Zan explored reasoning models in LLMs, addressing their taxonomy, nuances compared to simple prompts, and the balance between effective reasoning and cognitive overthinking
Paper Review: AlphaEvolve: A coding agent for scientific and algorithmic discovery (andlukyane.com, 2025-05-15). AlphaEvolve enhances coding agents' capabilities using iterative edits and evaluator feedback, optimizing tasks like data center scheduling and discovering novel algorithms, including a breakthrough in matrix multiplication efficiency
From Prompts to Policies: How RL Builds Better AI Agents with Mahesh Sathiamoorthy - #731 (twimlai.com, 2025-05-13). Mahesh Sathiamoorthy discusses how reinforcement learning (RL) enhances custom AI agents, emphasizing data curation, evaluation, and error analysis. Tools like Curator, MiniCheck, and MiniChart illustrate RL's advantages over supervised fine-tuning
🔗 RAG Techniques & Tools
RAG's big blindspot (softwaredoug.com, 2025-05-16). RAG apps are hindered by a lack of user engagement signals, leaning on LLM-based evaluations instead of the clicks and conversions that traditional search teams use to understand what users actually prefer
Evaluating RAG Pipelines (neptune.ai, 2025-05-15). Evaluating RAG pipelines involves assessing performance, cost, and latency, utilizing metrics like Recall@k and F1 score, optimizing pre-processing, processing, and post-processing for efficient output generation
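As a worked example of one of those metrics, a minimal Recall@k helper could look like this; the document IDs and relevance judgments are made up for illustration.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    top_k = set(retrieved_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)

retrieved = ["doc7", "doc2", "doc9", "doc4", "doc1"]   # ranked retriever output
relevant = {"doc2", "doc4", "doc8"}                    # ground-truth relevant set
print(recall_at_k(retrieved, relevant, k=3))           # 1/3, only doc2 made the cut
```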
Introducing the Graph RAG Project and GraphRetriever: Layering Connected Knowledge onto Your RAG… (medium.com/building-the-open-data-stack, 2025-05-13). The Graph RAG Project introduces GraphRetriever, enhancing retrieval-augmented generation (RAG) by enabling connected knowledge retrieval through metadata-driven graph traversal, improving AI applications using large language models (LLMs)
Retrieval-Augmented Generation (RAG): Recent Research and Challenges (alimbekov.com, 2025-05-16). Retrieval-Augmented Generation (RAG) enhances AI by merging retrieval with generative models, utilizing tools like Ragas, LongRAG, ChunkRAG, HtmlRAG, and FastRAG, while addressing challenges in accuracy and context preservation
🔍 LLM Theory & Reflection
Why do LLMs work? (birchtree.me, 2025-05-19). A concise exploration of why large language models (LLMs) work as well as they do, highlighting how poorly understood their operation remains even after years of widespread use in tech and intense recent discussion in the field
LLM Memory (grantslatton.com, 2025-05-19). Explores LLM memory challenges in generating coherent narratives, discussing context windows, reference frames, vector embeddings, and knowledge graphs for better management of temporally and spatially organized information
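One of the techniques mentioned, vector embeddings, reduces to nearest-neighbour lookup over stored memories; the toy vectors below are purely illustrative.

```python
import numpy as np

def cosine_top_k(query, memory, k=3):
    """Return indices of the k stored memories most similar to the query."""
    q = query / np.linalg.norm(query)
    m = memory / np.linalg.norm(memory, axis=1, keepdims=True)
    return np.argsort(-(m @ q))[:k]          # highest cosine similarity first

rng = np.random.default_rng(0)
memory = rng.standard_normal((100, 384))     # 100 stored "memories" as embeddings
query = rng.standard_normal(384)             # embedding of the current context
print(cosine_top_k(query, memory))           # candidate memories to pull back into context
```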
AI winter, again (blog.lmorchard.com, 2025-05-19). The utility of LLMs in programming is debated, with developers finding value in reduced research time, but profitability and future improvements remain uncertain amidst potential AI winter and increased operational costs
On LLMs (kaukas.mataroa.blog, 2025-05-16). Discussion on LLMs, their capabilities, and challenges including hallucinations, the need for better systems, and potential improvements like property-based testing and new programming languages suitable for AI-generated code
Maybe another example of my favorite weakness of LLMs? (noswampcoolers.blogspot.com, 2025-05-13). A tech enthusiast discusses experimenting with Llama3.2:1b and codegemma:2B LLMs on a Raspberry Pi 400, noting errors in information about the band Rush and analyzing the model's attention system
LLMs: From having thoughts to managing them (ellyloel.com, 2025-05-17). Large language models embody a cultural obsession with capitalism, reflecting a desire among tech influencers to harness others' creativity for profit, addressing a cultural need for validation and success
AI is not magic (philliprhodes.name, 2025-05-14). AI development requires deep knowledge, expertise, and engineering skill, not merely a simple invocation of 'Use AI'. Building complex systems involves trade-offs in token usage, answer quality, and latency considerations
⚙️ Deep Learning Implementations
Using CUDA Deep Neural Network (cuDNN) in Python (stephendiehl.com, 2025-05-15). Implement scaled dot product attention in Python using cuDNN API, leveraging FlashAttention-2 algorithm. Requires SM80 GPU architecture. Explores tensor operations, memory layout, and optimizing neural network computations
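For reference, the computation those fused kernels accelerate is plain scaled dot-product attention; this NumPy version shows only the math, not the cuDNN graph API used in the post.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q @ K^T / sqrt(d)) @ V, the operation FlashAttention-style kernels fuse."""
    d = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d)      # (..., seq_q, seq_k)
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 128, 64)) for _ in range(3))   # heads, seq, head_dim
out = scaled_dot_product_attention(Q, K, V)                       # shape (8, 128, 64)
```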
Boost 2-Bit LLM Accuracy with EoRA (towardsdatascience.com, 2025-05-15). EoRA enhances 2-bit LLMs by compensating quantization errors, improving performance with tools like GPTQ and AutoRound, and leveraging eigenspace low-rank approximation without requiring training, resulting in significant accuracy gains
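The core idea, compensating the quantization error W - W_q with a low-rank correction, can be sketched with a plain SVD; EoRA itself projects the error into an activation-weighted eigenspace, so treat this as a simplified illustration with toy numbers.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))            # original weight matrix (toy)

# Crude 4-level (2-bit-style) quantization as a stand-in for GPTQ/AutoRound
levels = np.quantile(W, [0.125, 0.375, 0.625, 0.875])
W_q = levels[np.abs(W[..., None] - levels).argmin(-1)]

# Rank-r compensation of the quantization error: W ~ W_q + A @ B, no training needed
U, S, Vt = np.linalg.svd(W - W_q, full_matrices=False)
r = 32
A, B = U[:, :r] * S[:r], Vt[:r]

print(np.linalg.norm(W - W_q))                 # error of the bare quantized weights
print(np.linalg.norm(W - (W_q + A @ B)))       # smaller error after the low-rank fix
```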
Writing an LLM from scratch, part 14 -- the complexity of self-attention at scale (gilesthomas.com, 2025-05-14). Exploring self-attention scaling in LLMs, focusing on O(n^2) complexity regarding context length, space requirements, and computational time for large models like GPT-4.1 and Gemini 1.5 Pro
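A quick back-of-the-envelope calculation makes the quadratic cost concrete; the context lengths and fp16 assumption below are illustrative, not figures from the article.

```python
# Memory for a single head's n x n attention-score matrix in fp16.
BYTES_PER_SCORE = 2   # fp16

for n in (2_048, 32_768, 131_072, 1_048_576):
    gib = n * n * BYTES_PER_SCORE / 2**30
    print(f"context {n:>9,}: {gib:10.2f} GiB per head per layer")

# Doubling the context quadruples this cost, which is why naive
# self-attention becomes impractical at very long context lengths.
```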
Adding a Transformer Module to a PyTorch Regression Network – Classic NLP Positional Encoding (jamesmccaffrey.wordpress.com, 2025-05-14). James D. McCaffrey explores adding a Transformer module and a custom Attention module to a PyTorch regression network, comparing classic and simplified positional encoding techniques
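For reference, the classic sinusoidal encoding compared in the post follows the original Transformer formula; this standalone sketch uses an arbitrary sequence length and model dimension.

```python
import torch

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(same angle)."""
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)   # (seq, 1)
    i = torch.arange(0, d_model, 2, dtype=torch.float32)            # (d/2,)
    angles = pos / torch.pow(10_000.0, i / d_model)                 # (seq, d/2)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(angles)
    pe[:, 1::2] = torch.cos(angles)
    return pe   # added to the inputs before the Transformer module

pe = sinusoidal_positional_encoding(seq_len=16, d_model=8)          # (16, 8)
```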
📚 Scholarly Research
LLMs get lost in multi-turn conversation (arxiv.org, 2025-05-15). Large Language Models (LLMs) struggle with maintaining coherence in multi-turn conversations, impacting their efficacy in natural language processing tasks involving dialogue exchanges
Large Language Models Are More Persuasive Than Incentivized Human Persuaders (arxiv.org, 2025-05-17). Large language models outperform incentivized human persuaders in persuasive tasks, demonstrating their efficacy within the realm of computational linguistics and language processing techniques
Will AI systems perform poorly due to AI-generated material in training data? (cacm.acm.org, 2025-05-16). Concerns arise over model collapse in AI systems as large language models increasingly train on AI-generated content, risking poorer performance and statistical mismatches between training and real-world data distributions
ComplexFormer: Disruptively Advancing Transformer Inference Ability via Head-Specific Complex Vector Attention (arxiv:cs, 2025-05-15). ComplexFormer introduces Complex Multi-Head Attention, enabling independent semantic and positional modeling via complex vectors. It employs per-head Euler transformations and adaptive rotations, outperforming baselines in language tasks with enhanced parameter efficiency
TAIJI: MCP-based Multi-Modal Data Analytics on Data Lakes (arxiv:cs, 2025-05-16). Proposes TAIJI, an MCP-based architecture for multi-modal data analytics in data lakes, integrating semantic operator hierarchies and AI-agent-powered NL2Operator translators to enhance efficiency, accuracy, and data freshness
Delta Attention: Fast and Accurate Sparse Attention Inference by Delta Correction (arxiv:cs, 2025-05-16). Delta Attention corrects distributional shifts in sparse attention, achieving 36% performance gains and 88% accuracy of quadratic attention on the 131K RULER benchmark while maintaining 98.5% sparsity and 32x speedup over Flash Attention 2
Statistical Modeling and Uncertainty Estimation of LLM Inference Systems (arxiv:cs, 2025-05-14). Proposes an Analytical with Learning Augmentation framework that combines analytical modeling and machine learning for robust statistical prediction and uncertainty estimation in diverse LLM inference workloads, achieving low median errors and adaptability
Campus AI vs Commercial AI: A Late-Breaking Study on How LLM As-A-Service Customizations Shape Trust and Usage Patterns (arxiv:cs, 2025-05-15). This study explores how user-salient customizations of Large Language Models as-a-Service affect trust and usage patterns among university users, setting the stage for a larger investigation comparing institutional LLMaaS with ChatGPT
Large Language Models for Cancer Communication: Evaluating Linguistic Quality, Safety, and Accessibility in Generative AI (arxiv:cs, 2025-05-15). This study evaluates five general-purpose and three medical Large Language Models using mixed-methods, revealing tensions between linguistic quality, safety, and accessibility in generating accurate cancer-related information for patient understanding
Do LLMs Memorize Recommendation Datasets? A Preliminary Study on MovieLens-1M (arxiv:cs, 2025-05-15). A study investigates memorization of the MovieLens-1M dataset by Large Language Models (GPT and Llama), analyzing its effect on recommendation performance and identifying the relationship between model architecture and memorization levels
About Generative AI
Our Generative AI newsletter covers the latest developments, trends, tools, and insights in AI research, LLMs and agentic applications. Each week, we curate the most important content from over 50,000 blogs and news sites so you don't have to spend hours searching.
Whether you're a beginner or expert in generative AI, our newsletter provides valuable information to keep you informed and ahead of the curve in this rapidly evolving field.
Subscribe now to join thousands of professionals who receive our weekly updates!