Generative AI: 6th May 2025
In the news
- Judge questions Meta's fair use defense in AI training case, highlighting potential market harm and copyright infringement concerns in a pivotal lawsuit with authors.
- Meta unveils Llama API in partnership with Cerebras, delivering inference speeds up to 18 times faster than GPUs, while the company reports its Llama AI models have surpassed 1.2 billion downloads.
- OpenAI retires GPT-4, fully replacing it with the multimodal GPT-4o on April 30, 2025, and recently rolled back an update that had made ChatGPT responses overly agreeable and sycophantic.
- Microsoft releases Phi-4-reasoning models that outperform larger AI models despite their smaller size, optimized for local and mobile use with advanced reasoning capabilities.
- Google teases upcoming Gemini updates for I/O 2025, highlighting features like a more personalized assistant, camera sharing, and productivity tools, while facing scrutiny over its claims about Gemini's success at playing Pokémon.
- Ai2 introduces Olmo 2 1B, a 1-billion-parameter AI model that outperforms similar models from Google, Meta, and Alibaba on benchmarks, available with code and datasets on Hugging Face.
- Reasoning models are raising the bar for generative AI by simulating human reasoning, utilizing techniques like chain-of-thought training to solve complex problems more effectively than traditional LLMs.
- AI experts predict an 'Era of Experience' where self-learning agents will use reinforcement learning and autonomous interaction across applications, while Satya Nadella reveals AI currently generates 30% of Microsoft's code.
🔮 AI Trends & Industry Outlook
“Death by 1,000 Pilots” (oreilly.com, 2025-04-29). Companies struggle to transition AI pilots to production due to challenges with infrastructure, reliability, and effective tool integration, necessitating a new approach to software development and implementation of AI technologies
The Rise and Fall of Vector Databases (imarc.co.uk, 2025-05-05). The surge in vector databases for embedding-based similarity search may be waning as traditional retrieval methods show value, prompting providers to integrate broader functionalities beyond mere similarity search in RAG applications
Hybrid AI for Generating Programs: a Survey (gfrison.com, 2025-05-03). Hybrid AI combines symbolic AI and neural networks, like large language models, to enhance program synthesis through methods like programming by example and formal specifications for various applications
The evolution of AI products (lukew.com, 2025-04-30). AI products have evolved rapidly, shifting from behind-the-scenes machine learning to agent and retrieval-augmented interfaces, with tools like ChatGPT and Bench allowing natural language interaction and multi-tasking capabilities
⚙️ LLM Workflow & Observability
Automated version control for LLMs using DVC and CI/CD (circleci.com, 2025-04-29). Utilize CircleCI and DVC for automatic tracking of LLM experiments, versioning datasets, and model management, streamlining reproducibility and collaboration in machine learning workflows
Automating code deletion with Gemini (and a little Python) (technicalwriting.dev, 2025-04-29). Using Gemini 2.0 Flash and Python, the author automated the removal of docgen features from over 200 GN build files as part of migrating the Pigweed documentation build system from GN to Bazel; a minimal sketch of this batch-editing pattern appears at the end of this section
From Concept to Production with Observability in LLM Applications (hadijaveed.me, 2025-05-05). Understanding observability in LLM applications is crucial for performance tracking, with tools like LangChain and RAG for efficiency, alongside strategies for testing, tracing, and prompt management to enhance reliability
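Purely as an illustration of the batch-editing pattern in the Gemini item above, here is a minimal Python sketch assuming the google-generativeai client. The directory layout, prompt wording, and one-shot rewrite approach are assumptions for illustration, not the article's exact script, and any model output should be reviewed as a diff before committing.

```python
# Hypothetical sketch: batch-rewriting GN build files with Gemini 2.0 Flash.
import pathlib
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-2.0-flash")

PROMPT = (
    "Remove every docgen-related target from this GN build file. "
    "Return only the edited file contents, with no commentary.\n\n{content}"
)

# Assumed layout: GN build files named BUILD.gn under the repo root.
for path in pathlib.Path("pigweed").rglob("BUILD.gn"):
    original = path.read_text()
    response = model.generate_content(PROMPT.format(content=original))
    path.write_text(response.text)  # inspect `git diff` before committing
```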
🛠️ Practical LLM Engineering
Llasa: Llama-Based Speech Synthesis (llasatts.github.io, 2025-05-01). Llasa pairs a single-layer VQ codec with a Transformer architecture and studies scaling both train-time and inference-time compute for speech synthesis, demonstrably improving naturalness, prosody accuracy, and emotional expressiveness in generated audio
RustAssistant: Using LLMs to Fix Compilation Errors in Rust Code (microsoft.com, 2025-04-30). RustAssistant leverages Large Language Models to suggest fixes for Rust compilation errors, achieving 74% accuracy on open-source projects by utilizing detailed error parsing and iterative interactions between the LLM and the Rust compiler
Optimizing DeepseekV3 Inference on SGLang Using ROCm Profiling Tools (rocm.blogs.amd.com, 2025-05-01). Kernel-level profiling using ROCm tools like RocmProfileData and TorchProfiler enhances DeepseekV3 inference on SGLang, revealing bottlenecks in memory transfers and MoE layers for better performance on AMD GPUs
Streaming LLM Responses with Rails: SSE vs. Turbo Streams (aha.io, 2025-04-30). Explore techniques for streaming LLM responses in Rails applications using Server-Sent Events (SSE) and Turbo Streams, with practical examples and considerations for implementing POST requests in SSE; a generic SSE sketch appears at the end of this section
Adding a Transformer Module to a PyTorch Regression Network (jamesmccaffrey.wordpress.com, 2025-04-30). Exploring a Transformer module and custom Attention in a PyTorch regression network, using pseudo-embedding and simplified positional encoding to improve model performance; a minimal sketch of the pseudo-embedding idea also follows at the end of this section
Cognitive Architecture Patterns in Health Care for LLMs (hadijaveed.me, 2025-05-05). Explores cognitive architecture patterns in healthcare for LLMs, emphasizing hybrid retrieval systems, intent prediction, autonomy levels, and the need for human oversight to ensure patient safety and effective communication
LLM-Powered Search: o4-mini-high vs o3 vs Deep Research (alexop.dev, 2025-05-01). A practical benchmark comparing OpenAI models o4-mini-high, o3, and Deep Research for LLM-powered search, focusing on their speed, depth, accuracy, citations, and cost in tackling technical research questions
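Two sketches for this section. First, to illustrate the server-sent-events pattern from the Rails streaming item: the article is Rails-specific, so this is only a generic SSE endpoint sketched in Python with Flask as a stand-in, with a placeholder generator where a streaming LLM client would go.

```python
from flask import Flask, Response, request

app = Flask(__name__)

def fake_llm_stream(prompt: str):
    # Placeholder standing in for a streaming LLM client.
    for token in ["Streaming", " tokens", " one", " at", " a", " time."]:
        yield token

@app.post("/chat")  # POST endpoint, per the article's SSE considerations
def chat():
    prompt = request.get_json().get("prompt", "")
    def event_stream():
        for chunk in fake_llm_stream(prompt):
            # Each SSE frame is a "data: ..." line followed by a blank line.
            yield f"data: {chunk}\n\n"
        yield "data: [DONE]\n\n"
    return Response(event_stream(), mimetype="text/event-stream")
```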
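Second, for the PyTorch regression item: a minimal sketch of the pseudo-embedding idea, where each scalar input feature is lifted into a token vector and passed through a Transformer encoder. It uses PyTorch's built-in nn.TransformerEncoder rather than the author's custom Attention module, omits positional encoding, and the layer sizes are arbitrary assumptions.

```python
import torch
import torch.nn as nn

class TransformerRegressor(nn.Module):
    """Regression net where each scalar feature becomes a pseudo-embedded token."""
    def __init__(self, n_features: int, d_model: int = 32):
        super().__init__()
        self.embed = nn.Linear(1, d_model)  # pseudo-embedding: scalar -> d_model vector
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, dim_feedforward=64, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=1)
        self.head = nn.Linear(n_features * d_model, 1)

    def forward(self, x):                       # x: (batch, n_features)
        tokens = self.embed(x.unsqueeze(-1))    # (batch, n_features, d_model)
        encoded = self.encoder(tokens)          # (batch, n_features, d_model)
        return self.head(encoded.flatten(1))    # (batch, 1)

model = TransformerRegressor(n_features=8)
print(model(torch.randn(16, 8)).shape)          # torch.Size([16, 1])
```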
🧑‍💻 Personal LLM Experiences & Tooling
As an experienced LLM user, I don't use generative LLMs often (minimaxir.com, 2025-05-05). Experienced LLM user Max Woolf discusses limited use of generative LLMs, emphasizing prompt engineering, API access, and their role in coding, while critical of misrepresentation in authorship and hallucination issues
Evals are all you need (duarteocarmo.com, 2025-05-04). Duarte O.Carmo shares how he improved the accuracy of his food tracking app, Taralli, using evals, a golden dataset, and tools like OpenAI's o3 and DSPy to analyze nutritional data effectively
How I'm Using AI (vale.rocks, 2025-05-02). Personal experiences using Large Language Models (LLMs) like ChatGPT, Claude, and Gemini for coding, language learning, and writing feedback while recognizing their limitations and usability issues
I’m automating my Job as a Google Engineer with a custom-built Deep Research System - Here’s how to… (levelup.gitconnected.com, 2025-05-05). Automating Google Engineer tasks using a Deep Research System built on Genkit, Firebase, and Google APIs to dynamically extract technical questions and provide timely responses
You should have private evals (thundergolfer.com, 2025-05-03). Implementing private evaluations (evals) for LLMs is crucial for effective tool use, leveraging personal test prompts, automation, and specific categories like recommendations, reviews, and code, with tooling such as uv, Modal, and GitHub; a bare-bones eval-harness sketch follows at the end of this section
Humanities AI in 2025: Brief Reflections After a Conference (electrostani.com, 2025-04-30). Reflections on Humanities AI in 2025 highlight bias, environmental costs, social isolation, tools like OLLAMA and What’s in My Big Data, and a shift from commercial to open-access models for academic research
Fighting LLMs with LLMs (itzlambda.com, 2025-05-04). Using LLM-powered tooling such as Firecrawl and the LLM CLI to summarize content, this piece outlines a personal approach to cutting through information noise by leveraging language models
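As a companion to the private-evals item above, here is a bare-bones harness sketch assuming the OpenAI Python client; the golden prompts, the substring-match scoring rule, and the model name are illustrative assumptions, whereas the post itself builds its harness around uv, Modal, and GitHub.

```python
# Minimal private-eval harness sketch (not the post's exact tooling).
import json
from openai import OpenAI

client = OpenAI()

GOLDEN = [  # personal test prompts with an expected substring in the answer
    {"prompt": "Recommend a fast JSON library for Python", "expect": "orjson"},
    {"prompt": "Which git command rewrites a branch onto a new base?", "expect": "rebase"},
]

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # swap in whichever model you are evaluating
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def run_evals() -> float:
    passed = 0
    for case in GOLDEN:
        ok = case["expect"].lower() in ask(case["prompt"]).lower()
        passed += ok
        print(json.dumps({"prompt": case["prompt"], "pass": ok}))
    return passed / len(GOLDEN)

if __name__ == "__main__":
    print(f"pass rate: {run_evals():.0%}")
```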
🧠 Knowledge Graphs & RAG Techniques
Build and Query Knowledge Graphs with LLMs (towardsdatascience.com, 2025-05-02). Explore Knowledge Graphs built with Neo4j, LangChain, and LLMs for effective document ingestion and query strategies, enhancing semantic search and inter-document understanding
Agentic GraphRAG for Commercial Contracts (medium.com/neo4j, 2025-05-05). Implementing Agentic GraphRAG and LangGraph in Neo4j enhances legal contract analysis by structuring data in a knowledge graph for accurate retrieval and insights, overcoming limitations of traditional retrieval methods
Vector Search, Hybrid Search, and Graph RAG: Understanding the Differences (medium.com/building-the-open-data-stack, 2025-05-02). Explore vector search, hybrid search, and graph RAG for AI applications, emphasizing DataStax Astra DB's capabilities for different search strategies and seamless integration of real-time data
Exploring text-to-SQL (blog.nilenso.com, 2025-04-30). Natural language to SQL solutions, utilizing large language models, aim to reduce query development time for analysts. Techniques like zero-shot prompting and in-context learning are explored using the Bird-Bench SQL dataset for evaluation
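The zero-shot prompting approach from the text-to-SQL item can be sketched in a few lines; the schema, prompt wording, and model name below are assumptions rather than the post's actual setup, and generated SQL should be validated (or run read-only) before trusting it.

```python
# Zero-shot text-to-SQL sketch.
from openai import OpenAI

client = OpenAI()

SCHEMA = """
CREATE TABLE customers (id INTEGER, name TEXT, country TEXT);
CREATE TABLE orders (id INTEGER, customer_id INTEGER, total NUMERIC, created_at DATE);
"""

def text_to_sql(question: str) -> str:
    prompt = (
        "Given this SQLite schema:\n" + SCHEMA +
        "\nWrite one SQL query that answers the question. Return only SQL.\n"
        f"Question: {question}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep outputs stable for evaluation
    )
    return resp.choices[0].message.content.strip()

print(text_to_sql("Total order revenue per country in 2024?"))
```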
🆓 Open-Source LLM Innovations
Bamba: An open-source LLM that crosses a transformer with an SSM (research.ibm.com, 2025-04-29). Bamba, IBM's open-source LLM, merges transformer architecture with state-space models (SSMs), enhancing inference speed and reducing memory requirements, promising significant performance improvements for long sequence processing
TScale – distributed training on consumer GPUs (github.com, 2025-05-04). TScale enables distributed training of large language models on consumer GPUs using C++ and CUDA, featuring async training of 1.5B and 1T models, with tools for building and data handling
Large Language Models (taoofmac.com, 2025-05-01). Explores recent large language models and tools, including function calling, multi-modal systems, and various applications like Auto-GPT, Code Assistants, and innovative frameworks for model management and RAG operations
Do we need a large model to generate good code? (metrics.blogg.gu.se, 2025-05-02). Code generation is prevalent in software engineering, yet large models like GitHub Copilot incur costs, energy use, and security risks, prompting exploration of smaller models like Phi-4, which achieve competitive performance
🔬 Advanced LLM Training Methods
How reinforcement learning improves DeepSeek performance (developers.redhat.com, 2025-04-29). Reinforcement learning enhances DeepSeek-R1's performance via Group Relative Policy Optimization for complex reasoning, training without supervised fine-tuning and employing chain-of-thought reasoning for improved decision-making; a sketch of the group-relative advantage appears at the end of this section
A better training method for reinforcement learning with human feedback (amazon.science, 2025-05-02). A new method called SeRA enhances reinforcement learning with human feedback by contrasting training pairs with large reward differences, improving performance of direct-alignment algorithms by 20%-40% while mitigating spurious correlations
DeepSeek GRM and SPCT - Complex Domain Rewards (hlfshell.ai, 2025-05-03). DeepSeek introduces GRM and SPCT for enhancing AI training through tailored critique generation and scoring, addressing challenges in complex domains beyond rudimentary pass/fail evaluations
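To make the "group relative" part of GRPO (from the DeepSeek item above) concrete, here is a tiny sketch of the advantage computation it is built around: rewards for a group of completions sampled from the same prompt are normalized against that group's own mean and standard deviation, so no separate value network is needed. The reward values are made up, and the clipped policy-gradient loss and KL penalty of a full implementation are omitted.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """rewards: (group_size,) scores for completions sampled from the same prompt."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# e.g. rule-based pass/fail rewards on a reasoning problem (illustrative values)
rewards = torch.tensor([0.0, 1.0, 1.0, 0.0, 1.0])
print(group_relative_advantages(rewards))
# Completions scoring above the group mean get a positive advantage, below it negative.
```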
📝 Analysis & Critique of LLM Benchmarks & Behavior
Understanding the recent criticism of the Chatbot Arena (simonwillison.net, 2025-04-30). The Chatbot Arena's evaluation method faces scrutiny over potential biases and gaming, highlighted by a new paper revealing issues like model variant cherry-picking that distorts leaderboard accuracy and ranking credibility
GPT-4o Sycophancy Post Mortem (thezvi.wordpress.com, 2025-05-05). OpenAI's GPT-4o exhibited increased sycophancy, leading to an analysis revealing insufficient sycophancy evaluations and flaws in user feedback mechanisms, prompting discussions on reinforcement learning and supervised fine-tuning balance
Even good leaderboards may not be useful, because they are gamed (ehudreiter.com, 2025-05-05). Even respected LLM benchmarks like SWE-Bench and Chatbot Arena are being manipulated by leading vendors to skew results, which detracts from their reliability and the focus on real-world utility in AI
🛡️ AI Security & Ethical Concerns
AI-generated code could be a disaster for the software supply chain. Here’s why. (arstechnica.com, 2025-04-29). AI-generated code is increasing exposure to supply-chain attacks, with 19.7% of dependencies in generated code pointing to non-existent packages, opening the door to dependency-confusion attacks that register the hallucinated package names; a simple existence-check sketch appears at the end of this section
Structuring Applications to Secure the KV Cache (developer.nvidia.com, 2025-04-29). Dynamic prompts in LLMs can introduce security risks, particularly with KV caching optimizations. Structuring inputs thoughtfully and isolating caches can mitigate potential leaks of sensitive information across users in multitenant environments
9 Security Threats in Generative AI Agents (infosecwriteups.com, 2025-05-01). Generative AI agents face security threats, including reasoning path hijacking, objective function corruption, memory poisoning, and unauthorized actions, highlighted in the ATFAA framework for enhanced security strategies across autonomous systems
LLM safety & security: NeurIPS 2024 insights (medium.com/capital-one-tech, 2025-04-30). NeurIPS 2024 showcased breakthroughs in LLM safety, including BackdoorAlign, AutoDefense, WildTeaming, DeepInception, and GuardFormer, aiming to mitigate threats like jailbreak attacks and enhance resilience in AI systems
CTIBench: Evaluating LLMs in Cyber Threat Intelligence with Nidhi Rastogi - #729 (twimlai.com, 2025-04-29). Nidhi Rastogi discusses CTIBench, a benchmark for evaluating LLMs in Cyber Threat Intelligence, addressing advantages, challenges, and techniques like Retrieval-Augmented Generation for threat detection and defense in cybersecurity
Ethical Concerns in Large Language Models: Bias, Privacy & Misinformation (howtolearnmachinelearning.com, 2025-04-30). Large Language Models (LLMs) present ethical challenges, including algorithmic bias, privacy risks, and misinformation, necessitating responsible AI development through techniques like differential privacy and fine-tuning to mitigate these issues
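Related to the hallucinated-dependencies item at the top of this section, one cheap guard is to check that every LLM-suggested package actually exists on the index before installing it. This sketch queries PyPI's public JSON endpoint; the package list is illustrative, and existence alone is not enough, since an attacker may have pre-registered a commonly hallucinated name, so pinning and reviewing any new dependency still matters.

```python
# Check whether LLM-suggested Python dependencies exist on PyPI before installing.
import requests

def exists_on_pypi(package: str) -> bool:
    resp = requests.get(f"https://pypi.org/pypi/{package}/json", timeout=10)
    return resp.status_code == 200

suggested = ["requests", "totally-made-up-http-lib"]  # e.g. imports from generated code
for pkg in suggested:
    status = "found" if exists_on_pypi(pkg) else "NOT FOUND - possibly hallucinated"
    print(f"{pkg}: {status}")
```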
📜 Historical & Conceptual Reflections
When ChatGPT broke the field of NLP: An oral history (quantamagazine.org, 2025-05-01). Natural language processing has dramatically changed with the introduction of transformers like BERT and GPT, leading to a surge of research and heated debates on understanding and meaning in AI-generated text
Why Language Models Are So Hard To Understand (quantamagazine.org, 2025-04-30). AI researchers explore the complexities of large language models using techniques from neuroscience, revealing the challenges of understanding their inner workings and the nature of their emergent behaviors
Not so Deep Thoughts about Deep AI (crookedtimber.org, 2025-04-30). John Q explores the impact of LLM technology and tools like Rewind and DeepResearch, assessing their capabilities in automating research and summarizing tasks while debating the future of AI and its economic implications
The Emerging Symphony between Transformers, SSMs, KAN, RNNs, and Diffusion Models (medium.com/intuitionmachine, 2025-05-03). Carlos E. Perez explores how Transformers, SSMs, KANs, RNNs, and Diffusion Models together in the Quaternion Process Theory (QPT) create a synergistic AI ecosystem, enhancing capabilities through emergent meta-cognition and self-modifying graphs
Language Models and Latent Space. (languagehat.com, 2025-04-29). Research indicates that large language models (LLMs) can operate in latent space, which may enhance their reasoning capabilities while bypassing traditional language processing constraints
Two lessons from ICLR 2025 (leon.bottou.org, 2025-05-01). Insights from ICLR 2025 urge researchers to evaluate current AI capabilities versus industry hype, emphasizing the importance of open models and cultural impact over closed, proprietary systems
🧑‍🏫 Academic & Scholarly Articles
LLMs for Engineering: Teaching Models to Design High Powered Rockets (arxiv.org, 2025-04-30). LLMs are applied to a concrete engineering task, the design of high-powered rockets, testing how well current models handle physically grounded design problems
Matrix-vector multiplication implemented in off-the-shelf DRAM for Low-Bit LLMs (arxiv.org, 2025-05-04). MVDRAM enables General Matrix-Vector Multiplication (GeMV) execution in standard DRAM, significantly accelerating low-bit Large Language Models (LLMs) without requiring hardware modifications
How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias (arxiv:cs, 2025-05-02). This study analyzes how a one-layer transformer learns to recognize regular languages, highlighting distinct training dynamics in tasks like 'even pairs' and 'parity check' using attention and linear layers under gradient descent
Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions (arxiv:cs, 2025-05-01). This survey categorizes memory in AI into types like parametric and contextual, and identifies six operations—Consolidation, Updating, Indexing, Forgetting, Retrieval, and Compression—mapping them to relevant research topics and future directions
Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition (arxiv:cs, 2025-04-29). Low-Rank Sparse Attention (Lorsa) enables clearer comprehension of multi-head self-attention behaviors, discovering arithmetic-specific heads and demonstrating better circuit-discovery properties than Sparse Autoencoders within the Transformer architecture
Towards Cross-Modality Modeling for Time Series Analytics: A Survey in the LLM Era (arxiv:stat, 2025-05-05). This survey explores cross-modality modeling for time series analytics using Large Language Models, classifying approaches, summarizing strategies like alignment and fusion, and suggesting future research directions across various application domains
Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing (arxiv:cs, 2025-05-01). Mixture of Sparse Attention (MoSA) employs dynamic, learned content-based sparsity to reduce self-attention's computational complexity from O(T^2) to O(k^2 + T), outperforming dense baselines in perplexity and resource usage; a toy sketch of the expert-choice selection idea follows at the end of this section
LLMPrism: Black-box Performance Diagnosis for Production LLM Training Platforms (arxiv:cs, 2025-05-01). LLMPrism is a black-box performance diagnosis system for LLM training platforms, utilizing network flow data for non-intrusive monitoring, with a timeline reconstruction error of 0.3% and effective performance issue diagnosis
A Survey of Slow Thinking-based Reasoning LLMs using Reinforced Learning and Inference-time Scaling Law (arxiv:cs, 2025-05-05). This survey examines slow thinking-based reasoning LLMs employing test-time scaling, reinforced learning, and structured frameworks, highlighting advancements in computational efficiency for complex tasks like medical diagnosis and multi-agent debates
Fact-Consistency Evaluation of Text-to-SQL Generation for Business Intelligence Using Exaone 3.5 (arxiv:cs, 2025-04-30). A Fact-Consistency Evaluation Framework assesses LLM-generated SQL outputs using Exaone 3.5. The benchmark includes 219 business questions with varied complexity, revealing performance disparities and the need for validation layers in BI systems
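To illustrate the expert-choice routing idea behind MoSA at a toy level (a simplification, not the paper's architecture): each sparse head scores every token, keeps only its top-k, and attends within that subset, so the quadratic term involves k rather than T. The single router vector and the shared projection (tokens attend over their raw representations) are assumptions made to keep the sketch short.

```python
import torch
import torch.nn.functional as F

def sparse_head(x: torch.Tensor, w_route: torch.Tensor, k: int) -> torch.Tensor:
    """x: (T, d) token representations; w_route: (d,) router weights for one head.
    Scoring all T tokens is the O(T) term; attention over k chosen tokens is O(k^2)."""
    scores = x @ w_route                     # (T,) content-based routing score
    chosen = scores.topk(k).indices          # expert-choice: the head picks its tokens
    xs = x[chosen]                           # (k, d) selected tokens
    attn = F.softmax(xs @ xs.T / xs.shape[-1] ** 0.5, dim=-1)  # (k, k)
    out = torch.zeros_like(x)
    out[chosen] = attn @ xs                  # scatter the head's output back
    return out

x = torch.randn(128, 64)                     # T=128 tokens, d=64
y = sparse_head(x, torch.randn(64), k=16)    # quadratic cost only over 16 tokens
```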
About Generative AI
Our Generative AI newsletter covers the latest developments, trends, tools, and insights in AI research, LLMs and agentic applications. Each week, we curate the most important content from over 50,000 blogs and news sites so you don't have to spend hours searching.
Whether you're a beginner or expert in generative AI, our newsletter provides valuable information to keep you informed and ahead of the curve in this rapidly evolving field.
Subscribe now to join thousands of professionals who receive our weekly updates!