Generative AI: 3rd June 2025
📣 Headlines
• OpenAI is transitioning from its nonprofit roots to a for-profit model (theatlantic.com) amid investor pressures, while Jony Ive's collaboration with OpenAI on a new AI device has gained Laurene Powell Jobs' support (theguardian.com).
• Workers across creative industries are being replaced by AI tools like ChatGPT and Midjourney (theguardian.com), with AI systems like HireVue transforming job interviews (thewalrus.ca) and raising concerns about bias and dehumanization.
• Google rolled out automatic AI-generated email summaries in Gmail for Workspace users (theverge.com) and fixed a bug that led AI Overviews to incorrectly state the current year (techcrunch.com).
• The Data (Use and Access) Bill faces opposition as AI developers seek access to creative content (bbc.com), sparking intense debate about copyright and the livelihood of artists in the AI era.
• AI startups captured nearly half of all U.S. venture funding (crunchbase.com), with Grammarly securing $1 billion in nondilutive funding from General Catalyst (techcrunch.com).
• AI safety concerns emerged as RFK Jr.'s report appeared riddled with AI-generated errors (theverge.com) and Anthropic's Claude 4 demonstrated alarming whistle-blowing behavior (venturebeat.com), autonomously contacting authorities.
• Meta plans to roll out AI tools for brands to create ads by the end of 2026 (theguardian.com), enabling automated ad creation from a product image and a campaign budget.
• TikTokers are creatively impersonating Google's Veo 3 AI (arstechnica.com), while Odyssey introduced a unique interactive AI video generator (gizmodo.com) that streams real-time, lo-fi worlds without a game engine.
📰 Industry Commentary & News
The recent history of AI in 32 otters (oneusefulthing.org, 2025-06-01). Ethan Mollick explores AI advancements in image generation through the lens of otters, highlighting diffusion models, multimodal generation tools, and open weights models while demonstrating the evolution from simple prompts to detailed outputs
Large Language Models and Machine Learning: The Way Ahead (hdsr.mitpress.mit.edu, 2025-05-31). Large Language Models like ChatGPT revolutionized AI accessibility through natural language interaction, yet face challenges in energy consumption, model hallucinations, and uncertainty quantification, necessitating advancements in architectures and algorithms
Meta’s Llama Troubles - Sync #521 (humanityredefined.com, 2025-06-01). Meta faces challenges with its Llama AI models, while Nvidia reports strong earnings, and Tesla targets a June 12 launch for its robotaxi service. Gabe Newell's BCI startup emerges, and Amazon uses AI to monitor workers
Rising on arXiv - 2025-05-30 (blog.rinesi.com, 2025-06-02). Recent trends on arXiv include AI models, solar physics, hallucination detection in generative AI, humanoid locomotion, video generation, GUI agents, textual semantics, and spatial reasoning, reflecting advancements in these fields
Rising on arXiv - 2025-05-26 (blog.rinesi.com, 2025-05-28). Research topics gaining traction on arXiv include GRPO for reducing RL training costs, advances in Large Reasoning Models, Verifiable Rewards in RL, and a notable focus on Vietnamese language technologies
🛠️ LLM Tools & Philosophy
Large Language Models can run tools in your terminal with LLM 0.26 (simonwillison.net, 2025-05-27). LLM 0.26 introduces tool support in terminal environments, allowing access to Python functions and plugins from OpenAI, Anthropic, and more, enhancing model capabilities for tasks like calculations and web searches
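For a taste of the new Python-side API, a minimal sketch along the lines of the release announcement (the model name is an assumption; any tool-capable model should work):

```python
# Minimal sketch of LLM 0.26's tool support from Python
# (model name is an assumption; any tool-capable model works).
import llm

def multiply(x: int, y: int) -> int:
    """Multiply two numbers."""
    return x * y

model = llm.get_model("gpt-4.1-mini")
# chain() lets the model call the tool, then continue with the result
response = model.chain("What is 34234 * 213345?", tools=[multiply])
print(response.text())
```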
Thoughts on LLMs (funcall.blogspot.com, 2025-05-27). An exploration of LLMs and tools like GitHub Copilot and Google Gemini, including their integration with editors like Emacs, offers a glimpse into their potential and usage in future computing
Uses (adactio.com, 2025-05-27). The ethical concerns surrounding large language models lead to a cautious approach; they are deemed useful primarily for transformative tasks like prototyping rather than compositional work, which requires precision and judgment
Chats with the void—an AI check-in (fogknife.com, 2025-05-31). Exploring generative AI's role in creative work, the author reflects on using Anthropic's Claude for coding projects, emphasizing the balance between human insight and AI's assistance in refining thoughts and enhancing creativity
Sunday, June 01, 2025 (baty.net, 2025-06-01). Exploring the capabilities of LLMs, Baty compares their utility for coding tasks to the desktop publishing revolution of the late 1980s, emphasizing their role in empowering users despite limitations in producing professional-level code
LLMs Will Not Replace You (davidhaney.io, 2025-05-29). David Haney explores LLMs, likening them to The Mechanical Turk, and explains their workings, including neural networks, tokenization, and the immutable nature of LLMs, highlighting common misconceptions around their capabilities
🤖 AI Agents & Workflows
A Small Model Just for Structured Output (dbreunig.com, 2025-05-29). Osmosis-Structure-0.6B is a small model trained with reinforcement learning to extract structured JSON from unformatted text; delegating the formatting step to it in an efficient two-step approach improves benchmark performance for larger models
Paper Review: SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents (andlukyane.com, 2025-06-02). SWE-rebench develops a scalable pipeline for extracting 21,336 Python tasks from GitHub, utilizing LLMs for task assessment, and providing a contamination-free benchmark to enhance the evaluation of software engineering agents
Anatomy of an agent (june.kim, 2025-05-29). Exploring the evolution of AI agents, the piece discusses workflow automation tools like LangChain and n8n, emphasizing the shift from developer-dependent systems to user-configurable workflows built on state-of-the-art models
State-Of-The-Art Prompting For AI Agents (nlp.elvissaravia.com, 2025-05-30). Explore best practices for AI prompt engineering, including hyper-specific instructions, persona prompting, structured outputs, meta-prompting, and evaluation strategies to enhance AI agent performance
💻 Development & Integration
What your engineering team really needs from an AI model (leaddev.com, 2025-06-02). Technical leaders must strategically integrate LLMs like GPT-4.1 and Code Llama into workflows, focusing on code generation, testing support, and verification processes to enhance collaboration and mitigate risks
The Modern R Stack for Production AI (blog.stephenturner.us, 2025-06-02). R is now a strong contender in AI development, leveraging tools like ellmer, ollamar, gander, and ragnar for LLM interactions, NLP, and retrieval-augmented generation, revitalizing its role in modern data science
LLM function calling workflows (Part 1, OpenAI) (rakuforprediction.wordpress.com, 2025-06-01). This document details how to implement function calling workflows with OpenAI's LLMs using the Raku package 'WWW::OpenAI', including local function definition and response processing for weather data retrieval
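The post is Raku-specific, but the underlying protocol is the same in any client; here is the equivalent loop sketched with OpenAI's official Python SDK (the weather function is an illustrative stub, not the post's code):

```python
# The same function-calling workflow with the official Python client
# (the post uses Raku's WWW::OpenAI; get_weather is an illustrative stub).
import json
from openai import OpenAI

client = OpenAI()

def get_weather(city: str) -> str:
    return json.dumps({"city": city, "temp_c": 21, "conditions": "sunny"})  # stub data

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]
resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
call = resp.choices[0].message.tool_calls[0]

# Run the local function, hand the result back, and get the final answer
messages.append(resp.choices[0].message)
messages.append({"role": "tool", "tool_call_id": call.id,
                 "content": get_weather(**json.loads(call.function.arguments))})
final = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
print(final.choices[0].message.content)
```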
RLHF 101: A Technical Tutorial on Reinforcement Learning from Human Feedback (blog.ml.cmu.edu, 2025-06-01). This technical tutorial covers the RLHF pipeline, focusing on data generation, reward model inference, and filtering for AI alignment, utilizing tools like vllm, AutoTokenizer, and ArmoRMPipeline
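As a rough illustration of the reward-inference stage, a hedged sketch that swaps the tutorial's ArmoRMPipeline for a generic sequence-classification reward model (the model choice is illustrative, not the tutorial's):

```python
# Hedged sketch of reward-model inference for filtering generations
# (a generic sequence-classification reward model stands in for ArmoRMPipeline).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

rm_name = "OpenAssistant/reward-model-deberta-v3-large-v2"  # illustrative choice
tokenizer = AutoTokenizer.from_pretrained(rm_name)
reward_model = AutoModelForSequenceClassification.from_pretrained(rm_name)

prompt = "Explain why the sky is blue."
completion = "Sunlight scatters off air molecules; shorter blue wavelengths scatter most."

inputs = tokenizer(prompt, completion, return_tensors="pt")
with torch.no_grad():
    reward = reward_model(**inputs).logits[0].item()  # scalar score used for filtering
print(f"reward: {reward:.3f}")
```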
Wrangling LLM output with LangChain (mdneuzerling.com, 2025-06-02). LangChain integrates generative AI with Python for deterministic task handling, focusing on creating optimized kanban card titles through a multi-step process including title creation, refinement, and trimming
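A minimal sketch of such a multi-step chain using LangChain's expression language (the model name and prompts are assumptions, not the author's exact code):

```python
# Minimal sketch of a draft-then-refine chain in LCEL
# (model and prompt wording are assumptions for illustration).
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

draft = ChatPromptTemplate.from_template(
    "Write a kanban card title for this task description: {task}")
refine = ChatPromptTemplate.from_template(
    "Shorten this title to at most 8 words, imperative mood: {title}")

# Step 1 drafts a title; step 2 refines and trims it
chain = (draft | llm | StrOutputParser()
         | (lambda title: {"title": title})
         | refine | llm | StrOutputParser())

print(chain.invoke({"task": "Investigate intermittent 502s from the payments API"}))
```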
⚡ Infrastructure & Scaling
Fast Kernels (crfm.stanford.edu, 2025-05-28). Fast AI-generated CUDA-C kernels outperform expert-optimized PyTorch kernels. Utilizing KernelBench, the team demonstrates advanced techniques yielding significant performance improvements, suggesting a promising future for automated kernel generation
Scaling to Millions of Tokens with Efficient Long-Context LLM Training (developer.nvidia.com, 2025-06-02). NVIDIA NeMo Framework supports training LLMs with efficient long-context capabilities, utilizing techniques like activation recomputation, context parallelism, and CPU offloading to manage memory while scaling context lengths to millions of tokens
Scale LLM Inference with Multi-Node Infrastructure (rocm.blogs.amd.com, 2025-05-30). Horizontally scale LLM inference using MI300X nodes with vLLM, nginx, Prometheus, and Grafana to ensure high availability, performance monitoring, and effective resource management amid increasing computational demands
LLMs are Cheap (snellman.net, 2025-06-02). Juho Snellman argues that the operational costs of Large Language Models like Gemini 2.5 Flash are significantly lower than perceived, often cheaper than web search APIs, and explores implications for AI monetization and backend services
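The post's core argument is easy to sanity-check with back-of-envelope arithmetic; the token prices below are assumptions for illustration, not quoted rates:

```python
# Back-of-envelope cost per query. Prices are ASSUMED for illustration,
# roughly in line with published per-million-token rates for small "flash" models.
input_price_per_m = 0.15   # USD per 1M input tokens (assumed)
output_price_per_m = 0.60  # USD per 1M output tokens (assumed)

input_tokens, output_tokens = 500, 250  # a typical short Q&A exchange
cost = (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1e6
print(f"${cost:.6f} per query")  # ~$0.000225, well under typical search API fees
```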
Creating Embeddings with vLLM on MacOS (xaviergeerinck.com, 2025-05-27). Learn to generate embeddings on MacOS using vLLM, addressing installation issues with Triton, inspecting model architecture, and executing embedding generation with a detailed configuration script
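In outline, the workflow looks roughly like this (a hedged sketch; the task flag and method names follow recent vLLM releases and may differ by version):

```python
# Hedged sketch of embedding generation with vLLM
# (API per recent vLLM releases; model choice and flags may vary by version).
from vllm import LLM

llm = LLM(model="intfloat/e5-mistral-7b-instruct", task="embed", enforce_eager=True)

outputs = llm.embed(["What is retrieval-augmented generation?"])
vector = outputs[0].outputs.embedding  # list of floats, one per dimension
print(len(vector))
```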
🔍 RAG, Embeddings & Knowledge Systems
“The future is agents”: Building a platform for RAG agents (stackoverflow.blog, 2025-05-27). Douwe Kiela discusses the evolution of retrieval-augmented generation (RAG), highlighting challenges, the role of synthetic data, personalization in ranking, and future integrations of structured and unstructured data in AI applications
A visual introduction to vector embeddings (blog.pamelafox.org, 2025-05-28). Explore vector embeddings through visualizations and learn about models like word2vec and OpenAI's text-embedding-ada-002, focusing on dimensions, similarity metrics, compression techniques, and vector search methodologies
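For readers who want to poke at the ideas directly, a tiny self-contained example of the similarity metric the post visualizes (the toy vectors are made up for illustration):

```python
# Cosine similarity, the workhorse metric for comparing embeddings (pure NumPy).
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional "embeddings"; real models use hundreds to thousands of dims
cat = np.array([0.9, 0.1, 0.3, 0.0])
kitten = np.array([0.85, 0.15, 0.35, 0.05])
car = np.array([0.1, 0.9, 0.0, 0.4])

print(cosine_similarity(cat, kitten))  # close to 1.0: similar meanings
print(cosine_similarity(cat, car))     # much lower: unrelated concepts
```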
Using ‘Slop Forensics’ to Determine Model Ancestry (dbreunig.com, 2025-05-30). Drew Breunig explores slop forensics, a tool by Sam Paech, to analyze model ancestry through slop profiles, revealing shifts in data generation approaches among large language models
Agentic RAG Applications: Company Knowledge Slack Agents (towardsdatascience.com, 2025-05-30). Learn about building an AI knowledge agent for Slack using tools like LlamaIndex and Modal, focusing on data retrieval, embedding techniques, and system architecture to enhance information accessibility for employees
Transform Years of Content Into a Conversational Knowledge Base (blog.marcolancini.it, 2025-06-02). Transforming years of curated cloud security content into a conversational knowledge base using tools like Cloudflare Workers, AutoRAG, and a RAG pipeline for AI-driven querying
Retrieval-Augmented Generation (davetang.org, 2025-05-27). Dave Tang explores Retrieval-Augmented Generation (RAG) using Open WebUI, demonstrating how to create a knowledge base and integrate domain-specific information with Large Language Models via Docker
Data Science: Querying DnD Session Notes with Vector Databases and AI (strakul.blogspot.com, 2025-06-01). A data scientist builds a retrieval augmented generation application using vector databases and AI for DnD session notes, utilizing tools like Ollama and ChromaDB to improve query responses from gameplay records
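As a rough illustration of the retrieval half of such a setup, a minimal ChromaDB sketch (the collection name and session notes are invented; the post pairs retrieval with an Ollama-served model for the generation step):

```python
# Minimal retrieval sketch with ChromaDB (documents and names are invented;
# retrieved notes would be passed to an LLM as context for answering).
import chromadb

client = chromadb.Client()
notes = client.create_collection("dnd_sessions")

notes.add(
    ids=["s12", "s13"],
    documents=[
        "Session 12: the party negotiated with the goblin chief at Thornhollow.",
        "Session 13: ambush on the river barge; Kira lost the moonstone amulet.",
    ],
)

results = notes.query(query_texts=["Who has the moonstone amulet?"], n_results=1)
print(results["documents"][0][0])
```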
📊 LLM Performance & Benchmarking
Human coders are still better than LLMs (antirez.com, 2025-05-29). antirez argues that human coders still out-reason LLMs, recounting a tricky Redis problem involving HNSWs and murmur-128 hashing that he resolved with a creative solution Gemini 2.5 PRO failed to find
AutoThink – Boosts local LLM performance with adaptive reasoning (news.ycombinator.com, 2025-05-28). AutoThink enhances local LLMs by dynamically allocating thinking time based on query complexity, utilizing Pivotal Token Search for superior reasoning and efficiency, achieving notable performance improvements across various models
LLM Eval FAQ (hamel.dev, 2025-05-29). Common questions about evaluating LLM applications are addressed, covering RAG, model selection, and annotation tools, with emphasis on retrieval strategies and the importance of error analysis for developers
The Sequence Radar #554 : The New DeepSeek R1-0528 is Very Impressive (thesequence.substack.com, 2025-06-01). DeepSeek R1-0528 excels in math reasoning, achieving 87.5% on AIME 2025, while introducing features like 64K-token context and enhanced code synthesis capabilities, competing effectively with models like GPT-4
Reinforcement learning with random rewards actually works with Qwen 2.5 (interconnects.ai, 2025-05-27). Qwen 2.5 demonstrates that reinforcement learning with random rewards can boost MATH scores significantly, utilizing various reward strategies to enhance model capabilities beyond traditional methods
Comparing local LLMs for alt-text generation, round 2 (dri.es, 2025-05-27). Local LLMs like Mistral 3.1 and Gemma 3 were evaluated for alt-text generation, showing improved performance but still falling short compared to cloud models like GPT-4 in accuracy and detail recognition
DeepSeek’s Distilled New R1 AI Model Can Run on a Single GPU (cosmicmeta.io, 2025-05-29). DeepSeek's R1 AI model allows advanced language processing on a single GPU, utilizing techniques like distillation to provide high performance on consumer-grade hardware, enhancing accessibility, privacy, and cost-effectiveness for various users
🔬 Technical Architecture & Deep Dives
The Geometry of LLM Logits (an analytical outer bound) (rohan.ga, 2025-05-29). Explores the geometry of LLM logits, presenting an analytical outer bound using ellipsoids, Minkowski sums, and affine maps to characterize attainable logit vectors in Transformer networks
Learning Schrödinger bridges (danmackinlay.name, 2025-05-29). This notebook explores Schrödinger bridges as stochastic bridge processes to condition neural denoising diffusion models, highlighting connections to optimal transport and referencing multiple relevant works in generative modeling and score matching
Writing an LLM from scratch, part 15 -- from context vectors to logits; or, can it really be that simple?! (gilesthomas.com, 2025-05-31). Context vectors transform into logits via a simple linear transformation. This post explores the mechanics of weight tying, embeddings, matrix projection, and the conversion process in building language models
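The post's punchline lends itself to a few lines of PyTorch; here is a minimal sketch of the tied-weights projection (dimensions are illustrative):

```python
# The "can it really be that simple?" step: logits are a linear map of the
# final context vectors, here with weight tying to the token embedding matrix.
import torch

vocab_size, d_model, seq_len = 50_257, 768, 10
embedding = torch.nn.Embedding(vocab_size, d_model)

hidden = torch.randn(1, seq_len, d_model)  # context vectors from the last block
logits = hidden @ embedding.weight.T       # shape: (1, seq_len, vocab_size)
next_token = logits[0, -1].argmax()        # greedy pick for the next token
print(logits.shape, next_token.item())
```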
Adding a Transformer Module to a PyTorch Regression Network – No Numeric Pseudo-Embedding (jamesmccaffrey.wordpress.com, 2025-05-28). Explores integrating a Transformer module into a PyTorch regression network, highlighting the necessity of embedding for effectiveness, and presents demo code for a synthetic dataset experiment
Leaps in Thought (blog.raymond.burkholder.net, 2025-05-31). Exploration of diffusion models through associative memory reveals memorization-generalization dynamics and the emergence of spurious states akin to Hopfield networks, providing novel insights and empirical validation
How to Calculate LLM Model Parameter Size - Dense Model (hebiao064.github.io, 2025-06-01). Learn to calculate the parameter size of a dense large language model (LLM) like Qwen3-32B using its architecture, configuration file, and Python code from the Hugging Face transformers library
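The counting recipe for a dense decoder with grouped-query attention fits in a few lines; the config values below are placeholders chosen to land near 32B, not Qwen3-32B's actual configuration:

```python
# Parameter-count recipe for a dense decoder with grouped-query attention.
# Config values are PLACEHOLDERS, not Qwen3-32B's actual config file.
hidden, layers, vocab = 5120, 64, 151_000
heads, kv_heads = 40, 8
head_dim = hidden // heads
intermediate = 27_648  # FFN width (placeholder)

attn = hidden * heads * head_dim          # W_q
attn += 2 * hidden * kv_heads * head_dim  # W_k, W_v (shared across query groups)
attn += heads * head_dim * hidden         # W_o
ffn = 3 * hidden * intermediate           # gate, up, down (SwiGLU-style MLP)
norms = 2 * hidden                        # two RMSNorm weight vectors per layer

per_layer = attn + ffn + norms
total = layers * per_layer + vocab * hidden + hidden  # + input embedding + final norm
print(f"{total / 1e9:.1f}B parameters (an untied LM head adds vocab * hidden more)")
```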
📚 Academic Research
Cross-Attention Speculative Decoding (arxiv:cs, 2025-05-30). Budget EAGLE (Beagle) introduces a cross-attention-based Transformer decoder for speculative decoding, achieving competitive performance and efficiency while simplifying architecture and eliminating the need for auxiliary components in large language models
Curse of High Dimensionality Issue in Transformer for Long-context Modeling (arxiv:cs, 2025-05-28). Transformer models face inefficiencies in long-context modeling. This paper introduces Dynamic Group Attention (DGA), employing group coding strategies to optimize attention, reduce redundancy, and lower computational costs while maintaining performance
Efficient Large Language Model Inference with Neural Block Linearization (arxiv:cs, 2025-05-27). Neural Block Linearization (NBL) replaces self-attention in transformers with linear approximations, achieving 32% faster inference with minimal accuracy loss, utilizing Canonical Correlation Analysis for error measurement
Data-to-Dashboard: Multi-Agent LLM Framework for Insightful Visualization in Enterprise Analytics (arxiv:cs, 2025-05-29). A modular LLM framework automating the data-to-dashboard pipeline enhances insight generation and visualization, simulating business analysts' reasoning, with improved analytical depth across diverse datasets, outperforming traditional methods
FABLE: A Novel Data-Flow Analysis Benchmark on Procedural Text for Large Language Model Evaluation (arxiv:cs, 2025-05-30). FABLE is a benchmark evaluating LLMs on data-flow reasoning across cooking, travel, and planning domains, using eight analyses like reaching definitions and taint analysis, revealing performance variations among reasoning-focused and general-purpose models
Leveraging Interview-Informed LLMs to Model Survey Responses: Comparative Insights from AI-Generated and Human Data (arxiv:cs, 2025-05-28). This study uses LLMs like Claude and GPT to generate synthetic survey responses informed by interviews, revealing their potential to bridge qualitative and quantitative data, albeit with limitations in variability and psychometric fidelity
Simulating Training Data Leakage in Multiple-Choice Benchmarks for LLM Evaluation (arxiv:cs, 2025-05-30). This work evaluates leakage detection techniques for LLMs using permutation and n-gram methods in simulated scenarios, proposing a lightweight semi-half method, while recommending contamination checks for benchmark releases
Sustainable Carbon-Aware and Water-Efficient LLM Scheduling in Geo-Distributed Cloud Datacenters (arxiv:cs, 2025-05-29). A novel framework called SLIT optimizes LLM quality of service, carbon emissions, water usage, and energy costs using a machine learning metaheuristic for sustainable operation in geo-distributed cloud datacenters
About Generative AI
Our Generative AI newsletter covers the latest developments, trends, tools, and insights in AI research, LLMs and agentic applications. Each week, we curate the most important content from over 50,000 blogs and news sites so you don't have to spend hours searching.
Whether you're a beginner or expert in generative AI, our newsletter provides valuable information to keep you informed and ahead of the curve in this rapidly evolving field.
Subscribe now to join thousands of professionals who receive our weekly updates!