Generative AI

Tuesday 15th April, 2025

In the news

📰 LLM News & Announcements

Political Email Extraction Leaderboard (simonwillison.net, 2025-04-08). Derek Willis's leaderboard evaluates LLMs on their ability to extract committee names from political fundraising emails, using a benchmark of 1,000 emails and Ollama for JSON output; Gemini 2.5 Pro leads at 95.40%
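
The benchmark's core move, per the write-up, is asking a local model via Ollama to answer in JSON. A minimal sketch of that pattern; the model name, email text, and output schema here are illustrative, not Willis's exact setup:

```python
# Hedged sketch: extract a committee name as JSON via the ollama client.
import json
import ollama

email_body = "Chip in $5 before midnight to help our committee fight back..."

response = ollama.chat(
    model="llama3.2",  # any locally pulled model works
    messages=[{
        "role": "user",
        "content": "Return JSON with key 'committee' naming the political "
                   f"committee behind this fundraising email:\n\n{email_body}",
    }],
    format="json",  # constrain Ollama's output to valid JSON
)

print(json.loads(response["message"]["content"]))
```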

Meta got caught gaming AI benchmarks (theverge.com, 2025-04-08). Meta's Llama 4 release faces scrutiny after it used an 'experimental chat version' tuned to boost benchmark scores, skewing comparisons against OpenAI's GPT-4o and Google's Gemini 2.0 Flash

Meta got tricky with LLM benchmarks (birchtree.me, 2025-04-08). Meta's latest Llama models posted promising benchmarks, but accusations that the company gamed LMArena comparisons have drawn scrutiny, echoing biases long observed in consumer preference tests

AudioX: Diffusion Transformer for Anything-to-Audio Generation (zeyuet.github.io, 2025-04-14). AudioX is a unified Diffusion Transformer model enabling high-quality Anything-to-Audio and music generation, integrating diverse inputs like text and video with innovative multi-modal training strategies

First impressions of the new Gemini Deep Research (with 2.5 Pro) (mlops.systems, 2025-04-08). Google DeepMind's Gemini Deep Research tool is enhanced with the 2.5 Pro model, showcasing significant potential but emphasizing the need for unique research taste and control in implementation

Llama Does Not Look Good 4 Anything (thezvi.wordpress.com, 2025-04-09). Llama Scout and Llama Maverick face significant criticism for poor performance on benchmarks, raising concerns about Meta's AI credibility and the integrity of their comparative metrics and licensing policies

March 2025 month notes (rrees.me, 2025-04-13). Notes on JavaScript and Python libraries, digital sovereignty, cloud services, AWS Bedrock experimentation, Fleet IDE evaluation, and LLM coding practices, with discussion of development strategies and collaborative project maintenance

AiOS Dispatch 7 (rudrank.com, 2025-04-13). Explore AiOS Dispatch 7, featuring new AI tools like Quasar Alpha and Optimus Alpha for Swift development, and insights from Japanese developers on AI integration in workflows

🎙️ LLM Podcasts & Community

Instacart Speaker Series with Professor George Gui (tech.instacart.com, 2025-04-10). Professor George Gui discusses the challenges of using large language models for simulating counterfactual human behavior, highlighting pitfalls in experimental design and advocating for rethinking established protocols

Exploring the Biology of LLMs with Circuit Tracing with Emmanuel Ameisen - #727 (twimlai.com, 2025-04-14). Emmanuel Ameisen discusses mechanistic interpretability methods, circuit tracing, and the internal workings of large language models, showcasing their capabilities in language processing, mathematics, and creative writing while addressing their limitations and safety strategies

The TRUTH About Large Language Models and Agentic AI (with Andriy Burkov, Author "The Hundred-Page Language Models Book") (oneknightinproduct.podbean.com, 2025-04-08). Andriy Burkov explains large language models (LLMs), including their mathematical foundations, practical limits, and misconceptions around consciousness, while discussing their evolution from word2vec to modern architectures like transformers

RAPTOR: Recursive Thinking for Smarter AI Retrieval – The Executive Code Podcast (rajiv.com, 2025-04-11). Kirim and Rajiv discuss RAPTOR, a novel approach for improving AI information retrieval from long documents by creating a hierarchical tree of summaries, enhancing understanding for engineering teams and developer data analysis

e509 — Maverick and Marbles (gamesatwork.biz, 2025-04-14). Michael discusses AI tools like Siri and Meta's Llama 4, exploring concepts of generalization and polyglot language acquisition, alongside topics such as the Tapestry app and Zeiss's Holographic Transparent Display technology

💭 LLM Opinion Pieces

Use AI for Slow-Solve-Fast-Check tasks (atvbt.com, 2025-04-11). Explores the distinction between slow-solve-fast-check tasks and the utility of LLMs, highlighting how AI can assist in generating ideas and coding while noting limitations in critical reasoning tasks

The Real Moat Isn't Your Agent (blog.michalprzadka.com, 2025-04-10). Argues the real moat is effective AI evaluation, not the agent itself: assessing retrieval, tool use, generation, and agent trajectory prevents failures and builds customer trust

AI Soft Skills: The New Differentiator for Language Models (vincentschmalbach.com, 2025-04-10). AI soft skills, including iterative reasoning and adaptability, are now pivotal in evaluating language models' utility, as seen in Gemini 2.5 Pro and Anthropic's Claude 3 with their tool usage and multi-step problem-solving capabilities

AI Does Not Spark Joy (taoofmac.com, 2025-04-11). After experimenting with LLMs, the author finds minimal productivity gains, noting issues with model reliability, prompt engineering, and the ongoing challenge of correctly implementing AI tools in coding tasks

Raiding Parties (takeonrules.com, 2025-04-12). Reflections on using large language models (LLMs) in research and writing highlight their limits in truthfulness and reliability, likening their output to tarot readings and counseling a cautious approach to the technology

On generalist models (i.never.nu, 2025-04-10). The future of AI lies in generalist models, with diminishing returns from size increases and the importance of reinforcement learning for reasoning tasks, especially in areas like math, logic, and factual question answering

I'm tired of dismissive anti-AI bias (mattsayar.com, 2025-04-10). Matt Sayar expresses frustration with dismissive attitudes about AI, emphasizing its potential and utility while addressing misconceptions about large language models (LLMs) and their implications for education and job markets

🚀 LLM Deployment & Infrastructure

Splitting LLMs Across Multiple GPUs: Techniques, Tools, and Best Practices (digitalocean.com, 2025-04-10). Learn to efficiently split large language models (LLMs) across multiple GPUs using techniques like data and model parallelism, and tools such as PyTorch, Hugging Face Accelerate, DeepSpeed, and Megatron-LM
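
The first step the article describes, sharding one model's layers across every visible GPU, is nearly a one-liner with Hugging Face Accelerate. A minimal sketch; the model name is illustrative:

```python
# Hedged sketch: layer-wise model parallelism via device_map="auto".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-13b-hf"  # any causal LM on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",        # Accelerate spreads layers over gpu0, gpu1, ...
    torch_dtype=torch.float16,
)

inputs = tokenizer("Model parallelism lets a 13B model", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```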

LLM Knowledge Graph Builder Back-End Architecture and API Overview (medium.com/neo4j, 2025-04-14). Neo4j's LLM Knowledge Graph Builder integrates FastAPI, LangChain, and various document loaders to convert unstructured data into interactive knowledge graphs, enhancing information retrieval and conversational AI capabilities

The Impact of Schema Representation in the Text2Cypher Task (medium.com/neo4j, 2025-04-08). Explores how schema representation affects Text2Cypher performance using Neo4j's GraphRAG package, examining techniques like schema filtering and extraction, which enhance Cypher query generation while reducing token costs in LLM applications

Use any Python AI agent framework with free GitHub Models (blog.pamelafox.org, 2025-04-11). Explore the use of GitHub Models with popular Python AI frameworks like AutoGen, LangGraph, and Semantic Kernel for building AI agents, utilizing models such as gpt-4o and async OpenAI clients
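
The pattern from the post is to point a stock OpenAI client at GitHub's inference endpoint and authenticate with a GitHub token; each framework listed builds on that client. A sketch under those assumptions:

```python
# Hedged sketch: async OpenAI client against GitHub Models.
import asyncio
import os
from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="https://models.inference.ai.azure.com",
    api_key=os.environ["GITHUB_TOKEN"],  # a GitHub token, not an OpenAI key
)

async def main() -> None:
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Say hello from GitHub Models"}],
    )
    print(response.choices[0].message.content)

asyncio.run(main())
```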

Speed, Cost, Quality: Picking the Right Open Source LLM for Private, Self-Hosted Visual Data Exploration in Omniscope (visokio.com, 2025-04-09). Visokio explores self-hosted open-source LLMs, like DeepSeek's 70B LLaMA model, for private data exploration in Omniscope, highlighting advantages in speed, cost, and structured outputs while ensuring data privacy

R with RAGS: An Introduction to rchroma and ChromaDB (cynkra.com, 2025-04-10). Learn how to enhance R workflows with the rchroma package and ChromaDB, enabling real-time document retrieval for language models through vector-based searches and seamless integration with Docker
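
The post works in R through rchroma; for readers on the Python side, the same flow against a Dockerized Chroma server looks roughly like this (collection name and documents are illustrative):

```python
# Hedged sketch: Chroma's Python client mirroring the rchroma workflow.
import chromadb

client = chromadb.HttpClient(host="localhost", port=8000)  # Chroma in Docker
collection = client.get_or_create_collection("docs")

collection.add(
    ids=["doc1", "doc2"],
    documents=["R is great for statistics.", "Vector search powers RAG."],
)

results = collection.query(query_texts=["How does retrieval work?"], n_results=1)
print(results["documents"])
```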

Ingres vs Postgres MVCC Explained With Neo4j's LLM Knowledge Graph Builder (perlingresprogramming.blogspot.com, 2025-04-14). Exploring Ingres and Postgres MVCC using Neo4j's LLM Knowledge Graph Builder, which converts unstructured data into accurate knowledge graphs, surpassing traditional Retrieval-Augmented Generation techniques

Streaming with Pydantic AI (datastud.dev, 2025-04-08). Utilize Pydantic AI for streaming LLM responses with tool calls, leveraging async functions and event parsing techniques to organize and display relevant information effectively
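
Pydantic AI exposes streaming as an async context manager. A minimal sketch of its documented run_stream pattern, with an illustrative model name:

```python
# Hedged sketch: stream text deltas from a Pydantic AI agent.
import asyncio
from pydantic_ai import Agent

agent = Agent("openai:gpt-4o", system_prompt="Answer concisely.")

async def main() -> None:
    async with agent.run_stream("What does streaming buy an LLM UI?") as result:
        async for delta in result.stream_text(delta=True):
            print(delta, end="", flush=True)

asyncio.run(main())
```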

🔌 LLM Integration & Retrieval

An LLM Query Understanding Service (simonwillison.net, 2025-04-09). Doug Turnbull demonstrates using an open-source LLM on GPU-enabled Google Kubernetes Engine to convert search queries into structured data, for example transforming 'blue armchair' into a JSON object
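
A toy version of the idea: prompt any OpenAI-compatible endpoint (the post self-hosts a model on GKE; a local vLLM server stands in here) to map a free-text query onto structured fields, then parse the JSON:

```python
# Hedged sketch: query understanding as prompt-then-parse.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")  # e.g. vLLM

prompt = (
    'Turn this search query into JSON with keys "item_type" and "color". '
    "Reply with JSON only. Query: blue armchair"
)
resp = client.chat.completions.create(
    model="my-local-model",  # whatever the self-hosted server exposes
    messages=[{"role": "user", "content": prompt}],
)
print(json.loads(resp.choices[0].message.content))
# expected shape: {"item_type": "armchair", "color": "blue"}
```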

Quick Primer on MCP Using Ollama and LangChain (polarsparc.com, 2025-04-13). Model Context Protocol (MCP) enhances LLM app integration with enterprise tools like LangChain and Ollama, enabling seamless access to external data sources for automation and contextual knowledge retrieval

Knowledge graphs, part 1 (geldata.com, 2025-04-11). Knowledge graphs enhance retrieval-augmented generation (RAG) systems by integrating contextual data. Techniques include lexical and domain graphs, vector search, and graph algorithms for efficient query handling

New Life for Fielded Search (rsdoiel.github.io, 2025-04-10). Fielded searches can leverage smaller language models to parse queries into JSON objects, which traditional full-text search engines can process, while ensuring easier integration and privacy benefits, especially for static sites

Giving Summary Generation Some Agency (shekhargulati.com, 2025-04-08). Shekhar Gulati discusses dynamic summarization prompts for large language models, utilizing techniques like document analysis and prompt engineering to create tailored summaries of complex academic papers and video content

Four retrieval techniques to improve RAG you need to know (thoughtworks.com, 2025-04-14). Explore retrieval techniques like Corrective RAG, Self-RAG, RAG-fusion, and Fast GraphRAG to enhance LLM reliability and accuracy in generative AI, addressing data handling, context, and cost challenges
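
RAG-fusion's merging step is commonly reciprocal rank fusion (RRF): each document scores the sum of 1/(k + rank) across the ranked lists retrieved for several rewrites of the query. A self-contained sketch with toy rankings:

```python
# Reciprocal rank fusion over ranked lists of document IDs.
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Results retrieved for three rewrites of the same user question
rankings = [["d3", "d1", "d2"], ["d1", "d3", "d4"], ["d1", "d2", "d3"]]
print(reciprocal_rank_fusion(rankings))  # d1 and d3 rise to the top
```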

An Overview of Sesame’s Conversational Speech Model (digitalocean.com, 2025-04-11). Explore Sesame's Conversational Speech Model (CSM) designed for natural conversations using autoregressive transformers and audio tokenization techniques like Residual Vector Quantization, deployable on DigitalOcean's GPU Droplets

How to Chunk Text in JavaScript for Your RAG Application (medium.com/building-the-open-data-stack, 2025-04-10). Explore JavaScript libraries like llm-chunk, LangChain, and LlamaIndex for efficient text chunking in RAG applications, enhancing data handling for vector databases and language model integration
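
The article surveys JavaScript libraries; the underlying idea, fixed-size chunks with overlap so a thought isn't cut clean in half at a boundary, fits in a few lines (sketched here in Python for consistency with the other examples):

```python
# Naive fixed-size chunking with overlap, the baseline the libraries refine.
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    step = size - overlap
    return [text[start:start + size] for start in range(0, len(text), step)]

doc = "RAG pipelines split documents into chunks before embedding them. " * 40
chunks = chunk_text(doc)
print(f"{len(chunks)} chunks, each overlapping the next by 50 characters")
```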

🔧 LLM Algorithm Implementations

Implementing DeepSeek R1's GRPO algorithm from scratch (github.com, 2025-04-13). The GRPO algorithm trains large language models with reinforcement learning, sampling multiple responses per prompt and scoring them with rewards; the from-scratch build uses PyTorch and tokenizers with the Qwen2.5 model on the CountDown task
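
The heart of GRPO is that it needs no learned value model: rewards are normalized within each group of sampled responses to give per-response advantages. A sketch of that step as described in the DeepSeekMath formulation:

```python
# Group-relative advantages: normalize rewards within each sampled group.
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """rewards: (num_prompts, group_size), one scalar reward per response."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],   # two of four answers correct
                        [0.0, 0.0, 0.0, 1.0]])  # one of four correct
print(grpo_advantages(rewards))  # correct answers get positive advantage
```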

Shrink LLMs, Boost Inference: INT4 Quantization on AMD GPUs with GPTQModel (rocm.blogs.amd.com, 2025-04-09). Explore INT4 quantization techniques with GPTQModel on AMD GPUs to efficiently compress Large Language Models, reducing memory usage and enhancing inference speeds for improved deployment on resource-constrained systems
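
GPTQModel's quantization loop is short. A hedged sketch based on its documented load/quantize/save flow, with an illustrative model and calibration set (real calibration data should resemble the deployment distribution):

```python
# Hedged sketch: INT4 GPTQ quantization with GPTQModel.
from gptqmodel import GPTQModel, QuantizeConfig

model_id = "meta-llama/Llama-3.2-1B-Instruct"  # illustrative
calibration = ["Quantization trades a little accuracy for a lot of memory."] * 256

config = QuantizeConfig(bits=4, group_size=128)  # 4-bit weights, grouped scales
model = GPTQModel.load(model_id, config)
model.quantize(calibration)        # calibrate layer by layer on sample text
model.save("llama-3.2-1b-int4")    # reload later via GPTQModel.load(...)
```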

A quick first look at GPU programming in Mojo (fnands.com, 2025-04-12). Mojo now supports GPU programming on NVIDIA GPUs, letting developers write user-friendly kernels. The introduction covers image processing with mimage and techniques like FlashAttention for optimized machine learning

Making An LLM That Just Works For My Brother (tersesystems.com, 2025-04-13). GroundedLLM is a Docker Compose setup using Tavily and Google Gemini APIs, aimed at creating an effective search agent for personal use, particularly in food recipe retrieval and processing tool integration

Answer: Can you extract and summarize a blog? (searchresearch1.blogspot.com, 2025-04-10). Exploring how well LLMs retrieve recent blog content, the post critiques claims of real-time access, testing Gemini, Perplexity, and ChatGPT to highlight AI limitations in accuracy and reliability

Local LLMs (brendandawes.com, 2025-04-09). Brendan Dawes explores local LLMs using Ollama within Touch Designer, creating real-time Haikus about financial markets, influenced by Christopher Kopic's tutorial, integrating interaction design with type cutup techniques

🧠 LLM Mechanistic & Theoretical Insights

MLIR Part 7 - Transformers (stephendiehl.com, 2025-04-11). Transformers utilize self-attention, positional encodings, and multi-head attention to enhance natural language processing, enabling robust contextual awareness and emergent capabilities through models like GPT-2, without needing prior extensive context
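
Scaled dot-product attention is the mechanism the post builds everything else on: each token mixes in information from every other token, weighted by query-key similarity. A self-contained sketch:

```python
# Single-head self-attention over a toy sequence.
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # scaling keeps softmax well-behaved
    return softmax(scores) @ V               # attention-weighted mix of values

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))                 # 5 tokens, 16-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)   # (5, 16)
```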

Circuit Tracing: A Step Closer to Understanding Large Language Models (towardsdatascience.com, 2025-04-08). Circuit tracing unveils LLMs' internal logic by using transcoders to replace MLP computations. This method highlights feature activations while showing models predictively plan and reason, contributing to mechanistic interpretability

The Science Behind Embedding Models: How Vectors, Dimensions, and Architecture Shape AI… (medium.com/the-generator, 2025-04-11). Embedding models are crucial for NLP and AI, transforming text into high-dimensional vectors. The article explores architectures like Transformers, semantic similarity through cosine similarity, and innovations such as multilingual and hybrid embeddings
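
The comparison the article centers on is cosine similarity: direction, not magnitude, carries an embedding's meaning. A toy sketch with made-up 4-dimensional vectors (real models emit hundreds to thousands of dimensions):

```python
# Cosine similarity between embedding vectors.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

cat = np.array([0.9, 0.1, 0.3, 0.0])
kitten = np.array([0.8, 0.2, 0.4, 0.1])
invoice = np.array([0.0, 0.9, 0.0, 0.7])
print(cosine_similarity(cat, kitten))   # high: related meanings
print(cosine_similarity(cat, invoice))  # low: unrelated meanings
```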

Exploring LLMs as Agents: Local Models (starkravingfinkle.org, 2025-04-13). Mark Finkle explores setting up local large language models (LLMs), such as those from Ollama and Hugging Face, weighing performance, reasoning capabilities, and operational trade-offs compared to cloud models in his ToolAgent

🎓 LLM Research Studies

To Make Language Models Work Better, Researchers Sidestep Language (quantamagazine.org, 2025-04-14). Researchers explore latent-space reasoning for language models, building on GPT-2 with variants such as Coconut and a recurrent model to improve efficiency and accuracy without routing every reasoning step through natural language

Could LLMs help design our next medicines and materials? (news.mit.edu, 2025-04-09). MIT researchers developed Llamole, a multimodal tool combining large language models and graph-based AI to design synthesizable molecules based on user specifications, improving synthesis success rates from 5% to 35%

CMU Study Shows Large Language Models Have Distinctive Styles (cs.cmu.edu, 2025-04-10). A CMU study demonstrates that large language models can be distinguished by unique word choices and styles with 97% accuracy, highlighting implications for synthetic data use in AI training

GRPO Reinforcement Learning Explained (DeepSeekMath Paper) (aipapersacademy.com, 2025-04-14). DeepSeek's GRPO algorithm enhances RL training for mathematical reasoning models, leveraging PPO techniques and sampling multiple outputs for optimal responses, leading to significant improvements in domain-specific model capabilities

Directions in AI, Logic, LLMs and Learning: our Lab’s research Overview (medium.com/@vaishakbelle, 2025-04-09). Research explores LLM-based constraint generation, neural-symbolic integration, prediction, planning, hypothesis learning, and semantic analysis, with applications in fairness, explainability, robustness, and counterfactual reasoning

📚 LLM Architectural Research

Multihead self-attention in cortico-thalamic circuits (arxiv:q-bio, 2025-04-08). This work proposes that cortico-thalamic circuits can implement multihead self-attention, with superficial pyramidal cells encoding attention masks for deep cells, leveraging microcolumns to establish a correspondence with transformer networks

AgentAda: Skill-Adaptive Data Analytics for Tailored Insight Discovery (arxiv:cs, 2025-04-10). AgentAda is an LLM-powered analytics agent that learns analytics skills, using a question generator, RAG-based skill matcher, and code generator to extract insights, outperforming traditional methods in insight generation

UniCAIM: A Unified CAM/CIM Architecture with Static-Dynamic KV Cache Pruning for Efficient Long-Context LLM Inference (arxiv:cs, 2025-04-10). UniCAIM introduces a hybrid CAM/CIM architecture utilizing FeFETs for efficient long-context LLM inference, achieving concurrent static and dynamic KV cache pruning to enhance energy efficiency by up to 831x while maintaining high accuracy

SEA-LION: Southeast Asian Languages in One Network (arxiv:cs, 2025-04-08). Introducing Llama-SEA-LION-v3-8B-IT and Gemma-SEA-LION-v3-9B-IT, advanced multilingual LLMs supporting 11 Southeast Asian languages, leveraging extensive pre-training and fine-tuning, achieving state-of-the-art performance in the SEA language landscape

From Token to Line: Enhancing Code Generation with a Long-Term Perspective (arxiv:cs, 2025-04-10). The LSR-MCTS algorithm enhances code generation by proposing a line-by-line processing method, utilizing MCTS for optimal path selection, and integrating a self-refine mechanism to improve diversity and program quality

SeaView: Software Engineering Agent Visual Interface for Enhanced Workflow (arxiv:cs, 2025-04-11). SeaView is a tool designed for visualizing and analyzing the trajectories of SWE agents, assisting researchers in identifying issues and comparing experimental runs with varying hyper-parameters or LLMs

Revisiting LLM Evaluation through Mechanism Interpretability: a New Metric and Model Utility Law (arxiv:cs, 2025-04-10). This paper introduces the Model Utilization Index (MUI) to evaluate LLM effectiveness, highlighting mechanism interpretability and proposing a Utility Law alongside four corollaries addressing performance and fairness in evaluations
