Generative AI

Tuesday 25th March, 2025

Subscribe to this newsletter!

Newsletters sent once a week, unsubscribe anytime.

In the news

⚙️ LLM Internals & Development

PyTorch Internals: Ezyang's Blog (blog.ezyang.com, 2025-03-22). An overview of PyTorch internals, focusing on tensors, automatic differentiation, strides, and extension points to aid contributions to the complex C++ codebase
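
As a quick illustration of the strides concept the post covers, here is a minimal PyTorch snippet (mine, not from the article) showing how a transpose changes strides without copying data:

```python
import torch

# A contiguous 2x3 tensor: moving one row skips 3 elements, one column skips 1.
x = torch.arange(6, dtype=torch.float32).reshape(2, 3)
print(x.stride())         # (3, 1)

# Transposing just swaps the strides; the underlying storage is untouched.
y = x.t()
print(y.stride())         # (1, 3)
print(y.is_contiguous())  # False

# .contiguous() materializes a copy laid out to match the new logical shape.
z = y.contiguous()
print(z.stride())         # (2, 1)
```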

Inside ChatGPT: How AI Understands and Generates Language (louisbouchard.ai, 2025-03-22). Explains how ChatGPT works, covering next-token prediction, LLMs, neural networks, transformers, attention mechanisms, tokenization, embeddings, training phases, and methods like reinforcement learning for enhanced accuracy
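
To make the next-token-prediction idea concrete, here is a toy sketch (my own, with made-up logits standing in for a real model's output) of how raw scores become a sampled token:

```python
import torch

vocab = ["the", "cat", "sat", "on", "mat"]
# Pretend these are the model's raw scores (logits) for the next token.
logits = torch.tensor([2.0, 0.5, 1.5, 0.1, 0.3])

# Softmax turns logits into a probability distribution over the vocabulary.
probs = torch.softmax(logits, dim=-1)

# Sampling (rather than always taking the argmax) is what makes output varied.
next_id = torch.multinomial(probs, num_samples=1).item()
print(vocab[next_id], probs.tolist())
```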

Writing an LLM from scratch, part 10 -- dropout (gilesthomas.com, 2025-03-19). Dropout in LLM training forces knowledge to spread across the model rather than concentrate in a few weights; the post implements it with PyTorch's Dropout class, typically at a 10-15% rate on attention scores, to improve generalization
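
A minimal sketch of the idea, assuming the common pattern of applying nn.Dropout to attention weights after the softmax (details will differ from the series' own code):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.1)   # ~10% of attention weights zeroed during training

q = torch.randn(1, 4, 8)   # (batch, seq_len, head_dim)
k = torch.randn(1, 4, 8)
v = torch.randn(1, 4, 8)

scores = q @ k.transpose(-2, -1) / (8 ** 0.5)
weights = torch.softmax(scores, dim=-1)

drop.train()                # dropout is active only in training mode
weights = drop(weights)     # surviving weights are rescaled by 1/(1-p)
context = weights @ v
print(context.shape)        # torch.Size([1, 4, 8])
```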

Dynamic Token Merging for Efficient Byte-level Language Models with Julie Kallini - #724 (twimlai.com, 2025-03-24). Julie Kallini discusses her papers on MrT5's dynamic token merging, highlighting its efficiency in byte-level language modeling and solutions for under-resourced languages, alongside insights into impossible language model architectures

How I force LLMs to generate correct code (claudio.uk, 2025-03-20). Claudio Santini introduces Unvibe, a Python library that pairs unit tests with LLMs to generate valid code within complex codebases, optimizing for correctness via Monte Carlo Tree Search and fitting into existing developer workflows

The logit lens can be deceptive if not used properly (soniajoseph.ai, 2025-03-19). The logit lens, while a convenient tool for investigating neural network representations, can mislead researchers due to alignment issues with output space, emphasizing the importance of linear probes for accurate internal representation analysis
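
Roughly, the logit lens projects an intermediate hidden state through the model's own output head, while a trained linear probe learns its own projection. A schematic sketch with invented shapes (not the post's code):

```python
import torch
import torch.nn as nn

d_model, vocab = 64, 1000
hidden = torch.randn(d_model)             # hidden state at some middle layer

# Logit lens: reuse the model's unembedding matrix on an intermediate layer.
# This assumes intermediate representations already live in the output space --
# the alignment issue the post warns about.
W_unembed = torch.randn(vocab, d_model)
logit_lens_preds = W_unembed @ hidden

# Linear probe: a separate projection trained on (hidden state, label) pairs,
# so it can adapt to whatever basis the intermediate layer actually uses.
probe = nn.Linear(d_model, vocab)
probe_preds = probe(hidden)
print(logit_lens_preds.shape, probe_preds.shape)
```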

⚡ Performance & Optimization

Jagged Flash Attention Optimization (shaped.ai, 2025-03-18). Jagged Flash Attention combines jagged tensors and flash attention, achieving up to 9x speedup and 22% memory reduction in recommendation systems, utilizing TorchRec for efficient sparse data handling and dynamic tensor operations

Speculative Decoding - Deep Dive (rocm.blogs.amd.com, 2025-03-24). Performance improvements utilizing speculative decoding with Llama models on AMD MI300X GPUs, showing up to 2.31x speedup across various input sizes and datasets in LLM online serving
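
The accept/reject loop behind speculative decoding fits in a few lines; this toy version (token IDs and "models" are placeholders, not the AMD setup) just shows the draft-then-verify control flow:

```python
def speculative_step(draft_model, target_model, prefix, k=4):
    """Draft k tokens with a small model, then verify them against the large one."""
    ctx = list(prefix)
    drafts = []
    for _ in range(k):
        tok = draft_model(ctx)              # cheap model proposes tokens
        drafts.append(tok)
        ctx.append(tok)

    # Verification: in a real system the target model scores every drafted
    # position in a single forward pass; here we just compare token by token.
    accepted = []
    for tok in drafts:
        expected = target_model(list(prefix) + accepted)
        if expected != tok:
            accepted.append(expected)       # first mismatch: keep the target's token
            break
        accepted.append(tok)                # match: the drafted token is "free"
    return accepted

# Toy stand-ins: the target is ground truth, the draft is right most of the time.
target = lambda ctx: (sum(ctx) + 1) % 10
draft = lambda ctx: (sum(ctx) + 1) % 10 if len(ctx) != 5 else 3
print(speculative_step(draft, target, prefix=[1, 2, 3]))  # [7, 4, 8]: two drafts accepted
```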

How FlashMLA Cuts KV Cache Memory to 6.7% (louisbouchard.ai, 2025-03-19). FlashMLA by DeepSeek uses low-rank compression for the KV cache, shrinking it to 6.7% of its original size on Hopper GPUs and improving memory efficiency in large language models without sacrificing speed
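
The low-rank compression idea is roughly: cache a small latent per token instead of full keys and values, and expand on the fly. A shape-only sketch (dimensions invented for illustration, not DeepSeek's actual configuration):

```python
import torch
import torch.nn as nn

d_model, d_latent, seq_len = 1024, 64, 512          # illustrative sizes only

down = nn.Linear(d_model, d_latent, bias=False)     # compress per-token state
up_k = nn.Linear(d_latent, d_model, bias=False)     # re-expand to keys
up_v = nn.Linear(d_latent, d_model, bias=False)     # re-expand to values

h = torch.randn(seq_len, d_model)

# Cache only the latent: d_latent floats per token instead of 2 * d_model.
kv_cache = down(h)
k, v = up_k(kv_cache), up_v(kv_cache)

full = seq_len * 2 * d_model
compressed = seq_len * d_latent
print(f"cache size: {compressed / full:.1%} of the full KV cache")
```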

4 Learnings From Load Testing LLMs (blog.christianposta.com, 2025-03-19). Load testing LLMs reveals key concepts: using real prompts for accuracy, implementing a ramp-up period for realism, adjusting concurrency for GPU usage, and capturing performance metrics like Time To First Token (TTFT)
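
For TTFT in particular, the measurement is just a timestamp before the request and one at the first streamed chunk. A hedged sketch using the OpenAI-compatible streaming interface (endpoint and model name are placeholders, not the post's setup):

```python
import time
from openai import OpenAI  # assumes an OpenAI-compatible endpoint is running

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")  # placeholder

start = time.perf_counter()
stream = client.chat.completions.create(
    model="my-model",  # placeholder model name
    messages=[{"role": "user", "content": "Summarize RAG in one sentence."}],
    stream=True,
)

ttft = None
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if ttft is None:
            ttft = time.perf_counter() - start   # time to first token
total = time.perf_counter() - start

print(f"TTFT: {ttft:.3f}s, total latency: {total:.3f}s")
```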

🎨 LLM Applications

ByteCraft: Generating video games and animations through bytes (emygervais.github.io, 2025-03-19). ByteCraft utilizes a 7B parameter LLM trained to generate executable video game and animation files from text prompts, leveraging Byte-Pair Encoding for effective byte management and promising significant potential for future development

Qwen2.5-VL-32B: Smarter and Lighter (qwenlm.github.io, 2025-03-24). Qwen2.5-VL-32B is a 32B-parameter vision-language model refined with reinforcement learning. It outperforms comparable models on major multimodal benchmarks and shows strong pure-text capabilities as well

Enhancing Text-to-SQL With Synthetic Summaries (saeedesmaili.com, 2025-03-18). Using synthetic summaries and retrieval-augmented in-context learning, LLMs can efficiently enhance Text-to-SQL capabilities, enabling better understanding of database structures for generating SQL queries based on user questions
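
The retrieval-augmented prompting pattern described can be as simple as retrieving the most relevant synthetic table summaries and prepending them to the question; a schematic sketch (the summaries and retrieval function here are stand-ins, not the post's):

```python
# Hypothetical pre-generated summaries of each table, keyed by table name.
table_summaries = {
    "orders": "orders(id, customer_id, total, created_at): one row per purchase.",
    "customers": "customers(id, name, country): one row per registered customer.",
}

def retrieve(question: str, k: int = 2) -> list[str]:
    """Stand-in retriever: a real one would rank summaries by embedding similarity."""
    return list(table_summaries.values())[:k]

question = "Total revenue from German customers last month?"
context = "\n".join(retrieve(question))

prompt = (
    "You write SQLite queries.\n"
    f"Relevant tables:\n{context}\n\n"
    f"Question: {question}\nSQL:"
)
print(prompt)  # this prompt would then be sent to the LLM
```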

🏭 Industry & Production

Bridging the AI Agent Prototype-to-Production Chasm (thedataexchange.media, 2025-03-20). Ilan Kadar discusses IntellAgent, an open-source platform using synthetic data, knowledge graphs, and reinforcement learning to enhance AI agent deployment, ensuring reliable performance and user trust in high-stakes domains like customer service and finance

Vector Podcast: Adding ML layer to Search: Hybrid Search Optimizer (dmitry-kan.medium.com/vector-podcast-adding-ml-layer-to-search-hybrid-search-optimizer-1f15e43ecc81, 2025-03-21). Dmitry Kan discusses the resurgence of hybrid search in 2025, highlighting the role of machine learning in optimizing search parameters and balancing keyword and neural search techniques in collaboration with Daniel Wrigley and Eric Pugh

Building AI Agents with LLMs: The Future of Autonomous AI Systems (soulpageit.com, 2025-03-19). AI agents utilizing Large Language Models (LLMs) are enhancing business efficiency through intelligent automation, dynamic decision-making, and integration with APIs for real-time financial analysis and risk assessment

🖥 Deployment & Infrastructure

LLM Hardware Calculators (nextbigfuture.com, 2025-03-24). An LLM Hardware Calculator helps determine the required GPUs, memory, and specifications for running local LLM models, emphasizing the limitations of current hardware against desired Grok-level performance
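
The back-of-the-envelope arithmetic such calculators use is straightforward: weights take parameters × bytes per parameter, plus a KV cache that grows with context length. A rough sketch with illustrative numbers (layer and head sizes are assumptions, not a specific model's):

```python
def llm_memory_gb(params_b, bits=4, layers=32, kv_heads=8, head_dim=128,
                  context=8192, kv_bits=16):
    """Very rough VRAM estimate: weights + KV cache, ignoring activations/overhead."""
    weights = params_b * 1e9 * bits / 8                          # bytes for weights
    # KV cache: 2 (K and V) * layers * kv_heads * head_dim * context * bytes per value
    kv = 2 * layers * kv_heads * head_dim * context * kv_bits / 8
    return (weights + kv) / 1e9

# e.g. a 70B-parameter model quantized to 4 bits with an 8k context
print(f"{llm_memory_gb(70):.1f} GB")   # ~35 GB of weights plus ~1 GB of KV cache
```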

Thinking Different, Thinking Slowly: LLMs on a PowerPC Mac (theresistornetwork.com, 2025-03-24). The piece details running large language models (LLMs) like UllmLlama2 on PowerPC Macs, showcasing code improvements, callback mechanisms for output, and performance statistics for specific prompts

I Just Wanted Pretty Spans: A Rust OpenTelemetry Story (joshkasuboski.com, 2025-03-21). Josh Kasuboski explores integrating OpenTelemetry with Rust, specifically focusing on using spans for debugging LLMs and optimizing logging using the tracing crate, along with Jaeger for tracing visualization

🗣 Opinion & Commentary

My Thoughts on the Future of "AI" (simonwillison.net, 2025-03-19). Nicholas Carlini discusses the future potential of LLMs, predicting significant advancements or stagnation while acknowledging unknown limitations. The article highlights the R1 training method with DeepSeek v3 and its implications for cognitive tasks

How to Dismantle Knowledge of an Atomic Bomb (shkspr.mobi, 2025-03-21). Meta's legal struggles with AI training on pirated data highlight concerns about the proliferation of dangerous knowledge related to CBRNE materials, including atomic and biological weapon creation methods accessible through modern AI tools

LLMs, But Only Because Your Tech SUCKS (aartaka.me, 2025-03-23). Tech limitations drive reliance on LLMs. Embracing tools like Clojure, Lisp, Emacs, REPLs, and keyboard macros can automate tasks, reduce boilerplate, and improve documentation without needing AI-centric solutions

What do we mean when we talk about ‘openness’ in (generative) AI? (dougbelshaw.com, 2025-03-21). Doug Belshaw explores the nuances of 'openness' in generative AI, discussing the importance of model weights, the Open Weight Definition, and the costs associated with developing open-source AI tools

The future of AI is Ruby on Rails (seangoedecke.com, 2025-03-20). Large language models excel at generating code, but struggle with larger codebases. Ruby on Rails is posited as an ideal language for its brevity and elegance, catering to AI-assisted programming needs

AI isn’t always the answer: When Generative AI is not it (cevo.com.au, 2025-03-19). Generative AI excels in many areas but is often less suitable for tasks like time-series forecasting, fully autonomous agents, and high-stakes environments, where traditional machine learning models outperform in accuracy and reliability

💻 Developer Workflow

Are we there yet? (davidvujic.blogspot.com, 2025-03-23). Exploring interactive Python development through REPL Driven Development, integrating Jupyter kernels, and using AI for data generation to streamline workflows, enhancing the overall developer experience similar to Clojure's innovative tools

LLM Assisted Fuzzing (elijahpotter.dev, 2025-03-21). Elijah Potter explores 'LLM Assisted Fuzzing', utilizing Ollama for generating responses and Harper for identifying grammatical errors, aiming to reduce false positives in software testing through iterative LLM output correction

How scientists learn computing and use LLMs to program (wiki.alcidesfonseca.com, 2025-03-18). Scientists leverage LLMs to navigate programming languages like Python and R, focusing on practical scripts rather than maintainable software, potentially leading to undetected bugs that affect research results

🚀 RAG Tutorials

Build a custom RAG AI agent in TypeScript and Jupyter (deno.com, 2025-03-18). Learn to build a custom Retrieval-Augmented Generation (RAG) AI agent using TypeScript, Jupyter, Deno, Ollama, Deepseek R1, and Llama 3.2 for processing confidential documents effectively

Building a PDF RAG Chatbot: Langchain, OpenAI, PGVector, RediStore, and Streamlit (levelup.gitconnected.com, 2025-03-19). Build a PDF RAG chatbot using Langchain, OpenAI, PGVector, RediStore, and Streamlit to enhance document interactions and insights with real-time data processing

Local LLM with Retrieval-Augmented Generation (tonisagrista.com, 2025-03-24). Build a simple RAG application using a local LLM via Ollama, leveraging context information stored in a vector database to enhance chatbot capabilities with dynamic, custom datasets
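
The pattern in miniature: embed the documents, embed the query, pick the closest chunk, and stuff it into the prompt. A hedged sketch assuming the ollama Python client's embeddings() and chat() calls and locally pulled models (model names are placeholders; a real setup would use a proper vector database rather than an in-memory list):

```python
import ollama  # assumes a local Ollama server with the models pulled

docs = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support hours are 9am-5pm CET, Monday to Friday.",
]

def embed(text):
    return ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm

question = "When can I get a refund?"
q_emb = embed(question)
# "Vector database" stand-in: rank the documents by cosine similarity in memory.
best = max(docs, key=lambda d: cosine(q_emb, embed(d)))

answer = ollama.chat(
    model="llama3.2",  # placeholder local model
    messages=[{"role": "user", "content": f"Context: {best}\n\nQuestion: {question}"}],
)
print(answer["message"]["content"])
```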

How to Build an LLM Agent With AutoGen: Step-by-Step Guide (neptune.ai, 2025-03-20). LLM agents enhance reasoning and decision-making by integrating tools like RAG, memory, and APIs, utilizing Microsoft's AutoGen and Azure OpenAI for building conversational agents in Python

Step by Step RAG (tersesystems.com, 2025-03-24). Implementing retrieval augmented generation (RAG) for LLMs to enhance information accuracy, using tools like HayHooks, Tavily, and AWS documentation integration, alongside coding examples and pipeline setup for efficient data retrieval

Where Did Retrieval Augmented Generation Come From — And Where Is It Going? (medium.com/building-the-open-data-stack, 2025-03-19). Retrieval-augmented generation (RAG) combines generative models with external data retrieval, enhancing AI responses without training limitations. Key techniques include backpropagation and embeddings for optimizing relevance, cost, and performance in data systems

“Understanding Retrieval-Augmented Generation (RAG) and Vector Databases for Not-Quite Dummies” on the Pure AI Web Site (jamesmccaffrey.wordpress.com, 2025-03-20). RAG enhances AI-generated responses by integrating specific content from vector databases, enabling detailed answers tailored to user queries in natural language systems, exemplified through the Acme blood analysis machine scenario

🌐 Graph & RAG Techniques

Beyond Vectors - Knowledge Graphs & RAG Using GenAI (digitalocean.com, 2025-03-20). Learn to build a graph-based Retrieval-Augmented Generation (RAG) agent using Named Entity Recognition, Neo4j for data management, and an OpenAI-compatible API for structured and unstructured data integration

Graphiti: Knowledge Graph Memory for a Post-RAG Agentic World (medium.com/neo4j, 2025-03-24). Graphiti enhances AI's potential by offering a real-time, dynamic knowledge graph memory, utilizing Neo4j and features like a bi-temporal model and hybrid indexing for efficient data retrieval and management

Modeling Agent Memory (medium.com/neo4j, 2025-03-20). Explore how agent memory can be structured using graph databases like Neo4j, highlighting long-term and short-term memory, as well as types such as semantic, episodic, procedural, and temporal memory

Enhancing GenAI with GraphRAG: A Smarter Approach to Retrieval-Augmented Generation (soulpageit.com, 2025-03-20). GraphRAG enhances Generative AI by integrating knowledge graphs, improving retrieval accuracy, contextual understanding, and transparency, addressing limitations of traditional RAG models reliant on vector-based systems

📊 Evaluation & Metrics

The New Gold Standard in AI Evaluation: How “Agent-as-a-Judge” Changes Everything (levelup.gitconnected.com, 2025-03-20). The 'Agent-as-a-Judge' framework lets AI agents evaluate other agents, bringing a more nuanced understanding of decision-making to the assessment process, reaching 90% agreement with human reviewers while cutting evaluation time drastically

Misinformation in LLMs—Causes and Prevention Strategies (promptfoo.dev, 2025-03-19). Misinformation in LLMs can lead to security risks and legal issues, particularly in regulated sectors. Technical strategies include fine-tuning models, retrieval augmented generation, and using Promptfoo to assess factual consistency and perplexity

Beyond the Scoreboard: Rethinking AI Benchmarks for True Innovation (eliza-ng.me, 2025-03-23). Explores the limitations of ML benchmarks, emphasizing Goodhart's Law, biases in language models, and the need for diverse evaluations that prioritize reasoning and problem-solving over mere metric optimization

📚 Academic Insights

Understanding R1-Zero-Like Training: A Critical Perspective (github.com, 2025-03-22). Critical examination of R1-Zero-like training focusing on base models, reinforcement learning, and minimalist techniques using the Dr. GRPO algorithm for optimizing model performance on MATH level questions with specific frameworks

HELM Capabilities: Evaluating LMs Capability by Capability (crfm.stanford.edu, 2025-03-20). HELM Capabilities benchmarks language models via curated scenarios focusing on core abilities, utilizing techniques like Chain-of-Thought prompting and postprocessing approaches for accurate assessments of performance across various tasks

The Helix and the Bit: How Unlikely Ideas Forged Today’s Large Language Models (LLMs) (medium.com/intuitionmachine, 2025-03-22). The mid-20th century's collision of DNA, digital computation, cybernetics, and information theory shaped today's Large Language Models, intertwining life’s code with machines for adaptive intelligence, echoing feedback loops and algorithms

The First LLM (thundergolfer.com, 2025-03-23). Exploring the evolution of language models, the article discusses GPT-1, ULMFiT, and their contributions to AI. It defines LLMs and their architecture, highlighting self-supervised learning and the transition to multimodal systems

Research spotlight: is long chain-of-thought structure all that matters when it comes to LLM reasoning distillation? (snorkel.ai, 2025-03-19). Research explores the significance of long chain-of-thought structures in reasoning distillation for LLMs, demonstrating that structural integrity is more crucial than accuracy of individual reasoning steps

Paper Review: RWKV-7 Goose with Expressive Dynamic State Evolution (andlukyane.com, 2025-03-24). RWKV-7 Goose introduces a sequence modeling architecture with a generalized delta rule, achieving multilingual state-of-the-art performance using fewer training tokens and emphasizing efficient state evolution through vector gating and adaptive learning rates

Multimodal Transformers: AI Foundation Models, Part 1 (blogs.sas.com, 2025-03-21). Explore the unique capabilities of multimodal transformers, including self-attention, sequence prediction, and zero-shot learning. Understand their significance in various data types, such as text, images, and audio, and foundation models

DAPO: Enhancing GRPO For LLM Reinforcement Learning? (aipapersacademy.com, 2025-03-21). DAPO enhances GRPO for LLM reinforcement learning, employing techniques like Clip-Higher and dynamic sampling to outperform DeepSeek-R1, achieving 50 points on the AIME 2024 benchmark with optimized training methods

📑 Academic Papers

ATTENTION2D: Communication Efficient Distributed Self-Attention Mechanism (arxiv:cs, 2025-03-20). ATTENTION2D enhances self-attention in transformer models by enabling efficient parallelism across two dimensions, achieving up to 5x and 9.4x performance boosts on GPU clusters without additional computational overhead

A Review on Large Language Models for Visual Analytics (arxiv:cs, 2025-03-19). Comprehensive review on Large Language Models and visual analytics, evaluating tools like LIDA, Chat2VIS, and ChartLlama; exploring strengths, weaknesses, opportunities, and threats in enhancing data interpretation and visualization techniques

Learning on LLM Output Signatures for gray-box LLM Behavior Analysis (arxiv:cs, 2025-03-18). Proposes leveraging LLM Output Signatures (LOS) through a transformer-based approach for advanced gray-box analysis, improving detection of hallucinations and data contamination, and outperforming existing baselines across datasets and models

SuperARC: A Test for General and Super Intelligence Based on First Principles of Recursion Theory and Algorithmic Probability (arxiv:math, 2025-03-20). A test based on algorithmic probability evaluates AGI and ASI claims, revealing LLMs' limitations and inconsistent progress while outperforming LLMs through a hybrid neurosymbolic approach grounded in Kolmogorov complexity and optimal Bayesian inference

Survey on Evaluation of LLM-based Agents (arxiv:cs, 2025-03-20). A survey analyzing evaluation methodologies for LLM-based agents, focusing on capabilities like planning and memory, and benchmarks in various applications while identifying trends and gaps in cost-efficiency, safety, and robustness

XAttention: Block Sparse Attention with Antidiagonal Scoring (arxiv:cs, 2025-03-20). XAttention introduces a framework enhancing long-context Transformer models' efficiency by using antidiagonal scoring for block importance, achieving up to 13.5x acceleration in attention computation while maintaining accuracy on various benchmarks

LLM Braces: Straightening Out LLM Predictions with Relevant Sub-Updates (arxiv:cs, 2025-03-20). LLMBRACES optimizes sub-update contributions in Transformer-based LLMs by computing relevance scores in FFN layers, enabling controlled generation and sentiment modulation while outperforming baseline methods with fewer tunable parameters

Don't miss next week's newsletter!

Newsletters sent once a week, unsubscribe anytime.