Generative AI

Tuesday 11th March, 2025

Subscribe to this newsletter!

Newsletters sent once a week, unsubscribe anytime.

In the news

📰 AI News & Announcements

What's new in the world of LLMs, for NICAR 2025 (simonwillison.net, 2025-03-08). Simon Willison reviews advances in LLMs for NICAR 2025, discussing multi-modal models, inference time compute, and tools like Gemini, Claude, Qwen, and Llama, emphasizing new uses in data journalism

Wargaming in the Age of AI: Opportunities and challenges (paxsims.wordpress.com, 2025-03-07). Georgetown University's virtual symposium on AI and wargaming discusses the potential of large language models like ChatGPT, Claude, and Gemini to enhance strategic decision-making and simulations in complex environments

How Much Compute and Video to Solve Real World Superintelligence? (nextbigfuture.com, 2025-03-10). Yann LeCun, an AI expert, argues that large language models lack the efficiency to achieve true superintelligence, proposing significant video data and compute requirements for real-world learning, including insights on Tesla's evolving AI capabilities

Alibaba's QwQ-32B: A New Benchmark in Efficient Reasoning Models (emsi.me, 2025-03-06). Alibaba's QwQ-32B showcases effective reasoning with 32 billion parameters, utilizing reinforcement learning, code interpretation, and math solving for optimized outputs and a context window of 131,072 tokens, available as an open model

gpt-4o-mini vs. gpt-3.5-turbo for RAG: Wordier, but better? (blog.pamelafox.org, 2025-03-06). Pamela Fox evaluates gpt-4o-mini against gpt-3.5-turbo for RAG applications, highlighting longer, more detailed responses and lower costs, despite some decrease in groundedness

Generative AI Hype Peaking (bjornwestergard.com, 2025-03-10). Skepticism grows as generative AI hype wanes; tools like LLMs and DeepSeek still spark innovation in software and customer support, yet risks persist for less experienced developers amid structural job market changes

💡 Reflective Commentaries

AI #106: Not so Fast (thezvi.wordpress.com, 2025-03-06). GPT-4.5 shows limited progress, while ethical concerns grow around AI honesty and productivity tools, with advancements noted in legal AI applications like Vincent and increased adoption of LLMs highlighted

LLMs Don't Know What They Don't Know—And That's a Problem by Colin Eberhardt (blog.scottlogic.com, 2025-03-06). LLMs exhibit overconfidence in execution, lacking awareness of their capabilities, leading to poor handling of ambiguous tasks. Tools like Bolt and concepts such as 'vibe coding' highlight these limitations in AI development

Let's Think Step-by-Step (rwblickhan.org, 2025-03-10). Discourse on LLMs includes backlash and critiques, with a focus on their utility vs. claims of general intelligence, highlighting syntactic reasoning, and raising questions about understanding and consciousness

Perhaps The LLM Juice Isn't Worth The Electrical Squeeze (rwblog S6E23) (rwblickhan.org, 2025-03-10). The piece discusses the high costs versus the utility of LLMs, referencing Molly White's essay and expressing skepticism about the practical applications of LLM tools like Whisper and summarization in daily workflows

Thoughts on AI (davetang.org, 2025-03-07). Dave Tang reflects on his journey in AI and machine learning, discussing challenges in applying deep learning to biological data and advocating for viewing AI as augmented intelligence rather than fully autonomous technology

📚 LLM Evaluation & Applications

Exploring LLM Agents for Cleaning Tabular Machine Learning Datasets (arxiv:cs, 2025-03-09). This study explores using Large Language Models (LLMs) with Python for cleaning training datasets, showing their effectiveness in correcting erroneous entries while facing challenges with complex errors requiring broader data distribution understanding

Large language models in finance: what is financial sentiment? (arxiv:q-fin, 2025-03-05). Financial sentiment, crucial in market forecasting, is enhanced by large language models like BERT (RoBERTa, FinBERT) and GPT (GPT-4, OPT, LLaMA) for accurate sentiment classification and real-time interpretation in finance

(How) Do Language Models Track State? (arxiv:cs, 2025-03-04). Transformer language models can learn state tracking mechanisms for tasks like permutation composition, employing associative scans or permutation parity, with notable differences in robustness and controllable training outcomes
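
The permutation-composition task used in this line of work is easy to reproduce: the model must track the result of composing a stream of permutations, and parity is a coarse summary of that state. A minimal illustration in Python (not the paper's code):

```python
from itertools import permutations

def compose(p, q):
    """Compose two permutations given as tuples: (p o q)[i] = p[q[i]]."""
    return tuple(p[i] for i in q)

def parity(p):
    """Return 0 for even, 1 for odd permutations, counted by inversions."""
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return inversions % 2

# A "state tracking" instance: apply a stream of permutations to the identity.
perms = list(permutations(range(3)))
stream = [perms[1], perms[4], perms[2], perms[5]]

state = tuple(range(3))          # identity permutation
for p in stream:
    state = compose(p, state)    # update the tracked state

print("final state:", state, "parity:", parity(state))
```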

Sometimes the Model doth Preach: Quantifying Religious Bias in Open LLMs through Demographic Analysis in Asian Nations (arxiv:cs, 2025-03-10). This research quantifies religious bias in open LLMs using Hamming Distance to assess demographic characteristics across diverse Asian countries, highlighting risks of a hegemonic worldview in generated outputs

Chart-HQA: A Benchmark for Hypothetical Question Answering in Charts (arxiv:cs, 2025-03-06). Chart-HQA introduces a novel Hypothetical Question Answering task for MLLMs, utilizing human-AI interactive data synthesis (HAI) to create a benchmark that highlights reasoning performance and generalization challenges in chart analysis

🔍 RAG & Retrieval Strategies

A Practical Guide to Implementing DeepSearch / DeepResearch (simonwillison.net, 2025-03-04). DeepSearch iterates between searching, reading, and reasoning for optimal answers, contrasting with classic RAG patterns, while DeepResearch structures outputs into reports, raising concerns about perceived research quality
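
The DeepSearch pattern described is essentially a loop over search, read, and reason steps rather than a single retrieval pass. A schematic sketch of that control flow (the `web_search`, `fetch_page`, and `ask_llm` functions below are hypothetical stand-ins, not any particular product's API):

```python
def web_search(query):
    """Hypothetical search call; would return a list of result URLs."""
    return [f"https://example.com/result-for-{query.replace(' ', '-')}"]

def fetch_page(url):
    """Hypothetical fetch; would return page text."""
    return f"Contents of {url}"

def ask_llm(prompt):
    """Hypothetical LLM call; would return the model's structured reply."""
    return {"answer": "stub answer", "confident": True, "next_query": None}

def deep_search(question, max_rounds=5):
    notes, query = [], question
    for _ in range(max_rounds):
        pages = [fetch_page(u) for u in web_search(query)]   # search + read
        reply = ask_llm(f"Question: {question}\nNotes: {notes}\nNew pages: {pages}")
        if reply["confident"]:                               # reason: stop or refine
            return reply["answer"]
        notes.extend(pages)
        query = reply["next_query"] or query
    return ask_llm(f"Best-effort answer to: {question}\nNotes: {notes}")["answer"]

print(deep_search("what is DeepSearch?"))
```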

In-Browser Graph RAG with Kuzu-WASM and WebLLM (blog.kuzudb.com, 2025-03-10). A fully in-browser chatbot utilizing Kuzu-Wasm and WebLLM to answer LinkedIn data queries showcases Graph Retrieval-Augmented Generation (Graph RAG) techniques, enabling local AI applications without backend servers

LettuceDetect: A Hallucination Detection Framework for RAG Applications (towardsdatascience.com, 2025-03-10). LettuceDetect utilizes ModernBERT to create a lightweight hallucination detector for RAG applications, achieving competitive performance while minimizing computational costs and maintaining high efficiency in real-time systems

Overcome Failing Document Ingestion & RAG Strategies with Agentic Knowledge Distillation (towardsdatascience.com, 2025-03-05). Agentic Knowledge Distillation enhances Retrieval Augmented Generation (RAG) strategies using a pyramid search approach, efficiently distilling document insights into natural language while leveraging PostgreSQL and agent-based architectures for improved information retrieval

How to Deploy a RAG-Based Assistant Over Your Internal Resources (nordicapis.com, 2025-03-11). Learn to build and deploy a RAG-based assistant using tools like Kotaemon and Cohere API, enhancing LLMs with internal data for improved accuracy in natural language processing and summarization tasks

Getting an Answer is Not the Same as Coming to an Understanding (blog.ouseful.info, 2025-03-05). The article discusses the DeepSearch pattern, a development in LLMs that seeks related documents to generate answers, emphasizing that mere answers do not equate to understanding

Vector DB + RAG Maker (javierorracadeatcu.com, 2025-03-07). A new tool combining a vector database with Retrieval-Augmented Generation (RAG) enhances technical query handling in R programming by improving performance, reducing costs, and increasing accuracy with domain-specific content

🔧 Practical Implementations

Predownloading embedding models in Rails with Kamal (nts.strzibny.name, 2025-03-10). Learn how to pre-download embedding models for Rails applications using Informers and Transformers.rb gems in Kamal deployments to optimize AI performance and eliminate repetitive downloads during deployment

How to create a synthetic annotator? The process of developing a domain-specific LLM-as-a-Judge. (blog.allegro.tech, 2025-03-07). Explores using Large Language Models (LLMs) as evaluators in machine learning, highlighting challenges in model evaluation, traditional metrics limitations, and the novel LLM-as-a-Judge methodology for natural language processing tasks

You also hate SQL? Let the LLM handle it (duarteocarmo.com, 2025-03-09). Duarte O. Carmo discusses using LLMs for Text-to-SQL challenges, emphasizing tools like Instructor and LiteLLM while presenting strategies for generating accurate SQL queries from natural language prompts
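
A common way to make Text-to-SQL output reliable, in the spirit of the tools mentioned, is to force the model to return a structured object rather than free text. A hedged sketch using Instructor's patched OpenAI client and a Pydantic schema (the schema and prompt are illustrative, not the post's exact code; requires an OpenAI API key):

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel, Field

class SQLQuery(BaseModel):
    sql: str = Field(description="A single valid SQL SELECT statement")
    explanation: str = Field(description="Short rationale for the query")

client = instructor.from_openai(OpenAI())  # patches the client to return Pydantic objects

schema = "orders(id INTEGER, customer TEXT, total REAL, created_at DATE)"
question = "Total revenue per customer in 2024, highest first"

result = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=SQLQuery,
    messages=[
        {"role": "system", "content": f"You write SQLite queries for this schema: {schema}"},
        {"role": "user", "content": question},
    ],
)
print(result.sql)
```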

Word-Online: recreating Karpathy's char-RNN (with supervised linear online learning of word embeddings) for text completion (thierrymoudiki.github.io, 2025-03-08). Implementing a word completion model using supervised linear online learning with an SGDClassifier, showcasing effective text generation using embeddings from Word2Vec and a char-RNN inspired architecture
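
The core of the approach is simple: embed the current word, then train a linear classifier online to predict the next word. A toy reconstruction with gensim Word2Vec embeddings and scikit-learn's SGDClassifier (the corpus is made up; this is a sketch of the idea, not the post's code):

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import SGDClassifier

corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "the cat chased the dog".split(),
]

w2v = Word2Vec(corpus, vector_size=32, min_count=1, seed=0)
vocab = sorted(w2v.wv.key_to_index)

# Build (current word embedding -> next word) training pairs.
X, y = [], []
for sent in corpus:
    for cur, nxt in zip(sent, sent[1:]):
        X.append(w2v.wv[cur])
        y.append(nxt)

clf = SGDClassifier(loss="log_loss", random_state=0)
for x_i, y_i in zip(X, y):                      # online: one example at a time
    clf.partial_fit(np.array([x_i]), [y_i], classes=vocab)

print("after 'the' ->", clf.predict([w2v.wv["the"]])[0])
```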

Self-hosted llm-mlx: first prompt (fluffyandflakey.blog, 2025-03-04). Exploring self-hosted LLM options led to successful local setup of llm-mlx, using Python and uv to generate an Erlang tree traversal example at 199 tokens/second, demonstrating practical capabilities of localized AI tools

How to build a custom embedder in LlamaIndex: AWS Titan Multimodal example (norahsakal.com, 2025-03-05). Integrate AWS Titan Multimodal into LlamaIndex for effective text and image search by creating a custom embedder with specific configurations and packages like boto3, Pinecone client, and JSON handling

Intsets by AI (paddy3118.blogspot.com, 2025-03-05). Using AI, specifically Gemini, the author develops an efficient integer set (intset) implementation in Python, achieving significant speed improvements over traditional sets for operations involving large datasets of strings and integers
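
The post doesn't reproduce its generated code here, but the usual trick for a fast Python intset is to pack membership into the bits of a single arbitrary-precision integer, so set operations become bitwise operations. A minimal sketch of that idea:

```python
class IntSet:
    """Set of non-negative ints stored as bits of one Python integer."""

    def __init__(self, items=()):
        self.bits = 0
        for i in items:
            self.bits |= 1 << i

    def add(self, i):
        self.bits |= 1 << i

    def __contains__(self, i):
        return (self.bits >> i) & 1 == 1

    def __and__(self, other):            # intersection is a single bitwise AND
        result = IntSet()
        result.bits = self.bits & other.bits
        return result

    def __len__(self):
        return bin(self.bits).count("1")

a = IntSet(range(0, 1000, 2))
b = IntSet(range(0, 1000, 3))
print(500 in a, len(a & b))              # True, 167 (multiples of 6 below 1000)
```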

🧐 Analytical Perspectives

Why Do Researchers Care About Small Language Models? (quantamagazine.org, 2025-03-10). Researchers are exploring small language models (SLMs) with fewer parameters, utilizing techniques like knowledge distillation and pruning to enhance efficiency while maintaining effectiveness for specific tasks
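
Knowledge distillation, one of the techniques mentioned, trains the small model to match the large model's softened output distribution rather than only the hard labels. A standard, generic PyTorch loss sketch (not specific to the article):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend soft-target KL loss (teacher) with ordinary cross-entropy (labels)."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                       # rescale gradient magnitude, per Hinton et al.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

student = torch.randn(4, 10, requires_grad=True)   # fake logits: batch of 4, 10 classes
teacher = torch.randn(4, 10)
labels = torch.tensor([1, 3, 5, 7])
print(distillation_loss(student, teacher, labels))
```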

Generality (alexgaynor.net, 2025-03-05). Machine learning models can lack generality, resulting in unexpected failures. Evaluating LLMs requires cautious attention to their specific capabilities and potential data set contamination issues

Headroom for AI development (hunch.net, 2025-03-05). Explores improving AI efficiency and capabilities beyond current transformer models, highlighting issues like sample complexity and long-term planning using examples from language learning and animal intelligence

LLM Complexity and Pricing (tersesystems.com, 2025-03-07). An exploration of LLM pricing and complexity, focusing on tools like Letta and Claude Sonnet while analyzing cost and model efficiency for specific tasks, including functions, tool calling, and recipe management integration

Using a Model to Model (blog.thestateofme.com, 2025-03-05). Large Language Models (LLMs) are transforming the handling of unstructured data, allowing for better data modeling and extraction of insights, though care must be taken with terminology and potential mixed meanings

How transformers expanded my view of Math and ML (mikelikejordan.bearblog.dev, 2025-03-08). Transformers, BERT, and GPT are reshaping AI with enhanced language understanding through self-attention mechanisms, surpassing RNNs and CNNs by efficiently processing sequences and contextual relationships in natural language tasks

📏 LLM Evaluation Methods

Evaluating LLM using semantic entropy (thoughtworks.com, 2025-03-07). Semantic entropy evaluations can enhance trust in large language models (LLMs) by measuring output uncertainty, helping enterprise leaders deploy GenAI effectively amidst challenges including confabulation and performance inconsistencies
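
Semantic entropy is computed by sampling several answers, grouping those that mean the same thing, and measuring entropy over the groups rather than the raw strings. A toy version (here equivalence is just normalized string matching; in practice an NLI model or LLM judge does the grouping):

```python
import math
from collections import Counter

def semantic_entropy(samples, equivalence_key=lambda s: s.strip().lower().rstrip(".")):
    """Entropy over clusters of semantically equivalent sampled answers."""
    clusters = Counter(equivalence_key(s) for s in samples)
    total = sum(clusters.values())
    return -sum((n / total) * math.log(n / total) for n in clusters.values())

confident = ["Paris.", "paris", "Paris", "Paris."]
uncertain = ["Paris.", "Lyon", "Marseille", "Nice"]
print(semantic_entropy(confident))   # 0.0   -> low uncertainty
print(semantic_entropy(uncertain))   # ~1.39 -> high uncertainty
```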

Evaluating LLMs - Notes on a NeurIPS'24 Tutorial (blog.quipu-strands.com, 2025-03-06). Notes from a NeurIPS'24 tutorial on evaluating LLMs emphasize rigorous testing, effective evaluation frameworks, and methodologies such as criteria-based automated evaluations, with resources like MMLU benchmarks and G-Eval metrics highlighted

R² Priors for High-Dimensional Linear Regression and Autoregressive Timeseries in PyMC (austinrochford.com, 2025-03-07). PyMC implementation of R²-based priors for high-dimensional linear regression and autoregressive timeseries, exploring local-global shrinkage techniques using Bayesian statistical methods

How to Make AI Evaluation Affordable: Research-Backed Methods to Cut LLM Evaluation Costs (mikulskibartosz.name, 2025-03-10). Techniques such as importance resampling, anchor points sampling, and prompt compression help reduce AI evaluation costs while maintaining performance, aiding organizations in managing budget constraints during language model evaluations
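
Anchor-point sampling, one of the cost-reduction methods discussed, evaluates only a small representative subset of the test set, typically chosen by clustering example embeddings and keeping the item nearest each centroid. A generic sketch with scikit-learn (illustrative, not the article's code):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 64))      # stand-in for embedded eval examples

k = 20                                       # evaluate 20 anchors instead of 500 items
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(embeddings)
anchor_idx = pairwise_distances_argmin(kmeans.cluster_centers_, embeddings)

# Weight each anchor by its cluster size so the cheap estimate stays representative.
weights = np.bincount(kmeans.labels_, minlength=k) / len(embeddings)
anchor_scores = rng.uniform(size=k)          # stand-in for per-anchor eval scores
print("estimated overall score:", float(anchor_scores @ weights))
```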

🌊 Diffusion Models in LLMs

Why I find diffusion models interesting? (rnikhil.com, 2025-03-06). Diffusion LLMs (dLLMs) generate words simultaneously, addressing issues like hallucination in traditional LLMs while enhancing agent workflows with coherent, multi-step planning and reasoning capabilities
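
Unlike autoregressive decoding, a diffusion LLM starts from a fully masked sequence and fills in all positions in parallel over a few refinement steps, keeping confident predictions and re-masking the rest. A toy loop showing the control flow only (the `predict` function below is a random stand-in for a real denoiser):

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, MASK, LENGTH, STEPS = 50, -1, 12, 4

def predict(tokens):
    """Stand-in denoiser: returns a guessed token and a confidence per position."""
    guesses = rng.integers(0, VOCAB, size=len(tokens))
    confidence = rng.uniform(size=len(tokens))
    return guesses, confidence

tokens = np.full(LENGTH, MASK)
for step in range(STEPS):
    guesses, confidence = predict(tokens)
    target_unmasked = (step + 1) * LENGTH // STEPS         # unmask more each step
    still_masked = np.where(tokens == MASK)[0]
    n_new = target_unmasked - (LENGTH - len(still_masked))
    best = still_masked[np.argsort(-confidence[still_masked])][:n_new]
    tokens[best] = guesses[best]                           # commit confident positions
print(tokens)
```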

Mercury Diffusion LLM (taoofmac.com, 2025-03-07). Mercury Diffusion LLM claims 10x speed improvement, processing over 1000 tokens per second on NVIDIA H100s, leveraging diffusion models for efficiency, yet faces challenges in text and code application

Paper Review: Large Language Diffusion Models (andlukyane.com, 2025-03-10). LLaDA utilizes a forward and reverse process for modeling distributions in large language models, featuring random masking, supervised fine-tuning, and advanced remasking strategies to outperform autoregressive models in various benchmarks

🧠 LLM Internal Mechanisms

Ladder: Self-improving LLMs through recursive problem decomposition (arxiv.org, 2025-03-07). LADDER proposes a framework for self-improving large language models (LLMs) that recursively decompose difficult problems into progressively simpler variants the model can solve, learn from, and use to tackle the original task

Understanding Attention in LLMs (bartoszmilewski.com, 2025-03-06). An overview of attention mechanisms in Large Language Models, focusing on multi-dimensional vector embeddings, context-based meaning derivation, and the softmax normalization process for calculating attention weights

The Unreasonable Effectiveness of Non-Transformer Architectures for Language Generation (medium.com/intuitionmachine, 2025-03-09). Non-Transformer architectures like RWKV, Mamba, and Liquid Neural Networks showcase remarkable efficiency in language generation, utilizing innovative techniques for sequence modeling, deep hierarchical representations, and scalable training despite the dominance of Transformer models

Writing an LLM from scratch, part 9 -- causal attention (gilesthomas.com, 2025-03-09). Causal attention enables model tokens to focus only on prior tokens, achieved through techniques like masking and normalisation using PyTorch's torch.tril and torch.triu functions to enhance LLM efficiency and performance
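
The mechanics covered here boil down to a lower-triangular mask applied to the attention score matrix before softmax, so each position can attend only to itself and earlier tokens. A compact sketch of the `torch.tril` trick:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
seq_len, d = 5, 8
q = torch.randn(seq_len, d)
k = torch.randn(seq_len, d)
v = torch.randn(seq_len, d)

scores = q @ k.T / d ** 0.5                       # scaled dot-product scores
causal_mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
scores = scores.masked_fill(~causal_mask, float("-inf"))  # hide future positions

weights = F.softmax(scores, dim=-1)               # rows sum to 1 over visible tokens
context = weights @ v
print(weights[0])   # first token can only attend to itself: [1, 0, 0, 0, 0]
```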

Writing an LLM from scratch, part 8 -- trainable self-attention (gilesthomas.com, 2025-03-04). Explores implementing trainable self-attention for LLMs through scaled dot product attention, including matrix projections and context vector calculations for token relationships in input sequences
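
Part 8's trainable self-attention is the same computation with learned projection matrices for queries, keys, and values. A minimal PyTorch module in that spirit (a sketch of the standard formulation, not the post's exact code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.W_q = nn.Linear(d_in, d_out, bias=False)   # learned projections
        self.W_k = nn.Linear(d_in, d_out, bias=False)
        self.W_v = nn.Linear(d_in, d_out, bias=False)

    def forward(self, x):                               # x: (seq_len, d_in)
        q, k, v = self.W_q(x), self.W_k(x), self.W_v(x)
        scores = q @ k.T / k.shape[-1] ** 0.5            # scaled dot product
        weights = F.softmax(scores, dim=-1)
        return weights @ v                               # context vectors

x = torch.randn(6, 16)
print(SelfAttention(16, 8)(x).shape)   # torch.Size([6, 8])
```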

Exploring LLMs as Agents: Planning via Prompting (starkravingfinkle.org, 2025-03-09). Mark Finkle explores planning strategies for LLM agents, focusing on prompt engineering techniques like Chain of Thought and ReAct, and discusses improvements in task execution through reflection, corrections, and tool consistency

Imagine while Reasoning in Space: Multimodal Visualization-of-Thought with Chengzu Li - #722 (twimlai.com, 2025-03-10). Chengzu Li discusses 'Multimodal Visualization-of-Thought,' exploring frameworks like token discrepancy loss, TopViewRS, and applications in robotics and architectural design, along with spatial reasoning principles in cognitive science

Understanding Transformers... (beyond the Math) (kalomaze.bearblog.dev, 2025-03-09). An experimental exploration of transformers as state simulators, discussing in-context learning, temperature settings in token predictions using tools like llama.cpp, and techniques for understanding complex models intuitively

🧪 Hardware & Architecture

16-Bit to 1-Bit: Visual KV Cache Quantization for Efficient Multimodal LLMs (arxiv.org, 2025-03-05). Visual KV cache quantization is explored for improving memory efficiency in multimodal large language models, transitioning from 16-bit to 1-bit representations for better performance and storage capabilities
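
The general recipe behind extreme KV-cache quantization is to store only the sign of each value plus a per-channel scale, reconstructing values as sign times scale at read time. A generic 1-bit quantize/dequantize sketch (not the paper's exact scheme, which targets visual tokens in multimodal models):

```python
import torch

def quantize_1bit(x):
    """Per-channel 1-bit quantization: sign bits plus one float scale per channel."""
    scale = x.abs().mean(dim=0)               # one scale per channel (last dim)
    signs = (x >= 0)                           # 1 bit per element
    return signs, scale

def dequantize_1bit(signs, scale):
    return torch.where(signs, scale, -scale)   # sign * per-channel scale

kv = torch.randn(1024, 64)                     # stand-in for cached keys or values
signs, scale = quantize_1bit(kv)
approx = dequantize_1bit(signs, scale)

print("mean abs error:", (kv - approx).abs().mean().item())
print("storage: 1 bit per element plus one scale per channel, down from 16 bits")
```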

FPGA & HPCA 2025 (constantinides.net, 2025-03-06). Highlights from FPGA 2025 and HPCA 2025 conferences include keynotes on AI architectures, performances in FPGA applications like LUT-based machine learning, and discussions on memory-efficient encoders and architectural challenges in LLM acceleration

The Next Frontier of LLM Applications: Open Ecosystems and Hardware Synergy (arxiv:cs, 2025-03-06). Large Language Model (LLM) applications face challenges like platform silos and fragmented hardware. A proposed three-layer architecture enhances modularity and cross-platform compatibility, addressing security and privacy for scalable AI ecosystems

PowerAttention: Exponentially Scaling of Receptive Fields for Effective Sparse Attention (arxiv:cs, 2025-03-05). PowerAttention introduces a novel sparse attention design for LLMs, achieving exponential receptive field growth and outperforming static methods by 5-40%, enhancing efficiency during long-range dependency tasks while maintaining competitive time complexity

ADOR: A Design Exploration Framework for LLM Serving with Enhanced Latency and Throughput (arxiv:cs, 2025-03-06). ADOR is a framework that optimizes hardware architectures for Large Language Models, achieving 2.51x higher QoS and 4.01x better area efficiency compared to A100, balancing throughput and latency for scalable AI-serving

HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization (arxiv:cs, 2025-03-06). HybridNorm combines Pre-Norm and Post-Norm for training transformers, utilizing QKV normalization in attention mechanisms and Post-Norm in feed-forward networks, resulting in enhanced stability and performance across benchmarks

Don't miss next week's newsletter!

Newsletters sent once a week, unsubscribe anytime.