Generative AI
Published 13th May 2025
In the news
- Carnegie Mellon introduces LegoGPT, an AI model that generates stable Lego designs from text prompts, employing large language models and physics checks to ensure the creations can be built in real life.
- Google announces Gemini 2.5 Pro (I/O Edition) with enhanced coding capabilities, supporting code transformation and error reduction and achieving top benchmark scores, ahead of the Google I/O 2025 event set for May 20-21.
- Humans still outperform AI at forecasting future events: models from OpenAI and others continue to struggle with the logical reasoning the task demands.
- Lightricks launches LTX Video-13B, an open-source AI video model achieving 30x faster generation using multiscale rendering on consumer hardware, making high-quality video creation accessible without expensive GPUs.
- OpenAI launches reinforcement fine-tuning for the o4-mini reasoning model, allowing enterprises to customize it for specific needs, while Epoch AI's analysis indicates that advances in reasoning models may decelerate soon due to scaling challenges.
- Hugging Face releases Open Computer Agent, a free cloud-hosted AI agent that operates a Linux virtual machine, while Relevance AI and Stack AI secure millions to develop low-code platforms for building custom enterprise AI agents.
- Anthropic introduces a web search API for its Claude large language models, enhancing real-time information retrieval and increasing competitive pressure on Google as AI-based search gains popularity (a minimal usage sketch follows this list).
- Microsoft's Phi-4-mini-reasoning LLM can run on a Raspberry Pi, but with significant performance limitations, taking up to 10 minutes to answer simple questions compared to GPU-accelerated systems.
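As referenced above, Anthropic's web search is invoked as a server-side tool through the standard Messages API. Here is a minimal sketch using the Anthropic Python SDK; the tool type string, model alias, and `max_uses` cap follow Anthropic's published format but should be verified against the current docs.

```python
# Minimal sketch: asking Claude to answer with live web results via the
# server-side web search tool (verify the exact tool version string
# against Anthropic's current documentation).
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-7-sonnet-latest",
    max_tokens=1024,
    tools=[{
        "type": "web_search_20250305",  # server-side tool; no local execution needed
        "name": "web_search",
        "max_uses": 3,                  # cap the number of searches per request
    }],
    messages=[{"role": "user", "content": "What happened at Google I/O 2025?"}],
)

# The reply interleaves text blocks with search-result citations.
for block in response.content:
    if block.type == "text":
        print(block.text)
```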
⚙️ AI Development & Tooling
Tips when generative coding (alexhyett.com, 2025-05-11). The author shares strategies for coding with AI, such as setting clear requirements, using familiar technologies, breaking tasks down, and refactoring regularly, illustrated by a stock analysis tool built with Python and Plotly
Been doing a lot of (hand)writing outside my blog (blog.lmorchard.com, 2025-05-09). The author reflects on extensive handwritten journaling using a BOOX Tab Ultra C e-ink tablet, plans to train a handwriting recognition model, and aims to analyze themes through LLMs and RAG techniques
LLM4FTS: Enhancing Large Language Models for Financial Time Series Prediction (blog.raymond.burkholder.net, 2025-05-08). LLM4FTS enhances large language models for financial time series prediction using K-means++ clustering for pattern recognition, adaptive patch segmentation, and dynamic wavelet convolution for improved forecasting accuracy
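The paper's pattern-recognition stage lends itself to a small illustration. Below is a hedged sketch (not the LLM4FTS authors' code) of clustering fixed-length price patches with scikit-learn's K-means++ initialization:

```python
# Hedged sketch of the pattern-recognition idea: segment a price series into
# fixed-length patches, normalize each, and cluster them with K-means++.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
prices = np.cumsum(rng.normal(size=1000))        # synthetic random-walk "prices"

patch_len = 20
patches = np.array([prices[i:i + patch_len]
                    for i in range(0, len(prices) - patch_len, patch_len)])
# Z-normalize each patch so clusters reflect shape, not price level
patches = (patches - patches.mean(axis=1, keepdims=True)) / (
    patches.std(axis=1, keepdims=True) + 1e-8)

kmeans = KMeans(n_clusters=8, init="k-means++", n_init=10, random_state=0)
labels = kmeans.fit_predict(patches)
print("patch pattern labels:", labels[:10])      # discrete pattern IDs per patch
```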
Java meets AI: Build LLM-Powered Apps with LangChain4j (hollycummins.com, 2025-05-08). Holly Cummins discusses Java, AI, and the development of LLM-powered applications using LangChain4j, while addressing the impacts of Moore's law, concurrent programming, and the importance of various interdisciplinary knowledge areas
🤔 Reflections & Adoption
Sloppy choices: the why and then-what of institutional LLM adoption (blog.rinesi.com, 2025-05-08). Exploring the adoption of LLMs in institutions reveals complexities driven by individual incentives, varying beliefs, and the gap between the technology's fluency and genuine expertise, with implications for the future of AI and organizational strategy
Slightly delayed... (searchresearch1.blogspot.com, 2025-05-08). Dan Russell discusses the impact of AI, particularly LLMs, on information retrieval in his upcoming talk at the Pioneer Centre for AI Research in Copenhagen, emphasizing the public's understanding of AI
AI and the devil (richardcoyne.com, 2025-05-10). The article discusses skepticism around AI, comparing it to 'digital Ouija boards.' It explores risks like bias and psychosis from LLM interactions and emphasizes the importance of spiritual discernment in technological engagement
🤖 Agentic & Workflow Patterns
Virtual Adrian Revisited as meGPT (adrianco.medium.com/virtual-adrian-revisited-as-megpt-5db561ef77b4, 2025-05-11). Adrian Cockcroft explores meGPT, an AI-driven tool that personalizes advice and learns from his extensive body of work, employing Retrieval Augmented Generation and integrating various content formats like blogs, podcasts, and videos
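The Retrieval Augmented Generation loop behind tools like this fits in a few lines. A minimal sketch under assumed tooling (sentence-transformers for embeddings, any chat-completions endpoint for generation; not the meGPT implementation):

```python
# Minimal RAG sketch (illustrative, not the meGPT implementation):
# embed content chunks, retrieve the closest ones to a question,
# and stuff them into the prompt as grounding context.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = [
    "Blog post: microservices retrospectives...",
    "Podcast transcript: sustainability in cloud computing...",
    "Talk notes: platform teams and developer productivity...",
]
chunk_vecs = model.encode(chunks, normalize_embeddings=True)

question = "What does the author say about platform teams?"
q_vec = model.encode([question], normalize_embeddings=True)[0]

top = np.argsort(chunk_vecs @ q_vec)[::-1][:2]   # cosine similarity via dot product
context = "\n".join(chunks[i] for i in top)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
# `prompt` would then be sent to any chat-completions endpoint.
print(prompt)
```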
Function calling using LLMs (martinfowler.com, 2025-05-06). Function calling in LLMs enables AI agents to process user intent and interact with APIs, using structured JSON outputs for function calls, exemplified by a Shopping Agent implementation in Python
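The article's Shopping Agent is written in Python; here is an independent minimal sketch of the same pattern using the OpenAI SDK's tools interface (tool name and schema invented for illustration):

```python
# Minimal function-calling sketch: the model returns a structured JSON
# call instead of prose when it decides a tool is needed.
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "search_products",
        "description": "Search the product catalog",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Find me running shoes under $100"}],
    tools=tools,
)

# The application executes the requested call and feeds the result back.
for call in resp.choices[0].message.tool_calls or []:
    args = json.loads(call.function.arguments)
    print(call.function.name, args)   # e.g. search_products {'query': 'running shoes'}
```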
What Are AI Primitives? (jjude.com, 2025-05-08). Exploring the convergence of AI primitives: chat interfaces, generative models, RAG pipelines, proprietary data, and agentic workflows, and their impact on industries like entertainment, shopping, education, and healthcare
Agents Are Workflows (leehanchung.github.io, 2025-05-09). Explore how LLM agents can be modeled as Markov Decision Processes and implemented using Directed Acyclic Graphs and Finite State Machines, leveraging Bellman's Equation for understanding agent workflows
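The finite-state-machine framing is concrete enough to show in a toy loop (state names and transitions invented for illustration; each transition would normally consult an LLM):

```python
# Toy illustration of "agents are workflows": an agent loop modeled as a
# finite state machine with explicit states and transitions.
from enum import Enum, auto

class State(Enum):
    PLAN = auto()
    ACT = auto()
    OBSERVE = auto()
    DONE = auto()

def step(state: State, scratchpad: list[str]) -> State:
    # Each branch stands in for an LLM call in a real agent.
    if state is State.PLAN:
        scratchpad.append("plan: look up the answer")
        return State.ACT
    if state is State.ACT:
        scratchpad.append("act: called a search tool")
        return State.OBSERVE
    if state is State.OBSERVE:
        scratchpad.append("observe: found the answer")
        return State.DONE
    return State.DONE

state, scratchpad = State.PLAN, []
while state is not State.DONE:        # the MDP view: states, actions, transitions
    state = step(state, scratchpad)
print(scratchpad)
```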
🔒 Ethics, Alignment & Misinformation
Why the Chinese Government Taught AI to Lie (petewarden.com, 2025-05-08). The Chinese government employs censorship in AI training, exemplified through models like Qwen3, which provide sanitized responses to sensitive topics like Tiananmen Square, contrasting with uncensored counterparts
2025-05-06: Large Language Models (LLMs) are hallucinating in Arabic about the Quran - Part 1 (Google Gemini) (ws-dl.blogspot.com, 2025-05-06). Large Language Models, like Google Gemini, can generate false Quranic references, highlighting issues of misinformation and accuracy in LLM-generated content, particularly for Arabic texts, as users depend on such models for religious understanding
A Framework for Measuring and Fixing AI Hallucinations (mikulskibartosz.name, 2025-05-12). AI hallucinations undermine trust in products; this guide outlines strategies for measurement, debugging, and mitigation techniques like retrieval-augmented generation (RAG) and fine-tuning to address intrinsic and extrinsic issues
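One concrete way to operationalize measurement is a grounding check against retrieved context. The sketch below is an illustrative metric, not the guide's framework; production pipelines typically use NLI models or LLM judges instead of word overlap:

```python
# Crude extrinsic-hallucination signal: flag answer sentences that share
# little vocabulary with the retrieved context.
def grounding_scores(answer: str, context: str) -> list[tuple[str, float]]:
    context_words = set(context.lower().split())
    scores = []
    for sentence in answer.split("."):
        words = set(sentence.lower().split())
        if not words:
            continue
        overlap = len(words & context_words) / len(words)
        scores.append((sentence.strip(), round(overlap, 2)))
    return scores

context = "The Eiffel Tower was completed in 1889 and is 330 metres tall."
answer = "The Eiffel Tower was completed in 1889. It was designed by Leonardo da Vinci."
for sentence, score in grounding_scores(answer, context):
    print(f"{score:.2f}  {sentence}")   # low score -> candidate hallucination
```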
🧠 Model Capabilities & Reasoning
Why do LLMs have emergent properties? (johndcook.com, 2025-05-08). Large language models exhibit emergent abilities as parameter counts rise, with examples from linear regression, clustering, and Boolean circuits illustrating how sudden increases in capacity can lead to new, unexpected capabilities
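A toy capacity threshold in the spirit of the post's regression example: a polynomial fit only gains the "ability" to represent a cubic once its degree reaches 3, and the error drops abruptly rather than gradually.

```python
# Fitting y = x^3 fails until the model's degree (its "parameter count")
# reaches 3, at which point the capability appears suddenly.
import numpy as np

x = np.linspace(-1, 1, 100)
y = x ** 3

for degree in range(1, 6):
    coeffs = np.polyfit(x, y, degree)
    err = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"degree={degree}  mse={err:.2e}")
# The mse drops by many orders of magnitude between degree 2 and degree 3:
# a sharp jump in capability from a small increase in capacity.
```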
Beyond the hype of reasoning models: debunking three common misunderstandings (zansara.dev, 2025-05-12). The emergence of reasoning models like OpenAI's o1, DeepSeek R1, and others marks a significant evolution in LLMs, yet misconceptions about their capabilities persist, particularly regarding AGI, AI agents, and Chain of Thought prompts
Why can't language models come up with new ideas? (seangoedecke.com, 2025-05-11). Language models struggle to create novel ideas despite vast knowledge, as they primarily combine existing concepts, raising questions on their structural limitations and the necessary scaffolding for innovation
🖥️ Local LLM Explorations
Trying out llama.cpp's new vision support (simonwillison.net, 2025-05-10). llama.cpp now supports vision models via libmtmd, allowing users to run interactive sessions and perform image analysis with pre-compiled binaries on macOS; the bundled web server also serves the model for browser-based interaction
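That bundled `llama-server` exposes an OpenAI-compatible endpoint, so a vision query can also be scripted. A sketch, assuming a vision model is already being served on port 8080:

```python
# Sketch: sending an image to a locally running llama-server through its
# OpenAI-compatible endpoint (port and model alias assumed).
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="local",  # llama-server accepts an arbitrary model name
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```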
Dragging Myself Kicking and Screaming Into the Future (mediocregopher.com, 2025-05-11). After experimenting with Llama 3 8B Instruct and Llama-CPP, the author shares insights on local AI adoption, benefits for development workflows, and maintaining data control amidst newfound excitement for the technology
Qwen3 Leads the Pack: Evaluating how Local LLMs tackle First Year CS OCaml exercises (toao.com, 2025-05-06). Qwen3 models demonstrate superior performance on first-year CS OCaml exercises, outpacing competitors like Claude 3.7. Methods include thinking mode, OpenRouter for inference, and evaluation of coding tasks addressing core programming concepts
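The evaluation routes inference through OpenRouter, whose API is OpenAI-compatible, so a harness call looks roughly like the sketch below (the model slug is an assumption; check openrouter.ai for current Qwen3 identifiers):

```python
# Sketch of an evaluation-harness call via OpenRouter's
# OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",
)

exercise = "Write an OCaml function `rev : 'a list -> 'a list` without List.rev."
resp = client.chat.completions.create(
    model="qwen/qwen3-32b",          # assumed slug for a Qwen3 model
    messages=[{"role": "user", "content": exercise}],
)
print(resp.choices[0].message.content)  # candidate solution to compile and test
```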
🔬 Model Internals & Architectures
Embeddings are underrated (2024) (technicalwriting.dev, 2025-05-12). Embeddings, crucial in machine learning, advance technical writing by allowing semantic comparisons of text through numerical vectors, as illustrated with tools like Gemini and Voyage AI's models
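The core trick fits in a few lines: embed two passages and compare them with cosine similarity. In this sketch, sentence-transformers stands in for the Gemini and Voyage AI embedding models the post discusses:

```python
# Semantic comparison with embeddings: two passages with no shared words
# can still score as similar.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
a, b = "How do I reset my password?", "Steps to recover account credentials"

va, vb = model.encode([a, b], normalize_embeddings=True)
print("cosine similarity:", float(va @ vb))   # high despite zero word overlap
```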
Block Diffusion: Interpolating Autoregressive and Diffusion Language Models (m-arriola.com, 2025-05-08). Block Diffusion introduces a novel class of language models blending autoregressive and diffusion techniques, achieving high quality, arbitrary-length generation with parallelization and improved inference efficiency via KV caching
Absolute Zero Reasoner (andrewzh112.github.io, 2025-05-08). Absolute Zero Reasoner presents a paradigm for autonomous task creation in reinforcement learning, leveraging self-play without external data, achieving superior reasoning performance using verifiable rewards and a code executor for training
Notes on the phi-4-reasoning Technical Paper (nishtahir.com, 2025-05-10). Phi-4 reasoning, a 14B parameter model by Microsoft, utilizes supervised fine-tuning and synthetic dataset curation, outperforming larger models. The approach emphasizes careful dataset curation and challenging examples for enhanced reasoning performance
Writing an LLM from scratch, part 13 -- the 'why' of attention, or: attention heads are dumb (gilesthomas.com, 2025-05-08). Explores the simplicity of attention heads in multi-head attention and how their mechanisms aid in creating complex representations for language models, addressing key concepts like embeddings and context vectors
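The post's point that individual heads are simple is easy to see in code: one head is just three linear maps and a softmax-weighted average (a numpy sketch, with toy dimensions):

```python
# A single attention head really is this small: three linear projections
# and a softmax-weighted average over the sequence.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

seq_len, d_model, d_head = 5, 16, 8
rng = np.random.default_rng(0)
X = rng.normal(size=(seq_len, d_model))           # token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))

Q, K, V = X @ Wq, X @ Wk, X @ Wv
scores = softmax(Q @ K.T / np.sqrt(d_head))       # how much each token attends to each other
context = scores @ V                              # per-token context vectors
print(context.shape)                              # (5, 8)
```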
📚 Academic & Scholarly
Byte latent transformer: Patches scale better than tokens (arxiv.org, 2025-05-12). Byte Latent Transformer explores the concept that patches, rather than tokens, provide better scalability in natural language processing tasks, leveraging advanced transformer architectures
Understanding In-context Learning of Addition via Activation Subspaces (arxiv:cs, 2025-05-08). Llama-3-8B successfully learns addition through few-shot examples, utilizing three attention heads to track six-dimensional subspaces, revealing a self-correction mechanism that refines predictions based on earlier mistakes
Knowledge Augmented Complex Problem Solving with Large Language Models: A Survey (arxiv:cs, 2025-05-06). This survey examines the potential of Large Language Models in complex problem-solving through techniques like Chain-of-Thought reasoning and knowledge augmentation, addressing challenges in software engineering, mathematical reasoning, data analysis, and scientific research
Can LLM-based Financial Investing Strategies Outperform the Market in Long Run? (arxiv:q-fin, 2025-05-11). FINSABER framework evaluates LLM-based investing strategies, revealing that LLM advantages erode over longer terms with broader symbol sets, underperforming in bull markets and incurring losses in bear markets due to risk control failures
Sparse Attention Remapping with Clustering for Efficient LLM Decoding on PIM (arxiv:cs, 2025-05-09). STARC is a novel data mapping scheme optimizing sparsity for LLM decoding on PIM architectures, reducing attention-layer latency by 19%-31% and energy consumption by 19%-27% while maintaining accuracy
GenAI in Entrepreneurship: a systematic review of generative artificial intelligence in entrepreneurship research: current issues and future directions (arxiv:econ, 2025-05-08). A systematic literature review identifies five thematic clusters on Generative AI and entrepreneurship, leveraging TF-IDF, PCA, and clustering techniques, highlighting research gaps and the need for ethical frameworks in business innovation
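The review's text-mining pipeline in miniature with scikit-learn (toy abstracts invented for illustration; the paper's corpus and parameters differ):

```python
# TF-IDF vectors, PCA for dimensionality reduction, then K-means
# clustering into thematic groups.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

abstracts = [
    "Generative AI tools for startup ideation and validation",
    "LLMs in venture capital screening and due diligence",
    "Ethical frameworks for AI-driven business model innovation",
    "Entrepreneurial education with generative AI assistants",
]

tfidf = TfidfVectorizer(stop_words="english").fit_transform(abstracts)
reduced = PCA(n_components=2).fit_transform(tfidf.toarray())
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)
print(labels)   # thematic cluster assignment per abstract
```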
WaterDrum: Watermarking for Data-centric Unlearning Metric (arxiv:cs, 2025-05-08). WaterDrum introduces a data-centric unlearning metric for large language models, leveraging robust text watermarking to address unlearning challenges, alongside new benchmark datasets for evaluating unlearning algorithms in realistic settings
Ultra-FineWeb: Efficient Data Filtering and Verification for High-Quality LLM Training Data (arxiv:cs, 2025-05-08). Ultra-FineWeb introduces an efficient verification strategy using a lightweight fastText classifier for high-quality LLM training data filtering, creating a dataset with 1 trillion English tokens, enhancing model performance across various benchmarks
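A sketch of the lightweight-classifier filtering idea with the fasttext package (training file format per fastText's supervised-learning docs; the label scheme and threshold are invented, and Ultra-FineWeb's actual classifier differs):

```python
# Illustrative quality filtering with a fastText classifier.
# quality.train contains lines like: "__label__high <document text>"
import fasttext

model = fasttext.train_supervised(input="quality.train", epoch=5, wordNgrams=2)

def keep(document: str, threshold: float = 0.9) -> bool:
    labels, probs = model.predict(document.replace("\n", " "))
    return labels[0] == "__label__high" and probs[0] >= threshold

corpus = ["A well-edited article about compiler design...",
          "click here buy now cheap pills !!!"]
filtered = [doc for doc in corpus if keep(doc)]
print(len(filtered), "of", len(corpus), "documents kept")
```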
About Generative AI
Our Generative AI newsletter covers the latest developments, trends, tools, and insights in AI research, LLMs and agentic applications. Each week, we curate the most important content from over 50,000 blogs and news sites so you don't have to spend hours searching.
Whether you're a beginner or expert in generative AI, our newsletter provides valuable information to keep you informed and ahead of the curve in this rapidly evolving field.
Subscribe now to join thousands of professionals who receive our weekly updates!