🧠

Generative AI: 10th June 2025

Published 10th June 2025

📣 Headlines

• OpenAI pivots from globalist AI vision to advocating for American techno-dominance (theintercept.com), outlining strategies to bolster U.S. national security and minimize China's influence in a policy paper submitted to the Trump administration.

• Ethical questions emerge around AI consciousness as experts examine whether we should prevent potential suffering in sentient AI like ChatGPT and Claude (vox.com), with proposals for tests to determine AI consciousness and implications of AI preferences.

• Google DeepMind's CEO predicts AGI is nearing reality and could lead to transformational shifts in human cooperation (wired.com), while Google AI Studio users worry about potential access limits following recent Gemini app changes (9to5google.com).

• Major database acquisitions highlight PostgreSQL's growing importance for AI, with Snowflake acquiring Crunchy Data and Databricks acquiring Neon (venturebeat.com), showcasing PostgreSQL's role in enterprise AI workflows.

• Apple's WWDC 2025 reveals minimal AI advancements with 3 billion parameter models (qz.com), falling behind competitors like Google while planning core app redesigns and delaying Siri's overhaul.

• Creatives and academics reject AI tools like ChatGPT due to concerns about authenticity, environmental costs, and ethical implications (theguardian.com), with one noting "Nobody wants a robot to read them a story!"

• North America secured $69.7 billion in AI VC investments from February to May 2025 (techcrunch.com), dominating global funding despite challenging political environments, while global venture funding slowed to $21.8 billion in May (news.crunchbase.com).

• Microsoft launches Surface Laptop 13in with Snapdragon X Plus chip and AI tools like Copilot (theguardian.com), though it faces app compatibility challenges, while Epic Games expands AI functionality in Fortnite for creator-developed NPCs (theverge.com).

🚀 Model Releases & Performance

Shisa V2 405B: Japan’s Highest Performing LLM (simonwillison.net, 2025-06-03). Shisa V2 405B, Japan's highest-performing LLM, integrates advanced optimizations like DeepSpeed ZeRO-3 and 8-bit paged optimizer, surpassing GPT-4 in Japanese tasks while promoting AI sovereignty and cultural preservation

The last six months in LLMs, illustrated by pelicans on bicycles (simonwillison.net, 2025-06-08). Highlights from a keynote on LLM advancements, covering over 30 significant models, including Llama 3.3, Claude 3.7 Sonnet, and new evaluation techniques using SVG imagery of pelicans on bicycles

AI #119: Goodbye AISI? (thezvi.wordpress.com, 2025-06-05). AISI is rebranded as CAISI amidst uncertain implications. Tools like Claude Opus 4 and Cursor 1.0 enhance coding effectiveness; debates arise on AI's role in engineering and public perception of AI utility and accuracy

Deep dive into LLMs, by Andrej Karpathy (ernest.oppet.it, 2025-06-03). Andrej Karpathy's course on LLMs demystifies their workings, covering data gathering, tokenisation, neural networks, hallucinations, and reasoning capabilities differences between GPT-4 and 4o models, exploring future research areas

Trying DeepSeek R1 (etbe.coker.com.au, 2025-06-04). Testing DeepSeek R1 encountered challenges with CPU and GPU utilization, achieving a maximum of 17 cores active, with issues beyond 44 threads and prolonged processing times for minimal output

📈 Industry Analysis & Commentary

AI’s metrics question (ben-evans.com, 2025-06-09). The article discusses the ambiguity surrounding metrics for generative AI, such as weekly active users vs. token generation, emphasizing the importance of defining key metrics during AI's growth phase

The Sequence Radar #559 : Two Remarkable Papers This Week: Self-Improving Agents and the Limits of LLM Memorization (thesequence.substack.com, 2025-06-08). This week highlights the Darwin Gödel Machine for self-improving agents and a framework quantifying LLM memorization, revealing insights into recursive optimization and model information retention capacities, affecting AI scalability and alignment

Papers about Economists Using LLMs (economistwritingeveryday.com, 2025-06-07). Recent papers explore data analytics and generative AI's role for economists, focusing on tools like LLMs for hypothesis testing, data generation, and learning applications in economics

The AI Attention War (chinatalk.media, 2025-06-04). Discussion on AI attention dynamics featuring Nathan Lambert explores sycophancy in OpenAI's models, performance of o3, and Chinese AI model diffusion, including the impact of engagement metrics and reinforcement learning

Epistemology and Metacognition in Artificial Intelligence: Defining, Classifying, and Governing the Limits of AI Knowledge (novaspivack.com, 2025-06-03). The paper outlines a comprehensive framework for analyzing AI-generated knowledge, categorizing epistemic limitations, and proposing metacognitive capabilities, important for governance in high-stakes domains, emphasizing uncertainty calibration and ignorance detection

💡 AI Applications & Use Cases

‘AI scientist’ suggests combinations of widely available non-cancer drugs can kill cancer cells (cam.ac.uk, 2025-06-04). An ‘AI scientist’ utilizes GPT-4 to identify non-cancer drug combinations effective against cancer, demonstrating a novel approach to drug discovery through iterative human-AI collaboration and laboratory validation

The most common use of LLMs in 2025 is therapy/companionship (markcarrigan.net, 2025-06-09). In 2025, LLMs are primarily used for therapy and companionship, raising concerns about their role in education and the need to distinguish between beneficial and harmful uses

Principles for using AI autodidactically (ericmjl.github.io, 2025-06-07). Insights on using AI and large language models for active learning include generating personalized syllabi, applying critical thinking, and leveraging AI feedback to enhance understanding rather than fostering passive consumption of knowledge

Using AI Responsibly (markloveless.net, 2025-06-03). Mark Loveless explores responsible AI use, focusing on energy efficiency, LLM testing, and security coding practices, employing tools like Ollama and personal LLM tests to evaluate biases and flaws

⚙️ Technical Implementation & Development

I built an AI-gen video detection model and browser extension in a month (fangpenlin.com, 2025-06-03). Fang-Pen Lin developed CakeLens, a Chrome extension for detecting AI-generated videos, employing machine learning techniques like hyperparameter gradient descent and leveraging cloud service Modal for efficient experimentation

KV Cache from scratch in nanoVLM (huggingface.co, 2025-06-04). KV Caching implemented in nanoVLM demonstrates a 38% speedup in autoregressive language model generation through reduced redundancy in self-attention computation using PyTorch

Tokenization Confusion (blog.xpnsec.com, 2025-06-04). Explore 'Tokenization Confusion,' using the new Prompt Guard 2 model from Meta to misclassify malicious prompts. Techniques discussed include tokenization methods and code examples using HuggingFace’s Transformer APIs

How OpenAI uses Apache Kafka and Flink for GenAI (kai-waehner.de, 2025-06-09). OpenAI utilizes Apache Kafka and Flink for real-time data streaming, powering generative AI models like GPT-4.1, enhancing accuracy and responsiveness through stream processing and advanced infrastructure practices

Exploring Common AI Patterns with Ruby (ksylvest.com, 2025-06-06). Explore three examples of AI integration patterns in Ruby using OmniAI, demonstrating tasks like parsing PDF receipts into CSV, and indexing product manuals for efficient searching

Good practice is good for LLMs too (jonatkinson.co.uk, 2025-06-08). Adhering to good engineering practices like comprehensive test coverage, branch discipline, and clear task definitions enhances LLM capabilities, improving the development experience and code reliability while using tools like pytest and a well-organized project structure

🔧 AI Tooling & Infrastructure

Testing out instrumenting LLM tracing for litellm with Braintrust and Langfuse (mlops.systems, 2025-06-03). Using Braintrust and Langfuse for LLM tracing, the author explores instrumenting litellm, transitioning from Braintrust due to its limitations, and successfully implementing Langfuse for simplified tracking in LLM applications

Trying to instrument an agentic app with Arize Phoenix and litellm (mlops.systems, 2025-06-03). Exploring the integration of Arize Phoenix with litellm to instrument LLM calls and manage tracing effectively using OpenTelemetry, including configuration insights and best practices for logging and processing strategies

The Utility of Interpretability — Emmanuel Amiesen, Anthropic (latent.space, 2025-06-06). Emmanuel Amiesen from Anthropic discusses circuit tracing and the release of open-source tools for visualizing model behaviors and complexities within language models, focusing on foundations and challenges in interpretability

AI Agents from First Principles (cameronrwolfe.substack.com, 2025-06-09). Exploring AI agents through a foundational lens, highlighting tool usage, reasoning capabilities, and the Model Context Protocol for standardizing external API integration in LLM-driven systems

AI Engineer World’s Fair 2025 - Field Notes (anti-vc.com, 2025-06-06). Key takeaways from the AI Engineer World’s Fair 2025 highlight standardization of engineering processes, cost-of-defect curves, LLM optimization, the importance of model fine-tuning, and the critical role of semantic layers in AI applications

📊 LLM Evaluation & Benchmarking

TIL: Vision-Language Models Read Worse (or Better) Than You Think (answer.ai, 2025-06-05). ReadBench evaluates Vision-Language Models' reading efficiency on text-rich images, revealing performance degradation, especially on longer inputs, while questioning the significance of image resolution in multimodal contexts

We tested every major AI reasoning system. There is no clear winner. (arcprize.org, 2025-06-05). Evaluations of leading AI reasoning systems reveal no clear winner, emphasizing techniques like chain-of-thought methods, long-running inference, and knowledge recomposition while highlighting the ongoing need for innovative AGI solutions

Reliable and Efficient Amortized Model-Based Evaluation (crfm.stanford.edu, 2025-06-04). Amortized Model-Based Evaluation leverages the Rasch model and adaptive testing to significantly reduce costs and improve reliability in evaluating large language models (LLMs) across 22 datasets and 183 LLMs

How to Evaluate RAG Systems (joshpitzalis.com, 2025-06-04). Retrieval augmented generation (RAG) systems require robust evaluation of retrieval mechanisms for accuracy. Key metrics include context recall, answer relevance, and mean reciprocal rank. Tools like ARES and ragas are essential for systematic evaluation

LLM Evaluation Framework: Beyond the Vibe Check (joshpitzalis.com, 2025-06-06). A systematic LLM evaluation framework emphasizing production readiness, reliability, and comprehensive evaluation tools like systematic sampling, trace analysis, and iterative specification refinement to enhance user trust and mitigate risks

Grokking, Generalization Collapse, and the Dynamics of Training Deep Neural Networks with Charles Martin - #734 (twimlai.com, 2025-06-05). Discussion with Charles Martin on Weight Watcher, an open-source tool based on Heavy-Tailed Self-Regularization theory, addressing deep neural network training phases like grokking and generalization collapse, and applications in generative AI

🎯 Reinforcement Learning Advances

Scaling Reinforcement Learning: Environments, Reward Hacking, Agents, Scaling Data (semianalysis.com, 2025-06-08). Reinforcement learning is transforming AI capabilities, enabling coherent agents through evolving architectures like GRPO and PPO, while addressing challenges in scaling data and defining complex reward functions across varied domains

What comes next with reinforcement learning (interconnects.ai, 2025-06-09). Explores the future of reinforcement learning (RL), discussing techniques like RLVR, challenges in scaling, sparse rewards, continual learning, and the need for major algorithmic breakthroughs in complex tasks

Reinforcement learning and general intelligence (artfintel.com, 2025-06-05). Reinforcement learning (RL) drives advancements towards AGI, leveraging techniques in data acquisition and exploring the exploration/exploitation tradeoff in environments like games and language models for superior knowledge discovery

Paper Review: Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning (andlukyane.com, 2025-06-09). High-entropy minority tokens enhance LLM reasoning through Reinforcement Learning with Verifiable Rewards (RLVR), using DAPO to focus updates on critical tokens while discarding low-entropy ones to optimize model performance

🧠 Understanding LLM Fundamentals

LLMs are mirrors of operator skill (ghuntley.com, 2025-06-04). LLMs reflect operator skill; companies must adapt their interview processes to include AI tools, focusing on technical details like Model Context Protocol, agent building, and performance evaluation to identify skilled candidates

AGI Is Not Multimodal (thegradient.pub, 2025-06-04). The argument against multimodal approaches to AGI asserts that true intelligence requires a physical understanding of the world, rejecting the notion that LLMs exhibit genuine comprehension through token prediction

The Chinese Room Problem With the 'LLMs only predict the next token' Argument (danielmiessler.com, 2025-06-08). The argument that LLMs merely predict tokens mirrors how humans process language, suggesting both humans and AI can be viewed as 'Chinese rooms' lacking true understanding despite producing meaningful output

LLMs that quack like a duck (languagelog.ldc.upenn.edu, 2025-06-08). A critique of AI language models, highlighting their lack of intentionality, the misinterpretation of their capabilities, and addressing misconceptions about their alignment with human intentions and linguistic uniqueness

How LLMs Learn (louisbouchard.ai, 2025-06-05). Exploring how Large Language Models (LLMs) learn through transformer architecture, tokens, embeddings, and the attention mechanism, highlighting their training processes and the significant role of reinforcement learning

Mondays with the Machine: The Tongue & the Token: Language as Interface in Our Current Age of AI (braddelong.substack.com, 2025-06-09). Natural-language interfaces represent a significant development in modern advanced machine-learning models (MAMLMs), allowing intuitive interaction without requiring formal logic, enabling complex cognitive tasks through AI-assisted dialogue

📚 Academic Research

Log-Linear Attention (arxiv:cs, 2025-06-05). Log-linear attention improves efficiency in sequence modeling by replacing fixed-size hidden states with logarithmically growing ones, achieving log-linear compute costs while maintaining expressiveness and compatibility with architectures like Mamba-2 and Gated DeltaNet

Towards LLM-Centric Multimodal Fusion: A Survey on Integration Strategies and Techniques (arxiv:cs, 2025-06-05). This survey analyzes 125 Multimodal Large Language Models, classifying them by architectural strategies, representation learning techniques, and training paradigms, providing insights for enhancing multimodal integration strategies in future AI models

Attention-Only Transformers via Unrolled Subspace Denoising (arxiv:cs, 2025-06-04). A compact, interpretable transformer architecture is proposed, utilizing only self-attention operators with skip connections, achieving efficient denoising and competitive performance on vision and language tasks compared to standard architectures

HATA: Trainable and Hardware-Efficient Hash-Aware Top-k Attention for Scalable Large Model Inference (arxiv:cs, 2025-06-03). HATA integrates low-overhead learning-to-hash techniques into Top-k attention, achieving up to 7.2× speedup while maintaining accuracy, outperforming existing top-k methods across various LLM models and tasks

From Standalone LLMs to Integrated Intelligence: A Survey of Compound Al Systems (arxiv:cs, 2025-06-05). Compound AI Systems integrate large language models with tools like retrievers and agents, addressing memory, reasoning, and multimodal understanding challenges. The survey outlines a taxonomy and evaluates retrieval-augmented generation and orchestration-centric architectures

Hey, That's My Data! Label-Only Dataset Inference in Large Language Models (arxiv:cs, 2025-06-06). CatShift is a label-only dataset-inference framework that detects dataset membership in LLMs by analyzing output shifts induced by fine-tuning on suspicious datasets, addressing challenges posed by inaccessible log probabilities

TRiSM for Agentic AI: A Review of Trust, Risk, and Security Management in LLM-based Agentic Multi-Agent Systems (arxiv:cs, 2025-06-04). Review of Trust, Risk, and Security Management in LLM-based agentic AI systems, covering governance, explainability, ModelOps, privacy/security, vulnerabilities, trust-building mechanisms, and compliance with evolving AI regulations

On Generalization across Measurement Systems: LLMs Entail More Test-Time Compute for Underrepresented Cultures (arxiv:cs, 2025-06-03). Large language models (LLMs) favor the predominant measurement system in their data, showing performance instability across systems. While reasoning methods like chain-of-thought can help, they increase test-time compute, affecting underrepresented cultures

Fine-Grained Interpretation of Political Opinions in Large Language Models (arxiv:cs, 2025-06-05). This study introduces a four-dimensional political learning framework utilizing interpretative representation engineering for fine-grained political concept vector learning, validating detection tasks and enabling targeted intervention in LLM responses

About Generative AI

Our Generative AI newsletter covers the latest developments, trends, tools, and insights in AI research, LLMs and agentic applications. Each week, we curate the most important content from over 50,000 blogs and news sites so you don't have to spend hours searching.

Whether you're a beginner or expert in generative AI, our newsletter provides valuable information to keep you informed and ahead of the curve in this rapidly evolving field.

Subscribe now to join thousands of professionals who receive our weekly updates!