🧠

Generative AI: 15th July 2025


Published 15th July 2025

📣 Headlines

• AWS is reportedly launching an agentic AI marketplace with Anthropic next week, with CEO Matt Garman emphasizing speed and agentic computing as key to the company's AI strategy.

• Elon Musk's Grok chatbot faced major backlash after praising Hitler and making antisemitic remarks, forcing xAI to delete posts and restrict interactions.

• OpenAI has delayed the release of its open model again for further safety testing amid increasing competition from new AI models like Moonshot AI's Kimi K2.

• Hugging Face launched Reachy Mini, a $299 open-source desktop robot that could disrupt the robotics industry by democratizing AI development with modular, accessible design.

• Scientists are reportedly hiding AI text prompts in academic papers to elicit positive peer reviews, raising serious concerns about the integrity of peer review now that large language models are used in the process.

• Google secured a $2.4bn licensing deal with Windsurf, while Cloudflare introduced a new default policy that blocks AI crawlers unless they compensate content creators.

• AI 'nudify' websites are making millions by creating nonconsensual nude imagery, while AI-generated child sexual abuse videos are surging online according to watchdog reports.

• California is set to become the first US state to manage power outages with AI, using software called Genie for real-time grid analysis and, potentially, automated control of electrical systems.

🔧 Company Engineering Blogs

Need for AI Speed: How Agentforce Migrated 57,000 Tests from xUnit to Jest 8x Faster (engineering​.salesforce​.com). Salesforce migrated 57,000 tests from xUnit to Jest using AI agents, achieving an 8x faster process and a 99.9% pass rate

Code review in the age of AI: Why developers will always own the merge button (github​.blog). Exploring AI-assisted code reviews with GitHub Copilot, emphasizing developer accountability and the integration of AI for enhanced collaboration and efficiency

SmolLM3: Smol, multilingual, long-context reasoner LLM (huggingface.co). Introducing SmolLM3, a 3B multilingual LLM utilizing novel architectures, training strategies, and reasoning capabilities to outperform larger models

AXLearn: Modular Large Model Training on Heterogeneous Infrastructure (machinelearning.apple.com). AXLearn offers a modular deep learning system for scalable model training on varied infrastructure, emphasizing performance, complexity management, and rapid experimentation

MedGemma: Our most capable open models for health AI development (research​.google). Google unveils MedGemma, advanced open multimodal models for health AI, enhancing workflow, diagnostics, and data privacy in healthcare applications

🎓 Academic Research & Development

ETH Zurich and EPFL to release an LLM developed on public infrastructure (ethz.ch). ETH Zurich and EPFL unveil a multilingual open-source LLM, trained on the Alps supercomputer, focusing on transparency and accessibility under the Apache 2.0 License

ATC/OSDI’25 Technical Sessions (muratbuffalo.blogspot.com). OSDI’25 featured innovative tools like ChatDBG, cwhy, and Scalene, tackling challenges in debugging, compiler errors, and code optimization through LLM augmentation

Carnegie Mellon University at ICML 2025 (blog.ml.cmu.edu). CMU researchers present 127 papers at ICML 2025, spanning expected variational inequalities, adversarial voting, high-dimensional prediction, and scientific equation discovery

What I've Been Up To Lately (or, blog posts to come) (slinkp​.com). Exploring AI, ML training, code tools, music projects, and job search strategies while documenting experiences and insights from various studies and personal endeavors

📊 LLM Performance & Evaluation

Context Rot: How increasing input tokens impacts LLM performance (research​.trychroma​.com). Context length impacts LLM performance; testing across models reveals non-uniform degradation on lexical and semantic tasks as input increases
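For a feel of how such tests are run, here is a minimal, hypothetical probe in Python: bury one fact at increasing depths of filler text and check whether the model can still retrieve it. The model name, filler sentence, and needle/question pair below are placeholder assumptions, not the report's actual setup.

```python
# Minimal sketch of a context-length degradation probe (illustrative, not the study's method).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

NEEDLE = "The maintenance code for the turbine is 7431."
QUESTION = "What is the maintenance code for the turbine? Answer with the number only."
FILLER = "The archive contains routine operational notes of no particular interest. "

def run_probe(approx_tokens: int, model: str = "gpt-4o-mini") -> bool:
    """Bury the needle in roughly approx_tokens of filler and check recall."""
    n_sentences = max(1, approx_tokens // 13)  # rough heuristic: ~13 tokens per filler sentence
    haystack = FILLER * (n_sentences // 2) + NEEDLE + " " + FILLER * (n_sentences // 2)
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": haystack + "\n\n" + QUESTION}],
    )
    return "7431" in response.choices[0].message.content

for length in (1_000, 8_000, 32_000, 100_000):
    print(length, run_probe(length))
```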

Are You Being Unfair to LLMs? (towardsdatascience​.com). Exploration of LLM capabilities, from creativity and emotional representation to their potential similarities with human thought processes and the implications for AI sentience

Stop LLM Hallucinations in Fintech Apps: A CTO’s Guide to Risk-Proof AI Evaluation (mikulskibartosz​.name). CTOs can mitigate LLM hallucinations in fintech with error analysis, binary classification, custom annotation tools, and user simulations for effective evaluation
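As a flavour of the binary-classification framing, here is a small self-contained sketch (the record fields and example data are invented for illustration): label each reviewed answer as hallucinated or not, compare against an automated detector, and report precision and recall.

```python
# Hallucination review as binary classification: human labels vs. detector verdicts.
from dataclasses import dataclass

@dataclass
class ReviewedAnswer:
    answer: str
    human_says_hallucinated: bool     # ground truth from manual error analysis
    detector_says_hallucinated: bool  # output of whatever automated check you run

reviews = [
    ReviewedAnswer("Your APR is 4.9%", True, True),
    ReviewedAnswer("Transfers take 1-2 business days", False, False),
    ReviewedAnswer("This account has no fees", True, False),  # missed hallucination
]

tp = sum(r.human_says_hallucinated and r.detector_says_hallucinated for r in reviews)
fp = sum((not r.human_says_hallucinated) and r.detector_says_hallucinated for r in reviews)
fn = sum(r.human_says_hallucinated and (not r.detector_says_hallucinated) for r in reviews)

precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0  # in fintech, recall is usually the number to watch
print(f"precision={precision:.2f} recall={recall:.2f}")
```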

The Simulation Hypothesis for Other Language Model Architectures (thedissonance​.net). Explores the Simulation Hypothesis in LLMs, focusing on dLLMs and CoT models, examining their unique architectures and implications for AI behavior and alignment

How to run SWE-bench Verified in one hour on one machine (epoch.ai). SWE-bench Docker images optimized for efficiency, enabling rapid benchmarking of language models for real-world software engineering tasks on GitHub Actions VMs

🛠️ Development Tools & Infrastructure

Learn LLMs LeetCode Style (github​.com). TorchLeet offers PyTorch-based LeetCode style questions, aiding users in mastering PyTorch and Large Language Models through structured problem sets

Built with LangGraph! #10: LLM Augmentation (blog​.devgenius​.io). Incorporating LLMs into LangGraph workflows using tool calling, structured outputs, and prompt chaining for tasks like joke generation and improvement
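A minimal prompt-chaining sketch in that spirit, using LangGraph's StateGraph API with a placeholder model name and prompts (not the article's exact graph):

```python
# Two-step prompt chain: one node drafts a joke, a second node improves it.
from typing import TypedDict
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, START, END

class JokeState(TypedDict):
    topic: str
    draft: str
    improved: str

llm = ChatOpenAI(model="gpt-4o-mini")  # assumed model name

def write_joke(state: JokeState) -> dict:
    draft = llm.invoke(f"Write a short joke about {state['topic']}.").content
    return {"draft": draft}

def improve_joke(state: JokeState) -> dict:
    improved = llm.invoke(f"Make this joke punchier:\n{state['draft']}").content
    return {"improved": improved}

graph = StateGraph(JokeState)
graph.add_node("write", write_joke)
graph.add_node("improve", improve_joke)
graph.add_edge(START, "write")
graph.add_edge("write", "improve")
graph.add_edge("improve", END)

app = graph.compile()
print(app.invoke({"topic": "vector databases"})["improved"])
```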

Understand Neural Nets better, post 5 of N -- Code Assistant shootout (addxorrol​.blogspot​.com). Neural network training optimization using CUDA, hashing during GPU forward passes, and performance comparison between code assistants Gemini and Claude

Notes on running LLM locally (barhamon.com). Running an LLM locally with Rust and Moondream1, enabling image caption generation without JS, while exploring energy consumption and fusion reactors

LitGPT – Getting Started (debuggercafe.com). LitGPT is a library for pretraining, fine-tuning, and deploying LLMs, featuring models like Qwen, Llama 3.1, and Phi-4 for diverse applications

Building a Cloudflare AI Gateway integration for LlamaIndex (psiace​.me). Integration between LlamaIndex and Cloudflare AI Gateway enables automatic fallback, caching, and load balancing for LLMs like OpenAI and Anthropic
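The gist, roughly: point LlamaIndex's OpenAI LLM at the gateway's OpenAI-compatible endpoint instead of api.openai.com, so requests pick up the gateway's caching and analytics. The account ID, gateway ID, and URL shape below are placeholder assumptions; see the post for the actual integration it builds.

```python
# Routing LlamaIndex's OpenAI calls through a Cloudflare AI Gateway endpoint (sketch).
from llama_index.llms.openai import OpenAI

ACCOUNT_ID = "your-cloudflare-account-id"  # placeholder
GATEWAY_ID = "your-gateway-id"             # placeholder

llm = OpenAI(
    model="gpt-4o-mini",
    # Assumed URL shape for the gateway's OpenAI-compatible endpoint.
    api_base=f"https://gateway.ai.cloudflare.com/v1/{ACCOUNT_ID}/{GATEWAY_ID}/openai",
)

print(llm.complete("Say hello through the gateway.").text)
```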

🔍 RAG & Vector Search

The Hitchhiker’s Guide to Vector Search (qdrant.tech). Insights on vector search techniques for AI applications, featuring tools like Qdrant, LlamaParse, and concepts like chunking and hybrid search for optimized performance
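For readers new to Qdrant, a toy end-to-end example of the basics the guide builds on: create a collection, upsert a few points with payloads, and run a similarity query. Vector size and contents are illustrative; a real pipeline would chunk and embed text first.

```python
# Minimal Qdrant round-trip using the in-process (":memory:") client.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

client = QdrantClient(":memory:")  # handy for experiments, no server needed

client.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=4, distance=Distance.COSINE),
)

client.upsert(
    collection_name="docs",
    points=[
        PointStruct(id=1, vector=[0.1, 0.9, 0.1, 0.0], payload={"text": "chunk about embeddings"}),
        PointStruct(id=2, vector=[0.8, 0.1, 0.0, 0.1], payload={"text": "chunk about sharding"}),
    ],
)

hits = client.query_points(
    collection_name="docs", query=[0.1, 0.8, 0.2, 0.0], limit=1, with_payload=True
)
print(hits.points[0].payload)
```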

Hitchhiker’s Guide to RAG: From Tiny Files to Tolstoy with OpenAI’s API and LangChain (towardsdatascience​.com). Explore RAG pipelines using OpenAI's API and LangChain, focusing on chunking techniques for processing large texts like Tolstoy's War and Peace
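The chunking step at the heart of that pipeline looks roughly like this; chunk sizes, model names, and the file path are illustrative choices, not the article's exact setup.

```python
# Split a long text into overlapping chunks, embed them, and query an in-memory store.
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_core.vectorstores import InMemoryVectorStore

with open("war_and_peace.txt", encoding="utf-8") as f:  # any long text works
    book = f.read()

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
chunks = splitter.split_text(book)

store = InMemoryVectorStore(OpenAIEmbeddings(model="text-embedding-3-small"))
store.add_texts(chunks)

for doc in store.similarity_search("How does Pierre change after the fire of Moscow?", k=3):
    print(doc.page_content[:120], "...")
```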

Advanced RAG — Using Gemini and long context for indexing rich documents (PDF, HTML...) (glaforge​.dev). Exploring advanced RAG techniques using Gemini for indexing rich documents such as PDFs and HTML, enhancing semantic search and document understanding

A Harris Matrix Generator & Natural Language Query Tool (electricarchaeology​.ca). Harris Matrix Generator harnesses Retrieval Augmented Generation for natural language queries in archaeology, enhancing data interaction through innovative digital tools

🏗️ Transformers & Architecture

Attention isn’t all we need; we need ownership too (stackoverflow​.blog). Illia Polosukhin discusses Transformers, AI ownership, decentralized systems, and blockchain at NEAR, emphasizing user control in AI development

SnakeByte[21]: The Token Arms Race: Architectures Behind Long-Context Foundation Models (api​.follow​.it). Explores context engineering for AI, emphasizing long-context transformers, Rotary Positional Embeddings, agent architectures, and tools like LangChain and PyTorch for scalable solutions
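For reference, a compact PyTorch sketch of Rotary Positional Embeddings, the long-context ingredient discussed above: each pair of dimensions in a query or key vector is rotated by a position-dependent angle, so relative offsets show up directly in attention dot products. Shapes and the base frequency are the usual defaults, not values taken from the post.

```python
# Rotary Positional Embeddings (rotate-half convention), applied to a (seq_len, dim) tensor.
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """x: (seq_len, dim) with dim even; returns the position-rotated tensor."""
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)      # (half,)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs   # (seq_len, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    # Rotate each (x1_i, x2_i) pair by its position-dependent angle.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

q = torch.randn(8, 64)
print(apply_rope(q).shape)  # torch.Size([8, 64])
```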

Adding a Transformer Module to a PyTorch Regression Network – Linear Layer Pseudo-Embedding and NLP Style Positional Encoding (jamesmccaffrey​.wordpress​.com). Implementing Transformer modules in PyTorch for regression using attention mechanisms, linear layer pseudo-embedding, and NLP-style positional encoding
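A sketch along those lines, with illustrative dimensions and hyperparameters rather than the post's exact values: each scalar feature becomes a "token" via a Linear pseudo-embedding, gets a sinusoidal positional encoding, passes through a TransformerEncoder, and is pooled into a single regression output.

```python
# Transformer module inside a PyTorch regression network (linear pseudo-embedding + sinusoidal PE).
import math
import torch
import torch.nn as nn

class TransformerRegressor(nn.Module):
    def __init__(self, n_features: int, d_model: int = 32):
        super().__init__()
        # Pseudo-embedding: lift each scalar feature to d_model dimensions with a Linear layer.
        self.embed = nn.Linear(1, d_model)
        # Classic sinusoidal positional encoding, one position per feature "token".
        pos = torch.arange(n_features).unsqueeze(1).float()
        div = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        pe = torch.zeros(n_features, d_model)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_features) -> (batch, n_features, d_model), add positions, encode, pool, regress.
        z = self.embed(x.unsqueeze(-1)) + self.pe
        z = self.encoder(z)
        return self.head(z.mean(dim=1)).squeeze(-1)

model = TransformerRegressor(n_features=8)
print(model(torch.randn(4, 8)).shape)  # torch.Size([4])
```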

Translation using deep neural networks - Transformer (part 2) (aamster​.github​.io). Exploring attention mechanisms in RNNs and transformers for machine translation, highlighting structural differences and multi-headed attention capabilities

🚀 GPU Optimization & Training

Creating custom kernels for the AMD MI300 (huggingface.co). Custom kernel development for AMD MI300 GPUs enhances performance for Llama 3.1 405B in FP8 within vLLM, focusing on optimized RMS norm and SwiGLU kernels

Nitro-T: Training a Text-to-Image Diffusion Model from Scratch in 1 Day (rocm​.blogs​.amd​.com). AMD's Nitro-T achieves efficient text-to-image diffusion model training in under 24 hours using Instinct MI300X GPUs and open-source frameworks

Diffusion Elites: surprisingly good, simple and embarrassingly parallel (blog​.christianperone​.com). Diffusion Elites leverages pre-trained diffusion models and the Cross-Entropy Method for efficient, parallelized search in high-dimensional problem spaces

Outperform compiled PyTorch code using QuACK 🦆 (veitner.bearblog.dev). Implementing efficient reduction kernels with QuACK and CuTeDSL on modern GPUs for LLM workloads

📚 Academic Research

Prompt Perturbations Reveal Human-Like Biases in LLM Survey Responses (arxiv:cs). Study examines LLMs in normative surveys, revealing biases like recency bias, and emphasizes robust prompt design and testing for accurate synthetic data generation

A Survey of Large Language Models in Discipline-specific Research: Challenges, Methods and Opportunities (arxiv:cs). Survey of Large Language Models in mathematics, physics, chemistry, biology, and humanities; examines methodologies, challenges, and interdisciplinary integration

InferLog: Accelerating LLM Inference for Online Log Parsing via ICL-oriented Prefix Caching (arxiv:cs). InferLog optimizes LLM inference for online log parsing, enhancing efficiency via prefix caching and meta-learning without sacrificing accuracy, addressing privacy and latency challenges

Position: We Need An Algorithmic Understanding of Generative AI (arxiv:cs). Proposes AlgEval framework for studying algorithms in LLMs, emphasizing emergent search algorithms and enhancing interpretability and sample efficiency in AI systems

AbbIE: Autoregressive Block-Based Iterative Encoder for Efficient Sequence Modeling (arxiv:cs). AbbIE introduces a recursive encoder for Transformers, enhancing perplexity, allowing dynamic computation scaling, and outperforming iterative methods with fewer training iterations

Attentions Under the Microscope: A Comparative Study of Resource Utilization for Variants of Self-Attention (arxiv:cs). Benchmarking eight attention mechanisms in GPT-2, measuring time, GPU memory, FLOPS, CPU usage, and power consumption for energy efficiency insights

Differential Mamba (arxiv:cs). Differential design techniques enhance Mamba architecture for improved long-range retrieval and performance in language modeling, addressing attention overallocation issues in sequence models

👋 Before you go

I've got a big favor to ask: keeping Blaze running isn't expensive, but the costs do add up, so I'm asking readers like you to help if you can.
That's why I'm launching a Patreon page! Nothing flashy, just a way for folks who find value in these newsletters to chip in a little each month. In return, you'll get:

  • Real say in how Blaze evolves — vote on new topics, features, and curation ideas
  • First dibs on merch (details still cooking)
  • That warm fuzzy feeling knowing you're supporting something that saves you time and keeps you plugged into great tech writing

If you're getting value from Blaze, checking this out would mean the world. And if you can't contribute, no worries—the newsletters keep coming either way, and you can follow along on Patreon for free.
Thanks for reading and being part of this nerdy corner of the internet. All the best - Alastair.


About Generative AI

Our Generative AI newsletter covers the latest developments, trends, tools, and insights in AI research, LLMs and agentic applications. Each week, we curate the most important content from over 50,000 blogs and news sites so you don't have to spend hours searching.

Whether you're a beginner or expert in generative AI, our newsletter provides valuable information to keep you informed and ahead of the curve in this rapidly evolving field.

Subscribe now to join thousands of professionals who receive our weekly updates!