Generative AI
Published 24th June 2025
📣 Headlines
• Meta's Llama 3.1 70B model can reproduce 42% of Harry Potter, while the BBC threatens legal action against Perplexity for allegedly scraping and misrepresenting its content, highlighting growing copyright tensions in AI development.
• Research reveals that AI models' carbon emissions vary dramatically, with some reasoning models producing up to 50 times more CO2 than simpler alternatives, as AI chips drastically increase data center power consumption.
• Meta is offering over $100M to poach OpenAI staff, though Sam Altman claims no top talent has left due to OpenAI's mission of achieving superintelligence.
• Yoshua Bengio launches LawZero to ensure AI safety by introducing 'Scientist AI' as a guardrail against harmful AI actions, emphasizing human agency over AI autonomy.
• Seed funding is surging for AI autonomous agents as a top 2025 trend, while Nvidia has significantly increased investments in over 80 AI startups with 49 funding rounds in 2024 alone.
• AI scraping bots are overloading libraries, archives, and museums, causing server failures as bots often ignore robots.txt protections for cultural resources.
• Up to 70% of AI-generated music streams on Deezer are fraudulent, using bots to manipulate royalties, while the music industry develops detection systems to track synthetic content.
• Amazon CEO warns staff that AI will reduce workforce needs in coming years, as AI tools reshape legal advice access and clinical automation receives $70M in funding.
🔧 Company Engineering Blogs
We’re expanding our Gemini 2.5 family of models (deepmind.google). The Gemini 2.5 model family expands with the general availability of Gemini 2.5 Flash and Pro, along with the introduction of Flash-Lite, a cost-efficient and fast model excelling in coding, math, and reasoning tasks
How Salesforce Engineering Operationalized AI Productivity at Scale (engineering.salesforce.com). Salesforce Engineering integrates AI tools like Cursor, CodeGenie, and GitHub Copilot for enhanced productivity across six major engineering clouds, achieving over 90% adoption
GitHub Copilot Spaces: Bring the right context to every suggestion (github.blog). GitHub Copilot Spaces enables customized coding assistance by bundling contextual knowledge into reusable 'spaces,' enhancing AI code suggestions based on team-specific workflows, coding styles, and documentation
(LoRA) Fine-Tuning FLUX.1-dev on Consumer Hardware (huggingface.co). Learn efficient fine-tuning of the FLUX.1-dev model on consumer hardware using QLoRA with techniques like low-rank adaptation, gradient checkpointing, and 8-bit optimization for improved performance and reduced memory usage
🛠️ Applications & Tools
Refining input guardrails for safer LLM applications | Capital One (medium.com/capital-one-tech). Capital One enhances LLM safety using chain-of-thought prompting and alignment techniques, focusing on input guardrails to mitigate adversarial attacks and improve moderation accuracy in AI-driven applications
Beyond Code Generation: Continuously Evolve Text with LLMs (towardsdatascience.com). Explore the use of OpenEvolve for continuous content evolution with LLMs, focusing on prompt automation, evaluation, and the adaptation of coding tools for non-code applications like poetry generation
Useful LLM Agent Tools (tersesystems.com). Explore tools for building self-hosted LLM agents using Letta, featuring integrations with Open WebUI, Google Calendar, Notion, and more, enhancing content extraction and summarization capabilities
Graph RAG and Movie Reviews: Connecting the Dots to Find Better Movies (medium.com/building-the-open-data-stack). Graph RAG enhances movie recommendations by connecting structured Rotten Tomatoes metadata with unstructured reviews using tools like GraphRetriever, allowing for dynamic graph traversal based on metadata without pre-built knowledge graphs
🚀 Performance & Infrastructure
Compiling LLMs into a MegaKernel: A path to low-latency inference (zhihaojia.medium.com/compiling-llms-into-a-megakernel-a-path-to-low-latency-inference-cf7840913c17). A compiler transforms LLM inference into a single megakernel, enhancing GPU performance while reducing latency by 1.2-6.7x through end-to-end fusion of computation and communication across multiple GPUs
Vendor-recommended LLM parameter quick reference (muxup.com). Overview of vendor-recommended LLM parameters for model configuration including temperature, top_p, top_k, and usage of generation_config.json
Understanding and Coding the KV Cache in LLMs from Scratch (sebastianraschka.com). Learn about KV caches, a crucial tool for efficient inference in large language models (LLMs), including its implementation and benefits in caching key and value computations during text generation
Will long context windows solve all your problems? (frontierai.substack.com). Long context windows in AI can aid data processing but aren’t a panacea, as challenges like high costs, latency, and data complexity necessitate effective search strategies for better application outcomes
🔬 Evaluation & Benchmarking
AbsenceBench: Language Models Can't Tell What's Missing (simonwillison.net). AbsenceBench explores how language models struggle to identify missing content compared to recognizing present information, revealing weaknesses in models like Gemini-2.5-flash, Claude-3.7-Sonnet, and GPT-4.1
Language model benchmarks only tell half a story (blog.mastykarz.nl). Benchmarks don't fully reflect a language model's effectiveness for specific applications. A custom benchmark with test cases and scoring functions like BERT, ROUGE, and edit distance is crucial for accurate evaluation
Arcified.AI Winning Playbook for Strong Compute ARC AGI 2 Hackathon (words.strongcompute.com). Vijayraj Gohil and Aditya Shah from Arcified.AI achieved 80% solve rates on ARC AGI 2 using advanced techniques like one-shot reinforcement learning and evolutionary refinement, showcasing significant gains over previous models
Surprise! Evaluating AI’s ability to tell surprising stories (txtlab.org). A novel framework evaluates narrative surprise in AI-generated stories using six criteria, assessing endings from human authors and language models like GPT-4, revealing insights into predictability and reader engagement
🎯 Advanced Training Techniques
(LoRA) Fine-Tuning FLUX.1-dev on Consumer Hardware (huggingface.co). Learn efficient fine-tuning of the FLUX.1-dev model on consumer hardware using QLoRA with techniques like low-rank adaptation, gradient checkpointing, and 8-bit optimization for improved performance and reduced memory usage
Continued Pretraining: A Practical Playbook for Language-Specific LLM Adaptation (rocm.blogs.amd.com). This practical playbook details the process for adapting LLMs to new languages via continued pretraining, specifically enhancing Finnish language capabilities with the Poro 2 model using Llama 3.1 on AMD GPUs
How is Spiky Superhuman AI trained? (medium.com/@danieldkang). Spiky superhuman AI (SSAI) leverages reinforcement learning (RL) and search techniques, exemplified by Google's AlphaEvolve and other models, to solve complex problems like AIME math competitions more effectively than humans
Efficient RL Training - Optimizing Memory Usage in veRL (Draft) (hebiao064.github.io). Biao He and Ata Fatahi discuss optimizing memory usage in veRL, a reinforcement learning library, utilizing techniques like Fully Sharded Data Parallel, Megatron-LM, and the torch_memory_saver library to address GPU memory challenges
🧠 Reasoning & Reinforcement Learning
Trying out the new Gemini 2.5 model family (simonwillison.net). Gemini 2.5 Pro and Flash models launched, introducing audio generation, improved video recall capabilities, and new pricing structures. Supported by tools like llm-gemini, models can generate SVGs and transcribe audio effectively
Reinforcement learning, explained with a minimum of math and jargon (understandingai.org). Reinforcement learning enhances AI capabilities, moving beyond imitation techniques like those used in BabyAGI and AutoGPT, leading to tools like Bolt.new that offer remarkable performance improvements in coding and multi-step tasks
How Smart Are Reasoning Models in 2025? (louisbouchard.ai). Reasoning models leverage increased compute during inference to enhance results. They utilize chains of thought and reinforcement learning to improve accuracy in tasks like arithmetic and coding while managing costs
“How Reinforcement Learning Is Used by Large Language Models and Why You Should Care” on the Pure AI Web Site (jamesmccaffrey.wordpress.com). Reinforcement learning in AI mitigates hallucinations through human-in-the-loop fine-tuning, where evaluators rate AI outputs, optimizing their responses while acknowledging potential bias and transparency issues in AI systems
📚 AI Philosophy & Meta-Learning
Engineered Qualia, Confabulation, and Emergent Consciousness (karmanivero.us). Engineered qualia in AI may help curb confabulation through introspective data structures, enabling models to self-monitor and improve reliability, drawing on techniques like Constitutional AI and recursive introspection
Learning to learn in the Age of LLMs (carette.xyz). Exploring the transformative impact of LLMs on learning, the importance of critical thinking, and the necessity of cautious engagement with AI tools like code generation in technical interviews
Reviewing “AI Engineering” by Chip Huyen (tensorlabbet.com). Chip Huyen's 'AI Engineering' explores LLMs, addressing application development, fine-tuning techniques like LoRA, retrieval-augmented generation, and challenges like hallucinations, offering insights into ML versus AI Engineering
🔮 Sunday edition #529: The new OS; Zuckerberg’s $14B pivot; routine cognition repriced; mainframe-to-Mac++ (exponentialview.co). Exploring the potential of LLMs as new operating systems, Zuckerberg's $14B acquisition of Scale AI, and the implications of routine cognition's pricing shift in an evolving AI landscape
Is using a thesaurus cheating? (maxkapur.com). Exploring the balance between creativity and assistance, the author reflects on using a thesaurus and generative AI tools like LLMs and NumPy, emphasizing personal craftsmanship over ease of productivity
📚 Academic Research
From LLMs to MLLMs to Agents: A Survey of Emerging Paradigms in Jailbreak Attacks and Defenses within LLM Ecosystem (arxiv:cs). This survey reviews the evolution of large language models to multimodal systems and agents, detailing jailbreak attack methods, defense strategies, and highlighting gaps in research including agent-specific security challenges and hybrid jailbreak taxonomy
Scaling Intelligence: Designing Data Centers for Next-Gen Language Models (arxiv:cs). Large Language Models like GPT-4 necessitate advanced data center designs that incorporate FullFlat architectures, optimize compute/communication, and employ performance modeling tools for enhanced scalability and efficiency
Understanding Software Engineering Agents: A Study of Thought-Action-Result Trajectories (arxiv:cs). Study of LLM-based agents in software engineering, analyzing their thought-action-result trajectories, program repair, and issue resolution with insights on agent design
On the Existence of Universal Simulators of Attention (arxiv:cs). Investigation of transformer encoders as universal simulators for arbitrary attention mechanisms, demonstrating a data-agnostic algorithmic approach using RASP framework
Long-Context Generalization with Sparse Attention (arxiv:cs). Sparse attention mechanisms using α-entmax and the new Adaptive-Scalable Entmax (ASEntmax) improve long-context generalization by focusing on fixed-size patterns, surpassing traditional softmax and other baselines with better positional encodings
From Concepts to Components: Concept-Agnostic Attention Module Discovery in Transformers (arxiv:cs). Scalable Attention Module Discovery (SAMD) maps complex concepts to transformer attention heads, while Scalar Attention Module Intervention (SAMI) adjusts their effects, demonstrating applications in multilingualism, performance enhancement, and image classification suppression
👋 Before you go
I've got a big favor to ask - keeping Blaze running isn't expensive, but it does all add up, so I'm asking readers like you to help, if you can.
That's why I'm launching a Patreon page!. Nothing flashy, just a way for folks who find value in these newsletters to chip in a little each month. In return, you'll get:
- Real say in how Blaze evolves — vote on new topics, features, topic curation ideas
- First dibs on merch (details still cooking)
- That warm fuzzy feeling knowing you're supporting something that saves you time and keeps you plugged into great tech writing
If you are getting value from blaze, checking this out would mean the world. And if you can't contribute, no worries—the newsletters keep coming either way, and you can follow along on patreon for free.
Thanks for reading and being part of this nerdy corner of the internet. All the best - Alastair.
You may also like
About Generative AI
Our Generative AI newsletter covers the latest developments, trends, tools, and insights in AI research, LLMs and agentic applications. Each week, we curate the most important content from over 50,000 blogs and news sites so you don't have to spend hours searching.
Whether you're a beginner or expert in generative AI, our newsletter provides valuable information to keep you informed and ahead of the curve in this rapidly evolving field.
Subscribe now to join thousands of professionals who receive our weekly updates!