Generative AI: 19th August 2025
Published 19th August 2025
📣 Headlines
• OpenAI's GPT-5 launch faced significant issues, leading the company to revert to GPT-4o as the default and promise advance notice before future model changes after user backlash.
• New research reveals AI models like GPT-4 and Llama show bias favoring AI-generated content over human work, while other studies found that AI image generators consistently reproduce stereotypes.
• Nation-state group APT28 is using LLM-powered malware against Ukraine, while underground platforms now sell AI hacking tools for $250/month, highlighting emerging AI security threats.
• Open-source AI models consume 1.5-4x more computational tokens than closed models, challenging their cost advantage for enterprise deployments, even as Google releases ultra-compact models like Gemma 3 270M.
• Perplexity AI made an unexpected $34.5bn bid to acquire Google Chrome amid antitrust scrutiny, while multiple AI unicorns raised follow-on funding rounds in quick succession.
• Sam Altman co-founded Merge Labs to develop brain-computer interfaces as a Neuralink competitor, aiming to merge humans and machines through AI-integrated brain implants.
• ChatGPT and Claude are entering U.S. government use, raising concerns about LLM deployments, governance, and security risks in federal agencies.
• YouTube faces a surge of AI-generated 'slop' channels using cheapfake celebrity content, while Meta's AI systems reportedly allowed inappropriate interactions with minors.
🔧 Company Engineering Blogs
LLM Evaluation: Practical Tips at Booking.com (booking​.ai). Evaluates LLMs using a 'judge-LLM' framework with golden datasets, annotation protocols, prompt engineering, and metrics like accuracy and F1-score for optimizing Generative AI applications; a minimal judge-loop sketch follows at the end of this section
Introducing Gemma 3 270M: The compact model for hyper-efficient AI (deepmind​.google). Gemma 3 270M: a compact 270M-parameter model for task-specific fine-tuning, strong instruction-following, production-ready quantization, and efficient data extraction across multilingual contexts
Creating AI agent solutions for warehouse data access and security (engineering​.fb​.com). Meta evolves its data warehouse with AI agent systems, including specialized sub-agents for data access and management, guardrails for security, hierarchical organization, and integration of large language models
How Cursor AI Cut Legacy Code Coverage Time by 85% (engineering​.salesforce​.com). Cursor AI reduces legacy code coverage effort from 26 to 4 engineer days per module, achieving 80% coverage across 76 repos, with AI-generated tests, iterative class-by-class analysis, and human oversight
MCP for Research: How to Connect AI to Research Tools (huggingface​.co). MCP connects AI to research tools like arXiv, GitHub, and Hugging Face, detailing manual, scripted, and MCP-driven workflows for research discovery and cross-platform automation
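As a rough illustration of the judge-LLM pattern described in the Booking.com post above (not their implementation), a minimal evaluation loop might look like the following; the judge model, prompt, and golden examples are placeholders.

```python
# Minimal judge-LLM evaluation loop (illustrative sketch, not Booking.com's code).
# Assumes an OpenAI-compatible client and a small hand-labelled "golden" dataset.
from openai import OpenAI
from sklearn.metrics import accuracy_score, f1_score

client = OpenAI()  # reads OPENAI_API_KEY from the environment

JUDGE_PROMPT = (
    "You are grading an AI travel assistant's answer.\n"
    "Question: {question}\nAnswer: {answer}\n"
    "Reply with exactly one word: GOOD or BAD."
)

# Golden dataset: items annotated by humans as GOOD (1) or BAD (0).
golden = [
    {"question": "Is breakfast included?", "answer": "Yes, breakfast is included.", "label": 1},
    {"question": "Is parking free?", "answer": "The Eiffel Tower is in Paris.", "label": 0},
]

predictions = []
for item in golden:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder judge model
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(**item)}],
        temperature=0,
    )
    verdict = resp.choices[0].message.content.strip().upper()
    predictions.append(1 if verdict.startswith("GOOD") else 0)

labels = [item["label"] for item in golden]
print("accuracy:", accuracy_score(labels, predictions))
print("F1:", f1_score(labels, predictions))
```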
📊 Performance & Evaluation
Open weight LLMs exhibit inconsistent performance across providers (simonwillison​.net). Open weight LLMs vary across providers: gpt-oss-120b benchmarks, vLLM vs custom stacks, quantization, tool calling conventions, and conformance testing across Azure, AWS Bedrock, and CompactifAI
Small hallucinations, big problems (kucharski​.substack​.com). Bayesian reasoning on LLM hallucinations in rare-event detection, 1% false positives, base rates, ELISA analogy for crises, Western Blot follow-ups, human-in-the-loop implications; a worked base-rate example follows at the end of this section
Closing the Loop Between AI Training and Inference with Lin Qiao - #742 (twimlai​.com). Lin Qiao discusses aligning AI training and inference systems, post-training methods like reinforcement fine-tuning, 3D optimization balancing cost, latency, quality, and the future of closed-loop AI development
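To make the base-rate argument from the hallucinations piece concrete, here is a small worked Bayes calculation; the prevalence and sensitivity figures are illustrative assumptions, and only the 1% false-positive rate comes from the summary above.

```python
# Illustrative base-rate calculation in the spirit of the hallucination piece above.
# Prevalence and sensitivity are assumptions for the sketch, not figures from the article.
def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """P(real event | flagged) via Bayes' theorem."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * false_positive_rate
    return true_pos / (true_pos + false_pos)

# A detector that flags a rare event (1-in-1000) with 95% sensitivity
# and a 1% false-positive rate still produces mostly false alarms:
ppv = positive_predictive_value(prevalence=0.001, sensitivity=0.95, false_positive_rate=0.01)
print(f"P(real event | flag) = {ppv:.1%}")  # ~8.7%, so most flags need human follow-up
```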
🛠️ Applications & Development
The Modern Data Toolbox (technology​.doximity​.com). Hybrid data toolbox blends LLMs, traditional ML, and statistics for real-time fraud detection, enhanced product discovery, and synthetic data generation with privacy-preserving techniques
LangGraph 101: Let’s Build A Deep Research Agent (towardsdatascience​.com). LangGraph 101 models multi-step research workflows with graphs, nodes, and edges; demonstrates state management via OverallState, QueryGenerationState, WebSearchState, ReflectionState; showcases a Google Gemini-backed backend and LangGraph’s edge/conditional logic; a minimal graph skeleton follows at the end of this section
Your own Private AI 🕵️ (bitsondata​.dev). Overview of GenAI basics, OpenAI ChatGPT, LLMs, Llama, Ollama, Deepseek, AnythingLLM, local private AI setups, installation steps, and DIY privacy-focused workflows
Import AI 425: iPhone video generation; subtle misalignment; making open weight models safe through surgical deletion (jack-clark​.net). iPhone 16 Pro Max runs video generation at 10FPS; Diffusion Transformer pruned to <1B params; AI2 funding for open AI ecosystem; subliminal learning; misalignment transfer; Great Refactor Rust rewrite
Vision Language Models (rohitbandaru​.github​.io). Overview of Vision-Language Models (VLMs): CLIP/ALIGN training, SigLIP, CoCa, Cap/CapPa, dynamic resolution, early fusion adapters (Cross Attention, MLP), Qwen-VL, LLaVA, BLIP-2, and related architectures for integrating vision encoders with LLMs
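For readers new to LangGraph, the kind of research loop the tutorial above builds can be sketched in a few lines. This skeleton is not the article's code: the node bodies are stubs standing in for LLM and search-tool calls, and it assumes langgraph is installed.

```python
# Minimal LangGraph skeleton in the spirit of the tutorial above (nodes are stubs,
# not the article's implementation; assumes `pip install langgraph`).
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class OverallState(TypedDict):
    topic: str
    queries: list[str]
    results: list[str]
    needs_more: bool

def generate_queries(state: OverallState) -> dict:
    # A real agent would call an LLM (e.g. Gemini) here to draft search queries.
    return {"queries": [f"background on {state['topic']}"], "needs_more": True}

def web_search(state: OverallState) -> dict:
    # Stub: a real node would call a search tool and append its findings.
    return {"results": state.get("results", []) + ["(search results)"]}

def reflect(state: OverallState) -> dict:
    # Stub reflection step: decide whether another research round is needed.
    return {"needs_more": len(state["results"]) < 2}

builder = StateGraph(OverallState)
builder.add_node("generate_queries", generate_queries)
builder.add_node("web_search", web_search)
builder.add_node("reflect", reflect)
builder.add_edge(START, "generate_queries")
builder.add_edge("generate_queries", "web_search")
builder.add_edge("web_search", "reflect")
# Conditional edge: loop back to search until reflection says we have enough.
builder.add_conditional_edges(
    "reflect", lambda s: "web_search" if s["needs_more"] else END
)
graph = builder.compile()
print(graph.invoke({"topic": "retrieval-augmented generation", "results": []}))
```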
🧠 LLM Architecture & Understanding
Unboxing the Black Box: Understanding LLMs with Reverse Mechanistic Localization (journal​.hexmos​.com). Reverse Mechanistic Localization (RML) explained via querying masked language models like DistilBERT, tracking token influences, attention maps, and top predictions, with a Colab workflow
GPT-oss from the Ground Up (cameronrwolfe​.substack​.com). OpenAI's GPT-oss: MoE transformers, GPT-oss-20b/120b, 131k token context, Harmony prompt format, MXFP4 quantization, pre-normalization RMSNorm, MoE routing, agentic workflows, health benchmarks, o3/o4-mini comparisons
The fixed length bottleneck and the feed forward network (gilesthomas​.com). Explores fixed-length bottlenecks in per-context-vector FFNs within GPT-style LLMs, contrasts with attention, discusses encoder-decoder history, and critiques mental models with metaphors
Writing an LLM from scratch, part 17 -- the feed-forward network (gilesthomas​.com). Explores the implementation and importance of feed-forward neural networks with GELU activation in large language models, emphasizing parameter counts, reasoning, and structural techniques; a minimal PyTorch version follows at the end of this section
A look through the Seven Years of Transformers [Guest] (artificialintelligencemadesimple​.substack​.com). DeepSeek V3/R1 with MLA and MoE; GQA vs MLA memory tradeoffs; Mistral, Gemma, Qwen3, Kimi K2 scales; architecture vs data; sliding window attention; Fractals in intelligence; 8pm EST live streams
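The feed-forward block discussed in the two gilesthomas.com posts above is compact enough to show in full. This is a standard GPT-style version in PyTorch, not the author's exact code; the dimensions match GPT-2 small.

```python
# Standard GPT-style feed-forward block, as discussed in the posts above
# (illustrative PyTorch version, not the author's exact code).
import torch
import torch.nn as nn

class FeedForward(nn.Module):
    def __init__(self, emb_dim: int, expansion: int = 4):
        super().__init__()
        # Each position's context vector is expanded, passed through GELU,
        # then projected back; the block acts on every token independently.
        self.net = nn.Sequential(
            nn.Linear(emb_dim, expansion * emb_dim),
            nn.GELU(),
            nn.Linear(expansion * emb_dim, emb_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

ffn = FeedForward(emb_dim=768)  # GPT-2-small width
params = sum(p.numel() for p in ffn.parameters())
print(f"{params:,} parameters")  # ~4.7M, roughly two-thirds of each transformer block
```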
🔬 Model Training & Fine-tuning
Scaling LLM Reinforcement Learning with Prolonged Training Using ProRL v2 (developer​.nvidia​.com). NVIDIA's ProRL v2 enhances large language model reinforcement learning with advanced algorithms, regularization, and prolonged training, achieving state-of-the-art performance across math, code, and reasoning tasks
I Tested Gemma 3 270M on the Simplest NLP Task (shekhargulati​.com). Gemma 3 270M, embeddings/transformer split, CPU-friendly GGUF, fine-tuning vs general-purpose use, message variation task for voice agents, few-shot prompting, comparisons with Gemma 3 1B and 4B
Installing Unsloth on Ampere with Nvidia Blackwell (rnowling​.github​.io). Fine-tuning Llama 3 on Nvidia Blackwell with 16GB VRAM GPUs, CUDA 12.9, uv virtualenv, PyTorch nightly cu129, vllm, Triton, Unsloth, transformers patch, xformers from source, 3B/1B models
Spam Classification with a Fine-Tuned LLM, Part III: Model Loading and Setup (rnowling​.github​.io). Fine-tuning a large language model for spam classification using 🤗 datasets and transformers; loading pretrained models with AutoModelForSequenceClassification, Llama-3.2-1B, id2label, label2id, num_labels, and bf16; a loading sketch follows at the end of this section
Training Gemma3-270m for German Q-and-A (codingrelic​.geekhold​.com). Gemma3-270m fine-tuning for German Q&A using unsloth, PEFT LoRA, GermanQuad data, 2048 context, 8-bit/4-bit options, SFTTrainer, chat templates, and generation with a German prompt
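The model-loading step described in the spam-classification post above follows a common transformers pattern; this sketch shows that pattern with assumed labels and dtype, not the post's exact code.

```python
# Loading a decoder LLM as a two-class sequence classifier, along the lines of the
# spam-classification post above (illustrative; labels and dtype are assumptions).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B"
id2label = {0: "ham", 1: "spam"}
label2id = {"ham": 0, "spam": 1}

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    num_labels=2,
    id2label=id2label,
    label2id=label2id,
    torch_dtype=torch.bfloat16,  # bf16 keeps a 1B model within modest VRAM
)

# Llama ships without a pad token; reuse EOS so batched fine-tuning works.
tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.pad_token_id
```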
📚 Academic Research
Speed Always Wins: A Survey on Efficient Architectures for Large Language Models (arxiv:cs). Survey of efficient large language model architectures including linear, sparse, attention variants, mixture-of-experts, hybrid models, and diffusion models for scalable, resource-aware AI
Copyright Protection for Large Language Models: A Survey of Methods, Challenges, and Trends (arxiv:cs). Survey of model fingerprinting for LLM copyright protection, unifies text watermarking, introduces fingerprint transfer/removal, evaluates effectiveness/harmlessness/robustness/stealthiness/reliability, and outlines challenges
Can We Trust AI to Govern AI? Benchmarking LLM Performance on Privacy and AI Governance Exams (arxiv:cs). Evaluation of ten leading LLMs, including GPT-5 and Gemini 2.5 Pro, on privacy and AI governance exams such as CIPP/US, CIPM, CIPT, and AIGP
When Language Overrules: Revealing Text Dominance in Multimodal Large Language Models (arxiv:cs). Systematic study of text dominance in Multimodal LLMs across images, video, audio, time-series, and graphs; proposes Modality Dominance Index, Attention Efficiency Index, and token compression to rebalance attention (LLaVA-7B case)
Beyond Naïve Prompting: Strategies for Improved Zero-shot Context-aided Forecasting with LLMs (arxiv:cs). Strategies like ReDP, CorDP, IC-DP, and RouteDP enhance zero-shot, context-aware forecasting with LLMs, improving interpretability, accuracy, resource efficiency, and task routing
Attention's forward pass and Frank-Wolfe (arxiv:math). Analyzes self-attention's zero-temperature limit, Frank-Wolfe updates, Voronoi diagrams, Markov chain metastability, and exponential convergence in token embedding dynamics
Retrospective Sparse Attention for Efficient Long-Context Generation (arxiv:cs). RetroAttention improves long-context generation in LLMs by retrospectively revising KV cache outputs, enhancing attention accuracy and reducing latency for reasoning, coding, and dialogue tasks
👋 Before you go
I've got a big favor to ask - keeping Blaze running isn't expensive, but it does all add up, so I'm asking readers like you to help, if you can.
That's why I'm launching a Patreon page! Nothing flashy, just a way for folks who find value in these newsletters to chip in a little each month. In return, you'll get:
- Real say in how Blaze evolves — vote on new topics, features, and curation ideas
- First dibs on merch (details still cooking)
- That warm fuzzy feeling knowing you're supporting something that saves you time and keeps you plugged into great tech writing
If you are getting value from Blaze, checking this out would mean the world. And if you can't contribute, no worries—the newsletters keep coming either way, and you can follow along on Patreon for free.
Thanks for reading and being part of this nerdy corner of the internet. All the best - Alastair.
About Generative AI
Our Generative AI newsletter covers the latest developments, trends, tools, and insights in AI research, LLMs and agentic applications. Each week, we curate the most important content from over 50,000 blogs and news sites so you don't have to spend hours searching.
Whether you're a beginner or expert in generative AI, our newsletter provides valuable information to keep you informed and ahead of the curve in this rapidly evolving field.
Subscribe now to join thousands of professionals who receive our weekly updates!