
Machine Learning Engineer: 21st October 2025


🔧 Company Engineering Blogs

Beyond Founder Mode: Mission Mode (blog​.palantir​.com). Mission Mode organizes the entire company around customer missions, embedding engineers with customer teams and prioritizing outcomes over founder involvement

How Meta Is Leveraging AI To Improve the Quality of Scope 3 Emission Estimates for IT Hardware (engineering​.fb​.com). Meta uses AI, NLP, LLMs (Llama 3.1), and product carbon footprints (PCFs) to estimate IT hardware Scope 3 emissions, building a component-level taxonomy for data centers

Building Data Cloud’s New Unstructured Data Governance: Automated PII Detection at Enterprise Scale (engineering​.salesforce​.com). Automated PII detection across hundreds of gigabytes of unstructured data using Spark pipelines, Microsoft Presidio integration, and policy-driven masking
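
A minimal sketch of the Presidio piece of such a pipeline (pip install presidio-analyzer presidio-anonymizer); the Spark orchestration and policy-driven masking from the post are out of scope here:

```python
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

text = "Contact Jane Doe at jane.doe@example.com or 555-010-9999."
results = analyzer.analyze(text=text, language="en")               # detect PII spans
masked = anonymizer.anonymize(text=text, analyzer_results=results) # mask them
print(masked.text)  # e.g. "Contact <PERSON> at <EMAIL_ADDRESS> or <PHONE_NUMBER>."
```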

Google Cloud C4 Brings a 70% TCO improvement on GPT OSS with Intel and Hugging Face (huggingface​.co). Google Cloud C4 with Intel Xeon Granite Rapids improves GPT OSS MoE inference throughput and delivers 1.7x TCO savings over C3 via Intel-Hugging Face optimizations

Using AI to identify genetic variants in tumors with DeepSomatic (research​.google). DeepSomatic uses CNNs to identify somatic variants in tumor genomes across Illumina, PacBio, and ONT data with CASTLE training data

🧭 Careers & Industry

How to Get a Job in Edge AI: Essential Skills for 2025 (shawnhymel​.com). Edge AI on devices like iPhones, Android phones, Raspberry Pi, and microcontrollers; ML basics, Python/C/C++, Linux, RTOS, and hardware accelerators

Future-Proofing Your AI Engineering Career in 2026 (machinelearningmastery​.com). Future-proof AI engineering in 2026 via math foundations, system automation, cross-domain fluency, open source, and ethics

From Columns to Rewards: Automating the Two Pillars That Drive Modern AI (tomtunguz​.com). Reinforcement learning basics, feature engineering history, AutoML vs AutoRL, and the shift toward automated reward design

Engineering in the Age of Agents with Yechezkel Rabinovich (softwareengineeringdaily​.com). eBPF-powered observability with groundcover; BYOC model, kernel sensors, AI’s impact on code review and root-cause analysis

Data and AI culture: How Nu’s philosophy became a competitive advantage (building​.nubank​.com). Nu's data and AI culture, Data Mesh, autonomy, and 100+ ML models powering customer-centric decisions

🗺️ Culture & Visualization

Rising on arXiv - 2025-10-17 (blog​.rinesi​.com). Overview of high-bandwidth memory, causal attribution, and primordial non-Gaussianities in arXiv discussions and recent papers

Review: Genius Makers (blog​.piaw​.net). Biographical look at Geoff Hinton and students, tracing neural networks history and AI industry moves through Google and China

Cartography of generative AI (flowingdata​.com). Explores the groundwork of generative AI, including Estampa visuals, and the unseen data, energy, and people behind chatbot responses

🛠️ Practical ML Engineering

Growing a 454-page ML reference manual in 5 days: permacomputer harvest (russell​.ballestrini​.net). Harvesting a 454-page ML reference manual in 5 days via permacomputer automation and multi-language seed implementations

ML quacks: Combining duckdb and mlpack (dirk​.eddelbuettel​.com). Combining duckdb with mlpack to run adaboost on rectangular datasets via a duckdb extension
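
A rough Python analogue of the idea, using duckdb's Python API with scikit-learn's AdaBoostClassifier standing in for the mlpack extension the post actually builds:

```python
import duckdb
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier

frame = load_breast_cancer(as_frame=True).frame      # a rectangular dataset
con = duckdb.connect()
con.register("tbl", frame)                           # expose the DataFrame to SQL
df = con.execute("SELECT * FROM tbl").df()           # query it back out
X, y = df.drop(columns=["target"]), df["target"]
clf = AdaBoostClassifier(n_estimators=100).fit(X, y) # boost on the result set
print(clf.score(X, y))
```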

Tracking Down Mysterious ML Training Stalls (medium​.com/@Pinterest_Engineering). Pinterest's ML training stalls traced to torch.compile interactions and a Ray monitoring task, resolved by removing a psutil memory_full_info call

🔧 Beyond Python ML

Integrating Python Forecasting with R’s Tidyverse (datageeek​.com). Integrates a Python forecasting model (nnetsauce MTS with BoosterRegressor) into R's tidyverse via reticulate for predictions and intervals

Decision Tree Regression (Without Recursion) From Scratch Using C# (jamesmccaffreyblog​.com). Decision Tree Regression (non-recursive) in C# with a handcrafted trainer and evaluation workflow
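
A minimal sketch of the non-recursive approach, in Python rather than the post's C#: an explicit worklist of nodes and row indices replaces the usual recursive split calls (MSE splits, depth and leaf-size stopping):

```python
import numpy as np

def build_tree(X, y, max_depth=3, min_leaf=5):
    root = {"idx": np.arange(len(y)), "depth": 0}
    stack = [root]                            # explicit worklist, no recursion
    while stack:
        node = stack.pop()
        idx, depth = node.pop("idx"), node.pop("depth")
        node["value"] = y[idx].mean()         # prediction if this stays a leaf
        if depth >= max_depth or len(idx) < 2 * min_leaf:
            continue
        best = None
        for j in range(X.shape[1]):           # scan every feature/threshold
            for t in np.unique(X[idx, j])[:-1]:
                left, right = idx[X[idx, j] <= t], idx[X[idx, j] > t]
                if len(left) < min_leaf or len(right) < min_leaf:
                    continue
                sse = ((y[left] - y[left].mean()) ** 2).sum() \
                    + ((y[right] - y[right].mean()) ** 2).sum()
                if best is None or sse < best[0]:
                    best = (sse, j, t, left, right)
        if best is None:
            continue
        _, j, t, left, right = best
        node["feature"], node["threshold"] = j, t
        node["left"] = {"idx": left, "depth": depth + 1}
        node["right"] = {"idx": right, "depth": depth + 1}
        stack += [node["left"], node["right"]]  # children join the worklist
    return root

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1]
tree = build_tree(X, y)
print(tree["feature"], tree["threshold"])
```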

Saving a Trained Kernel Ridge Regression Model Using C# (jamesmccaffreyblog​.com). Kernel ridge regression with RBF kernel, Cholesky inverse training, and saving/loading model weights in C#
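
A numpy sketch of the same recipe (the post's code is C#): RBF kernel, ridge term, Cholesky solve; saving the trained model then amounts to persisting alpha together with the training inputs:

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit(X, y, lam=0.1, gamma=1.0):
    K = rbf(X, X, gamma) + lam * np.eye(len(X))          # regularized kernel matrix
    L = np.linalg.cholesky(K)                            # K = L @ L.T
    return np.linalg.solve(L.T, np.linalg.solve(L, y))   # alpha = K^{-1} y

def predict(X_train, alpha, X_new, gamma=1.0):
    return rbf(X_new, X_train, gamma) @ alpha

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (50, 2))
y = np.sin(X.sum(axis=1))
alpha = fit(X, y)            # "saving" the model = storing alpha and X
print(predict(X, alpha, X[:3]), y[:3])
```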

☁️ Cloud & HPC Engineering

Dataflow Computing for AI Inference with Kunle Olukotun - #751 (twimlai​.com). Reconfigurable dataflow units for AI inference, multi-model serving, and dynamic hardware reconfiguration, discussed with SambaNova's Kunle Olukotun

Enabling Scalable AI-Driven Molecular Dynamics Simulations (developer​.nvidia​.com). Integrating PyTorch-based MLIPs with LAMMPS via ML-IAP-Kokkos for scalable, multi-GPU MD simulations and Python-based model loading

Optimizing document AI and structured outputs by fine-tuning Amazon Nova Models and on-demand inference (aws​.amazon​.com). Fine-tuning Amazon Nova Lite for document processing, leveraging Bedrock on-demand inference (ODI), PEFT, and JSON structured outputs for tax forms

Configure and verify a distributed training cluster with AWS Deep Learning Containers on Amazon EKS (aws​.amazon​.com). Configure and verify a distributed training cluster on Amazon EKS using AWS DLCs with PyTorch, NCCL, EFA, FSx Lustre, and etcd/Kubeflow tools

🚀 LLM Inference at Scale

Google Cloud C4 Brings a 70% TCO improvement on GPT OSS with Intel and Hugging Face (huggingface​.co). Google Cloud C4 with Intel Xeon Granite Rapids improves GPT OSS MoE inference throughput and delivers 1.7x TCO savings over C3 via Intel-Hugging Face optimizations

Scaling LLM Inference: Innovations in Tensor Parallelism, Context Parallelism, and Expert Parallelism (engineering​.fb​.com). Meta shares tensor, context, and expert parallelism innovations for scalable LLM inference and long-context handling
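
A toy numpy illustration of the tensor-parallel idea: split a weight matrix column-wise across "devices", compute the shards independently, and concatenate; the result matches the single-device matmul:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                  # activations
W = rng.normal(size=(8, 6))                  # weight matrix
shards = np.split(W, 2, axis=1)              # each "device" holds half the columns
out = np.concatenate([x @ s for s in shards], axis=1)
print(np.allclose(out, x @ W))               # True: shards reproduce the matmul
```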

Scaling Large MoE Models with Wide Expert Parallelism on NVL72 Rack Scale Systems (developer​.nvidia​.com). Wide Expert Parallelism with NVL72 GB200, NVLink, and TensorRT-LLM boosts MoE inference throughput and lowers TCO

NVIDIA DGX Spark + Apple Mac Studio = 4x Faster LLM Inference with EXO 1.0 (simonwillison​.net). EXO 1.0 pairs an NVIDIA DGX Spark with an M3 Ultra Mac Studio, splitting LLM prefill and decode; 4x faster inference on Llama-3.1 8B via streaming the KV cache over 10Gb Ethernet

🧩 LLM Training Hacks

modded-nanogpt medium world record: Re-using intermediate activations in the output latents (snimu​.github​.io). Modded-nanogpt medium record reuses layer-11 activations in output latents with learned weights, exploring backout hypothesis and multi-layer skip experiments

Three AI customisation concepts (anna​.kiwi). Three AI customization concepts explained via embeddings, RAG vs fine-tuning, and LoRA with developer and non-technical metaphors
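
A toy sketch of the embeddings-plus-retrieval idea underlying RAG; the embeddings here are random stand-ins for a real embedding model:

```python
import numpy as np

rng = np.random.default_rng(0)
docs = ["pricing page", "refund policy", "api reference"]
doc_vecs = rng.normal(size=(len(docs), 64))           # stand-in embeddings
query_vec = doc_vecs[1] + 0.1 * rng.normal(size=64)   # a "query" near doc 1

def cosine(a, B):
    return (B @ a) / (np.linalg.norm(B, axis=1) * np.linalg.norm(a))

best = np.argsort(-cosine(query_vec, doc_vecs))[:2]   # nearest documents
print([docs[i] for i in best])                        # context for the prompt
```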

Writing an LLM from scratch, part 22 -- finally training our LLM! (gilesthomas​.com). Training an LLM from scratch, comparing GPT-2 weights, using AdamW, temperature and top-k sampling, and cost considerations
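
A minimal sketch of the temperature and top-k sampling step: scale logits by 1/T, keep only the k largest, renormalize, and sample:

```python
import numpy as np

def sample(logits, temperature=0.8, top_k=5, rng=None):
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=float) / temperature  # sharpen or flatten
    kth = np.sort(logits)[-top_k]           # smallest logit that survives
    logits[logits < kth] = -np.inf          # mask everything outside top-k
    probs = np.exp(logits - logits.max())   # stable softmax
    probs /= probs.sum()
    return rng.choice(len(logits), p=probs)

print(sample([2.0, 1.0, 0.5, -1.0, 0.0, 3.0]))
```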

supplement to 0.430 (aarnphm​.xyz). DeepSeek MLA: multi-head latent attention, KV compression, and on-device serving with vLLM and RoPE-enhanced queries

The case for the return of fine-tuning (welovesota​.com). Fine-tuning resurges with LoRA, Tinker, PEFT, and open-weight ecosystems enabling modular, controlled AI with personal hardware touches
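
A minimal PyTorch sketch of the LoRA idea behind much of that tooling: freeze the pretrained weight and learn a low-rank update on top of it:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, d_in, d_out, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        for p in self.base.parameters():     # frozen "pretrained" weights
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, r))  # zero init: no-op at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(64, 64)
print(layer(torch.randn(2, 64)).shape)  # torch.Size([2, 64])
```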

🕵️ Explainability & Synthesis

Introducing LightSHAP (lorentzen​.ch). LightSHAP: a lightweight, framework-agnostic SHAP implementation for tabular data with explain_tree and explain_any examples using CatBoost and linear models
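
Not LightSHAP's API (the post demos explain_tree and explain_any), just the underlying idea it implements: for a linear model with independent features, the SHAP value of feature i reduces to w_i(x_i - mean(x_i)), so attributions sum to f(x) - E[f(X)]:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
w = np.array([2.0, -1.0, 0.5])
f = lambda X: X @ w + 3.0                  # the model being explained

x = X[0]
shap_values = w * (x - X.mean(axis=0))     # exact SHAP for a linear model
print(shap_values.sum())                   # equals f(x) - mean f(X):
print(f(x) - f(X).mean())
```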

How We Stress Test Credit Models (barnesanalytics​.com). Five-pillar framework for stress-testing credit models (data checks, backtesting, macro scenarios, sensitivity tests, and explainability/governance) with SHAP, calibration, and TPRM relevance
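
One concrete instance of the backtesting/calibration pillar, sketched with scikit-learn (the predicted default probabilities here are synthetic stand-ins):

```python
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)
p = rng.uniform(0, 1, 5000)                     # model PDs (stand-ins)
y = rng.binomial(1, p)                          # outcomes consistent with PDs
frac_pos, mean_pred = calibration_curve(y, p, n_bins=10)
print(np.round(np.c_[mean_pred, frac_pos], 2))  # well calibrated: columns match
```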

GAN-like Synthetic Data Generation Examples with DistroSimulator (thierrymoudiki​.github​.io). GAN-like synthetic data using DistroSimulator across univariate/multivariate distributions, digits, Fashion-MNIST, and Olivetti faces

🧬 Scientific ML & Neuroscience

Compositional modeling of plant communities with Dirichlet regression (ecogambler​.netlify​.app). Dirichlet regression with Gaussian process smooths for plant-community composition across elevation and temperature using brms and Hilbert-space GP approximations
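
For orientation, one common parameterization of a Dirichlet regression of this kind (brms models the mean via a softmax link plus a precision parameter phi; the f_k here stand in for the GP smooths):

```latex
y_i \sim \mathrm{Dirichlet}(\phi\,\mu_{i1}, \dots, \phi\,\mu_{iK}),
\qquad
\mu_{ik} = \frac{\exp\{f_k(x_i)\}}{\sum_{j=1}^{K} \exp\{f_j(x_i)\}}
```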

Using AI to identify genetic variants in tumors with DeepSomatic (research​.google). DeepSomatic uses CNNs to identify somatic variants in tumor genomes across Illumina, PacBio, and ONT data with CASTLE training data

Whole-brain, bottom-up neuroscience: The time for it is now (thetransmitter​.org). Bottom-up whole-brain neuroscience using C. elegans, expansion microscopy, connectomics, optogenetics, and molecular modeling to build integrative brain models

How AI hears accents (accent-explorer​.boldvoice​.com). AI-driven 3D latent visualization of English accents using HuBERT fine-tuning and UMAP, revealing clustering by geography and immigration
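
A minimal sketch of the projection step described, with random stand-ins for the pooled HuBERT embeddings:

```python
import numpy as np
import umap  # pip install umap-learn

embeddings = np.random.default_rng(0).normal(size=(200, 768))  # fake HuBERT vectors
coords = umap.UMAP(n_components=3, metric="cosine").fit_transform(embeddings)
print(coords.shape)  # (200, 3): points ready for a 3-D scatter plot
```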

📐 Deep Learning Theory

Covariant spatio-temporal receptive fields for spiking neural networks (jepedersen​.dk). Covariant spatio-temporal receptive fields for spiking neural networks leveraging affine Gaussian derivatives and leaky integrator/IF neurons
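
A minimal sketch of the leaky integrate-and-fire dynamics involved (the covariant receptive-field filtering is the paper's contribution and is omitted here):

```python
import numpy as np

def lif(inputs, tau=10.0, v_thresh=1.0, dt=1.0):
    v, spikes = 0.0, []
    for x in inputs:
        v += dt * (-v / tau + x)        # leaky integration of input current
        if v >= v_thresh:
            spikes.append(1); v = 0.0   # spike, then reset
        else:
            spikes.append(0)
    return spikes

print(lif(np.full(20, 0.15)))           # periodic spiking under constant drive
```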

Lecture - Theories of Deep Learning MT25, IV, Data classes for which DNNs can overcome the curse of dimensionality (ollybritton​.com). Overview of data classes where DNNs can beat the curse of dimensionality, including hidden manifold model, ReLU expressivity, and trajectory-based complexity

Geometric Structure of Emergent Misalignment: Evidence for Multiple Independent Directions (lesswrong​.com). Geometric analysis of six steering vectors across three emergent misalignment directions reveals orthogonal, independent mechanisms with asymmetric cross-laboratory transfer

Lecture - Theories of Deep Learning MT25, II, Why deep learning (ollybritton​.com). Overview of theories behind deep learning, architectures, and key papers shaping modern DL practice

📏 Mathematical Musings

“We can obtain less rigorous but more convincing results by other means” (new paper) (noncommutativeanalysis​.wordpress​.com). Experimental bounds for dilations of free unitaries; universal commuting dilation constant; semidefinite programming; Copilot; random unitary pairs

Linkage (11011110​.github​.io). Overview of geometric constructions, puzzles, and algorithms, including Herschel enneahedron truncation, origami links, cuckoo hashing, and Hermite interpolation in curves

H2O: Higher-Order Pattern-Discovery in High-Dimensional Data, Aarhus University, Denmark, March 30-31 2026 (appliedtopology​.org). Two-day workshop integrating TDA, TSA, and high-dimensional statistics on network data at Aarhus University

Distribution of coordinates on a sphere (johndcook​.com). Uniformly distributed sphere coordinates x, y, z show zero linear correlation but non-independence; uses normal samples, normalization, and distance correlation
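
A minimal sketch of the construction in the post: normalize standard normal samples to get uniform points on the sphere, then observe zero linear correlation but clear dependence between coordinates:

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.normal(size=(100_000, 3))
P /= np.linalg.norm(P, axis=1, keepdims=True)  # uniform points on the unit sphere
x, y = P[:, 0], P[:, 1]
print(np.corrcoef(x, y)[0, 1])        # ~0: no linear correlation
print(np.corrcoef(x**2, y**2)[0, 1])  # clearly negative: x and y are dependent
```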

📚 Academic Research

PRISM: Probabilistic Runtime Insights and Scalable Performance Modeling for Large-Scale Distributed Training (arxiv:cs). PRISM models runtime variation in large-scale training using profiling and probabilistic simulation. Accurately predicts distributions, guiding parallelism, placement, and kernel optimizations for predictable, efficient clusters

Secure Sparse Matrix Multiplications and their Applications to Privacy-Preserving Machine Learning (arxiv:cs). Introduces MPC algorithms for multiplying secret sparse matrices, slashing memory and communication costs. Enables privacy-preserving ML on high-dimensional sparse data, with sparsity assumptions and applications

nuGPR: GPU-Accelerated Gaussian Process Regression with Iterative Algorithms and Low-Rank Approximations (arxiv:math). nuGPR combines preconditioned conjugate gradients, low-rank approximations, and CUDA parallelism to accelerate Gaussian Processes. Reduces training time and memory, making uncertainty-aware models practical for engineers

GRank: Towards Target-Aware and Streamlined Industrial Retrieval with a Generate-Rank Framework (arxiv:cs). GRank unifies target-aware candidate generation and ranking without structured indices. GPU-accelerated MIPS and training improve recall and throughput, simplifying industrial retrieval pipelines at billion-item scale

Dimension Mask Layer: Optimizing Embedding Efficiency for Scalable ID-based Models (arxiv:cs). DML learns per-feature embedding dimensions by masking components during training. Cuts embedding memory 40–50% with little loss, reducing overfitting and serving costs in ID-heavy models
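
A hedged PyTorch sketch of the idea as the abstract describes it: learn a soft per-dimension gate on the embedding so unneeded components can be pruned; the paper's actual mask relaxation and memory-budget penalty are not reproduced here:

```python
import torch
import torch.nn as nn

class DimensionMaskEmbedding(nn.Module):
    def __init__(self, num_ids, dim):
        super().__init__()
        self.emb = nn.Embedding(num_ids, dim)
        self.mask_logits = nn.Parameter(torch.zeros(dim))  # one gate per dimension

    def forward(self, ids):
        mask = torch.sigmoid(self.mask_logits)  # soft 0..1 gates, learned jointly
        return self.emb(ids) * mask             # masked embedding components

layer = DimensionMaskEmbedding(10_000, 64)
print(layer(torch.tensor([1, 2, 3])).shape)  # torch.Size([3, 64])
```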

👋 Before you go

Blaze newsletters will soon be moving to Substack as the main email delivery service. This is primarily to make managing subscriptions, sending email, and archiving newsletters more streamlined. There will be no change to your newsletters: they will continue to be completely free, and you will be able to subscribe and unsubscribe just as easily as before.

Blaze's sister site https://blognerd.app, a search engine for blogs and posts, has had a major makeover and is a good place to search for smart, independent writing.

Finally, if you get value from this newsletter, please consider supporting me by joining the Patreon page at patreon.com/blazeemail. Becoming a patron helps me cover my costs and keep Blaze going so everyone can enjoy the newsletters for free.


About Machine Learning Engineer

Our Machine Learning Engineer newsletter covers the latest developments, research papers, tools, and techniques in ML engineering and deployment. Each week, we curate the most important content so you don't have to spend hours searching.

Whether you're a beginner or expert in machine learning engineering, our newsletter provides valuable information to keep you informed and ahead of the curve in this technically challenging field.

Subscribe now to join thousands of professionals who receive our weekly updates!