Machine Learning Engineer: 21st October 2025
Published 21st October 2025
🔧 Company Engineering Blogs
Beyond Founder Mode: Mission Mode (blog.palantir.com). Mission Mode organizes the entire company around customer missions, embedding engineers with teams and prioritizing outcomes over founder involvement
How Meta Is Leveraging AI To Improve the Quality of Scope 3 Emission Estimates for IT Hardware (engineering.fb.com). Meta uses AI, NLP, LLMs (Llama 3.1) and PCFs (product carbon footprints) to estimate IT hardware Scope 3 emissions, building a component-level taxonomy for data centers
Building Data Cloud’s New Unstructured Data Governance: Automated PII Detection at Enterprise Scale (engineering.salesforce.com). Automated PII detection across hundreds of gigabytes of unstructured data using Spark pipelines, Microsoft Presidio integration, and policy-driven masking
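For readers who want to try the Presidio piece of that stack, here is a minimal sketch of detection plus masking on a single string. The Spark distribution and Salesforce's policy engine are out of scope; the example text is made up.

```python
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

text = "Contact Jane Doe at jane.doe@example.com or 555-010-9999."

# Detect PII entities (names, emails, phone numbers, ...) in free text.
findings = AnalyzerEngine().analyze(text=text, language="en")

# Replace each detected span with a placeholder (Presidio's default operator).
masked = AnonymizerEngine().anonymize(text=text, analyzer_results=findings)
print(masked.text)
```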
Google Cloud C4 Brings a 70% TCO improvement on GPT OSS with Intel and Hugging Face (huggingface.co). Google Cloud C4 with Intel Xeon Granite Rapids improves GPT OSS MoE inference throughput and delivers 1.7x TCO savings over C3 via Intel-Hugging Face optimizations
Using AI to identify genetic variants in tumors with DeepSomatic (research.google). DeepSomatic uses CNNs to identify somatic variants in tumor genomes across Illumina, PacBio, and ONT data with CASTLE training data
🧭 Careers & Industry
How to Get a Job in Edge AI: Essential Skills for 2025 (shawnhymel.com). Edge AI on devices like iPhones, Android phones, Raspberry Pi, and microcontrollers; ML basics, Python/C/C++, Linux, RTOS, and hardware accelerators
Future-Proofing Your AI Engineering Career in 2026 (machinelearningmastery.com). Future-proof AI engineering in 2026 via math foundations, system automation, cross-domain fluency, open source, and ethics
From Columns to Rewards: Automating the Two Pillars That Drive Modern AI (tomtunguz.com). Reinforcement learning basics, feature engineering history, AutoML vs AutoRL, and the shift toward automated reward design
Engineering in the Age of Agents with Yechezkel Rabinovich (softwareengineeringdaily.com). eBPF-powered observability with groundcover; BYOC model, kernel sensors, AI’s impact on code review and root-cause analysis
Data and AI culture: How Nu’s philosophy became a competitive advantage (building.nubank.com). Nu's data and AI culture, Data Mesh, autonomy, and 100+ ML models powering customer-centric decisions
🗺️ Culture & Visualization
Rising on arXiv - 2025-10-17 (blog.rinesi.com). Overview of high-bandwidth memory, causal attribution, and primordial non-Gaussianities in arXiv discussions and recent papers
Review: Genius Makers (blog.piaw.net). Biographical look at Geoff Hinton and students, tracing neural networks history and AI industry moves through Google and China
Cartography of generative AI (flowingdata.com). Explores the groundwork of generative AI, including Estampa visuals, and the unseen data, energy, and people behind chatbot responses
🛠️ Practical ML Engineering
Growing a 454-page ML reference manual in 5 days: permacomputer harvest (russell.ballestrini.net). Harvesting a 454-page ML reference manual in 5 days via permacomputer automation and multi-language seed implementations
ML quacks: Combining duckdb and mlpack (dirk.eddelbuettel.com). Combining duckdb with mlpack to run adaboost on rectangular datasets via a duckdb extension
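The post implements this as a DuckDB C++ extension calling mlpack directly; as a rough Python analogue of the same pipeline shape, with scikit-learn's AdaBoost standing in for mlpack's:

```python
import duckdb
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier

# Stage a rectangular dataset in DuckDB, then pull a query result back out.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
con = duckdb.connect()
con.register("t", X.assign(label=y))
df = con.execute("SELECT * FROM t").df()

# Fit the boosted classifier on the query result.
feats = df.drop(columns=["label"])
clf = AdaBoostClassifier(n_estimators=50).fit(feats, df["label"])
print(clf.score(feats, df["label"]))
```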
Tracking Down Mysterious ML Training Stalls (medium.com/@Pinterest_Engineering). Pinterest's ML training stalls traced to torch.compile interactions and a Ray monitoring task, resolved by removing a psutil memory_full_info call
🔧 Beyond Python ML
Integrating Python Forecasting with R’s Tidyverse (datageeek.com). Integrates a Python forecasting model (nnetsauce MTS with BoosterRegressor) into R's tidyverse via reticulate for predictions and intervals
Decision Tree Regression (Without Recursion) From Scratch Using C# (jamesmccaffreyblog.com). Decision Tree Regression (non-recursive) in C# with a handcrafted trainer and evaluation workflow
Saving a Trained Kernel Ridge Regression Model Using C# (jamesmccaffreyblog.com). Kernel ridge regression with RBF kernel, Cholesky inverse training, and saving/loading model weights in C#
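The article is C#, but the training step it describes amounts to one Cholesky-based linear solve; here is a numpy sketch of the same idea (hyperparameters illustrative), where "saving the model" just means storing the training inputs and dual weights:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.spatial.distance import cdist

def rbf(A, B, gamma):
    return np.exp(-gamma * cdist(A, B, "sqeuclidean"))

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (100, 3))
y = np.sin(X).sum(axis=1) + 0.05 * rng.standard_normal(100)

lam, gamma = 1e-3, 2.0
K = rbf(X, X, gamma)
alpha = cho_solve(cho_factor(K + lam * np.eye(len(X))), y)  # training: one solve

np.savez("krr.npz", X=X, alpha=alpha, gamma=gamma)          # save = inputs + weights
m = np.load("krr.npz")
pred = rbf(X[:5], m["X"], float(m["gamma"])) @ m["alpha"]   # reload and predict
```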
☁️ Cloud & HPC Engineering
Dataflow Computing for AI Inference with Kunle Olukotun - #751 (twimlai.com). Reconfigurable dataflow units for AI inference, multi-model serving, and dynamic hardware via Sambanova and Kunle Olukotun
Enabling Scalable AI-Driven Molecular Dynamics Simulations (developer.nvidia.com). Integrating PyTorch-based MLIPs with LAMMPS via ML-IAP-Kokkos for scalable, multi-GPU MD simulations and Python-based model loading
Optimizing document AI and structured outputs by fine-tuning Amazon Nova Models and on-demand inference (aws.amazon.com). Fine-tuning Amazon Nova Lite for document processing, leveraging Bedrock ODI, PEFT, and JSON structured outputs for tax forms
Configure and verify a distributed training cluster with AWS Deep Learning Containers on Amazon EKS (aws.amazon.com). Configure and verify a distributed training cluster on Amazon EKS using AWS DLCs with PyTorch, NCCL, EFA, FSx Lustre, and etcd/Kubeflow tools
🚀 LLM Inference at Scale
Google Cloud C4 Brings a 70% TCO improvement on GPT OSS with Intel and Hugging Face (huggingface.co). Google Cloud C4 with Intel Xeon Granite Rapids improves GPT OSS MoE inference throughput and delivers 1.7x TCO savings over C3 via Intel-Hugging Face optimizations
Scaling LLM Inference: Innovations in Tensor Parallelism, Context Parallelism, and Expert Parallelism (engineering.fb.com). Meta shares tensor, context, and expert parallelism innovations for scalable LLM inference and long-context handling
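The identity underneath tensor parallelism is easy to verify on a laptop. A toy numpy sketch (two "devices", no real communication) of column-sharding one weight matrix and row-sharding the next; this illustrates the decomposition Meta's post builds on, not their implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))
W1, W2 = rng.standard_normal((8, 16)), rng.standard_normal((16, 8))

# Shard W1 by columns; each device computes its slice with no communication.
W1_a, W1_b = np.hsplit(W1, 2)
H_a, H_b = X @ W1_a, X @ W1_b

# Shard W2 by rows; partial results are summed (an all-reduce in a real system).
W2_a, W2_b = np.vsplit(W2, 2)
Y = H_a @ W2_a + H_b @ W2_b

assert np.allclose(Y, X @ W1 @ W2)  # matches the unsharded computation
```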
Scaling Large MoE Models with Wide Expert Parallelism on NVL72 Rack Scale Systems (developer.nvidia.com). Wide Expert Parallelism with NVL72 GB200, NVLink, and TensorRT-LLM boosts MoE inference throughput and lowers TCO
NVIDIA DGX Spark + Apple Mac Studio = 4x Faster LLM Inference with EXO 1.0 (simonwillison.net). EXO 1.0: NVIDIA DGX Spark vs M3 Ultra Mac Studio for LLM prefill and decode; 4x faster inference on Llama-3.1 8B via streaming KV cache over 10Gb Ethernet
🧩 LLM Training Hacks
modded-nanogpt medium world record: Re-using intermediate activations in the output latents (snimu.github.io). Modded-nanogpt medium record reuses layer-11 activations in output latents with learned weights, exploring backout hypothesis and multi-layer skip experiments
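A heavily simplified torch sketch of the mixing idea only; the scalar-weight form and placement here are assumptions, and the record's actual details are in the post:

```python
import torch

class OutputBlend(torch.nn.Module):
    """Blend a mid-network residual stream into the final output latents."""
    def __init__(self):
        super().__init__()
        self.w = torch.nn.Parameter(torch.zeros(()))  # learned mixing weight, init 0

    def forward(self, h_final, h_mid):
        # h_mid: activations captured at an intermediate layer (e.g. layer 11)
        return h_final + self.w * h_mid
```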
Three AI customisation concepts (anna.kiwi). Three AI customization concepts explained via embeddings, RAG vs fine-tuning, and LoRA with developer and non-technical metaphors
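As a concrete anchor for the RAG metaphor, a self-contained retrieval sketch: TF-IDF vectors stand in for learned embeddings, and the "LLM call" is just a printed prompt.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "LoRA adds small trainable matrices to a frozen model.",
    "RAG retrieves documents and adds them to the prompt.",
    "Fine-tuning updates model weights on new data.",
]
vec = TfidfVectorizer().fit(docs)
doc_vecs = vec.transform(docs)

# Embed the query, rank documents by similarity, paste the best into the prompt.
query = "How does retrieval augmented generation work?"
scores = cosine_similarity(vec.transform([query]), doc_vecs)[0]
context = docs[int(np.argmax(scores))]
print(f"Context: {context}\n\nQuestion: {query}")
```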
Writing an LLM from scratch, part 22 -- finally training our LLM! (gilesthomas.com). Training an LLM from scratch, comparing GPT-2 weights, using AdamW, temperature and top-k sampling, and cost considerations
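The sampling recipe involved is compact enough to sketch in full; a minimal torch version of temperature plus top-k over one logits vector (constants illustrative):

```python
import torch

def sample_next_token(logits, temperature=0.8, top_k=50):
    """Temperature + top-k sampling over a [vocab]-sized logits vector."""
    logits = logits / temperature                  # temperature rescales confidence
    topk = torch.topk(logits, top_k)               # keep only the k best candidates
    probs = torch.softmax(topk.values, dim=-1)     # renormalize over the survivors
    idx = torch.multinomial(probs, num_samples=1)  # sample within the top-k
    return topk.indices[idx].item()

print(sample_next_token(torch.randn(50_000)))
```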
supplement to 0.430 (aarnphm.xyz). DeepSeek MLA: multi-head latent attention, KV compression, and on-device serving with vLLM and RoPE-enhanced queries
The case for the return of fine-tuning (welovesota.com). Fine-tuning resurges with LoRA, Tinker, PEFT, and open-weight ecosystems enabling modular, controlled AI with personal hardware touches
🕵️ Explainability & Synthesis
Introducing LightSHAP (lorentzen.ch). LightSHAP: a lightweight, framework-agnostic SHAP implementation for tabular data with explain_tree and explain_any examples using CatBoost and linear models
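Not LightSHAP's API (see the post for explain_tree and explain_any), but for intuition, a from-scratch Monte Carlo estimate of the quantity SHAP methods approximate, for one feature of one row:

```python
import numpy as np

def mc_shap(predict, x, X_background, j, n_iter=200, seed=0):
    """Monte Carlo estimate of feature j's Shapley value for one row x."""
    rng = np.random.default_rng(seed)
    d, total = len(x), 0.0
    for _ in range(n_iter):
        z = X_background[rng.integers(len(X_background))]  # random reference row
        order = rng.permutation(d)                         # random coalition order
        pos = int(np.where(order == j)[0][0])
        with_j, without_j = z.copy(), z.copy()
        with_j[order[: pos + 1]] = x[order[: pos + 1]]  # x's values up to and incl. j
        without_j[order[:pos]] = x[order[:pos]]         # same coalition minus j
        total += predict(with_j[None])[0] - predict(without_j[None])[0]
    return total / n_iter

# usage sketch: mc_shap(model.predict, X_test[0], X_train, j=2)
```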
How We Stress Test Credit Models (barnesanalytics.com). Five-pillar framework for stress-testing credit models (data checks, backtesting, macro scenarios, sensitivity tests, explainability and governance) with SHAP, calibration, and TPRM relevance
GAN-like Synthetic Data Generation Examples with DistroSimulator (thierrymoudiki.github.io). GAN-like synthetic data using DistroSimulator across univariate/multivariate distributions, digits, Fashion-MNIST, and Olivetti faces
🧬 Scientific ML & Neuroscience
Compositional modeling of plant communities with Dirichlet regression (ecogambler.netlify.app). Dirichlet regression with Gaussian process smooths for plant-community composition across elevation and temperature using brms and Hilbert-space GP approximations
Using AI to identify genetic variants in tumors with DeepSomatic (research.google). DeepSomatic uses CNNs to identify somatic variants in tumor genomes across Illumina, PacBio, and ONT data with CASTLE training data
Whole-brain, bottom-up neuroscience: The time for it is now (thetransmitter.org). Bottom-up whole-brain neuroscience using C. elegans, expansion microscopy, connectomics, optogenetics, and molecular modeling to build integrative brain models
How AI hears accents (accent-explorer.boldvoice.com). AI-driven 3D latent visualization of English accents using HuBERT fine-tuning and UMAP, revealing clustering by geography and immigration
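The visualization half of that pipeline is reproducible with off-the-shelf pieces; a sketch assuming you already have per-speaker embedding vectors (random data stands in for HuBERT features here):

```python
import numpy as np
import umap  # pip install umap-learn

# Stand-in for pooled HuBERT utterance embeddings: one 768-d vector per speaker.
emb = np.random.default_rng(0).standard_normal((1000, 768))

# Project to 3-D for an interactive latent-space plot like the accent explorer's.
coords = umap.UMAP(n_components=3, n_neighbors=15).fit_transform(emb)
print(coords.shape)  # (1000, 3)
```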
📐 Deep Learning Theory
Covariant spatio-temporal receptive fields for spiking neural networks (jepedersen.dk). Covariant spatio-temporal receptive fields for spiking neural networks leveraging affine Gaussian derivatives and leaky integrator/IF neurons
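For readers new to the neuron models involved, a minimal leaky integrate-and-fire simulation in numpy (parameters illustrative, not the paper's):

```python
import numpy as np

dt, tau, v_thresh, v_reset = 1e-3, 20e-3, 1.0, 0.0
T = 300
I = 1.2 * (np.arange(T) > 50)          # step input current switched on at 50 ms

v, spikes = 0.0, []
for t in range(T):
    v += dt / tau * (-v + I[t])        # leaky integration toward the input
    if v >= v_thresh:                  # threshold crossing emits a spike
        spikes.append(t * dt)
        v = v_reset                    # hard reset after the spike
print(f"{len(spikes)} spikes, first at {spikes[0]:.3f} s")
```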
Lecture - Theories of Deep Learning MT25, IV, Data classes for which DNNs can overcome the curse of dimensionality (ollybritton.com). Overview of data classes where DNNs can beat the curse of dimensionality, including hidden manifold model, ReLU expressivity, and trajectory-based complexity
Geometric Structure of Emergent Misalignment: Evidence for Multiple Independent Directions (lesswrong.com). Geometric analysis of six steering vectors across three emergent misalignment directions reveals orthogonal, independent mechanisms with asymmetric cross-laboratory transfer
Lecture - Theories of Deep Learning MT25, II, Why deep learning (ollybritton.com). Overview of theories behind deep learning, architectures, and key papers shaping modern DL practice
📏 Mathematical Musings
“We can obtain less rigorous but more convincing results by other means” (new paper) (noncommutativeanalysis.wordpress.com). Experimental bounds for dilations of free unitaries; universal commuting dilation constant; semidefinite programming; Copilot; random unitary pairs
Linkage (11011110.github.io). Overview of geometric constructions, puzzles, and algorithms, including Herschel enneahedron truncation, origami links, cuckoo hashing, and Hermite interpolation in curves
H2O: Higher-Order Pattern-Discovery in High-Dimensional Data, Aarhus University, Denmark, March 30-31 2026 (appliedtopology.org). Two-day workshop integrating TDA, TSA, and high-dimensional statistics on network data at Aarhus University
Distribution of coordinates on a sphere (johndcook.com). Uniformly distributed sphere coordinates x, y, z show zero linear correlation but non-independence; uses normal samples, normalization, and distance correlation
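Cook's observation takes a few lines to check numerically: normalize Gaussian triples to get uniform sphere points, then compare linear correlation with a statistic that sees the constraint x² + y² + z² = 1 (squared coordinates stand in here for the post's distance correlation as a simpler witness of dependence):

```python
import numpy as np

# Uniform points on the unit sphere: normalize i.i.d. Gaussian triples.
rng = np.random.default_rng(0)
P = rng.standard_normal((100_000, 3))
P /= np.linalg.norm(P, axis=1, keepdims=True)
x, y, z = P.T

# Linear (Pearson) correlation between coordinates is essentially zero...
print(np.corrcoef(x, y)[0, 1])          # ~0

# ...but the coordinates are not independent: x^2 + y^2 + z^2 = 1 ties them.
print(np.corrcoef(x**2, y**2)[0, 1])    # clearly negative
```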
📚 Academic Research
PRISM: Probabilistic Runtime Insights and Scalable Performance Modeling for Large-Scale Distributed Training (arxiv:cs). PRISM models runtime variation in large-scale training using profiling and probabilistic simulation. Accurately predicts distributions, guiding parallelism, placement, and kernel optimizations for predictable, efficient clusters
Secure Sparse Matrix Multiplications and their Applications to Privacy-Preserving Machine Learning (arxiv:cs). Introduces MPC algorithms for multiplying secret sparse matrices, slashing memory and communication costs. Enables privacy-preserving ML on high-dimensional sparse data, with sparsity assumptions and applications
nuGPR: GPU-Accelerated Gaussian Process Regression with Iterative Algorithms and Low-Rank Approximations (arxiv:math). nuGPR combines preconditioned conjugate gradients, low-rank approximations, and CUDA parallelism to accelerate Gaussian Processes. Reduces training time and memory, making uncertainty-aware models practical for engineers
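The iterative core of that approach fits in a few lines; a plain scipy sketch of solving the GP training system by conjugate gradients instead of Cholesky (no preconditioning or low-rank structure, unlike nuGPR itself):

```python
import numpy as np
from scipy.sparse.linalg import cg

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, (500, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(500)

K = np.exp(-0.5 * (X - X.T) ** 2)   # RBF kernel, unit lengthscale
A = K + 0.01 * np.eye(500)          # plus observation noise

alpha, info = cg(A, y)              # iterative solve of (K + s^2 I) a = y
assert info == 0                    # 0 means converged
mean = K @ alpha                    # posterior mean at the training inputs
```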
GRank: Towards Target-Aware and Streamlined Industrial Retrieval with a Generate-Rank Framework (arxiv:cs). GRank unifies target-aware candidate generation and ranking without structured indices. GPU-accelerated MIPS and training improve recall and throughput, simplifying industrial retrieval pipelines at billion-item scale
Dimension Mask Layer: Optimizing Embedding Efficiency for Scalable ID-based Models (arxiv:cs). DML learns per-feature embedding dimensions by masking components during training. Cuts embedding memory 40–50% with little loss, reducing overfitting and serving costs in ID-heavy models
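A guess at the mechanism in torch terms (the paper's exact parameterization may differ): a learnable soft gate per embedding dimension, thresholded after training to drop components.

```python
import torch

class MaskedEmbedding(torch.nn.Module):
    """Embedding table with a learnable per-dimension gate."""
    def __init__(self, num_ids, dim):
        super().__init__()
        self.emb = torch.nn.Embedding(num_ids, dim)
        self.mask_logits = torch.nn.Parameter(torch.zeros(dim))

    def forward(self, ids):
        gate = torch.sigmoid(self.mask_logits)  # soft mask during training
        return self.emb(ids) * gate

    def kept_dims(self, threshold=0.5):
        # Dimensions surviving the gate; the rest can be pruned at serving time.
        return (torch.sigmoid(self.mask_logits) > threshold).nonzero().squeeze(-1)
```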
👋 Before you go
Blaze newsletters will soon be moving to Substack as the main email delivery service. This is primarily to make managing subscriptions, sending email, and archiving newsletters more streamlined. There will be no change to your newsletters: they will continue to be completely free, and you will be able to subscribe and unsubscribe just as easily as before.
Blaze's sister site https://blognerd.app, a search engine for blogs and posts, has had a major makeover, and is a good place to search for smart, independent writing.
Finally, if you get value from your newsletter, please consider supporting me by joining the patreon page at patreon.com/blazeemail. Becoming a patron helps me to cover my costs and to keep blaze going so everyone can enjoy the newsletters for free.
About Machine Learning Engineer
Our Machine Learning Engineer newsletter covers the latest developments, research papers, tools, and techniques in ML engineering and deployment. Each week, we curate the most important content so you don't have to spend hours searching.
Whether you're a beginner or expert in machine learning engineering, our newsletter provides valuable information to keep you informed and ahead of the curve in this technically challenging field.
Subscribe now to join thousands of professionals who receive our weekly updates!