🤖

Machine Learning Engineer: 22nd July 2025

Published 22nd July 2025

🔧 Company Engineering Blogs

Using AI to make lower-carbon, faster-curing concrete (engineering.fb.com). Meta's AI tool optimizes concrete mixes for strength and sustainability, in collaboration with Amrize and the University of Illinois, to reduce environmental impact

PREAMBLE: Private and Efficient Aggregation via Block Sparse Vectors (machinelearning.apple.com). Secure aggregation of high-dimensional vectors using block-sparse vectors, enhancing efficiency in private federated learning while maintaining differential privacy

Debugging the One-in-a-Million Failure: Migrating Pinterest’s Search Infrastructure to Kubernetes (medium.com/pinterest-engineering). Pinterest's search infrastructure migration to Kubernetes faced latency spikes due to cAdvisor's memory management, impacting performance for millions of users

A Smarter Way to Handle DynamoDB Change History (medium.com/vanguard-technology). Vanguard enhances client advice plan tracking by unifying DynamoDB change history using Kinesis Data Streams, AWS Lambda, and REST APIs across microservices

Behind the Streams: Live at Netflix. Part 1 (netflixtechblog.com). Exploring Netflix's live streaming infrastructure, including technology choices, cloud-based solutions, and performance optimization for events like live comedy and sports

🧬 Applied ML & Specialized Domains

Talk by Marisa Eisenberg at the SMB annual conference (alanrendall.wordpress.com). Marisa Eisenberg discusses public health, mathematical modelling, parameter identifiability, and AI's role in epidemiology at the SMB conference in Edmonton

Unsupervised, generalizable method for doing anomaly detection (amazon.science). SEAD employs an unsupervised ensemble approach for anomaly detection, leveraging multiplicative weights to optimize model performance across diverse datasets

Do Smart Machines Make Smarter Trades? (alphaarchitect.com). Machine learning enhances trading strategies by exploiting market anomalies, improving returns, and capturing complex interactions overlooked by traditional models

Tool Helps Scientists Spot Source of Disease (cmu.edu). Causarray tool enhances identification of genetic changes in Alzheimer’s and schizophrenia, moving from association to causation in genomics

Arc Virtual Cell Challenge: A Primer (huggingface.co). Arc Virtual Cell Challenge invites participants to train models predicting gene silencing effects in cell types using CRISPR, utilizing a curated RNA sequencing dataset

PhD Student Position in TDA for Financial and Economic Systems, Maastricht University (appliedtopology.org). Fully funded PhD position in Topological Data Analysis for Financial and Economic Systems at Maastricht University, seeking candidates with strong mathematical and data science backgrounds

🏗️ ML Engineering & Infrastructure

In my orbit: hacking orbital’s ML-to-SQL for xgboost (emilyriederer.com). Exploration of using orbital’s ML-to-SQL framework for deploying xgboost models in SQL environments with known limitations and workarounds

Limiting Parallelism in scikit-learn (rnowling.github.io). Addressing KMeans clustering issues in scikit-learn on high-core systems using OpenBLAS and threadpoolctl for thread management

Rethinking Distributed Computing for the AI Era (cacm.acm.org). Uses DeepSeek's efficient models to highlight the mismatch between AI workloads and traditional distributed systems, and proposes new design principles for AI workloads

Hidden Technical Debt in AI (tomtunguz.com). Explores hidden complexities in AI, including operational challenges, tool integration, observability, and deterministic software to manage costs and enhance performance

🎯 Generative Models & Neural Networks

Flow Matching in 5 Minutes (nrehiew.github.io). Understanding Flow Matching for generative modeling, transforming distributions, vector fields, and iterative sampling in image generation using deep learning techniques

Training a Chunker with Burn (elijahpotter.dev). Neural network chunker achieves ~95% accuracy, using word and POS embeddings, a BiLSTM architecture, and the Burn toolkit for grammatical rule matching

Dragoncatcher: Quantum automata (robinsloan.com). Exploration of quantum automata in generative art, featuring emergent glyphs and reflections on privacy and authorship in digital spaces

Training a Variational Autoencoder (VAE) on the MNIST Dataset (blog.devgenius.io). Train a Variational Autoencoder using PyTorch to generate synthetic data from the MNIST dataset, visualize latent space, and evaluate with Fréchet Inception Distance

📊 Methodology & Evaluation

On the scientific method and its application to the science of deep learning (james-simon.github.io). Exploration of the scientific method's relevance in deep learning and the necessity for clear hypothesis testing and empirical validation in advancing understanding

Efficacy Engineering (dshieble.github.io). Efficacy engineering improves system effectiveness by evaluating inputs and outputs, emphasizing evaluation pipelines and qualitative insights to drive meaningful outcomes

Parameter Tweaking (ekran.org). Ben Bogart explores parameter tweaking in wave accumulation, focusing on frequency ranges, wave counts, and their aesthetic impact on visual compositions

Always-On Probability Calibration with Multiplicative Weights (gojiberries.io). Leverage Multiplicative Weights Updates for real-time probability calibration in ad systems, avoiding the need for extensive historical data and periodic retraining

🧮 Mathematical Foundations & Theory

Revisiting scaling laws via the z-transform (francisbach.com). Explores scaling laws in machine learning using z-transform, discussing asymptotic behavior, gradient descent, Nesterov acceleration, and convergence rates

Approximate first principal component (30fps.net). Approximate first principal component for color quantization and texture compression using PCA and a novel primary vector computation

Dual-numbers reverse AD for functional array languages (simon.peytonjones.org). Dual-numbers reverse-mode automatic differentiation enhances multidimensional array processing with vectorisation, focusing on array combinators and performance improvements

Good Enough: Satisficing in Statistics and Machine Learning (gojiberries.io). Explores satisficing in machine learning, demonstrating early stopping with Decision Trees, power analysis, and efficient model tuning techniques

📚 Academic Research

Interpretable Bayesian Tensor Network Kernel Machines with Automatic Rank and Feature Selection (arxiv:stat). Bayesian Tensor Network Kernel Machines utilize hierarchical priors for automatic rank and feature selection, enhancing interpretability while maintaining computational efficiency and uncertainty quantification

Graph Neural Networks Powered by Encoder Embedding for Improved Node Learning (arxiv:cs). GNNs enhanced by one-hot graph encoder embedding improve node learning, achieving state-of-the-art performance in clustering and faster convergence in classification tasks

Toward Temporal Causal Representation Learning with Tensor Decomposition (arxiv:cs). Temporal causal representation learning using CaRTeD framework for irregular tensor decomposition, improving causal insights and explainability in health record datasets

A Nonparallel Support Tensor Machine for Binary Classification based Large Margin Distribution and Iterative Optimization (arxiv:math). Proposes LDM-NPSTM classifier leveraging tensor data and marginal distribution for improved binary classification through alternative projection algorithm and optimization techniques

Imbalanced Regression Pipeline Recommendation (arxiv:cs). Meta-learning for Imbalanced Regression (Meta-IR) recommends optimal resampling and learning model combinations in zero-shot settings, outperforming existing AutoML frameworks

Rel-HNN: Split Parallel Hypergraph Neural Network for Learning on Relational Databases (arxiv:cs). Rel-HNN leverages hypergraph representation for relational databases, capturing intra-tuple relations and enabling efficient multi-GPU training for improved classification and regression performance

Explainable Evidential Clustering (arxiv:cs). Explores explainable evidential clustering using Dempster-Shafer theory, introducing Iterative Evidential Mistake Minimization for decision trees in uncertain data contexts

Robust-Multi-Task Gradient Boosting (arxiv:cs). Robust-Multi-Task Gradient Boosting (R-MTGB) effectively handles outlier tasks in multi-task learning, enhancing performance through knowledge transfer and error reduction

FourCastNet 3: A geometric approach to probabilistic machine-learning weather forecasting at scale (arxiv:cs). FourCastNet 3 employs geometric ML for fast, scalable probabilistic weather forecasting, achieving high accuracy and spectral fidelity on a large GPU infrastructure

Newfluence: Boosting Model interpretability and Understanding in High Dimensions (arxiv:cs). Newfluence: an improved tool for model interpretability in high dimensions, enhancing influence functions and applicable to techniques like Shapley values

👋 Before you go

I've got a big favor to ask - keeping Blaze running isn't expensive, but it does all add up, so I'm asking readers like you to help, if you can.
That's why I'm launching a Patreon page! Nothing flashy, just a way for folks who find value in these newsletters to chip in a little each month. In return, you'll get:

Real say in how Blaze evolves: vote on new topics, features, and curation ideas
First dibs on merch (details still cooking)
That warm fuzzy feeling knowing you're supporting something that saves you time and keeps you plugged into great tech writing

If you are getting value from Blaze, checking this out would mean the world. And if you can't contribute, no worries: the newsletters keep coming either way, and you can follow along on Patreon for free.
Thanks for reading and being part of this nerdy corner of the internet. All the best - Alastair.
