Machine Learning Engineer: 22nd July 2025
Company Engineering Blogs
Using AI to make lower-carbon, faster-curing concrete (engineering.fb.com). Meta's AI tool optimizes concrete mixes for strength and sustainability, collaborating with Amrize and University of Illinois for environmental impact reduction
PREAMBLE: Private and Efficient Aggregation via Block Sparse Vectors (machinelearning.apple.com). Secure aggregation of high-dimensional vectors using block-sparse vectors, enhancing efficiency in private federated learning while maintaining differential privacy
Debugging the One-in-a-Million Failure: Migrating Pinterest's Search Infrastructure to Kubernetes (medium.com/pinterest-engineering). Pinterest's search infrastructure migration to Kubernetes faced latency spikes due to cAdvisor's memory management, impacting performance for millions of users
A Smarter Way to Handle DynamoDB Change History (medium.com/vanguard-technology). Vanguard enhances client advice plan tracking by unifying DynamoDB change history using Kinesis Data Streams, AWS Lambda, and REST APIs across microservices
Behind the Streams: Live at Netflix. Part 1 (netflixtechblog.com). Exploring Netflix's live streaming infrastructure, including technology choices, cloud-based solutions, and performance optimization for events like live comedy and sports
Applied ML & Specialized Domains
Talk by Marisa Eisenberg at the SMB annual conference (alanrendall.wordpress.com). Marisa Eisenberg discusses public health, mathematical modelling, parameter identifiability, and AI's role in epidemiology at the SMB conference in Edmonton
Unsupervised, generalizable method for doing anomaly detection (amazon.science). SEAD employs an unsupervised ensemble approach for anomaly detection, leveraging multiplicative weights to optimize model performance across diverse datasets
Do Smart Machines Make Smarter Trades? (alphaarchitect.com). Machine learning enhances trading strategies by exploiting market anomalies, improving returns, and capturing complex interactions overlooked by traditional models
Tool Helps Scientists Spot Source of Disease (cmu.edu). Causarray tool enhances identification of genetic changes in Alzheimer's and schizophrenia, moving from association to causation in genomics
Arc Virtual Cell Challenge: A Primer (huggingface.co). Arc Virtual Cell Challenge invites participants to train models predicting gene silencing effects in cell types using CRISPR, utilizing a curated RNA sequencing dataset
PhD Student Position in TDA for Financial and Economic Systems, Maastricht University (appliedtopology.org). Fully funded PhD position in Topological Data Analysis for Financial and Economic Systems at Maastricht University, seeking candidates with strong mathematical and data science backgrounds
ML Engineering & Infrastructure
In my orbit: hacking orbital's ML-to-SQL for xgboost (emilyriederer.com). Exploration of using orbital's ML-to-SQL framework for deploying xgboost models in SQL environments with known limitations and workarounds
Limiting Parallelism in scikit-learn (rnowling.github.io). Addressing KMeans clustering issues in scikit-learn on high-core systems using OpenBLAS and threadpoolctl for thread management
Rethinking Distributed Computing for the AI Era (cacm.acm.org). Rethinking distributed computing for AI with DeepSeek's efficient models, highlighting the mismatch with traditional systems and proposing new design principles for AI workloads
Hidden Technical Debt in AI (tomtunguz.com). Explores hidden complexities in AI, including operational challenges, tool integration, observability, and deterministic software to manage costs and enhance performance
Generative Models & Neural Networks
Flow Matching in 5 Minutes (nrehiew.github.io). Understanding Flow Matching for generative modeling, transforming distributions, vector fields, and iterative sampling in image generation using deep learning techniques
Training a Chunker with Burn (elijahpotter.dev). Neural network chunker achieves ~95% accuracy, utilizing Word + POS embeddings, BiLSTM architecture, and Burn toolkit for grammatical rule matching
Dragoncatcher: Quantum automata (robinsloan.com). Exploration of quantum automata in generative art, featuring emergent glyphs and reflections on privacy and authorship in digital spaces
Training a Variational Autoencoder (VAE) on the MNIST Dataset (blog.devgenius.io). Train a Variational Autoencoder using PyTorch to generate synthetic data from the MNIST dataset, visualize latent space, and evaluate with Fréchet Inception Distance
Methodology & Evaluation
On the scientific method and its application to the science of deep learning (james-simon.github.io). Exploration of the scientific method's relevance in deep learning and the necessity for clear hypothesis testing and empirical validation in advancing understanding
Efficacy Engineering (dshieble.github.io). Efficacy engineering improves system effectiveness by evaluating inputs and outputs, emphasizing evaluation pipelines and qualitative insights to drive meaningful outcomes
Parameter Tweaking (ekran.org). Ben Bogart explores parameter tweaking in wave accumulation, focusing on frequency ranges, wave counts, and their aesthetic impact on visual compositions
Always-On Probability Calibration with Multiplicative Weights (gojiberries.io). Leverage Multiplicative Weights Updates for real-time probability calibration in ad systems, avoiding the need for extensive historical data and periodic retraining
Mathematical Foundations & Theory
Revisiting scaling laws via the z-transform (francisbach.com). Explores scaling laws in machine learning using z-transform, discussing asymptotic behavior, gradient descent, Nesterov acceleration, and convergence rates
Approximate first principal component (30fps.net). Approximate first principal component for color quantization and texture compression using PCA and a novel primary vector computation
Dual-numbers reverse AD for functional array languages (simon.peytonjones.org). Dual-numbers reverse-mode automatic differentiation enhances multidimensional array processing with vectorisation, focusing on array combinators and performance improvements
Good Enough: Satisficing in Statistics and Machine Learning (gojiberries.io). Explores satisficing in machine learning, demonstrating early stopping with Decision Trees, power analysis, and efficient model tuning techniques
Academic Research
Interpretable Bayesian Tensor Network Kernel Machines with Automatic Rank and Feature Selection (arxiv:stat). Bayesian Tensor Network Kernel Machines utilize hierarchical priors for automatic rank and feature selection, enhancing interpretability while maintaining computational efficiency and uncertainty quantification
Graph Neural Networks Powered by Encoder Embedding for Improved Node Learning (arxiv:cs). GNNs enhanced by one-hot graph encoder embedding improve node learning, achieving state-of-the-art performance in clustering and faster convergence in classification tasks
Toward Temporal Causal Representation Learning with Tensor Decomposition (arxiv:cs). Temporal causal representation learning using CaRTeD framework for irregular tensor decomposition, improving causal insights and explainability in health record datasets
A Nonparallel Support Tensor Machine for Binary Classification based Large Margin Distribution and Iterative Optimization (arxiv:math). Proposes LDM-NPSTM classifier leveraging tensor data and marginal distribution for improved binary classification through alternative projection algorithm and optimization techniques
Imbalanced Regression Pipeline Recommendation (arxiv:cs). Meta-learning for Imbalanced Regression (Meta-IR) recommends optimal resampling and learning model combinations in zero-shot settings, outperforming existing AutoML frameworks
Rel-HNN: Split Parallel Hypergraph Neural Network for Learning on Relational Databases (arxiv:cs). Rel-HNN leverages hypergraph representation for relational databases, capturing intra-tuple relations and enabling efficient multi-GPU training for improved classification and regression performance
Explainable Evidential Clustering (arxiv:cs). Explores explainable evidential clustering using Dempster-Shafer theory, introducing Iterative Evidential Mistake Minimization for decision trees in uncertain data contexts
Robust-Multi-Task Gradient Boosting (arxiv:cs). Robust-Multi-Task Gradient Boosting (R-MTGB) effectively handles outlier tasks in multi-task learning, enhancing performance through knowledge transfer and error reduction
FourCastNet 3: A geometric approach to probabilistic machine-learning weather forecasting at scale (arxiv:cs). FourCastNet 3 employs geometric ML for fast, scalable probabilistic weather forecasting, achieving high accuracy and spectral fidelity on a large GPU infrastructure
Newfluence: Boosting Model interpretability and Understanding in High Dimensions (arxiv:cs). Newfluence: an improved tool for model interpretability in high dimensions, enhancing influence functions and applicable to techniques like Shapley values
Before you go
I've got a big favor to ask - keeping Blaze running isn't expensive, but it does all add up, so I'm asking readers like you to help, if you can.
That's why I'm launching a Patreon page! Nothing flashy, just a way for folks who find value in these newsletters to chip in a little each month. In return, you'll get:
- Real say in how Blaze evolves: vote on new topics, features, topic curation ideas
- First dibs on merch (details still cooking)
- That warm fuzzy feeling knowing you're supporting something that saves you time and keeps you plugged into great tech writing
If you are getting value from Blaze, checking this out would mean the world. And if you can't contribute, no worries: the newsletters keep coming either way, and you can follow along on Patreon for free.
Thanks for reading and being part of this nerdy corner of the internet. All the best - Alastair.
About Machine Learning Engineer
Our Machine Learning Engineer newsletter covers the latest developments, research papers, tools, and techniques in ML engineering and deployment. Each week, we curate the most important content so you don't have to spend hours searching.
Whether you're a beginner or expert in machine learning engineering, our newsletter provides valuable information to keep you informed and ahead of the curve in this technically challenging field.