Machine Learning Engineer
Tuesday 11th March, 2025
💡 Industry Perspectives
What my privacy papers (don't) have to say about copyright and generative AI (nicholas.carlini.com, 2025-03-11). Nicholas Carlini discusses how his privacy-focused research on memorization in generative models like GPT-2 and Stable Diffusion is often misinterpreted in copyright cases, emphasizing the distinction between privacy and copyright concerns
A blueprint for data-driven molecule engineering (ericmjl.github.io, 2025-03-06). Explore how Catalyst Therapeutics accelerates protein binder discovery through robust experimental design, data capture, Bayesian modeling, and machine learning to optimize molecule engineering in the biotech sector
Building A Hybrid Trading Model with ML and Python (levelup.gitconnected.com, 2025-03-05). A hybrid trading model utilizing fundamentals and options data is developed using Python, incorporating rule-based and machine learning approaches to create more robust trading signals for buy, hold, or sell decisions
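The rule-plus-model combination can be sketched roughly as follows; the thresholds, weights, and voting scheme here are illustrative stand-ins, not the article's actual strategy:

```python
import numpy as np

def rule_signal(pe_ratio, pe_threshold=15.0):
    """Simple fundamentals rule: flag value stocks by P/E ratio."""
    return 1 if pe_ratio < pe_threshold else 0

def ml_signal(features, weights, bias=0.0):
    """Stand-in for a trained classifier: logistic score on features."""
    z = float(np.dot(features, weights)) + bias
    return 1 if 1.0 / (1.0 + np.exp(-z)) > 0.5 else 0

def hybrid_decision(pe_ratio, features, weights):
    """Combine rule and model by voting: trade only when both agree."""
    votes = rule_signal(pe_ratio) + ml_signal(features, weights)
    return {2: "buy", 1: "hold", 0: "sell"}[votes]

decision = hybrid_decision(12.0, np.array([0.4, 1.2]), np.array([0.8, 0.5]))
```

Requiring agreement between the rule and the model is one simple way to make the combined signal more conservative than either component alone.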
Teaching machines to understand (danturkel.com, 2025-03-10). Dan Turkel discusses the evolution of AI understanding from ELIZA to ChatGPT, emphasizing the role of vector space models and embeddings in enhancing machines' ability to contextualize information and search effectively
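The embedding-based search idea boils down to ranking documents by vector similarity to a query. A minimal sketch with hand-made toy vectors (real embeddings would come from a trained model):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embedding table; in practice these come from an embedding model.
docs = {
    "cat sits on mat": np.array([0.9, 0.1, 0.0]),
    "dog chases ball": np.array([0.3, 0.9, 0.1]),
    "stock prices fell": np.array([0.0, 0.1, 0.95]),
}

query = np.array([0.85, 0.15, 0.05])  # pretend embedding of "kitten on a rug"
best = max(docs, key=lambda d: cosine_similarity(query, docs[d]))
```

The point is that semantically related texts land near each other in vector space, so nearest-neighbor lookup retrieves by meaning rather than keyword overlap.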
How transformers expanded my view of Math and ML (mikelikejordan.bearblog.dev, 2025-03-08). Transformers, BERT, and GPT are reshaping AI with enhanced language understanding through self-attention mechanisms, surpassing RNNs and CNNs by efficiently processing sequences and contextual relationships in natural language tasks
Thoughts on AI (davetang.org, 2025-03-07). Dave Tang reflects on his journey in AI and machine learning, discussing challenges in applying deep learning to biological data and advocating for viewing AI as augmented intelligence rather than fully autonomous technology
⚙️ Compilers & Systems
MLIR Part 1 - Introduction to MLIR and Modern Compilers (stephendiehl.com, 2025-03-10). MLIR, an LLVM subproject, enhances modern compilers for AI workloads, supporting diverse architectures, optimizing code with dialects, and facilitating machine learning frameworks, while enabling domain-specific languages
Debugging Disposable ML Frameworks (petewarden.com, 2025-03-06). Nat Jeffries shares insights on debugging disposable ML frameworks for on-device transformer deployments, emphasizing tensor validation, quantization considerations, and the importance of maintaining a clear understanding of implementation nuances
Deal: Distributed End-to-End GNN Inference for All Nodes (arxiv:cs, 2025-03-04). Deal is a distributed GNN inference system that optimizes end-to-end inference for multi-billion edge graphs, achieving up to 7.70x faster inference and 21.05x faster graph construction through innovative memory-saving and efficient communication techniques
Efficient Kernel Smoother in the ONNX Format (thebigdatablog.com, 2025-03-08). Explore efficient kernel smoothing using Fast Fourier Transform (FFT) converted to ONNX format, leveraging tools like ndonnx and spox for enhanced interoperability and performance across platforms
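The core trick of FFT-based kernel smoothing is that convolution in the time domain becomes elementwise multiplication in the frequency domain. A NumPy sketch of the idea (the article's actual pipeline goes further, exporting the graph to ONNX via ndonnx and spox):

```python
import numpy as np

def fft_kernel_smooth(y, bandwidth=2.0):
    """Gaussian kernel smoothing via circular convolution with the FFT."""
    n = len(y)
    idx = np.arange(n)
    dist = np.minimum(idx, n - idx)              # circular distances to index 0
    kernel = np.exp(-0.5 * (dist / bandwidth) ** 2)
    kernel /= kernel.sum()                       # normalize so weights sum to 1
    return np.real(np.fft.ifft(np.fft.fft(y) * np.fft.fft(kernel)))

rng = np.random.default_rng(0)
noisy = np.sin(np.linspace(0, 2 * np.pi, 64)) + 0.1 * rng.normal(size=64)
smooth = fft_kernel_smooth(noisy)
```

The FFT route costs O(n log n) instead of O(n²) for a naive kernel sum, which is what makes the ONNX export worthwhile for longer series.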
📈 Optimization & Dynamics
Deriving Muon (jeremybernste.in, 2025-03-07). Muon, a new neural net optimizer, is derived from theoretical principles, leveraging RMS-to-RMS operator norm for efficient weight updates, achieving faster convergence and automatic learning rate transfer across network widths
Hessian Matrix, Taylor Series, and the Newton-Raphson Method (pyimagesearch.com, 2025-03-10). Explore the Hessian Matrix, Taylor Series, and Newton-Raphson Method to optimize machine learning models through advanced vector calculus concepts, including polynomial approximations and second-order derivatives
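The Newton-Raphson update uses the Hessian to rescale the gradient step: x ← x − H⁻¹∇f. A small sketch on a quadratic, where a single Newton step lands exactly on the minimizer:

```python
import numpy as np

def newton_step(grad, hess, x):
    """One Newton-Raphson update: x - H(x)^{-1} grad(x)."""
    return x - np.linalg.solve(hess(x), grad(x))

# For f(x) = 0.5 x^T A x - b^T x, the gradient is A x - b and the Hessian is A,
# so the minimizer is A^{-1} b and Newton reaches it in one step.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
grad = lambda x: A @ x - b
hess = lambda x: A
x_star = newton_step(grad, hess, np.zeros(2))
```

For non-quadratic objectives the quadratic Taylor approximation only holds locally, so the method iterates; that is exactly the second-order-derivative machinery the article walks through.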
Training neural networks faster with minimal tuning using pre-computed lists of hyperparameters for NAdamW (arxiv:cs, 2025-03-06). This study presents hyperparameter lists for NAdamW, derived from exhaustive experiments, offering a practical tuning method for neural networks that outperforms basic sweeps and Bayesian optimization under limited resource constraints
Learning Decision Trees as Amortized Structure Inference (arxiv:cs, 2025-03-10). A hybrid amortized structure inference approach using deep reinforcement learning (GFlowNet) enables learning predictive decision tree ensembles that outperform state-of-the-art models, ensuring robustness and interpretability while scaling performance systematically
Clustering-based Meta Bayesian Optimization with Theoretical Guarantee (arxiv:stat, 2025-03-08). A scalable meta-BO method partitions heterogeneous historical tasks into homogeneous clusters, learns geometry-based surrogates for each cluster, and adapts meta-priors using statistical distance-based weighting, enhancing optimization for hyperparameter tuning
Grouped Sequential Optimization Strategy -- the Application of Hyperparameter Importance Assessment in Deep Learning (arxiv:cs, 2025-03-07). Hyperparameter importance assessment accelerates hyperparameter optimization in deep learning, employing a 'Sequential Grouping' strategy, reducing optimization time by 31.9% while maintaining model performance across various image classification datasets
Optimization in Machine Learning: From Gradient Descent to Modern Variants (jiha-kim.github.io, 2025-03-06). Explore optimization theory in machine learning using gradient descent and modern variants, connecting them with physics-inspired frameworks, as well as addressing convergence issues and providing examples like gradient flows and Lyapunov stability
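The progression from plain gradient descent to momentum-style variants can be sketched on a one-dimensional quadratic; the heavy-ball update below is one of the "modern variants" in the family the post covers:

```python
import numpy as np

def gradient_descent(grad, x0, lr=0.1, steps=200):
    """Vanilla gradient descent: follow the negative gradient."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

def momentum_descent(grad, x0, lr=0.1, beta=0.9, steps=200):
    """Heavy-ball method: accumulate a velocity, like a particle with inertia."""
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(steps):
        v = beta * v - lr * grad(x)
        x = x + v
    return x

grad = lambda x: 2 * x            # gradient of f(x) = x^2, minimum at 0
x_gd = gradient_descent(grad, [5.0])
x_mom = momentum_descent(grad, [5.0])
```

The physics analogy the post draws is visible here: momentum turns the discrete update into a damped second-order system, which is what connects these methods to gradient flows and Lyapunov-style convergence arguments.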
🔍 Interpretability & Forecasting
Developmental interpretability (danmackinlay.name, 2025-03-10). Developmental interpretability explores the evolution of neural networks during training, emphasizing mechanistic phase transitions, training dynamics, and curriculum influences, utilizing tools like Singular Learning Theory and visualizations of training trajectories
Stiffness in Neural Networks (eklausmeier.goip.de, 2025-03-09). The text discusses the relationship between neural networks and stiff ordinary differential equations, emphasizing the importance of stiff solvers to bypass high-frequency solution components and target low-frequency ones when optimizing towards global optima
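Why stiffness forces a choice of solver can be shown on the test equation y' = λy with large negative λ: explicit Euler blows up at step sizes where implicit Euler remains stable. A minimal illustration:

```python
lam = -50.0     # stiff decay rate in y' = lam * y
h = 0.1         # step size far too large for explicit Euler (|1 + h*lam| = 4)
steps = 50

# Explicit Euler: y_{n+1} = (1 + h*lam) * y_n -> multiplies by -4, diverges.
y_exp = 1.0
for _ in range(steps):
    y_exp = y_exp + h * lam * y_exp

# Implicit Euler: y_{n+1} = y_n / (1 - h*lam) -> multiplies by 1/6, decays stably.
y_imp = 1.0
for _ in range(steps):
    y_imp = y_imp / (1 - h * lam)
```

The implicit scheme damps the fast (high-frequency) mode regardless of step size, which is the property stiff solvers exploit.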
R² Priors for High-Dimensional Linear Regression and Autoregressive Timeseries in PyMC (austinrochford.com, 2025-03-07). PyMC implementation of R²-based priors for high-dimensional linear regression and autoregressive timeseries, exploring local-global shrinkage techniques using Bayesian statistical methods
Differentially Private Gradient Flow based on the Sliced Wasserstein Distance (techblog.criteo.com, 2025-03-05). A novel differentially private generative modeling approach using gradient flows and Gaussian-smoothed Sliced Wasserstein Distance achieves high-quality data generation while safeguarding privacy and reducing computational costs
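The sliced Wasserstein distance underlying the approach reduces a high-dimensional optimal-transport problem to many one-dimensional ones, where sorting gives the answer in closed form. A Monte-Carlo sketch of the plain (non-private, non-smoothed) distance:

```python
import numpy as np

def sliced_wasserstein(X, Y, n_proj=100, seed=0):
    """Monte-Carlo sliced 1-Wasserstein distance between equal-size point clouds."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)           # random unit direction
        # 1-D Wasserstein-1 distance = mean abs difference of sorted projections
        total += np.mean(np.abs(np.sort(X @ theta) - np.sort(Y @ theta)))
    return total / n_proj

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
Y = rng.normal(size=(200, 2)) + 3.0              # same shape, shifted cloud
d_far = sliced_wasserstein(X, Y)
d_near = sliced_wasserstein(X, X)
```

The Criteo post builds on a Gaussian-smoothed variant of this quantity so that the gradient flow itself satisfies differential privacy; the smoothing and privacy accounting are not shown here.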
(News from) Probabilistic Forecasting of univariate and multivariate Time Series using Quasi-Randomized Neural Networks (Ridge2) and Conformal Prediction (thierrymoudiki.github.io, 2025-03-09). Probabilistic forecasting of univariate and multivariate time series using Ridge2 neural networks and conformal prediction offers efficient techniques for handling nonlinear interactions and enhancing prediction intervals
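The conformal-prediction half of the recipe is simple to state: widen a point forecast by a quantile of calibration residuals to get a distribution-free interval. A split-conformal sketch (the Ridge2 model is replaced here by a placeholder forecast):

```python
import numpy as np

def split_conformal_interval(preds_cal, y_cal, pred_new, alpha=0.1):
    """Split conformal: widen a point forecast by the calibration-residual quantile."""
    scores = np.abs(y_cal - preds_cal)                       # nonconformity scores
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)     # finite-sample correction
    q = np.quantile(scores, level)
    return pred_new - q, pred_new + q

rng = np.random.default_rng(0)
y_cal = np.sin(np.arange(100) * 0.1) + 0.1 * rng.normal(size=100)
preds_cal = np.sin(np.arange(100) * 0.1)                     # stand-in model forecasts
lo, hi = split_conformal_interval(preds_cal, y_cal, pred_new=0.5)
```

Under exchangeability of the residuals, the resulting interval covers the truth with probability at least 1 − α, whatever the underlying forecaster.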
There is no such thing as “the best approach for everything” (openforecast.org, 2025-03-06). No single forecasting method is superior for every situation. Tools like cross-validation and statistical tests can help identify the best approach, as performance varies across different datasets and contexts
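The cross-validation recommendation translates, for time series, into rolling-origin evaluation: refit or re-forecast at each origin and compare average errors. A sketch with two deliberately simple candidate methods:

```python
import numpy as np

def rolling_origin_cv(y, forecasters, min_train=20):
    """Score one-step-ahead forecasts from each method over expanding origins."""
    errors = {name: [] for name in forecasters}
    for t in range(min_train, len(y)):
        for name, f in forecasters.items():
            errors[name].append(abs(f(y[:t]) - y[t]))        # absolute error at origin t
    return {name: float(np.mean(e)) for name, e in errors.items()}

forecasters = {
    "naive": lambda h: h[-1],                  # repeat the last observation
    "mean": lambda h: float(np.mean(h)),       # forecast the historical mean
}
rng = np.random.default_rng(0)
y_rw = np.cumsum(rng.normal(size=100))         # a random walk favors the naive method
scores = rolling_origin_cv(y_rw, forecasters)
best = min(scores, key=scores.get)
```

On a mean-reverting series the ranking would flip, which is precisely the post's point: the "best" method is a property of the data, not of the method.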