Data Scientist (with R): 30th September 2025

Published 30th September 2025

🫶 R community, conferences & roundups

Ten year anniversary of Free Range Statistics by @ellis2013nz (freerangestats​.info). Ten years of Free Range Statistics blogging: 225 posts, 350,000 words, analytics in R with loess smoothing and markup removal

Does Traveling to a Professional Conference Make Sense if You’re Retired? (NextChapter​.machlis​.com). Retired data scientist attends in-person data science conference in Atlanta to reconnect with R community, gain focus, and stress learning over career growth

R Weekly 2025-W40 Ducklake, Slidecrafting, Shiny & LLMs (rweekly​.org). Ducklake, Slidecrafting with reveal.js and Quarto; Shiny & LLMs; testing with testthat; new and updated R packages and CRANberries

rOpenSci News Digest, September 2025 (ropensci​.org). Monthly digest highlighting rOpenSci activities, training, software reviews, new packages, and community updates

2025-09-26 AI Newsletter (posit​.co). Posit AI newsletter covers Anthropic Claude reliability, Codex updates, GPT-5-Codex, agent definitions, and Posit news and partner programs

Issue 2025-W39 Highlights (serve​.podhome​.fm). R Weekly Highlights episode 211 covers posit::conf(2025), Reading & Writing Markdown with programming, Vibe-coding a new package to learn Japanese, and quality control exams

Software for Phylogenetic Trees: SSEC co-organizes Workshop in London (escience​.washington​.edu). SSEC co-organizes a London workshop on Phylo2Vec with Rust core rewrite and Python/R APIs across Imperial College London and University of Copenhagen

SSEC collaborates on Workshop at the Barcelona Supercomputing Center (escience​.washington​.edu). SSEC aids BSC workshop on ecological forecasting and Docker-based R package development for biodiversity projections

📦 R packages: releases, testing & maintenance

shinystate 0.1.0 is available on CRAN! (shinydevseries​.com). shinystate 0.1.0 enables server-side bookmarking with StorageClass, snapshot/restore, and multiple session management for Shiny apps

Piwik Pro doesn't offer a free plan anymore (rstats-tips​.net). Piwik Pro free plan discontinued, pricing at least 420€ yearly; impact on piwikproR CRAN package maintenance for R developers

Testing with {testthat} (jumpingrivers​.com). Overview of testthat features for R package testing, including expectations, structure, and visual tests with doppelganger

ggsci 4.0.0: 400+ new color palettes (nanx​.me). ggsci 4.0.0 adds 400+ discrete color palettes from Primer, Atlassian, and iterm2-color-schemes for ggplot2 and plotnine

Dependencies and reverse dependencies: Python vs. R (spatialists​.ch). Reverse dependency checks in CRAN shape empathetic maintenance, easing migrations for researchers and data scientists

R-multiverse: a new way to publish R packages (ropensci​.org). R-multiverse creates a dual repository for R packages, enabling central installation, automated quarterly snapshots, and GitHub/GitLab-based releases

New version of phytools on CRAN (blog​.phytools​.org). New phytools 2.5-2 on CRAN adds models, updates, and new simulation and plotting tools for comparative biology

📝 Quarto, R Markdown & APA-style reporting

Moving to quarto and LaTeX… (blog​.ouseful​.info). Moving to Quarto and LaTeX: from Jupyter Book, PDFs, and EPUBs to print-on-demand booklets with LaTeX templates and build automation

Weekly recap (Sep 26, 2025) (blog​.stephenturner​.us). Apple enters protein folding with SimpleFold (3B parameters); parsing R Markdown and Quarto; vibe coding an R package; AI in biosecurity and writing; de-extinction conservation applications

Recreating APA Manual Table 7.16 in R with apa7 (wjschne​.github​.io). Recreating APA Table 7.16 in R using apa7, flextable, tidyverse, and lavaan to simulate and format path-model results

Recreating APA Manual Table 7.19 in R with apa7 (wjschne​.github​.io). Recreating APA Table 7.19 in R using apa7, flextable, ftExtra, tidyverse, and easystats

📈 ggplot2 visualizations, charts & mapping in R

Analyzing ICE Arrest Data - Part 2 (jefworks-lab​.github​.io). Analyzes ICE arrest data from deportationdata.org with R (tidyverse, dplyr, ggplot2, gganimate) to compare criminality categories and apprehension methods over time

Tuesdays and Travels (seanlunsford​.com). Reflection on globalization, data visualization with Tidy Tuesday, and visa policies through a Sankey chart highlighting US vs global visa access

Exploring ggbot2: Creating a volcano plot with your voice (tomsing1​.github​.io). Voice-controlled volcano plot creation using ggbot2, ggplot2, ggrastr, ggrepel, and Shiny within R; with Mattila et al. data and LLM-driven code generation

Vizualizing global testosterone levels by country (mihiretukebede​.com). Scrapes testosterone-by-country data from WorldPopulationReview and builds a Python/R-style choropleth map using tidyverse, rvest, sf, and viridis

Still presenting regression results in tables? why not forest plots? (mihiretukebede​.com). Reproduces an elegant JAMA forest plot in R with the meta package, as an alternative to presenting regression results in tables

UK Household Spending Inequality by Income Level in FYE 2024 (stevenponce​.netlify​.app). UK housing costs drive largest spending gap across quintiles using ONS Family Spending data in an R viz with tidyverse, ggridges, patchwork, and custom themes

Spurious Correlations in R - Correlation is not Causation (pacha​.dev). Spurious correlations in R with spuriouscorrelations: plotting, lm modeling, and double y-axis visuals highlighting correlation vs causation

🔬 Applied analyses & workflows with R (health, bio & maps)

NHANES Activity using MIMS (Monitor-Independent Movement Summary) (hopstat​.wordpress​.com). NHANES MIMS analysis with MIMSunit in R, comparing default vs custom MIMS for 80Hz NHANES 2012 data

Learning And Exploring The Workflow of RNA-Seq Analysis - A Note To Myself (kenkoonwong​.com). RNA-Seq workflow in C. difficile: fastp, kallisto, DESeq2; QC, transcriptome reference, Tximport, PCA, and differential expression

Lake Hornborgasjön cranes: seasonal peaks and long-term growth (stevenponce​.netlify​.app). Spring migration peaks and long-term growth in crane counts shown with tidyverse, ggplot2, and TidyTuesday data

Mapping Bike Rides (Part III) (rasterweb​.net). GPX files on a map via gpx.studio, Mapbox tiles, and other free/open tools for mapping bike rides

🧮 Statistical inference, simulations & Bayesian thinking

Type S and M errors as a “rhetorical tool” (daniellakens​.blogspot​.com). Gelman and Carlin's Type S/M errors discussed as rhetorical tools vs. practical methods; author critiques their use in study design and interpretation

A warning about data-driven simulations (garstats​.wordpress​.com). Cautions on data-driven simulations: sampling distributions, population vs. sample, and power bias in RT lexical decision data

A Chess Scandal Revisited – Why Nakamura is Right About Cherry-Picking (bayesianspectacles​.org). Bayesian analysis debates cherry-picking in Nakamura-Kramnik chess controversy; discusses likelihood principle, optional stopping, change-point models, and data selection

Some notes on survey weights (blog​.djnavarro​.net). Survey weights in NHANES: correcting for stratified sampling when modeling height with GAMLSS in R

scalable Monte Carlo for Bayesian learning [book review] (xianblog​.wordpress​.com). Review of scalable Monte Carlo for Bayesian learning, covering stochastic gradient MCMC, non-reversible MCMC, continuous-time MCMC, and convergence diagnostics

Tomorrow's Causal I Workshop at Mixtape Sessions, A Fight I Saw at the Patriots-Steelers Game, and Thoughts About My Pedagogy in My Gov 50 Class at Harvard (causalinf​.substack​.com). Harvard Gov 50 causal inference pedagogy, classroom tools like Cosmos and ChatGPT, and a vivid Patriots-Steelers game anecdote

Why MissForest Fails in Prediction Tasks: A Key Limitation You Need to Keep in Mind (towardsdatascience​.com). MissForest's standard imputation lacks stored models for predictions; MissForestPredict preserves imputation parameters for train/test, MAR/MCAR/MNAR handling, and out-of-time validation

📚 Academic Research

An Interpretable Single-Index Mixed-Effects Model for Non-Gaussian National Survey Data (arxiv:stat). Interpretable single-index mixed-effects model for non-Gaussian survey data with skewed random effects, heavy-tailed residuals, monotone single index, grouped horseshoe, and survey weights in periodontal CAL/PD analysis using MSIMST

Measuring Partial Exchangeability with Reproducing Kernel Hilbert Spaces (arxiv:stat). Measuring partial exchangeability in Bayesian multilevel models via reproducing kernel Hilbert spaces for a priori and posterior dependence

Detecting gene-environment interactions to guide personalized intervention: boosting distributional regression for polygenic scores (arxiv:stat). Cyclical gradient boosting for Gaussian location-scale models to derive sparse polygenic scores for mean and variance, revealing GxE interactions with statins and lifestyle

Improving Disease Risk Estimation in Small Areas by Accounting for Spatiotemporal Local Discontinuities (arxiv:stat). Greedy scan clustering integrated into Bayesian spatiotemporal modelling improves cancer mortality risk estimation in Spanish municipalities

👋 Before you go

I've got a big favor to ask - keeping Blaze running isn't expensive, but it does all add up, so I'm asking readers like you to help, if you can.
That's why I'm launching a Patreon page! Nothing flashy, just a way for folks who find value in these newsletters to chip in a little each month. In return, you'll get:

  • Real say in how Blaze evolves — vote on new topics, features, topic curation ideas
  • First dibs on merch (details still cooking)
  • That warm fuzzy feeling knowing you're supporting something that saves you time and keeps you plugged into great tech writing

If you are getting value from Blaze, checking this out would mean the world. And if you can't contribute, no worries: the newsletters keep coming either way, and you can follow along on Patreon for free.
Thanks for reading and being part of this nerdy corner of the internet. All the best - Alastair.

About Data Scientist (with R)

Our Data Scientist newsletter covers the latest developments, packages, techniques, and insights in R programming and data science. Each week, we curate the most important content from your favourite R blogs so you don't have to spend hours searching.

Whether you're a beginner or expert in data science with R, our newsletter provides valuable information to keep you informed.
