📊

Data Scientist (with R): 2nd September 2025

Published 2nd September 2025

📦 Package Updates & Community

July 2025 Top 40 New CRAN Packages (rworks​.dev). The month's top 40 picks, including ciflyr, hmde, RBaM, ArgentinAPI, bcRP, gilmour, ROCnGO, tEDM, topolow, flowcluster, and mantis, spanning causal inference, Bayesian modeling, machine learning, time series, utilities, and visualization

movepub 0.4.0 (oscibio​.inbo​.be). movepub 0.4.0 adds new functions and updates write_dwc(), which converts Movebank data and metadata into a standardized form for publication to GBIF/OBIS

rOpenSci News Digest, August 2025 (ropensci​.org). Community calls on R-multiverse; useR! 2025 and posit::conf(2025); new packages trud, sasquatch, and dataset; and updates on rOpenSci events and peer review

Closing my tabs (Aug 29, 2025) (blog​.stephenturner​.us). Links on RAND's report defining hazardous capabilities of biological AI; OpenAI and Retro Biosciences' 50x stem cell reprogramming result; Bluesky for science; LLMs in education; PyPI vs CRAN; an R/Python programming book; SIGCSE TS; and The Test Set podcast

Send Me Your Questions and Ideas (pacha​.dev). A call for reader questions and ideas on R, Shiny, and C++, via a submission form and a GitHub organization

🎓 Education & Workshops

R-kurser i höst (statistikakademin​.se). Swedish-language announcement of this fall's online R courses, sold as Basic, Medium, and Complete packages, covering R, regression, visualization, SEM, survival analysis, biomarker data, ML, and AI

R courses this fall (statistikakademin​.se). Basic, Medium, and Complete course packages built from the online courses R 1–5, covering an introduction to R, regression, visualization, survival analysis, biomarker data, and ML/AI, with package discounts

Skytorial and Linkedtorial 1: Introduction to GitHub for Researchers (yabellini​.netlify​.app). The first in a series of short tutorials posted on Bluesky and LinkedIn introducing GitHub to researchers, covering projects, repositories, version control, collaboration, and reproducibility

Some Stuff I Learned About Harvard This Week (causalinf​.substack​.com). Notes on the design of Harvard's Gov50 class (quantitative reasoning with data, R coding, GitHub workflows), the PERMA well-being model, AI policy, podcasts and papers involving Imbens, Angrist, and King, and causal inference pedagogy

[open] 3-day mixOmics workshop, 22 – 24 Sept 2025, Lund University (mixomics​.org). "From Single to Multi-Omics Data Integration": a beginner-level, hands-on workshop using the R package mixOmics, covering exploratory and supervised analysis, data integration, Pearson correlation and covariance, PLS methods, and LASSO penalisation, with time to apply the methods to your own dataset

[open] Single and multi-omics analysis and integration with mixOmics (mixomics​.org). Course page for single- and multi-omics analysis and data integration with the R package mixOmics

🛠️ Posit Tools & Development

Databot is not a flotation device (posit​.co). Posit introduces Databot and discusses its risks and governance, with warnings about LLM use and data science best practices

Announcing posit::conf(2025) Virtual Day, Sept 16th! (posit​.co). The September 16 virtual day features an AMA, virtual talks, a Data Science Hangout, and participation via Discord

posit::glimpse() Newsletter – August 2025 (posit​.co). Positron, the Code OSS-based data science IDE; Shiny, Streamlit, and Dash apps; Quarto 1.8; Shiny for Python 1.4.0; package releases and tutorials; and posit::conf in Atlanta

Using OpenAI Codex in Positron (blog​.stephenturner​.us). Developing an R package with OpenAI Codex in Positron; Codex integration, tests, devtools, usethis, Roxygen, testthat, GPT-5 vs Claude, GitHub workflows, and cost considerations

🔬 Applied Research & Visualization

Learning The Basics of Phylogenetic Analysis (kenkoonwong​.com). An R/Bioconductor workflow: extract 16S rRNA sequences with barrnap, align them with DECIPHER, compute Jukes-Cantor distances, build trees with rapidNJ/PHYLIP, and visualize them with ggtree/FigTree
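The Jukes-Cantor step turns the raw proportion of mismatched sites p into an evolutionary distance, d = -(3/4) ln(1 - 4p/3). Below is a minimal Python sketch of that formula, assuming the sequences are already aligned to equal length and that gap columns are skipped pairwise (a choice made here, not necessarily what DECIPHER does); the function names and toy sequences are hypothetical.

```python
import math
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: skip any column where either sequence has a gap ('-').
    sites = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def jukes_cantor(a: str, b: str) -> float:
    """JC69 distance d = -(3/4) * ln(1 - (4/3) * p)."""
    p = p_distance(a, b)
    if p >= 0.75:
        return math.inf  # saturated: the log argument is <= 0
    return -0.75 * math.log(1 - 4 * p / 3)


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """Symmetric pairwise JC69 distances keyed by (name, name)."""
    dist = {(name, name): 0.0 for name in seqs}
    for x, y in combinations(seqs, 2):
        dist[(x, y)] = dist[(y, x)] = jukes_cantor(seqs[x], seqs[y])
    return dist


seqs = {
    "s1": "ACGTACGTAC",
    "s2": "ACGTACGTAA",  # 1 mismatch in 10 sites
    "s3": "ACG-ACGTTA",  # the gap column is skipped when comparing
}
dm = distance_matrix(seqs)
print(round(dm[("s1", "s2")], 4))  # 0.1073
```

With p = 0.1 the correction gives about 0.107, slightly above the raw p-distance, because it accounts for multiple substitutions at the same site. Saturated pairs return infinity in this sketch; real tools may handle them differently.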

Wildlife management (openanalytics​.eu). Open Analytics visualizes Flemish wildlife data via Faunabeheer, e-loket fauna en flora, waarnemingen.be, and Wilder, using R/Shiny apps, GitHub Actions, and ShinyProxy, in collaboration with INBO and ANB

The social and spatial effects of fare cuts on public transport (urbandemographics​.blogspot​.com). Fare cuts, public transport demand, and spatial effects analyzed with agent-based modeling, space syntax, and R; urban mobility, induced demand, and social equity

Plotting with ukmaps v0.0.4 and ggplot2 (pacha​.dev). Mapping UK boundaries with ukmaps, ggplot2, dplyr, and sf: London, Barnet, and Golders Green; local authority districts (LADs), counties, and country(); plus tintin and the election_results data

🤖 Machine Learning & Modeling

I was wrong about tidymodels and LLMs (simonpcouch​.com). How well Databot and Predictive use tidymodels through tools such as run_r_code and run_experiment, with evaluation findings on model performance for Claude Sonnet 4 and Gemini 2.5 Pro

external regressors in ahead::dynrmf’s interface for Machine learning forecasting (thierrymoudiki​.github​.io). Demonstrates creating external regressors (xreg) and passing them to ahead::dynrmf on the USAccDeaths, AirPassengers, fpp2 a10, and fdeaths series, with fits using ridge regression and glmnet's cv.glmnet

NRL Predictions for Round 27 (statschat​.org​.nz). Round 27 NRL predictions with team ratings and performance metrics, plus background on the author, David Scott

A guide to actuarial techniques in R and Python (posit​.co). A guide to actuarial techniques using R and Python, Posit tools, and open-source data science resources

📊 Statistics & Probability Methods

A One-Page Primer on: Statistical Power (carlislerainey​.com). An overview of power analysis: the smallest effect size of interest (SESOI), standard errors (SE), Cohen-type reference points, R^2 and the gains from pre–post designs, and practical rules of thumb for readers and researchers
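The link between SESOI, SE, and power can be sketched with the usual normal approximation: if the estimate is roughly normal with standard error SE, power against a true effect equal to the SESOI is about Phi(SESOI/SE - z), where z is the critical value. The Python below is a sketch under that assumption (a z-test with known SE); the function names are hypothetical, and the sqrt(1 - R^2) shrinkage for pre–post designs is a common approximation, not necessarily the primer's exact rule.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal


def critical(alpha: float, two_sided: bool) -> float:
    return Z.inv_cdf(1 - alpha / 2) if two_sided else Z.inv_cdf(1 - alpha)


def power(sesoi: float, se: float, alpha: float = 0.05, two_sided: bool = True) -> float:
    """Approximate power of a z-test when the true effect equals the SESOI."""
    crit = critical(alpha, two_sided)
    ratio = sesoi / se
    result = Z.cdf(ratio - crit)
    if two_sided:
        result += Z.cdf(-ratio - crit)  # rejections in the wrong direction (tiny)
    return result


def max_se(sesoi: float, target: float = 0.80, alpha: float = 0.05, two_sided: bool = True) -> float:
    """Largest SE giving about `target` power (ignores the wrong-direction tail)."""
    return sesoi / (critical(alpha, two_sided) + Z.inv_cdf(target))


def adjusted_se(se: float, r2: float) -> float:
    """Approximation: a pre-treatment covariate with R^2 shrinks the SE by sqrt(1 - R^2)."""
    return se * (1 - r2) ** 0.5


print(round(power(2.8, 1.0), 2))                 # 0.8
print(round(max_se(1.0), 3))                     # 0.357
print(round(max_se(1.0, two_sided=False), 3))    # 0.402
print(round(adjusted_se(1.0, 0.75), 2))          # 0.5
```

The rule of thumb that falls out is that 80% power needs the SESOI to be roughly 2.8 standard errors for a two-sided test at 0.05 (about 2.5 for a one-sided test), and a pre-test explaining 75% of outcome variance halves the SE in this approximation.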

Bad BBC stats (bristoliver​.substack​.com). A critique of BBC statistics linking pupils' absence in the first week of term to later absence: correlation versus causation, group composition, the 18% persistently absent figure, and the group share p recovered by solving 0.57p + 0.14(1 - p) = 0.18
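The equation is a weighted average: assuming 0.57 and 0.14 are the persistent-absence rates in two groups of pupils (for example, those absent and not absent in the first week) and 0.18 is the overall rate, the share p in the first group satisfies 0.43p = 0.04, so p is about 0.093. A short Python check, where the group interpretation and the helper name are assumptions:

```python
def mixing_share(rate_a: float, rate_b: float, overall: float) -> float:
    """Solve rate_a * p + rate_b * (1 - p) = overall for the share p in group A."""
    if rate_a == rate_b:
        raise ValueError("group rates must differ")
    return (overall - rate_b) / (rate_a - rate_b)


p = mixing_share(0.57, 0.14, 0.18)
print(f"p = {p:.3f}")  # p = 0.093
# Sanity check: plugging p back in recovers the overall rate.
print(f"{0.57 * p + 0.14 * (1 - p):.2f}")  # 0.18
```

A small group with a very high rate can therefore sit alongside a modest overall rate, which is why group composition matters when reading the headline figure.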

The sisters "paradox" - counter-intuitive probability (blog​.engora​.com). The two-child problem: how the sample space and what you condition on (at least one child versus the elder child specifically) give answers of 1/3 versus 1/2, with a suggestion to check the result by simulation in Python
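The 1/3 versus 1/2 contrast comes from which outcomes survive the conditioning. This Python sketch first enumerates the four equally likely (elder, younger) families exactly and then runs the kind of simulation the post suggests; it assumes independent births with P(girl) = 1/2, asks whether both children are girls, and uses hypothetical helper names.

```python
import random
from fractions import Fraction
from itertools import product

# Sample space of (elder, younger) children; all four outcomes equally likely
# under the assumption of independent births with P(girl) = 1/2.
SPACE = list(product("BG", repeat=2))


def both_girls(family):
    return family == ("G", "G")


def at_least_one_girl(family):
    return "G" in family


def elder_is_girl(family):
    return family[0] == "G"


def exact(event, given):
    """P(event | given) by counting outcomes in the conditioned sample space."""
    kept = [f for f in SPACE if given(f)]
    return Fraction(sum(event(f) for f in kept), len(kept))


def simulate(event, given, n=200_000, seed=42):
    """Monte Carlo estimate of P(event | given)."""
    rng = random.Random(seed)
    hits = kept = 0
    for _ in range(n):
        family = (rng.choice("BG"), rng.choice("BG"))
        if given(family):
            kept += 1
            hits += event(family)
    return hits / kept


print(exact(both_girls, at_least_one_girl))  # 1/3
print(exact(both_girls, elder_is_girl))      # 1/2
print(round(simulate(both_girls, at_least_one_girl), 2))
print(round(simulate(both_girls, elder_is_girl), 2))
```

Conditioning on "at least one girl" keeps three outcomes (BG, GB, GG), while "the elder is a girl" keeps only two (GB, GG), which is the whole difference between the two answers.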

You can’t have everything you want: beta edition (johndcook​.com). Beta priors for a binomial likelihood: conjugacy, alpha + beta as the prior's effective sample size, the trade-off between beta(0.9, 0.1) and beta(1.8, 0.2) (both with mean 0.9), singularities at 0 and 1, and debates over the improper beta(0, 0)
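Conjugacy is what makes alpha + beta read as a prior sample size: a Beta(alpha, beta) prior combined with k successes in n binomial trials gives a Beta(alpha + k, beta + n - k) posterior, whose mean (alpha + k) / (alpha + beta + n) is a weighted average of the prior mean and k/n with weights alpha + beta and n. A Python sketch of that update, using hypothetical data (3 successes in 10 trials) and a hypothetical class name; it also flags endpoints where the density, proportional to x^(alpha - 1) (1 - x)^(beta - 1), is unbounded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPrior:
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    @property
    def prior_size(self) -> float:
        # alpha + beta acts like a count of pseudo-observations
        return self.alpha + self.beta

    def update(self, successes: int, trials: int) -> "BetaPrior":
        # Conjugacy: beta prior x binomial likelihood -> beta posterior
        return BetaPrior(self.alpha + successes, self.beta + trials - successes)

    def singular_endpoints(self) -> list[int]:
        # Density blows up at 0 if alpha < 1 and at 1 if beta < 1
        return [x for x, shape in ((0, self.alpha), (1, self.beta)) if shape < 1]


successes, trials = 3, 10  # hypothetical data
for prior in (BetaPrior(0.9, 0.1), BetaPrior(1.8, 0.2)):
    post = prior.update(successes, trials)
    # Same value as post.mean: prior mean and k/n weighted by prior_size and n
    weighted = (prior.prior_size * prior.mean + successes) / (prior.prior_size + trials)
    print(
        f"prior mean {prior.mean:.1f}, prior size {prior.prior_size:g}, "
        f"posterior mean {post.mean:.4f} ({weighted:.4f}), "
        f"singular at {prior.singular_endpoints()}"
    )
```

Both priors have mean 0.9, but doubling alpha + beta from 1 to 2 pulls the posterior mean harder toward 0.9 (about 0.355 versus 0.400 here). Removing the singularity at 0 requires alpha >= 1, which with a mean of 0.9 forces alpha + beta above 1.11; that is the sense in which you cannot have everything at once.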

📚 Academic Research

An analysis of the effects of open science indicators on citations in the French Open Science Monitor (arxiv:cs). A statistical analysis of about 900,000 publications estimating that open science practices increase citations by 8-19%. Relevant to R data scientists interested in the impact of open science and reproducible research practices

👋 Before you go

I've got a big favor to ask - keeping Blaze running isn't expensive, but it does all add up, so I'm asking readers like you to help, if you can.
That's why I'm launching a Patreon page! Nothing flashy, just a way for folks who find value in these newsletters to chip in a little each month. In return, you'll get:

  • Real say in how Blaze evolves — vote on new topics, features, topic curation ideas
  • First dibs on merch (details still cooking)
  • That warm fuzzy feeling knowing you're supporting something that saves you time and keeps you plugged into great tech writing

If you are getting value from Blaze, checking this out would mean the world. And if you can't contribute, no worries: the newsletters keep coming either way, and you can follow along on Patreon for free.
Thanks for reading and being part of this nerdy corner of the internet. All the best - Alastair.

About Data Scientist (with R)

Our Data Scientist newsletter covers the latest developments, packages, techniques, and insights in R programming and data science. Each week, we curate the most important content from your favourite R blogs so you don't have to spend hours searching.

Whether you're a beginner or expert in data science with R, our newsletter provides valuable information to keep you informed.

Subscribe now to join thousands of professionals who receive our weekly updates!