Data Scientist (with R): 26th August 2025

Published 26th August 2025

👥 R Community & Research Applications

Epiverse community engagement and software sustainability for research software (epiverse-trace​.github​.io). Epiverse-TRACE refactors ringbp R package: modularized scenario_sim, customizable incubation and delay distributions, presymptomatic transmission options, roxygen2 inheritance, vignettes, bug fixes, and improved user experience

How R Powers Cancer Research and Community Teaching in Austria (r-consortium​.org). Ekaterina Akimova-Höpner (R-Ladies Salzburg) discusses using R for cancer genomics, Bioconductor, Tidyverse pipelines, RUGS grants, and building a Western Austria R community with workshops and Mindful Doctorate course ideas

ANES 2024 is Out! How to Analyze the Data with R (rworks​.dev). ANES 2024 data analyzed in R using survey and srvyr; weights, strata, and clusters; in-person/web samples; recoding V241229; design object with V240101b; population adjustments via tidycensus and ACS 2023; population estimates for 18+ citizens; GQ adjustments; code snippets and data loading

El rol del software de investigación a lo largo del ciclo de vida investigativo [The role of research software throughout the research life cycle] (ropensci.org). Webinars on research software and data: the role of Yanina Bellini Saibene, rOpenSci, and contributions to open research software throughout the life cycle, with registration on Zoom

📈 Data Analysis & Visualization

Step count versus city walkability (rawdatastudies​.com). Step-count data from thousands of movers linked to city walkability; code and data on GitHub; critique of data sharing, non-linear fits, and raw data availability in a countrywide natural experiment

Men's domestic chores and fertility rates - Part II, technical notes by @ellis2013nz (freerangestats​.info). Technical notes on drawing directed graphs with coloured edges using ggdag, dagify, and tidy_dagitty; UN SDG data access via curl and httr, SDG Series DataCSV; data wrangling for time-use by sex, age, location; model notes with country random effects

Closing my tabs (Aug 22, 2025) (blog.stephenturner.us). Digest of AI, genomics, and data science topics: European Heart Journal study on vascular ageing post-COVID, Ground Truths by Topol, R Weekly Positron/Shiny, nf-core advisories, Julia for R users, mcptools, AI hallucinations, Bluesky for science, Moonshots in biology, and Parallelism in R/Python with mall 0.2.0

Deploying a Golem Shiny App to ShinyApps.io (pacha.dev). Troubleshooting a Golem app deployment: dependencies, renv, CRAN/GitHub sources, golem::add_shinyappsio_file, remotes::install_github, .rscignore, and rsconnect deployment

Replicating Hansen’s Econometrics using Armadillo (pacha.dev). Replicates Hansen's Econometrics in C++ with Armadillo, using tidy data principles and the datasets from the book's exercises (Mauricio Vargas S., August 2025)

🤖 AI & LLMs in R

ragnar 0.2 (tidyverse​.org). Ragnar 0.2 introduces a tidy R package for building trustworthy RAG pipelines, embedding with OpenAI models, creating a duckdb store, and retrieving via semantic and BM25 scoring

AI vs Manual Scatterplots in R: ggplot2 Workflows for the AI Era (datavizpyr​.com). AI vs manual ggplot2 scatterplots in R: three workflows—manual artisan, AI-assisted, and hybrid—using Palmer Penguins, ggplot2, dplyr, custom color palettes, geom_point, geom_smooth, labs, theme_minimal, and ggsave

Prejudicial Peer Review with AI (blog​.stephenturner​.us). PLANES framework evaluates plausibility of infectious disease forecasts using modular heuristic components (repeat, taper, shape, trend) with rplanes R package, validated on FluSight data and correlated with WIS

genAI Day 2025 (rinpharma​.com). GenAI Day 2025 showcases pharma-focused GenAI use cases: LLM-driven Shiny apps, clinical trial applications, ADaM AI pair programming, CDISC data automation, multi-agent frameworks, RAG, and Shiny tooling with Posit, Roche, Pfizer, Eli Lilly, Biogen, Novo Nordisk, Appsilon, Formation Bio, A2-AI

Setting up local LLMs for R and Python (posit​.co). Setting up local LLMs for R and Python with Posit's Positron, local inference workflows, RStudio/Jupyter/VS Code integration, CRAN/PyPI/Bioconductor package management, Shiny apps, and AI workflow enhancements

📊 Statistical Methods & Probability

The birthday problem (r​.iresmi​.net). Birthday problem simulation in R: max_group_size 60, 1e5 iterations, multi-core with furrr, simulate birthday collisions including 365/366 days, plot probability vs group size, identify 50% collision threshold

Probability Density Function (PDF) (statisticalaid​.com). Overview of Probability Density Function (PDF): continuous variables, PMF vs PDF, Uniform and Normal examples, CDF relation, area under curve equals 1, practical computation, applications in data science, physics, finance, engineering, healthcare, and visualization

2-Sample Median Bootstrap Test Calculator (statisticsbyjim​.com). Bootstrap-based 2-sample median test, bootstrap confidence intervals, R or Python implementation, medians comparison, nonparametric inference, effect size, p-values, permutation ideas, sample sizes, robust statistics

Fact and fiction in statistics (larspsyll.wordpress.com). Frequentism and Bayesianism incomplete; causal justifications for data-generating processes; de Finetti on idealizations; critique of model misspecification and physical constraints in health, medical, and social sciences

🧠 Bayesian Statistics & Advanced Modeling

“Surprises” in BLS Jobs Revisions Became More Frequent After 2020 (medium​.com/@baogorek). BLS job revisions, 2-distribution model, mclust in R, quantile residuals, surprise proportion ~11.3%, post-2020 revision patterns, GFC, dot-com era, February vs June revisions, Groshen interview

ベイズ構造時系列モデル(bstsパッケージ)の個人メモ [Personal notes on Bayesian structural time series models (bsts package)] (watagusa.hatenablog.com). Bayesian structural time series with bsts in R: generate synthetic data, spike variables, local linear trend, seasonal components, model fitting via MCMC, and posterior summaries

Stop Guessing at Priors: R2D2’s Automated Approach to Bayesian Modeling (dspn​.substack​.com). Explores R2D2 Bayesian priors, R2D2M2 extensions for GLMs and multilevel models, variance allocation on R², Dirichlet decomposition, and practical code for hierarchical data

Bayes' Theorem as universe ratios (scyy​.fi). Bayes’ Theorem reinterpreted as universe ratios, using prior odds, likelihoods, and universe-shares to illustrate hypothesis updating with an illustrative dinner invitation example

📚 Academic Research

CSTEapp: An interactive R-Shiny application of the covariate-specific treatment effect curve for visualizing individualized treatment rule (arxiv:stat). First-ever Shiny app for estimating individualized treatment rules in precision medicine through point-and-click interface. Essential for R users building interactive dashboards for causal inference and medical applications

piCurve: an R package for modeling photosynthesis-irradiance curves (arxiv:stat). Comprehensive R package with 24 models, uncertainty quantification, and tidy workflows for reproducible biological research. Demonstrates best practices in R package development and scientific computing

Novel Knockoff Generation and Importance Measures with Heterogeneous Data via Conditional Residuals and Local Gradients (arxiv:stat). Advanced variable selection method with rangerKnockoff R package for mixed data types and nonlinear models. Critical for data scientists working with complex datasets requiring rigorous statistical inference

Multinomial probit model based on joint quantile regression (arxiv:stat). Novel Bayesian quantile regression approach using Gibbs sampling for multinomial choice data analysis. Valuable for Stan users and researchers applying advanced statistical modeling techniques

👋 Before you go

I've got a big favor to ask - keeping Blaze running isn't expensive, but it does all add up, so I'm asking readers like you to help, if you can.
That's why I'm launching a Patreon page! Nothing flashy, just a way for folks who find value in these newsletters to chip in a little each month. In return, you'll get:

  • Real say in how Blaze evolves — vote on new topics, features, topic curation ideas
  • First dibs on merch (details still cooking)
  • That warm fuzzy feeling knowing you're supporting something that saves you time and keeps you plugged into great tech writing

If you are getting value from Blaze, checking this out would mean the world. And if you can't contribute, no worries: the newsletters keep coming either way, and you can follow along on Patreon for free.
Thanks for reading and being part of this nerdy corner of the internet. All the best - Alastair.

About Data Scientist (with R)

Our Data Scientist newsletter covers the latest developments, packages, techniques, and insights in R programming and data science. Each week, we curate the most important content from your favourite R blogs so you don't have to spend hours searching.

Whether you're a beginner or expert in data science with R, our newsletter provides valuable information to keep you informed.
