📊

Data Scientist (with R)

Published 24th June 2025

🌐 Community & Learning

R Weekly 2025-W26 Duckplyr, Trail Running, Oh Leave It Out (rweekly.org). Duckplyr joins tidyverse, exploring trail running data science, new R packages, and updates in healthcare analytics and community events

Drop #669 (2025-06-23): Monday Morning (Barely) Grab Bag (dailydrop.hrbrmstr.dev). Insights into a Rube Goldberg-inspired DuckDB pipeline, R package fplot for plot automation, and advanced CSS color techniques from CSS-Tricks

Posts (jottr.org). Explore various posts on JottR, featuring advancements in parallelization with R, including tools like %dopar%, parallelly, and progressr, along with meetups and updates on high-performance computing support

Git, un gentil 'push' vers une meilleure maîtrise [Git, a gentle 'push' toward better mastery] (ropensci.org). Maëlle Salmon will lead a Git workshop during the Happy R session at the StateoftheR event held at Campus AgroParisSaclay, focused on building version control skills for data science

¡Miércoles, Git! Manejo de errores en Git y no morir en el intento [Wednesday, Git! Handling errors in Git without dying in the attempt] (ropensci.org). Spanish-language workshop on handling Git errors, led by Maëlle Salmon and hosted by R en Buenos Aires and R-Ladies+ Santa Rosa

🔧 tidyverse & Data Processing

Obtenir des tableaux personnalisés dans les trois formats de sortie de quarto grâce à flextable [Getting customized tables in Quarto's three output formats with flextable] (delladata.fr). Customizing tables in Quarto with flextable for consistent output in HTML, Word, and PDF formats without duplicating code chunks

duckplyr fully joins the tidyverse! (tidyverse.org). Duckplyr 1.1.0 integrates with the tidyverse, providing a powerful dplyr backend using DuckDB for efficient data manipulation and analytics. Install via CRAN to enhance performance with large datasets

Future got better at finding global variables (jottr.org). The future package celebrates a decade on CRAN, with recent globals package updates enhancing global variable detection for parallel processing using a DFS algorithm, resolving unexpected errors and improving usability in R

An easy-to-manage tool for forest ecosystem modeling—The pnetr R package (methodsblog.com). The pnetr R package simplifies forest ecosystem modeling, addressing challenges in ecosystem process interactions, particularly vegetation phenology's effects on carbon and water cycles, using the PnET framework

🚀 Shiny & Web Development

From Friction to Flow: Designing Smarter Dashboards with {bidux} (jrwinget.com). The BID framework and bidux R package enhance dashboard design by integrating cognitive psychology principles to reduce user friction and improve decision-making in data visualization

Creating A Question Bank Using Google Sheet, Plumber, and Digital Ocean Droplet (kenkoonwong.com). Build a flashcard-style question bank using Google Sheets for storage, R's Plumber API for backend, and Digital Ocean for hosting, with step-by-step setup and tips

Shiny in Production 2025: Lightning Talk Lineup (jumpingrivers.com). Shiny in Production 2025 showcases lightning talks on epidemiological surveillance, lifeguard monitoring, UI-first development, cancer treatments, and app management challenges using R and Shiny

Shiny in Production 2025: Full Length Talks (jumpingrivers.com). Shiny in Production 2025 announces full-length talks featuring tools like R Shiny, Rlinguo, and Shiny for Python for government services, mobile applications, AI-driven forecasting, and enhanced data visualization in healthcare and education

📦 R Package Development

CodemetaR Author streamlines software metadata updates (softwareheritage.org). CodemetaR helps R developers keep software metadata up to date, using codemetar to generate CodeMeta JSON files that improve reproducibility and discoverability across programming environments

Season 48 is now in 📦{survivoR} + new datasets and data updates (gradientdescending.com). Survivor 48 data is now available in the survivoR package, featuring new scoring metrics, a refined castaways table, updates to boot orders, and insights for the upcoming Season 50

An Arch linux package for the Air R formatter (discindo.org). The Arch Linux package for the Air R formatter is named r-air to avoid conflicts with existing Go binaries. It enables seamless formatting for R code using a Rust-backed language server

Kendallknight: An R package for efficient implementation of Kendall’s correlation coefficient computation (pacha.dev). Kendallknight is an R package that optimizes Kendall’s correlation coefficient computation, reducing operations from 400 million to 200,000 for large datasets, enhancing processing speed without compromising accuracy
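
The operation counts quoted in the Kendallknight entry are consistent with replacing an all-pairs comparison with a sort-based count. As a rough illustration only, the sketch below computes Kendall's tau both ways. It is written in Python rather than R, it is not the package's actual implementation, and it assumes no tied values. The function names are hypothetical, and n = 20,000 is an assumed input size chosen because n squared is then 400 million.

```python
import math
from itertools import combinations


def kendall_naive(x, y):
    """All-pairs Kendall's tau, O(n^2). Assumes no ties in x or y."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def count_inversions(seq):
    """Merge sort that also counts inversions; returns (sorted_list, count)."""
    if len(seq) <= 1:
        return list(seq), 0
    mid = len(seq) // 2
    left, inv_left = count_inversions(seq[:mid])
    right, inv_right = count_inversions(seq[mid:])
    merged, inversions = [], inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] jumps ahead of every remaining element of left
            merged.append(right[j])
            j += 1
            inversions += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inversions


def kendall_fast(x, y):
    """Sort-based Kendall's tau, O(n log n). Assumes no ties in x or y."""
    y_ordered_by_x = [b for _, b in sorted(zip(x, y))]
    # With no ties, each inversion in y (ordered by x) is one discordant pair.
    _, discordant = count_inversions(y_ordered_by_x)
    n_pairs = len(x) * (len(x) - 1) // 2
    return (n_pairs - 2 * discordant) / n_pairs


# Back-of-envelope operation counts for an assumed n (not taken from the post).
n = 20_000
pairwise_ops = n * n               # all-pairs scale
sort_based_ops = n * math.log(n)   # n ln n scale
```

Both functions return the same value on tie-free input; handling ties needs extra bookkeeping that this sketch leaves out.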

Simplifying R Package Documentation with pkgdown and GitHub Pages (r-consortium.org). pkgdown simplifies creating R package documentation websites, integrating seamlessly with GitHub Pages for easy hosting, while GitHub Actions automates updates, enhancing usability and adoption of R packages

📊 Data Analysis & Visualization

Handling and visualising archive data from Strava (samlangton.info). Samuel Langton shares methods for handling and visualising archived Strava data using R, including parsing GPX files and creating summarised statistics and visualisations with packages like dplyr and ggplot2

Oh Leave it Out (kieranhealy.org). This post discusses leave-one-out techniques in data analysis using R, specifically jackknife cross-validation and summary statistics, with practical examples applying tidyverse functions to compute adjusted means
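
For readers unfamiliar with the idea, a leave-one-out mean is the average of a group with the current observation excluded. The sketch below is a minimal Python illustration, not the post's tidyverse code; the function name and sample values are hypothetical. It uses the shortcut of subtracting each value from the group total instead of re-averaging the remaining values every time.

```python
def leave_one_out_means(values):
    """Return, for each value, the mean of all the other values."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to leave one out")
    total = sum(values)
    # (total - v) is the sum of the other n - 1 values
    return [(total - v) / (n - 1) for v in values]


scores = [2, 4, 6, 8]  # hypothetical sample data
loo = leave_one_out_means(scores)
```

The same subtraction trick carries over to grouped data: compute each group's sum and size once, then adjust per row.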

Understanding data uncertainty philosophically (diagrammonkey.wordpress.com). Explores philosophical perspectives on data uncertainty, emphasizing the significance of uncertainty estimates, contrasting GUM's approach with error methodology, and its implications for climate research decision-making

Trail Running Meets Data Science: Adventures with LLMs and Race Stats (posit.co). Explore dynamic data insights through RStudio, Jupyter, and VS Code with secure package management for Python and R, facilitating faster reporting and innovation in clinical trials and data-driven applications

📈 Statistics & Machine Learning

Checking Data and Code in Repositories with Papercheck (daniellakens.blogspot.com). Papercheck automates checks of data repositories linked to scientific manuscripts, ensuring adherence to open science best practices by verifying the presence of documentation, organized folders, and necessary components like README files and codebooks

BayesComp 2025.1 (xianblog.wordpress.com). The BayesComp 2025.1 workshop featured discussions on post-Bayesian inference, powered likelihood, Gibbs posteriors, cut posteriors, and computational challenges, as presented by experts including Jeremias Knoblauch and David Rossell

Demystifying LLMs with Ellmer (r-consortium.org). Explore the integration of LLMs with R using the ellmer package and Shiny framework, as demonstrated by Joe Cheng's workshop, transitioning from skepticism to embracing AI's transformative potential in data science

Stacked generalization (Machine Learning model stacking) + conformal prediction for forecasting with ahead::mlf (thierrymoudiki.github.io). This guide demonstrates using ahead::mlf for univariate probabilistic time series forecasting with machine learning, particularly Elastic Net, Stacked Generalization, and Conformal Prediction techniques

Beyond ARMA-GARCH: leveraging any statistical model for volatility forecasting (thierrymoudiki.github.io). A flexible hybrid approach for probabilistic stock forecasting integrates any statistical model with ARCH effects, using tools like Theta, ARIMA, and exponential smoothing for enhanced volatility forecasting accuracy

📚 Academic Research

Bayesian integrative factor analysis methods, with application in nutrition and genomics data (arxiv:stat). Guide to Bayesian integrative factor analysis models for multi-study data integration in nutrition and genomics, including practical implementation with R code

👋 Before you go

I've got a big favor to ask - keeping Blaze running isn't expensive, but it does all add up, so I'm asking readers like you to help, if you can.
That's why I'm launching a Patreon page! Nothing flashy, just a way for folks who find value in these newsletters to chip in a little each month. In return, you'll get:

Real say in how Blaze evolves — vote on new topics, features, topic curation ideas
First dibs on merch (details still cooking)
That warm fuzzy feeling knowing you're supporting something that saves you time and keeps you plugged into great tech writing

If you are getting value from Blaze, checking this out would mean the world. And if you can't contribute, no worries: the newsletters keep coming either way, and you can follow along on Patreon for free.
Thanks for reading and being part of this nerdy corner of the internet. All the best - Alastair.

About Data Scientist (with R)

Our Data Scientist newsletter covers the latest developments, packages, techniques, and insights in R programming and data science. Each week, we curate the most important content from your favourite R blogs so you don't have to spend hours searching.

Whether you're a beginner or expert in data science with R, our newsletter provides valuable information to keep you informed.
