Data Scientist (with R): 29th July 2025

Published 29th July 2025

🎉 Community & Package Updates

vcr v2 (recology.info). Recology announces vcr v2, which improves usability and security and adds features for R users, including local scoping tools and better debugging capabilities

Easily download files from the Open Science Framework with Papercheck (daniellakens​.blogspot​.com). Papercheck's new function allows easy downloading of files from the Open Science Framework, maintaining folder structure and facilitating access for researchers and reviewers

Diversity Alliance Hackathon (rinpharma.com). Join the R/Pharma Diversity Alliance Hackathon on September 30, where underrepresented individuals can work on open source projects with mentorship opportunities

rOpenSci News Digest, July 2025 (ropensci​.org). New rOpenSci Champions cohort supports open science in Latin America; highlights from useR! 2025; package updates and peer review submissions featured

ellmer 0.3.0 (tidyverse.org). ellmer 0.3.0, now on CRAN, updates the R package for large language models with a simplified chat interface, improved tool specifications, and greater reliability

Version 3.0 of neonUtilities R package released (neonscience.org). Version 3.0.0 of the neonUtilities R package is a major update with new and modified functions for improved performance, now available on CRAN

Shiny in Production 2025: R Dev Day (jumpingrivers​.com). Join R Dev Day at Shiny in Production 2025 for hands-on contributions to base R, including Litmus, with no cost to attend

🛠️ Quarto & Development Tools

Building a Multi-Notebook Report with Quarto (safjan​.com). Using Quarto to create multi-notebook reports, integrating Jupyter notebooks for organized analysis and polished outputs in HTML, PDF, or EPUB formats

How Quarto embed fixes data science storytelling (emilyriederer​.com). Quarto's embed feature enhances data science storytelling by bridging reproducible research and effective communication, offering new ways to structure and present analysis

quarto R package v1.5.0: Streamlined Workflows for R Users (quarto​.org). Quarto R package 1.5.0 enhances workflows with R values in metadata, Markdown in tables, theme customization, and improved YAML compatibility

How to open a folder as a Positron project with macOS Quick Actions (andrewheiss.com). Add an 'Open in Positron' Quick Action to macOS Finder, using a custom workflow, to open project folders in Positron, a competitor to RStudio

A quick tour of Positron (posit​.co). Explore Positron, a next-gen data science IDE for R and Python, featuring dynamic data insights, centralized management, and integrated AI assistance

📊 Data Visualization & Analysis

Modelbased for Quick and Beautiful Model Visualization (imachordata​.com). Explore quick and effective model visualization with the modelbased package, focusing on multivariate models using the Palmer Penguins dataset

What Makes the Difference in a Stacked Bar Chart? (policyviz​.com). Fabio Murgia introduces AlphaR, a benchmark-based approach for improving stacked bar charts, enhancing clarity in visualizing data relationships and differences

An Enigma: Transmission Of Epidemic Influenza (part 14) (jdee​.substack​.com). Exploration of influenza mortality data for females (1920-2022) using ARIMA techniques, revealing unusual age-specific trends and potential data anomalies

Tidyverse with GitHub Copilot for Healthcare Analytics – Part 2 (rworks​.dev). Analyzing diabetes patient data using Tidyverse and GitHub Copilot, focusing on care effectiveness, medication prescriptions, and hospital readmission outcomes

Reading and Tidying Barcelona Air Quality Data (jmsallan.netlify.app). Retrieve and tidy Barcelona air quality data in R using the CKAN API, tidyverse, and janitor for data analysis and visualization

Small area estimation of age-specific and total fertility rates in Bangladesh (aliceinstatisticsland​.wordpress​.com). Dr. Unnati Saha presents small area estimation of fertility rates in Bangladesh using DHS data, highlighting regional disparities and statistical modeling techniques

📈 Statistical Methods & Theory

What’s in a correlation? (the100​.ci). Explores the intricacies of Pearson correlation coefficients, causes behind variations, and pitfalls of comparing correlations in psychological research

In Survey Calibration, Raking Minimizes a Statistical Distance Subject to an External Constraint (medium.com/@baogorek). Raking adjusts survey weights to match known population totals via iterative proportional fitting, minimizing a statistical distance subject to those external constraints

Bootstrap Confidence Limits for Bootstrap Overfitting-Corrected Model Performance (fharrell.com). The Efron-Gong optimism bootstrap yields overfitting-corrected estimates of model performance; the post addresses computing confidence intervals for those estimates in binary logistic regression

How much uncertainty is too much uncertainty? (kucharski​.substack​.com). Explores the balance of risk and uncertainty using concepts like aleatoric and epistemic uncertainty, statistical significance, and the Kelly Criterion

The difference between models, drive-time vs fatality edition (andrewpwheeler​.com). Andrew Wheeler critiques statistical significance in models comparing drive time and gunshot fatality, highlighting methodological implications and previous research results

e-values in Chennai (xianblog​.wordpress​.com). Insights from the BIRS-CMI workshop on e-values in Chennai, exploring Bayesian perspectives, likelihood ratios, empirical Bayes methods, and connections with p-values

My Road to Bayesian Stats (aneeshsathe​.com). Exploration of Bayesian Statistics, small sample sizes, significance testing, and data generating processes using DAGs in biological experiments

📚 Academic Research

A Bayesian Geoadditive Model for Spatial Disaggregation (arxiv:stat). Bayesian spatial disaggregation model using penalized splines and low-rank kriging for count data; efficient estimation and application in disease rate mapping

Modelling longitudinal polytomous animal data using Bayesian hierarchical models (arxiv:stat). Bayesian hierarchical models for longitudinal categorical data analysis; MCMC methods applied to animal behaviour in agrarian science enhance insights into welfare measures

Convergence Rate of Efficient MCMC with Ancillarity-Sufficiency Interweaving Strategy for Panel Data Models (arxiv:stat). Analyzing the efficiency of Markov chain Monte Carlo using ancillarity-sufficiency interweaving strategy in Bayesian hierarchical panel data models for improved convergence

👋 Before you go

I've got a big favor to ask. Keeping Blaze running isn't expensive, but the costs do add up, so I'm asking readers like you to help if you can.
That's why I'm launching a Patreon page! Nothing flashy, just a way for folks who find value in these newsletters to chip in a little each month. In return, you'll get:

  • Real say in how Blaze evolves: vote on new topics, features, and curation ideas
  • First dibs on merch (details still cooking)
  • That warm fuzzy feeling knowing you're supporting something that saves you time and keeps you plugged into great tech writing

If you are getting value from Blaze, checking this out would mean the world. And if you can't contribute, no worries: the newsletters keep coming either way, and you can follow along on Patreon for free.
Thanks for reading and being part of this nerdy corner of the internet. All the best - Alastair.

About Data Scientist (with R)

Our Data Scientist newsletter covers the latest developments, packages, techniques, and insights in R programming and data science. Each week, we curate the most important content from your favourite R blogs so you don't have to spend hours searching.

Whether you're a beginner or expert in data science with R, our newsletter provides valuable information to keep you informed.