Data Scientist (with R): 29th July 2025
Published 29th July 2025
🎉 Community & Package Updates
vcr v2 (recology.info). Recology announces vcr v2, which improves usability and security and adds features for R users, including local scoping tools and improved debugging capabilities
Easily download files from the Open Science Framework with Papercheck (daniellakens.blogspot.com). Papercheck's new function allows easy downloading of files from the Open Science Framework, maintaining folder structure and facilitating access for researchers and reviewers
Diversity Alliance Hackathon (rinpharma.com). Join the R/Pharma Diversity Alliance Hackathon on September 30, aimed at underrepresented individuals to engage in open source projects with mentorship opportunities
rOpenSci News Digest, July 2025 (ropensci.org). New rOpenSci Champions cohort supports open science in Latin America; highlights from useR! 2025; package updates and peer review submissions featured
ellmer 0.3.0 (tidyverse.org). ellmer 0.3.0, now on CRAN, updates the R package for large language models with a simplified chat interface, improved tool specifications, and greater reliability
Version 3.0 of neonUtilities R package released (neonscience.org). Version 3.0.0 of the neonUtilities R package is a major update with new and modified functions, available on CRAN
Shiny in Production 2025: R Dev Day (jumpingrivers.com). Join R Dev Day at Shiny in Production 2025 for hands-on contributions to base R, including Litmus, with no cost to attend
🛠️ Quarto & Development Tools
Building a Multi-Notebook Report with Quarto (safjan.com). Using Quarto to create multi-notebook reports, integrating Jupyter notebooks for organized analysis and polished outputs in HTML, PDF, or EPUB formats
How Quarto embed fixes data science storytelling (emilyriederer.com). Quarto's embed feature enhances data science storytelling by bridging reproducible research and effective communication, offering new ways to structure and present analysis
quarto R package v1.5.0: Streamlined Workflows for R Users (quarto.org). Quarto R package 1.5.0 enhances workflows with R values in metadata, Markdown in tables, theme customization, and improved YAML compatibility
How to open a folder as a Positron project with macOS Quick Actions (andrewheiss.com). Integrate 'Open in Positron' Quick Action in macOS Finder for efficient project management in Positron, a competitor to RStudio, using custom workflows
A quick tour of Positron (posit.co). Explore Positron, a next-gen data science IDE for R and Python, featuring dynamic data insights, centralized management, and integrated AI assistance
📊 Data Visualization & Analysis
Modelbased for Quick and Beautiful Model Visualization (imachordata.com). Explore quick and effective model visualization with the modelbased package, focusing on multivariate models using the Palmer Penguins dataset
What Makes the Difference in a Stacked Bar Chart? (policyviz.com). Fabio Murgia introduces AlphaR, a benchmark-based approach for improving stacked bar charts, enhancing clarity in visualizing data relationships and differences
An Enigma: Transmission Of Epidemic Influenza (part 14) (jdee.substack.com). Exploration of influenza mortality data for females (1920-2022) using ARIMA techniques, revealing unusual age-specific trends and potential data anomalies
Tidyverse with GitHub Copilot for Healthcare Analytics – Part 2 (rworks.dev). Analyzing diabetes patient data using Tidyverse and GitHub Copilot, focusing on care effectiveness, medication prescriptions, and hospital readmission outcomes
Reading and Tidying Barcelona Air Quality Data (jmsallan.netlify.app). Retrieve and tidy Barcelona air quality data using R with CKAN API, tidyverse, and janitor for data analysis and visualization
Small area estimation of age-specific and total fertility rates in Bangladesh (aliceinstatisticsland.wordpress.com). Dr. Unnati Saha presents small area estimation of fertility rates in Bangladesh using DHS data, highlighting regional disparities and statistical modeling techniques
📈 Statistical Methods & Theory
What’s in a correlation? (the100.ci). Explores the intricacies of Pearson correlation coefficients, causes behind variations, and pitfalls of comparing correlations in psychological research
In Survey Calibration, Raking Minimizes a Statistical Distance Subject to an External Constraint (medium.com/@baogorek). Raking in survey statistics adjusts weights to match population totals, minimizing statistical distance while adhering to estimation constraints, utilizing iterative proportional fitting
Bootstrap Confidence Limits for Bootstrap Overfitting-Corrected Model Performance (fharrell.com). The Efron-Gong optimism bootstrap corrects model performance estimates for overfitting; the post addresses computing confidence intervals for the corrected estimates in binary logistic regression
How much uncertainty is too much uncertainty? (kucharski.substack.com). Explores the balance of risk and uncertainty using concepts like aleatoric and epistemic uncertainty, statistical significance, and the Kelly Criterion
The difference between models, drive-time vs fatality edition (andrewpwheeler.com). Andrew Wheeler critiques statistical significance in models comparing drive time and gunshot fatality, highlighting methodological implications and previous research results
e-values in Chennai (xianblog.wordpress.com). Insights from the BIRS-CMI workshop on e-values in Chennai, exploring Bayesian perspectives, likelihood ratios, empirical Bayes methods, and connections with p-values
My Road to Bayesian Stats (aneeshsathe.com). Exploration of Bayesian Statistics, small sample sizes, significance testing, and data generating processes using DAGs in biological experiments
📚 Academic Research
A Bayesian Geoadditive Model for Spatial Disaggregation (arxiv:stat). Bayesian spatial disaggregation model using penalized splines and low-rank kriging for count data; efficient estimation and application in disease rate mapping
Modelling longitudinal polytomous animal data using Bayesian hierarchical models (arxiv:stat). Bayesian hierarchical models for longitudinal categorical data analysis; MCMC methods applied to animal behaviour in agrarian science enhance insights into welfare measures
Convergence Rate of Efficient MCMC with Ancillarity-Sufficiency Interweaving Strategy for Panel Data Models (arxiv:stat). Analyzing the efficiency of Markov chain Monte Carlo using ancillarity-sufficiency interweaving strategy in Bayesian hierarchical panel data models for improved convergence
👋 Before you go
I've got a big favor to ask - keeping Blaze running isn't expensive, but it does all add up, so I'm asking readers like you to help, if you can.
That's why I'm launching a Patreon page! Nothing flashy, just a way for folks who find value in these newsletters to chip in a little each month. In return, you'll get:
- Real say in how Blaze evolves — vote on new topics, features, topic curation ideas
- First dibs on merch (details still cooking)
- That warm fuzzy feeling knowing you're supporting something that saves you time and keeps you plugged into great tech writing
If you are getting value from Blaze, checking this out would mean the world. And if you can't contribute, no worries: the newsletters keep coming either way, and you can follow along on Patreon for free.
Thanks for reading and being part of this nerdy corner of the internet. All the best - Alastair.
About Data Scientist (with R)
Our Data Scientist newsletter covers the latest developments, packages, techniques, and insights in R programming and data science. Each week, we curate the most important content from your favourite R blogs so you don't have to spend hours searching.
Whether you're a beginner or expert in data science with R, our newsletter provides valuable information to keep you informed.
Subscribe now to join thousands of professionals who receive our weekly updates!