Data Scientist (with R): 14th October 2025
Published 14th October 2025
🌐 R Community News
2025-10-10 AI Newsletter (posit.co). Posit newsletter covers Claude Sonnet 4.5, coding models, Databot, R/Python packaging, and AI market dynamics
Weekly recap (Oct 10, 2025) (blog.stephenturner.us). Covers AI and biosecurity, R updates, genome engineering, RAG, Lost Science, AI in medicine, ggplot2, Quarto, and DNA forensics
A Primer on Domain Verification (ropensci.org). Shows how to verify a domain across Mastodon, GitHub, and GitHub Pages, using DNS TXT records and metadata validation to demonstrate cross-site authenticity
2025 Annual Conference (mapor.org). MAPOR's 50th Annual Conference in Chicago features a short course on complex survey data analysis in R and a keynote by the former Census Bureau director
R Weekly 2025-W42 Aaaaand… they’re off!, Generative AI for Data Visualisation, and Behavior-Driven Development (rweekly.org). This week's highlights include generative AI for data visualisation, behavior-driven development in Shiny, and more
📊 R Visualisation
Generative AI for Data Visualisation (nrennie.rbind.io). Tests ChatGPT, Claude, Copilot, and Gemini on visualisation prompts using weather and CEO data, compares their varied results, and offers guidance
Explore #TidyTuesday literary prizes with Positron’s Data Explorer (juliasilge.com). A tour of Positron's Data Explorer and its new features, using the #TidyTuesday dataset on British literary prizes
Halloween in the Round (kieranhealy.org). Aggregates FARS pedestrian fatality data and plots it with ggplot2, using coord_polar/coord_radial and geom_textsegment to build polar and donut-style charts
The World Started Tracking Severe Food Insecurity in 2016 (stevenponce.netlify.app). Visualises FAO food insecurity indicators, tracked since 2016, with ggplot2 and data loaded via tidytuesdayR
🤖 R + AI Apps
Extracting location from text with AI (jla-data.net). Uses Gemini structured output in R (via gemini.R) to summarize texts and geocode locations, prompting the model for the single most important location in each text
My Attempt To Reproduce Stanford HIVdb Sequence and Mutation Analysis From Scratch (kenkoonwong.com). Rebuilds the Stanford HIVdb resistance logic in R with Bioconductor tools (DECIPHER) and BLAST against HIV reference genomes
R port of llama2.c (thierrymoudiki.github.io). An educational R port of llama2.c, with a Shiny app, installation steps, and API access
🧩 Shiny Engineering
foreach: Making All %dopar% Behave Like %dofuture% Everywhere (jottr.org). Overview of doFuture 1.10.0 features: registerDoFuture('%dofuture%'), foreach integration, and improved error handling and RNG consistency
Behavior-Driven Development in R Shiny: A Step-By-Step Example (jakubsobolewski.com). Walks through behavior-driven development for a Shiny app: write specifications in testthat or cucumber, build a driver, implement the app and its storage, and iterate with TDD
Deploy Multiple Shiny Apps from One R Package (jakubsobolewski.com). Keeps multiple Shiny apps in one R package as a monorepo: the apps live in inst/apps, share common functions from R/, and are deployed with a custom rsconnect-based function
🎲 Statistical Inference
If you have two measures of the same confounder, you can just include both of them in your regression model (the100.ci). Two correlated measures of a confounder need not derail a regression: including both reduces bias and improves precision when estimating the effect of X on Y
Excursion 1 Tour I (2nd Stop): Probabilism, Performance, and Probativeness (1.2) (errorstatistics.com). Contrasts probabilism and performance in statistical inference, discussing severity and error probes with examples such as Potti, Bristol-Roach, and the Texas sharpshooter
Regression to the mean (blog.engora.com). Regression to the mean explained with cars, heights, sports, schools, and business examples, plus implications for analysis
How much should we trust medicine? (emilkirkegaard.com). Review of Medical Nihilism by Jacob Stegenga, examining Bayes, meta-analysis, SEU, biases, and pharmaceutical incentives in medicine
📚 Academic Research
Examining the Interface Design of Tidyverse (arxiv:stat). Examines the tidyverse's interface design for data visualisation and wrangling through an HCI lens, and advocates iterative development driven by user feedback
Zero-Inflated Bayesian Multi-Study Infinite Non-Negative Matrix Factorization (arxiv:stat). A zero-inflated Bayesian non-parametric multi-study NMF for identifying dietary patterns across studies and their association with cancer risk
Generating CodeMeta using declarative mapping rules: An open-ended approach using ShExML (arxiv:cs). Uses declarative ShExML mapping rules to generate CodeMeta metadata across crosswalks, validated with SHACL/ShEx, to make research software more FAIR
Automated Gating for Flow Cytometry Data Using a Kernel-Smoothed EM Algorithm (arxiv:stat). Automates gating of phytoplankton flow cytometry data with a kernel-smoothed EM algorithm for time-evolving Gaussian mixtures
👋 Before you go
Blaze newsletters will soon be moving to Substack as the main email delivery service. This is primarily to make managing subscriptions, sending email, and archiving newsletters more streamlined. There will be no change to your newsletters: they will continue to be completely free, and you will be able to subscribe and unsubscribe just as easily as before.
Blaze's sister site https://blognerd.app, a search engine for blogs and posts, has had a major makeover and is a good place to search for smart, independent writing.
Finally, if you get value from your newsletter, please consider supporting me by joining the Patreon page at patreon.com/blazeemail. Becoming a patron helps me cover my costs and keep Blaze going so everyone can enjoy the newsletters for free.
About Data Scientist (with R)
Our Data Scientist newsletter covers the latest developments, packages, techniques, and insights in R programming and data science. Each week, we curate the most important content from your favourite R blogs so you don't have to spend hours searching.
Whether you're a beginner or expert in data science with R, our newsletter provides valuable information to keep you informed.
Subscribe now to join thousands of professionals who receive our weekly updates!