Data Scientist (with R)

Published 19th August 2025

🎓 Training, Community & Resources

Blogroll (jrhawley​.ca). A collection of personal blogs spanning Bartosz Ciechanowski’s interactive explainers, Mandy Brown’s essays, Tatsuya Tanaka’s daily miniature photography, Canadian history via Active History, and tech voices like Julia Evans, Ethan Marcotte, Erin Kissane, Alex Kladov, Yoshua Wuyts, plus statistical and genomic insights from Gelman, Navarro, Heng Li, Tao, and n-Category cafe

How We Do Training at Jumping Rivers: Seamless, Expert-Led, and Tailored to You (jumpingrivers​.com). Jumping Rivers offers customised, expert-led data science training with admin-supported, flexible online/onsite delivery; from SPSS/SAS transitions to R, Python, SQL, Quarto reports, and training audits

Personalized R and Shiny training sessions (pacha​.dev). 1:1 personalized R and Shiny training tailored to your datasets, questions, and objectives; contrasts with MOOCs; ideal for data analysts transitioning from spreadsheets and researchers automating tasks; includes four sessions for 164 CAD; discounts for students; links to YouTube channel and Buy Me a Coffee

Enabling GenAI Capabilities for Statistical Programmers (rinpharma​.com). GenAI for R statistical programmers: GitHub Copilot in RStudio, voxr package for Vox LLM API, text generation, sentiment analysis, debugging, documentation, custom training, responsible use, performance with context, Pfizer

🔧 Posit Platform & IDE Tools

R and the Model Context Protocol (simonpcouch.com). mcptools brings MCP to R in both server and client roles, supporting tools like btw for package docs, Rscript mcptools::mcp_server(), and ellmer-based tool registration in Claude/Positron

Closing My Tabs: Aug 15 2025 (blog​.stephenturner​.us). Weekly tabs recap: Prompts for bioinformatics, Colossal Foundation AI roles, Posit Positron release, OncoGAN cancer genome sims, Nature Genetics AI papers, LLM-powered teaching talks, and AI deskilling in endoscopy

Do Data Science Better: Try Posit for Snowflake’s Powerful New Tools (posit​.co). Posit integrates with Snowflake to deliver enhanced data science tools, including Positron IDE, AI assistance, open-source packages like dbplyr and orbital, and secure workflows with R and Python

Unlock your Posit Platform’s full potential, turning usage data into actionable insights (posit​.co). Posit Chronicle, Workbench, Connect, and Package Manager enhancements grant deeper auditing, usage analytics, and CVE-aware publishing for RStudio, Jupyter, and VS Code environments

New Posit platform insights with Chronicle (posit​.co). Chronicle provides centralized metrics collection for Posit products like Connect and Workbench, with a Chronicle Metrics Cookbook, data ownership, and late-August releases for dynamic data science insights

Announcing Positron, a new Data Science IDE (posit​.co). Positron: a free, multi-language data science IDE from Posit PBC, integrating Python and R workflows, centralized management for RStudio/Jupyter/VS Code, with CRAN/PyPI/Bioconductor snapshots, Shiny hosting, Databricks integration, and Elastic License 2.0

Smarter Science, Better Tools: Discover What’s New in Posit for Fall ’25 (posit​.co). Posit unveils Fall ’25 with Posit AI Suite, Data Cloud Accelerators, enterprise governance tools, and partnerships with Snowflake and Databricks for R and Python workflows

Which AI model writes the best R code? (posit.co). Posit explores AI integrations for data science tooling, Positron, RStudio, Jupyter, VS Code, CRAN, PyPI, Bioconductor, Shiny apps, and AI-driven code evaluation

📩 R Package Development & Tools

Little useless-useful R functions – Absurd converter with Markdown report (tomaztsql​.wordpress​.com). R, Markdown reports, absurd unit conversions, chaos_noise, ggplot2, gganimate, t.test, Markdown report generation, Confuser function, unit_converter_confuser, repository hint, Absurd_Converter.R, GitHub, TomazTsql

Creating and validating standardized R project structures that are psych-DS compliant-ish (mmmdata​.io). Ian Hussey introduces psych-DS-ish, an R package for generating and validating standardized, though flexible, project skeletons aligned with psych-DS principles; includes skeleton structure, validation, and potential future alignment with psych-DS

Using Visual Studio Code to Debug R Packages with C++ Code (pacha​.dev). Using VSCode to debug cpp11/Rcpp in a cpp11armadillo R package: install, configure, and test in VSCode with LLDB; set include paths, launch.json, Makevars, and testthat

📈 Data Visualization & Reporting

Seven accessibility tips for Quarto and R Markdown users (remlapmot.github.io). Covers document formats, alt text, tagged PDFs, table accessibility, cross-checkers, custom Word templates, and uploading complex HTML to Blackboard Ultra

How To Rotate x-axis Text Labels in ggplot2 (datavizpyr​.com). Rotate x-axis labels in ggplot2 using theme() and element_text() with angle and hjust, explore 45° and 90°, fix alignment, and try guide_axis() and scale_x_discrete() alternatives

The (non-)Ethics of Capitalism (freakonometrics​.hypotheses​.org). Gallup honesty survey by profession; salaries from BLS OEWS 2023; data wrangling in R: download.file, read.csv, plot percent_high vs avg_salary_usd with text labels; critique of link between ethics and pay

Kyoto (äșŹéƒœ), Japan, vs. MontrĂ©al, Canada (freakonometrics.hypotheses.org). Compares Kyoto and MontrĂ©al using Wikipedia data, visualizes temperature, humidity, daylight variations with R's XML package, tables, rectangles, and custom plotting in a data-driven analysis

📊 Statistical Methods & Inference

Nonparametric serial interval estimation (statsandr​.com). Nonparametric serial interval estimation with uniform mixtures using EpiDelays in R: estimSI, simSI, bootstrap CIs, simulated and real data (Lessler 2009), Gaussian target SI, negative values handled, Gressani & Hens 2025

A short statistical reasoning test (emiruz​.com). Practical statistical reasoning tasks: Bayesian and likelihood-based p-intervals for binomial p, Poisson and multiplicity-based density for unexpected burglary counts, and buses via multinomial-like density with 95% HDI; R code snippets and simulations

Measures of Central Tendency for an Asymmetric Distribution, and Confidence Intervals (fharrell.com). Comparisons of mean, median, pseudomedian; CI accuracy; BCa bootstrap; HD quantile estimator; Hodges-Lehmann; lognormal simulations; R functions pMedian, cimed; confidence interval debate

For Your Syllabus: Statistical Power (carlislerainey​.com). Five readings on statistical power for quantitative political science: Arel-Bundock et al. on underpowered studies; Bloom 1995 on minimum detectable effects; Power Rules; Lakens 2022 on sample size justification; Blair et al. 2019 MIDA framework for design diagnostics

Unnatural Selection (learningfromexamples​.com). Ronald Fisher, maximum likelihood, Fisher information, randomisation, linear discriminant analysis (LDA), Wright–Fisher model, Bayesian notions, and the social implications of statistics across genetics, AI, and bias auditing

Positive and negative descriptions of numeric data (shape-of-code​.com). Examines how positive versus negative descriptions of numeric data shape reporting, exploring quantifier usage in exam results, Bayesian mixed-effects models, and alluvial plots of quantifier flows

Myth busting statistical methods (aliceinstatisticsland​.wordpress​.com). PhD supervisors across ANU; myths in statistics: unchanged methods since 1980s, p-value supremacy, embracing ambiguity; bootstrapping, hierarchies, imputation, ML discussed; workshop inspiration and Statistical Support Network

🔬 Applied Data Analysis

Men's domestic chores and fertility rates - Part I (freerangestats​.info). Time-use data, SDG 5.4.1, UN SDDS, TFR, GDP per capita, gender inequality index, mgcv::gam in R, prop_male, marginaleffects package, deviance analysis, and country-level artefacts in fertility modelling

Making Smarter Business Decisions with Propensity Score Analysis (statisticalhorizons​.com). Explores propensity score analysis to estimate causal effects in business, using R, tidyverse, MatchIt, and the lalonde dataset to address confounding in observational studies

MLOrbs?: MLOps in the database with orbital and dbt (emilyriederer​.com). MLOps in the analytical database using orbital's sklearn-to-sql and tidymodels, sqlglot, and dbt; churn modeling with IBM telecom data; zero-infrastructure deployment inside dbt pipelines

What Launch Angle Revolution? The Washington Nationals in 2025 (conormclaughlin​.net). Ground-ball rate spikes as Nationals eclipse air metrics with sub-10° launch angles; FanGraphs data, baseballr in R, GB% calculation, LA vs GB scatter plot highlight coaching impact

🧬 Bioinformatics & Computational Biology

Learning Antimicrobial Resistance (AMR) genes with Bioconductor (kenkoonwong​.com). AMR gene exploration with Bioconductor on 3,280 E. coli genomes; ESBL detection (CTX-M-15), gene nomenclature, sequence analysis; Biostrings readDNAStringSet; plasmids vs chromosome; class A beta-lactamase sequences; reproducible R Markdown

Reproducible Computing in Bioinformatics: Lessons from My Latest Talk (divingintogeneticsandgenomics.com). Versioned data, Git, conda/renv, Docker, Jupyter/Quarto, Snakemake/Nextflow, targets, marimo, Rocker, BioContainers, and practical tips from AstraZeneca bioinformatics leadership

Inferring habitat clusters with eBird data (ctompkins​.netlify​.app). Using eBird ST pro patterns, habitat clustering employs unsupervised learning with R packages like tidytree, factoextra, and clValid to analyze bird abundance data in Pennsylvania

📚 Academic Research

BKP: An R Package for Beta Kernel Process Modeling (arxiv:stat). BKP: an R package implementing Beta Kernel Process for nonparametric modeling of spatially varying binomial probabilities with conjugate beta priors, kernel options, aggregated responses, and DKP extension

Modelling Skewed and Heavy-Tailed Errors in Bayesian Mediation Analysis (arxiv:stat). Introduces Centred Two-Piece Student t Distribution (CTPT) for Bayesian mediation with skewed, heavy-tailed errors; evaluates via simulations; provides R package FlexBayesMed

👋 Before you go

I've got a big favor to ask - keeping Blaze running isn't expensive, but it does all add up, so I'm asking readers like you to help, if you can.
That's why I'm launching a Patreon page! Nothing flashy, just a way for folks who find value in these newsletters to chip in a little each month. In return, you'll get:

  • Real say in how Blaze evolves — vote on new topics, features, topic curation ideas
  • First dibs on merch (details still cooking)
  • That warm fuzzy feeling knowing you're supporting something that saves you time and keeps you plugged into great tech writing

If you are getting value from Blaze, checking this out would mean the world. And if you can't contribute, no worries: the newsletters keep coming either way, and you can follow along on Patreon for free.
Thanks for reading and being part of this nerdy corner of the internet. All the best - Alastair.

About Data Scientist (with R)

Our Data Scientist newsletter covers the latest developments, packages, techniques, and insights in R programming and data science. Each week, we curate the most important content from your favourite R blogs so you don't have to spend hours searching.

Whether you're a beginner or expert in data science with R, our newsletter provides valuable information to keep you informed.
