Data Scientist (with R)
Published 19th August 2025
Training, Community & Resources
Blogroll (jrhawley.ca). A collection of personal blogs spanning Bartosz Ciechanowski's interactive explainers, Mandy Brown's essays, Tatsuya Tanaka's daily miniature photography, Canadian history via Active History, and tech voices like Julia Evans, Ethan Marcotte, Erin Kissane, Alex Kladov, and Yoshua Wuyts, plus statistical and genomic insights from Gelman, Navarro, Heng Li, Tao, and the n-Category Café
How We Do Training at Jumping Rivers: Seamless, Expert-Led, and Tailored to You (jumpingrivers.com). Jumping Rivers offers customised, expert-led data science training with admin support and flexible online or onsite delivery, covering SPSS/SAS transitions, R, Python, SQL, Quarto reports, and training audits
Personalized R and Shiny training sessions (pacha.dev). 1:1 R and Shiny training tailored to your datasets, questions, and objectives, in contrast to MOOCs; ideal for data analysts transitioning from spreadsheets and researchers automating tasks; four sessions for 164 CAD, with student discounts; links to a YouTube channel and Buy Me a Coffee
Enabling GenAI Capabilities for Statistical Programmers (rinpharma.com). GenAI for R statistical programmers: GitHub Copilot in RStudio, voxr package for Vox LLM API, text generation, sentiment analysis, debugging, documentation, custom training, responsible use, performance with context, Pfizer
Posit Platform & IDE Tools
R and the Model Context Protocol (simonpcouch.com). mcptools brings MCP to R in both server and client roles, supporting tools like btw for package docs, an MCP server launched with Rscript and mcptools::mcp_server(), and ellmer-based tool registration in Claude and Positron
Closing My Tabs: Aug 15 2025 (blog.stephenturner.us). Weekly tabs recap: Prompts for bioinformatics, Colossal Foundation AI roles, Posit Positron release, OncoGAN cancer genome sims, Nature Genetics AI papers, LLM-powered teaching talks, and AI deskilling in endoscopy
Do Data Science Better: Try Posit for Snowflake's Powerful New Tools (posit.co). Posit integrates with Snowflake to deliver enhanced data science tools, including Positron IDE, AI assistance, open-source packages like dbplyr and orbital, and secure workflows with R and Python
Unlock your Posit Platform's full potential, turning usage data into actionable insights (posit.co). Posit Chronicle, Workbench, Connect, and Package Manager enhancements grant deeper auditing, usage analytics, and CVE-aware publishing for RStudio, Jupyter, and VS Code environments
New Posit platform insights with Chronicle (posit.co). Chronicle provides centralized metrics collection for Posit products like Connect and Workbench, with a Chronicle Metrics Cookbook, data ownership, and late-August releases for dynamic data science insights
Announcing Positron, a new Data Science IDE (posit.co). Positron: a free, multi-language data science IDE from Posit PBC, integrating Python and R workflows, centralized management for RStudio/Jupyter/VS Code, with CRAN/PyPI/Bioconductor snapshots, Shiny hosting, Databricks integration, and Elastic License 2.0
Smarter Science, Better Tools: Discover What's New in Posit for Fall '25 (posit.co). Posit unveils Fall '25 with Posit AI Suite, Data Cloud Accelerators, enterprise governance tools, and partnerships with Snowflake and Databricks for R and Python workflows
Which AI model writes the best R code? (posit.co). Posit compares AI models on R code generation and surveys AI integrations across data science tooling: Positron, RStudio, Jupyter, VS Code, CRAN, PyPI, Bioconductor, Shiny apps, and AI-driven code evaluation
R Package Development & Tools
Little useless-useful R functions – Absurd converter with Markdown report (tomaztsql.wordpress.com). An R function for absurd unit conversions (unit_converter_confuser, the "Confuser" function) with chaos_noise, ggplot2 and gganimate visuals, a t.test, and Markdown report generation; the code is in Absurd_Converter.R in TomazTsql's GitHub repository
Creating and validating standardized R project structures that are psych-DS compliant-ish (mmmdata.io). Ian Hussey introduces psych-DS-ish, an R package for generating and validating standardized, though flexible, project skeletons aligned with psych-DS principles; includes skeleton structure, validation, and potential future alignment with psych-DS
Using Visual Studio Code to Debug R Packages with C++ Code (pacha.dev). Using VSCode to debug cpp11/Rcpp in a cpp11armadillo R package: install, configure, and test in VSCode with LLDB; set include paths, launch.json, Makevars, and testthat
Data Visualization & Reporting
Seven accessibility tips for Quarto and R Markdown users (remlapmot.github.io). Seven accessibility tips for Quarto and R Markdown users: document formats, alt text, tagged PDFs, table accessibility, cross-checkers, custom Word templates, and uploading complex HTML to Blackboard Ultra
How To Rotate x-axis Text Labels in ggplot2 (datavizpyr.com). Rotate x-axis labels in ggplot2 using theme() and element_text() with angle and hjust, explore 45° and 90°, fix alignment, and try guide_axis() and scale_x_discrete() alternatives
The (non-)Ethics of Capitalism (freakonometrics.hypotheses.org). Gallup honesty survey by profession; salaries from BLS OEWS 2023; data wrangling in R: download.file, read.csv, plot percent_high vs avg_salary_usd with text labels; critique of link between ethics and pay
Kyoto (京都), Japan, vs. Montréal, Canada (freakonometrics.hypotheses.org). Compares Kyoto and Montréal using Wikipedia data, visualizes temperature, humidity, daylight variations with R's XML package, tables, rectangles, and custom plotting in a data-driven analysis
Statistical Methods & Inference
Nonparametric serial interval estimation (statsandr.com). Nonparametric serial interval estimation with uniform mixtures using EpiDelays in R: estimSI, simSI, bootstrap CIs, simulated and real data (Lessler 2009), Gaussian target SI, negative values handled, Gressani & Hens 2025
A short statistical reasoning test (emiruz.com). Practical statistical reasoning tasks: Bayesian and likelihood-based p-intervals for binomial p, Poisson and multiplicity-based density for unexpected burglary counts, and buses via multinomial-like density with 95% HDI; R code snippets and simulations
Measures of Central Tendency for an Asymmetric Distribution, and Confidence Intervals (fharrell.com). Measures of central tendency for asymmetric distributions; comparisons of mean, median, pseudomedian; CI accuracy; BCa bootstrap; HD quantile estimator; Hodges-Lehmann; lognormal simulations; R functions pMedian, cimed; confidence interval debate
For Your Syllabus: Statistical Power (carlislerainey.com). Five readings on statistical power for quantitative political science: Arel-Bundock et al. on underpowered studies; Bloom 1995 on minimum detectable effects; Power Rules; Lakens 2022 on sample size justification; Blair et al. 2019 MIDA framework for design diagnostics
Unnatural Selection (learningfromexamples.com). Ronald Fisher, maximum likelihood, Fisher information, randomisation, linear discriminant analysis (LDA), Wright–Fisher model, Bayesian notions, and the social implications of statistics across genetics, AI, and bias auditing
Positive and negative descriptions of numeric data (shape-of-code.com). Examines how positive versus negative descriptions of numeric data shape reporting, exploring quantifier usage in exam results, Bayesian mixed-effects models, and alluvial plots of quantifier flows
Myth busting statistical methods (aliceinstatisticsland.wordpress.com). PhD supervisors across ANU; myths in statistics: methods unchanged since the 1980s, p-value supremacy, embracing ambiguity; bootstrapping, hierarchies, imputation, and ML discussed; workshop inspiration and the Statistical Support Network
Applied Data Analysis
Men's domestic chores and fertility rates - Part I (freerangestats.info). Time-use data, SDG 5.4.1, UN SDDS, TFR, GDP per capita, gender inequality index, mgcv::gam in R, prop_male, marginaleffects package, deviance analysis, and country-level artefacts in fertility modelling
Making Smarter Business Decisions with Propensity Score Analysis (statisticalhorizons.com). Explores propensity score analysis to estimate causal effects in business, using R, tidyverse, MatchIt, and the lalonde dataset to address confounding in observational studies
MLOrbs?: MLOps in the database with orbital and dbt (emilyriederer.com). MLOps in the analytical database using orbital's sklearn-to-sql and tidymodels, sqlglot, and dbt; churn modeling with IBM telecom data; zero-infrastructure deployment inside dbt pipelines
What Launch Angle Revolution? The Washington Nationals in 2025 (conormclaughlin.net). The Nationals' ground-ball rate spikes as sub-10° launch angles eclipse air-ball metrics; FanGraphs data via baseballr in R, a GB% calculation, and an LA vs. GB scatter plot highlight the coaching impact
🧬 Bioinformatics & Computational Biology
Learning Antimicrobial Resistance (AMR) genes with Bioconductor (kenkoonwong.com). AMR gene exploration with Bioconductor on 3,280 E. coli genomes; ESBL detection (CTX-M-15), gene nomenclature, sequence analysis; Biostrings readDNAStringSet; plasmids vs chromosome; class A beta-lactamase sequences; reproducible R Markdown
Reproducible Computing in Bioinformatics: Lessons from My Latest Talk (divingintogeneticsandgenomics.com). Reproducible computing in bioinformatics: versioned data, Git, conda/renv, Docker, Jupyter/Quarto, Snakemake/Nextflow, targets, marimo, Rocker, BioContainers, and practical tips from AstraZeneca bioinformatics leadership
Inferring habitat clusters with eBird data (ctompkins.netlify.app). Unsupervised habitat clustering of bird abundance data in Pennsylvania from eBird ST pro patterns, using R packages like tidytree, factoextra, and clValid
Academic Research
BKP: An R Package for Beta Kernel Process Modeling (arxiv:stat). BKP: an R package implementing Beta Kernel Process for nonparametric modeling of spatially varying binomial probabilities with conjugate beta priors, kernel options, aggregated responses, and DKP extension
Modelling Skewed and Heavy-Tailed Errors in Bayesian Mediation Analysis (arxiv:stat). Introduces Centred Two-Piece Student t Distribution (CTPT) for Bayesian mediation with skewed, heavy-tailed errors; evaluates via simulations; provides R package FlexBayesMed
Before you go
I've got a big favor to ask - keeping Blaze running isn't expensive, but it does all add up, so I'm asking readers like you to help, if you can.
That's why I'm launching a Patreon page! Nothing flashy, just a way for folks who find value in these newsletters to chip in a little each month. In return, you'll get:
- Real say in how Blaze evolves: vote on new topics, features, and topic curation ideas
- First dibs on merch (details still cooking)
- That warm fuzzy feeling knowing you're supporting something that saves you time and keeps you plugged into great tech writing
If you are getting value from Blaze, checking this out would mean the world. And if you can't contribute, no worries: the newsletters keep coming either way, and you can follow along on Patreon for free.
Thanks for reading and being part of this nerdy corner of the internet. All the best - Alastair.
About Data Scientist (with R)
Our Data Scientist newsletter covers the latest developments, packages, techniques, and insights in R programming and data science. Each week, we curate the most important content from your favourite R blogs so you don't have to spend hours searching.
Whether you're a beginner or expert in data science with R, our newsletter provides valuable information to keep you informed.
Subscribe now to join thousands of professionals who receive our weekly updates!