Data Scientist (with R): 17th June 2025
Published 17th June 2025
Community & Events
Air 0.7.0 (tidyverse.org, 2025-06-11). Air 0.7.0 enhances R code formatting with autobracing, improved Positron support, and a GitHub Action for integration, streamlining installation and project configuration with tools like usethis for managing settings
In-Person and Virtual Events for 2025 (rinpharma.com, 2025-06-14). R/Pharma announces 2025 events including the genAI Day for generative AI in pharma, an in-person Summit on open-source tools, and a Virtual Conference with workshops and talks scheduled across various dates
camtrapdp 0.4.0 (oscibio.inbo.be, 2025-06-11). The R package camtrapdp version 0.4.0 enhances Camera Trap Data Package support, introducing new functions for data manipulation and improved Darwin Core processing for better GBIF compatibility
Volunteer Responsibility Amnesty Day (llrs.dev, 2025-06-11). Lluís Revilla Sancho reflects on his involvement in Bioconductor amid the Volunteer Responsibility Amnesty Day initiative, aiming to enhance developer engagement and productive participation in software quality through responsible volunteering
Celebrating 18 Years of ggplot2: A Special Bundle Offer (pacha.dev, 2025-06-15). Celebrate 18 years of ggplot2 with a special bundle offer: two comprehensive books on data visualization in R and Python for just $19.99, including practical exercises and insights from the creator of ggplot2
{alone} v0.6 is now available (gradientdescending.com, 2025-06-14). Alone v0.6 is released, featuring data from Australia season 3 of the show, which saw record-breaking eels and fish caught. Users can access the data via the R package for analysis
R-Adelaide 2025 (combine.org.au, 2025-06-13). RAdelaide 2025 is a 3-day in-person R training course for beginners to intermediates, focusing on biomedical research, held at the University of Adelaide from July 8-10, taught by Dr. Stevie Pederson
R Weekly 2025-W25 Air, Testing Shiny, Databricks (rweekly.org, 2025-06-16). This week's highlights include Air 0.7.0, deploying models in Databricks using Orbital, setting up VS Code for R, and upcoming events in the R community, including R-User and applied stats workshops
Tutorials & Development
Building Your Own Mini-ChatGPT with R: From Markov Chains to Transformers! (blog.ephorie.de, 2025-06-16). Learn to create a mini-ChatGPT in R using Markov chains and transformers, implementing concepts like word embeddings, self-attention, and next word prediction through detailed coding steps and model training techniques
Using app(), lapp(), and tapp() with SpatRaster in R (pmassicotte.com, 2025-06-12). Learn to use app(), lapp(), and tapp() functions in R's terra package for manipulating SpatRaster objects and performing operations like calculating means across raster layers
Integrating Legends into Titles (policyviz.com, 2025-06-13). Learn how to integrate color into chart titles for clarity and accessibility, using color-coded categories and visual cues to enhance data visualization without sacrificing readability
From lab to real life: How your Shiny application can survive its users (rtask.thinkr.fr, 2025-06-12). Implement a three-level testing strategy for Shiny applications using tools like golem, unit tests, and integration tests to ensure stability and functionality through modifications
Statistical Methods & Analysis
ISS Shareholder Proposals (tidy-finance.org, 2025-06-13). The blog discusses methods for analyzing ISS shareholder proposal voting data, detailing R packages like tidyverse and tidyfinance, and providing insights on proposal submissions and classification
More on power and 'fragile' p-values (freerangestats.info, 2025-06-14). Comprehensive simulations reveal how 'fragile' p-values fluctuate with power differences, using R's parallel computing for 1.32 million simulations assessing p-values between 0.01 and 0.05 based on sample size and actual difference
Impact of COVID Lockdowns on Barcelona Air Quality (jmsallan.netlify.app, 2025-06-15). COVID lockdowns in Spain significantly impacted air quality in Barcelona, analyzed using tidyverse visualizations comparing PM10 and NO2 pollutants from 2020 to 2024
From Math to Code: Building GAM with Penalty Functions From Scratch (kenkoonwong.com, 2025-06-11). Explore penalized Generalized Additive Models (GAM) through matrix calculus, GCV optimization, and customized GAM function implementation with an emphasis on penalty matrices and B-spline basis functions
Bayesian Emax regression using brms (blog.djnavarro.net, 2025-06-13). Explore Bayesian Emax regression models for exposure-response analysis using the brms R package, including data simulation, implementation of continuous and binary response variables, and examples of plotting results
Stepwise Regression Explained with Example and Application (statisticalaid.com, 2025-06-13). Stepwise regression is a method for automatic variable selection in regression models, utilizing techniques like forward selection and backward elimination while addressing issues like overfitting and biased estimates
R: Graphical Multiple Testing Procedure (blog.devgenius.io, 2025-06-10). Graphical Multiple Testing Procedure (gMCP) in R aids in controlling Family-Wise Error Rate (FWER) for multiple primary endpoints in clinical trials using Bonferroni-based approaches and intuitive alpha recycling
Academic Papers
ggret: An R package for visualising and manipulating tree-based phylogenetic networks (joss.theoj.org, 2025-06-12). ggret is an R package designed for visualising and manipulating tree-based phylogenetic networks, supporting concepts like ancestral recombination graphs and reticulated evolution for enhanced data plotting and analysis
Yau-YauAL: A computer tool for solving nonlinear filtering problems (arxiv:stat, 2025-06-10). Yau-YauAL is an R-based software package for nonlinear filtering, utilizing finite difference methods to solve the Kolmogorov equation, featuring a Shiny interface for parameter tuning and visualization, aimed at diverse scientific applications
Lower-dimensional posterior density and cluster summaries for overparameterized Bayesian models (arxiv:stat, 2025-06-11). A novel method integrates flexible Bayesian models with lower-dimensional parametric summaries, improving interpretability while preserving fit. It involves fitting models, projecting distributions, and quantifying uncertainty on density and cluster summaries
About Data Scientist (with R)
Our Data Scientist newsletter covers the latest developments, packages, techniques, and insights in R programming and data science. Each week, we curate the most important content from your favourite R blogs so you don't have to spend hours searching.
Whether you're a beginner or expert in data science with R, our newsletter provides valuable information to keep you informed.