Data Scientist (with R): 22nd April 2025
🚀 R Releases & Announcements
cleanepi v1.1.0 (epiverse-trace.github.io, 2025-04-21). cleanepi v1.1.0 introduces bug fixes and performance improvements, enhancing functions like print_report(), standardize_dates(), and remove_constants(), while adopting tidyverse pipes and adding multi-language support
R 4.5.0 and Bioconductor 3.21 (blog.stephenturner.us, 2025-04-17). R 4.5.0 and Bioconductor 3.21 introduce faster package installations, built-in Palmer penguins data, new functions like use() for function importing, and a range of new bioinformatics tools across various omics fields
R Weekly 2025-W16 R 4.5.0, recipes, ShinyConf (rweekly.org, 2025-04-21). This issue covers the R 4.5.0 release, the recipes package for data preprocessing, highlights from ShinyConf 2025, and the security risks of AI-generated code, along with new and updated R packages
Version 1.2 of package gDefrag (geekcologist.wordpress.com, 2025-04-17). Version 1.2 of the gDefrag R package has been released, addressing a bug in the edge.creation function. Users are encouraged to update to the latest version available on GitHub
📢 Community & Events
Johnson & Johnson x Posit Live Event March 2025 Q&A (posit.co, 2025-04-17). Johnson & Johnson shares insights on their adoption of open-source tools like R, Posit Workbench, and containerized environments for clinical trials, focusing on infrastructure, compliance, training, and collaboration with statistical programming teams
Announcing the Jumping Rivers Dashboard Gallery (jumpingrivers.com, 2025-04-15). Jumping Rivers announces a new dashboard gallery showcasing applications developed using Shiny, Dash, Streamlit, and Observable, highlighting their expertise in data visualisation and custom data-widgets
Clinica de Aplicación para el Programa de Campeon(e|a)s de rOpenSci (ropensci.org, 2025-04-15). Join the rOpenSci Application Clinic (Clínica de Aplicación) on April 15, 2025, with Yanina Bellini Saibene to work on applications for the Champions Program. You can drop in at any time during the 2-hour online session
🛠️ R Tutorials & Tips
Running R on Windows on ARM on GitHub Actions (remlapmot.github.io, 2025-04-16). Learn how to set up and run the AARCH64 version of R on Windows on ARM using GitHub Actions, including installation of R 4.5.0 and RTools45 for workflows
FAQ débutants R (delladata.fr, 2025-04-21). Answers to common questions from R beginners, covering running code in RStudio, assignment operators, importing files, handling errors, and managing packages
Exploring RSQLite With DBI: A Note To Myself (kenkoonwong.com, 2025-04-18). Learn how to use RSQLite with DBI to manage databases in R, handling data storage, queries, and table manipulation without complex server setups
How to Use data.table to Fill NA with the Previous Value in R (marsja.se, 2025-04-15). Learn how to efficiently use the data.table package in R to fill NA values with the previous value using last observation carried forward (LOCF) and compare its performance against dplyr
The birthday problem in R (erikgahner.dk, 2025-04-21). Explore the birthday problem using R functions qbirthday() and pbirthday() to calculate the required number of people for shared birthdays and probabilities of coincidences in a class setting
Using Service Account identities to query external data sources in published content on Posit Connect (posit.co, 2025-04-15). Utilize Service Account identities for querying external data sources in Posit Connect, enhancing data transformation and analysis efficiency in RStudio, Jupyter, and VS Code environments
📊 Applied Analysis & Visualization
All models are wrong, but some are useless (datamares.netlify.app, 2025-04-19). Using Pearson correlation for time series analysis may lead to erroneous interpretations. Instead, techniques like cross-correlogram analysis are recommended. Significance testing is crucial for valid results in statistical modeling
Development & Analysis Of A UK Storm Indicator (part 10) (jdeeclimate.substack.com, 2025-04-18). A GLM analysis of wind speed data from 26 Irish weather stations reveals strong correlations with UK named storm counts, using independent variables such as the 95th-percentile windy-day count and the mean maximum gust speed
Learning data viz from the best: the Financial Times (danielroelfs.com, 2025-04-16). Daniel Roelfs explores data visualization techniques used by the Financial Times, focusing on John Burn-Murdoch's work during the COVID-19 pandemic and the use of R, ggplot2, and data smoothing methods
Bayesian Superiority Estimation with R2D2 Priors: A Practical Guide for Protein Screening (dspn.substack.com, 2025-04-16). Explore Bayesian techniques like R2D2 priors and superiority calculations to improve protein screening analysis, offering intuitive measures over traditional statistical methods like p-values
K-Means Clustering Analysis of Apple, Microsoft, and Nvidia (datageeek.com, 2025-04-15). A K-Means clustering analysis reveals Apple's distinct performance relative to Microsoft and Nvidia, especially after the April 2 tariffs, using R packages such as 'tidyverse' and 'timetk' for data visualization
🎓 Academic & Scholarly R
Mathematical Genealogy (dustysturner.com, 2025-04-17). Dusty Turner explores his academic lineage via the Mathematics Genealogy Project, employing R libraries like tidyverse and igraph, and using ChatGPT to help create a genealogy visualization script
Projecting a continuous character onto the branches of multiple trees using a consistent color gradient with contMap (blog.phytools.org, 2025-04-15). Utilizing the phytools package, the contMap function projects continuous traits onto multiple phylogenetic trees with a consistent color gradient for better visual representation of species characteristics like body mass
eva3dm: A R-package for model evaluation of 3D weather and air quality models (joss.theoj.org, 2025-04-17). eva3dm is an R-package designed for evaluating 3D weather and air quality models, specifically WRF and WRF-Chem, facilitating improved model assessment and comparisons
Simulation-based inference for stochastic nonlinear mixed-effects models with applications in systems biology (arxiv:stat, 2025-04-15). A scalable Bayesian inference framework for hierarchical mixed-effects models is proposed, using amortized approximations and mixtures of experts, effectively handling stochastic differential equations in systems biology with competitive accuracy and speed
Generalized probabilistic canonical correlation analysis for multi-modal data integration with full or partial observations (arxiv:stat, 2025-04-15). GPCCA integrates multi-modal data using a probabilistic approach while managing missing values, enhancing clustering and dimensionality reduction across cancer genomics and multi-view images, supported by an accessible R package
About Data Scientist (with R)
Our Data Scientist newsletter covers the latest developments, packages, techniques, and insights in R programming and data science. Each week, we curate the most important content from your favourite R blogs so you don't have to spend hours searching.
Whether you're a beginner or expert in data science with R, our newsletter provides valuable information to keep you informed.
Subscribe now to join thousands of professionals who receive our weekly updates!