Data Scientist (with R): 6th May 2025

Published 6th May 2025

📰 Community News & Roundups

30 Day Chart Challenge 2025 (nrennie.rbind.io, 2025-05-01). Nicola Rennie reflects on taking part in the 30 Day Chart Challenge 2025, using R, Python, Observable, and D3 to visualize data on income inequality and other topics across 30 daily prompts.

I’m Speaking at a Professional Conference! (for 5 minutes) (NextChapter.machlis.com, 2025-05-04). Sharon Machlis reflects on being accepted to give a 5-minute lightning talk at posit::conf(2025), discussing her retirement journey, her return to the R programming community, and the value of personal growth over external validation.

R Weekly 2025-W19, Top 40 New CRAN Packages, recipes, 30 Day Chart Challenge, Rotation with Modulo (rweekly.org, 2025-05-06). R Weekly 2025-W19 covers the top 40 new CRAN packages, including tools for protein design and data visualization. Highlights also include recipes, the 30 Day Chart Challenge, and insights from R in academia and organizations.

🛠️ R Packages & Tool Announcements

March 2025 Top 40 New CRAN Packages (rworks.dev, 2025-04-30). The top 40 new CRAN packages from March 2025 span fields including climate modeling, epidemiology, machine learning, and time series analysis, with methods such as Bayesian structures and particle swarm optimization.

New R/exams Version: exams2forms, Written NOPS Exams, and More (R-exams.org, 2025-05-05). New versions of R/exams and exams2forms enhance interactive web exercises and multiple-choice exam processing, and add features such as obfuscation, interactive quality control, and extended language support for testing and educational resources.

Building local packages for WebR (josiahparry.com, 2025-04-30). WebR enables R programming in WebAssembly, with extendr support. A Docker-based method simplifies local package building for WebR using the rwasm::build() function.

📊 Data Visualization & Case Studies

Quick and dirty analysis of 10mila 2025 results (markuskainu.fi, 2025-05-04). Markus Kainu analyzes the 2025 10mila results using R, visualizing data from NTNUI's win in the men's overnight relay and presenting time differences among the top teams through graphs and data sheets.

Visualizing daily global temperature - part 2 (theclimatebrink.com, 2025-04-30). The article explores innovative visualizations of global temperatures using ERA5 reanalysis data, presenting continuous spiral graphs and 3D helixes that depict temperature anomalies and trends and emphasize the influence of seasonal cycles.

From Data Chaos to Statistical Clarity: A Laboratory Transformation Story (dspn.substack.com, 2025-04-30). Dr. Chen's team transformed their chaotic data analysis into a streamlined process, using techniques such as the R2D2 prior and robust statistical methods to tackle issues such as position effects and outliers.

⚡ High-Performance Data Wrangling

Samesum normalized log2 transformation of counts (logarithmic.net, 2025-05-03). Describes a samesum normalized log2 transformation of counts with a scale tailored to each sample, using Newton's method for efficient numerical optimization, as implemented in the varistran package.

Cross checking OSM IDs between OSM and Wikidata (r.iresmi.net, 2025-05-03). The post details a workflow for cross-checking OSM IDs against Wikidata entities using R packages such as osmdata, WikidataR, and dplyr to improve geographic data accuracy.

Fast Grouped Counts and Means in R (lorentzen.ch, 2025-04-30). Explores efficient methods for grouped counts and means in R using base R, dplyr, data.table, duckdb, collapse, and polars, benchmarking their performance on large datasets.

Replace NA in data.table: Replacing with 0 and Other Values (marsja.se, 2025-05-05). This tutorial demonstrates how to efficiently replace NA values in a data.table in R, for example with 0 or with the mean of the non-missing values, to clean and prepare datasets.
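
As a rough illustration of the two replacements mentioned in the data.table entry above (0 and the column mean), here is a minimal sketch. The table `dt` and its columns are made up for this example and are not taken from the linked tutorial, which may use different functions.

```r
library(data.table)

# Hypothetical example data (not from the linked tutorial)
dt <- data.table(
  id = 1:4,
  x  = c(1, NA, 3, NA),
  y  = c(NA, 2, NA, 4)
)

# Replace NA with 0 in column x, modifying the copy by reference
dt_zero <- copy(dt)
set(dt_zero, i = which(is.na(dt_zero$x)), j = "x", value = 0)

# Replace NA with the mean of the non-missing values in column y
dt_mean <- copy(dt)
dt_mean[, y := fifelse(is.na(y), mean(y, na.rm = TRUE), y)]
```

Working on copies keeps the original `dt` intact, since both `set()` and `:=` modify a data.table in place.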

🧮 Statistical Modeling & Inference

Explained vs. Predictive Power: R², Adjusted R², and Beyond (mfatihtuzen.netlify.app, 2025-04-29). Explores R², adjusted R², and predicted R² in statistical modeling, explaining what each means and why it matters to evaluate both a model's explanatory and predictive power, using the tidymodels framework with the Boston Housing dataset.

Working with Ordinal Ranks in {marginaleffects} (blog.msbstats.info, 2025-04-30). Ordinal ranks can be summarized using the marginaleffects package and the mean.class mode from emmeans, giving average class ranks and effect assessments for ordinal regression models.

Simulating A Simple Response Adaptive Randomization - I Have To See It To Believe It (kenkoonwong.com, 2025-05-04). Simulates Response Adaptive Randomization (RAR) for patient allocation in clinical trials and compares it with a fixed 50-50 strategy, addressing ethical concerns and statistical validity and using Bayesian methods and the Thall and Wathen formula.

A new semi-threshold trait evolution model for phytools (blog.phytools.org, 2025-04-29). A new semi-threshold trait evolution model in phytools uses a discretized diffusion approximation to analyze non-Gaussian evolutionary models such as threshold traits and bounded Brownian motion, illustrated through visual simulations.
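
For the R² entry earlier in this section, the three quantities can be computed by hand in a few lines. This is only a sketch under stated assumptions: it uses base R and the built-in `mtcars` data rather than the tidymodels framework and Boston Housing dataset used in the linked article, and the model formula is an arbitrary choice for illustration.

```r
# Fit a simple linear model on a built-in dataset (illustrative choice)
fit <- lm(mpg ~ wt + hp, data = mtcars)

y <- mtcars$mpg
n <- length(y)
p <- length(coef(fit)) - 1            # number of predictors, excluding the intercept

sse <- sum(residuals(fit)^2)          # residual sum of squares
sst <- sum((y - mean(y))^2)           # total sum of squares

r2     <- 1 - sse / sst
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Predicted R² via PRESS: the leave-one-out residual is e_i / (1 - h_ii)
press   <- sum((residuals(fit) / (1 - hatvalues(fit)))^2)
pred_r2 <- 1 - press / sst

c(r2 = r2, adj_r2 = adj_r2, pred_r2 = pred_r2)
```

The first two values should match `summary(fit)$r.squared` and `summary(fit)$adj.r.squared`. Predicted R² is never larger than R², because each leave-one-out residual is at least as large in magnitude as the ordinary residual.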

🎓 Academic & Research Papers

fastfrechet: An R package for fast implementation of Fréchet regression with distributional responses (joss.theoj.org, 2025-05-01). fastfrechet, an R package, enables fast Fréchet regression with distributional responses, drawing on concepts such as variable selection and the Wasserstein metric space.

sae4health: An R Shiny Application for Small Area Estimation in Low- and Middle-Income Countries (arxiv:stat, 2025-05-02). sae4health is an R Shiny app for small area estimation (SAE) of health indicators in low- and middle-income countries, using Bayesian inference via INLA and supporting over 150 indicators from Demographic and Health Surveys.

Joint Modelling of Line and Point Data on Metric Graphs (arxiv:stat, 2025-05-02). Metric graphs support joint spatial modeling of line-referenced and point-referenced data, using Gaussian random fields and the R packages inlabru and MetricGraph to improve traffic state predictions in Trondheim, Norway.

About Data Scientist (with R)

Our Data Scientist newsletter covers the latest developments, packages, techniques, and insights in R programming and data science. Each week, we curate the most important content from your favourite R blogs so you don't have to spend hours searching.

Whether you're a beginner or expert in data science with R, our newsletter provides valuable information to keep you informed.
