Data Scientist (with R)

Tuesday 15th April, 2025

Subscribe to this newsletter!

Newsletters sent once a week, unsubscribe anytime.

📰 R News & Releases

R Version 4.5.0 is Out! (njtierney.com, 2025-04-14). R Version 4.5.0 introduces significant updates including new functions, faster package installation via libcurl, and the addition of new datasets like 'penguins' for enhanced usability

R 4.5: importing only selected objects & functions from R packages (tomsing1.github.io, 2025-04-12). R 4.5 introduces the use() function for selectively importing objects and functions, allowing users to avoid function name clashes, but caution is advised due to its limitations in R sessions

Choroplethr v4.0.0 is now on CRAN (arilamstein.com, 2025-04-14). Choroplethr v4.0.0 is released on CRAN with new maintainer Zhaochen He, addressing previous issues and simplifying functions for state and county demographics, while highlighting the importance of updated maps and features

What's new in R 4.5.0? (jumpingrivers.com, 2025-04-10). R 4.5.0 introduces key updates including the Palmer Penguins dataset, new 'use()' function for selective imports, and parallel downloading of packages to improve efficiency

naijR 0.6.2 (victorordu.wordpress.com, 2025-04-10). naijR 0.6.2 released on CRAN, improving lgas output by sorting Local Government Areas for easier use. Installation instructions for both CRAN and development versions are provided

Introducing chores (posit.co, 2025-04-08). Posit introduces chores, an R package offering a library of LLM-powered helpers for repetitive, hard-to-automate coding tasks, such as converting tests to testthat's third edition or drafting roxygen2 documentation, usable from RStudio and Positron

tempodisco: an R package for temporal discounting (joss.theoj.org, 2025-04-08). tempodisco is an R package designed for analyzing temporal discounting, a key concept in psychology and economics, focusing on decision-making related to delay discounting and intertemporal choice

rjaf: Regularized Joint Assignment Forest with Treatment Arm Clustering (joss.theoj.org, 2025-04-09). rjaf utilizes machine learning for causal inference in multi-arm randomized controlled trials, focusing on heterogeneous treatment effects, personalized treatment rules, and optimal assignment with treatment arm clustering

🎉 Community & Events

The PRISM project’s progress report: 8 months into the MSCA-PF journey (jakubnowosad.com, 2025-04-13). The PRISM project explores spatial patterns in machine learning through tools like R packages simsam and spatialexplain, while engaging in workshops, summer schools, and conferences to enhance collaboration and learning in the field

R Dev Days 2025 (forwards.github.io, 2025-04-10). Join R Dev Days 2025 to collaborate on R programming contributions, with events scheduled in Belgium, the USA, and the UK. Registration deadlines approach for tasks involving coding, documentation, and translation

Data Science Weekly - Issue 594 (datascienceweekly.substack.com, 2025-04-10). Curated insights on data science trends, including advancements in graph learning, the evolution of AI through datasets, and tools like Shiny for application deployment across multiple environments

A Quick Guide to Getting the Most Out of ShinyConf 2025 (appsilon.com, 2025-04-08). Prepare for ShinyConf 2025 with expert-led workshops, keynotes on AI and clinical trials, virtual networking opportunities, and access to recorded sessions featuring Shiny apps and advanced programming techniques

R Weekly 2025-W16 R 4.5.0, plumber2, chores (rweekly.org, 2025-04-14). R Weekly 2025-W16 covers the release of R 4.5.0, introduces the chores package, showcases new tools like Positron IDE, and features contributions, new and updated packages, podcasts, and R community events

posit::glimpse() Newsletter – April 2025 (posit.co, 2025-04-11). The posit::glimpse() Newsletter highlights new tools for R and Python, including the Air code formatter, tidymodels updates, plotnine v0.15.0, data quality validation with pointblank, and resources for integrating AI into data science work

Shiny in Production 2024 Videos (jumpingrivers.com, 2025-04-08). Explore Shiny in Production 2024 videos featuring six in-depth talks and additional sessions, including an overview of R package validation with Litmus. Early-bird tickets are available for the upcoming 2025 conference

💻 R Programming Tips

Create and edit TOML in R with {tomledit} (josiahparry.com, 2025-04-10). The tomledit package enables users to create and edit TOML files in R, offering support for arrays and inline tables, enhancing flexibility for package management and project configuration

How to Find First Non-NA Value in data.table (marsja.se, 2025-04-10). Learn to find the first non-NA value in data.table for grouped and ungrouped data using R, specifically filtering with '!is.na()' and selecting results with '.SD[1]'
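The approach named in the item above can be sketched roughly as follows. This is a minimal illustration rather than the linked post's own code: the table `dt` and its columns `group` and `value` are made-up names, and it assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical example data: two groups, each starting with a missing value
dt <- data.table(
  group = c("a", "a", "a", "b", "b"),
  value = c(NA, 3, 5, NA, 7)
)

# Ungrouped: drop rows where value is NA, then keep the first remaining row
first_overall <- dt[!is.na(value)][1]

# Grouped: filter out NA rows in i, then take the first row of each group's .SD
first_by_group <- dt[!is.na(value), .SD[1], by = group]

print(first_overall)
print(first_by_group)
```

Filtering in `i` before grouping means `.SD[1]` only ever sees non-missing rows, so a group whose values are all NA simply drops out of the result.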

Use use() in R (erikgahner.dk, 2025-04-14). The use() function in R simplifies package loading by allowing users to specify only the functions they need, reducing name conflicts and enhancing code clarity for better long-term reproducibility
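As a rough sketch of the selective import described above: the example attaches only `file_ext()` from the tools package, which ships with R but is not attached by default. The choice of package and function is purely illustrative, and the fallback branch for R versions before 4.5.0 is an addition here, relying on the `include.only` argument of `library()`.

```r
# Attach a single function from a package instead of its whole namespace.
# tools::file_ext() is an arbitrary choice for illustration.
if (getRversion() >= "4.5.0") {
  # New in R 4.5.0: package name as a string, then the names to attach
  use("tools", "file_ext")
} else {
  # Older R: library() accepts include.only for the same purpose
  library(tools, include.only = "file_ext")
}

# file_ext() is now callable without the tools:: prefix
ext <- file_ext("report.csv")
print(ext)
```

Other functions from tools stay off the search path, which is how this style of import avoids name clashes.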

Un meilleur historique Git, sans difficulté [A better Git history, without difficulty] (ropensci.org, 2025-04-11). Good Git practices improve both coding and writing with R, with an emphasis on atomic commits and informative commit messages. Learn how to manage Git history effectively with the R package saperlipopette in this online session

funkyheatmap: Visualising data frames with mixed data types (joss.theoj.org, 2025-04-11). funkyheatmap provides tools for visualizing data frames containing mixed data types, enhancing data interpretation through effective visualization techniques. The project aims to support benchmarking visualizations in mixed data analytics

🔧 Applied R Workflows

Generate Route Titles and Descriptions from GPX Files with LLMs and {ellmer} (martinctc.github.io, 2025-04-11). Utilize R and the ellmer package to generate titles and descriptions from GPX files, optimizing cycling route management with AI-driven insights from LLMs, improving efficiency for cyclists and programmers alike

Alue- ja kuntavaalien 2025 tulokset ja geofi-paketin geofacet-datat [Results of the 2025 regional and municipal elections and the geofi package's geofacet data] (markuskainu.fi, 2025-04-13). The 2025 municipal and regional election results are analyzed using R's geofi package, geofacets, and ggplot2, focusing on party-specific vote shares across regions to overcome graphical representation challenges

R with RAGS: An Introduction to rchroma and ChromaDB (cynkra.com, 2025-04-10). Learn how to enhance R workflows with the rchroma package and ChromaDB, enabling real-time document retrieval for language models through vector-based searches and seamless integration with Docker

Integration testing in Epiverse-TRACE (epiverse-trace.github.io, 2025-04-14). Integration testing in Epiverse-TRACE ensures interoperability of R packages like simulist and cleanepi through principles of consistency, composability, and modularity, utilizing the testthat framework

Down a Rabbit Hole with ARIMA Models (rworks.dev, 2025-04-11). Exploration of ARIMA models reveals distinct outcomes using forecast and fable packages, highlighting the identifiability problem in time series forecasting with electricity usage data

New Soil "Near Infrared Spectroscopy" Training Material (nir-quimiometria.blogspot.com, 2025-04-13). A new training material for soil analysis using Near Infrared Spectroscopy is available, provided by FAO, to help beginners understand spectroscopy concepts and analytics using R software

📚 R Books & Resources

Roderick Little’s new book: Seminal Ideas and Controversies in Statistics (errorstatistics.com, 2025-04-14). Roderick Little's 'Seminal Ideas and Controversies in Statistics' explores key statistical topics, including philosophical approaches, statistical methodology, and randomization designs, aimed at deepening understanding among doctoral students and statisticians

Introduktion til R #4 [Introduction to R #4] (erikgahner.dk, 2025-04-13). Erik Gahner Larsen discusses the upcoming third edition of 'Introduktion til R,' emphasizing improvements, feedback incorporation, and the use of tidyverse for a robust introduction to R programming

Big Book of R (bigbookofr.com, 2025-04-10). The Big Book of R, curated by Oscar Baruffa, offers over 400 free and affordable R-related programming books, covering topics like data science, machine learning, and statistics, supported by Fathom Data and built with Quarto

📊 Academic Research & Methods

AI-generated code comes with security risks (f.briatte.org, 2025-04-14). AI-generated code, especially in R, poses significant security threats due to untrusted packages and malicious instructions, raising urgent concerns for students and developers using such technologies

Foothills of Bayesian Analysis (hunsley.io, 2025-04-13). The piece explores the basics of Bayesian analysis, focusing on Bayes' theorem, conditional probabilities, diagrams like the Bayes hourglass, and visual tools like frequency strips for better understanding complex data correlations

eDNAjoint: a Modeling Tool for Environmental DNA Data (ropensci.org, 2025-04-10). eDNAjoint is an R package that employs Bayesian modeling to integrate environmental DNA with traditional surveys, offering insights into species detection probabilities and false positive rates

Extending the Theta forecasting method to GLMs, GAMs, GLMBOOST and attention: benchmarking on Tourism, M1, M3 and M4 competition data sets (28000 series) (thierrymoudiki.github.io, 2025-04-14). The expanded Theta forecasting method incorporates GLMs, GAMs, GLMBOOST, and attention mechanisms, benchmarking performance on Tourism, M1, M3, and M4 datasets using advanced statistical packages and techniques

Propensity Scores, R Packages, and Practical Advice with Noah Greifer | Season 6 Episode 3 (casualinfer.libsyn.com, 2025-04-10). Noah Greifer discusses propensity scores and R packages including WeightIt and MatchIt, offering practical advice for statistical consulting in causal inference on the Casual Inference podcast

Are You Sure Your Posterior Makes Sense? (towardsdatascience.com, 2025-04-11). MCMC samplers require careful evaluation. Techniques like R-hat and Effective Sample Size (ESS) help diagnose convergence and sampler performance, ensuring reliable Bayesian parameter estimation through robust diagnostics

Communicating complex statistical models to a public health audience: translating science into action with the FARSI approach (arxiv:stat, 2025-04-09). The FARSI approach enhances public health communication of statistical models through an R Shiny web application, enabling stakeholders to explore chronic disease prevalence and facilitating evidence-based decision-making
