Data Scientist (with R)
Tuesday 15th April, 2025
📰 R News & Releases
R Version 4.5.0 is Out! (njtierney.com, 2025-04-14). R Version 4.5.0 introduces significant updates including new functions, faster package installation via libcurl, and the addition of new datasets like 'penguins' for enhanced usability
R 4.5: importing only selected objects & functions from R packages (tomsing1.github.io, 2025-04-12). R 4.5 introduces the use() function for selectively importing objects and functions, allowing users to avoid function name clashes, but caution is advised: once a package is attached, later use() calls for it in the same R session do not add further objects
Choroplethr v4.0.0 is now on CRAN (arilamstein.com, 2025-04-14). Choroplethr v4.0.0 is released on CRAN with new maintainer Zhaochen He, addressing previous issues and simplifying functions for state and county demographics, while highlighting the importance of updated maps and features
What's new in R 4.5.0? (jumpingrivers.com, 2025-04-10). R 4.5.0 introduces key updates including the Palmer Penguins dataset, new 'use()' function for selective imports, and parallel downloading of packages to improve efficiency
naijR 0.6.2 (victorordu.wordpress.com, 2025-04-10). naijR 0.6.2 released on CRAN, improving lgas output by sorting Local Government Areas for easier use. Installation instructions for both CRAN and development versions are provided
Introducing chores (posit.co, 2025-04-08). Posit introduces chores, an R package that uses large language models to help with repetitive, hard-to-automate coding tasks directly from the editor
tempodisco: an R package for temporal discounting (joss.theoj.org, 2025-04-08). tempodisco is an R package designed for analyzing temporal discounting, a key concept in psychology and economics, focusing on decision-making related to delay discounting and intertemporal choice
rjaf: Regularized Joint Assignment Forest with Treatment Arm Clustering (joss.theoj.org, 2025-04-09). rjaf utilizes machine learning for causal inference in multi-arm randomized controlled trials, focusing on heterogeneous treatment effects, personalized treatment rules, and optimal assignment with treatment arm clustering
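For readers curious about the use() function mentioned in the R 4.5.0 items above, here is a minimal sketch of a selective import. It is not taken from any of the linked posts: the choice of the tools package and the file name are illustrative assumptions, and the fallback branch relies on the older library(include.only = ) form for R versions before 4.5.0.

```r
# Attach only two functions from 'tools' instead of the whole package.
# use() takes the package name as a string plus the objects to attach.
if (getRversion() >= "4.5.0") {
  use("tools", c("file_ext", "toTitleCase"))
} else {
  # Fallback for older R: same subset via library(include.only = )
  library(tools, include.only = c("file_ext", "toTitleCase"))
}

# Hypothetical file name, used only to exercise the imported function
ext <- file_ext("report.final.csv")
print(ext)
```

Because only the listed names reach the search path, other functions in the package cannot mask objects with the same name elsewhere in the session.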
🎉 Community & Events
The PRISM project’s progress report: 8 months into the MSCA-PF journey (jakubnowosad.com, 2025-04-13). The PRISM project explores spatial patterns in machine learning through tools like R packages simsam and spatialexplain, while engaging in workshops, summer schools, and conferences to enhance collaboration and learning in the field
R Dev Days 2025 (forwards.github.io, 2025-04-10). Join R Dev Days 2025 to collaborate on R programming contributions, with events scheduled in Belgium, the USA, and the UK. Registration deadlines approach for tasks involving coding, documentation, and translation
Data Science Weekly - Issue 594 (datascienceweekly.substack.com, 2025-04-10). Curated insights on data science trends, including advancements in graph learning, the evolution of AI through datasets, and tools like Shiny for application deployment across multiple environments
A Quick Guide to Getting the Most Out of ShinyConf 2025 (appsilon.com, 2025-04-08). Prepare for ShinyConf 2025 with expert-led workshops, keynotes on AI and clinical trials, virtual networking opportunities, and access to recorded sessions featuring Shiny apps and advanced programming techniques
R Weekly 2025-W16 R 4.5.0, plumber2, chores (rweekly.org, 2025-04-14). R Weekly 2025-W16 covers the release of R 4.5.0, introduces the chores package, showcases new tools like Positron IDE, and features contributions, new and updated packages, podcasts, and R community events
posit::glimpse() Newsletter – April 2025 (posit.co, 2025-04-11). The posit::glimpse() Newsletter highlights new tools for R and Python, including the Air code formatter, tidymodels updates, Plotnine v0.15.0, data quality validation with Pointblank, and resources for AI data science integration
Shiny in Production 2024 Videos (jumpingrivers.com, 2025-04-08). Explore Shiny in Production 2024 videos featuring six in-depth talks and additional sessions, including an overview of R package validation with Litmus. Early-bird tickets are available for the upcoming 2025 conference
💻 R Programming Tips
Create and edit TOML in R with {tomledit} (josiahparry.com, 2025-04-10). The tomledit package enables users to create and edit TOML files in R, offering support for arrays and inline tables, enhancing flexibility for package management and project configuration
How to Find First Non-NA Value in data.table (marsja.se, 2025-04-10). Learn to find the first non-NA value in data.table for grouped and ungrouped data using R, specifically filtering with '!is.na()' and selecting results with '.SD[1]'
Use use() in R (erikgahner.dk, 2025-04-14). The use() function in R simplifies package loading by allowing users to specify only the functions they need, reducing name conflicts and enhancing code clarity for better long-term reproducibility
Un meilleur historique Git, sans difficulté [A better Git history, without difficulty] (ropensci.org, 2025-04-11). Good Git practices improve coding and writing with R, with an emphasis on atomic commits and informative messages. Learn how to manage Git history effectively with the R package saperlipopette in this online session
funkyheatmap: Visualising data frames with mixed data types (joss.theoj.org, 2025-04-11). funkyheatmap provides tools for visualizing data frames containing mixed data types, enhancing data interpretation through effective visualization techniques. The project aims to support benchmarking visualizations in mixed data analytics
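The data.table tip listed above (filter with !is.na(), then pick a row with .SD[1]) can be sketched roughly as follows. This is an illustration under assumptions rather than the linked author's code: the table, the column names id and value, and the result names are all made up, and it requires the data.table package to be installed.

```r
library(data.table)

# Toy data: each id starts with a missing value (hypothetical columns)
dt <- data.table(
  id    = c("a", "a", "a", "b", "b"),
  value = c(NA, 3, 5, NA, 7)
)

# Ungrouped: drop rows where value is NA, then keep the first remaining row
first_overall <- dt[!is.na(value)][1]

# Grouped: filter in i, then take the first row of each group's subset (.SD)
first_by_id <- dt[!is.na(value), .SD[1], by = id]

print(first_overall)
print(first_by_id)
```

Filtering in i before grouping means a group whose values are all NA simply drops out of the result; if such groups must be kept, the filter would need to move inside j instead.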
🔧 Applied R Workflows
Generate Route Titles and Descriptions from GPX Files with LLMs and {ellmer} (martinctc.github.io, 2025-04-11). Utilize R and the ellmer package to generate titles and descriptions from GPX files, optimizing cycling route management with AI-driven insights from LLMs, improving efficiency for cyclists and programmers alike
Alue- ja kuntavaalien 2025 tulokset ja geofi-paketin geofacet-datat [Results of the 2025 county and municipal elections and the geofacet data in the geofi package] (markuskainu.fi, 2025-04-13). Finland's 2025 county and municipal election results are analyzed using R's geofi package, geofacets, and ggplot2, with party-specific vote shares plotted by region to address the challenges of presenting them graphically
R with RAGS: An Introduction to rchroma and ChromaDB (cynkra.com, 2025-04-10). Learn how to enhance R workflows with the rchroma package and ChromaDB, enabling real-time document retrieval for language models through vector-based searches and seamless integration with Docker
Integration testing in Epiverse-TRACE (epiverse-trace.github.io, 2025-04-14). Integration testing in Epiverse-TRACE ensures interoperability of R packages like simulist and cleanepi through principles of consistency, composability, and modularity, utilizing the testthat framework
Down a Rabbit Hole with ARIMA Models (rworks.dev, 2025-04-11). Exploration of ARIMA models reveals distinct outcomes using forecast and fable packages, highlighting the identifiability problem in time series forecasting with electricity usage data
New Soil "Near Infrared Spectroscopy" Training Material (nir-quimiometria.blogspot.com, 2025-04-13). A new training material for soil analysis using Near Infrared Spectroscopy is available, provided by FAO, to help beginners understand spectroscopy concepts and analytics using R software
📚 R Books & Resources
Roderick Little’s new book: Seminal Ideas and Controversies in Statistics (errorstatistics.com, 2025-04-14). Roderick Little's 'Seminal Ideas and Controversies in Statistics' explores key statistical topics, including philosophical approaches, statistical methodology, and randomization designs, aimed at deepening understanding among doctoral students and statisticians
Introduktion til R #4 (erikgahner.dk, 2025-04-13). Erik Gahner Larsen discusses the upcoming third edition of 'Introduktion til R,' emphasizing improvements, feedback incorporation, and the use of tidyverse for a robust introduction to R programming
Big Book of R (bigbookofr.com, 2025-04-10). The Big Book of R, curated by Oscar Baruffa, offers over 400 free and affordable R-related programming books, covering topics like data science, machine learning, and statistics, supported by Fathom Data and built with Quarto
📊 Academic Research & Methods
AI-generated code comes with security risks (f.briatte.org, 2025-04-14). AI-generated code, especially in R, poses significant security threats due to untrusted packages and malicious instructions, raising urgent concerns for students and developers using such technologies
Foothills of Bayesian Analysis (hunsley.io, 2025-04-13). The piece explores the basics of Bayesian analysis, focusing on Bayes' theorem, conditional probabilities, diagrams like the Bayes hourglass, and visual tools like frequency strips that make complex data correlations easier to understand
eDNAjoint: a Modeling Tool for Environmental DNA Data (ropensci.org, 2025-04-10). eDNAjoint is an R package that employs Bayesian modeling to integrate environmental DNA with traditional surveys, offering insights into species detection probabilities and false positive rates
Extending the Theta forecasting method to GLMs, GAMs, GLMBOOST and attention: benchmarking on Tourism, M1, M3 and M4 competition data sets (28000 series) (thierrymoudiki.github.io, 2025-04-14). The expanded Theta forecasting method incorporates GLMs, GAMs, GLMBOOST, and attention mechanisms, benchmarking performance on Tourism, M1, M3, and M4 datasets using advanced statistical packages and techniques
Propensity Scores, R Packages, and Practical Advice with Noah Greifer | Season 6 Episode 3 (casualinfer.libsyn.com, 2025-04-10). Noah Greifer discusses propensity scores and R packages including WeightIt and MatchIt, offering practical advice for statistical consulting in causal inference on the Casual Inference podcast
Are You Sure Your Posterior Makes Sense? (towardsdatascience.com, 2025-04-11). MCMC samplers require careful evaluation. Techniques like R-hat and Effective Sample Size (ESS) help diagnose convergence and sampler performance, ensuring reliable Bayesian parameter estimation through robust diagnostics
Communicating complex statistical models to a public health audience: translating science into action with the FARSI approach (arxiv:stat, 2025-04-09). The FARSI approach enhances public health communication of statistical models through an R Shiny web application, enabling stakeholders to explore chronic disease prevalence and facilitating evidence-based decision-making
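As a companion to the MCMC diagnostics item above, here is a rough base-R sketch of the classic (non-split) Gelman-Rubin R-hat. It is not the linked article's code: the function name rhat_basic, the iterations-by-chains matrix layout, and the simulated chains are all assumptions made for illustration, and production work would normally rely on an established implementation instead.

```r
# Classic (non-split) Gelman-Rubin R-hat for a single parameter.
# `draws`: numeric matrix, one row per iteration, one column per chain
# (a hypothetical input layout chosen for this sketch).
rhat_basic <- function(draws) {
  n <- nrow(draws)                                # iterations per chain
  within   <- mean(apply(draws, 2, var))          # W: mean within-chain variance
  between  <- n * var(colMeans(draws))            # B: between-chain variance
  var_plus <- (n - 1) / n * within + between / n  # pooled variance estimate
  sqrt(var_plus / within)
}

set.seed(1)
mixed <- matrix(rnorm(4000), ncol = 4)  # four chains drawn from the same target
stuck <- mixed
stuck[, 1] <- stuck[, 1] + 3            # shift one chain to mimic poor mixing

rhat_mixed <- rhat_basic(mixed)
rhat_stuck <- rhat_basic(stuck)
print(c(mixed = rhat_mixed, stuck = rhat_stuck))
```

Values close to 1 suggest the chains agree with each other; the shifted chain pushes the statistic well above 1, which is the kind of signal the diagnostic is meant to raise.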