Data Scientist (with R)
Tuesday 29th April, 2025
Newsletters & Community Roundup
Data Science Weekly - Issue 596 (datascienceweekly.substack.com, 2025-04-24). This week's Data Science Weekly covers AI tools and concepts such as YAGRI, generative AI, deep learning behaviors, and load balancing in AI networks, alongside new tools like the Air formatter for R
rOpenSci News Digest, April 2025 (ropensci.org, 2025-04-28). rOpenSci's April 2025 Digest highlights new software packages like gtexr and c3dr, features events for data scientists, and expands accessibility with Spanish translations, emphasizing open science and community contributions
Weekly digest: In Plain Cite, Registered Reports and AI training (openpharma.blog, 2025-04-25). This week features insights on plain language summaries, an upcoming webinar on Registered Reports, and a training course on AI for pharma professionals, alongside discussions on open science and journal integrity
R Weekly 2025-W18 Which programming language, Posit's VS Code Extension, My Journey Learning with R (rweekly.org, 2025-04-28). This week features early-career programming language guidance, Posit's VS Code Extension for Shiny, and a humanities student's experience learning R. Also includes package updates and upcoming R events
Package Updates & Tooling News
Sensitivity to C math library and mingw-w64 v12 (blog.r-project.org, 2025-04-24). Rtools45 adopted mingw-w64 v11 to address inaccuracies in math libraries affecting 21 CRAN packages due to mingw-w64 v12's optimizations. Debugging revealed many packages rely on precise numeric results not guaranteed by C standards
recipes 1.3.0 (tidyverse.org, 2025-04-28). The release of recipes 1.3.0 introduces significant updates, including changes to strings_as_factors specification and the deprecation of step_select, while enhancing features like step_impute_bag and adding a new contrasts argument for step_dummy
scales 1.4.0 (tidyverse.org, 2025-04-23). Scales version 1.4.0 introduces new color manipulation functions, palette classifications, and labeling functions, enhancing ggplot2 integration with tools like col_shift(), label_glue(), and label_dictionary()
linelist v2.0.0 (epiverse-trace.github.io, 2025-04-28). Version 2.0.0 of the linelist package adds a dependency on R 4.1.0, removes deprecated functions, and switches to dynamic dots for tag selection and list slicing in make_linelist()
Posit Package Manager 2025.04.0: Level Up Your Package Security & Compliance (posit.co, 2025-04-23). Posit Package Manager 2025.04.0 introduces repository-level authentication, a new --no-archived flag for compliance, and encryption key rotation, enhancing security and control for R and Python package management in regulated industries
Practical Tutorials & Use Cases
World Economic Outlook (freerangestats.info, 2025-04-26). IMF's World Economic Outlook highlights downward revisions in GDP growth for Palau, Marshall Islands, and Fiji, while some nations like Samoa show improvement. Tools like R and SDMX are utilized for data analysis
Function Generators vs Partial Application in R (jcarroll.com.au, 2025-04-25). Explores function generators and partial application in R, focusing on the label_glue() function from the scales package, comparing it to Python f-strings, and illustrating its use in creating dynamic labels in plots
Tidy Monte Carlo simulations (mmmdata.io, 2025-04-27). Discover a tidy workflow for Monte Carlo simulations using R's tidyverse packages, focusing on statistical power and false positive rates in hypothesis testing with practical examples and clear code structure
Vårkurser i SPSS, R och Stata (statistikakademin.se, 2025-04-24). Upcoming spring courses in SPSS, R, and Stata cover essential statistical concepts, data visualization, regression modeling, and survival analysis, equipping participants with skills to effectively analyze and interpret data
"Bayesian" optimization of hyperparameters in a R machine learning model using the bayesianrvfl package (thierrymoudiki.github.io, 2025-04-25). This tutorial demonstrates Bayesian optimization of hyperparameters in an XGBoost model using the bayesianrvfl package, optimizing model performance with the Sonar dataset and employing a Non-Bayesian RVFL network as a surrogate model
In defense of 3D pie charts (erikgahner.dk, 2025-04-26). Erik Gahner Larsen defends 3D pie charts, arguing they can engage audiences if they communicate effectively, citing research on aesthetic preference and visual information processing, and mentioning the ggthreed package for R
Academic & Research Articles
Reducing Probability to Arithmetic (jiha-kim.github.io, 2025-04-28). Explores how probability can be reduced to arithmetic using the Principle of Inclusion-Exclusion, indicator functions, and expectation, giving a deeper understanding of set theory through algebraic manipulation
Random Walks and the Dickey-Fuller Test - Part II (bytepawn.com, 2025-04-26). The article explores the behavior of stochastic processes with the Dickey-Fuller test, addressing deterministic drifts and offering code and simulations for different autoregressive models
Error statistics doesn't blame for possible future crimes of QRPs (i) (errorstatistics.com, 2025-04-27). A debate on statistical inference highlights disagreement between frequentist and Bayesian perspectives on error probabilities and their relevance, particularly in the context of sequential trials and optional stopping procedures
proximal sampler (xianblog.wordpress.com, 2025-04-27). Andre Wibisono presented on the proximal sampler targeting demarginalised densities, demonstrating its convergence properties compared to Metropolis algorithms under log-concavity assumptions during a Columbia workshop
Critical Value: Secrets of Statistical Significance (statisticalaid.com, 2025-04-23). Critical values define boundaries in hypothesis testing, determining statistical significance. They aid in decision-making by rejecting or accepting null hypotheses in tests like t-tests and chi-square tests
What is probabilistic programming? (notebook.drmaciver.com, 2025-04-23). Probabilistic programming involves developing tools for constructing random samplers with precise distributional properties, primarily to improve Monte Carlo methods and facilitate Bayesian inference in statistical analysis
BREADR: An R Package for the Bayesian Estimation of Genetic Relatedness from Low-coverage Genotype Data (joss.theoj.org, 2025-04-28). BREADR is an R Package that provides Bayesian estimation of genetic relatedness from low-coverage genotype data, focusing on kinship and pairwise mismatch rates
An introduction to R package mvs (arxiv:stat, 2025-04-24). The R package 'mvs' offers methods for multi-view data analysis using multi-view stacking (MVS), allowing model fitting with various penalty terms and efficient handling of high-dimensional datasets and missing data
Toward a Principled Workflow for Prevalence Mapping Using Household Survey Data (arxiv:stat, 2025-04-23). A proposed workflow for prevalence mapping in low- and middle-income countries emphasizes model choice and interpretation, illustrated with a case study on antenatal care visits in Kenya, utilizing household survey data and reproducible code
PhyloProfile v2 -- Exploring multi-layered phylogenetic profiles at scale (arxiv:q-bio, 2025-04-28). PhyloProfile v2 enables visualization of gene presence-absence patterns using 2D or 3D dimensionality reduction, facilitating evolutionary analysis and functional predictions across millions of orthology relationships. Available as an R package on Bioconductor
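
A worked note on the "Reducing Probability to Arithmetic" entry above: the indicator-function route to inclusion-exclusion can be sketched in a few lines. This is a standard textbook derivation, not taken from the linked post; the events A and B and the notation \mathbf{1}_A are assumptions chosen for illustration.

```latex
% Indicators turn set operations into arithmetic:
%   complement:   1_{A^c} = 1 - 1_A
%   intersection: 1_{A \cap B} = 1_A 1_B
\begin{align*}
\mathbf{1}_{A \cup B}
  &= 1 - \mathbf{1}_{A^c \cap B^c} \\
  &= 1 - (1 - \mathbf{1}_A)(1 - \mathbf{1}_B) \\
  &= \mathbf{1}_A + \mathbf{1}_B - \mathbf{1}_A \mathbf{1}_B .
\end{align*}
% Taking expectations, with E[1_A] = P(A) and linearity of expectation:
\[
P(A \cup B) = P(A) + P(B) - P(A \cap B).
\]
% The same expansion applied to n events gives the general principle:
\[
\mathbf{1}_{A_1 \cup \dots \cup A_n} = 1 - \prod_{i=1}^{n} \bigl(1 - \mathbf{1}_{A_i}\bigr).
\]
```

Expanding the product in the last line and taking expectations term by term yields the alternating sums of the general Principle of Inclusion-Exclusion.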