Data Scientist (with R)
Tuesday 11th March, 2025
🎉 Conferences & Community
rOpenSci Champions Program 2025: In Spanish! (ropensci.org, 2025-03-10). rOpenSci's 2025 Champions Program invites Latin American individuals proficient in R and Spanish, fostering sustainable research software and community building within open science through mentorship and project development
Enabling the pharmaceutical programming community to develop ADaM datasets in R: Four Perspectives From the Maintainers (posit.co, 2025-03-10). Key contributors discuss the admiral R package, emphasizing its modularity for ADaM dataset development, its impact across pharmaceutical companies, and upcoming improvements to streamline clinical trial data analysis
First Triangle Symposium on Statistics and Data Science (TSSDS) (bayesian.org, 2025-03-04). The symposium will take place on May 12-13, 2025, at NC State University, featuring local and visiting researchers in a variety of presentation formats
Posit at ShinyConf 2025 (posit.co, 2025-03-10). Posit will present advancements at ShinyConf 2025, focusing on AI integration with Shiny, scalable app development, and deployment strategies using tools like RStudio, Jupyter, ellmer, and Posit Connect
Register for R/Pharma at posit::conf(2025) (posit.co, 2025-03-07). Register for the R/Pharma Summit and workshops at posit::conf(2025), focusing on open-source drug development and featuring RStudio, Jupyter, and VS Code environments for streamlined processes in pharmaceutical analytics
Shiny in Production 2025: Abstracts Deadline Extension (jumpingrivers.com, 2025-03-11). The abstract submission deadline for Shiny in Production 2025 has been extended to April 3, 2025. The conference, scheduled for October 8-9, 2025, offers insights on Shiny applications for developers and data scientists
📰 R News & Highlights
Teaching omics methods to LCG-UNAM students (lcolladotor.github.io, 2025-03-07). Students at LCG-UNAM learn omics methods, including RNA-seq, microarrays, and Visium technology; the class covered concepts such as gene annotation, transcript quantification, and data analysis with R and Bioconductor
9 new books added to Big Book of R (oscarbaruffa.com, 2025-03-09). The Big Book of R adds nine new titles, featuring tools like R and ggplot2, with a focus on data visualization, credit risk modeling, and quantitative text analysis for life sciences and social data science
Our March issue is out now! (methodsblog.com, 2025-03-05). The March issue features Bayesian views on generalized additive modelling, introduces new R packages like calibrar and RRphylogeography, and presents a novel method called multiview cross-mapping for ecological data analysis
🔧 R Package Tools & Comparisons
Customize your expedition: Create a unique documentation for your R Package (rtask.thinkr.fr, 2025-03-07). Customize your R package documentation using pkgdown and its configuration file `_pkgdown.yml` to get unique navbars, a custom function organization, Bootstrap themes, and CSS styles for an engaging user experience
data.table vs dplyr: A Side-by-Side Comparison (albert-rapp.de, 2025-03-08). A side-by-side comparison of common data-cleaning operations in R's dplyr and data.table packages, worked through on the ames dataset
A practical benchmark of duckplyr (blog.schochastics.net, 2025-03-09). David Schoch benchmarks duckplyr against established R packages such as dplyr, data.table, and polars, using football game results data and focusing on summarization and joins
R Weekly 2025-W11 LaTeX Typesetting in R, Create a unique documentation for your R Package (rweekly.org, 2025-03-10). Explore LaTeX typesetting in R, unique R package documentation, and new packages such as forgts and hunexpl, plus updates and resources for the R community from R Weekly 2025-W11
📊 Probability & Inference
World counting as a tool for understanding probability (notebook.drmaciver.com, 2025-03-05). World counting helps solve conditional probability problems by enumerating the possible worlds, keeping those consistent with what is known, and counting the ones in which the event of interest holds. The approach extends to weighted world enumeration and Bayesian reasoning
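Here is a minimal sketch of world counting, written in Python as a language-agnostic illustration. The helper name `conditional_probability` and the dice and coin examples are hypothetical and do not come from the post. Each world carries a weight, and P(A | B) is the total weight of the worlds where both A and B hold, divided by the total weight of the worlds where B holds. Equal weights give plain counting, and unequal weights give the weighted variant.

```python
from fractions import Fraction
from itertools import product


def conditional_probability(worlds, event, evidence):
    """Return P(event | evidence) by summing the weights of possible worlds.

    worlds: iterable of (world, weight) pairs with int or Fraction weights.
    event, evidence: predicates that take a world and return True/False.
    """
    # Keep only the worlds consistent with what we know.
    consistent = [(world, weight) for world, weight in worlds if evidence(world)]
    total = sum(weight for _, weight in consistent)
    # Among those, add up the weight of worlds where the event also holds.
    favourable = sum(weight for world, weight in consistent if event(world))
    return Fraction(favourable, total)  # raises ZeroDivisionError if no world fits


# Plain counting: two fair dice, every ordered pair equally likely.
dice = [((a, b), 1) for a, b in product(range(1, 7), repeat=2)]
p_dice = conditional_probability(
    dice,
    event=lambda w: w[0] + w[1] == 8,
    evidence=lambda w: w[0] % 2 == 0,
)
print(p_dice)  # 1/6

# Weighted counting: two flips of a coin that lands heads twice as often as tails.
coin = {"H": 2, "T": 1}
flips = [((x, y), coin[x] * coin[y]) for x, y in product("HT", repeat=2)]
p_flips = conditional_probability(
    flips,
    event=lambda w: w == ("H", "H"),
    evidence=lambda w: "H" in w,
)
print(p_flips)  # 1/2
```

Exact fractions are used instead of floats so that the counts stay visible in the result. The weights only need to be proportional to probabilities, because the division normalizes them.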
De Moivre–Laplace Theorem (gregorygundersen.com, 2025-03-08). The De Moivre–Laplace theorem states that, as the number of trials grows, a suitably standardized binomial distribution approaches the normal distribution; the post relates the result to the Central Limit Theorem and presents a modern proof using Stirling's approximation
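For reference, here is one standard statement of the theorem in its local and integral forms. It assumes X_n ~ Binomial(n, p) with 0 < p < 1 and q = 1 - p, and the post's own notation may differ.

```latex
% Local form, uniformly for k with |k - np| / \sqrt{npq} bounded:
\binom{n}{k} p^{k} q^{n-k} \sim \frac{1}{\sqrt{2\pi npq}}
  \exp\!\left( -\frac{(k - np)^{2}}{2npq} \right), \qquad n \to \infty.

% Integral form, for fixed a < b, with \Phi the standard normal CDF:
\lim_{n \to \infty} P\!\left( a \le \frac{X_n - np}{\sqrt{npq}} \le b \right) = \Phi(b) - \Phi(a).
```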
Varios asuntos relacionados con la causalidad ["Various matters related to causality"] (datanalytics.com, 2025-03-11). This Spanish-language post discusses causal models, Bayesian analysis, and how they differ, highlighting Rubin's causal model (also known as the Neyman-Rubin model) and the difficulty of inferring causality in statistics and history
One-Tailed Vs. Two-Tailed Tests (towardsdatascience.com, 2025-03-06). The article explains the differences between one-tailed and two-tailed hypothesis tests and their implications for A/B testing, sample size determination, and the interpretation of significance, with examples such as t-tests in R or SciPy
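Here is a minimal Python sketch of how the two p-values relate. It uses a standard normal (z) test statistic rather than the t-tests mentioned in the article so that only the standard library is needed, and the function names are illustrative. When the null distribution is symmetric, the two-tailed p-value is twice the one-tailed p-value for a statistic in the hypothesized direction. As a result, the same data can be significant under one framing and not under the other.

```python
import math


def upper_tail(z):
    """P(Z >= z) for a standard normal random variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def p_values(z):
    """One-tailed (H1: effect > 0) and two-tailed (H1: effect != 0) p-values."""
    one_tailed = upper_tail(z)
    two_tailed = 2 * upper_tail(abs(z))  # both tails, by symmetry
    return one_tailed, two_tailed


alpha = 0.05
one, two = p_values(1.8)
print(f"one-tailed p = {one:.4f}, reject H0: {one < alpha}")  # 0.0359, True
print(f"two-tailed p = {two:.4f}, reject H0: {two < alpha}")  # 0.0719, False
```

With z = 1.8 and α = 0.05, the one-tailed test rejects the null hypothesis and the two-tailed test does not. This is one reason the direction of the test should be fixed before looking at the data.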
🧪 Statistical Methods & Applications
[A Function to Draw Complex Multiplets (chemospec.org, 2025-03-06)](http://chemospec.org/posts/2025-03-06%20More%20Multiplets/MoreMultiplets.html). Bryan Hanson introduces an R function called 'multiplet' for drawing complex NMR multiplets, utilizing the updated SpecHelpers package for spectroscopic analysis and educational purposes
(News from) Probabilistic Forecasting of univariate and multivariate Time Series using Quasi-Randomized Neural Networks (Ridge2) and Conformal Prediction (thierrymoudiki.github.io, 2025-03-09). Probabilistic forecasting of univariate and multivariate time series using Ridge2 neural networks and conformal prediction offers efficient techniques for handling nonlinear interactions and enhancing prediction intervals
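For readers new to conformal prediction, here is a minimal Python sketch of generic split conformal prediction intervals. It is not the Ridge2-based method from the post. It takes point predictions from any fitted model and assumes the calibration residuals are exchangeable with future ones, which time series data may not satisfy. The function name and the example numbers are hypothetical.

```python
import math


def split_conformal_interval(cal_actual, cal_predicted, new_prediction, alpha=0.1):
    """Generic split conformal interval with target coverage 1 - alpha.

    cal_actual / cal_predicted: observed values and point predictions on a
    held-out calibration set (assumed exchangeable with future data).
    """
    n = len(cal_actual)
    # Nonconformity scores: absolute residuals on the calibration set.
    scores = sorted(abs(y - y_hat) for y, y_hat in zip(cal_actual, cal_predicted))
    # Finite-sample corrected rank: the ceil((n + 1)(1 - alpha))-th smallest score.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        half_width = math.inf  # calibration set too small for this alpha
    else:
        half_width = scores[rank - 1]
    return new_prediction - half_width, new_prediction + half_width


cal_actual = [10, 12, 9, 11, 13, 8, 10, 12, 11, 9]
cal_predicted = [10] * 10
print(split_conformal_interval(cal_actual, cal_predicted, 10.5, alpha=0.2))  # (8.5, 12.5)
```

The (n + 1) correction is what gives split conformal intervals their finite-sample coverage guarantee under exchangeability. Smaller calibration sets therefore produce wider, and eventually unbounded, intervals.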
lpcde: Estimation and Inference for Local Polynomial Conditional Density Estimators (joss.theoj.org, 2025-03-07). lpcde provides tools for estimation and inference with local polynomial conditional density estimators, which combine kernel weighting with local polynomial approximation
fastfrechet: An R package for fast implementation of Fréchet regression with distributional responses (arxiv:stat, 2025-03-09). fastfrechet is an R package for efficient Fréchet regression with distributional responses under the 2-Wasserstein metric; it includes variable selection and resampling tools and is applicable to large datasets such as the UK Biobank
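In one dimension, the 2-Wasserstein metric has a simple form: it is the L2 distance between the two quantile functions. For two empirical samples of equal size, it reduces to pairing the sorted values. The Python sketch below covers only that special case. It is a generic illustration, not part of the fastfrechet API, and the function name is hypothetical.

```python
import math


def wasserstein2_1d(x, y):
    """2-Wasserstein distance between two equal-size 1-D empirical samples.

    In one dimension the optimal coupling pairs the i-th smallest value of x
    with the i-th smallest value of y (i.e. it compares quantile functions).
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("this sketch assumes two non-empty samples of equal size")
    xs, ys = sorted(x), sorted(y)
    mean_sq = sum((a - b) ** 2 for a, b in zip(xs, ys)) / len(xs)
    return math.sqrt(mean_sq)


print(wasserstein2_1d([3, 0, 2, 1], [8, 5, 6, 7]))  # 5.0: a pure shift by 5
```

A related fact: in one dimension, the quantile function of the W2 Fréchet mean (barycenter) of several distributions is the average of their quantile functions.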
Shiny-MAGEC: A Bayesian R Shiny Application for Meta-analysis of Censored Adverse Events (arxiv:stat, 2025-03-07). Shiny-MAGEC is an R Shiny application for Bayesian meta-analysis of censored adverse event (AE) data, giving users tools to estimate AE incidence and assess drug safety while accounting for reporting biases
A Statistical Interpretation of Multi-Item Rating and Recommendation Problems (arxiv:stat, 2025-03-04). A Bayesian method for interpreting ordinal user ratings and quantifying uncertainty is introduced, demonstrating competitive performance in parameter estimation and prediction on simulated and real data, including speed dating datasets
The impact of the storytelling fallacy on real data examples in methodological research (arxiv:stat, 2025-03-05). The paper discusses 'researcher degrees of freedom' and the 'storytelling fallacy,' highlighting how selective anecdotal evidence can skew conclusions in methodological research, particularly in data analysis practices