Data Scientist (with R): 1st April 2025
Published 1st April 2025
📰 Community News
Exploring Tidy Tuesday: A Community Initiative for Notre Dame Data Science (datascience.nd.edu, 2025-03-25). Notre Dame's Tidy Tuesday builds data skills in a community setting. It focuses on data cleaning and visualization in a variety of programming languages and offers learning opportunities for students, alumni, and professionals alike
February 2025 Top 40 New CRAN Packages (rworks.dev, 2025-03-28). A selection of 40 new CRAN packages released in February 2025, spanning 15 categories including AI, computational methods, health sciences, and time series analysis, and featuring frameworks such as GitAI, PatientLevelPrediction, and BayesChange
R Weekly 2025-W14 Mall, Gradients (rweekly.org, 2025-03-31). This week features resources on text summarization with LLMs, R package tests, igraph, and accessibility in greenhouse gas reporting, alongside new CRAN packages and tools for effective data visualization in R
rOpenSci News Digest, March 2025 (ropensci.org, 2025-03-27). rOpenSci March 2025 Digest: New 2025 Champions Program in Spanish, improved R-universe documentation, participation in NumFOCUS DISC Unconf, and new software packages including pangoling and mbquartR
Advancing Open-Source Scientific Software with Peer Review (ropensci.org, 2025-03-26). rOpenSci's Noam Ross discusses a decade of enhancing open-source scientific software through peer review, fostering collaboration, skills, and a diverse community of researchers and developers
💻 R Development Tools
checkglobals: an(other) R-package for static code analysis (openanalytics.eu, 2025-03-25). checkglobals is a new R package for static code analysis, efficiently identifying undefined global variables and missing imports, thereby enhancing reproducibility and maintainability in R scripting and package development
Should I Use Your R Package? (jumpingrivers.com, 2025-03-31). Choosing R packages hinges on individual risk appetite. Factors include package reliability, unit testing, and documentation. The Litmus framework aids in validating packages, addressing diverse user needs across risk spectrums
A Very Informal Guide to R Programming (blog.devgenius.io, 2025-03-27). R is an open-source statistical computing language created in 1993, used for data visualization and analysis. RStudio is a recommended IDE for R, enhancing the coding experience with integrated features
🛠 Data & Viz Tutorials
A 3D map of downtown Madison (haraldkliems.netlify.app, 2025-03-31). Using digital surface model data and the rayshader package, the author builds a highly detailed 3D map of downtown Madison, covering data preparation, 3D rendering, and animation of geographic visualizations
File Management With The {fs} Package (albert-rapp.de, 2025-03-29). Explore file management using the fs package in R, with functions to assemble paths, modify file extensions, and retrieve directory information efficiently, balancing ease of use and technicality
How to Filter in data.table in R (marsja.se, 2025-03-25). Learn how to efficiently filter data in R using data.table by applying conditions, multiple filters, the %in% operator, and group-based filtering, all while benefiting from the package's speed advantages
How to Get Number of Rows in R Using data.table (marsja.se, 2025-03-29). Learn to count rows in R using nrow() for data.frames and data.tables, comparing their performance with microbenchmark to demonstrate data.table's efficiency for large datasets
Sparklines in Reactable Tables in Shiny Apps (jumpingrivers.com, 2025-03-27). Learn to integrate sparklines in Reactable tables within Shiny apps using the sparkline and reactable R packages, including dynamic images for data representation
RObservations #51: Download Kaggle Datasets into the R Console with {RKaggle} (bensstats.wordpress.com, 2025-03-30). The RKaggle package allows seamless downloading of datasets from Kaggle directly into R, utilizing tools like devtools or remotes for installation from GitHub, enhancing data accessibility for R users
Converting arbitrarily large CSVs to Parquet with R (lorentzen.ch, 2025-03-30). Use R with DuckDB and Polars to efficiently convert a large (2.2 GB) CSV file to Parquet format, with conversion times of 3.5 seconds for DuckDB and 9 seconds for Polars
📚 Academic Methods
Frequently Asked Questions (metafor-project.org, 2025-03-27). metafor is an R package for meta-analysis, featuring technical tools for fitting random/mixed effects models, validating analysis results, and estimating statistics like $I^2$ and $H^2$ using comprehensive methods
Multiple imputation for coarsened (grouped) factor covariates (thestatsgeek.com, 2025-03-27). Multiple imputation (MI) for coarsened factor covariates is enhanced in the smcfcs package for R, enabling the inclusion of partial information about missing values while respecting data integrity
Bayesian proportional hazards model for a stepped-wedge design (rdatagen.net, 2025-04-01). A Bayesian proportional hazards model integrating non-linear time trends and random effects for stepped-wedge trials is presented, using R libraries like simstudy and cmdstanr for simulation and analysis
Nonlinear conformalized Generalized Linear Models (GLMs) with R package ‘rvfl’ (and other models) (thierrymoudiki.github.io, 2025-03-31). Explore nonlinear conformalized Generalized Linear Models (GLMs) using the R package 'rvfl', implementing neural networks, Poisson, Quasi-Poisson, and zero-inflated models with practical coding examples
Prologus 56: Probability Pyramiding (A. Neher) (nulliusinverba.podbean.com, 2025-03-28). Discussion focuses on A. Neher's 1967 paper addressing probability pyramiding, research errors, and the necessity of independent replication, in light of Ioannidis' 2005 work on the reliability of published research findings
Nicolas Mongiardino Koch: Chronospaces: An R package for the statistical exploration of divergence times promotes the assessment of methodological sensitivity (methodsblog.com, 2025-03-25). Nicolas Mongiardino Koch's R package, Chronospaces, aids in exploring divergence times, assessing methodological sensitivity, and visualizing impacts of calibration choices on phylogenetic analyses using genome-scale datasets
consexpressionR: an R package for consensus differential gene expression analysis (arxiv:q-bio, 2025-03-27). consexpressionR is an R package that automates differential expression analysis with consensus from seven methodologies, significantly enhancing accuracy and reducing false positives in RNA-Seq studies using qPCR data as a reference
🔍 Applied Research
A "polymorphic" trait evolution model with a null or absent condition (blog.phytools.org, 2025-03-28). A junior researcher explored a polymorphic trait evolution model in R with a null condition, using the fitMk() function to analyze gene presence and absence while considering custom transition rates
Visualizing Urban and Demographic Data in R with ggplot2 (programminghistorian.org, 2025-03-27). Explore R's ggplot2 package to visualize urban demographic data through sister-city relationships in post-WWII Europe, revealing patterns using scatter plots, bar charts, and histograms within an exploratory framework
You can't keep a good dog down! (wherearethenumbers.substack.com, 2025-03-28). An update on a paper discussing miscategorization in COVID vaccine studies, highlighting new discoveries related to observational studies, RCTs, and case exclusions, along with their impact on reported vaccine efficacy
About Data Scientist (with R)
Our Data Scientist newsletter covers the latest developments, packages, techniques, and insights in R programming and data science. Each week, we curate the most important content from your favourite R blogs so you don't have to spend hours searching.
Whether you're a beginner or expert in data science with R, our newsletter provides valuable information to keep you informed.
Subscribe now to join thousands of professionals who receive our weekly updates!