### AS BIOF

Advanced Studies in Bioinformatics and Data Science

The FAES Academic Programs at NIH offers a unique Advanced Studies program in Bioinformatics and Data Science to serve the quickly evolving needs of today’s biomedical research community. Bioinformatics is one of the most dynamic fields at the intersection of biology and computer science, and its data analysis tools equip life sciences researchers and professionals with skills that are in high demand in the pharmaceutical and biotechnology industries. Courses are offered in the evenings, making it convenient for working professionals and postgraduate fellows to gain expertise and experience in the theoretical foundations and practical skills required to harvest the wealth of information contained in vast amounts of biological data. The courses are designed to train today’s biomedical researchers in new methods and techniques in data science and to prepare them to translate and analyze the immensity of biological data.

### General Requirements

The program is designed for participants who hold an advanced degree in life sciences or STEM fields.

The Advanced Studies program comprises a 14-credit curriculum of required and elective courses. Courses are held in the evenings to fit the needs of working professionals and postgraduate fellows.

### Required Courses

BIOF 309 | Introduction to Python

BIOF 518 | Theoretical and Applied Bioinformatics I

BIOF 519 | Theoretical and Applied Bioinformatics II

BIOF 521 | Bioinformatics for Analysis of Next Generation Sequencing

### Electives

BIOF 339 | Practical R

BIOF 395 | Introduction to Text Mining

BIOF 450 | Bioinformatics, Evolutionary Genomics, and Computational Biology

BIOF 475 | Introduction to New Technologies in Data Science

BIOF 501 | Introduction to R: Step-by-Step Guide

BIOF 509 | Applied Machine Learning

STAT 500 I | Statistics for Biomedical Scientists I

STAT 500 II | Statistics for Biomedical Scientists II

#### Credits

14

#### Learning Objectives:

### Upon completion, students will be able to:

- Learn to use different techniques effectively to analyze biological data from high-throughput approaches
- Perform statistical analysis and visualization of biological data
- Apply bioinformatics techniques for analysis of genomic, expression and proteomic data
- Understand the uses and limitations of bioinformatics data analysis tools and technologies
- Learn how the computational methods are used in new applications in basic biology and also how they are translated into the development of new drugs and diagnostic tools

### BIOF 017

Introductory R Boot Camp

In this workshop, learners will be introduced to the basics of using R to wrangle data, create visualizations, and conduct exploratory analyses. The workshop will use the popular “tidyverse” suite of packages and will also teach learners the concept of tidy data and how it facilitates analysis.

#### Credits

-

#### Learning Objectives:

- Use RStudio and the tidyverse suite of packages to load and work with datasets
- Describe what makes data “tidy,” why tidy data is useful to work with, and how to make datasets tidy
- Transform data to prepare it for visualization and analysis
- Describe the “Grammar of Graphics” philosophy that underlies the ggplot2 visualization package
- Create visualizations including barplots, scatterplots, line graphs, and more using ggplot2
- Customize all aspects of visualizations
- Save and export visualizations for print, submission to journals, and other applications
- Use visualizations to explore and identify patterns in data
- Identify and handle missing data
- Identify covariation in variables
- Build simple linear models to identify relationships between variables

### BIOF 018

Intermediate R Boot Camp

This workshop builds upon the principles of using R for data science by introducing intermediate concepts that will help learners advance their knowledge and use R for more complex tasks. These tasks include working with APIs and packages to access data on remote servers, iterating tasks over datasets, and writing custom functions.

#### Credits

-

#### Prerequisites

Or adequate familiarity with R.

#### Learning Objectives:

- Use the apply family to repeat functions over multiple data objects
- Use if/else statements for conditional functions
- Use case_when to vectorize multiple if/else statements
- Use for loops to repeat functions
- Understand how to use tidyverse “verbs” to wrangle data
- Understand how and why to convert data from wide to long format
- Summarize and transform data using tidyverse verbs
- Understand when to create a custom function
- Write custom functions to carry out complex tasks
- Troubleshoot and debug functions

### BIOF 019

Designing Effective Data Visualizations in R

This workshop will explore both the design side and the coding side of creating visualizations in R. The first session will introduce best practices for designing effective visualizations, and learners will put these into practice in the next two sessions to create static and interactive visualizations. Learners will be introduced to Shiny, an R package used to build interactive web apps.

#### Credits

-

#### Prerequisites

The above workshop or adequate knowledge of coding in R.

#### Learning Objectives:

- Understand how to use principles of human visual perception to create effective visualizations
- Describe elements of design such as line, shape, value, texture, and space and understand how to effectively use them in visualizations
- Use color to convey meaning, including using color-blind friendly palettes
- Describe the “Grammar of Graphics” philosophy that underlies the ggplot2 visualization package
- Create visualizations including barplots, scatterplots, line graphs, and more using ggplot2
- Customize all aspects of visualizations
- Save and export visualizations for print, submission to journals, and other applications
- Understand how to use the UI and server functions to create Shiny objects
- Create visualizations that change based on user input
- Build simple web apps incorporating visualizations

### BIOF 020

Python for Beginners

In this introductory workshop, users will learn how to create, read, transform, and visualize data for scientific analysis. Learners will also gain skills in applying basic computer logic to scientific applications.

#### Credits

-

#### Prerequisites

Basic computer skills, including how to quickly locate specific directories and files.

#### Learning Objectives:

- Script creation and execution
- Array creation
- Two-variable plotting (x-y line graphs)
- Saving plots
- Data transformation and manipulation
- Logic statements and loops
- Reading data from a file and saving data to a file
- Handling missing data in files
- Simple linear model (e.g., time series analysis)
- Advanced visualizations: scatterplots and bar charts
- Statistics in Python
- Customizing visualizations: titles and subtitles, axes labeling, colors
- Finalizing a visualization project
- Deeper exploration of real-world biological concepts (e.g., COVID-19 data, population modeling)

### BIOF 021

R for Analysis of Text Data

This workshop will provide an introduction to working with text data in R and explore various approaches to analyzing text data. The first session will cover principles for wrangling text data as well as some basic text mining applications. The subsequent two sessions will delve into specific techniques to enable automated analysis of text data.

#### Credits

-

#### Prerequisites

Any of the above courses and workshops or basic familiarity with R.

#### Learning Objectives:

- Read text data into R and prepare it for analysis
- Understand and select from various options in preparing text, such as stemming, lemmatization, term frequency weighting, term frequency-inverse document frequency weighting (tf-idf), and tokenization
- Conduct simple text mining to explore content of a text corpus
- Describe how unsupervised approaches can be used to identify clusters of related documents
- Process text data to prepare for unsupervised analysis
- Build, train, and evaluate models for text clustering
- Interpret outputs of clustering algorithms
- Describe how supervised approaches can be used to develop text-based models for multi-class classification
- Process text data to prepare for supervised analysis
- Build, train, and test models for text classification

### BIOF 043

For True BeginRs | Hands-on R Training

R is a free, cross-platform – Windows, Mac, and Linux – programming language, designed specifically to facilitate data management, analysis, and visualization. Boasting vibrant development and support communities, R has become an indispensable tool for bioinformaticians, statisticians, and data scientists. Created with true beginRs in mind, this training will teach participants the fundamental, transferable skills needed to unleash R’s full potential for producing publication-worthy analyses and visualizations.

#### Credits

-

#### Prerequisites

Participants should be comfortable with basic computer skills.

#### Learning Objectives:

- Interfacing with R using RStudio
- Using RStudio’s built-in help function – ? – as well as resources for troubleshooting, including rdocumentation.org, cheat sheets, vignettes, YouTube channels, and stackexchange.com
- Creating project files
- Working with the RStudio command line
- Identifying and changing the current working file directory
- Variables – local vs. global – naming conventions, and assignment operators
- Writing their first R script and how to properly document their code via commenting
- Using the ‘$’ accessor function
- The most common data types, including character strings, numerical, integers, and logicals
- How to access data entries using [] and [[ ]]
- The most common data structure types, including vectors, lists, factors, data frames, and tibbles
- Package libraries and how to install them
- Loading data into R and basic troubleshooting when importing data
- Data management, manipulation, subsetting, piping, and exploration using dplyr
- Creating and exporting highly customizable, publication-quality data visualizations with ggplot2
- Using R to perform statistical analyses, including simple linear regression, χ2 contingency table analysis, t-tests, and analysis of variance

### BIOF 045

Next Generation Sequencing Data Analysis

Next generation sequencing technologies are producing enormous amounts of sequencing data. Analyzing this massive amount of data requires the ability to use sophisticated tools and techniques.

This workshop introduces participants to bioinformatics tools and methods for analyzing next generation sequencing data, particularly for DNA-seq (Variant analysis), RNA-seq (Transcriptome analysis), ChIP-seq (Transcriptional factor binding analysis), and network-based integration of NGS data.

#### Credits

-

### BIOF 048

Single Cell RNA Seq Analysis

#### Credits

-

### BIOF 050

Introduction to Deep Learning

In the past decade, neural networks have become a valuable tool for data scientists, revolutionizing fields such as text processing, image analysis, genomic/proteomic data analysis, data clustering, and much more. However, these algorithms can be very difficult to understand, interpret, and program. This workshop will first cover the theory and proper applications of various neural networks (multilayer perceptrons, convolutional neural networks, long short-term memory models, autoencoders, etc.). From there, powerful deep learning packages, such as PyTorch and Keras, will be introduced. Proper coding techniques will be shown through examples and practiced through exercises that will be completed in the Python 3 programming language. Finally, concepts in data visualization and software engineering will be discussed, helping researchers use neural networks in an effective and reproducible way to improve the impact of projects with a computational component.

#### Credits

-

#### Prerequisites

### BIOF 052

Artificial Intelligence in Your Lab

Artificial intelligence (AI) in biomedical research has grown exponentially in the past decade. AI can be used to uncover powerful new insights in data that your lab is already collecting. This workshop has two primary components. First, participants will engage in discussions that cover recent advances in artificial intelligence (AI) and how these developments can be used in biomedical research. Topics will include active learning, adversarial learning, Bayesian deep learning, reinforcement learning, semi-supervised learning, self-supervised learning, and transfer learning. These topics will be covered in an integrated manner: the discussions will explore how different facets of AI can interact with each other to generate high-quality results. Second, participants will work with the instructor to design and implement AI project(s). These projects will have direct relevance to the research being done by each participant. This workshop will have 1 day of discussion followed by 1 week of offline work where the participants communicate directly with the instructors about project development.

#### Credits

-

### BIOF 075

Metagenomics Data Analysis

Metagenomics is gaining importance due to low-cost next generation sequencing technologies. This workshop introduces end-to-end solutions for analyzing metagenomic data, including data-quality analysis, alignment, community profiling, taxonomic comparison, and novel taxa discovery.

#### Credits

-

### BIOF 076

Visualization with R

R is the industry standard for creating specific graphs and plots. This workshop walks participants through creating interactive, static, and shareable plots using popular R packages. The workshop will cover formatting data, loading data, setting parameters, creating images, and saving outputs.

#### Credits

-

### BIOF 077

Molecular Modeling and Molecular Dynamics: Hands-on Training

Predicting the effect of a mutation on the structure and function of a protein is not just for researchers with computer facilities. Users with basic molecular biology background can set up and run intensive computational modeling and dynamics experiments. In this workshop, participants will use open-source tools and techniques to conduct molecular modeling and dynamics experiments.

#### Credits

-

### BIOF 082

Introduction to Bioinformatics: Theory and Application

Bioinformatics (Computational Biology) is a must-have skill in every modern biomedical research lab. Installing and configuring a wide variety of computational biology tools is a cumbersome task that requires software engineering skills.

This workshop provides an introduction to basic concepts in using popular tools and techniques for sequence analysis, structure analysis, function prediction, biological database searching, “omics” data analysis, pathway analysis, data visualization, data curation and integration, and scripting basics.

#### Credits

-

### BIOF 084

Pharmacometric Dose-Response Analyses in Clinical Trials using R

In order for a drug to be approved by the FDA for market in the USA, the sponsor must ultimately demonstrate that the drug: 1) has a predictable exposure profile with dose; 2) has a good safety profile; and 3) is effective at safe doses. Therefore, the pharmacology of a drug is essentially being reviewed by the FDA. The ability of scientists to analyze drug exposure/response relationships is crucial to understanding what exposure amount will elicit the safest, most effective response, and ultimately what dose amount and frequency will produce the optimal exposure amount. Additionally, the ability to identify sub-populations that may produce differing exposure or response levels is key to providing as many subjects as possible with a safe and effective dose. These quantitative exposure/response analyses, often referred to as pharmacometrics, are key to making go/no go decisions, both by investigators during clinical trials and by the FDA during the subsequent review period. Participants will learn basic pharmacology theory with introductory statistics using a popular open-source software program (RStudio) that is capable of conducting pharmacokinetic (PK) exposure and pharmacodynamic (PD) response analyses from example clinical trial data. Ultimately, the framework of analyzing exposure/response relationships will be demonstrated in order to make go/no go decisions.

This course is designed for:

- Researchers and clinicians interested in learning how to use freely available software to explore, visualize, and understand drug exposure/response relationships, where responses include any clinical endpoint collected in a trial
- Researchers and clinicians interested in understanding and predicting the effect of different doses on drug exposure, as well as the effect of exposure on a variety of clinically relevant response endpoints (biomarkers)
- Medical, pharmacy, dental, nursing, and lab-based graduate students interested in obtaining a deeper understanding of pharmacokinetics and exposure/response analyses, as well as a broad understanding of clinical drug development and the impact of pharmacometrics on decisions

#### Credits

-

### BIOF 085

Introduction to Data Science with Python

Scientists generate more data than ever before. It can be daunting to determine how to extract insights from a mountain of data. Data science is a relatively new discipline that combines traditional statistics and analytics with programming to produce novel insights, intelligently automated processes, and data-driven decisions.

This course will equip you with everything you need to complete a basic data science project using Python from beginning to end. Participants will be exposed to a practical, real-world use case, which will be built on throughout the course. Students will use the data from this use case to perform exploratory analysis and build their skills up to advanced analytics.

#### Credits

-

### BIOF 089

Microbiome Bioinformatics with QIIME2

Members of the QIIME development group will lead this hands-on workshop on bioinformatics tools for microbial ecology. The workshop will include lectures covering basic QIIME usage and theory, and hands-on work with QIIME, to perform microbiome analysis from raw sequence data through publication-quality statistics and visualizations. The workshop will also cover related bioinformatics tools including DADA2, Emperor, and scikit-bio, and will include an introduction to applied bioinformatics. This workshop will provide the foundation on which participants can begin using these tools to advance their own studies of microbiome analysis or microbial ecology. This is a hands-on workshop, and participants must bring their own laptops. The workshop will use the QIIME2 platform only.

#### Credits

-

### BIOF 090

MATLAB Fundamentals

MATLAB® is a programming platform designed for scientists and engineers. This course provides a comprehensive introduction to the MATLAB® technical computing environment. No prior programming experience or knowledge of MATLAB is assumed. Themes of data analysis, visualization, modeling, and programming are explored throughout the course.

#### Credits

-

### BIOF 091

Image Processing and Computer Vision with MATLAB

This course provides hands-on experience with performing image analysis. Examples and exercises demonstrate the use of appropriate MATLAB® and Image Processing Toolbox™ functionality throughout the analysis process. The course also provides hands-on experience with performing computer vision tasks. Examples and exercises demonstrate the use of appropriate MATLAB® and Computer Vision System Toolbox™ functionality.

#### Credits

-

#### Prerequisites

The above workshop or equivalent MATLAB experience.

### BIOF 093

Machine Learning with MATLAB

MATLAB is a high-level language that enables you to quickly perform computation and visualization through easy-to-use programming constructs.

This two-day course focuses on data analytics and machine learning techniques in MATLAB® using functionality within Statistics and Machine Learning Toolbox™ and Deep Learning Toolbox™. The course demonstrates the use of unsupervised learning to discover features in large data sets and supervised learning to build predictive models. Examples and exercises highlight techniques for visualization and evaluation of results. Topics include:

- Organizing and preprocessing data
- Clustering data
- Creating classification and regression models
- Interpreting and evaluating models
- Simplifying data sets
- Using ensembles to improve model performance

#### Credits

-

### BIOF 097

Practical Scientific Statistics

As big data becomes the norm and experiments continue to increase in scale, proper understanding and use of statistics is becoming increasingly important for scientists in every field. While experimental researchers are expert in concepts related to their respective fields and receive extensive scientific education, statistical training is relatively lacking. As a result, experimental researchers may feel overwhelmed or uncertain about how to correctly use statistics to quantify their experimental results and how to properly interpret the results of those statistical tests. Unfortunately, this knowledge gap can result in both reduced understanding of reported results in scientific publications as well as superficial or potentially inaccurate reported statistics. This course serves as a practical, hands-on workshop to close the knowledge gap and help experimental researchers learn how to choose a statistical test for their data, how to perform those tests, and how to interpret the results. The workshop starts by establishing a solid foundation in basic statistical theory before advancing to practical applications of statistical tests on real data.

#### Credits

-

#### Prerequisites

Attendees should have access to and basic knowledge of Excel. Analyses will also be demonstrated in SPSS and R, but no formal programming skills are required.


### BIOF 098

Introduction to Statistical Analysis in R

R is a popular platform for statistical analysis. This workshop introduces how to perform basic statistical analysis using the R platform. First, participants will learn to use the R and RStudio software platforms. Then, following an introduction to basic statistical concepts, participants will complete hands-on exercises to perform basic statistical analysis using the R platform.

#### Credits

-

### BIOF 101

Introductory Coding Skills

Computer programming has become an essential part of many different careers, and programming skills are extremely advantageous when applying for educational or employment opportunities. This course will teach students the programming skills needed for success in academic and industrial settings. Students will first learn the basic components of programming, such as variables, conditionals, loops, object-oriented programming, and simple data structures. Students will also be taught how to properly prepare a computer for efficient and effective programming. Students will then apply basic programming principles by using the Python programming language to complete assignments that reflect real-world tasks. Finally, students will put together a plan for further programming education.

#### Credits

1

#### Prerequisites

#### Learning Objectives:

- Define and understand foundational programming concepts (e.g., algorithms, functions)
- Gain familiarity with key components of Python programming such as variables, conditionals, and loops
- Analyze basic functions written in Python, and use Python to implement basic functions
- Assess the relevant applications of programming in their own work/life

### BIOF 309

Introduction to Python

Python is a free, open-source, and powerful programming language that is easy to learn. This course is intended for non-programmers who want to learn how to write programs that expand the breadth and depth of their daily research. Most elementary concepts in modern software engineering will be covered, including basic syntax, reading from and writing to text files, debugging Python programs, regular expressions, and creating reusable code modules that are distributable to peers. The course will also focus on potential applications of Python to bioinformatics, including sequence analysis, data visualization, and data analysis. Students will also learn to use the Jupyter Notebook and the PyCharm integrated development environment (IDE), which are available at no cost.

**INDIVIDUAL LAPTOP IS NEEDED FOR EACH CLASS.**

*Continuum Analytics Installer Anaconda (V3) will be utilized to install Python and the necessary packages.*

#### Credits

2

#### Learning Objectives:

- Gain a basic understanding of elementary concepts ubiquitous in modern software engineering: regular expressions, reading from and writing to text files, and recursion
- Apply Python to important functions in bioinformatics such as sequence analysis, data analysis and data visualization

- Learn how to obtain and rework an existing script to meet current needs
- Gain experience in two programming environments (Jupyter Notebook and PyCharm IDE)

### BIOF 339

Practical R

The goal of this course is to introduce biomedical research scientists to R as an analysis platform rather than a programming language. Throughout the course, emphasis will be placed on example-driven learning. Topics to be covered include: installation of R and R packages; command line R; R data types; loading data in R; manipulating data; exploring data through visualization; statistical tests; correcting for multiple comparisons; building models; and, generating publication-quality graphics. No prior programming experience is required.

**INDIVIDUAL LAPTOP IS NEEDED FOR EACH CLASS.**

#### Credits

2

#### Learning Objectives:

- Run R GUI and make use of command line features, including command history and help pages
- Find and make use of the extensive libraries (R add-ons) available for analyzing biological and other forms of data
- Load, manipulate, and combine data to make it amenable to further analyses
- Visualize data with extensive graphics capabilities of R (including ggplot)
- Use appropriate statistical tests on data within R that will conform to standards expected in scientific journals

### BIOF 395

Introduction to Text Mining

Between Electronic Medical Records and Electronic Health Records, PubMed, and collections of biomedical grant applications, there exist large quantities of medical information stored in databases waiting to be explored. Besides tables of numbers, medical records also contain a great amount of free-text paragraphs that are comprehensible to human readers but challenging for computers. Text mining is an interdisciplinary area that primarily combines advances in Natural Language Processing (NLP), Information Retrieval (IR), and Machine Learning (ML) to help computers understand human-written language and thus extract medical and clinical information from free-text records. This class aims to introduce fundamental subjects in text mining such as tokenization, named entity recognition (NER), grammars, parsing, relation extraction, and document classification. The class is oriented towards hands-on experience with Python and the Natural Language Toolkit (NLTK).

#### Credits

2

#### Prerequisites

Prior exposure to programming and Python is encouraged but not required to attend this class.

#### Learning Objectives:

- Learn basic programming in Python
- Master fundamental building blocks of Natural Language Processing
- Acquire hands-on experience with NLTK, a Python toolkit for NLP
- Gain an introduction to statistical models of Machine Learning applied to NLP and IR

### BIOF 398

Practical Deep Learning

Deep learning (DL) is emerging as a major disruptive technology in biomedical and clinical research. It is also a skill that will be in high demand in the decade to come. This course teaches the foundations needed to understand how neural networks work and introduces the latest developments. You will build your own neural networks and gain the skills to apply deep learning to your field. The course consists of a set of lectures over 7 weeks. Course videos and assignments will be released every week to cover basic and advanced topics in deep learning. The assignments consist of multiple-choice questions, short-answer questions, and coding problems. We will start with the basics of neural networks, introducing the loss function, optimization, and how to set up and manage a training session. The next section covers convolutional neural networks for imaging and vision tasks. We will then study recurrent neural networks (RNNs) for sequence data, followed by the attention mechanism and transformer models (BERT, the GPT family, etc.), which have recently become very popular. We will teach generative models, covering the generative adversarial network (GAN) in detail. Techniques for visualizing neural networks are introduced to help explain how and why they work. The course ends with a focus on handling the "small dataset" use case, since in many practical applications it is not possible to acquire a large labelled dataset. Three techniques are introduced: transfer learning, meta learning, and contrastive learning (a more recent development in self-supervised learning).

#### Credits

2

#### Learning Objectives:

- Introduce the theory of deep learning
- Present in depth how DL models work
- Present widely used DL architectures
- Develop a mindset for machine learning and DL-based problem solving
- Provide practice in building your own model
- Prepare students for DL-related job opportunities

### BIOF 399

Deep Learning for Healthcare Image Analysis

In this course, students will learn how to apply Convolutional Neural Networks (CNNs) to MRI scans to perform a variety of medical tasks and calculations. INDIVIDUAL LAPTOP IS NEEDED FOR EACH CLASS (Mac, Linux, or Windows).

#### Credits

2

#### Prerequisites

Previous programming experience is not required, but is recommended.

#### Learning Objectives:

- Understand how to use popular image classification neural networks for semantic segmentation
- Use the popular R programming language with the deep learning framework MXNet to create a powerful GPU-accelerated convolutional neural network (CNN) solution for quantitative medical image analysis
- Use deep-learning techniques to predict genomic biomarkers from medical image analysis
- Explore other areas of innovation and research
- Get hands-on guidance to try many different deep-learning frameworks

### BIOF 439

Data Visualization with R

This course will demonstrate and practice the use of R in creating and presenting data visualizations. After a short introduction to R tools, especially the tidyverse packages, the course will cover principles for data visualization, examples of good and bad visualizations, and the use of ggplot2 to create static publication-quality graphs. Students will also have the chance to learn about modern web-based interactive graphics using the htmlwidgets packages as well as dynamic graphics and dashboards that can be created using flexdashboard and Shiny. The course will explore ways in which bioinformatics data can be presented using static and dynamic visualizations. Finally, R Markdown and other packages will be used to develop webpages for presenting data visualizations as self-explanatory and possibly interactive storyboards.

#### Credits

1

#### Prerequisites

None; however, BIOF 339 Practical R or an equivalent introductory R course would be useful.

#### Learning Objectives:

- Understand principles of good data visualization to avoid poor or inappropriate data visualization
- Gain knowledge of appropriate use of color, symbols, and small multiples
- Learn about static and dynamic data visualizations, using the web as a presentation medium

### BIOF 440

Data Visualization with Python

This course will demonstrate and practice the use of Python in creating and presenting data visualizations.

#### Credits

1

#### Prerequisites

None; however, the above course or an equivalent introductory Python course would be useful.

#### Learning Objectives:

- Understand principles of good data visualization to avoid poor or inappropriate data visualization
- Gain knowledge of appropriate use of color, symbols, and small multiples
- Learn about static and dynamic data visualizations, using the web as a presentation medium

### BIOF 450

Evolutionary Genomics

An enormously large series of complex and chaotic events has shaped the genomes of eukaryotes, prokaryotes, and viruses. This course will address cutting-edge approaches to the computational investigation of these events, with an eye toward developments in translational systems biology. The course will begin by presenting the fundamentals of evolutionary genomics, including basic properties of genomes and comparative genomics, population genetics, and sequence-structure-function relationships. Experimental design and biological project integration will be a major theme of the course. Specific lectures on statistical analysis, similarity searches, Next Generation Sequencing, epigenomics, and other specialized topics will supplement those given in the earlier part of the course.

#### Credits

2

#### Learning Objectives:

- Perform statistical analysis and display data
- Learn applications of evolutionary genomics, including cancer genomics, evolution of immune systems, and analysis of brain developmental problems
- Apply the skills acquired to complete a computational biology project

### BIOF 475

Introduction to Data Science

Learning from data in order to make useful predictions or obtain insights is a cornerstone of modern science. The goal of this course is to introduce students to the basic tools and workflows for doing this, with a focus on biological and health-related data. In this course, students will learn how to use Python-based tools, particularly NumPy, scikit-learn, pandas, and Matplotlib.

#### Credits

2

#### Prerequisites

Previous programming experience is not required, but is recommended.

#### Learning Objectives:

- Load and clean data
- Choose what type of model (e.g. supervised or unsupervised) to use based on the questions being asked of the data
- Build and validate the chosen model
- Visualize and explain what that model learned from the data

### BIOF 501

Introduction to R: Step-by-Step Guide

R is free statistical software that is becoming increasingly popular and important for data analysis in biology. During the course, students will first learn how to handle the R programming environment. Next, students will learn how to simulate data for analysis, while the background for R programming will be provided in accompanying lectures. At the end of the course, students will be familiar with simple R programming, which they will then be able to apply to their own data analysis.

#### Credits

2

#### Learning Objectives:

- Introduce R programming environments for scientific analysis
- Understand the concepts of basic data structures, such as vectors, matrices, arrays, lists, and data frames
- Introduce data handling and visualization in R
- Understand the concepts of Packages and simple R programming

### BIOF 509

Applied Machine Learning

Machine learning is a computational field that consists of techniques allowing computers to learn from data and make data-driven predictions or decisions. The ability to effectively implement machine learning approaches is a crucial component of data analysis. BIOF 509 provides a comprehensive overview of machine learning concepts, project design, and implementation. The course will give a conceptual overview of the most popular machine learning algorithms with examples of how and when to apply them to datasets. Algorithms that will be covered include support vector machines, decision trees, random forests, multiple clustering approaches, and deep learning. Best practices in designing machine learning projects will also be emphasized, and this course will introduce strategies to avoid common pitfalls and to accurately interpret results. To reinforce key concepts, this course contains 4 written homework assignments and a research project. Through the homework assignments, students will (i) study the theory behind common machine learning algorithms and (ii) explore examples of successful machine learning projects in biomedical research. For the research project, students will use Python machine learning packages (scikit-learn, TensorFlow, PyTorch) to design a multistep pipeline to analyze a dataset of their choice. Students will also be expected to use GitHub to demonstrate proper documentation and version control practices when completing the project.

#### Credits

2

#### Prerequisites

Students should have previously completed BIOF 309 Introduction to Python or have equivalent experience. While the course will include a brief Python refresher, the emphasis of the course will be on applying machine learning.

#### Learning Objectives:

- Choose appropriate machine learning techniques for data analyses and interpret their results
- Design machine learning analysis pipelines properly and avoid common pitfalls
- Complete a short research project using machine learning

### BIOF 510

Advanced Applications of Artificial Intelligence

In the past decade, big data has become increasingly prominent in many fields, including healthcare and biomedical research. These increasingly large datasets pose a unique challenge to researchers. In these cases, nuanced approaches to machine learning are often necessary to extract important information. BIOF 510, a continuation of BIOF 509, will cover advanced applications of popular machine learning algorithms, including support vector machines, random forests, and neural networks. Neural network algorithms that will be covered include multi-layer perceptrons, convolutional neural networks, recurrent neural networks, probabilistic neural networks, and autoencoders. To reinforce key concepts, this course contains 4 written homework assignments and a research project. Through the homework assignments, students will (i) study the theory behind common machine learning algorithms and (ii) explore examples of successful machine learning projects in biomedical research. For the research project, students will use Python machine learning packages (scikit-learn, TensorFlow, PyTorch) to design a multistep pipeline to analyze a dataset of their choice. Students will also be expected to use GitHub to demonstrate proper documentation and version control practices when completing the project.

#### Credits

2

#### Prerequisites

The above course(s) or permission from the instructor.

#### Learning Objectives:

- Choose appropriate machine learning techniques for data analyses and interpret their results
- Design machine learning analysis pipelines properly and avoid common pitfalls
- Complete a short research project using machine learning

### BIOF 518

Theoretical and Applied Bioinformatics I

The objective of this course is to give students an introduction to the theory and practice of a wide range of bioinformatic techniques and applications, enabling them to use these tools in their own research. This course will be divided into five modules: statistical approaches in sequence analysis; phylogenetic analysis of nucleotide and protein sequences; acquisition and analysis of sequence datasets, including EST and RNA-seq data; analysis of genomic datasets from an evolutionary perspective; and prediction of protein secondary structure. Two or three of the five sessions in each module will be divided roughly 60 percent theoretical lecture and 40 percent learning to use relevant computational tools. The final session of each module will be split between a discussion of computational tools, a journal club, and a discussion of work on a project assigned for each module. By the end of the course, students should be able to acquire many types of sequence data, identify orthologous and paralogous genes, predict domains and motifs, identify alternative splicing, analyze genomic/protein alignments, and make a prediction of secondary protein structure from primary sequence.

#### Credits

2

#### Prerequisites

Solid understanding of biology, computer science and mathematics.

#### Learning Objectives:

- Introduce the theory and practice of a wide range of bioinformatic techniques and applications, enabling students to use these tools in their own research
- Perform database searches using BLAST and hidden Markov models
- Predict gene structure and analyze domains and motifs
- Conduct phylogenetic analysis of nucleotide and protein sequences and identify orthologous and paralogous genes
- Analyze genomic and protein alignments and predict secondary protein structure from primary sequence

### BIOF 519

Theoretical and Applied Bioinformatics II

The objective of this course is to give students an introduction to the theory and practice of a wide range of bioinformatic techniques and applications, enabling them to use these tools in their own research. This course will be divided into five modules: statistical approaches in sequence analysis; phylogenetic analysis of nucleotide and protein sequences; acquisition and analysis of sequence datasets, including EST and RNA-seq data; analysis of genomic datasets from an evolutionary perspective; and prediction of protein secondary structure. Two or three of the five sessions in each module will be divided roughly 60 percent theoretical lecture and 40 percent learning to use relevant computational tools. The final session of each module will be split between a discussion of computational tools, a journal club, and a discussion of work on a project assigned for each module. By the end of the course, students should be able to acquire many types of sequence data, identify orthologous and paralogous genes, predict domains and motifs, identify alternative splicing, analyze genomic/protein alignments, and make a prediction of secondary protein structure from primary sequence.

This is the second part of a two-part course. The completion of the first part (prerequisite) is required before taking the second part. Registration is required separately for each part of the course.

#### Credits

2

#### Prerequisites

The above course(s) or permission from the instructor. Solid understanding of biology, computer science and mathematics.

#### Learning Objectives:

- Introduce the theory and practice of a wide range of bioinformatic techniques and applications, enabling students to use these tools in their own research
- Perform database searches using BLAST and hidden Markov models
- Predict gene structure and analyze domains and motifs
- Conduct phylogenetic analysis of nucleotide and protein sequences and identify orthologous and paralogous genes
- Analyze genomic and protein alignments and predict secondary protein structure from primary sequence

### BIOF 521

Bioinformatics for Analysis of Data Generated by Next Generation Sequencing

In this course, students will learn to analyze data generated by a variety of sequencing techniques (such as DNA-seq, RNA-seq, and ChIP-seq), particularly in relation to biomedical applications (such as analysis of gene expression and identification of medically relevant sequence variation). While recorded lectures and readings will provide necessary background, the course emphasizes hands-on, self-paced lessons featuring real-world data sets to give the learner experience with all major steps of sequencing analyses, from filtering of raw data to creating polished figures. As the course progresses, students will work on a term project in which they design a sequencing project based on their own research interests. To make this course accessible to all students, we will focus on the use of publicly available resources, such as the NCBI SRA and the Galaxy platform, that can be accessed from anywhere.

#### Credits

2

#### Learning Objectives:

Students in the course will:

- Compare and contrast a variety of modern sequencing techniques and their applications.
- Utilize and compare several platforms for the analysis of sequencing data.
- Carry out bioinformatics analyses on biomedically relevant sequencing data sets.
- Interpret the results of these analyses by generating figures and written summaries.
- Develop a sequencing and analysis plan for a project relevant to their own research interests.

### BIOF 540

Gene Expression Analysis

The gene expression programs that instantiate eukaryotic cell states are complex and dynamic, but ultimately essential to understanding development, homeostasis, real-time environmental adaptation, and cellular dysregulation. This course will aim to equip students with a broad range of tools for analyzing gene expression and elucidating the regulatory influences affecting it. By the end, students will have an appreciation for the many layers of expression regulation and a familiarity with common methods for analyzing gene expression and its regulation. This will enable them to interpret such results in the literature and to choose the right tool for answering their own gene expression-related research questions in the future.

#### Credits

2

#### Learning Objectives:

- Develop an understanding of the many layers of regulation influencing gene expression
- Become familiar with common gene expression measurement methods and know how to choose the right one for the job
- Be able to perform differential gene expression analyses, and identify and use gene expression signatures
- Know how to find genomic regulatory elements that may influence a gene’s expression
- Appreciate gene expression in the context of functional pathways and dynamic gene regulatory networks/programs

### MATH 127

Elementary Calculus I, part 1

This is a first course in calculus and is aimed at students of diverse backgrounds who have previously not taken any formal course on the subject. The course includes a brief review of pre-calculus topics, including functions and algebra, and then moves on to computations using infinity and beyond: infinitesimal quantities, differentials, infinite sequences, and whether it is possible to divide by zero. Scientific applications and achievements will motivate the exploration of the essential single-variable calculus concepts of limits, derivatives, and integrals.

This is the first part of a two-part course. Registration is required separately for each part of the course.

#### Credits

2

#### Prerequisites

A pre-calculus course (including online) is recommended, but not required. Knowledge of trigonometry, basic algebra, and graphing is required.

#### Learning Objectives:

- Understand the concept of functions, their limits, and continuity
- Become familiar with differentiation and integration techniques of single-variable functions
- Introduce applications of calculus to scientific research

### MATH 128

Elementary Calculus I, part 2

This is a first course in calculus and is aimed at students of diverse backgrounds who have previously not taken any formal course on the subject. The course includes a brief review of pre-calculus topics, including functions and algebra, and then moves on to computations using infinity and beyond: infinitesimal quantities, differentials, infinite sequences, and whether it is possible to divide by zero. Scientific applications and achievements will motivate the exploration of the essential single-variable calculus concepts of limits, derivatives, and integrals.

This is the second part of a two-part course. The completion of the first part (MATH 127) is required before taking the second part. Registration is required separately for each part of the course.

#### Credits

2

#### Prerequisites

The above course(s) or permission from the instructor. A pre-calculus course (including online) is recommended, but not required. Knowledge of trigonometry, basic algebra, and graphing is required.

#### Learning Objectives:

- Understand the concept of functions, their limits, and continuity
- Become familiar with differentiation and integration techniques of single-variable functions
- Introduce applications of calculus to scientific research

### MATH 129

Elementary Calculus II, part 1

This course is a continuation of MATH 127 and is focused on multivariable and vector calculus. It covers calculus of curves in space, vector functions, functions of more than one variable, and introduces vector calculus. Applications of this more general description of calculus to scientific research will also be presented.

This is the first part of a two-part course. Registration is required separately for each part of the course.

#### Credits

2

#### Prerequisites

The above course(s) or permission from the instructor.

#### Learning Objectives:

- Understand how to describe curves in space and apply calculus to parametric functions
- Understand functions of more than one variable, partial derivatives and multiple integrals
- Become acquainted with vector calculus

### MATH 130

Elementary Calculus II, part 2

This course is a continuation of MATH 127 and is focused on multivariable and vector calculus. It covers calculus of curves in space, vector functions, functions of more than one variable, and introduces vector calculus. Applications of this more general description of calculus to scientific research will also be presented.

This is the second part of a two-part course. The completion of the first part (MATH 129) is required before taking the second part. Registration is required separately for each part of the course.

#### Credits

2

#### Prerequisites

The above course(s) or permission from the instructor.

#### Learning Objectives:

- Understand how to describe curves in space and apply calculus to parametric functions
- Understand functions of more than one variable, partial derivatives and multiple integrals
- Become acquainted with vector calculus

### MATH 215

Introduction to Linear Algebra With Applications in Statistics, part 1

This is a first course in linear algebra, aimed at students with diverse backgrounds. It covers the content of a standard textbook: linear systems, vectors and matrices, dimensions and bases of vector spaces, eigenvalues and eigenvectors, and singular value decomposition. It is also dedicated to explaining applications of these linear algebra concepts in classic analysis methods as well as state-of-the-art statistical inference and machine learning approaches. In this applications portion of the course, we will strive to tailor the content to the interests and research needs of the students.

This is the first part of a two-part course. Registration is required separately for each part of the course.

#### Credits

2

#### Prerequisites

One semester of analytic geometry or calculus is recommended, but not required. Basic knowledge of vectors, Cartesian coordinates, and algebra is required.

#### Learning Objectives:

- Understand systems of linear equations and their matrix representation
- Learn the concept of vector spaces, subspaces, and linear dependence
- Learn spectral methods for analyzing matrices
- Understand statistical methods based on linear models

### MATH 216

Introduction to Linear Algebra With Applications in Statistics, part 2

This is a first course in linear algebra, aimed at students with diverse backgrounds. It covers the content of a standard textbook: linear systems, vectors and matrices, dimensions and bases of vector spaces, eigenvalues and eigenvectors, and singular value decomposition. It is also dedicated to explaining applications of these linear algebra concepts in classic analysis methods as well as state-of-the-art statistical inference and machine learning approaches. In this applications portion of the course, we will strive to tailor the content to the interests and research needs of the students.

This is the second part of a two-part course. The completion of the first part (MATH 215) is required before taking the second part. Registration is required separately for each part of the course.

#### Credits

2

#### Prerequisites

The above course(s) or permission from the instructor. One semester of analytic geometry or calculus is recommended, but not required. Basic knowledge of vectors, Cartesian coordinates, and algebra is required.

#### Learning Objectives:

- Understand systems of linear equations and their matrix representation
- Learn the concept of vector spaces, subspaces, and linear dependence
- Learn spectral methods for analyzing matrices
- Understand statistical methods based on linear models

### STAT 201

Experimental Statistics I, part 1

This course introduces statistical concepts and essential techniques that are frequently used in biomedical data analysis. The emphasis will be equally divided between solid understanding of basic principles and their applications. R software is introduced and used for demonstration throughout the course. Topics covered in the second semester: test of statistical hypothesis; one- and two-sample tests; power and sample size calculation; analysis of variance (ANOVA); nonparametric tests; linear regression; analysis of categorical data; permutation and bootstrap; data analysis using R.

This is the first part of a two-part course. Registration is required separately for each part of the course.

#### Credits

2

#### Prerequisites

Working knowledge of Algebra II and one semester of Calculus is preferred.

#### Learning Objectives:

- Understand basic principles of probability and statistics
- Use appropriate statistical tools to analyze data for research

### STAT 202

Experimental Statistics I, part 2

This course introduces statistical concepts and essential techniques that are frequently used in biomedical data analysis. The emphasis will be equally divided between solid understanding of basic principles and their applications. R software is introduced and used for demonstration throughout the course. Topics covered in the second semester: test of statistical hypothesis; one- and two-sample tests; power and sample size calculation; analysis of variance (ANOVA); nonparametric tests; linear regression; analysis of categorical data; permutation and bootstrap; data analysis using R.

This is the second part of a two-part course. The completion of the first part (prerequisite) is required before taking the second part. Registration is required separately for each part of the course.

#### Credits

2

#### Prerequisites

#### Learning Objectives:

- Understand basic principles of probability and statistics
- Use appropriate statistical tools to analyze data for research

### STAT 203

Experimental Statistics II, part 1

This course introduces statistical concepts and essential techniques that are frequently used in biomedical data analysis. The emphasis will be equally divided between solid understanding of basic principles and their applications. R software is introduced and used for demonstration throughout the course. Topics covered in the second semester: test of statistical hypothesis; one- and two-sample tests; power and sample size calculation; analysis of variance (ANOVA); nonparametric tests; linear regression; analysis of categorical data; permutation and bootstrap; data analysis using R.

This is the first part of a two-part course. Registration is required separately for each part of the course.

#### Credits

2

#### Prerequisites

The above course(s) or permission from the instructor. Working knowledge of Algebra II and one semester of Calculus is preferred.

#### Learning Objectives:

- Understand basic principles of probability and statistics
- Use appropriate statistical tools to analyze data for research

### STAT 204

Experimental Statistics II, part 2

This is the second part of a two-part course. The completion of the first part (STAT 203) is required before taking the second part. Registration is required separately for each part of the course.

#### Credits

2

#### Prerequisites

The above course(s) or permission from the instructor. Working knowledge of Algebra II and one semester of Calculus is preferred.

#### Learning Objectives:

- Understand basic principles of probability and statistics
- Use appropriate statistical tools to analyze data for research

### STAT 321

Methodology in Clinical Trials

The objective of this course is to learn the concepts and methodology used in the design and conduct of randomized clinical trials. Topics to be covered will include the description of main types of trial designs, principles of randomization and stratification, issues in protocol development (defining objectives and endpoints, blinding, choice of control), recruitment and retention, data collection and quality control issues, monitoring, and analyses of trial reports. Textbook material will be frequently supplemented by material from the literature. Guest lecturers will give lectures on power and sample size calculations, life table analysis, quality of life, and cost evaluation. Examples from the cardiovascular, pulmonary, and cancer areas will be used when appropriate. The course is intended for biomedical researchers desiring exposure to the clinical-trial field. A minimum of 10 students must register for this course to run.

#### Credits

3

#### Learning Objectives:

- Acquire a fundamental understanding of methodological principles and concepts in clinical trials
- Describe essential elements of clinical trials and use this knowledge to contribute to the successful conduct of a clinical trial
- Critically read clinical trials literature

### STAT 325

Epidemiologic Research Methods

The objective of this course is to provide a deeper understanding of epidemiologic research methodology that can be used to interpret critically the results of epidemiologic research. This understanding will result from investigating conceptual models for study designs, disease frequency, measures of association and impact, imprecision, bias, and effect modification. The course will emphasize the interpretation of research, even when the design or execution of the respective research is less than ideal.

#### Credits

3

#### Prerequisites

STAT 200 or STAT 500 and STAT 317.

#### Learning Objectives:

- Be able to distinguish design options in the conduct of epidemiologic research
- Learn about choices for measures of disease frequency, association, and impact
- Understand the origin of selection, information, and confounding biases, and their effect on research results
- Know the origin of imprecision and its effect on research results
- Recognize the origin of effect modification and its effect on research results

### STAT 330

Introduction to SAS

The course will cover the fundamentals of the SAS program and its variables, creating data, importing data (from text and Excel files), exporting data (to text, pdf, and Microsoft-related formats), manipulating data, and providing descriptive statistics. Students will have the opportunity to practice in class, using sample datasets. Homework and project assignments will be provided as well.

#### Credits

2

#### Prerequisites

Basic understanding of Microsoft Excel; prior programming experience and basic knowledge of statistics (e.g., mean vs. median vs. mode) would be beneficial, but are not required.

#### Learning Objectives:

- Recognize different types of raw data and learn how to import them into SAS
- Understand different types of variables as well as how to manipulate and convert between them
- Understand how to set up and conduct merging and transposing of data tables
- Obtain descriptive statistics such as mean, median, min, and max
- Generate reports and output them to a variety of file types

### STAT 430

Advanced SAS

The course will cover advanced SAS coding concepts such as the use of SAS Macro, SAS SQL, as well as a combination of both. The course will also introduce students to SAS STAT coding for common statistical tests (such as t-test, ANOVA, linear regression, and others). Students will have the opportunity to practice in class, using sample datasets. Homework and project assignments will be provided as well.

#### Credits

2

#### Prerequisites

STAT 330 Introduction to SAS or equivalent at another college/university.

#### Learning Objectives:

- Understand the principles of Macro variables and Macro functions
- Become proficient with writing Macro coding for new programs and adding Macro coding to existing programs
- Understand how to create tables using SAS SQL with a variety of conditions
- Combine knowledge of STAT 330 concepts with SAS SQL and SAS Macro to solve complex data issues
- Use SAS STAT to perform statistical tests (t-test, ANOVA, correlation, linear regression, Chi-Squared, logistic regression)

### STAT 500

Statistics for Biomedical Scientists I, part 1

The objective of this course is to provide an overview of statistics for biomedical researchers and clinicians who are interested in the interpretation of the results of statistical analyses. This is a series of integrated lectures, readings, and exercises on analysis and interpretation of medical research data using Excel. Emphasis is on ideas and understanding rather than mechanics. Topics covered include the foundation of statistical logic, interpretation of the most commonly encountered statistical procedures in medical research, and selection of an appropriate method to analyze a particular set of data. The second semester expands on the material covered in the first semester.

This is the first part of a two-part course. Registration is required separately for each part of the course.

#### Credits

2

#### Learning Objectives:

- Understand the role of chance in biomedical research
- Become knowledgeable about processes of estimation and statistical inference
- Learn about the statistical methods most often used in biomedical research
- Select an appropriate statistical approach to analyze a set of biomedical research data
- Use Excel to analyze biomedical research data

### STAT 501

Statistics for Biomedical Scientists I, part 2

The objective of this course is to provide an overview of statistics for biomedical researchers and clinicians who are interested in the interpretation of the results of statistical analyses. This is a series of integrated lectures, readings, and exercises on analysis and interpretation of medical research data using Excel. Emphasis is on ideas and understanding rather than mechanics. Topics covered include the foundation of statistical logic, interpretation of the most commonly encountered statistical procedures in medical research, and selection of an appropriate method to analyze a particular set of data. The second semester expands on the material covered in the first semester.

This is the second part of a two-part course. The completion of the first part (STAT 500) is required before taking the second part. Registration is required separately for each part of the course.

#### Credits

2

#### Prerequisites

The above course(s) or permission from the instructor.

#### Learning Objectives:

- Understand the role of chance in biomedical research
- Become knowledgeable about processes of estimation and statistical inference
- Learn about the statistical methods most often used in biomedical research
- Select an appropriate statistical approach to analyze a set of biomedical research data
- Use Excel to analyze biomedical research data

### STAT 502

Statistics for Biomedical Scientists II, part 1

The objective of this course is to provide an overview of statistics for biomedical researchers and clinicians who are interested in the interpretation of the results of statistical analyses. This is a series of integrated lectures, readings, and exercises on analysis and interpretation of medical research data using Excel. Emphasis is on ideas and understanding rather than mechanics. Topics covered include the foundation of statistical logic, interpretation of the most commonly encountered statistical procedures in medical research, and selection of an appropriate method to analyze a particular set of data. Those who will be routinely engaged in computing statistical procedures should consider STAT 200.

#### Credits

2

#### Prerequisites

The above course(s) or permission from the instructor.

#### Learning Objectives:

- Learn the statistical aspects of planning and executing biomedical research
- Know the assumptions of statistical methods, how to evaluate them, and how to respond to concerns
- Learn more complicated statistical methods than those presented in STAT 500 I
- Be able to build multivariable models and learn how they contribute to causal inference

### STAT 503

Statistics for Biomedical Scientists II, part 2

The objective of this course is to provide an overview of statistics for biomedical researchers and clinicians who are interested in the interpretation of the results of statistical analyses. This is a series of integrated lectures, readings, and exercises on analysis and interpretation of medical research data using Excel. Emphasis is on ideas and understanding rather than mechanics. Topics covered include the foundation of statistical logic, interpretation of the most commonly encountered statistical procedures in medical research, and selection of an appropriate method to analyze a particular set of data. Those who will be routinely engaged in computing statistical procedures should consider STAT 200.

This is the second part of a two-part course. The completion of the first part (STAT 502) is required before taking the second part. Registration is required separately for each part of the course.

#### Credits

2

#### Prerequisites

STAT 502 or permission from the instructor.

#### Learning Objectives:

- Learn the statistical aspects of planning and executing biomedical research
- Know the assumptions of statistical methods, how to evaluate them, and how to respond to concerns
- Learn more complicated statistical methods than those presented in STAT 500 I
- Be able to build multivariable models and learn how they contribute to causal inference

### STAT 510

Statistics for Healthcare Providers

This seminar course provides a non-mathematical review of statistical tests commonly encountered in medical research. Readings and class discussions will focus on understanding and interpreting the results from studies, as well as critiquing observational and experimental studies. The target audiences for this course are clinicians, fellows who are participating in journal club/literature review activities, and researchers who want to strengthen skills in interpreting statistical tests.

#### Learning Objectives:

- Describe observational and experimental study designs commonly encountered in biomedical research.
- Discuss applications, strengths, and limitations of various statistical tests commonly seen in the medical literature.
- Interpret point estimates in terms of magnitude, precision, and statistical significance.
- Identify factors that influence sample size and discuss how these factors ultimately influence point estimates.
- Evaluate and critique a study in terms of appropriate design, application of statistical methods, and interpretation of results.

### STAT 515

Statistics for Biomedical Researchers

Statistical analyses are a fundamental component of experimental design in many biomedical research fields. Particularly when working with large, messy data, a sound understanding of statistics is essential to performing valid analyses. This course will build on students' existing knowledge of statistics to help them expand their analysis toolkits and will cover topics including modeling, bootstrapping, simulations, imputation, and basic machine learning. Students will attend lectures to gain a theoretical understanding of each topic before applying the concepts through practice problems and projects using the R programming language. To be successful, students should have basic proficiency in R and in simple statistical analyses before enrolling in this course.

#### Credits

2

#### Learning Objectives:

By the end of this course, students should be able to:

- Fit and interpret multiple regression models including interaction and non-linear terms
- Fit and interpret logistic regression models
- Evaluate model fit to perform model selection
- Perform bootstrapping and simulation analyses to quantify statistical confidence
- Fit basic machine learning models
- Understand the strengths and weaknesses of different machine learning models