### AS BIOF

Advanced Studies in Bioinformatics and Data Science

The FAES Academic Programs at NIH offers a unique Advanced Studies in Bioinformatics and Data Science program to serve the quickly evolving needs of today’s biomedical research community. As one of the most dynamic fields at the intersection of biology and computer science, bioinformatics and its data analysis tools equip life sciences researchers and professionals with skills that are in high demand in the pharmaceutical and biotechnology industries. Courses are offered in the evenings, making it convenient for working professionals and postgraduate fellows to gain expertise and experience in the theoretical foundations and practical skills required to harvest the wealth of information contained in vast amounts of biological data. The courses have been designed to train today’s biomedical researchers in new methods and techniques in data science and prepare them to translate and analyze the immensity of biological data.

### General Requirements

The program is designed for participants who hold an advanced degree in life sciences or STEM fields.

The Advanced Studies program comprises a 14-credit curriculum of required and elective courses. Courses are held in the evenings to fit the needs of working professionals and postgraduate fellows.

### Required Courses

BIOF 309 | Introduction to Python

BIOF 518 | Theoretical and Applied Bioinformatics I

BIOF 519 | Theoretical and Applied Bioinformatics II

BIOF 521 | Bioinformatics for Analysis of Next Generation Sequencing

### Electives

BIOF 339 | Practical R

BIOF 395 | Introduction to Text Mining

BIOF 450 | Bioinformatics, Evolutionary Genomics, and Computational Biology

BIOF 475 | Introduction to New Technologies in Data Science

BIOF 501 | Introduction to R: Step-by-Step Guide

BIOF 509 | Applied Machine Learning

STAT 500 I | Statistics for Biomedical Scientists I

STAT 500 II | Statistics for Biomedical Scientists II

#### Credits

14

#### Learning Objectives:

Upon completion, students will be able to:

- Effectively use different techniques to analyze biological data from high-throughput approaches
- Perform statistical analysis and visualization of biological data
- Apply bioinformatics techniques for analysis of genomic, expression and proteomic data
- Understand the uses and limitations of bioinformatics data analysis tools and technologies
- Describe how computational methods are used in new applications in basic biology and how they translate into the development of new drugs and diagnostic tools

### BIOF 017

Introductory R Boot Camp

#### Credits

-

#### Learning Objectives:

- Use RStudio and the tidyverse suite of packages to load and work with datasets
- Describe what makes data “tidy,” why tidy data is useful to work with, and how to make datasets tidy
- Transform data to prepare it for visualization and analysis
- Describe the “Grammar of Graphics” philosophy that underlies the ggplot2 visualization package
- Create visualizations including barplots, scatterplots, line graphs, and more using ggplot2
- Customize all aspects of visualizations
- Save and export visualizations for print, submission to journals, and other applications
- Use visualizations to explore and identify patterns in data
- Identify and handle missing data
- Identify covariation in variables
- Build simple linear models to identify relationships between variables

### BIOF 018

Intermediate R Boot Camp

This workshop builds upon the principles of using R for data science by introducing intermediate concepts that will help learners advance their knowledge and use R for more complex tasks. These tasks include working with APIs and packages to access data on remote servers, iterating tasks over datasets, and writing custom functions.

#### Credits

-

#### Prerequisites

Or adequate familiarity with R.

#### Learning Objectives:

- Use the apply family to repeat functions over multiple data objects
- Use if/else statements for conditional functions
- Use case_when to vectorize multiple if/else statements
- Use for loops to repeat functions
- Understand how to use tidyverse “verbs” to wrangle data
- Understand how and why to convert data from wide to long format
- Summarize and transform data using tidyverse verbs
- Understand when to create a custom function
- Write custom functions to carry out complex tasks
- Troubleshoot and debug functions

### BIOF 019

Designing Effective Data Visualizations in R

#### Credits

-

#### Prerequisites

The above workshop or adequate knowledge of coding in R.

#### Learning Objectives:

- Understand how to use principles of human visual perception to create effective visualizations
- Describe elements of design such as line, shape, value, texture, and space and understand how to effectively use them in visualizations
- Use color to convey meaning, including using color-blind friendly palettes
- Describe the “Grammar of Graphics” philosophy that underlies the ggplot2 visualization package
- Create visualizations including barplots, scatterplots, line graphs, and more using ggplot2
- Customize all aspects of visualizations
- Save and export visualizations for print, submission to journals, and other applications
- Understand how to use the UI and server functions to create Shiny objects
- Create visualizations that change based on user input
- Build simple web apps incorporating visualizations

### BIOF 020

Python for Beginners

#### Credits

-

#### Prerequisites

Basic computer skills, including how to quickly find specific directories and files.

#### Learning Objectives:

### BIOF 021

R for Analysis of Text Data

This workshop will provide an introduction to working with text data in R and explore various approaches to analyzing text data. The first session will cover principles for wrangling text data as well as some basic text mining applications. The subsequent two sessions will delve into specific techniques to enable automated analysis of text data.

#### Credits

-

#### Prerequisites

Any of the above courses and workshops or basic familiarity with R.

#### Learning Objectives:

- Read text data into R and prepare it for analysis
- Understand and select from various options in preparing text, such as stemming, lemmatization, term frequency weighting, term frequency-inverse document frequency weighting (tf-idf), and tokenization
- Conduct simple text mining to explore content of a text corpus
- Describe how unsupervised approaches can be used to identify clusters of related documents
- Process text data to prepare for unsupervised analysis
- Build, train, and evaluate models for text clustering
- Interpret outputs of clustering algorithms
- Describe how supervised approaches can be used to develop text-based models for multi-class classification
- Process text data to prepare for supervised analysis
- Build, train, and test models for text classification

### BIOF 043

For True BeginRs | Hands-on R Training

R is a free, cross-platform – Windows, Mac, and Linux – programming language, designed specifically to facilitate data management, analysis, and visualization. Boasting vibrant development and support communities, R has become an indispensable tool for bioinformaticians, statisticians, and data scientists. Created with true beginRs in mind, this training will teach participants the fundamental, transferable skills needed to unleash R’s full potential for producing publication-worthy analyses and visualizations.

#### Credits

-

#### Prerequisites

Participants should be comfortable with basic computer skills.

#### Learning Objectives:

- Interfacing with R using RStudio
- Using RStudio’s built-in help function – ? – as well as resources for troubleshooting, including rdocumentation.org, cheat sheets, vignettes, YouTube channels, and stackexchange.com
- Creating project files
- Working with the RStudio command line
- Identifying and changing the current working file directory
- Variables – local vs. global – naming conventions, and assignment operators
- Writing their first R script and how to properly document their code via commenting
- Using the ‘$’ accessor function
- The most common data types, including character strings, numerics, integers, and logicals
- How to access data entries using [] and [[ ]]
- The most common data structure types, including vectors, lists, factors, data frames, and tibbles
- Package libraries and how to install them
- Loading data into R and basic troubleshooting when importing data
- Data management, manipulation, subsetting, piping, and exploration using dplyr
- Creating and exporting highly customizable, publication-quality data visualizations with ggplot2
- Using R to perform statistical analyses, including simple linear regression, χ2 contingency table analysis, t-tests, and analysis of variance

### BIOF 045

Bioinformatic Analysis of Next Generation Sequencing (NGS) Data

Next generation sequencing technologies are producing enormous amounts of sequencing data. Analyzing this massive amount of data requires the ability to use sophisticated tools and techniques.

This workshop introduces participants to bioinformatics tools and methods for analyzing next generation sequencing data, particularly for DNA-seq (variant analysis), RNA-seq (transcriptome analysis), ChIP-seq (transcription factor binding analysis), and network-based integration of NGS data.

#### Credits

-

### BIOF 052

Artificial Intelligence in Your Lab

Artificial intelligence (AI) in biomedical research has grown exponentially in the past decade. AI can be used to uncover powerful new insights in data that your lab is already collecting. This workshop has two primary components. First, participants will engage in discussions that cover recent advances in artificial intelligence (AI) and how these developments can be used in biomedical research. Topics will include active learning, adversarial learning, Bayesian deep learning, reinforcement learning, semi-supervised learning, self-supervised learning, and transfer learning. These topics will be covered in an integrated manner: the discussions will explore how different facets of AI can interact with each other to generate high-quality results. Second, participants will work with the instructor to design and implement AI project(s). These projects will have direct relevance to the research being done by each participant. This workshop will have 1 day of discussion followed by 1 week of offline work where the participants communicate directly with the instructors about project development.

#### Credits

-

### BIOF 074

Advanced Transcriptomics (RNA-Seq) Analysis

This workshop introduces advanced RNA-Seq analysis techniques and tools for detecting SNPs, fusion genes, allele-specific expression, and circular RNAs, as well as for viral/bacterial sequence identification, alternative polyadenylation analysis, and transcriptional regulatory network analysis.

#### Credits

-

#### Prerequisites

*Experience in next generation sequencing data analysis.*

### BIOF 075

Metagenomics Data Analysis

Metagenomics is gaining importance due to low-cost next generation sequencing technologies. This workshop introduces end-to-end solutions for analyzing metagenomic data, including data-quality analysis, alignment, community profiling, taxonomic comparison, and novel taxa discovery.

#### Credits

-

### BIOF 076

Creating Plots, Graphs, and Maps Using R

R is the industry standard for creating specific graphs and plots. This workshop walks participants through creating interactive, static, and shareable plots using popular R packages. The workshop will cover formatting data, loading data, setting parameters, creating images, and saving outputs.

#### Credits

-

### BIOF 077

Molecular Modeling and Molecular Dynamics

Predicting the effect of a mutation on the structure and function of a protein is not just for researchers with computer facilities. Users with a basic molecular biology background can set up and run intensive computational modeling and dynamics experiments. In this workshop, participants will use open-source tools and techniques to conduct molecular modeling and dynamics experiments.

#### Credits

-

### BIOF 079

Variant Analysis

Next generation sequencing technologies have made genotyping a day-to-day research and diagnostic tool. Genotyping has come all the way from bench to bedside. Genetic variants are being used in personalized medicine to identify susceptibility genes, common disease variants, and mutations relevant for diagnosis and therapy.

This workshop will cover the use of popular open-source tools and techniques necessary for analyzing variants, starting from quality control of raw data. In addition to alignment, variant calling, and annotation, this workshop will walk participants through several advanced variant analysis methods and techniques.

#### Credits

-

### BIOF 082

Bioinformatics for Beginners

Bioinformatics (Computational Biology) is a must-have skill in every modern biomedical research lab. Installing and configuring a wide variety of computational biology tools is a cumbersome task that requires software engineering skills.

This workshop provides an introduction to basic concepts in using popular tools and techniques for sequence analysis, structure analysis, function prediction, biological database searching, “omics” data analysis, pathway analysis, data visualization, data curation and integration, and scripting basics.

#### Credits

-

### BIOF 084

Pharmacometric Dose-Response Analyses in Clinical Trials using R

#### Credits

-

### BIOF 085

Intro to Data Science with Python

Scientists generate more data than ever before. It can be daunting to determine how to extract insights from a mountain of data. Data science is a relatively new discipline that combines traditional statistics and analytics with programming to produce novel insights, intelligently automated processes, and data-driven decisions.

This course will equip you with everything you need to complete a basic data science project using Python from beginning to end. Participants will be exposed to a practical, real-world use case, which will be built on throughout the course. Students will use the data from this use case to perform exploratory analysis and build their skills up to advanced analytics.

#### Credits

-

### BIOF 087

Programming for Biomedical Researchers

Computer programs are meant to perform repeated, monotonous tasks quickly and reproducibly, handling any amount of data. Researchers often come across situations where existing programs don't suit their needs. In the era of big data, researchers who cannot quickly put together a program to solve their problem face a roadblock that cannot be cleared efficiently by hand. This training will walk participants through writing programs that help them solve their own problems.

#### Credits

-

### BIOF 089

Microbiome Bioinformatics with QIIME2

#### Credits

-

### BIOF 090

MATLAB Fundamentals

MATLAB® is a programming platform designed for scientists and engineers. This course provides a comprehensive introduction to the MATLAB® technical computing environment. No prior programming experience or knowledge of MATLAB is assumed. Themes of data analysis, visualization, modeling, and programming are explored throughout the course.

#### Credits

-

### BIOF 091

Image Processing and Computer Vision with MATLAB

This course provides hands-on experience with performing image analysis. Examples and exercises demonstrate the use of appropriate MATLAB® and Image Processing Toolbox™ functionality throughout the analysis process. The course also provides hands-on experience with performing computer vision tasks. Examples and exercises demonstrate the use of appropriate MATLAB® and Computer Vision System Toolbox™ functionality.

#### Credits

-

#### Prerequisites

*BIOF 090 or equivalent MATLAB experience.*

### BIOF 096

Rosetta for Molecular Modeling and Design

Rosetta is a set of tools used for protein structure prediction and design. Rosetta is capable of predicting structures either with or without prior knowledge. Apart from large- and small-molecule docking, Rosetta is also popular for designing novel proteins and peptides.

The course will go over the capabilities of Rosetta, followed by a hands-on walk-through of protein structure prediction, docking, and protein design.

#### Credits

-

### BIOF 097

Practical Scientific Statistics

As big data becomes the norm and experiments continue to increase in scale, proper understanding and use of statistics is becoming increasingly important for scientists in every field. While experimental researchers are expert in concepts related to their respective fields and receive extensive scientific education, statistical training is relatively lacking. As a result, experimental researchers may feel overwhelmed or uncertain about how to correctly use statistics to quantify their experimental results and how to properly interpret the results of those statistical tests. Unfortunately, this knowledge gap can result in both reduced understanding of reported results in scientific publications as well as superficial or potentially inaccurate reported statistics. This course serves as a practical, hands-on workshop to close the knowledge gap and help experimental researchers learn how to choose a statistical test for their data, how to perform those tests, and how to interpret the results. The workshop starts by establishing a solid foundation in basic statistical theory before advancing to practical applications of statistical tests on real data.

#### Credits

-

#### Prerequisites

Attendees should have access to and basic knowledge of Excel. Analyses will also be demonstrated in SPSS and R, but no formal programming skills are required.

#### Learning Objectives:


### BIOF 098

Statistical Analysis Using R

R is a popular platform for statistical analysis. This workshop introduces how to perform basic statistical analysis using the R platform. First, participants will learn to use the R and RStudio software platforms. Following an introduction to basic statistical concepts, participants will do hands-on exercises to perform basic statistical analysis using the R platform.

#### Credits

-

### BIOF 101

Introductory Coding Skills

Computer programming has become an essential part of many different careers, and programming skills are extremely advantageous when applying for educational or employment opportunities. This course will teach students the programming skills needed for success in academic and industrial settings. Students will first learn the basic components of programming, such as variables, conditionals, loops, object-oriented programming, and simple data structures. Students will also be taught how to properly prepare a computer for efficient and effective programming. Students will then apply basic programming principles by using the Python programming language to complete assignments that reflect real-world tasks. Finally, students will put together a plan for further programming education.

#### Credits

0.5

### BIOF 309

Introduction to Python

Python is a free, open-source, and powerful programming language that is easy to learn. This course is intended for non-programmers who want to learn how to write programs that expand the breadth and depth of their daily research. Most elementary concepts in modern software engineering will be covered, including basic syntax, reading from and writing to text files, debugging Python programs, regular expressions, and creating reusable code modules that are distributable to peers. The course will also focus on potential applications of Python to bioinformatics, including sequence analysis, data visualization, and data analysis. Students will also learn to use the Jupyter Notebook and the PyCharm integrated development environment (IDE), which are available at no cost.

**INDIVIDUAL LAPTOP IS NEEDED FOR EACH CLASS.**

*The Continuum Analytics Anaconda installer (V3) will be used to install Python and the necessary packages.*

#### Credits

2

#### Learning Objectives:

- Gain a basic understanding of elementary concepts ubiquitous in modern software engineering: regular expressions, reading from and writing to text files, and recursion
- Apply Python to important functions in bioinformatics such as sequence analysis, data analysis and data visualization

- Learn how to obtain and rework an existing script to meet current needs
- Gain experience in two programming environments (Jupyter Notebook and PyCharm IDE)

### BIOF 339

Practical R

The goal of this course is to introduce biomedical research scientists to R as an analysis platform rather than a programming language. Throughout the course, emphasis will be placed on example-driven learning. Topics to be covered include: installation of R and R packages; command line R; R data types; loading data in R; manipulating data; exploring data through visualization; statistical tests; correcting for multiple comparisons; building models; and generating publication-quality graphics. No prior programming experience is required.

**INDIVIDUAL LAPTOP IS NEEDED FOR EACH CLASS.**

#### Credits

2

#### Learning Objectives:

- Run R GUI and make use of command line features, including command history and help pages
- Find and make use of the extensive libraries (R add-ons) available for analyzing biological and other forms of data
- Load, manipulate, and combine data to make it amenable to further analyses
- Visualize data with the extensive graphics capabilities of R (including ggplot2)
- Use appropriate statistical tests on data within R that will conform to standards expected in scientific journals

### BIOF 395

Introduction to Text Mining

Between Electronic Medical Records and Electronic Health Records, PubMed, and collections of biomedical grant applications, there exist large quantities of medical information stored in databases waiting to be explored. Besides tables of numbers, medical records also contain a great amount of free-text paragraphs that are comprehensible to human readers but challenging for computers. Text mining is an interdisciplinary area that primarily combines advances in Natural Language Processing (NLP), Information Retrieval (IR), and Machine Learning (ML) to help computers understand human-written language and thus extract medical and clinical information from free-text records. This class aims to introduce fundamental subjects in text mining such as tokenization, named entity recognition (NER), grammars, parsing, relation extraction, and document classification. The class is oriented towards hands-on experience with Python and the Natural Language Toolkit (NLTK).

#### Credits

2

#### Prerequisites

Prior exposure to programming and Python is encouraged but not required to attend this class.

#### Learning Objectives:

- Learn basic programming in Python
- Master fundamental building blocks of Natural Language Processing
- Acquire hands-on experience with NLTK, a Python toolkit for NLP
- Gain an introduction to statistical models of Machine Learning applied to NLP and IR

### BIOF 399

Deep Learning for Healthcare Image Analysis

In this course, students will learn how to apply Convolutional Neural Networks (CNNs) to MRI scans to perform a variety of medical tasks and calculations.

**INDIVIDUAL LAPTOP IS NEEDED FOR EACH CLASS (Mac, Linux or Windows).**

#### Credits

2

#### Prerequisites

Previous programming experience is not required, but is recommended.

#### Learning Objectives:

- Understand how to use popular image classification neural networks for semantic segmentation
- Use the popular R programming language with the MXNet deep learning framework to create a powerful GPU-accelerated convolutional neural network (CNN) solution for quantitative medical image analysis
- Use deep-learning techniques to predict genomic biomarkers from medical image analysis
- Explore other areas of innovation and research
- Get hands-on guidance to try many different deep-learning frameworks

### BIOF 439

Data Visualization with R

This course will demonstrate and practice the use of R in creating and presenting data visualizations. After a short introduction to R tools, especially the tidyverse packages, the course will cover principles for data visualization, examples of good and bad visualizations, and the use of ggplot2 to create static publication-quality graphs. Students will also have the chance to learn about modern web-based interactive graphics using the htmlwidgets packages as well as dynamic graphics and dashboards that can be created using flexdashboard and Shiny. The course will explore ways in which bioinformatics data can be presented using static and dynamic visualizations. Finally, RMarkdown and other packages will be used to develop webpages for presenting data visualizations as self-explanatory and possibly interactive storyboards.

#### Credits

1

#### Prerequisites

None; however, BIOF 339 Practical R or an equivalent introductory R course would be useful.

#### Learning Objectives:

- Understand principles of good data visualization to avoid poor or inappropriate data visualization
- Gain knowledge of appropriate use of color, symbols, and small multiples
- Learn about static and dynamic data visualizations, using the web as a presentation medium

### BIOF 440

Data Visualization with Python

This course will demonstrate and practice the use of Python in creating and presenting data visualizations.

#### Credits

1

#### Prerequisites

None; however, the above course or an equivalent introductory Python course would be useful.

#### Learning Objectives:

- Understand principles of good data visualization to avoid poor or inappropriate data visualization
- Gain knowledge of appropriate use of color, symbols, and small multiples
- Learn about static and dynamic data visualizations, using the web as a presentation medium

### BIOF 450

Bioinformatics, Computational Biology, and Evolutionary Genomics

An enormously large series of complex and chaotic events has shaped the genomes of eukaryotes, prokaryotes, and viruses. This course will address cutting-edge approaches to the computational investigation of these events, with an eye toward developments in translational systems biology. The course will begin by presenting the fundamentals of evolutionary genomics, including basic properties of genomes and comparative genomics, population genetics, and sequence-structure-function relationships. Experimental design and biological project integration will be a major theme of the course. Specific lectures on statistical analysis, similarity searches, Next Generation Sequencing, epigenomics, and other specialized topics will supplement those given in the earlier part of the course.

#### Credits

3

#### Learning Objectives:

- Perform statistical analysis and display data
- Learn applications of evolutionary genomics, including cancer genomics, evolution of immune systems, and analysis of brain developmental problems
- Apply the skills acquired to complete a computational biology project

### BIOF 475

Introduction to New Technologies in Data Science

#### Credits

2

#### Prerequisites

Previous programming experience is not required, but is recommended.

#### Learning Objectives:

- Technical Side:
  - Gain a basic understanding of elementary concepts common in Data Science analytics, such as distributed file systems, NoSQL databases, job scheduling, and more
  - Gain experience with integrating Data Science components into a Data Science platform, loading data, querying, and extracting value
  - Gain hands-on experience connecting to and modifying installations and scripts
  - Be able to rework an existing script to meet the students’ needs
- Data Side:
  - Learn predictive modeling: find correlations; supervised segmentation; visualization segmentation; probability estimation
  - Fit a model to data and avoid overfitting: choose goals for data; loss functions; cross validation; tree pruning; regularization
  - Find natural clusters and neighbors: nearest neighbor, clustering methods, distance similarity
  - Pivot from thinking about data to solving a problem
  - Complete a short research project using Data Science techniques and technologies

### BIOF 501

Introduction to R: Step-by-Step Guide

R is free statistical software that is becoming increasingly popular and important for data analysis in biology. During the course, students will first learn how to handle the R programming environment. Next, students will learn how to simulate data for analysis, while the background for R programming will be provided in accompanying lectures. By the end of the course, students will be familiar with simple R programming, which they will then be able to apply to their own data analysis.

#### Credits

2

#### Learning Objectives:

- Introduce R programming environments for scientific analysis
- Understand the concepts of basic data structures, such as Vectors, Matrices, Arrays, Lists, and Data Frames
- Introduce data handling and visualization in R
- Understand the concepts of Packages and simple R programming

### BIOF 509

Applied Machine Learning

Machine learning is a computational field that consists of techniques allowing computers to learn from data and make data-driven predictions or decisions. The ability to effectively implement machine learning approaches is a crucial component of data analysis. BIOF 509 provides a comprehensive overview of machine learning concepts, project design, and implementation. The course will give a conceptual overview of the most popular machine learning algorithms with examples of how and when to apply them to datasets. Algorithms that will be covered include support vector machines, decision trees, random forests, multiple clustering approaches, and deep learning. Best practices in designing machine learning projects will also be emphasized, and this course will introduce strategies to avoid common pitfalls and to accurately interpret results. To reinforce key concepts, this course contains 4 written homework assignments and a research project. Through the homework assignments, students will (i) study the theory behind common machine learning algorithms and (ii) explore examples of successful machine learning projects in biomedical research. For the research project, students will use Python machine learning packages (scikit-learn, TensorFlow, PyTorch) to design a multistep pipeline to analyze a dataset of their choice. Students will also be expected to use GitHub to demonstrate proper documentation and version control practices when completing the project.

#### Credits

2

#### Prerequisites

Students should have previously completed BIOF 309 Introduction to Python or have equivalent experience. While the course will include a brief Python refresher, the emphasis of the course will be on applying machine learning.

#### Learning Objectives:

- Choose appropriate machine learning techniques for data analyses and interpret their results
- Design machine learning analysis pipelines properly and avoid common pitfalls
- Complete a short research project using machine learning

### BIOF 510

Advanced Applications of Artificial Intelligence

In the past decade, big data has become increasingly prominent in many fields, including healthcare and biomedical research. These increasingly large datasets pose a unique challenge to researchers. In these cases, nuanced approaches to machine learning are often necessary to extract important information. BIOF 510, a continuation of BIOF 509, will cover advanced applications of popular machine learning algorithms, including support vector machines, random forests, and neural networks. Neural network algorithms that will be covered include multi-layer perceptrons, convolutional neural networks, recurrent neural networks, probabilistic neural networks, and autoencoders. To reinforce key concepts, this course contains 4 written homework assignments and a research project. Through the homework assignments, students will (i) study the theory behind common machine learning algorithms and (ii) explore examples of successful machine learning projects in biomedical research. For the research project, students will use Python machine learning packages (scikit-learn, TensorFlow, PyTorch) to design a multistep pipeline to analyze a dataset of their choice. Students will also be expected to use GitHub to demonstrate proper documentation and version control practices when completing the project.

#### Credits

2

#### Prerequisites

BIOF 509 or permission from the instructor.

#### Learning Objectives:

- Choose appropriate machine learning techniques for data analyses and interpret their results
- Design machine learning analysis pipelines properly and avoid common pitfalls
- Complete a short research project using machine learning

### BIOF 518

Theoretical and Applied Bioinformatics I

The objective of this course is to give students an introduction to the theory and practice of a wide range of bioinformatic techniques and applications, enabling them to use these tools in their own research. This course will be divided into five modules: statistical approaches in sequence analysis; phylogenetic analysis of nucleotide and protein sequences; acquisition and analysis of sequence datasets, including EST and RNA-seq data; analysis of genomic datasets from an evolutionary perspective; and prediction of protein secondary structure. Two or three of the five sessions in each module will be divided roughly into 60 percent theoretical lecture and 40 percent learning to use relevant computational tools. The final session of each module will be split between a discussion of computational tools, a journal club, and a discussion of work on a project assigned for each module. By the end of the course, students should be able to acquire many types of sequence data, identify orthologous and paralogous genes, predict domains and motifs, identify alternative splicing, analyze genomic/protein alignments, and predict secondary protein structure from primary sequence.

#### Credits

2

#### Prerequisites

Solid understanding of biology, computer science and mathematics.

#### Learning Objectives:

- Introduce the theory and practice of a wide range of bioinformatic techniques and applications, enabling students to use these tools in their own research
- Perform database searches using BLAST and hidden Markov models
- Predict gene structure and analyze domains and motifs
- Conduct phylogenetic analysis of nucleotide and protein sequences and identify orthologous and paralogous genes
- Analyze genomic and protein alignments and predict secondary protein structure from primary sequence

### BIOF 519

Theoretical and Applied Bioinformatics II

The objective of this course is to give students an introduction to the theory and practice of a wide range of bioinformatic techniques and applications, enabling them to use these tools in their own research. This course will be divided into five modules: statistical approaches in sequence analysis; phylogenetic analysis of nucleotide and protein sequences; acquisition and analysis of sequence datasets, including EST and RNA-seq data; analysis of genomic datasets from an evolutionary perspective; and prediction of protein secondary structure. Two or three of the five sessions in each module will be divided roughly into 60 percent theoretical lecture and 40 percent learning to use relevant computational tools. The final session of each module will be split between a discussion of computational tools, a journal club, and a discussion of work on a project assigned for each module. By the end of the course, students should be able to acquire many types of sequence data, identify orthologous and paralogous genes, predict domains and motifs, identify alternative splicing, analyze genomic/protein alignments, and predict secondary protein structure from primary sequence.

This is the second part of a two-part course. The completion of the first part (BIOF 518) is required before taking the second part. Registration is required separately for each part of the course.

#### Credits

2

#### Prerequisites

The above course(s) or permission from the instructor. Solid understanding of biology, computer science and mathematics.

#### Learning Objectives:

- Introduce the theory and practice of a wide range of bioinformatic techniques and applications, enabling students to use these tools in their own research
- Perform database searches using BLAST and hidden Markov models
- Predict gene structure and analyze domains and motifs
- Conduct phylogenetic analysis of nucleotide and protein sequences and identify orthologous and paralogous genes
- Analyze genomic and protein alignments and predict secondary protein structure from primary sequence

### BIOF 521

Bioinformatics for Analysis of Data Generated by Next Generation Sequencing

#### Credits

3

#### Learning Objectives:

- Learn to analyze Next Generation Sequencing data, including DNA-seq, RNA-seq, and ChIP-seq, either in a graphical user interface using Galaxy or on the command line
- Write short scripts to do this analysis using command line resources

### MATH 127

Elementary Calculus I, part 1

This is a first course in calculus and is aimed at students of diverse backgrounds who have previously not taken any formal course on the subject. The course includes a brief review of pre-calculus topics, including functions and algebra, and then moves on to computations using infinity and beyond: infinitesimal quantities, differentials, infinite sequences, and whether it is possible to divide by zero. Scientific applications and achievements will motivate the exploration of the essential single-variable calculus concepts of limits, derivatives, and integrals.

This is the first part of a two-part course. Registration is required separately for each part of the course.

#### Credits

2

#### Prerequisites

A pre-calculus course (including online) is recommended, but not required. Knowledge of trigonometry, basic algebra, and graphing is required.

#### Learning Objectives:

- Understand the concept of functions, their limits, and continuity
- Become familiar with differentiation and integration techniques of single-variable functions
- Introduce applications of calculus to scientific research

### MATH 128

Elementary Calculus I, part 2

This is a first course in calculus and is aimed at students of diverse backgrounds who have previously not taken any formal course on the subject. The course includes a brief review of pre-calculus topics, including functions and algebra, and then moves on to computations using infinity and beyond: infinitesimal quantities, differentials, infinite sequences, and whether it is possible to divide by zero. Scientific applications and achievements will motivate the exploration of the essential single-variable calculus concepts of limits, derivatives, and integrals.

This is the second part of a two-part course. The completion of the first part (MATH 127) is required before taking the second part. Registration is required separately for each part of the course.

#### Credits

2

#### Prerequisites

The above course(s) or permission from the instructor. A pre-calculus course (including online) is recommended, but not required. Knowledge of trigonometry, basic algebra, and graphing is required.

#### Learning Objectives:

- Understand the concept of functions, their limits, and continuity
- Become familiar with differentiation and integration techniques of single-variable functions
- Introduce applications of calculus to scientific research

### MATH 129

Elementary Calculus II, part 1

This course is a continuation of MATH 127 and is focused on multivariable and vector calculus. It covers calculus of curves in space, vector functions, and functions of more than one variable, and introduces vector calculus. Applications of these more general descriptions of calculus to scientific research will also be presented.

This is the first part of a two-part course. Registration is required separately for each part of the course.

#### Credits

2

#### Prerequisites

The above course(s) or permission from the instructor.

#### Learning Objectives:

- Understand how to describe curves in space and apply calculus to parametric functions
- Understand functions of more than one variable, partial derivatives and multiple integrals
- Become acquainted with vector calculus

### MATH 130

Elementary Calculus II, part 2

This course is a continuation of MATH 127 and is focused on multivariable and vector calculus. It covers calculus of curves in space, vector functions, and functions of more than one variable, and introduces vector calculus. Applications of these more general descriptions of calculus to scientific research will also be presented.

This is the second part of a two-part course. The completion of the first part (MATH 129) is required before taking the second part. Registration is required separately for each part of the course.

#### Credits

2

#### Prerequisites

The above course(s) or permission from the instructor.

#### Learning Objectives:

- Understand how to describe curves in space and apply calculus to parametric functions
- Understand functions of more than one variable, partial derivatives and multiple integrals
- Become acquainted with vector calculus

### MATH 215

Introduction to Linear Algebra With Applications in Statistics, part 1

This is a first course in linear algebra, aimed at students with diverse backgrounds. It covers the content of a standard textbook: linear systems, vectors and matrices, dimensions and bases of vector spaces, eigenvalues and eigenvectors, and singular value decomposition. It also explains applications of these linear algebra concepts in classic analysis methods as well as state-of-the-art statistical inference and machine learning approaches. In this applications portion of the course, we will strive to tailor the content to the interests and research needs of the students.

This is the first part of a two-part course. Registration is required separately for each part of the course.

#### Credits

2

#### Prerequisites

One semester of analytic geometry or calculus is recommended, but not required. Basic knowledge of vectors, Cartesian coordinates, and algebra is required.

#### Learning Objectives:

- Understand systems of linear equations and their matrix representation
- Learn the concept of vector spaces, subspaces, and linear dependence
- Learn spectral methods for analyzing matrices
- Understand statistical methods based on linear models

### MATH 216

Introduction to Linear Algebra With Applications in Statistics, part 2

This is a first course in linear algebra, aimed at students with diverse backgrounds. It covers the content of a standard textbook: linear systems, vectors and matrices, dimensions and bases of vector spaces, eigenvalues and eigenvectors, and singular value decomposition. It also explains applications of these linear algebra concepts in classic analysis methods as well as state-of-the-art statistical inference and machine learning approaches. In this applications portion of the course, we will strive to tailor the content to the interests and research needs of the students.

This is the second part of a two-part course. The completion of the first part (MATH 215) is required before taking the second part. Registration is required separately for each part of the course.

#### Credits

2

#### Prerequisites

The above course(s) or permission from the instructor. One semester of analytic geometry or calculus is recommended, but not required. Basic knowledge of vectors, Cartesian coordinates, and algebra is required.

#### Learning Objectives:

- Understand systems of linear equations and their matrix representation
- Learn the concept of vector spaces, subspaces, and linear dependence
- Learn spectral methods for analyzing matrices
- Understand statistical methods based on linear models

### STAT 201

Experimental Statistics I, part 1

This course introduces statistical concepts and essential techniques that are frequently used in biomedical data analysis. The emphasis will be equally divided between a solid understanding of basic principles and their applications. R software is introduced and used for demonstration throughout the course. Topics covered in the second semester: tests of statistical hypotheses; one- and two-sample tests; power and sample size calculation; analysis of variance (ANOVA); nonparametric tests; linear regression; analysis of categorical data; permutation and bootstrap; data analysis using R.

This is the first part of a two-part course. Registration is required separately for each part of the course.

#### Credits

2

#### Prerequisites

Working knowledge of Algebra II and one semester of Calculus is preferred.

#### Learning Objectives:

- Understand basic principles of probability and statistics
- Use appropriate statistical tools to analyze data for research

### STAT 202

Experimental Statistics I, part 2

This course introduces statistical concepts and essential techniques that are frequently used in biomedical data analysis. The emphasis will be equally divided between a solid understanding of basic principles and their applications. R software is introduced and used for demonstration throughout the course. Topics covered in the second semester: tests of statistical hypotheses; one- and two-sample tests; power and sample size calculation; analysis of variance (ANOVA); nonparametric tests; linear regression; analysis of categorical data; permutation and bootstrap; data analysis using R.

This is the second part of a two-part course. The completion of the first part (STAT 201) is required before taking the second part. Registration is required separately for each part of the course.

#### Credits

2

#### Prerequisites

#### Learning Objectives:

- Understand basic principles of probability and statistics
- Use appropriate statistical tools to analyze data for research

### STAT 203

Experimental Statistics II, part 1

This course introduces statistical concepts and essential techniques that are frequently used in biomedical data analysis. The emphasis will be equally divided between a solid understanding of basic principles and their applications. R software is introduced and used for demonstration throughout the course. Topics covered in the second semester: tests of statistical hypotheses; one- and two-sample tests; power and sample size calculation; analysis of variance (ANOVA); nonparametric tests; linear regression; analysis of categorical data; permutation and bootstrap; data analysis using R.

This is the first part of a two-part course. Registration is required separately for each part of the course.

#### Credits

2

#### Prerequisites

The above course(s) or permission from the instructor. Working knowledge of Algebra II and one semester of Calculus is preferred.

#### Learning Objectives:

- Understand basic principles of probability and statistics
- Use appropriate statistical tools to analyze data for research

### STAT 204

Experimental Statistics II, part 2

This is the second part of a two-part course. The completion of the first part (STAT 203) is required before taking the second part. Registration is required separately for each part of the course.

#### Credits

2

#### Prerequisites

The above course(s) or permission from the instructor. Working knowledge of Algebra II and one semester of Calculus is preferred.

#### Learning Objectives:

- Understand basic principles of probability and statistics
- Use appropriate statistical tools to analyze data for research

### STAT 321

Methodology in Clinical Trials

The objective of this course is to learn the concepts and methodology used in the design and conduct of randomized clinical trials. Topics to be covered will include the description of the main types of trial designs, principles of randomization and stratification, issues in protocol development (defining objectives and endpoints, blinding, choice of control), recruitment and retention, data collection and quality control issues, monitoring, and analyses of trials reports. Textbook material will be frequently supplemented by material from the literature. Guest lecturers will give lectures on power and sample size calculations, life table analysis, quality of life, and cost evaluation. Examples from the cardiovascular, pulmonary, and cancer areas will be used when appropriate. The course is intended for biomedical researchers desiring exposure to the clinical-trial field. A minimum of 10 students must register for this course to run.

#### Credits

3

#### Learning Objectives:

- Acquire a fundamental understanding of methodological principles and concepts in clinical trials
- Describe essential elements of clinical trials and use this knowledge to contribute to the successful conduct of a clinical trial
- Critically read clinical trials literature

### STAT 325

Epidemiologic Research Methods

The objective of this course is to provide a deeper understanding of epidemiologic research methodology that can be used to interpret critically the results of epidemiologic research. This understanding will result from investigating conceptual models for study designs, disease frequency, measures of association and impact, imprecision, bias, and effect modification. The course will emphasize the interpretation of research, even when the design or execution of the respective research is less than ideal.

#### Credits

3

#### Prerequisites

STAT 200 or STAT 500 and STAT 317.

#### Learning Objectives:

- Be able to distinguish design options in the conduct of epidemiologic research
- Learn about choices for measures of disease frequency, association, and impact
- Understand the origin of selection, information, and confounding biases, and their effect on research results
- Know the origin of imprecision and its effect on research results
- Recognize the origin of effect modification and its effect on research results

### STAT 330

Introduction to SAS

The course will cover the fundamentals of the SAS program and its variables, creating data, importing data (from text and Excel files), exporting data (to text, pdf, and Microsoft-related formats), manipulating data, and providing descriptive statistics. Students will have the opportunity to practice in class, using sample datasets. Homework and project assignments will be provided as well.

#### Credits

2

#### Prerequisites

Basic understanding of Microsoft Excel; prior programming experience and basic knowledge of statistics (e.g., mean vs. median vs. mode) would be beneficial, but are not required.

#### Learning Objectives:

- Recognize different types of raw data and learn how to import them into SAS
- Understand different types of variables as well as how to manipulate and convert between them
- Understand how to set up and conduct merging and transposing of data tables
- Obtain descriptive statistics such as mean, median, min, and max
- Generate reports and output reports into a variety of file types

### STAT 430

Advanced SAS

The course will cover advanced SAS coding concepts such as the use of SAS Macro, SAS SQL, as well as a combination of both. The course will also introduce students to SAS STAT coding for common statistical tests (such as t-test, ANOVA, linear regression, and others). Students will have the opportunity to practice in class, using sample datasets. Homework and project assignments will be provided as well.

#### Credits

2

#### Prerequisites

STAT 330 Introduction to SAS or equivalent at another college/university.

#### Learning Objectives:

- Understand the principles of Macro variables and Macro functions
- Become proficient in writing Macro code for new programs and adding Macro code to existing programs
- Understand how to create tables using SAS SQL with a variety of conditions
- Combine knowledge of STAT 330 concepts with SAS SQL and SAS Macro to solve complex data issues
- Use SAS STAT to perform statistical tests (t-test, ANOVA, correlation, linear regression, chi-squared, logistic regression)

### STAT 500

Statistics for Biomedical Scientists I, part 1

The objective of this course is to provide an overview of statistics for biomedical researchers and clinicians who are interested in the interpretation of the results of statistical analyses. This is a series of integrated lectures, readings, and exercises on analysis and interpretation of medical research data using Excel. Emphasis is on ideas and understanding rather than mechanics. Topics covered include the foundation of statistical logic, interpretation of the most commonly encountered statistical procedures in medical research, and selection of an appropriate method to analyze a particular set of data. The second semester expands on the material covered in the first semester.

This is the first part of a two-part course. Registration is required separately for each part of the course.

#### Credits

2

#### Learning Objectives:

- Understand the role of chance in biomedical research
- Become knowledgeable about processes of estimation and statistical inference
- Learn about the statistical methods most often used in biomedical research
- Select an appropriate statistical approach to analyze a set of biomedical research data
- Use Excel to analyze biomedical research data

### STAT 501

Statistics for Biomedical Scientists I, part 2

The objective of this course is to provide an overview of statistics for biomedical researchers and clinicians who are interested in the interpretation of the results of statistical analyses. This is a series of integrated lectures, readings, and exercises on analysis and interpretation of medical research data using Excel. Emphasis is on ideas and understanding rather than mechanics. Topics covered include the foundation of statistical logic, interpretation of the most commonly encountered statistical procedures in medical research, and selection of an appropriate method to analyze a particular set of data. The second semester expands on the material covered in the first semester.

This is the second part of a two-part course. The completion of the first part (STAT 500) is required before taking the second part. Registration is required separately for each part of the course.

#### Credits

2

#### Prerequisites

The above course(s) or permission from the instructor.

#### Learning Objectives:

- Understand the role of chance in biomedical research
- Become knowledgeable about processes of estimation and statistical inference
- Learn about the statistical methods most often used in biomedical research
- Select an appropriate statistical approach to analyze a set of biomedical research data
- Use Excel to analyze biomedical research data

### STAT 502

Statistics for Biomedical Scientists II, part 1

The objective of this course is to provide an overview of statistics for biomedical researchers and clinicians who are interested in the interpretation of the results of statistical analyses. This is a series of integrated lectures, readings, and exercises on analysis and interpretation of medical research data using Excel. Emphasis is on ideas and understanding rather than mechanics. Topics covered include the foundation of statistical logic, interpretation of the most commonly encountered statistical procedures in medical research, and selection of an appropriate method to analyze a particular set of data. Those who will be routinely engaged in computing statistical procedures should consider STAT 200.

#### Credits

2

#### Prerequisites

The above course(s) or permission from the instructor.

#### Learning Objectives:

- Learn the statistical aspects of planning and executing biomedical research
- Know the assumptions of statistical methods, how to evaluate them, and how to respond to concerns
- Learn more complicated statistical methods than those presented in STAT 500 I
- Be able to build multivariable models and learn how they contribute to causal inference

### STAT 503

Statistics for Biomedical Scientists II, part 2

The objective of this course is to provide an overview of statistics for biomedical researchers and clinicians who are interested in the interpretation of the results of statistical analyses. This is a series of integrated lectures, readings, and exercises on analysis and interpretation of medical research data using Excel. Emphasis is on ideas and understanding rather than mechanics. Topics covered include the foundation of statistical logic, interpretation of the most commonly encountered statistical procedures in medical research, and selection of an appropriate method to analyze a particular set of data. Those who will be routinely engaged in computing statistical procedures should consider STAT 200.

This is the second part of a two-part course. The completion of the first part (STAT 502) is required before taking the second part. Registration is required separately for each part of the course.

#### Credits

2

#### Prerequisites

The above course(s) or permission from the instructor.

#### Learning Objectives:

- Learn the statistical aspects of planning and executing biomedical research
- Know the assumptions of statistical methods, how to evaluate them, and how to respond to concerns
- Learn more complicated statistical methods than those presented in STAT 500 I
- Be able to build multivariable models and learn how they contribute to causal inference