Introduction to New Technologies in Data Science
What is Data Science, and how can one use Big Data technologies to unlock value in massive data stores? How can one explore scientific data to gain new insights or make better data-management decisions? This course surveys the history of Data Science platforms and their current landscape so that students can implement their own Data Science solutions, and it provides hands-on experience with the tools involved. The course covers the basics of tools that students can subsequently use in Data Science work, such as Hadoop’s MapReduce, Apache Spark, Pig, Hive, Python, and R. It also covers advanced data structures; real-world data scraping, cleansing, and wrangling; and a high-level overview of machine-learning concepts.
An individual laptop (Mac, Linux, or Windows) is needed for each class.
Previous programming experience is recommended but not required.
- Technical Side:
- Gain a basic understanding of concepts common in Data Science analytics, such as distributed file systems, NoSQL databases, and job scheduling
- Gain experience with integrating Data Science components into a Data Science platform, loading data, querying, and extracting value
- Gain hands-on experience connecting to and modifying installations and scripts
- Be able to rework an existing script to meet the student’s needs
- Data Side:
- Learn predictive modeling: finding correlations; supervised segmentation; visualizing segmentations; probability estimation
- Fit a model to data and avoid overfitting: choosing goals for the data; loss functions; cross-validation; tree pruning; regularization
- Find natural clusters and neighbors: nearest-neighbor methods, clustering methods, and distance and similarity measures
- Pivot from thinking about data to solving a problem
- Complete a short research project using Data Science techniques and technologies
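
As a small illustration of two topics named in the objectives above (nearest-neighbor methods and cross-validation), the following Python sketch uses k-fold cross-validation to compare a few neighborhood sizes for a nearest-neighbor classifier. It is not course material: the synthetic one-dimensional data set and the helper names (`knn_predict`, `cross_val_accuracy`, `make_data`) are assumptions made for illustration, and it uses only the Python standard library.

```python
import random
from collections import Counter


def knn_predict(train, x, k):
    """Predict the label of x by majority vote among its k nearest training points."""
    neighbors = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def cross_val_accuracy(data, k, folds=5):
    """Average held-out accuracy of k-NN over `folds` disjoint validation splits."""
    scores = []
    for i in range(folds):
        # Every `folds`-th point, starting at offset i, is held out for validation.
        val = data[i::folds]
        train = [p for j, p in enumerate(data) if j % folds != i]
        correct = sum(knn_predict(train, x, k) == y for x, y in val)
        scores.append(correct / len(val))
    return sum(scores) / folds


def make_data(n=200, seed=0):
    """Hypothetical toy data: two overlapping 1-D classes (centers 0 and 2)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = rng.gauss(2.0 * label, 1.0)
        data.append((x, label))
    return data


data = make_data()
# Odd values of k avoid tied votes when there are two classes.
results = {k: cross_val_accuracy(data, k) for k in (1, 5, 15)}
best_k = max(results, key=results.get)
print(results, "best k:", best_k)
```

Scoring each candidate only on points that were held out of training is what guards against overfitting here: a very small k can memorize the training data, and the held-out accuracy shows whether that memorization generalizes.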