Week #1
|
Module
|
This is a week of introductions. First, you'll be introduced to the learning objectives we hope to accomplish and the learning process we'll be following. Then, you'll be introduced to the concepts and applications of data mining and machine learning. We will then begin to examine the Weka data mining software and the data mining and machine learning algorithms that are available. Finally, we'll take a look at the objectives and process for completing the course project.
After completing this lesson, you should be able to:
- Describe the course outline and objectives
- Compare applications of data mining and machine learning
- Explain the capabilities of Weka
- Install and run examples in Weka
- Identify the objectives and requirements of the course project
|
Lecture
|
Introduction
Discover what the Data Mining for Cybersecurity course is all about. Begin examining how data mining and machine learning provide automated techniques for discovering patterns in large amounts of data that can predict outcomes and explain how predictions were derived.
|
Lecture
|
Weka Overview
How do you use Weka? This lecture explains what Weka is and what it does, and shows you how to obtain the Weka data mining software and begin using it.
|
Lecture
|
Project Overview
An important element of this course is the completion of a project in which you research and implement data mining techniques for cybersecurity. This lecture shows you what needs to be done to complete the proposal and final report phases of the project.
|
Week #2
|
Module
|
This week, we'll focus on preparing and applying input to data mining systems. We'll also explore the input, analysis, and output capabilities of the Weka Explorer. We'll begin by examining three input components – concepts, instances, and attributes – and four basic styles of learning in data mining – classification, association, clustering, and numeric prediction. Then, we'll practice preparing data, loading input, and generating output with the Weka Explorer interface.
After completing this lesson, you should be able to:
- Explain and apply input of concepts, instances, and attributes to data mining software
- Use the ARFF format
- Apply data preparation methods for Weka
- Load data into Weka
- Use the Weka Explorer interface
|
Lecture
|
Input
Examine what can be input into a data mining system. Discuss and see examples of the four basic styles of learning in data mining and the input of concepts, instances, and attributes to data mining software. Learn how to prepare data for input using the ARFF format.
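As a rough illustration of the ARFF layout, the sketch below holds a small ARFF file in a Python string and reads it back with a tiny hand-written parser. The relation name, attributes, and rows are made up for illustration, and `parse_arff` is a hypothetical helper that handles only this simple subset of the format (no quoted values, no sparse rows). It is not Weka's own loader.

```python
# Minimal ARFF text: a header (@relation, @attribute lines) followed by @data rows.
# The relation, attributes, and rows are illustrative, not course data.
ARFF_TEXT = """\
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute play {yes, no}
@data
sunny,85,no
overcast,83,yes
rainy,70,yes
"""


def parse_arff(text):
    """Parse the simple ARFF subset shown above (hypothetical helper)."""
    relation, attributes, instances = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            # Each data row is one instance: comma-separated attribute values.
            instances.append([value.strip() for value in line.split(",")])
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, attr_type = line.split(None, 2)
            attributes.append((name, attr_type))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, instances


relation, attributes, instances = parse_arff(ARFF_TEXT)
print(relation, len(attributes), len(instances))
```

Each `@attribute` line declares either a nominal attribute (a set of allowed values in braces) or a numeric one, and each row after `@data` is one instance.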
|
Lecture
|
Weka Explorer
Take a closer look at how to use Weka. Walk through the Weka Explorer interface to prepare data for input, preview data, generate data output, and visualize output. See examples of input, analysis, and output capabilities.
|
Week #3
|
Module
|
In data mining, there are a variety of ways to represent the patterns that are discovered, and we want to be able to select the right one for a given situation. This week, we'll examine the methods of knowledge representation that are available for use with different learning objectives.
After completing this lesson, you should be able to:
- Explain and apply knowledge representation methods:
  - Tables
  - Linear models
  - Trees
  - Rules
|
Lecture
|
Knowledge Representation
Explore how to use tables, linear regression, trees, classification rules, association rules, and other knowledge representation methods for different types of learning objectives.
|
Week #4
|
Module
|
This week, we’ll begin examining the algorithmic methods that are at the heart of successful data mining. We’ll explore the fundamental algorithms used in data mining and see step-by-step examples of how they work.
After completing this lesson, you should be able to:
- Use data mining algorithms for:
  - Inference
  - Statistical modeling
  - Decision trees
- Compare rules, trees, and lists
|
Lecture
|
Algorithms, Part 1
In this lecture, we’ll discuss the importance of algorithms in data mining for cybersecurity. We'll explore three key data mining algorithms used for inference, statistical modeling, and decision trees. We'll also see these algorithms demonstrated in an ongoing weather example.
|
Week #5
|
Module
|
This week, we’ll examine other important data mining algorithms. We'll begin by exploring a smart approach to producing association rules. We'll also look at how to use linear regression, both with numeric attributes and for classification. Then, we'll identify four distance metrics for use in instance-based learning.
After completing this lesson, you should be able to:
- Mine association rules
- Explain and apply:
  - Linear models
  - Instance-based learning
|
Lecture
|
Algorithms, Part 2
In this lecture, we'll continue our exploration of data mining algorithms. First, we'll focus on mining association rules. Then, we'll explore using linear regression with numeric attributes and for classification. We'll conclude by looking at how to use distance functions for instance-based learning.
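To make the idea of a distance function concrete, here is a minimal sketch of instance-based learning with a one-nearest-neighbor classifier. Euclidean and Manhattan distance are used only as two common examples. They are an assumption and not necessarily the metrics covered in the lecture, and the training instances and labels are invented.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two numeric instances."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute per-attribute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest_neighbor(query, training, distance=euclidean):
    """Return the label of the training instance closest to the query."""
    _, label = min(training, key=lambda inst: distance(query, inst[0]))
    return label


# Illustrative training set: (attribute values, class label).
training = [((1.0, 1.0), "normal"), ((8.0, 9.0), "attack")]
print(nearest_neighbor((2.0, 1.5), training))
print(nearest_neighbor((7.0, 7.0), training, distance=manhattan))
```

Swapping the `distance` argument changes how "closeness" is measured without changing the rest of the learner.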
|
Week #6
|
Module
|
Evaluation is key to making good use of data mining output. This week, we’ll explore various methods for evaluating data and predicting performance. We'll discuss the cost of errors and when and how to use cross-validation. We'll examine performance measurement and the steps for creating ROC curves. We'll also compare algorithms for effectiveness and identify other evaluation metrics to consider.
After completing this lesson, you should be able to:
- Explain and apply methods for:
  - Training and testing
  - Predicting performance
  - Cross-validation
  - Leave-one-out
- Compare approaches
- Explain cost
- Evaluate numeric predictions
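The relationship between cross-validation and leave-one-out can be sketched in a few lines. The helper below is a hypothetical, simplified splitter that assigns instances to folds by index. It does no shuffling or stratification, which a real evaluation would normally add.

```python
def k_fold_splits(n_instances, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    if not 2 <= k <= n_instances:
        raise ValueError("k must be between 2 and n_instances")
    indices = list(range(n_instances))
    for fold in range(k):
        test = indices[fold::k]  # every k-th instance, offset by the fold number
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


# 5-fold cross-validation over 10 instances: each instance is tested exactly once.
splits = list(k_fold_splits(10, 5))

# Leave-one-out is the special case where k equals the number of instances.
loo = list(k_fold_splits(4, 4))
for train, test in loo:
    print(train, test)
```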
|
Lecture
|
Evaluating Results
In data mining, not all errors are equal. This lecture introduces methods for measuring classifier performance and for predicting how well a model will perform on new data. It also explains true and false positives and negatives in prediction.
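As a small worked example of true and false positives and negatives, the sketch below counts the four outcomes for a two-class problem and derives the true positive and false positive rates, the two quantities plotted against each other on a ROC curve. The labels and predictions are invented, and treating "attack" as the positive class is an assumption made for illustration.

```python
def confusion_counts(actual, predicted, positive="attack"):
    """Count true/false positives and negatives for a two-class problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if a == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, fp, tn, fn


# Illustrative labels only.
actual = ["attack", "attack", "normal", "normal", "normal"]
predicted = ["attack", "normal", "attack", "normal", "normal"]

tp, fp, tn, fn = confusion_counts(actual, predicted)
tpr = tp / (tp + fn)  # true positive rate
fpr = fp / (fp + tn)  # false positive rate
print(tp, fp, tn, fn, tpr, fpr)
```

In a cybersecurity setting the two kinds of error usually carry different costs: a false negative is a missed attack, while a false positive is a false alarm.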
|
Week #7
|
Module
|
This week, we’ll examine two cybersecurity applications of data mining: signature detection and anomaly detection. We'll define what these two types of detection are and discuss the challenges they present. We'll also walk through signature and anomaly detection processes. Finally, we'll see an example of the use of association rules in signature and anomaly detection.
After completing this lesson, you should be able to:
- Apply signature and anomaly detection
|
Lecture
|
Cybersecurity Applications
This lecture explains how signature detection can be used to look for known computer system vulnerabilities and how anomaly detection can detect new and previously unknown attacks that fall outside of normal use.
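The contrast between the two approaches can be sketched as follows. This is a simplified illustration, not the process from the lecture: signature detection is reduced to substring matching against a hypothetical list of known patterns, and anomaly detection to a standard-deviation threshold over an invented baseline of normal activity.

```python
import statistics

# Hypothetical signatures of known attacks (illustrative only).
KNOWN_SIGNATURES = ["/etc/passwd", "' OR '1'='1"]


def signature_alert(request):
    """Flag a request that contains any known attack pattern."""
    return any(signature in request for signature in KNOWN_SIGNATURES)


def anomaly_alert(value, baseline, threshold=3.0):
    """Flag a value far from the baseline mean, in standard deviations."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    return abs(value - mean) > threshold * stdev


# Invented baseline: login attempts per hour during normal use.
baseline = [10, 12, 11, 9, 10, 11, 12, 10]

print(signature_alert("GET /index.php?id=1' OR '1'='1"))
print(signature_alert("GET /index.html"))
print(anomaly_alert(40, baseline))
print(anomaly_alert(11, baseline))
```

The signature check can only catch what is already on its list, while the anomaly check can flag activity it has never seen before, at the risk of raising false alarms on unusual but legitimate behavior.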
|
Week #8
|
Module
|
This week, we'll examine the goals and methods of privacy-preserving data mining. We'll begin by identifying the personal information that should not be revealed and the importance of preventing unauthorized users from obtaining access to private user information. Then, we'll explore key privacy-preserving methods such as modifying information and partitioning data instead of keeping it in a central location.
After completing this lesson, you should be able to:
- Explain privacy-preserving data mining
|
Lecture
|
Privacy Preserving Data Mining
In this lecture, we’ll discuss how to mine data while protecting private information. We'll explore data modification and horizontal and vertical partitioning as ways to keep personal information from unauthorized users.
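The difference between horizontal and vertical partitioning can be sketched as below: horizontal partitioning gives each site a subset of the rows with all attributes, while vertical partitioning gives each site all rows but only some attributes. The records, the helper names, and the choice of `id` as a shared key are assumptions made for illustration.

```python
# Invented records; no real data.
records = [
    {"id": 1, "age": 34, "zip": "16801", "flagged": False},
    {"id": 2, "age": 51, "zip": "16802", "flagged": True},
    {"id": 3, "age": 27, "zip": "16803", "flagged": False},
    {"id": 4, "age": 45, "zip": "16801", "flagged": True},
]


def horizontal_partition(rows, n_sites):
    """Each site holds complete records for a subset of the individuals."""
    return [rows[site::n_sites] for site in range(n_sites)]


def vertical_partition(rows, key, column_groups):
    """Each site holds some attributes (plus the join key) for every individual."""
    return [
        [{key: row[key], **{col: row[col] for col in cols}} for row in rows]
        for cols in column_groups
    ]


by_rows = horizontal_partition(records, 2)
by_cols = vertical_partition(records, "id", [["age", "zip"], ["flagged"]])
print(by_rows[0])
print(by_cols[1])
```

In both cases no single site holds the complete table, which is the point of partitioning the data instead of keeping it in a central location.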
|