The Mahalanobis Distance Test For Outliers
Published 11/2024
MP4 | Video: h264, 1920x1080 | Audio: AAC, 44.1 KHz
Language: English | Size: 1.30 GB | Duration: 4h 18m
Using a 7-Step Procedure, learn how to calculate and validate the Mahalanobis Distance Test
What you'll learn
Calculating distances in datasets
A 7-Step procedure for calculating the Mahalanobis Distance Matrix
Using the Chi-Square Distribution to identify outliers
Matrix operations and how they are used in the Mahalanobis Distance Matrix development
The Covariance Matrix and how it is calculated and used in the Mahalanobis Distance Matrix
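
As a rough illustration of how the items above fit together, here is a minimal Python sketch. It is not the course's 7-Step procedure, which this listing does not spell out; it is the textbook calculation of the squared Mahalanobis distance of each item from the dataset mean, compared with a Chi-Square cutoff for M degrees of freedom. The function name mahalanobis_outliers, the random sample data and the planted outlier are all hypothetical, and the Chi-Square comparison assumes the attributes are roughly multivariate normal.

```python
import numpy as np
from scipy.stats import chi2


def mahalanobis_outliers(data, alpha=0.01):
    """Return squared Mahalanobis distances and an outlier mask.

    data  : (n_items, n_attributes) array, one row per item
    alpha : significance level, e.g. 0.001, 0.01 or 0.05
    """
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)          # subtract the column means
    cov = np.cov(data, rowvar=False)             # sample covariance (n - 1 divisor)
    cov_inv = np.linalg.inv(cov)                 # needs a positive definite covariance
    # Squared distance of each row: d_i^2 = c_i^T * cov_inv * c_i
    d2 = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    # Chi-Square cutoff with one degree of freedom per attribute
    cutoff = chi2.ppf(1.0 - alpha, df=data.shape[1])
    return d2, d2 > cutoff


# Hypothetical data: 200 items, 3 attributes, with one planted outlier in row 0.
rng = np.random.default_rng(0)
sample = rng.normal(size=(200, 3))
sample[0] = [8.0, -8.0, 8.0]

distances, is_outlier = mahalanobis_outliers(sample, alpha=0.01)
print("Flagged rows:", np.flatnonzero(is_outlier))
```

In Excel, CHISQ.INV.RT(alpha, M) gives the same cutoff as the chi2.ppf call used here.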
Requirements
A working knowledge of Excel
A working knowledge of Matrix algebra (supported by a detailed Lecture in the course)
A working knowledge of statistical parameters: Covariance, Variance and Correlation (also supported by a lecture in the course)
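
For readers checking the last requirement, the short Python sketch below shows how variance, covariance and correlation relate to each other. The two attribute columns are invented for illustration, and sample_covariance is a hypothetical helper that uses the sample (n - 1) divisor.

```python
import numpy as np

# Hypothetical bivariate data: 5 items, 2 attributes (values invented for illustration).
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0])


def sample_covariance(a, b):
    """Sample covariance of two equally long columns (n - 1 divisor)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.sum((a - a.mean()) * (b - b.mean())) / (len(a) - 1)


var_x = sample_covariance(x, x)        # variance is the covariance of a column with itself
var_y = sample_covariance(y, y)
cov_xy = sample_covariance(x, y)

# Correlation coefficient: covariance rescaled by both standard deviations
r = cov_xy / np.sqrt(var_x * var_y)

# Covariance matrix: variances on the diagonal, covariances off the diagonal
cov_matrix = np.array([[var_x, cov_xy],
                       [cov_xy, var_y]])

print(cov_matrix)
print("r =", r)
```

The hand-built matrix matches what np.cov(x, y) returns, which is a quick way to check the calculation.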
Description
A) The Purpose of the Course
Most Machine Learning methods and algorithms that analyze datasets need to investigate how "close" or "far" the items of the dataset are to each other. This allows analysts to look for outliers and anomalies, classify data items into clusters, establish whether there are associations between the items, and address similar issues.

To do that, Machine Learning methods rely on a mathematical concept: the distance between items in a dataset. We are used to considering distance as the length between two points. Mathematicians have a wider use of the term. A customer dataset consisting of thousands of customers will have a set of M attributes about each customer. M can be in the two-digit range. If M = 1, 2 or 3, we can visualize the distance between points on charts. This stops being possible for M > 3. The distance becomes a mathematical expression built from a vector for each item in the dataset, where the vector holds the values of the attributes for that item.

What makes this more interesting is that there are various ways distances can be calculated: Euclidean, Manhattan, Minkowski and Chebyshev distances. The Euclidean is the most common. However, over time, Machine Learning methods using the Euclidean Distance produced anomalies in their results, giving invalid answers.

The Euclidean Distance is calculated in multivariate space by multiplying the transpose of the dataset, P^T, with the dataset P. This is where Prasanta Chandra Mahalanobis, with his genius in statistics, came up with the idea: why not transform the dataset P before multiplying it by its transpose? This resolves a large number of issues with the Euclidean Distance.

The objective of the course is to present a 7-Step procedure used to calculate the Mahalanobis Distances and, from the resulting matrix, identify the outliers. Identification will be based on specifying a significance level (such as 0.1%, 1% or 5%).

The course will also provide support lectures covering the prerequisite knowledge and practices needed to apply the 7 steps.

B) So, Why Do We Present a Course Based on Excel?
The course uses Excel specifically for educational purposes and not as a machine learning tool. The course is not set up to show you how to use the 7-Step procedure in real life. That would require more advanced programming environments. The course mainly aims to clarify the procedure for identifying outliers using the Mahalanobis Distance test. For that, Excel is used as an educational tool, as it is easy to understand and is well known by most business analysts.

C) What Does the Course Cover?
The course is made up of 3 sections.

Section 1: Introducing the Course
This section consists of one lecture that presents the objectives of the course, its structure and resources, as well as what to expect and what not to expect.

Section 2: This is the heart of the course and consists of 6 lectures (Lectures 2 to 7):
2) Introducing Distances, Specifically the Mahalanobis Distance
   A. Introducing Prasanta Chandra Mahalanobis
   B. Introducing Mahalanobis Distance
   C. The Importance of Measuring Distance between Items of Data
3) Practices in our Data and the Matrix Representation of the Euclidean Distance
   D. Introducing Some Terms and Practices in our Data
   E. Starting with the Matrix Representation of the Euclidean Distance
4) Shortcoming of Euclidean Distances
   F. Shortcoming of Euclidean Distances
5) The 7-Step Procedure for Calculating Mahalanobis Distances
   G. The 7-Step Procedure for Calculating Mahalanobis Distances
6) Conditions for the Covariance Matrix to be Positive Definite
   H. Interlude: Conditions for the Covariance Matrix to be Positive Definite
7) How to Identify Outliers and More Examples
   I. Calculating the Mahalanobis Distance for the Two Equidistant Points
   J. How to Identify Outliers by using the Chi-Square Distribution

Section 3: Support Presentations
This section consists of 4 lectures covering material that should be known, as a prerequisite, to appreciate and use the calculations in the 7-Step Procedure that result in the identification of outliers through the Mahalanobis Distance test:
8) Support - Matrices and Transformation
9) Support - The Cholesky Decomposition
10) Support - Multivariate Data and their Parameters
   K. Introducing Univariate and Bivariate Data and their Parameters
   L. Calculating the Variance, Standard Deviation and Covariance
11) Support - Covariance and Correlation
   M. The Covariance Matrix
   N. 6 Methods for Calculating the Covariance Matrix
   O. Correlation
   P. How to Calculate the Correlation Coefficient R

Resources
All lectures will be supported by a variety of resources:
Each lecture will have its PowerPoint presentation uploaded in PDF format for your later use
Solved and documented workouts in Excel
Dedicated workbooks that animate and describe various probability distributions
Links to interesting articles and books
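
The idea described in section A, transforming the data before measuring the distance, can be sketched in Python for two points that are equally far from the mean in Euclidean terms. This is a hedged illustration only: the covariance matrix and the two points are invented, and using the Cholesky factor as the transformation is an assumption suggested by the lecture titles, not a confirmed description of the course's method.

```python
import numpy as np

# Assumed covariance of two positively correlated attributes (illustrative values).
cov = np.array([[4.0, 3.0],
                [3.0, 4.0]])
mean = np.array([0.0, 0.0])

# Two points at the same Euclidean distance from the mean.
along = np.array([2.0, 2.0])     # follows the direction of the correlation
across = np.array([2.0, -2.0])   # goes against it


def euclidean(point, center):
    """Ordinary straight-line distance."""
    diff = point - center
    return np.sqrt(diff @ diff)


def mahalanobis(point, center, cov):
    """Mahalanobis distance computed through a Cholesky-based transformation."""
    # Cholesky factor L with cov = L @ L.T (fails if cov is not positive definite).
    L = np.linalg.cholesky(cov)
    # Transform the centered point: z = L^{-1} (point - center).
    z = np.linalg.solve(L, point - center)
    # The Euclidean length of the transformed point is the Mahalanobis distance.
    return np.sqrt(z @ z)


for name, p in (("along", along), ("across", across)):
    print(name, euclidean(p, mean), mahalanobis(p, mean, cov))
```

Both points have the same Euclidean distance, but the point that goes against the correlation has the larger Mahalanobis distance, which is the kind of difference the Euclidean Distance cannot detect.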
Overview
Section 1: Introducing the Course
Lecture 1 The Purpose and the Structure of the Course
Section 2: The Mahalanobis Distance Test for Detecting Outliers
Lecture 2 Introducing Distances, Specifically the Mahalanobis Distance
Lecture 3 Practices in our Data and the Matrix Representation of the Euclidean Distance
Lecture 4 Shortcoming of Euclidean Distances
Lecture 5 The 7-Step Procedure for Calculating Mahalanobis Distances
Lecture 6 Conditions for the Covariance Matrix to be Positive Definite
Lecture 7 How to Identify Outliers and More Examples
Section 3: Supporting Lectures
Lecture 8 Support - Matrices and Transformation
Lecture 9 Support - The Cholesky Decomposition
Lecture 10 Support - Multivariate Data and their Parameters
Lecture 11 Covariance Matrices and Correlation
Who this course is for
Data Scientists and Analysts
Machine Learning Engineers
Artificial Intelligence Researchers
Software Developers
Business Analysts
Market Researchers
Healthcare Professionals
Finance Professionals
Educators and Researchers
Cybersecurity Experts
Natural Language Processing (NLP) Specialists
Students embarking on machine learning and data science careers
Product Managers
Business Improvement Experts
Quality Assurance Professionals
Social Scientists