knn

Pros: Simple, easy to spread across multiple threads, and flexible (works on linear & non-linear data)
Cons: Bad with many points (slow, since every prediction compares against the whole training set) and with outliers (missing data is just plain awful as well)
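
To make the pros and cons above concrete, here is a minimal sketch of a KNN classifier using only the standard library. The function name `knn_predict`, the toy data, and the choice of Euclidean distance are assumptions for illustration, not part of these notes.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # train is a list of (features, label) pairs; features are tuples of numbers.
    # Every prediction sorts the whole training set, which is why KNN
    # slows down as the number of points grows.
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data (made up): two small groups of 2-D points.
train = [((1, 2), "a"), ((2, 3), "a"), ((3, 1), "a"),
         ((6, 5), "b"), ((7, 7), "b"), ((8, 6), "b")]

print(knn_predict(train, (2, 2)))  # nearest neighbours are all "a"
print(knn_predict(train, (7, 6)))  # nearest neighbours are all "b"
```

There is no training step at all, which is the "simple" part. The cost shows up at prediction time, and a single outlier sitting near the query can swing the vote when k is small.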

svm

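Pros: Accuracy, and the decision boundary depends only on a subset of the training points (the support vectors)

Cons: Speed (training gets slow on large datasets)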

Kernels

Used with SVM when your data is all clumped together and cannot be split by a straight line. A kernel effectively adds another dimension to the data points so the classes can be separated.

It works by measuring the similarity between pairs of points
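
A small sketch of both ideas, using made-up 1-D data. The helper names `lift` and `rbf_kernel` and the value of `gamma` are assumptions for illustration. A real SVM does not build the extra dimension explicitly; it only needs the similarity values the kernel returns.

```python
import math

def lift(x):
    """Map a 1-D point to 2-D by adding x squared as an extra dimension."""
    return (x, x * x)

def rbf_kernel(a, b, gamma=0.5):
    """Similarity of two points: 1.0 when identical, close to 0 when far apart."""
    return math.exp(-gamma * math.dist(a, b) ** 2)

inner = [-1.0, 0.0, 1.0]   # class 0, clumped in the middle
outer = [-3.0, 3.0]        # class 1, sitting on both sides of class 0

# No single threshold on x splits the two classes, but after lifting,
# the horizontal line at height 4 in the new dimension does.
lifted_inner = [lift(x) for x in inner]
lifted_outer = [lift(x) for x in outer]
separable = max(p[1] for p in lifted_inner) < 4 < min(p[1] for p in lifted_outer)
print(separable)

# The RBF kernel as a similarity score between points.
print(rbf_kernel((0.0,), (0.0,)))  # identical points
print(rbf_kernel((0.0,), (3.0,)))  # distant points
```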

Clustering

K-Means = Flat clustering: you tell it how many groups to find. Yes or no style with 2 groups. (Is this person a buyer or not a buyer?)

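Mean-Shift = Works out the number of groups on its own (loosely called hierarchical clustering here; strictly it is a mode-seeking method). (Is this person a huge buyer, someone who shops a little bit, or someone who does not shop at all?)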

K-Means makes 2 broad assumptions: the number of clusters is already known, and the clusters have a particular shape (roughly round and similar in size). It is very sensitive to initialization. These assumptions are what make it fast, however.
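
A minimal sketch of K-Means in plain Python, to show where those assumptions enter. The function name `kmeans`, the toy data, and seeding the centroids with the first k points are assumptions for illustration; real libraries use smarter initialization.

```python
import math

def kmeans(points, k, iterations=10):
    """Plain k-means: k is fixed up front and the first k points seed the centroids."""
    centroids = [points[i] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Toy data (made up): two obvious groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

Both seeds start inside the same group here, yet the centroids drift apart within a few iterations. On harder data a bad seed can leave the algorithm stuck in a poor split, which is the sensitivity to initialization noted above.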

Mean-Shift = Does not assume anything about the number of clusters. Not sensitive to outliers. Sensitive to the selection of bandwidth, however.
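
A minimal sketch of Mean-Shift with a flat kernel, showing that the number of clusters comes out of the bandwidth rather than being passed in. The function name `mean_shift`, the toy data, and the merge rule (modes closer than half a bandwidth count as one) are assumptions for illustration.

```python
import math

def mean_shift(points, bandwidth, iterations=20):
    """Flat-kernel mean shift: every point climbs to the mean of its neighbours."""
    modes = list(points)
    for _ in range(iterations):
        shifted = []
        for m in modes:
            neighbours = [p for p in points if math.dist(p, m) <= bandwidth]
            if not neighbours:  # defensive guard; keep the mode where it is
                shifted.append(m)
                continue
            shifted.append(tuple(sum(c) / len(neighbours) for c in zip(*neighbours)))
        modes = shifted
    # Merge modes that ended up (almost) in the same place.
    centers = []
    for m in modes:
        if all(math.dist(m, c) > bandwidth / 2 for c in centers):
            centers.append(m)
    return centers

# Toy data (made up): two obvious groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]

print(len(mean_shift(points, bandwidth=3)))   # small bandwidth: two centers
print(len(mean_shift(points, bandwidth=20)))  # huge bandwidth: everything merges into one
```

The same data gives a different number of clusters depending only on the bandwidth, which is why choosing it matters so much.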

Accuracy vs Confidence

Accuracy = Did we get the classification right?

How close you are to the actual value, measured over many predictions

Confidence = Comes from the classifier and says "we only have 60% of the votes, so our confidence is only 60%"

How sure the classifier is about one particular prediction
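
The difference can be sketched with a KNN vote. Confidence is the share of neighbours that voted for the winning label on one prediction; accuracy is the share of predictions that matched the true labels. The function names, the toy data, and the example label lists are assumptions for illustration.

```python
from collections import Counter
import math

def knn_predict_with_confidence(train, query, k=5):
    """Return (label, confidence), where confidence is the winning share of the votes."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / len(nearest)

def accuracy(predictions, truths):
    """Fraction of predictions that match the true labels."""
    return sum(p == t for p, t in zip(predictions, truths)) / len(truths)

# Toy data (made up): two groups along a diagonal.
train = [((1, 1), "a"), ((2, 2), "a"), ((3, 3), "a"),
         ((5, 5), "b"), ((6, 6), "b"), ((7, 7), "b")]

# A query between the groups: 3 of the 5 nearest neighbours vote "a".
label, confidence = knn_predict_with_confidence(train, (3.6, 3.6), k=5)
print(label, confidence)

# Accuracy is measured over a batch of predictions against known answers.
print(accuracy(["a", "b", "a", "b"], ["a", "b", "b", "b"]))
```

A classifier can be accurate overall and still report low confidence on a borderline point like this one.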