performance - Difference between Hierarchical and Non Hierarchical Clustering? -
i trying see if performance of both can compared based on objective functions work on?
hierarchical : single linkage, complete linkage , average linkage algorithms
non hierarchical : fuzzy c means , k means
don't confuse algorithms , tasks.
there more 1 algorithm k-means. there @ least 2 dozen. use k-d-trees acceleration, example. , heuristics, because finding optimal k-means solution shown np-hard in general, believe.
similarly, there naive o(n^3)
runtime , o(n^2)
memory approach hierarchical clustering, , there algorithms such slink single-linkage hierarchical clustering , clink complete-linkage hierarchical clustering run in o(n^2)
time , o(n)
memory.
last not least, if use dbscan , set minpts=2
, result same single-link hierarchical clustering when cut @ height of epsilon
. yet, appropriate index, dbscan runs in o(n log n)
(e.g. in elki - scipy implementation not clever, needs o(n^2)
memory , runtime).
so why can have such different runtimes apparently same task? first of all, algorithms (and implementations) can primitive. easy implement , understand, not clever be. secondly, algorithms perform work may or may not interested in. single-linkage hierarchical clustering computes hierarchy; dbscan minpts=2
single cut through hierarchy - much simpler result complete hierarchy. , last not least, there heuristics. of k-means variants fall last category: accepting miss global best solution, can of course become faster if performed exhaustive search.
Comments
Post a Comment