Seminars & Colloquia Calendar

Experimental Mathematics Seminar

Data analysis in high-dimensional spaces

Adi Ben Israel, Rutgers University (RUTCOR)

Location:  Zoom
Date & time: Thursday, 06 May 2021 at 5:00PM - 6:00PM

 

1. The unreliability of the Euclidean distance in high dimensions, which makes proximity queries meaningless and unstable because the nearest and furthest neighbors are poorly discriminated [3]; see also [4].

2. The uniform probability distribution on the $n$-dimensional unit sphere $S_n$, and some non-intuitive results for large $n$. For example, if $x$ is any point on $S_n$, taken as the "north pole", then most of the area of $S_n$ is concentrated near the "equator".

3. The advantage of the $\ell_1$-distance, which is less sensitive to high dimensionality and has been shown to "provide the best discrimination in high-dimensional data spaces" [1, p. 427].

4. Clustering high-dimensional data using the $\ell_1$-distance [2].
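The phenomena in items 1-3 can be observed numerically. The following is a minimal sketch, not taken from the talk or the cited papers. It assumes data drawn uniformly from the unit cube, uses the relative contrast $(D_{\max} - D_{\min})/D_{\min}$ as the measure of discrimination, and takes the sphere to be the unit sphere in $\mathbb{R}^{dim}$. The helper names `sample_cube`, `relative_contrast` and `equator_fraction` are hypothetical.

```python
import math
import random


def sample_cube(dim, n_points, rng):
    """Draw n_points points uniformly from the unit cube [0, 1]^dim (an assumed data model)."""
    return [[rng.random() for _ in range(dim)] for _ in range(n_points)]


def relative_contrast(query, points, p):
    """Return (D_max - D_min) / D_min for the l_p distances from query to points."""
    dists = [
        sum(abs(a - b) ** p for a, b in zip(query, point)) ** (1.0 / p)
        for point in points
    ]
    return (max(dists) - min(dists)) / min(dists)


def equator_fraction(dim, n_samples, width, rng):
    """Fraction of uniform samples on the unit sphere in R^dim whose coordinate
    along the "north pole" axis e_1 has absolute value below width."""
    hits = 0
    for _ in range(n_samples):
        # A normalized standard Gaussian vector is uniform on the sphere.
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in g))
        if abs(g[0] / norm) < width:
            hits += 1
    return hits / n_samples


if __name__ == "__main__":
    rng = random.Random(0)
    for dim in (2, 20, 500):
        query = sample_cube(dim, 1, rng)[0]
        points = sample_cube(dim, 200, rng)
        c1 = relative_contrast(query, points, 1)  # l_1 distance
        c2 = relative_contrast(query, points, 2)  # Euclidean distance
        print(f"dim={dim:4d}  l1 contrast={c1:.3f}  l2 contrast={c2:.3f}")
    for dim in (3, 1000):
        frac = equator_fraction(dim, 500, 0.1, rng)
        print(f"dim={dim:4d}  fraction within 0.1 of the equator={frac:.3f}")
```

In a run of this sketch the contrast should shrink sharply as `dim` grows, and the band around the equator should hold almost all of the samples in high dimension. The exact numbers depend on the seed and on the assumed data model.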

References

[1] C.C. Aggarwal et al., On the surprising behavior of distance metrics in high dimensional space, Lecture Notes in Computer Science, vol. 1973 (2001), Springer, https://doi.org/10.1007/3-540-44503-X_27

[2] T. Asamov and A. Ben-Israel, A probabilistic $\ell_1$ method for clustering high-dimensional data, Probability in the Engineering and Informational Sciences, 2021, 1-16

[3] K. Beyer et al., When is "nearest neighbor" meaningful?, Lecture Notes in Computer Science, vol. 1540 (1999), Springer, https://doi.org/10.1007/3-540-49257-7_15

[4] J.M. Hammersley, The distribution of distance in a hypersphere, The Annals of Mathematical Statistics 21 (1950), 447-452.
