How to use UMAP
For the theoretical background of UMAP, see the umap-learn documentation [1]
Purpose
UMAP is a manifold learning and dimension reduction algorithm.
Python tools needed
numpy, sklearn (scikit-learn), matplotlib, seaborn, pandas, umap (umap-learn)
matplotlib and seaborn are plotting tools; pandas makes loading and handling the data easier.
Data in use
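Penguin data, https://raw.githubusercontent.com/allisonhorst/penguins/master/data/penguins_size.csv
import pandas as pd
penguins = pd.read_csv("https://raw.githubusercontent.com/allisonhorst/penguins/master/data/penguins_size.csv")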
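UMAP, like most scikit-learn style estimators, rejects input that contains missing values by default, so it can help to check for them right after loading. A minimal sketch, assuming pandas; the inline rows are hypothetical sample values in an assumed column layout, used here in place of the download so the snippet runs offline:

```python
import io
import pandas as pd

# Hypothetical sample rows; the column layout is assumed to match
# penguins_size.csv and the values are illustrative only.
sample_csv = io.StringIO(
    "species_short,island,culmen_length_mm,culmen_depth_mm,"
    "flipper_length_mm,body_mass_g,sex\n"
    "Adelie,Torgersen,39.1,18.7,181,3750,MALE\n"
    "Adelie,Torgersen,NA,NA,NA,NA,NA\n"
    "Gentoo,Biscoe,46.1,13.2,211,4500,FEMALE\n"
)

# For the real data, pass the URL from this section to pd.read_csv instead.
penguins_sample = pd.read_csv(sample_csv)

# Drop every row that has at least one missing value.
clean_sample = penguins_sample.dropna()

print(len(penguins_sample), len(clean_sample))
```

Dropping rows reduces the number of points in the final embedding, and the species labels used for coloring must come from the same cleaned frame.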
Visualize the data
import seaborn as sns
sns.pairplot(penguins, hue="species_short")
Construct a UMAP object
import umap
reducer = umap.UMAP()
Standardize the penguin dataset
from sklearn.preprocessing import StandardScaler
penguin_data = penguins[
    [
        "culmen_length_mm",
        "culmen_depth_mm",
        "flipper_length_mm",
        "body_mass_g",
    ]
].values
scaled_penguin_data = StandardScaler().fit_transform(penguin_data)
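StandardScaler rescales each column to zero mean and unit variance, so that body mass in grams does not dominate the lengths in millimetres when distances are computed. A minimal pure-Python sketch of that per-column computation; the helper name standardize_column and the sample values are illustrative and not part of any library:

```python
from statistics import fmean, pstdev

def standardize_column(values):
    """Return z-scores: subtract the mean, divide by the population std."""
    mean = fmean(values)
    std = pstdev(values)  # population standard deviation (ddof = 0)
    return [(v - mean) / std for v in values]

# Illustrative flipper lengths in mm, not taken from the real file.
flipper_lengths = [181.0, 186.0, 195.0, 211.0, 230.0]
scaled = standardize_column(flipper_lengths)

# After scaling, the column has mean 0 and standard deviation 1.
print(round(fmean(scaled), 6), round(pstdev(scaled), 6))
```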
After standardizing, reduce the data and check the shape of the result
matrix = reducer.fit_transform(scaled_penguin_data)  # This is the dimension reduction step
print(matrix.shape)
# matrix is a numpy array
Terminal: (344,2)
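Since the result is a plain numpy array with one row per penguin, it can be convenient to wrap it in a DataFrame next to the species labels. A small sketch under stated assumptions: the random array stands in for the UMAP output, and the column names umap_1 and umap_2 are hypothetical labels chosen for this example:

```python
import numpy as np
import pandas as pd

# Stand-in for the UMAP output: an (n_samples, 2) array of coordinates.
rng = np.random.default_rng(0)
matrix_demo = rng.normal(size=(6, 2))
species_demo = ["Adelie", "Adelie", "Chinstrap", "Chinstrap", "Gentoo", "Gentoo"]

# Column names "umap_1" and "umap_2" are hypothetical labels chosen here.
embedding_df = pd.DataFrame(matrix_demo, columns=["umap_1", "umap_2"])
embedding_df["species_short"] = species_demo

# One centroid per species gives a quick numeric look at cluster positions.
centroids = embedding_df.groupby("species_short")[["umap_1", "umap_2"]].mean()
print(centroids.shape)
```

A frame like this can also be passed directly to seaborn plotting functions with hue="species_short".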
Visualizing the result of UMAP
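import matplotlib.pyplot as plt
import seaborn as sns
# Map each species name to an index into the default seaborn color palette
species_index = penguins.species_short.map({"Adelie": 0, "Chinstrap": 1, "Gentoo": 2})
plt.scatter(
    matrix[:, 0],
    matrix[:, 1],
    c=[sns.color_palette()[x] for x in species_index],
)
plt.gca().set_aspect("equal", "datalim")
plt.title("UMAP projection of the Penguin dataset", fontsize=24)
plt.show()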
[1] I hope this helps people who find the documentation at umap-learn.readthedocs.io too verbose.