This is an automatically generated graph containing all pages on my websites,
along with connections between them, calculated using sentence embeddings. If
you're interested, you can read the source code.
How is this thing generated?
Explained non-technically
Using an AI-esque tool, I'm generating a mathematical representation of each
page's contents.
I'm laying out each page on a graph, so that it is placed close to pages
with similar contents and far away from pages with different contents, e.g.
programming-related stuff will be grouped together, far away from anything
travel-related.
I'm drawing links between the pages that are closest to each other. This
also generates the "related posts" section at the bottom of each page. On
the graph itself, the links serve purely aesthetic purposes.
Posts are colored depending on their relatedness to 3 topics:
More red: art-related
More green: computers-related
More blue: music-related
I'm working on better coloring algorithms based on various gradients.
The gory technical details
All of the posts are fed through an embedding generator; I'm using the
Sentence Transformers Python library.
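A minimal sketch of this step (the specific model name and the way posts are
collected here are my assumptions, not necessarily what the site uses):

```python
from sentence_transformers import SentenceTransformer

# Hypothetical model choice; any sentence-embedding model works the same way.
model = SentenceTransformer("all-MiniLM-L6-v2")

# Assumed structure: post title -> plain-text body, stripped of markup.
posts = {
    "post-1": "Full text of the first post...",
    "post-2": "Full text of another post...",
}
embeddings = model.encode(list(posts.values()))  # one vector per post
```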
The embeddings are passed to UMAP, a dimensionality reduction algorithm,
which takes the multi-dimensional embeddings and projects them down to a 2D
representation that can be drawn as a graph. The projection is done so that
the high-level "structure" of the data is preserved (at least that's what
the UMAP paper states; I'm not a data scientist, so I won't argue with the
experts).
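Sketched with the umap-learn library; the parameters are illustrative
defaults rather than my exact settings:

```python
import umap

# Project the high-dimensional embeddings down to 2D points.
# Cosine distance is an assumption that matches how embeddings are
# compared elsewhere; the seed just makes the layout reproducible.
reducer = umap.UMAP(n_components=2, metric="cosine", random_state=42)
coords = reducer.fit_transform(embeddings)  # shape: (n_posts, 2)
```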
I'm connecting each post with its two nearest posts (using more clutters up
the map).
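One plausible way to build those edges, continuing from the snippet above
and assuming the neighbors are found by cosine similarity over the original
embeddings (the post doesn't say which space is used):

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

sim = cosine_similarity(embeddings)
np.fill_diagonal(sim, -1.0)  # a post is never its own neighbor

edges = set()
for i in range(sim.shape[0]):
    for j in np.argsort(sim[i])[-2:]:  # the two most similar posts
        edges.add((min(i, int(j)), max(i, int(j))))  # undirected, deduplicated
```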
Coloring is done by calculating cosine similarity between a post's content
embedding and the embeddings of simple tag-based sentences, such as "music,
melodies" or "art, beauty". Currently the gradient is dead-simple: each
similarity directly drives the corresponding R/G/B channel.
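A sketch of that gradient, continuing from the snippets above; the
computers tag sentence and the clamping/scaling to 0-255 are my guesses at
what "dead-simple" means here:

```python
# Tag sentences embedded with the same model as the posts.
tag_sentences = ["art, beauty", "computers, programming", "music, melodies"]
tag_embeddings = model.encode(tag_sentences)

colors = []
for emb in embeddings:
    # Similarity to (art, computers, music) drives (R, G, B) directly.
    sims = cosine_similarity([emb], tag_embeddings)[0]
    r, g, b = (int(255 * max(float(s), 0.0)) for s in sims)  # clamp negatives
    colors.append(f"#{r:02x}{g:02x}{b:02x}")
```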
Graphviz renders the graph and outputs it as an SVG.
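Tying the pieces together with the graphviz Python bindings; the neato
engine honors pinned positions, but the attribute choices here are
assumptions, not the site's exact settings:

```python
import graphviz

g = graphviz.Graph(engine="neato", format="svg")
names = list(posts.keys())
for i, (x, y) in enumerate(coords):
    # The trailing "!" pins the node at its UMAP coordinates.
    g.node(names[i], pos=f"{x},{y}!", color=colors[i], style="filled")
for i, j in edges:
    g.edge(names[i], names[j])
g.render("site-graph")  # writes site-graph.svg next to the source file
```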