Map

[Graph: an interactive map of every page on the site, with edges drawn between related pages.]
You can click on each node; they are links!

This is an automatically generated graph containing all pages on my website, along with the connections calculated using sentence embeddings. If you're interested, you can read the source code.

How is this thing generated?

Explained non-technically

  1. Using an AI-esque tool, I generate a mathematical representation of what each page on my site contains.
  2. I lay out each page on a graph, so that it is placed close to pages with similar contents and far away from pages with different contents. E.g. programming-related stuff will be grouped together, far away from something travel-related.
  3. I draw links between the pages that are closest to each other. This also generates the "related posts" section at the bottom of each page. The drawn links only serve aesthetic purposes.
  4. Posts are colored depending on their relatedness to 3 topics:
    • More red: art-related
    • More green: computer-related
    • More blue: music-related
    • I'm working on better coloring algorithms based on various gradients

The gory technical details

  1. All of the posts are fed through an embeddings generator; I'm using the Sentence Transformers Python library.
  2. The embeddings are passed to UMAP, a dimensionality reduction algorithm, which takes the multi-dimensional embeddings and projects them down to a 2D representation that can be drawn as a graph. The projection is done so that the high-level "structure" of the data is preserved (at least that's what the UMAP paper states; I'm no data scientist to argue with the experts).
  3. I'm connecting each post with its 2 nearest posts (using more clutters up the map).
  4. Coloring is done by calculating the cosine similarity between the post content embeddings and the embeddings of simple tag-based sentences, such as "music, melodies" or "art, beauty". Currently the gradient is dead simple: similarity directly affects the R/G/B channel.
  5. Graphviz renders the graphs and outputs them as SVGs.
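
Steps 3 to 5 can be sketched in plain Python. This is only an illustration under loud assumptions, not the site's actual source code: the function names (cosine_similarity, nearest_posts, post_colour, to_dot) are made up for this example, and tiny hand-written 3D vectors stand in for real sentence embeddings. The embedding and UMAP steps are left out entirely.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_posts(embeddings, k=2):
    """Map each post title to the titles of its k most similar posts."""
    neighbours = {}
    for title, vec in embeddings.items():
        others = [t for t in embeddings if t != title]
        others.sort(key=lambda t: cosine_similarity(vec, embeddings[t]),
                    reverse=True)
        neighbours[title] = others[:k]
    return neighbours

def post_colour(vec, topic_vecs):
    """Hex colour whose R, G and B channels follow the similarity
    to three topic vectors, given in the order (red, green, blue)."""
    channels = []
    for topic in topic_vecs:
        sim = max(0.0, cosine_similarity(vec, topic))  # clamp negatives to 0
        channels.append(round(sim * 255))
    return "#{:02x}{:02x}{:02x}".format(*channels)

def to_dot(embeddings, topic_vecs, k=2):
    """Build Graphviz DOT source: one filled node per post, one edge per
    (post, neighbour) pair. Mutual neighbours yield duplicate edges here."""
    lines = ["graph site {"]
    for title, vec in embeddings.items():
        colour = post_colour(vec, topic_vecs)
        lines.append(f'  "{title}" [style=filled, fillcolor="{colour}"];')
    for title, near in nearest_posts(embeddings, k).items():
        for other in near:
            lines.append(f'  "{title}" -- "{other}";')
    lines.append("}")
    return "\n".join(lines)

# Toy stand-ins for real embeddings (assumption: axes are art, computers, music).
posts = {
    "Paintings": [0.9, 0.1, 0.1],
    "Rust": [0.1, 0.9, 0.2],
    "Luthier": [0.1, 0.7, 0.6],
    "Piano": [0.1, 0.1, 0.9],
}
topics = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # red=art, green=computers, blue=music

print(to_dot(posts, topics))
```

The printed text can be saved to a file and rendered with the Graphviz command line, for example `dot -Tsvg map.dot -o map.svg`. With real data, the vectors would come from the embeddings generator and the node positions from UMAP.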

There's a much better description on Simon Willison's blog.

Future plans

  1. Make this thing look more "map-like", whatever that might mean.
  2. Experiment with text clustering & dimensionality reduction algorithms, such as:
    • t-SNE
    • K-means clustering
    • UMAP
    • Latent Dirichlet allocation
    • DBSCAN
  3. Add #tags. Automatically assign posts to categories with cosine distances.
  4. Introduce color-coding and other visual markers, allowing viewers to make sense of the data based on different metrics:
    • Post tags
    • Links to/from other posts
    • Links outside (to the netsphere)
    • Other connections generated by NLP
  5. Check out KagiSearch/vectordb
  6. Color gradients with Python
  7. Circos
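
As a rough idea of how plan 3 (automatic tag assignment with cosine distances) might look, here is a minimal sketch. Everything in it is an assumption for illustration: the assign_tags helper, the 0.5 distance threshold, and the toy 3D vectors standing in for real tag and post embeddings.

```python
import math

def cosine_distance(a, b):
    """1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def assign_tags(post_vec, tag_vecs, max_distance=0.5):
    """Return the tags within max_distance of the post, closest first.
    max_distance is an arbitrary cut-off chosen for this example."""
    scored = [(cosine_distance(post_vec, vec), tag)
              for tag, vec in tag_vecs.items()]
    return [tag for dist, tag in sorted(scored) if dist <= max_distance]

# Toy stand-ins for embeddings of tag sentences such as "music, melodies".
tags = {
    "art": [1, 0, 0],
    "computers": [0, 1, 0],
    "music": [0, 0, 1],
}

print(assign_tags([0.1, 0.7, 0.6], tags))
```

A fixed threshold is the simplest possible rule; picking the single closest tag, or the top few, would be equally plausible alternatives.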