[Graph: a map of the site's pages. Each node is a page title, from "Content creation workflow" to "System configuration", and each page is linked to the pages closest to it.]
You can click on each node; they are links!

This is an automatically generated graph of all the pages on my website, along with the connections between them, calculated using sentence embeddings. If you're interested, you can read the source code.

How is this thing generated?

Explained non-technically

  1. Using an AI-esque tool, I'm generating a mathematical representation of what each page on my site contains.
  2. I'm laying out each page on a graph, so that it is placed close to pages with similar contents and far away from pages with different contents. E.g. programming-related stuff will be grouped together, far away from anything travel-related.
  3. I'm drawing links between the pages that are closest to each other. This also generates the "related posts" section at the bottom of each page. The drawn links only serve aesthetic purposes.

The gory technical details

  1. All of the posts are fed through an embeddings generator; I'm using the Sentence Transformers Python library.
  2. The embeddings are passed to UMAP, a dimensionality reduction algorithm, which takes the multi-dimensional embeddings and projects them down to a 2D representation that can be drawn as a graph. The projection is done so that the high-level "structure" of the data is preserved (at least that's what the UMAP paper states; I'm no data scientist to argue with the experts).
  3. I'm connecting each post with its 2 nearest posts (using more clutters up the map).
  4. Graphviz renders the graph and outputs it as an SVG file.
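
Steps 3 and 4 can be sketched in plain Python. This is a minimal illustration, not the site's actual source: the toy 3-dimensional embeddings and 2D positions stand in for the real Sentence Transformers and UMAP output, and the function names (`nearest_neighbours`, `to_dot`) are made up for this example. Using cosine similarity as the closeness measure is also an assumption.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def nearest_neighbours(embeddings, k=2):
    """Map each page title to the titles of its k most similar other pages."""
    result = {}
    for title, vec in embeddings.items():
        scored = [
            (cosine_similarity(vec, other_vec), other)
            for other, other_vec in embeddings.items()
            if other != title
        ]
        scored.sort(reverse=True)  # most similar first
        result[title] = [other for _, other in scored[:k]]
    return result


def to_dot(neighbours, positions):
    """Build Graphviz DOT text: one pinned node per page, one edge per pair."""
    lines = ["graph site_map {"]
    for title, (x, y) in positions.items():
        # A trailing "!" in pos asks layout engines such as neato to keep the position.
        lines.append(f'  "{title}" [pos="{x},{y}!"];')
    seen = set()
    for title, others in neighbours.items():
        for other in others:
            edge = tuple(sorted((title, other)))
            if edge not in seen:  # skip A--B if B--A was already emitted
                seen.add(edge)
                lines.append(f'  "{edge[0]}" -- "{edge[1]}";')
    lines.append("}")
    return "\n".join(lines)


# Toy stand-ins for the real embeddings (step 1) and UMAP coordinates (step 2).
embeddings = {
    "Rust": [0.9, 0.1, 0.0],
    "Luthier": [0.8, 0.2, 0.1],
    "Custom synth": [0.7, 0.3, 0.1],
    "Travel": [0.0, 0.1, 0.9],
}
positions = {
    "Rust": (0, 0),
    "Luthier": (1, 0),
    "Custom synth": (1, 1),
    "Travel": (5, 5),
}

neighbours = nearest_neighbours(embeddings, k=2)
dot_source = to_dot(neighbours, positions)
print(dot_source)
```

The printed DOT text could then be saved to a file and rendered with something like `neato -Tsvg map.dot -o map.svg`. Deduplicating edges is a choice made for this sketch; the real graph may well draw both directions.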

A much better description can be found on Simon Willison's blog.

Future plans

  1. Make this thing look more "map-like", whatever that might mean.
  2. Experiment with text clustering & dimensionality reduction algorithms, such as:
    • t-SNE
    • K-means clustering
    • UMAP
    • Latent Dirichlet allocation
    • DBSCAN
  3. Add #tags. Automatically assign posts to categories with cosine distances.
  4. Introduce color-coding and other visual markers, allowing viewers to make sense of the data based on different metrics:
    • Post tags
    • Links to/from other posts
    • Links outside (to the netsphere)
    • Other connections generated by NLP
  5. Check out KagiSearch/vectordb
  6. Color gradients with Python
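
The tagging idea from item 3 could look roughly like this. It is only a sketch under assumptions: each tag is assumed to have its own embedding (for example, made by embedding the tag's name or a short description), the 2-dimensional vectors are toy values, and `assign_tag` is a hypothetical helper name.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity: 0.0 for the same direction, 1.0 for orthogonal vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def assign_tag(post_vec, tag_vecs):
    """Return the tag whose embedding has the smallest cosine distance to the post."""
    return min(tag_vecs, key=lambda tag: cosine_distance(post_vec, tag_vecs[tag]))


# Toy embeddings; real ones would come from the same embeddings generator as the posts.
tag_vecs = {
    "programming": [1.0, 0.0],
    "travel": [0.0, 1.0],
}
post_vec = [0.9, 0.2]

print(assign_tag(post_vec, tag_vecs))
```

A distance threshold could be added on top, so that posts far away from every tag stay untagged instead of being forced into the least bad category.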