[Graph: a map of the site's pages. Each node is a page title (for example "Rust", "Piano", "Exocortex", "Colour scheme") and is joined by edges to its nearest pages.]
*Note*: you can click on each page; all nodes are links!

This is an automatically generated graph containing all pages on my website, with the connections between them calculated using sentence embeddings. If you're interested, you can read the source code.

How is this thing generated?

  1. All of the posts are fed through an embeddings generator; I'm using the Sentence Transformers Python library.
  2. The embeddings are passed to UMAP, a dimensionality reduction algorithm, which takes in multi-dimensional embeddings and projects them down to a 2D representation that can be drawn as a graph. The projection is done so that the high-level "structure" of the data is preserved.
  3. I'm connecting each post with its top 2 nearest posts (using more clutters up the map).
  4. Graphviz renders the graph and outputs it as an SVG file.
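
Steps 3 and 4 can be sketched roughly like this. It's a minimal, standard-library-only illustration, not the actual source code: the toy vectors stand in for real Sentence Transformers output, the UMAP step is left out, and the function names (`nearest_edges`, `to_dot`) are made up for this example. It also assumes cosine similarity on the raw embeddings as the measure of "nearest", which may not be what the real script does.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_edges(embeddings, k=2):
    """Link every post to its k most similar posts.

    Edges are undirected and de-duplicated, so A--B and B--A collapse
    into a single edge (an assumption made for this sketch).
    """
    edges = set()
    for title, vector in embeddings.items():
        scored = [
            (cosine_similarity(vector, other_vector), other)
            for other, other_vector in embeddings.items()
            if other != title
        ]
        scored.sort(reverse=True)  # most similar first
        for _, neighbour in scored[:k]:
            edges.add(tuple(sorted((title, neighbour))))
    return sorted(edges)


def to_dot(edges):
    """Serialise the edges as an undirected graph in Graphviz's DOT language."""
    lines = ["graph posts {"]
    for a, b in edges:
        a = a.replace('"', '\\"')
        b = b.replace('"', '\\"')
        lines.append(f'    "{a}" -- "{b}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical 3-dimensional "embeddings"; real ones have hundreds of dimensions.
toy_embeddings = {
    "Rust": [0.9, 0.1, 0.0],
    "Custom sequencer": [0.8, 0.3, 0.1],
    "Piano": [0.6, 0.6, 0.1],
    "Reading": [0.0, 0.2, 0.9],
}

edges = nearest_edges(toy_embeddings, k=2)
dot_source = to_dot(edges)
print(dot_source)
```

Saving the printed text as, say, `posts.dot` and running `dot -Tsvg posts.dot -o posts.svg` would turn it into an SVG file. Raising `k` adds more edges per post, which is exactly the clutter mentioned in step 3.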

A much better description can be found on Simon Willison's blog.

Future plans

  1. Make this thing look more "map-like", whatever that might mean.
  2. Experiment with text clustering & dimensionality reduction algorithms, such as:
    • t-SNE
    • K-means clustering
    • UMAP
    • Latent Dirichlet allocation
    • DBSCAN
  3. Add #tags.
  4. Introduce colour-coding and other visual markers, allowing viewers to make sense of the data based on different metrics:
    • Post tags
    • Links to/from other posts
    • Links outside (to the netsphere)
    • Other connections generated by NLP
  5. Check out KagiSearch/vectordb