*Note*: you can click on each page; all nodes are links!
This is an automatically generated graph of all
pages on my websites, along with the connections
chosen by a text vectorisation algorithm.
If you're interested, you can read the
source code.
How is this thing generated?
TODO: update once I'm finished with UMAP
All of the posts are fed through a stemming algorithm, which reduces each word
to its root form (think "doing" -> "do", "derivation" -> "deriv").
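To make the stemming step concrete, here is a deliberately tiny sketch. It is a toy suffix stripper, not the stemmer the site actually uses; the suffix list and the names `toy_stem` and `stem_text` are made up for illustration. A real pipeline would more likely rely on a Porter- or Snowball-style stemmer from an NLP library.

```python
# Toy suffix-stripping stemmer (illustration only, NOT a real Porter stemmer).
# The suffix list is a hypothetical choice; suffixes are tried in this order.
SUFFIXES = ("ation", "ing", "ed", "es", "s")


def toy_stem(word, min_stem=2):
    """Strip the first matching suffix, keeping at least `min_stem` letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def stem_text(text):
    """Split on whitespace and stem every token."""
    return [toy_stem(token) for token in text.split()]


print(stem_text("doing derivation"))
```

The toy rules happen to reproduce the two examples above, but they will mangle plenty of real words, which is why proper stemmers carry far more rules.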
All the posts are then fed into a TF-IDF vectorizer, which assigns a unique
index to each unique word and weights it by how often it appears in a post,
discounted by how many posts contain it.
For example, the word "to" probably won't be an important word, as opposed to
"circumvolution" or "simulacrum". After this procedure, each post becomes
a vector of numbers: the number of occurrences of each unique word
found across all of the posts, multiplied by the weight of that word.
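A bare-bones version of that idea might look like the sketch below. It uses the textbook weighting (raw count times `log(N / df)`); library vectorizers usually add smoothing and normalisation on top, so the numbers here will not match theirs. The function name `tfidf_vectors` and the sample posts are assumptions for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists. Returns (vocabulary, one vector per doc)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}  # unique index per word
    n_docs = len(docs)
    # Document frequency: in how many posts does each word appear?
    df = Counter(tok for doc in docs for tok in set(doc))
    # Textbook inverse document frequency; a word present in every post gets 0.
    idf = {tok: math.log(n_docs / df[tok]) for tok in vocab}

    vectors = []
    for doc in docs:
        vec = [0.0] * len(vocab)
        for tok, count in Counter(doc).items():
            vec[index[tok]] = count * idf[tok]  # occurrences * word weight
        vectors.append(vec)
    return vocab, vectors


# Hypothetical, already-stemmed posts.
posts = [
    ["to", "simulacrum"],
    ["to", "circumvolution", "circumvolution"],
]
vocab, vectors = tfidf_vectors(posts)
for word, weight in zip(vocab, vectors[1]):
    print(word, round(weight, 3))
```

In this tiny corpus "to" appears in every post, so its weight drops to zero, while the rarer words keep a positive weight.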
Post similarity is measured using a metric called cosine similarity,
calculated from the TF-IDF vectors obtained in the 2nd step.
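Cosine similarity is the dot product of two vectors divided by the product of their lengths, so it only looks at the direction of the vectors and ignores how long the posts are. A minimal sketch (the function name and the zero-vector convention are my choices for the example, not necessarily what the source code does):

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumed convention: an all-zero vector is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # same direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # no shared words
```

Since TF-IDF weights are never negative, the score for two posts lands between 0 (no words in common) and 1 (same word proportions).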
I'm connecting each post with its top 3 most similar posts (using 4 or more leads to clutter).
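Picking the neighbours is then a matter of sorting each row of the similarity matrix. The sketch below assumes the similarities are already stored in a square list of lists; `top_k_neighbours` and the sample matrix are hypothetical.

```python
def top_k_neighbours(similarity, k=3):
    """similarity: square matrix as a list of lists.

    Returns {post index: [indices of its k most similar other posts]}.
    """
    neighbours = {}
    for i, row in enumerate(similarity):
        # Every post is perfectly similar to itself, so leave it out.
        candidates = [j for j in range(len(row)) if j != i]
        candidates.sort(key=lambda j: row[j], reverse=True)
        neighbours[i] = candidates[:k]
    return neighbours


# Made-up similarity scores for four posts.
sim = [
    [1.0, 0.9, 0.1, 0.5],
    [0.9, 1.0, 0.2, 0.3],
    [0.1, 0.2, 1.0, 0.8],
    [0.5, 0.3, 0.8, 1.0],
]
print(top_k_neighbours(sim, k=2))
```

Note that the relation is not symmetric: post A can have B in its top list without B returning the favour, so a post may end up with more edges than `k`.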
The graph is drawn by Graphviz, which generates the layout automatically.
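One way to hand the result to Graphviz is to write out a DOT description and let one of its layout engines (for example `dot -Tsvg` or `neato -Tsvg`) place the nodes. The sketch below only builds the DOT text; the helper name `to_dot`, the graph name and the sample URLs are assumptions, and the real source may well use a Graphviz binding instead. The `URL` node attribute is what turns nodes into links in SVG output.

```python
def to_dot(edges, urls):
    """Build an undirected DOT graph.

    edges: iterable of (title_a, title_b) pairs.
    urls:  {title: link target}, used to make each node clickable.
    """
    def quote(text):
        # Escape double quotes so titles stay valid DOT strings.
        return '"' + text.replace('"', '\\"') + '"'

    lines = ["graph posts {"]
    for title, url in sorted(urls.items()):
        lines.append(f"  {quote(title)} [URL={quote(url)}];")
    seen = set()
    for a, b in edges:
        key = tuple(sorted((a, b)))
        if key in seen:
            continue  # A--B and B--A are the same undirected edge
        seen.add(key)
        lines.append(f"  {quote(key[0])} -- {quote(key[1])};")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical posts and links.
print(to_dot([("a", "b"), ("b", "a")], {"a": "/a", "b": "/b"}))
```

Saving that text to a file and running it through a Graphviz layout engine produces the picture; no coordinates need to be computed by hand.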
Future plans
Make this thing look more map-like
Experiment with text clustering & dimensionality reduction algorithms, such as:
t-SNE
K-means clustering
UMAP
Latent Dirichlet allocation
DBSCAN
Add #tags.
Introduce color-coding and other visual markers, allowing viewers to make sense of the data based on different metrics: