Links curation

This project generates my links website, a site where I share all the article links I've found interesting.

Link data is stored as plain-text .webloc files, organized in folders.

Each folder becomes a category on the generated website and holds that category's links.

The project generates statistics about the links: number of links shared, domains of the links, top domains, and so on.
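As an illustration, the per-domain statistics could be computed along these lines. This is a minimal sketch, not the project's actual code: the function name `topDomains` and the sample list are hypothetical, and it only relies on the standard `URL` class.

```typescript
// Hypothetical helper: counts links per hostname and returns the busiest domains first.
function topDomains(urls: string[], limit: number = 10): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const url of urls) {
    const hostname = new URL(url).hostname; // throws on a malformed URL
    counts.set(hostname, (counts.get(hostname) ?? 0) + 1);
  }
  // Sort by count (descending), then by hostname so that ties have a stable order.
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit);
}

// Illustrative sample data only.
const sampleUrls: string[] = [
  "https://boringtechnology.club/",
  "https://blog.cleancoder.com/uncle-bob/2015/08/06/LetTheMagicDie.html",
  "https://blog.cleancoder.com/",
];
console.log(`${sampleUrls.length} links shared`);
console.log(topDomains(sampleUrls));
```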

Links are checked for duplicates.
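A duplicate check could look like the sketch below. The rule used here is an assumption (two links count as duplicates when they match after dropping the fragment and any trailing slash); the project may compare URLs differently. The names `normalizeUrl` and `findDuplicates` are hypothetical.

```typescript
// Assumption: fragments and trailing slashes are ignored when comparing links.
function normalizeUrl(raw: string): string {
  const url = new URL(raw); // the URL class already lower-cases the hostname
  const path = url.pathname.replace(/\/+$/, "");
  return `${url.protocol}//${url.host}${path}${url.search}`;
}

// Groups link files by normalized URL and keeps only groups with more than one file.
function findDuplicates(links: { file: string; url: string }[]): string[][] {
  const groups = new Map<string, string[]>();
  for (const link of links) {
    const key = normalizeUrl(link.url);
    const files = groups.get(key) ?? [];
    files.push(link.file);
    groups.set(key, files);
  }
  return Array.from(groups.values()).filter((files) => files.length > 1);
}

// Illustrative file names only.
console.log(
  findDuplicates([
    { file: "tech/Boring Technology.webloc", url: "https://boringtechnology.club/" },
    { file: "misc/Choose Boring.webloc", url: "https://boringtechnology.club/#top" },
  ])
);
```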

To add a link, I just drag the URL from any browser on a Mac into a Finder window. macOS automatically creates the .webloc file.

How to keep data using webloc files

Webloc files are XML files under the hood. The title of the webpage is kept as the filename, and the webpage URL is stored under the URL key in the XML content.

Example:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
   <key>URL</key>
   <string>https://boringtechnology.club/</string>
</dict>
</plist>
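
Reading a value back out of such a file can be sketched as follows. This is a deliberately simple, regex-based reader that assumes the flat key/string layout shown above; a real plist parser would be more robust. The helper names `decodeXmlEntities` and `readPlistString` are hypothetical.

```typescript
// Turns the basic XML entities back into plain characters.
function decodeXmlEntities(text: string): string {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&"); // last, so "&amp;lt;" becomes "&lt;" and not "<"
}

// Returns the <string> that follows <key>key</key>, or undefined if the key is absent.
function readPlistString(xml: string, key: string): string | undefined {
  const escapedKey = key.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(`<key>${escapedKey}</key>\\s*<string>([^<]*)</string>`);
  const match = pattern.exec(xml);
  return match ? decodeXmlEntities(match[1]) : undefined;
}

const sampleWebloc = [
  '<?xml version="1.0" encoding="UTF-8"?>',
  '<plist version="1.0">',
  "<dict>",
  "   <key>URL</key>",
  "   <string>https://boringtechnology.club/</string>",
  "</dict>",
  "</plist>",
].join("\n");

console.log(readPlistString(sampleWebloc, "URL"));
```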

I store the extra data (hostname, title, description, date, favicon) inside the same file, as additional XML properties.

Doing that doesn't break the file's normal behavior: opening it still launches the URL in the default browser.

Example:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
   <key>URL</key>
   <string>https://blog.cleancoder.com/uncle-bob/2015/08/06/LetTheMagicDie.html</string>
   <key>Hostname</key>
   <string>blog.cleancoder.com</string>
   <key>Title</key>
   <string>Clean Coder Blog</string>
   <key>Description</key>
   <string></string>
   <key>Date</key>
   <string>Tue Mar 04 2025 20:19:44 GMT+0100 (Central European Standard Time)</string>
   <key>Favicon</key>
   <string>blog.cleancoder.com.png</string>
</dict>
</plist>
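
Producing a file with these extra keys could be sketched like this. The `LinkRecord` interface and the `buildWebloc` function are hypothetical names; the key names come from the example above. Values are XML-escaped so that a title containing `&` or `<` keeps the file well-formed.

```typescript
// Hypothetical shape of the data kept for one link.
interface LinkRecord {
  url: string;
  hostname: string;
  title: string;
  description: string;
  date: string;
  favicon: string;
}

// Escapes the characters that are not allowed as-is in XML text content.
function encodeXmlText(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Builds the full .webloc content, with URL first so the file still opens in the browser.
function buildWebloc(link: LinkRecord): string {
  const entries: Array<[string, string]> = [
    ["URL", link.url],
    ["Hostname", link.hostname],
    ["Title", link.title],
    ["Description", link.description],
    ["Date", link.date],
    ["Favicon", link.favicon],
  ];
  const body = entries
    .map(([key, value]) => `\t<key>${key}</key>\n\t<string>${encodeXmlText(value)}</string>`)
    .join("\n");
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">',
    '<plist version="1.0">',
    "<dict>",
    body,
    "</dict>",
    "</plist>",
    "",
  ].join("\n");
}

console.log(
  buildWebloc({
    url: "https://blog.cleancoder.com/uncle-bob/2015/08/06/LetTheMagicDie.html",
    hostname: "blog.cleancoder.com",
    title: "Clean Coder Blog",
    description: "",
    date: new Date().toString(),
    favicon: "blog.cleancoder.com.png",
  })
);
```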

How do I get this data?

I retrieve the content of the webpage, then I use Cheerio to parse the page and extract the HTML attributes I want to save.

Favicon

As shown above, I keep the favicon filename inside the webloc file.

I then download the website's favicon into a local folder, so the generated website doesn't make requests directly to the author's server.
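
The naming side of this can be sketched as below. It assumes favicons are saved as "<hostname>.png", which matches the Favicon value in the example webloc file; the folder name "favicons" and all function names are hypothetical. The download itself is left out.

```typescript
// Assumption: one favicon per hostname, stored as "<hostname>.png".
function faviconFileName(pageUrl: string): string {
  return `${new URL(pageUrl).hostname}.png`;
}

// Hypothetical folder name; the project may store favicons elsewhere.
function faviconLocalPath(pageUrl: string, folder: string = "favicons"): string {
  return `${folder}/${faviconFileName(pageUrl)}`;
}

// Lists the favicon files that are not in the local folder yet, without repeats,
// so each website only has to be contacted once.
function faviconsToDownload(pageUrls: string[], existingFiles: Set<string>): string[] {
  const missing = new Set<string>();
  for (const pageUrl of pageUrls) {
    const fileName = faviconFileName(pageUrl);
    if (!existingFiles.has(fileName)) {
      missing.add(fileName);
    }
  }
  return Array.from(missing);
}

console.log(faviconLocalPath("https://blog.cleancoder.com/uncle-bob/2015/08/06/LetTheMagicDie.html"));
```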

Domains information

I list the websites the links come from. To describe each website, I have to retrieve data from its index page.

To keep this data, I use a simple JSON file. I update this database using lowdb.
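
The kind of record such a JSON file might hold can be sketched as follows. The field names and the `upsertDomain` helper are assumptions for illustration, not the project's actual schema, and the sketch works on a plain object rather than going through lowdb.

```typescript
// Hypothetical shape of one entry in the domains JSON file.
interface DomainInfo {
  hostname: string;
  title: string;
  description: string;
  linkCount: number;
}

// Hypothetical shape of the whole JSON file.
interface DomainsDb {
  domains: DomainInfo[];
}

// Adds a domain, or refreshes its details and bumps its link count if already known.
// Returns a new object instead of mutating the one passed in.
function upsertDomain(db: DomainsDb, info: Omit<DomainInfo, "linkCount">): DomainsDb {
  const known = db.domains.some((domain) => domain.hostname === info.hostname);
  if (!known) {
    return { domains: [...db.domains, { ...info, linkCount: 1 }] };
  }
  return {
    domains: db.domains.map((domain) =>
      domain.hostname === info.hostname
        ? { ...domain, ...info, linkCount: domain.linkCount + 1 }
        : domain
    ),
  };
}

let domainsDb: DomainsDb = { domains: [] };
domainsDb = upsertDomain(domainsDb, {
  hostname: "blog.cleancoder.com",
  title: "Clean Coder Blog",
  description: "",
});
// What would be written to the JSON file.
console.log(JSON.stringify(domainsDb, null, 2));
```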