Twarchive

A Hugo utility theme for tweet archives

I built a utility theme for Hugo that facilitates archiving tweets, and a companion Python tool that processes tweet data for the theme. It’s called twarchive.

Embedded tweets look like this:

There is a comprehensive archive of all my tweets and more on a separate Hugo site at https://tweets.micahrl.com. That site uses mostly default styles from the theme, and is useful as an example. You can see its repo on GitHub at mrled/tweets.micahrl.com.

The theme source code is also on GitHub at mrled/twarchive, and the readme contains detailed install and usage instructions.

Goals

  • Keep a local copy of tweets in high fidelity, even if they are deleted or otherwise unavailable from Twitter
  • Keep a local copy of all media
  • Do not give tracking information to Twitter or any other third party
  • Allow tweets to be downloaded

Implementation

This project has two components: a Python program to download tweet data to JSON files, and a Hugo theme module that renders them for the site.

Python program to download tweet data

The program understands Hugo sites, the Twitter API, and Twitter archives.

It can retrieve tweets from the Twitter API directly, and can also grab related tweets like thread parents, quote tweets, and retweets.
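A minimal sketch of how related tweets could be discovered, assuming tweets in the Twitter API v1.1 JSON format. The field names in_reply_to_status_id_str, quoted_status_id_str and retweeted_status are v1.1 tweet fields; the fetch callable is hypothetical (for example, something wrapping a v1.1 lookup call). This is not twarchive's actual code.

```python
from collections import deque
from typing import Any, Callable


def related_tweet_ids(tweet: dict[str, Any]) -> list[str]:
    """Return IDs of tweets referenced by a v1.1 tweet object."""
    ids: list[str] = []
    # Thread parent: set when this tweet is a reply.
    parent = tweet.get("in_reply_to_status_id_str")
    if parent:
        ids.append(parent)
    # Quote tweet: the ID of the tweet being quoted.
    quoted = tweet.get("quoted_status_id_str")
    if quoted:
        ids.append(quoted)
    # Retweet: the original tweet is nested under retweeted_status.
    retweeted = tweet.get("retweeted_status")
    if retweeted and retweeted.get("id_str"):
        ids.append(retweeted["id_str"])
    return ids


def collect_related(
    start_id: str, fetch: Callable[[str], dict[str, Any]]
) -> dict[str, dict[str, Any]]:
    """Breadth-first walk from one tweet to every tweet it references.

    `fetch` is a hypothetical callable that returns a tweet dict for an ID.
    """
    found: dict[str, dict[str, Any]] = {}
    queue = deque([start_id])
    while queue:
        tweet_id = queue.popleft()
        if tweet_id in found:
            continue  # already fetched; also guards against cycles
        tweet = fetch(tweet_id)
        found[tweet_id] = tweet
        queue.extend(related_tweet_ids(tweet))
    return found
```

Walking breadth-first with a "seen" dict means a long thread is fetched one parent at a time, and each tweet is requested only once even if several tweets quote it.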

It can parse a Twitter archive and embed tweets from it without calling the API. This is especially useful for very old tweets, or if you have a tweet archive from a deleted account.
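A rough sketch of reading tweets out of an archive, assuming the layout of recent archive downloads: tweets live in a file such as data/tweet.js, which is JavaScript rather than JSON because it assigns the array to a variable like window.YTD.tweet.part0. The function name is illustrative, not twarchive's.

```python
import json
from typing import Any


def parse_archive_js(text: str) -> dict[str, dict[str, Any]]:
    """Parse a Twitter archive tweet file into {id_str: tweet}."""
    # Skip the "window.YTD... =" assignment and parse the JSON array after it.
    start = text.index("[")
    entries = json.loads(text[start:])
    tweets: dict[str, dict[str, Any]] = {}
    for entry in entries:
        # Some archive versions wrap each tweet as {"tweet": {...}}; accept both.
        tweet = entry.get("tweet", entry)
        tweets[tweet["id_str"]] = tweet
    return tweets
```

Keying the result by ID lets later steps look up a thread parent or quoted tweet in the archive before falling back to the API.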

It can scan your Hugo posts for tweets embedded with twarchive’s shortcodes and download them or pull them from an archive, along with related tweets.
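A sketch of the scanning step. The shortcode syntax here, {{< twarchive "ID" >}}, is a hypothetical stand-in; the real shortcode names and arguments are documented in the twarchive readme.

```python
import re
from pathlib import Path

# Hypothetical shortcode form: {{< twarchive "1234567890" >}} (quotes optional).
SHORTCODE_RE = re.compile(r'\{\{<\s*twarchive\s+"?(\d+)"?\s*>\}\}')


def find_embedded_ids(text: str) -> set[str]:
    """Return every tweet ID embedded via the shortcode in one post."""
    return set(SHORTCODE_RE.findall(text))


def scan_content(content_dir: Path) -> set[str]:
    """Collect tweet IDs from all Markdown files under a Hugo content dir."""
    ids: set[str] = set()
    for path in content_dir.rglob("*.md"):
        ids |= find_embedded_ids(path.read_text(encoding="utf-8"))
    return ids
```

The resulting set of IDs is what would then be fetched from the API or looked up in an archive, along with their related tweets.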

It works around a Hugo limitation: Hugo cannot generate pages from data. Tweets are saved as JSON files inside Hugo’s data folder, but Hugo will not create a page for each of them, so the Python program creates a page for each tweet in the data folder instead.
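A minimal sketch of that workaround, assuming each tweet is stored as data/twarchive/ID.json and pages go in a content/twarchive section. Both paths and the tweetid front matter key are hypothetical. Each generated page is only a stub; a theme layout could use the ID to look up the full tweet in .Site.Data.

```python
import json
from pathlib import Path


def write_tweet_pages(site: Path) -> list[Path]:
    """Create one Hugo content page per tweet JSON file in the data folder."""
    data_dir = site / "data" / "twarchive"        # hypothetical location
    content_dir = site / "content" / "twarchive"  # hypothetical section
    content_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for data_file in sorted(data_dir.glob("*.json")):
        tweet = json.loads(data_file.read_text(encoding="utf-8"))
        tweet_id = tweet["id_str"]
        page = content_dir / f"{tweet_id}.md"
        # json.dumps yields a double-quoted string that is also valid YAML.
        front_matter = (
            "---\n"
            f"title: {json.dumps('Tweet ' + tweet_id)}\n"
            f"tweetid: {json.dumps(tweet_id)}\n"
            "---\n"
        )
        page.write_text(front_matter, encoding="utf-8")
        written.append(page)
    return written
```

Keeping the tweet itself in the data folder and only a pointer in the page means the JSON stays the single source of truth, and regenerating pages is cheap.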

Hugo theme, generated HTML

Each tweet is rendered in an iframe that points to a self-contained HTML file. Images and videos are embedded directly in that HTML as base64-encoded data: URIs.
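To illustrate the embedding technique, here is a small sketch that turns raw media bytes into a data: URI and an img element that needs no network request. The function names are illustrative and not taken from twarchive.

```python
import base64
import html
import mimetypes


def to_data_uri(payload: bytes, filename: str) -> str:
    """Encode media bytes as a data: URI, guessing the MIME type from the name."""
    mime, _ = mimetypes.guess_type(filename)
    mime = mime or "application/octet-stream"
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def img_tag(payload: bytes, filename: str, alt: str = "") -> str:
    """Build an <img> element that carries its image inline."""
    src = to_data_uri(payload, filename)
    return f'<img src="{src}" alt="{html.escape(alt)}">'
```

Base64 grows the payload by about a third, which is part of why data: URIs become unwieldy for large images and videos.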

Tweet styles are self-contained and not affected by site styles. Dark mode is supported if the user has set prefers-color-scheme, but site-specific dark mode toggles, like the one I have on this site, will not work.

Each tweet has a download button that lets any user easily make their own copy. Hat tip to Terence Eden for explaining how this works.

Future work

  • data: URIs are unwieldy. Chromium-based browsers refuse to display data: URIs entered into the address bar, so we have to hack around this with JavaScript. The solution currently prioritizes high-fidelity local archives over user experience; as a result, embedded tweets on the website are an OK experience with some rough edges, but getting at images embedded in downloaded HTML is not very polished.
  • Capturing polls in tweets is not possible unless we use the v2 API. This implementation uses the v1.1 API because it is easier to get started, while v2 requires manual approval from Twitter 🙄.
  • Styling could use some improvements, especially for tweet threads.

Notes

  • Authentication: we can use the official Twitter consumer key/secret for access to public data. love too skirt API key bullshit.
  • Page performance: Using iframes means page loading is partly asynchronous. Each tweet (including its embedded images and video) is loaded in a separate frame. Depending on how many tweets you embed in a page, this might make performance better or worse.
  • Hugo performance: Including thousands of extra pages in a Hugo site increases build time. I originally wanted to keep all my tweets on this site with like a /tweets URI, but when that got too slow I moved them off to https://tweets.micahrl.com. Now only tweets that I embed are included in this site, and my entire Twitter history is on another site that doesn’t undergo heavy development.