twarchive is my project for making Hugo websites that contain archived tweets.
Embedded tweets look like this:
There is a comprehensive archive of all my tweets and more on a separate Hugo site at https://tweets.micahrl.com which may serve as a useful example.
The project source code is also on GitHub at mrled/twarchive, and the readme contains detailed install and usage instructions.
Goals
- Keep a local copy of tweets in high fidelity, even if they are deleted or otherwise unavailable from Twitter
- Keep a local copy of all media
- Do not give tracking information to Twitter or any other third party
- Allow tweets to be downloaded
Implementation
This project has three main components:
- twarchive, a Python program to download tweet data to JSON files
- ornithography, a Hugo utility theme that renders them for the site
- aviary, an optional Hugo main site theme, so that sites that will just contain Twitter archives can be easily set up
twarchive Python program
The program understands Hugo sites, the Twitter API, and Twitter archives.
It can retrieve tweets from the Twitter API directly, and can also grab related tweets like thread parents, quote tweets, and retweets.
It can parse a Twitter archive and embed tweets without calling the API. This is especially useful for very old tweets, or if you have a tweet archive from a deleted account.
It can scan your Hugo posts for tweets embedded with twarchive’s shortcodes and download them or pull them from an archive, along with related tweets.
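The scanning step can be sketched roughly as follows. This is a minimal illustration, not twarchive's actual code: the shortcode name `twarchiveTweet`, the regular expression, and the function `find_embedded_tweet_ids` are all assumptions made for the example. The real shortcodes are described in the project readme.

```python
import re
from pathlib import Path

# Hypothetical shortcode form: {{< twarchiveTweet "1234567890" >}}
# The real shortcode names may differ; check the project readme.
SHORTCODE_RE = re.compile(r'\{\{<\s*twarchiveTweet\s+"?(\d+)"?\s*>\}\}')


def find_embedded_tweet_ids(content_dir):
    """Return sorted, de-duplicated tweet IDs referenced in Markdown files under content_dir."""
    ids = set()
    for path in Path(content_dir).rglob("*.md"):
        text = path.read_text(encoding="utf-8")
        ids.update(SHORTCODE_RE.findall(text))
    return sorted(ids)
```

The returned IDs would then be the list of tweets to fetch from the API or look up in an archive.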
It works around Hugo’s limitation that it cannot generate new pages from data files. Tweets are saved to JSON files inside Hugo’s data folder, and the Python program creates a page for each tweet in the data folder so that Hugo has something to render.
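One way to picture the workaround is a script that walks the data folder and writes a stub page per tweet. The sketch below assumes hypothetical paths (`data/twarchive`, `content/twarchive`) and hypothetical front matter keys (`title`, `tweet_id`); the real program may lay things out differently.

```python
from pathlib import Path


def write_tweet_pages(site_root, data_subdir="data/twarchive", content_subdir="content/twarchive"):
    """Create one stub content page per tweet JSON file and return the pages created.

    Directory names and front matter keys are assumptions for illustration only.
    """
    site = Path(site_root)
    content_dir = site / content_subdir
    content_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for data_file in sorted((site / data_subdir).glob("*.json")):
        tweet_id = data_file.stem  # assumes files are named <tweet id>.json
        page = content_dir / f"{tweet_id}.md"
        if page.exists():
            continue  # leave pages from earlier runs untouched
        page.write_text(
            f'---\ntitle: "Tweet {tweet_id}"\ntweet_id: "{tweet_id}"\n---\n',
            encoding="utf-8",
        )
        created.append(page)
    return created
```

Each stub carries only front matter; in this sketch a theme template would look up the matching data file by `tweet_id` and render the tweet.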
ornithography
Each tweet is an iframe to a self-contained HTML file.
Images and videos are base64-encoded data: URIs which are saved directly in the HTML.
Tweet styles are self-contained and not affected by site styles.
Dark mode is supported if the user has set prefers-color-scheme, but any site-specific toggles to enable dark mode like I have will not work.
Each tweet has a download button allowing for any user to easily make a copy of their own. Hat tip to Terrence Eden for explaining how this works.
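To illustrate how media ends up inside the HTML, here is a minimal sketch of building a base64 data: URI from raw bytes. The function name `to_data_uri` and the fallback MIME type are assumptions for the example, not taken from the project.

```python
import base64
import mimetypes


def to_data_uri(payload, filename):
    """Encode raw bytes as a base64 data: URI, guessing the MIME type from the filename."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        # Fallback chosen for this sketch when the extension is unknown
        mime = "application/octet-stream"
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The resulting string can be used directly as the `src` of an `img` or `video` element. Base64 encoding makes the payload roughly a third larger than the original file, which is part of the cost of keeping every tweet self-contained.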
aviary
A simple theme that can be used for a whole site.
I use it for https://tweets.micahrl.com.
Future work
- data: URIs are unwieldy. Chromium-based browsers refuse to display data: URIs if they are entered into the address bar, so we have to hack around this with JavaScript. The solution currently prioritizes high-fidelity local archives over user experience; the result is that embedded tweets in the website are an OK experience with some rough edges, but getting at images embedded in downloaded HTML is not very polished.
- Capturing polls in tweets is not possible unless we use the v2 API. This implementation uses the v1.1 API because it is easier to get started, while v2 requires manual approval from Twitter 🙄.
- Styling could use some improvements, especially for tweet threads.
Notes
- Authentication: We can use the official Twitter consumer key/secret for access to public data. love too skirt API key bullshit.
- Page performance: Using iframes means there is some asynchrony in page load. Each tweet (including embedded images and video) is loaded in a separate frame. Depending on how many tweets you want to embed in a page, this might make performance better or worse.
- Hugo performance: Including thousands of extra pages in a Hugo site increases build time.
I originally wanted to keep all my tweets on this site with like a /tweets URI, but when that got too slow I moved them off to https://tweets.micahrl.com. Now only tweets that I embed are included in this site, and my entire Twitter history is on another site that doesn’t undergo heavy development.