I want to extract all the tweets I've ever written and convert them to small markdown files so they show up as "posts" on this website.

Posts are organized by date, in the traditional blogging format. But so are tweets, kind of? They're listed chronologically anyway. Maybe I could make one post for each day, and then have all the tweets listed on that page. Each one could have its tweet ID as a header, thus having an internal link. Tweets can link to the tweets that precede them, and maybe even backlink to tweets that follow.
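As a sketch of that layout (using a hypothetical minimal `Tweet` record for illustration, not twint's real object):

```python
from collections import namedtuple

# Hypothetical minimal tweet record, just for illustration
Tweet = namedtuple('Tweet', ['id', 'text'])

def day_page_sketch(tweets):
    """Render one day's tweets, each under a header named by its tweet ID.

    The site generator turns each header into an anchor, so '#### 100'
    becomes an internal permalink, and each tweet can link back to the
    tweet that precedes it.
    """
    blocks = []
    prev = None
    for t in tweets:
        block = f'#### {t.id}\n\n{t.text}\n'
        if prev is not None:
            block += f'\n[previous](#{prev.id})\n'
        blocks.append(block)
        prev = t
    return '\n'.join(blocks)

page = day_page_sketch([Tweet('100', 'first take'), Tweet('101', 'hotter take')])
```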

It's not quite block references, but as a way of keeping my second brain under my ownership it should work. And this way if anyone wants to cancel me they'll have a convenient search box and permalinks for it. Even if my account gets deleted, my bad takes can stay up.

```python
from pathlib import Path

out_dir = Path('../_posts/tweets/')
posts = [o.name for o in out_dir.iterdir()]

# Post filenames start with 'YYYY-MM-DD', so the first 10 characters are the date
last_date = sorted(posts)[-2][:10]
```


This :point_up: is a workaround. Sometime after I started archiving my tweets, Twitter changed the API for search. This means that unofficial scrapers like twint stopped being able to see tweets from more than a few months ago. Even the in-app search was broken for a while.

Fortunately I had already scraped and archived my old tweets, so now I just check to see when the last time I ran this script was and find tweets since that date. This also helps protect me from getting rate-limited by Twitter (they tend to do that if you download 30,000 tweets a few times in a row lol).

```python
import twint
import nest_asyncio

nest_asyncio.apply()  # let twint's event loop run inside the notebook

c = twint.Config()
c.Username = 'deepfates'
tweets = []
c.Store_object = True
c.Store_object_tweets_list = tweets
c.Since = last_date
c.Hide_output = True

twint.run.Search(c)
```

```
<twint.run.Twint at 0x7f8ba4086150>
```

This is all the configuration necessary to grab my tweets. I have to use `Hide_output = True`, or it will print every single tweet to the output.

I now have a dataset of tweets. Let's explore one here, and see some of its metadata.

## Check data

```python
t = tweets[-1]
(t.conversation_id, t.datestamp, t.datetime, t.id, t.likes_count, t.link,
 t.mentions, t.photos, t.quote_url, t.replies_count, t.reply_to, t.retweet,
 t.retweet_date, t.retweet_id, t.retweets_count, t.source, t.thumbnail,
 t.timestamp, t.timezone, t.tweet, t.urls, t.user_id, t.user_id_str,
 t.user_rt, t.user_rt_id, t.username, t.video)
```

```
(...,
 '2021-11-02 23:08:51 MDT',
 ...
 [{'screen_name': 'AinterShow',
   'name': 'The Big Wet Homie 👁️⃤',
   'id': '1286760114632814592'}],
 ...
 '@AinterShow whoop',
 ...)
```

## Tweet layout functions

Many of my tweets have pictures or memes attached, but Twitter only includes the pic.twitter.com URL for these. Here I build a few functions to download the image and squeeze it into a Markdown template for display on my site.

```python
import requests
import shutil

def dl_image(url):
    """Download an image to the site's image folder and return its local path."""
    filename = '../images/from_twitter/' + url.split('/')[-1]
    r = requests.get(url, stream=True)
    if r.status_code == 200:
        r.raw.decode_content = True
        with open(filename, 'wb') as f:
            shutil.copyfileobj(r.raw, f)
    return filename

# hacky thing uses [2:] to shave the leading '..' off the filename,
# leaving a site-root path like /images/from_twitter/foo.jpg
def image_template(filename):
    return f'![image from twitter]({filename[2:]})\n'

def get_tweet(t):
    if t.photos == []:
        img_md = ''
    else:
        img_list = [dl_image(url) for url in t.photos]
        img_md = '\n'.join([image_template(o) for o in img_list])
    return f'''
#### <a href="{t.link}">*{t.timestamp}*</a>

<font size="5">{t.tweet}</font>

{img_md}

🗨️ {t.replies_count} ♺ {t.retweets_count} 🤍 {t.likes_count}

'''
```


```python
def get_md(tweets, date):
    tweets_text = ''.join(t for t in tweets)
    return f'''---
title: deepfates log {date}
layout: post
toc: true
comments: false
search_exclude: false
hide: true
categories: [tweets]
---

{tweets_text}'''
```

```python
from IPython.display import Markdown

yesterday = t.datestamp
y_tweets = [tw for tw in tweets if tw.datestamp == yesterday]
Markdown(get_tweet(y_tweets[-1]))
```


@AinterShow whoop

🗨️ 1 ♺ 0 🤍 1

```python
y_sorted = sorted(y_tweets, key=lambda x: x.datetime)
# [tweet.tweet for tweet in y_sorted]
```

Too many replies! Let's limit it to just my own tweets for now.

```python
y_md = get_md([get_tweet(t) for t in y_sorted if "@" not in t.tweet], yesterday)
with open(f'../_posts/tweets/{yesterday}-tweets.md', 'w') as f:
    print(y_md, file=f)
```

## Do the work

Okay, that'll do for now. It prints a chronological page of tweets for each day.

I'll wrap that behavior in a function and pass it my tweets and a set of dates when I have tweeted.

```python
def write_day_page(day, all_tweets):
    day_tweets = [tw for tw in all_tweets if tw.datestamp == day]
    sorted_tweets = sorted(day_tweets, key=lambda x: x.datetime)
    md = get_md([get_tweet(t) for t in sorted_tweets], day)
    with open(f'../_posts/tweets/{day}-tweets.md', 'w') as f:
        print(md, file=f)

self_tweets = [t for t in tweets if "@" not in t.tweet]
days = set(t.datestamp for t in self_tweets)
```

```python
from tqdm import tqdm

for day in tqdm(days):
    write_day_page(day, self_tweets)
```

```
100%|██████████| 23/23 [00:25<00:00,  1.12s/it]
```

I would also like to do analysis to see how often I tweet, and other facts. And maybe make a big list of links. Maybe next time.
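The tweets-per-day count I have in mind would just be a `Counter` over the datestamps. A minimal sketch with made-up dates (in the notebook it would be `Counter(t.datestamp for t in tweets)`):

```python
from collections import Counter

# Sketch of the frequency analysis: count tweets per datestamp.
# These dates are made up for illustration.
datestamps = ['2021-11-01', '2021-11-02', '2021-11-02', '2021-11-03']
counts = Counter(datestamps)

# most_common() sorts days by tweet count, busiest first
busiest_day, n = counts.most_common(1)[0]
```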

For now you can find these secret tweet archives by searching in the Explore page. The days archived this time are as follows.