Archive my tweets
I want to extract all the tweets I've ever written and convert them to small markdown files so they show up as "posts" on this website.
Posts are organized by date, in the traditional blogging format. But so are tweets, kind of? They're listed chronologically anyway. Maybe I could make one post for each day, and then have all the tweets listed on that page. Each one could have its tweet ID as a header, thus having an internal link. Tweets can link to the tweets that precede them, and maybe even backlink to tweets that follow.
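To make that concrete, here's a rough sketch of the kind of ID-anchored entry I'm imagining. The render_entry helper and the anchor format are made up for illustration; the actual templates I end up with are further down.

# hypothetical sketch: each tweet gets its ID as an anchor, plus a link back
# to the tweet that precedes it
def render_entry(tweet, prev_id=None):
    md = f'#### <a name="{tweet.id}"></a>{tweet.id}\n\n{tweet.tweet}\n'
    if prev_id is not None:
        md += f'\n[previous tweet](#{prev_id})\n'
    return md

Chaining prev_id through a day's tweets would give every tweet a permalink and a trail back up the thread.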
It's not quite block references, but as a way of keeping my second brain under my ownership it should work. And this way if anyone wants to cancel me they'll have a convenient search box and permalinks for it. Even if my account gets deleted, my bad takes can stay up.
from pathlib import Path

# tweet posts are named like YYYY-MM-DD-tweets.md, so the first ten
# characters of a filename are its date
out_dir = Path('../_posts/tweets/')
posts = [o.name for o in out_dir.iterdir()]
last_date = sorted(posts)[-2][:10]
last_date
This is a workaround. Sometime after I started archiving my tweets, Twitter changed the API for search, which means unofficial scrapers like twint stopped being able to see tweets from more than a few months ago. Even the in-app search was broken for a while.
Fortunately I had already scraped and archived my old tweets, so now I just check when I last ran this script and fetch tweets since that date. This also helps protect me from getting rate-limited by Twitter (they tend to do that if you download 30,000 tweets a few times in a row lol).
import twint
import nest_asyncio

# Jupyter already runs an event loop, so let twint nest its own inside it
nest_asyncio.apply()

c = twint.Config()
c.Username = 'deepfates'

tweets = []
c.Store_object = True
c.Store_object_tweets_list = tweets  # collect scraped tweets into this list

c.Since = last_date  # only fetch tweets from this date onward
c.Hide_output = True

twint.run.Profile(c)
This is all the configuration necessary to grab my tweets. Have to use Hide_output = True, or it will print every single tweet in the output.
I now have a dataset of tweets. Let's explore one here, and see some of its metadata.
len(tweets)
t = tweets[-1]
t.conversation_id, t.datestamp, t.datetime, t.id, t.likes_count, t.link, t.mentions, t.photos, t.quote_url, t.replies_count, t.reply_to, t.retweet, t.retweet_date, t.retweet_id, t.retweets_count, t.source, t.thumbnail, t.timestamp, t.timezone, t.tweet, t.urls, t.user_id, t.user_id_str, t.user_rt, t.user_rt_id, t.username, t.video
import requests
import shutil

def dl_image(url):
    # save the image locally, keeping its original filename
    filename = '../images/from_twitter/' + url.split('/')[-1]
    r = requests.get(url, stream=True)
    if r.status_code == 200:
        r.raw.decode_content = True
        with open(filename, 'wb') as f:
            shutil.copyfileobj(r.raw, f)
        return(filename)
    else:
        return(None)
# hacky thing uses [1:] to shave the first '.' off the filename
def image_template(filename):
    # embed the downloaded image in the post
    return(f'<img src="{filename[1:]}">\n')
def get_tweet(t):
    if t.photos == []:
        img_md = ''
    else:
        img_list = [dl_image(url) for url in t.photos]
        img_md = '\n'.join([image_template(o) for o in img_list])
    return(f'''
#### <a href = "{t.link}">*{t.timestamp}*</a>
<font size="5">{t.tweet}</font>
{img_md}
🗨️ {t.replies_count} ♺ {t.retweets_count} 🤍 {t.likes_count}
---
''')
def get_md(tweets, date):
    tweets_text = ''.join(t for t in tweets)
    return(f'''---
title: deepfates log {date}
layout: post
toc: true
comments: false
search_exclude: false
hide: true
categories: [tweets]
---
{tweets_text}
''')
from IPython.display import Markdown
yesterday = t.datestamp
y_tweets = [tw for tw in tweets if tw.datestamp == yesterday]
len(y_tweets)
Markdown(get_tweet([tw for tw in tweets if tw.datestamp == yesterday][-1]))
y_sorted = sorted(y_tweets, key=lambda x: x.datetime)
# [tweet.tweet for tweet in y_sorted]
Too many replies! Let's limit it to just my own tweets for now.
y_md = get_md([get_tweet(t) for t in y_sorted if "@" not in t.tweet], yesterday)
len(y_md)
with open(f'../_posts/tweets/{yesterday}-tweets.md', 'w') as f:
    print(y_md, file=f)
def write_day_page(day, tweets):
    tweets = [tw for tw in tweets if tw.datestamp == day]
    sorted_tweets = sorted(tweets, key=lambda x: x.datetime)
    md = get_md([get_tweet(t) for t in sorted_tweets], day)
    with open(f'../_posts/tweets/{day}-tweets.md', 'w') as f:
        print(md, file=f)
self_tweets = [t for t in tweets if "@" not in t.tweet]
len(self_tweets)
days = set([t.datestamp for t in self_tweets])
len(days)
from tqdm import tqdm
for day in tqdm(days):
    write_day_page(day, self_tweets)
I would also like to do analysis to see how often I tweet, and other facts. And maybe make a big list of links. Maybe next time.
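Just to sketch the idea, using the same tweets list and the datestamp and urls fields shown above, the counting part would only take a few lines:

from collections import Counter

# tweets per day, straight from the datestamp field
per_day = Counter(t.datestamp for t in tweets)
per_day.most_common(5)

# every outgoing link in the scraped tweets, for a future link dump
all_links = [url for t in tweets for url in t.urls]
len(all_links)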
For now you can find these secret tweet archives by searching in the Explore page. The days archived this time are as follows.
days