Colin McMillen's Blog, by Colin McMillen (https://www.mcmillen.dev/). Feed updated 2021-07-21T20:53:40-04:00.

Creating robot behaviors with Python generators (https://www.mcmillen.dev/blog/20070502-robot-behaviors-python.html)

Posted 2007-05-02.

Generators are a powerful feature of the Python programming language. In a nutshell, generators let you write a function that behaves like an iterator. The standard approach to programming robot behaviors is based on state machines. However, robotics code is full of special cases, so a complex behavior will typically end up with a lot of bookkeeping cruft. Generators let us simplify the bookkeeping and express the desired behavior in a straightforward manner.

(Idea originally due to Jim Bruce.)

I’ve worked for several years on RoboCup, the international robot soccer competition. Our software is written in a mixture of C++ (for low-level localization and vision algorithms) and Python (for high-level behaviors). Let’s say we want to write a simple goalkeeper for a robot soccer team. Our keeper will be pretty simple; here’s a list of the requirements:

  1. If the ball is far away, stand in place.
  2. If the ball is nearby, dive to block it. Dive to the left if the ball is to the left; dive to the right if the ball is to the right.
  3. If we choose a “dive” action and then choose “stand” on the next frame, nothing will happen. (Well, maybe the robot will twitch briefly...) So when we choose to dive, we need to commit to sending the same dive command for some time (let’s say one second).

The usual approach to robot behavior design relies on hierarchical state machines. Specifically, we might be in a “standing” state while the ball is far away; when the ball becomes close, we enter a “diving” state that persists for one second. Because of requirement 3, this solution will have a few warts: we need to keep track of how much time we’ve spent in the dive state. Every time we add a special case like this, we need to keep some extra state information around. Since robotics code is full of special cases, we tend to end up with a lot of bookkeeping cruft. In contrast, generators will let us clearly express the desired behavior.

On to the state-machine approach. First, we’ll have a class called Features that abstracts the robot’s raw sensor data. For this example, we only care whether the ball is near/far and left/right, so Features will just contain two boolean variables:

class Features(object):
    ballFar = True
    ballOnLeft = True

Next, we make the goalkeeper. The keeper’s behavior is specified by the next() function, which is called thirty times per second by the robot’s main event loop (every time the on-board camera produces a new image). The next() function returns one of three actions: "stand", "diveLeft", or "diveRight", based on the current values of the Features object. For now, let’s pretend that requirement 3 doesn’t exist.

class Goalkeeper(object):
    def __init__(self, features):
        self.features = features

    def next(self):
        features = self.features
        if features.ballFar:
            return 'stand'
        else:
            if features.ballOnLeft:
                return 'diveLeft'
            else:
                return 'diveRight'

That was simple enough. The constructor takes in the Features object; the next() method checks the current Features values and returns the correct action. Now, how about satisfying requirement 3? When we choose to dive, we need to keep track of two things: how long we need to stay in the "dive" state and which direction we dove. We’ll do this by adding a couple of instance variables (self.diveFramesRemaining and self.lastDiveCommand) to the Goalkeeper class. These variables are set when we initiate the dive. At the top of the next() function, we check whether self.diveFramesRemaining is positive; if so, we decrement it and immediately return self.lastDiveCommand without consulting the Features. (We set diveFramesRemaining to 29 rather than 30 because the frame on which we start the dive is itself the first of the 30 frames in one second.) Here’s the code:

class Goalkeeper(object):
    def __init__(self, features):
        self.features = features
        self.diveFramesRemaining = 0
        self.lastDiveCommand = None

    def next(self):
        features = self.features
        if self.diveFramesRemaining > 0:
            self.diveFramesRemaining -= 1
            return self.lastDiveCommand
        else:
            if features.ballFar:
                return 'stand'
            else:
                if features.ballOnLeft:
                    command = 'diveLeft'
                else:
                    command = 'diveRight'
                self.lastDiveCommand = command
                self.diveFramesRemaining = 29
                return command

This satisfies all the requirements, but it’s ugly. We’ve added a couple of bookkeeping variables to the Goalkeeper class. Code to properly maintain these variables is sprinkled all over the next() function. Even worse, the structure of the code no longer accurately represents the programmer’s intent: the top-level if-statement depends on the state of the robot rather than the state of the world. The intent of the original next() function is much easier to discern. (In real code, we could use a state-machine class to tidy things up a bit, but the end result would still be ugly when compared to our original next() function.)

With generators, we can preserve the form of the original next() function and keep the bookkeeping only where it’s needed. If you’re not familiar with generators, you can think of them as a special kind of function. The yield keyword is essentially equivalent to return, but the next time the generator is resumed (by calling its next() method), execution continues from the point of the last yield, preserving the state of all local variables. With yield, we can use a for loop to “return” the same dive command on 30 consecutive frames! The else branch at the end of the behavior() method below shows the magic:

class GoalkeeperWithGenerator(object):
    def __init__(self, features):
        self.features = features

    def behavior(self):
        while True:
            features = self.features
            if features.ballFar:
                yield 'stand'
            else:
                if features.ballOnLeft:
                    command = 'diveLeft'
                else:
                    command = 'diveRight'
                for i in xrange(30):
                    yield command
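One detail that makes this work: the generator doesn't copy the Features values when it is created. Each time it resumes, it reads the attributes of the shared Features object, so it sees whatever the main loop wrote there since the previous frame. Here is a minimal sketch of that behavior, written for Python 3 (where a generator is advanced with the built-in next() rather than a .next() method); Ball and watch() are hypothetical names used only for this illustration:

```python
class Ball:
    """Hypothetical stand-in for Features: a mutable object shared with the loop."""
    far = True


def watch(ball):
    """Yield 'stand' or 'dive' based on the ball's state at each resume."""
    while True:
        # This attribute is read fresh every time the generator resumes,
        # so changes made between next() calls are visible here.
        yield 'stand' if ball.far else 'dive'


ball = Ball()
w = watch(ball)
first = next(w)   # runs up to the first yield: 'stand'
ball.far = False  # the "main loop" updates the shared object
second = next(w)  # resumes after the yield and re-reads ball.far: 'dive'
print(first, second)
```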

Here’s a simple driver script that shows how to use our goalkeepers:

import random

f = Features()
g1 = Goalkeeper(f)
g2 = GoalkeeperWithGenerator(f).behavior()

for i in xrange(10000):
    f.ballFar = random.random() > 0.1
    f.ballOnLeft = random.random() < 0.5
    g1action = g1.next()
    g2action = g2.next()
    print "%s\t%s\t%s\t%s" % (
        f.ballFar, f.ballOnLeft, g1action, g2action)
    assert(g1action == g2action)

… and we’re done! I hope you’ll agree that the generator-based keeper is much easier to understand and maintain than the state-machine-based keeper. You can grab the full source code below and take a look for yourself.

#!/usr/bin/env python

class Features(object):
    ballFar = True
    ballOnLeft = True


class Goalkeeper(object):
    def __init__(self, features):
        self.features = features
        self.diveFramesRemaining = 0
        self.lastDiveCommand = None

    def next(self):
        features = self.features
        if self.diveFramesRemaining:
            self.diveFramesRemaining -= 1
            return self.lastDiveCommand
        else:
            if features.ballFar:
                return 'stand'
            else:
                if features.ballOnLeft:
                    command = 'diveLeft'
                else:
                    command = 'diveRight'
                self.lastDiveCommand = command
                self.diveFramesRemaining = 29
                return command


class GoalkeeperWithGenerator(object):
    def __init__(self, features):
        self.features = features

    def behavior(self):
        while True:
            features = self.features
            if features.ballFar:
                yield 'stand'
            else:
                if features.ballOnLeft:
                    command = 'diveLeft'
                else:
                    command = 'diveRight'
                for i in xrange(30):
                    yield command


import random
f = Features()
g1 = Goalkeeper(f)
g2 = GoalkeeperWithGenerator(f).behavior()

for i in xrange(10000):
    f.ballFar = random.random() > 0.1
    f.ballOnLeft = random.random() < 0.5
    g1action = g1.next()
    g2action = g2.next()
    print "%s\t%s\t%s\t%s" % (
        f.ballFar, f.ballOnLeft, g1action, g2action)
    assert(g1action == g2action)
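The listing above is Python 2. On Python 3 it won't run unmodified: xrange was removed (range replaces it), print is a function, and generator objects have no .next() method (call the built-in next() instead). Below is a sketch of the same two keepers and the same randomized comparison for Python 3. It also moves the "repeat this command for 30 frames" loop into a small helper generator, hold(), which is a hypothetical helper and not part of the original code, and delegates to it with yield from (available since Python 3.3). Instead of printing every frame, compare() just asserts that the two keepers agree.

```python
import random

DIVE_FRAMES = 30  # one second at 30 frames per second


class Features:
    ballFar = True
    ballOnLeft = True


class Goalkeeper:
    """State-machine keeper: explicit bookkeeping for the dive."""

    def __init__(self, features):
        self.features = features
        self.diveFramesRemaining = 0
        self.lastDiveCommand = None

    def next(self):
        if self.diveFramesRemaining > 0:
            self.diveFramesRemaining -= 1
            return self.lastDiveCommand
        if self.features.ballFar:
            return 'stand'
        command = 'diveLeft' if self.features.ballOnLeft else 'diveRight'
        self.lastDiveCommand = command
        # This frame is the first of the dive, so DIVE_FRAMES - 1 remain.
        self.diveFramesRemaining = DIVE_FRAMES - 1
        return command


def hold(command, frames):
    """Hypothetical helper: emit the same command for `frames` consecutive frames."""
    for _ in range(frames):
        yield command


class GoalkeeperWithGenerator:
    """Generator keeper: the commitment lives in the control flow."""

    def __init__(self, features):
        self.features = features

    def behavior(self):
        while True:
            if self.features.ballFar:
                yield 'stand'
            else:
                command = 'diveLeft' if self.features.ballOnLeft else 'diveRight'
                yield from hold(command, DIVE_FRAMES)


def compare(frames=10000, seed=None):
    """Drive both keepers with the same random features; return frames compared."""
    rng = random.Random(seed)
    f = Features()
    g1 = Goalkeeper(f)
    g2 = GoalkeeperWithGenerator(f).behavior()
    for _ in range(frames):
        f.ballFar = rng.random() > 0.1
        f.ballOnLeft = rng.random() < 0.5
        g1action = g1.next()
        g2action = next(g2)  # Python 3: built-in next(), not g2.next()
        assert g1action == g2action, (g1action, g2action)
    return frames


if __name__ == '__main__':
    print(compare(), 'frames matched')
```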
2007-05-02T12:00:00-04:00
Emacs Tips (https://www.mcmillen.dev/blog/20070522-emacs-tips.html)

Posted 2007-05-22, updated 2021-07-01.

These are some Emacs keybindings (and other functions) that I once found useful. I’ve mostly used Sublime Text for the last few years, however.

Editing

C-[SPC]: set mark
C-x C-x: exchange point and mark
C-w: kill (AKA “cut”)
M-w: kill-ring-save (AKA “copy”)
C-y: yank (AKA “paste”)
M-h: Put region around current paragraph (mark-paragraph).
C-x h: Put region around the entire buffer (mark-whole-buffer).
C-u C-[SPC]: Jump to the previous mark in the mark ring (repeat to keep cycling back)
M-d: Kill word
M-[DEL]: Kill word backwards
C-M-k: Kill the following balanced expression (kill-sexp)

Registers

C-x r [SPC] r: Save position of point in register r (point-to-register).
C-x r j r: Jump to the position saved in register r (jump-to-register).
C-x r s r: Copy region into register r (copy-to-register).
C-x r i r: Insert text from register r (insert-register).

Bookmarks

C-x r m [RET]: Set the bookmark for the visited file, at point.
C-x r m bookmark [RET]: Set the bookmark named bookmark at point (bookmark-set).
C-x r b bookmark [RET]: Jump to the bookmark named bookmark (bookmark-jump).
C-x r l: List all bookmarks (list-bookmarks).
M-x bookmark-save: Save all the current bookmark values in the default bookmark file.

Miscellaneous

M-` shows the menu bar as a text menu (tmm-menubar).
M-x highlight-changes-mode toggles showing the changes you’ve made to the file since the last save.

2007-05-22T12:00:00-04:00
Gnokii Tips (https://www.mcmillen.dev/blog/20070522-gnokii-tips.html)

Posted 2007-05-22, updated 2021-07-01.

I own a Nokia 6102i phone (provided by Cingular). gnokii is a Linux program that lets me interface with the phone. Here are some recipes:

File I/O

gnokii --getfilelist "A:\\predefgallery\\predeftones\\predefringtones\\*"

gnokii --putfile WiiSports.mp3 "A:\\predefgallery\\predeftones\\predefringtones\\WiiSports.mp3"

Ring Tones

Voice mail picks up in 20 seconds, so a ring tone should be about 20 seconds long.

The easiest way to chop an MP3 in Linux is with dd; the drawback is that you need to specify length in KB, not time. To chop an MP3 to be 200 KB long, do:

dd if=Mii\ Channel.mp3 of=MiiChan2.mp3 bs=1k count=200
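If you'd rather aim for a duration than guess at a size, you can estimate count from the bitrate. For a constant-bitrate (CBR) MP3, the size in bytes is roughly the bitrate in bits per second times the number of seconds, divided by 8; dividing that by dd's 1024-byte block size (bs=1k) gives the count. This is a sketch under those assumptions: it ignores ID3 tags and the fact that dd may cut mid-frame, and variable-bitrate files will come out wrong. dd_count_for() is a hypothetical helper, not part of gnokii or dd.

```python
import math


def dd_count_for(seconds, bitrate_kbps, block_size=1024):
    """Estimate the dd `count` (in blocks of `block_size` bytes) that keeps
    roughly `seconds` of audio from a constant-bitrate MP3.

    MP3 bitrates are quoted in kilobits per second, where 1 kbps = 1000 bit/s.
    """
    total_bytes = bitrate_kbps * 1000 * seconds / 8
    return math.ceil(total_bytes / block_size)


# A 20-second ring tone from a 128 kbps file:
# 128,000 bit/s * 20 s / 8 = 320,000 bytes, i.e. 312.5 one-KiB blocks.
print(dd_count_for(20, 128))  # 313
```

So for a 128 kbps file, the dd invocation above would use bs=1k count=313 to keep about 20 seconds.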

Phonebook

To make a Phonebook.ldif file from the phone (suitable for import into Thunderbird):

gnokii --getphonebook ME 1 end --ldif > Phonebook.ldif

To add the entries in Phonebook.ldif to the phone:

cat Phonebook.ldif | gnokii --writephonebook -m ME --find-free --ldif

You can specify --overwrite instead of --find-free if you want to overwrite all the entries, but this will lose some data (e.g. speed dial, preferred numbers).

Multimedia

You can get photos like this:
gnokii --getfile "A:\\predefgallery\\predefphotos\\Image000.jpg"
They are 640x480 JPG files. (You can also configure the camera so that it takes pictures at 80x96.)

You can also store files:
gnokii --putfile silly.jpg "A:\\predefgallery\\predefphotos\\silly.jpg"
These show up on the phone in My Stuff/Images. The files don’t need to be any specific size; they are autoscaled. GIFs probably also work.

Videos live here:
gnokii --getfile "A:\\predefgallery\\predefvideos\\Video000.3gp"
VLC seems to be able to play .3gp files, but the audio doesn’t work.

Audio recordings live here:
gnokii --getfile "A:\\predefgallery\\predefrecordings\\Audio000.amr"

Unfortunately, nothing I knew of in 2007 (when this page was first written) would play .amr files, but these days (2021) perhaps ffmpeg -i input.amr output.mp3 would work. You might have to use the -ar flag to specify the audio sample rate. I haven’t actually tried this, though!

2007-05-22T12:00:00-04:00
LaTeX Tips (https://www.mcmillen.dev/blog/20070522-latex-tips.html)

Posted 2007-05-22; updated 2021-07-01.

Note that these instructions are over a decade old. Some documentation may be out of date. :)

Embedding fonts in PDFs

To check whether fonts are embedded, use pdffonts, which is included with xpdf. pdffonts gives output that looks like this:

$ pdffonts paper.pdf
name                                 type         emb sub uni object ID
------------------------------------ ------------ --- --- --- ---------
FHQIOS+NimbusRomNo9L-Medi            Type 1       yes yes no       6  0
NEESMN+NimbusRomNo9L-Regu            Type 1       yes yes no       9  0
PJQNOS+CMSY10                        Type 1       yes yes no      12  0

You want emb to be yes for all fonts (and possibly sub as well; also, all fonts should be Type 1, not Type 3). By default in Ubuntu, pdflatex should embed all fonts. Just in case, you can check /etc/texmf/updmap.d/00updmap.cfg, which should have a line like this:

pdftexDownloadBase14 true

If it’s set to false, change it to true, then run update-updmap as root. Remake the PDF; if it still has non-embedded fonts, your figures are probably to blame. Check your PDF figures and make sure their fonts are embedded (using the pdffonts command). For anything that doesn’t have embedded fonts, you can try the following magical invocation:

gs -dSAFER -dNOPLATFONTS -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
-sPAPERSIZE=letter -dCompatibilityLevel=1.4 -dPDFSETTINGS=/printer \
-dMaxSubsetPct=100 -dSubsetFonts=true \
-dEmbedAllFonts=true -sOutputFile=figures/Mprime-new.pdf -f figures/Mprime.pdf

This creates a file figures/Mprime-new.pdf that is hopefully identical to the input file figures/Mprime.pdf, except that the fonts are embedded. Run pdffonts on it to check.

Once all your figures are in PDF format, remake the paper again. Hopefully, all your fonts are now embedded — check again with pdffonts.
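If you have a lot of figures, the pdffonts check can be scripted. Here is a minimal sketch, assuming each font line looks like the output shown earlier: a font name without spaces, then the type, then the emb, sub, and uni columns, then a two-number object ID. Because it reads the yes/no columns from the right-hand end of each line, it should also cope with newer pdffonts versions that add an encoding column, but that is an assumption worth checking against your own output. unembedded_fonts() and check_pdf() are hypothetical helpers.

```python
import subprocess


def unembedded_fonts(pdffonts_output):
    """Return (font name, reason) pairs for problem fonts in `pdffonts` output.

    A font that is not embedded is reported as "not embedded"; an embedded
    Type 3 font is reported as "Type 3".
    """
    problems = []
    for line in pdffonts_output.splitlines()[2:]:  # skip header and dashes
        tokens = line.split()
        if len(tokens) < 6:
            continue
        # From the right: object ID (2 tokens), uni, sub, emb.
        name, emb = tokens[0], tokens[-5]
        font_type = " ".join(tokens[1:-5])
        if emb != "yes":
            problems.append((name, "not embedded"))
        elif "Type 3" in font_type:
            problems.append((name, "Type 3"))
    return problems


def check_pdf(path):
    """Run pdffonts (from xpdf or poppler-utils) on `path` and report problems."""
    output = subprocess.run(["pdffonts", path], capture_output=True,
                            text=True, check=True).stdout
    return unembedded_fonts(output)


sample = """\
name                                 type         emb sub uni object ID
------------------------------------ ------------ --- --- --- ---------
FHQIOS+NimbusRomNo9L-Medi            Type 1       yes yes no       6  0
NEESMN+NimbusRomNo9L-Regu            Type 1       yes yes no       9  0
PJQNOS+CMSY10                        Type 1       yes yes no      12  0
"""
print(unembedded_fonts(sample))  # [] for the output shown above
```

Running check_pdf() over every file in your figures directory gives you the list of files that need the gs treatment.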

2007-05-22T12:00:00-04:00
Vim Tips (https://www.mcmillen.dev/blog/20070807-vim-tips.html)

Posted 2007-08-07.

Here are some links about learning and mastering vim.

Why use vim?

Tutorials

2007-08-07T12:00:00-04:00
93% of Paint Splatters are Valid Perl Programs (https://www.mcmillen.dev/sigbovik/)

Posted 2019-04-01.

TLDR: read the paper and view the gallery of pretty Perl programs.

In this paper, we aim to answer a long-standing open problem in the programming languages community: is it possible to smear paint on the wall without creating valid Perl?

We answer this question in the affirmative: it is possible to smear paint on the wall without creating a valid Perl program. We employ an empirical approach, using optical character recognition (OCR) software, which finds that merely 93% of paint splatters parse as valid Perl. We analyze the properties of paint-splatter Perl programs, and present seven examples of paint splatters which are not valid Perl programs.

Screenshot of a Twitter conversation. Adrienne Porter Felt says: "I don't want to teach my kid to code. I want him to splash in muddy puddles and smear paint on the walls and read novels under the covers way too late at night. I grew up too soon and wish I'd had more time to be a kid. Why do schools teach vocational skills so young these days?" Jake Archibald replies: "but is it possible to smear paint on the wall without creating valid Perl?"

Accepted for publication at SIGBOVIK 2019, held April 1st 2019 in Pittsburgh. Winner of an Unwitting Participation Ribbon, “an unwelcome brand we’ve affixed to each paper determined after careful scrutiny to have included a genuine artifact, thereby furthering the admirable causes of open science and fruitful procrastination.”

Read it on Google Docs or download a PDF. Or grab the entire SIGBOVIK 2019 proceedings; I’m on page 174.

Supplementary Materials

Here are all the paint splatters on a single page, along with the valid Perl source code corresponding to each. “Not valid” is written in red for those images which did not parse as valid Perl programs. If different OCR settings recognized multiple valid Perl programs, I chose the one that seemed the most “interesting”, according to my own aesthetic sense.

Here’s a tarball of 100 paint-splatter images that were used as the main dataset for this paper.

(source code not available yet because i am bad at GitHub)

Errata

There are a few paint splatter Perl programs that I didn’t recognize as “interesting” until after the SIGBOVIK submission deadline. For example, this splatter is recognized by OCR as the string lerzfijglpFiji-j, which evaluates to the number 0 in Perl:

paint splatter

The image below is recognized as the string -*?, which also evaluates to the number 0 in Perl:

paint splatter

Another surprising program is shown below; OCR recognizes this image as the string ;i;c;;#\\?z{;?;;fn':.;, which evaluates to the string c in Perl:

paint splatter

Finally, this image is recognized as the string ;E,'__', which evaluates to the string E__ in Perl:

paint splatter

2019-04-01T12:00:00-04:00
My first paper in 10 years?! (https://www.mcmillen.dev/blog/20190403-update.html)

Posted 2019-04-03.

It’s been nearly two months since my last day at Google, so I guess I should finally make use of this newsletter :)

I wrote a paper which was published on April 1st as part of SIGBOVIK 2019: “93% of Paint Splatters are Valid Perl Programs”. In this paper, I answer a long-standing open problem in the programming languages community: is it possible to smear paint on the wall without creating valid Perl?

(Long-standing since February 13, 2019, when a Twitter conversation between Adrienne Porter Felt & Jake Archibald posed the question.)

To answer this question, I downloaded 100 images of paint splatters from Pinterest, ran the open-source Tesseract OCR engine to turn each into a text string, and then sent that text to the Perl interpreter to see whether that text successfully parsed as Perl. It turns out that 93 of the 100 paint splatters do parse as valid Perl, but since 7% do not, I conclude that it is possible to smear paint on a wall without creating valid Perl.
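The Perl half of that pipeline is easy to sketch. Running perl with the -c flag compiles a program and checks its syntax without running its main body, exiting with a nonzero status if it doesn't parse. The helpers below are a hypothetical reconstruction, not the code used for the paper (which also tried multiple OCR settings), and they assume a perl executable on your PATH. One caveat: -c still executes BEGIN blocks and use statements, so only feed it text you'd be willing to run.

```python
import shutil
import subprocess


def parses_as_perl(text, perl="perl"):
    """Return True if `perl -c` accepts `text` as a program.

    The program is passed with -e as a single argument, so no temporary
    file or shell quoting is needed.
    """
    result = subprocess.run([perl, "-c", "-e", text],
                            capture_output=True, text=True)
    return result.returncode == 0


def fraction_valid(strings, perl="perl"):
    """Fraction of OCR'd strings that parse as Perl (0.0 for an empty list)."""
    if not strings:
        return 0.0
    return sum(parses_as_perl(s, perl) for s in strings) / len(strings)


if shutil.which("perl"):
    # The lerzfijglpFiji-j string mentioned in this post, plus an
    # unmatched closing brace, which cannot parse.
    print(fraction_valid(["lerzfijglpFiji-j", "}{"]))
```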

You might suspect there is some chicanery going on with this result. You’d be correct, but… honestly there’s not that much chicanery going on. You’ll have to read the paper for details… and for my attempts at academic humor. :)

There’s also some supporting material on this website, including a gallery of all 100 images and their associated valid Perl code. Here’s a screenshot of some of them. (Did you know that the string lerzfijglpFiji-j evaluates to the number 0 in Perl?)

screenshot of 17 paint splatters, and the Perl programs they represent

As it turns out, the publication date of my paper was exactly 10-years-minus-a-day since my Ph.D. thesis defense. I’d planned on travelling back to Carnegie Mellon to give this talk live at SIGBOVIK 2019, but unfortunately came down with a nasty cold-and-cough so I had to cancel my trip. :( Perhaps I can give a belated talk at next year’s conference.

For more light-hearted and vaguely CS-shaped research papers, check out the rest of the SIGBOVIK 2019 proceedings. I particularly enjoyed “Elo World, a framework for benchmarking weak chess engines” by tom7 (“The computer players include some traditional chess engines, but also many algorithms chosen for their simplicity, as well as some designed to be competitively bad”.)

Some other random things that I’ve been up to in the last month-and-a-half:

  • ohnosay, which is like “cowsay” but for comics in the style of webcomicname. [GitHub] This was a good excuse to get a Linux development environment set up on a persistent Google Cloud instance & to learn how to GitHub. Since then, I also realized that the World Outside Google uses Python 3, so I’ve started learning that :)

    a three panel comic displayed on a linux terminal: "i will write a silly program" "hm, what did i do with my ssh credentials?" "oh no"

  • Gardening! Last August I randomly planted some peppermint in a railing container on my balcony, and it went gangbusters. This spring I’ve actually planned out a whole porch-garden (like Stardew Valley but real life). Last year’s mint has started growing again, and I’ve added spearmint and mojito mint. I’ve also got two types of peas, two mixes of salad greens, and spinach planted. Later I’ll be planting carrots, basil, and rosemary. The peas just started sprouting a couple days ago, which is exciting!

    a container showing an assortment of "asian salad" greens

  • Gloomhaven! This is a cooperative legacy-style board game — a fun dungeon-crawler that doesn’t need a DM, so everyone gets to play. Our group is still only a few scenarios in, but we’re enjoying it so far. SO MANY HEX TILES. I’m also getting ready to paint our party’s miniatures, which is another (potential) new hobby of mine; more to come in a future newsletter, I suspect :)

  • Video games: just started Sekiro: Shadows Die Twice on PS4. Recently completed (and really enjoyed) New Super Mario Bros. U Deluxe for Nintendo Switch (though Nintendo seems to be trying to give Google a run for their money on ridiculous product names). I’ve also been playing Total War: Warhammer 2 regularly, and Splatoon 2 from time to time. I tried getting into XCOM 2 & enjoyed it, but I’m not sure I’m interested enough to finish the campaign. I keep going back to Total War when I want something in the tactical / strategy genre.

  • Guitar: starting to learn fingerstyle, with the goal of eventually becoming good enough to play Dream of the Shore Bordering Another World from Chrono Cross.

  • Computer stuff: upgraded my PC’s video card (it was many years old) and upgraded to an all-SSD setup. It turns out that 2TB SSDs aren’t that expensive any more.

  • Getting healthcare without an employer is a disaster, even in Massachusetts, which reportedly has one of the best systems in the US. Still working on straightening out my paperwork. Apparently they refuse to believe in my proof of health-insurance termination, even though it’s on Google letterhead and everything.

Thanks for reading! Hopefully the next update will come sooner than 2 months and thus be a bit shorter than this one ended up being :)

~ Colin

2019-04-03T12:00:00-04:00
A new year & a sneaky new project (https://www.mcmillen.dev/blog/20200209-sneak.html)

Posted 2020-02-09.

I can’t believe it’s here so quickly, but: today marks a year since my last day at Google. That seemed like a good occasion to dust off this newsletter & let you know what I’ve been up to: making a videogame!

I’m working on a stealth-based 2D platformer where you don’t have to kill anyone unless you want to. It’ll be possible to get through every level by sneaking and misdirection, but it’ll require you to be careful and tactical to do so… and of course if that doesn’t work out, you can always draw your swords and go in fighting! So far I’ve given it “Sneak” as a codename, but that’s definitely a placeholder until I can flesh out more of the world.

So far Sneak runs on PC & Xbox, but I hope to add Switch and PS4 support within the next couple months. I’m using a C# framework called MonoGame, which provides low-level graphics & audio support across all these platforms. In order to write games for Switch or PS4, you need to apply to Nintendo & Sony to get access to their platform-specific SDKs. So my first real milestone will be coming up with a compelling Game Design Doc & gameplay videos so that they can (hopefully) be convinced that I’m worth taking seriously. Wish me luck!

Sony won’t even talk to anyone unless they’re a Real Business (& Nintendo kinda wants you to be too), so as of… yesterday, I’m officially the founder of SemiColin Games LLC (and, for now at least, the only member…)

If you want to follow along, I have an extremely-placeholder website up at semicolin.games where you can sign up for Yet Another Newsletter if you like, and a Twitter account @SemiColinGames that would appreciate a follow. I’ll probably set up a devblog with an RSS feed too eventually, but that’s not quite ready yet. When it is, I’ll send a quick update here.

I only got started in December & a lot of my work so far has been on building infrastructure (and learning how to start a business), so I don’t have any Extremely Compelling Gameplay Videos yet. Here’s a short animated GIF for now. The bloopers on Twitter might be more fun though. :)

Animation of a pixel-art character swinging a sword
(Art definitely not final!)

Thanks for following along with me on this adventure! Hopefully my next update will come more quickly, and be less wordy! I’ve wanted to make videogames since I was Literally A Kid, so I’m quite excited to finally be doing that full-time, and to hopefully share something good with all of you. When I’m at a stage where I want alpha testers, I’ll definitely be asking here first.

Thanks for your support!
~ Colin (& SemiColin Games)

2020-02-09T12:00:00-04:00
Downvotes & Dislikes Considered Harmful (https://www.mcmillen.dev/blog/20210721-downvotes-considered-harmful.html)

Posted 2021-07-21.

If you’re letting users rank content, you probably don’t need and don’t want downvotes. Here’s why.

(This post was inspired by news that Twitter is considering adding “Dislikes” to Tweets.)

Background

In my past life at Google, I was responsible for co-creating Memegen, a large & influential Google-internal social network. Memegen lets Google employees create internal-only memes and allows users to upvote & downvote the memes of others. Memegen’s home page is the Popular page, which shows the most-upvoted memes of the past day.

Adding downvotes to Memegen was my single greatest mistake.

The problems of downvotes

Any voting system that allows downvotes, but where most posts receive mostly upvotes, has a huge problem:

No matter how you do the math, downvotes count more than upvotes do.

Mathematically, it will always be comparatively easy for a vocal minority to bury any specific items that they don’t want surfaced on the top-N posts page. This is true even if you’re using a sophisticated ranking algorithm like Wilson score intervals to rank posts (as Reddit & many other sites do).
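To see the asymmetry concretely, here is a sketch using the lower bound of the Wilson score interval, a common way to rank items by up/down votes. Which exact variant any particular site uses isn't stated here; the standard formula with z = 1.96 (a 95% interval) is an assumption. Starting from a post with 20 upvotes and no downvotes, it compares the effect of one more upvote against the effect of one downvote.

```python
import math


def wilson_lower_bound(up, down, z=1.96):
    """Lower bound of the Wilson score interval for the fraction of upvotes."""
    n = up + down
    if n == 0:
        return 0.0
    p = up / n
    center = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (center - margin) / (1 + z * z / n)


base = wilson_lower_bound(20, 0)
gain = wilson_lower_bound(21, 0) - base  # effect of one more upvote
loss = base - wilson_lower_bound(20, 1)  # effect of a single downvote
print(f"base={base:.3f} gain={gain:.4f} loss={loss:.4f}")
```

With these numbers, the single downvote lowers the score by roughly ten times as much as the extra upvote raises it, which is exactly what lets a small, motivated group push a well-liked post off the top-N page.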

Downvotes aim to solve the problem of filtering out low-quality content, but trolls can too easily co-opt them to filter out people instead, often for bad reasons that have more to do with the identity of who’s posting than with the content of their posts.

From the standpoint of attracting users, downvotes create another huge problem: someone whose first submission to a site gets downvoted to oblivion will feel bad about it and probably not come back to submit better stuff in the future.

What does a downvote actually mean?

The other problem with downvotes is that it’s unclear to everyone what they mean. Does a downvote mean that this particular post is:

  1. offensive or illegal and needs to be removed ASAP?
  2. a duplicate?
  3. just something you personally don’t like?
  4. off-topic for the forum?

As the creator of a social product, you need to give people different buttons for these.

Offensive or illegal posts (#1) shouldn’t be handled by an algorithmic rating system. You need actual human moderators for that — and enough of them that they can review those reports in a timely manner. (I hope you’re willing to train & pay them well!)

For duplicate posts (#2) it’s nicer & more informative if your software simply says “hey, this submission is a duplicate of this other thing, why don’t you all check out that post instead?”

#3 is solved by default — people can simply not vote for content they don’t like.

#4 is pretty much the same as #3 (but maybe a moderator should intervene if a user has a history of posting too many off-topic things, or if it’s obviously spam).

How to actually rank posts

Once you’ve dispensed with the idea of downvotes, the main things a user cares about are: “what are the best things that have been posted today?” (or in the last hour / week / etc) or “what are the best things since I last visited?”

On paper, the math is super simple: just count the number of upvotes for each item that was submitted in the relevant time period, and show the top N!

It turns out that it’s actually a bit trickier to implement than something like a Wilson score interval, so here are some tips on how to do that.

We need to store each vote and when it was cast, and then when it’s time to compute the “most popular in the last day” page, you first select all the votes cast within the last day, and then count how many were for each post, and rank those.
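Here is a minimal sketch of that query, assuming votes are kept as an in-memory list of (post_id, cast_at) pairs with timestamps in seconds. In a real system this would be a database query or batch job; top_posts() and the data layout are hypothetical.

```python
from collections import Counter


def top_posts(votes, now, window_seconds, n):
    """Rank posts by the number of votes cast in (now - window_seconds, now].

    votes: iterable of (post_id, cast_at) pairs, cast_at in seconds.
    Returns up to n (post_id, vote_count) pairs, most-voted first.
    """
    counts = Counter(
        post_id
        for post_id, cast_at in votes
        if now - window_seconds < cast_at <= now
    )
    return counts.most_common(n)


DAY = 24 * 60 * 60
votes = [("cat", 100), ("cat", 200), ("dog", 300), ("cat", DAY + 50),
         ("dog", DAY + 60), ("dog", DAY + 70), ("owl", DAY + 80)]
# The "cat" vote at t=100 falls just outside the last day.
print(top_posts(votes, now=DAY + 100, window_seconds=DAY, n=2))
# [('dog', 3), ('cat', 2)]
```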

Doing this every time the user hits the homepage is clearly a terrible idea, so set up a cron job to do it every 5 or 15 minutes or so. It’s okay if the info is slightly out of date! Most users won’t care or notice if it takes a few minutes for things to move around.

How exactly to optimize this depends on the scale of your site, your storage architecture, and a ton of other stuff, but for Memegen, every post had properties like score_hour, score_day, score_month, and score_alltime. A mapreduce was responsible for updating these values every few minutes.

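Obviously you don’t need to touch or compute anything for a post that got no votes since the last time you ran the updater, unless one of its older votes has just aged out of a time window (a vote cast 61 minutes ago no longer counts toward score_hour). In the steady state, most of the posts in your system won’t need any update.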
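A rough sketch of that incremental update, again assuming an in-memory list of (post_id, cast_at) votes and a dict of per-post scores. The field names mirror Memegen's score_hour and score_day, but everything else here (posts_to_update(), update_scores(), the data layout, and scanning every vote) is a hypothetical simplification; at real scale this would be indexed queries or a batch job like the mapreduce described above. Besides posts with new votes, it also revisits posts whose older votes have just aged out of a window, since their windowed scores drop even without new activity.

```python
WINDOWS = {"score_hour": 60 * 60, "score_day": 24 * 60 * 60}


def in_window(cast_at, now, length):
    """True if a vote cast at `cast_at` counts toward a window ending at `now`."""
    return now - length < cast_at <= now


def posts_to_update(votes, last_run, now):
    """Posts whose windowed scores may have changed since last_run."""
    dirty = set()
    for post_id, cast_at in votes:
        if last_run < cast_at <= now:
            dirty.add(post_id)  # received a new vote
        elif any(in_window(cast_at, last_run, length)
                 and not in_window(cast_at, now, length)
                 for length in WINDOWS.values()):
            dirty.add(post_id)  # an older vote aged out of a window
    return dirty


def update_scores(scores, votes, last_run, now):
    """Recompute scores only for posts that need it; leave the rest untouched."""
    dirty = posts_to_update(votes, last_run, now)
    for post_id in dirty:
        times = [t for p, t in votes if p == post_id]
        scores[post_id] = {
            name: sum(in_window(t, now, length) for t in times)
            for name, length in WINDOWS.items()
        }
    return dirty
```

In this sketch, a post whose only vote is 61 minutes old gets recomputed once (its score_hour drops to 0), and is then left alone until its vote also ages out of score_day.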

Conclusion

Downvotes are a blunt instrument for users to say “I don’t like this content”.

It’s easy for small groups of trolls to misuse downvotes as a vehicle for harassing & silencing groups of (often marginalized) people.

Downvotes reduce engagement by scaring off first-time posters.

Instead of adding downvotes to your site, build specific tools that handle specific kinds of unwanted posts.

(This post is a distillation & refinement of some thoughts originally posted in a Twitter thread in September 2020.)

Comments?

Feel free to reply to my post on Twitter about this article. Thanks!

2021-07-21T12:00:00-04:00