“How’d they do that?” Correcting punctuation

Posted March 25th, 2008 by erik

In my last post, I announced a new feature that automatically corrects the punctuation of questions using “fancy computer logic”. I’d like to reveal some of that computer logic, in the first in a series of blog posts where we explore the technology that drives Fluther.

Fluther uses the Django web framework, so our application code is written in Python. I just started learning Python and really like a lot of the features of the language.

One of those features is the re module, which allows you to search through text for patterns.

To automatically correct question punctuation, I first needed to detect two patterns:

  1. Questions that end in multiple question marks or exclamation points
  2. Questions that end in no punctuation

Here is the regular expression I used to detect the first pattern:

import re
MULTIPLE_PUNCTUATION_END = re.compile(r"[?!]+$")

In English, this regular expression would read “a question mark or exclamation point, repeated one or more times, at the end of the string”. Note that I used re.compile, which turns the regular expression into a reusable pattern object, so it only has to be parsed once.

Detecting this pattern and swapping it out for a single punctuation mark can be done with just one line of code:

question = MULTIPLE_PUNCTUATION_END.sub(question[-1], question)
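The one-liner above assumes a non-empty question whose last character is not a backslash, because question[-1] is evaluated and parsed as a replacement template before any matching happens. As a rough sketch of a more defensive variant (the names TRAILING_MARKS and collapse_trailing_marks are made up for illustration and are not from the Fluther codebase), the replacement can be a function that is only called when the pattern matches. Inside a character class, `|` is a literal pipe rather than "or", so the class here is written simply as `[?!]`.

```python
import re

# Hypothetical name: a run of question marks / exclamation points at the end.
TRAILING_MARKS = re.compile(r"[?!]+$")


def collapse_trailing_marks(question):
    """Replace a trailing run of ?/! with the last character of that run."""
    # The lambda runs only on a match, so an empty string or a question
    # ending in a backslash passes through unchanged instead of raising.
    return TRAILING_MARKS.sub(lambda match: match.group()[-1], question)


print(collapse_trailing_marks("Is this on??!?"))  # Is this on?
print(collapse_trailing_marks("Wow!!!"))          # Wow!
print(repr(collapse_trailing_marks("")))          # ''
```

Like the original one-liner, this keeps whichever mark came last, so "Why?!" becomes "Why!".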

That was easy. My second task was to detect questions with no ending punctuation:

NO_PUNCTUATION_END = re.compile(r"[^?.!]$")

You would read this regular expression as “anything other than a question mark, period, or exclamation point, at the end of the string”.
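To see which endings a pattern like this flags, here is a small sketch (ENDS_WITHOUT_MARK and the sample strings are invented for illustration). Inside a character class, parentheses and `|` are literal characters rather than grouping or alternation, so a class that excludes only the three end marks is written `[^?.!]`.

```python
import re

# Hypothetical name: the last character is anything except ?, . or !
ENDS_WITHOUT_MARK = re.compile(r"[^?.!]$")

samples = [
    "What is Django",   # flagged: ends in a letter
    "What is Django?",  # not flagged: already ends in ?
    "Stop.",            # not flagged: ends in a period
    "Really (truly)",   # flagged: ")" is not an end mark
    "",                 # not flagged: the class needs one character to match
]

for text in samples:
    print(repr(text), bool(ENDS_WITHOUT_MARK.search(text)))
```

Note that trailing whitespace also counts as "anything other than" an end mark, so a question with a stray space after its question mark would still be flagged unless it is stripped first.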

When no punctuation is detected at the end of a question, I simply append a question mark.

if NO_PUNCTUATION_END.search(question):
    question += "?"
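Putting the two steps together might look something like the sketch below. The function name fix_punctuation and the pattern names are hypothetical, and the rstrip() call is my own assumption about how trailing whitespace should be handled, not something taken from the Fluther code.

```python
import re

# Hypothetical names for the two patterns described in this post.
MULTIPLE_MARKS_END = re.compile(r"[?!]+$")
NO_MARK_END = re.compile(r"[^?.!]$")


def fix_punctuation(question):
    """Collapse repeated end marks, then add "?" if no end mark is present."""
    # Assumption: drop trailing whitespace so both patterns see the real ending.
    question = question.rstrip()
    # Step 1: "Why???" -> "Why?" (keeps the last mark of the run).
    question = MULTIPLE_MARKS_END.sub(lambda m: m.group()[-1], question)
    # Step 2: append a question mark when the ending is not ?, . or !
    if NO_MARK_END.search(question):
        question += "?"
    return question


print(fix_punctuation("Why is the sky blue"))  # Why is the sky blue?
print(fix_punctuation("Why???"))               # Why?
print(fix_punctuation("Fine."))                # Fine.
```

The order of the two steps does not affect the result here, since a question that ends in a run of marks is never touched by the second step.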

With just a few lines of code, I was able to pretty-up the punctuation on Fluther.

Please leave comments to let me know what you think of this new series. I wanted to start with something relatively simple, but I’d like to dig even deeper into the Fluther codebase. What are some features on Fluther where you’ve asked “How’d they do that”?

7 Comments on ““How’d they do that?” Correcting punctuation”

  1. omfgTALIjustIMDu Says:

    Erik, this is SO cool. I’ve always been fascinated with code and how people actually get websites up and running. Thanks for sharing!

  2. klaas4 Says:

    This is indeed cool. I wish I could code like that. I’m still stuck at PHP, HTML, CSS and tables. ;-)

  3. robmandu Says:

    I’d be interested to see some of the underpinnings that allow another user to dynamically update the page I’m currently viewing. That level of interactiveness on the web was unpossible only a few short years ago. And Fluther’s implementation of that concept is schweet.

  4. Vincent Says:

    @klaas4 - you can easily do the same in PHP with PCRE (Perl-Compatible Regular Expressions). A very good tutorial can be found at http://www.tote-taste.de/X-Project/regex/index.php
    The advantage of using PCRE over some language-specific RE is that PCRE is very often used and has implementations in most any language, so you’d just need to learn it once.

    Anyway, I think these posts are quite interesting, keep them up :)

  5. paulc Says:

    I was just messing about and I found that a question like “ALMOST aLL CAPITALS” doesn’t get corrected. Years back I remember seeing IRC bots that would boot people out of channels if more than x% of their message was in capitals. I don’t know the Python syntax but something like this in Ruby would be:

    >> question = "ABCabc"
    >> question.scan(/[A-Z]{1}/).length.to_f / question.scan(/[a-zA-Z]{1}/).length.to_f
    => 0.5

  6. andrew Says:

    Great thought paulc. You can do the same in python with

    In [1]: question = "ABCabc"
    In [2]: float(len(re.findall(r'[A-Z]{1}', question))) / len(re.findall(r'[a-zA-Z]{1}', question))
    Out[2]: 0.5

  7. The Fluther Blog » Blog Archive » “How’d they do that?” Real-time chat. Says:

    […] Erik’s last entry in the series, robmandu asked how we implemented the real-time chat. So, ask and ye shall receive! […]
