“How’d they do that?” Correcting punctuation
Posted March 25th, 2008 by erik
In my last post, I announced a new feature that automatically corrects the punctuation of questions using “fancy computer logic”. I’d like to reveal some of that computer logic in the first of a series of blog posts exploring the technology that drives Fluther.
Fluther uses the Django web framework, so our application code is written in Python. I just started learning Python and really like a lot of the features of the language.
One of those features is the re module, which allows you to search through text for patterns.
To automatically correct question punctuation, I first needed to detect two patterns:
- Questions that end in multiple question marks or exclamation points
- Questions that end in no punctuation
Here is the regular expression I used to detect the first pattern:
import re
MULTIPLE_PUNCTUATION_END = re.compile(r"[?!]+$")  # no "|" needed: inside [...] it would match a literal pipe
In English, this regular expression would read “a question mark or exclamation point, repeated one or more times, at the end of the string”. Note that I used re.compile, which compiles the pattern once into a reusable pattern object, so it can be applied to many questions without being re-parsed.
Detecting this pattern and swapping it out for a single punctuation mark can be done with just one line of code:
question = MULTIPLE_PUNCTUATION_END.sub(question[-1], question)
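Here is a minimal, self-contained sketch of that substitution, not necessarily the exact code running on Fluther. The function name collapse_trailing_punctuation is hypothetical. The pattern is written without the pipe, since inside a character class `|` is a literal character. The replacement is a function, which is my own assumption about how to avoid indexing question[-1] when the question is an empty string.

```python
import re

# A run of one or more "?" or "!" characters at the end of the string.
MULTIPLE_PUNCTUATION_END = re.compile(r"[?!]+$")


def collapse_trailing_punctuation(question):
    """Replace a trailing run of ?/! with the last character of that run."""
    # The replacement function receives the match object, so it never
    # touches question[-1] and is safe for an empty string.
    return MULTIPLE_PUNCTUATION_END.sub(lambda m: m.group(0)[-1], question)


print(collapse_trailing_punctuation("What is Fluther???"))  # What is Fluther?
print(collapse_trailing_punctuation("Really?!"))            # Really!
print(collapse_trailing_punctuation("No punctuation"))      # No punctuation
```

A question that does not end in "?" or "!" produces no match, so it passes through unchanged.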
That was easy. My second task was to detect questions with no ending punctuation:
NO_PUNCTUATION_END = re.compile(r"[^?.!]$")  # parentheses and "|" inside [...] would be matched literally, so they are left out
You would read this regular expression as “anything other than a question mark, period, or exclamation point, at the end of the string”.
When no punctuation is detected at the end of a question, I simply append a question mark.
if NO_PUNCTUATION_END.search(question):
    question += "?"
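Putting both steps together, the whole correction could be sketched roughly like this. The function name correct_punctuation and the rstrip() call that drops trailing whitespace are assumptions added for illustration, not part of the code described above.

```python
import re

MULTIPLE_PUNCTUATION_END = re.compile(r"[?!]+$")  # trailing run of ? or !
NO_PUNCTUATION_END = re.compile(r"[^?.!]$")       # last character is not ? . or !


def correct_punctuation(question):
    """Collapse repeated trailing punctuation, then add "?" if none is left."""
    # Assumption: trailing whitespace is stripped first, so "Why?  " is
    # treated as already ending in a question mark.
    question = question.rstrip()
    # Step 1: "Why???" becomes "Why?" (keeps the last mark of the run).
    question = MULTIPLE_PUNCTUATION_END.sub(lambda m: m.group(0)[-1], question)
    # Step 2: append a question mark when the question ends in anything else.
    if NO_PUNCTUATION_END.search(question):
        question += "?"
    return question


print(correct_punctuation("Why is the sky blue"))   # Why is the sky blue?
print(correct_punctuation("Why is the sky blue!!")) # Why is the sky blue!
print(correct_punctuation("I wonder."))             # I wonder.
```

An empty string is returned unchanged, because the second pattern needs at least one character to match.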
With just a few lines of code, I was able to pretty-up the punctuation on Fluther.
Please leave comments to let me know what you think of this new series. I wanted to start with something relatively simple, but I’d like to dig even deeper into the Fluther codebase. What are some features on Fluther where you’ve asked “How’d they do that?”
March 25th, 2008 at 8:31 am
Erik, this is SO cool. I’ve always been fascinated with code and how people actually get websites up and running. Thanks for sharing!
March 25th, 2008 at 8:36 am
This is indeed cool. I wish I could code like that. I’m still stuck at PHP, HTML, CSS and tables.
March 25th, 2008 at 8:43 am
I’d be interested to see some of the underpinnings that allow another user to dynamically update the page I’m currently viewing. That level of interactiveness on the web was unpossible only a few short years ago. And Fluther’s implementation of that concept is schweet.
March 25th, 2008 at 11:46 am
@klaas4 - you can easily do the same in PHP with PCRE (Perl-Compatible Regular Expressions). A very good tutorial can be found at http://www.tote-taste.de/X-Project/regex/index.php
The advantage of using PCRE over some language-specific RE is that PCRE is very often used and has implementations in most any language, so you’d just need to learn it once.
Anyway, I think these posts are quite interesting, keep them up
March 26th, 2008 at 7:12 am
I was just messing about and I found that a question like “ALMOST aLL CAPITALS” doesn’t get corrected. Years back I remember seeing IRC bots that would boot people out of channels if more than x% of their message was in capitals. I don’t know the Python syntax but something like this in Ruby would be:
>> question = "ABCabc"
>> question.scan(/[A-Z]{1}/).length.to_f / question.scan(/[a-zA-Z]{1}/).length.to_f
=> 0.5
March 26th, 2008 at 11:46 am
Great thought, paulc. You can do the same in Python with:
In [1]: question = "ABCabc"
In [2]: float(len(re.findall(r'[A-Z]{1}', question))) / len(re.findall(r'[a-zA-Z]{1}', question))
Out[2]: 0.5
April 2nd, 2008 at 8:46 am
[…] Erik’s last entry in the series, robmandu asked how we implemented the real-time chat. So, ask and ye shall receive! […]