Web development posts recommended by Artemy Tregubenko

Google feels Opera’s pain

2009-01-30T19:39:29Z

It seems Opera is not the only web browser that has to resort to UA spoofing in order to get around browser sniffing. Google recently released a patch that changes the Chrome user agent string when browsing Microsoft’s Hotmail site. From the official changelog:

While the Hotmail team works on a proper fix, we’re deploying a workaround that changes the user agent string that Google Chrome sends when requesting URLs that end with mail.live.com.

Omar Shahine, Lead Program Manager on the Hotmail Front Door team, didn’t take too kindly to that, and had this to say (emphasis is mine):

You think that Hotmail is a Web page and you expect a service with hundreds of millions of users and thousands of servers to stop what it’s doing, fix a bug for a browser that the majority of its customers do not use, and spin up an out-of-band release?

Hey, Google! It’s not much fun being on the other side of it, is it?

(Via CNET)

Coding Horror: A Scripter at Heart

2009-01-26T18:54:00Z

Shared by arty
Web applications aren't a hoax, but like the mechanical Turk, they do have a programmer inside. And that programmer is sketching away madly.

Coding Horror: A Scripter at Heart. Sigh. I cannot believe that the false distinction between “scripting” and “programming” is still being discussed.

OCR and Neural Nets in JavaScript

2009-01-23T20:00:15Z

A pretty amazing piece of JavaScript dropped yesterday and it's going to take a little bit to digest it all. It's a GreaseMonkey script, written by 'Shaun Friedle', that automatically solves captchas provided by the site Megaupload. There's a demo online if you wish to give it a spin.

Now, the captchas provided by the site aren't very "hard" to solve (in fact, they're downright bad - some examples are below):

But there are many interesting parts here:

The HTML 5 Canvas getImageData API is used to get at the pixel data from the Captcha image. Canvas gives you the ability to embed an image into a canvas (from which you can later extract the pixel data back out again).
The script includes an implementation of a neural network, written in pure JavaScript.
The pixel data, extracted from the image using Canvas, is fed into the neural network in an attempt to divine the exact characters being used - in a sort of crude form of Optical Character Recognition (OCR).

If we crack open the source code we can see how it works. A lot of it comes down to how the captcha is implemented. As I mentioned before it's not a very good captcha. It has 3 letters, each in a separate color, using a possible 26 letters, and they're all in the same font.

The first step is pretty clear: The captcha is copied into the canvas and then converted to grayscale.

function convert_grey(image_data){
  for (var x = 0; x < image_data.width; x++){
    for (var y = 0; y < image_data.height; y++){
      var i = x*4+y*4*image_data.width;
      var luma = Math.floor(image_data.data[i] * 299/1000 +
        image_data.data[i+1] * 587/1000 +
        image_data.data[i+2] * 114/1000);

      image_data.data[i] = luma;
      image_data.data[i+1] = luma;
      image_data.data[i+2] = luma;
      image_data.data[i+3] = 255;
    }
  }
}

The canvas is then broken apart into three separate pixel matrices - each containing an individual character (this is quite easy to do - since each character is a separate color, they're broken apart just based upon the different colors used).

filter(image_data[0], 105);
filter(image_data[1], 120);
filter(image_data[2], 135);

function filter(image_data, colour){
  for (var x = 0; x < image_data.width; x++){
    for (var y = 0; y < image_data.height; y++){
      var i = x*4+y*4*image_data.width;

      // Turn all the pixels of the certain colour to white
      if (image_data.data[i] == colour) {
        image_data.data[i] = 255;
        image_data.data[i+1] = 255;
        image_data.data[i+2] = 255;

      // Everything else to black
      } else {
        image_data.data[i] = 0;
        image_data.data[i+1] = 0;
        image_data.data[i+2] = 0;
      }
    }
  }
}

Finally any extraneous noisy pixels are removed from the image (providing a clear character). This is done by looking for white pixels (ones that've been matched) that are surrounded (above and below) by black, un-matched, pixels. If that's the case then the matching pixel is simply removed.

var i = x*4+y*4*image_data.width;
var above = x*4+(y-1)*4*image_data.width;
var below = x*4+(y+1)*4*image_data.width;

if (image_data.data[i] == 255 &&
    image_data.data[above] == 0 &&
    image_data.data[below] == 0) {
  image_data.data[i] = 0;
  image_data.data[i+1] = 0;
  image_data.data[i+2] = 0;
}

We're getting really close to having a shape that we can feed into the neural network, but it's not completely there yet. The script then goes on to do some very crude edge detection on the shape. The script looks for the top, left, right, and bottom-most pixels in the shape and turns it into a rectangle - and converts that shape back into a 20 by 25 pixel matrix.

cropped_canvas.getContext("2d").fillRect(0, 0, 20, 25);
var edges = find_edges(image_data[i]);
cropped_canvas.getContext("2d").drawImage(canvas, edges[0], edges[1],
  edges[2]-edges[0], edges[3]-edges[1], 0, 0,
  edges[2]-edges[0], edges[3]-edges[1]);

image_data[i] = cropped_canvas.getContext("2d").getImageData(0, 0,
  cropped_canvas.width, cropped_canvas.height);
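
The find_edges helper is called above but not shown. Judging only by how its result is passed to drawImage, it appears to return the left, top, right and bottom bounds of the white pixels. Under that assumption, a minimal sketch might look like this (the name findEdges and the return shape are guesses, not the script's actual code):

```javascript
// Hypothetical stand-in for find_edges: scans an ImageData-like object
// ({width, height, data} with 4 bytes per pixel) and returns the bounding
// box of all white pixels as [left, top, right, bottom].
function findEdges(imageData) {
  let left = imageData.width;
  let top = imageData.height;
  let right = -1;
  let bottom = -1;

  for (let y = 0; y < imageData.height; y++) {
    for (let x = 0; x < imageData.width; x++) {
      const i = (y * imageData.width + x) * 4;
      // After filtering, a matched pixel has its red channel set to 255.
      if (imageData.data[i] === 255) {
        if (x < left) left = x;
        if (x > right) right = x;
        if (y < top) top = y;
        if (y > bottom) bottom = y;
      }
    }
  }

  // No white pixels at all: there is no shape to crop.
  if (right < 0) return null;
  return [left, top, right, bottom];
}
```

Returning null for an empty image is a choice made for this sketch; the original may handle that case differently.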

So - after all this work, what do we have? A 20 by 25 matrix containing a single rectangle, drawn in black and white. Terribly exciting.

That rectangle is then reduced even further. A number of strategically chosen points are extracted from the matrix in the form of "receptors" (these will feed the neural network). For example, a receptor might look at the pixel at position 9x6 and see if it's "on" or not. A whole series of these states is computed (far fewer than the full 20x25 grid - a mere 64 states) and fed into the neural network.
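
To make the receptor idea concrete, here is a rough sketch of sampling a handful of points from the pixel matrix. The coordinates in RECEPTORS are invented for illustration (only 9x6 comes from the text), and readReceptors is a hypothetical name; the real script uses its own list of 64 points.

```javascript
// Hypothetical receptor positions as [x, y] pairs. The actual script
// defines 64 of these; only [9, 6] is taken from the description above.
const RECEPTORS = [[9, 6], [3, 12], [16, 20]];

// Returns one boolean per receptor: true if the pixel at that position
// is "on" (white) in the black-and-white character matrix.
function readReceptors(imageData, receptors) {
  return receptors.map(([x, y]) => {
    const i = (y * imageData.width + x) * 4;
    return imageData.data[i] === 255;
  });
}
```

The resulting array of booleans is the kind of input the network would receive for one character.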

The question that you should be asking yourself now is: Why not just do a straight pixel comparison? Why all this mess with the neural network? Well, the problem is that, with all of this reduction of information, a lot of ambiguity exists. If you run the online demo of this script you're more likely to find the occasional failure from the straight pixel comparison than from running it through the network. That being said, for most users, a straight pixel comparison would probably be sufficient.

The next step is attempting to guess the letter. The network is being fed with 64 boolean inputs (collected from one of the extracted letters) along with another series of pre-computed values. One of the concepts behind how a neural network works is that you pre-seed it with some of the results from a previous run. It's likely that the author of this script simply ran it again and again and collected a whole series of values to get an optimal score. The score itself may not have any particular meaning (other than to the neural network itself) but it helps to derive the value.

When the neural net is run it takes the 64 values that've been computed from one of the characters in the captcha and compares them against a single pre-computed letter of the alphabet. It continues in this manner, assigning a score for each letter of the alphabet (a final result might be 'A 98% likely', 'B 36% likely', etc.).
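
The per-letter scoring loop could be sketched roughly as follows. The script's real network architecture and weights aren't described here, so everything in this block is an assumption: a single-layer scorer with one weight vector per letter, and the hypothetical names scoreLetter, guessLetter and model.

```javascript
// Squashes any real number into the range (0, 1).
function sigmoid(z) {
  return 1 / (1 + Math.exp(-z));
}

// Hypothetical scorer: sums the pre-computed weights of every receptor
// that is "on", adds a bias, and turns the sum into a 0..1 score.
function scoreLetter(inputs, weights, bias) {
  let sum = bias;
  for (let i = 0; i < inputs.length; i++) {
    if (inputs[i]) sum += weights[i];
  }
  return sigmoid(sum);
}

// Scores the inputs against every letter in the model and returns the
// best match as {letter, score}. `model` maps a letter to {weights, bias}.
function guessLetter(inputs, model) {
  let best = null;
  for (const letter of Object.keys(model)) {
    const { weights, bias } = model[letter];
    const score = scoreLetter(inputs, weights, bias);
    if (best === null || score > best.score) {
      best = { letter, score };
    }
  }
  return best;
}
```

With 64 receptor states as inputs and 26 entries in the model, the highest-scoring letter is taken as the guess for that character.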

Going through the three letters in the captcha the final result is devised. It's not 100% perfect (I wonder if better scores would be achieved if the letter wasn't turned into a featureless rectangle before all these computations) but it's pretty good for what it is - and pretty amazing considering that it's all happening 100% in the browser using standards-based technology.

As a note - what's happening here is rather instance-specific. This technique *might* be able to work on a few more poorly-constructed captchas, but beyond that the complexity of most captchas just becomes too great (especially so for any client-side analysis).

I'm absolutely expecting some interesting work to be derived from this project - it holds a lot of potential.

Open Source Software, Self Service Software

2009-01-23T07:59:59Z

Have you ever used those self-service checkout machines at a grocery store or supermarket?

What fascinates me about self-service checkout devices is that the store is making you do work they would normally pay their employees to do. Think about this for a minute. You're playing the role of the paying customer and the cashier employee. Under the watchful eyes of security cameras and at least one human monitor, naturally, but still. We continue to check ourselves out. Not only willingly, but enthusiastically. For that one brief moment, we're working for the supermarket at the lowest possible pay scale: none.

That's the paradox of self-checkout. But to me it's no riddle at all: nobody else in that store cares about getting Jeff Atwood checked out nearly as much as Jeff Atwood does. I always choose self-service checkout, except in extraordinary cases. The people with the most vested interest in the outcome of the checkout process are the very same people that self-checkout puts in charge: me! How could it not work? It's the perfect alignment of self-interest.

I don't mean this as a dig against supermarket employees. They're (usually) competent and friendly enough. I should know; I worked my way through high school and part of college as a Safeway checker. I tried my level best to be good at my job, and move customers through my line as quickly as possible. I'm sure I could check someone out faster than they could do it themselves. But there's only one me, and at most a half-dozen other checkers working the store, compared to the multitudes of customers. It doesn't scale.

If you combine the self-interest angle and the scaling issue, self-service checkout seems obvious, a win for everyone. But self-service is not without issues of its own:

What if the item you're scanning isn't found, or can't be scanned?
Some of the self-service machines have fairly elaborate and non-obvious rules in place, to prevent fraud and theft. Also, the user interface can sometimes be less than ideal on the machines.
How do you handle coupons? Loyalty cards? Buying 20 of the same item? Scanning the wrong item?
The self-service stations are lightly manned. The ratio between employee monitors and self-checkout machines runs about 1:4 in my experience. If you have a problem, you might end up waiting longer than a traditional manned checkout.
How do you ring up items like fruit and vegetables which don't have UPC codes, and have to be weighed?
What about unusual, awkwardly shaped items or oversize items?
Customers who have trouble during self-checkout may feel they're stupid, or that they did something wrong. Guess where they're going to lay the blame for those feelings?

There are certain rituals to using the self-service checkout machines. And we know that. We programmers fundamentally grok the hoops that the self-service checkout machines make customers jump through. They are, after all, devices designed by our fellow programmers. Every item has to be scanned, then carefully and individually placed in the bagging area which doubles as a scale to verify the item was moved there. One at a time. In strict sequence. Repeated exactly the same every time. We live this system every day; it's completely natural for a programmer. But it isn't natural for average people. I've seen plenty of customers in front of me struggle with self-service checkout machines, puzzled by the workings of this mysterious device that seems so painfully obvious to a programmer. I get frustrated to the point that I almost want to rush over and help them myself. Which would defeat the purpose of a... self-service device.

I was thinking about this while reading Michael Meeks' article, Measuring the true success of OpenOffice.org. He reaches some depressing conclusions about the current state of OpenOffice, a high profile open source competitor to Microsoft Office:

Crude as they are, the statistics show a picture of slow disengagement by Sun, combined with a spectacular lack of growth in the developer community. In a healthy project we would expect to see a large number of volunteer developers involved, in addition - we would expect to see a large number of peer companies contributing to the common code pool; we do not see this in OpenOffice.org. Indeed, quite the opposite. We appear to have the lowest number of active developers on OO.o since records began: 24, this contrasts negatively with Linux's recent low of 160+. Even spun in the most positive way, OpenOffice.org is at best stagnating from a development perspective.

This is troubling, because open source software development is the ultimate self-service industry. As Michael notes, the project is sadly undermining itself:

Kill the ossified, paralysed and gerrymandered political system in OpenOffice.org. Instead put the developers (all of them), and those actively contributing, into the driving seat. This in turn should help to kill the many horribly demotivating and dysfunctional process steps currently used to stop code from getting included, and should help to attract volunteers. Once they are attracted and active, listen to them without patronizing.

Indeed, once you destroy the twin intrinsic motivators of self-determination and autonomy on an open source project, I'd argue you're no better off than you were with traditional closed source software. You've created a self-service checkout machine so painful to use, so awkward to operate, that it gives the self-service concept a bad name. And that's heartbreaking, because self-service is the soul of open source:

Why is my bug not fixed? Why is the UI still so unpleasant? Why is performance still poor? Why does it consume more memory than necessary? Why is it getting slower to start? Why? Why? The answer lies with developers: Will you help us make OpenOffice.org better?

In order for open source software projects to survive, they must ensure that they present as few barriers to self-service software development as possible. And any barriers they do present must be very low -- radically low. Asking your customers to learn C++ programming to improve their Open Office experience is a pretty far cry indeed from asking them to operate a scanner and touchscreen to improve their checkout experience. And if you can't convince an audience of programmers, who are inclined to understand and love this stuff, who exactly are you expecting to convince?

So, if you're having difficulty getting software developers to participate in your open source project, I'd say the community isn't failing your project. Your project is failing the community.

AJAX APIs Playground

2009-01-22T18:38:56Z

AJAX APIs Playground. Ferociously useful collection of executable and editable example code for all(?) of Google’s JavaScript APIs, including Google Maps and the increasingly interesting Visualization API.

Using HTTP Headers to Serve Styles

2009-01-22T13:46:36Z

How many times have you played out the following scenario?

Make local changes to your style sheet(s).
Upload the changes to the staging server.
Switch to your browser and hit “reload”.
Nothing happens.
Force-reload. Nothing happens.
Go back to make sure the upload is finished and successful.
Reload again. Still nothing.
Try sprinkling in !important. Upload, reload, nothing.
Start swearing at your computer.
Check Firebug to see what’s overriding your new styles. Discover they aren’t being applied at all.
Continue in that vein for several minutes before realizing you were hitting reload while looking at the live production server, not the staging server.
Go to the staging server and see all your changes.
Start swearing at your own idiocy.

This happened to me all the time as we neared completion of the redesign of An Event Apart. It got to the point that I would deliberately add obvious, easily-fixable-later errors to the staging server’s styles, like a light red page background.

Now that we’re launched and I have time to actually, you know, think about how I do this stuff, it occurred to me that what I should have done is create a distinct “staging” style sheet with the obvious error or other visual cue. Maybe repeat the word “staging” along the right side of the page with a background image, like a watermark:

html {background: url(staging-bg.png) 100% 50% repeat-y;}

Okay, cool. Then I just need to have that served up with every page on the staging server, without it showing up on the production server.

One way to do that is just make sure the image file never migrates to production. That way, even if I accidentally let the above CSS get onto production, the user will never see it. But that’s inelegant and wasteful, and fragile to boot: if the styles accidentally migrate, who’s to say the image won’t as well? And while I’m sure there are all kinds of CMS and CVS and Git and what-have-you tricks to make sure that doesn’t happen, I am both clumsy and lazy. Not only do I have great faith in my ability to screw up my use of such mechanisms, I don’t really want to be bothered to learn them in the first place.

So: why not send the link to the style sheet using HTTP headers? Yeah, that’s the ticket! I can just add a line to my .htaccess file on the staging server and be done. Under Apache, which is what I use:

Header add Link "</staging.css>;rel=stylesheet;type=text/css;media=all"

Those angle brackets are, so far as I can tell, absolutely mandatory, so bear that in mind. And of course the path in those brackets can be absolute, unlike what I’ve shown here. I’m sure there are simple PHP equivalents, which I’ll leave to others to work out. I really didn’t need to add the media=all part, but what the heck.
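
For servers other than Apache, the same header can be set from application code. The sketch below assumes a Node.js server, where the response object offers setHeader(name, value); the function names stylesheetLinkHeader and addStagingStyles are made up for illustration and are not part of the original post.

```javascript
// Builds the Link header value in the same syntax as the .htaccess line:
// the path goes inside angle brackets, followed by the link parameters.
function stylesheetLinkHeader(path, media = "all") {
  return `<${path}>;rel=stylesheet;type=text/css;media=${media}`;
}

// Hypothetical helper to call from a request handler on the staging
// server only. `res` is expected to behave like Node's http.ServerResponse.
function addStagingStyles(res) {
  res.setHeader("Link", stylesheetLinkHeader("/staging.css"));
}
```

Calling addStagingStyles(res) before the response body is written would send the same header as the Apache directive; the browser-support caveats are unchanged.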

Seems so simple, doesn’t it? Almost… too simple. Like there has to be a catch somewhere. Well, there is. The catch is that this is not supported by all user agents. Internet Explorer, for one; Safari, for another. It does work in Opera and Gecko browsers. So you can’t deploy this on your production server, unless of course you want to use it as a way to hide CSS from both IE and Safari. (For whatever reason.) It works great in Gecko-based production environments like mine, though.

I looked around for a quick how-to on doing this, and couldn’t find one. Instead, I found Anne van Kesteren’s test page, whose headers I sniffed in order to work out the proper value syntax; and a brief page on the Link header that didn’t mention CSS at all. Nothing seemed to put the two of them together. Nothing until now, that is.

Obama will hand out Silverlight 2 to the nation

2009-01-19T14:27:14Z

Shared by arty
Hmm, Microsoft has started lobbying for itself more openly: in the States through Obama, and here through VGTRK.

Microsoft will provide the webcast of tomorrow's inauguration ceremony of the US president-elect.

A second approximation of the ideal

2009-01-18T11:34:46Z

A recent post demonstrated very clearly that even among thinking people, the way pictures and text are received differs by an order of magnitude: compare the number of comments and "likes". What's more, very few people bothered to read the post itself, so most of the comments amounted to "oh, how horribly you've designed everything". Okay, if that's what life demands, we'll comply : )

Since I still want to bring directed identity to the masses, I've prepared a second attempt at the ideal OpenID, taking the mistakes of the first into account. I now have an improved prototype, with JavaScript and blackjack. Two prototypes, in fact. And pictures to attract the visual thinkers!

The first variant looks a little further into the future, hiding not only the secondary login methods but even the traditional form. The second is somewhat closer to our present reality, but it is fairly heavy in itself, so I couldn't fit the list of alternatives into it. In addition, Google, which is less relevant in our neck of the woods, has swapped places with Rambler.

Right now I'm hoping for comments on how else the interface (not the markup and not the scripts) could be improved, and on Monday or Tuesday I'll post this new version on Habr. Judging by the reaction on FriendFeed, there will be flame wars galore on Habr, so I'll be glad of support from people who get it ; )

After all, who is going to make this web better, if not us!

Rules of Database App Aging

2009-01-18T09:09:43Z

Rules of Database App Aging. Peter Harkins: All fields become optional, all relationships become many-to-many, chatter always expands. This is why document oriented databases such as CouchDB are looking more and more attractive.

Sloppy - the slow proxy for testing sites

2009-01-18T08:43:37Z

Sloppy—the slow proxy. Java Web Start GUI application which runs a proxy to the site of your choice simulating lower connection speeds—great for testing how well your ajax holds up under poor network conditions.

Project Voldemort

2009-01-18T08:35:19Z

Shared by arty
memcache by linkedin

Project Voldemort. Yet Another “big, distributed, persistent, fault-tolerant hash table”—this time from LinkedIn, released under the Apache 2.0 license. The approach to consistency is interesting—instead of using distributed transactions, they use versioning and “resolve inconsistencies at read time”. It also uses consistent hashing (as seen in libketama) to select servers. The design document has lots more information.

Empty elements strike back

2009-01-16T14:25:52Z

Front-end developers still remember one of the problems of ancient Internet Explorer: if an element has no content, either its background isn't drawn or some other glitches crop up.

But that's Explorer. This time the surprise came from where nobody expected it. The Android platform is the flagship of geek phone-building. The WebKit engine it uses is Opera's only open competitor in speed of development. And suddenly, there you go: clicks on empty elements aren't registered! Hello again, space character : )

Actually, no: we took the semantic route and put a real button there ; )

Although that doesn't save us from other problems, such as the lack of keyboard events : (

A link about digital video

2009-01-16T07:32:28Z

Mark Pilgrim's blog has started a series of articles, still unfinished, on video encoding. On the one hand it is written quite simply; on the other, it is fairly detailed. So if you've ever wondered what .avi, divx and h264 are but never found out, I recommend reading "A gentle introduction to video encoding".

Cascading Style Sheets / Visual testing of your browser's support for CSS3 properties

2009-01-14T14:05:34Z

For anyone interested, here is a small page that visually compares how your browser reacts to CSS3 properties, whether it supports them or not. The page will evolve and gain more properties, and I also plan to add links to property descriptions and further information. At the moment the test covers 10 properties; if you'd like to extend it, please name the CSS3 property you want in the comments.

View the test.

PS: opacity will not be in the test.

UPD: the test has been updated: 5 tests added, and a CSS bug fixed.

Hardware / The Mini-ITX platform as the basis for a home server

2009-01-14T13:00:20Z

Just recently Habr ran posts about the shortcomings of inexpensive SOHO routers under heavy load, and about extending the functionality of ASUS g5XX-series routers by loading modified firmware. I'd like to tell you about an alternative: the far more powerful Mini-ITX platforms from VIA, for building a small, quiet home server and/or router.

What follows is a little text and some photos about what it is made of and how it all works.

Hardware / How to turn a cheap, simple router into a full-featured server

2009-01-13T06:21:41Z

Introduction

The first router that came my way was a D-Link DI-524. I had no time to choose, so I simply bought the first inexpensive router with Wi-Fi that I came across. Since I like taking things apart by nature, I almost immediately went online to find out how it could be improved.
But at the time I found almost nothing apart from advice to drill holes in it. It really did overheat quite often and ran unstably as a result, but I wasn't prepared to take such a risky step.
Fortunately, I gave it to a friend almost straight away.
When I needed such a device again, I already knew which features I really needed; for me that was QoS. Like any ordinary buyer, I started by looking at the marketing descriptions of the devices' capabilities and features. At the time that seemed the only right approach. It turned out not to be quite so.
I bought an ASUS w520gu. I'm happy with it on the whole (more on that later), but I think it's right to share my experience and information on how I improved it.

If you're interested, read on.

instanceof considered harmful (or how to write a robust isArray)

2009-01-12T10:55:07Z

instanceof considered harmful (or how to write a robust isArray). JavaScript’s instanceof operator breaks when dealing with objects that may have been created in a different document or frame, since constructors are unique to each frame. Instead, you can check for arrays by calling the default Object.prototype.toString method on the value, which the JS spec guarantees will return [object Array] for arrays.
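
A minimal sketch of the check described above, written from the description rather than copied from the linked article:

```javascript
// Works across frames because it relies on the object's internal class
// rather than on which frame's Array constructor created it.
function isArray(value) {
  return Object.prototype.toString.call(value) === "[object Array]";
}
```

Array-like objects such as {length: 0} and strings are correctly rejected. Engines that implement ECMAScript 5 or later also provide a built-in Array.isArray with the same cross-frame behaviour.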

Link: Perfection Kills

2009-01-12T00:02:37Z

How to write a robust isArray. Faced with a long-standing problem, Juriy finds the solution by reading the spec. Let this be a lesson to all of us.

HTML 5 canvas - the basics

2009-01-08T13:48:26Z

HTML 5 canvas is a powerful, flexible way to create two dimensional graphics on web pages using scripting, and a number of previous dev.opera.com articles have demonstrated usage of it already. This article goes back to basics, giving beginners a starting point to work from and explaining the basics. Get drawing!

Rate limiting with memcached

2009-01-07T22:27:08Z

On Monday, several high profile “celebrity” Twitter accounts started spouting nonsense, the victims of stolen passwords. Wired has the full story—someone ran a dictionary attack against a Twitter staff member, discovered their password and used Twitter’s admin tools to reset the passwords on the accounts they wanted to steal.

The Twitter incident got me thinking about rate limiting again. I’ve been wanting a good general solution to this problem for quite a while, for API projects as well as security. Django Snippets has an answer, but it works by storing access information in the database and requires you to run a periodic purge command to clean up the old records.

I’m strongly averse to writing to the database for every hit. For most web applications reads scale easily, but writes don’t. I also want to avoid filling my database with administrative gunk (I dislike database backed sessions for the same reason). But rate limiting relies on storing state, so there has to be some kind of persistence.

Using memcached counters

I think I’ve found a solution, thanks to memcached and in particular the incr command. incr lets you atomically increment an already existing counter, simply by specifying its key. add can be used to create that counter—it will fail silently if the provided key already exists.

Let’s say we want to limit a user to 10 hits every minute. A naive implementation would be to create a memcached counter for hits from that user’s IP address in a specific minute. The counter key might look like this:

ratelimit_72.26.203.98_2009-01-07-21:45

Increment that counter for every hit, and if it exceeds 10 block the request.

What if the user makes ten requests all in the last second of the minute, then another ten a second later? The rate limiter will let them off. For many cases this is probably acceptable, but we can improve things with a slightly more complex strategy. Let’s say we want to allow up to 30 requests every five minutes. Instead of maintaining one counter, we can maintain five—one for each of the past five minutes (older counters than that are allowed to expire). After a few minutes we might end up with counters that look like this:

ratelimit_72.26.203.98_2009-01-07-21:45 = 13
ratelimit_72.26.203.98_2009-01-07-21:46 = 7
ratelimit_72.26.203.98_2009-01-07-21:47 = 11

Now, on every request we work out the keys for the past five minutes and use get_multi to retrieve them. If the sum of those counters exceeds the maximum allowed for that time period, we block the request.
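
The multi-counter strategy can be sketched as follows. This is not the Django implementation: a plain Map stands in for memcached (so add, incr and get_multi become ordinary Map operations and nothing expires), and the names minuteKey, recentKeys and allowRequest are hypothetical.

```javascript
// Builds a per-minute counter key such as
// ratelimit_72.26.203.98_2009-01-07-21:45 (timestamps are in UTC).
function minuteKey(ip, date) {
  const stamp = date.toISOString().slice(0, 16).replace("T", "-");
  return `ratelimit_${ip}_${stamp}`;
}

// Keys for the current minute and the (minutes - 1) minutes before it.
function recentKeys(ip, now, minutes) {
  const keys = [];
  for (let m = 0; m < minutes; m++) {
    keys.push(minuteKey(ip, new Date(now.getTime() - m * 60000)));
  }
  return keys;
}

// Counts this hit, then sums the recent counters. Returns false when the
// total exceeds the limit, meaning the request should be blocked.
// `store` is a Map standing in for the cache (an assumption of this sketch).
function allowRequest(store, ip, now, minutes, maxRequests) {
  const current = minuteKey(ip, now);
  store.set(current, (store.get(current) || 0) + 1); // add + incr
  const total = recentKeys(ip, now, minutes)
    .reduce((sum, key) => sum + (store.get(key) || 0), 0); // get_multi
  return total <= maxRequests;
}
```

With a real memcached client each counter would be created with an expiry slightly longer than the window, so old minutes disappear on their own instead of accumulating as they do in the Map.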

Are there any obvious flaws to this approach? I’m pretty happy with it: it cleans up after itself (old counters quietly expire from the cache), it shouldn’t use many resources (just five active cache keys per unique IP address at any one time), and if the cache is lost the only snag is that a few clients might go slightly over their rate limit. I don’t think it’s possible for an attacker to force the counters to expire early.

An implementation for Django

I’ve put together an example implementation of this algorithm using Django, hosted on GitHub. The readme.txt file shows how it works—basic usage is via a simple decorator:

from django.http import HttpResponse
from ratelimitcache import ratelimit

@ratelimit(minutes = 3, requests = 20)
def myview(request):
    # ...
    return HttpResponse('...')

Python decorators are typically functions, but ratelimit is actually a class. This means it can be customised by subclassing it, and the class provides a number of methods designed to be over-ridden. I’ve provided an example of this in the module itself—ratelimit_post, a decorator which only limits on POST requests and can optionally couple the rate limiting to an individual POST field. Here’s the complete implementation:

import sha  # Python 2 only; hashlib.sha1 is the modern equivalent

class ratelimit_post(ratelimit):
    "Rate limit POSTs - can be used to protect a login form"
    key_field = None # If provided, this POST var will affect the rate limit

    def should_ratelimit(self, request):
        return request.method == 'POST'

    def key_extra(self, request):
        # IP address and key_field (if it is set)
        extra = super(ratelimit_post, self).key_extra(request)
        if self.key_field:
            value = sha.new(request.POST.get(self.key_field, '')).hexdigest()
            extra += '-' + value
        return extra

And here’s how you would use it to limit the number of times a specific IP address can attempt to log in as a particular user:

from django.http import HttpResponse
from ratelimitcache import ratelimit_post

@ratelimit_post(minutes = 3, requests = 10, key_field = 'username')
def login(request):
    # ...
    return HttpResponse('...')

The should_ratelimit() method is called before any other rate limiting logic. The default implementation returns True, but here we only want to apply rate limits to POST requests. The key_extra() method is used to compose the keys used for the counter—by default this just includes the request’s IP address, but in ratelimit_post we can optionally include the value of a POST field (for example the username). We could include things like the request path here to apply different rate limit counters to different URLs.

Finally, the readme.txt includes ratelimit_with_logging, an example that over-rides the disallowed() view returned when a rate limiting condition fails and writes an audit note to a database (less overhead than writing for every request).

I’ve been a fan of customisation via subclassing ever since I got to know the new Django admin system, and I’ve been using it in a bunch of projects. It’s a great way to create reusable pieces of code.

Opera / Beware: statistical disinformation

2009-01-07T12:08:19Z

Have you ever wondered how large the so-called statistical error can be? Especially on a global scale. Especially from seemingly reputable and widely cited agencies or online services. I'll try to show you just how sad things are, using browser usage statistics as the example, naturally (those who know me will understand why). I hope that after reading this mini-study you will take a more critical view of the next loud claims from various publications about the market share of this or that program.

Read more →