Web development links for July 2010

Practical regexes: modern engines tear it up

As is well known, the secret of building regular expression engines properly was once invented by the industry (grep, awk), lost (Perl, Python, PHP, Ruby), and then invented again.

Time to match the string a^n against the pattern a?^n a^n (n optional a's followed by n mandatory a's).

Google's RE2 project (C++) is a successful attempt to [re]create an industrial-strength regex engine whose running time does not depend exponentially on the input. The project, carried out by Russ Cox, let Google efficiently run regular-expression searches over the program code it has found on the internet. As a most valuable by-product it yielded three articles on building regex engines, which give interesting historical context and describe an efficient implementation.

In September 2010 the world will gain one more such must-read resource on regular expressions. The play (!) A Play on Regular Expressions (http://sebfisch.github.com/haskell-regexp/regexp-play.pdf) (via [info]antilamer) describes the construction of a lazy, purely functional algorithm based on the same approach of turning a regex into a finite automaton. The algorithm described in the paper makes it possible to build an expression matcher that supports arbitrary context-free (sic!!) grammars.

The algorithm and its implementation, Weighted RegExp Matching, are usually slower on simple expressions than other libraries such as pcre, but they behave better asymptotically: O(nm), where n is the length of the input and m is the size of the expression. By exploiting laziness, on some benchmarks (.*a.{20}a.*) this Haskell implementation beats the RE2 described above in absolute running time.

Weighted-regexp for Haskell is available on Hackage: http://hackage.haskell.org/package/weighted-regexp

The algorithm described in "A Play on Regular Expressions" has also been implemented in Python in just a few dozen lines of code.
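To give a feel for how such a matcher avoids backtracking, here is a minimal sketch of the mark-shifting idea in Python. It is not the code from the linked post or from the paper: the class names, the `match` helper and the `pathological` builder are assumptions made for illustration, and it only answers "matches" or "doesn't match". Each input character is pushed through the expression tree once, so the work per character is proportional to the size of the expression.

```python
class Regex:
    """A node; `empty` = matches "", `marked` = a match ends at the last char."""
    def __init__(self, empty):
        self.empty = empty
        self.marked = False

    def shift(self, char, mark):
        self.marked = self._shift(char, mark)
        return self.marked

    def reset(self):
        self.marked = False


class Char(Regex):
    def __init__(self, char):
        super().__init__(empty=False)
        self.char = char

    def _shift(self, char, mark):
        return mark and char == self.char


class Epsilon(Regex):
    def __init__(self):
        super().__init__(empty=True)

    def _shift(self, char, mark):
        return False


class Binary(Regex):
    def __init__(self, left, right, empty):
        super().__init__(empty)
        self.left, self.right = left, right

    def reset(self):
        self.left.reset()
        self.right.reset()
        super().reset()


class Alternative(Binary):
    def __init__(self, left, right):
        super().__init__(left, right, left.empty or right.empty)

    def _shift(self, char, mark):
        # Both sides must be shifted so that each updates its own marks.
        left = self.left.shift(char, mark)
        right = self.right.shift(char, mark)
        return left or right


class Sequence(Binary):
    def __init__(self, left, right):
        super().__init__(left, right, left.empty and right.empty)

    def _shift(self, char, mark):
        old_left = self.left.marked  # read before the left side is shifted
        left = self.left.shift(char, mark)
        right = self.right.shift(char, old_left or (mark and self.left.empty))
        return (left and self.right.empty) or right


class Repetition(Regex):
    def __init__(self, inner):
        super().__init__(empty=True)
        self.inner = inner

    def _shift(self, char, mark):
        # A finished iteration (self.marked) may start the next one.
        return self.inner.shift(char, mark or self.marked)

    def reset(self):
        self.inner.reset()
        super().reset()


def match(regex, text):
    """Return True if the whole of `text` matches `regex`."""
    regex.reset()
    if not text:
        return regex.empty
    result = regex.shift(text[0], True)   # feed a mark in from the left
    for char in text[1:]:
        result = regex.shift(char, False)  # move the existing marks along
    regex.reset()
    return result


def pathological(n):
    """Build a?^n a^n (n >= 1), the pattern that hurts backtracking engines."""
    parts = [Alternative(Char("a"), Epsilon()) for _ in range(n)]
    parts += [Char("a") for _ in range(n)]
    regex = parts[-1]
    for part in reversed(parts[:-1]):
        regex = Sequence(part, regex)
    return regex
```

For example, `match(pathological(25), "a" * 25)` finishes immediately, because the matcher never has to guess which of the optional a's to skip.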

Most interestingly, the same algorithm was also written in C++ and in Java, and the Java version came out more than twice as fast as the C++ one, even though the algorithm maps very well onto OO and can be ported between OO languages with hardly any changes:
To get a feeling for the orders of magnitude involved, the CPython re module (which is implemented in C and quite optimized) can match 2'500'000 chars/s. Google's new re2 implementation still matches 550'000 chars/s. Google's implementation is slower, but their algorithm gives complexity and space guarantees similar to our implementation in the last blog post.


The C++ version is a little bit faster than the RPython version translated to C, at 750'000 chars/s. That's not very surprising, given their similarity. The Java version is more than twice as fast, with 1'920'000 chars/s. Apparently the Java JIT compiler is a lot better at optimizing the method calls in the algorithm or does some other optimizations.


With the regular expression matcher translated to C and with a generated JIT, the regular expression performance increases significantly. Our running example can match 16'500'000 chars/s, which is more than six times faster than the re module. This is not an entirely fair comparison, because the re module can give more information than just "matches" or "doesn't match", but it's still interesting to see. A more relevant comparison is that between the program with and without a JIT: Generating a JIT speeds the matcher up by more than 20 times.

Implementation                            Language (ed.: [info]lionet)  chars/s      Speedup over pure Python
Pure Python code                          Python                        12'200       1
Python re module                          C                             2'500'000    205
Google's re2 implementation               C++                           550'000      45
RPython implementation translated to C    Python                        720'000      59
C++ implementation                        C++                           750'000      61
Java implementation                       Java                          1'920'000    157
RPython implementation with JIT           Python                        16'500'000   1352
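
The speedup column can be re-derived from the chars/s figures. The small sketch below does that arithmetic; the dictionary and the `speedups` helper are names introduced here for illustration, and the numbers are copied from the quoted table.

```python
# chars/s figures copied from the quoted table
CHARS_PER_SECOND = {
    "Pure Python code": 12_200,
    "Python re module": 2_500_000,
    "Google's re2 implementation": 550_000,
    "RPython implementation translated to C": 720_000,
    "C++ implementation": 750_000,
    "Java implementation": 1_920_000,
    "RPython implementation with JIT": 16_500_000,
}


def speedups(rates, baseline="Pure Python code"):
    """Return each rate divided by the baseline rate."""
    base = rates[baseline]
    return {name: rate / base for name, rate in rates.items()}


if __name__ == "__main__":
    for name, factor in speedups(CHARS_PER_SECOND).items():
        print(f"{name:40s} {factor:8.1f}x")
```

The same figures back the two claims in the quote: 16'500'000 / 2'500'000 is about 6.6 (the JIT version against the re module), and 16'500'000 / 720'000 is about 22.9 (the same matcher with and without a JIT).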

This, in particular, supports the thesis that C++ is not needed: modern languages with GC and JIT simply outdo it on tasks that are traditional C/C++ territory.

Wikileaks To Leak 5000 Open Source Java Projects With All That Private/Final Bullshit Removed

EYJAFJÖLL, ICELAND — Java programmers around the globe are in a panic today over a Wikileaks press release issued at 8:15am GMT. Wikileaks announced that they will re-release the source code for thousands of Open Source Java projects, making all access modifiers 'public' and all classes and members non-'final'.

Agile Java Developer Johnnie Garza of Irvine, CA condemns the move. "They have no right to do this. Open Source does not mean the source is somehow 'open'. That's my code, not theirs. If I make something private, it means that no matter how desperately you need to call it, I should be able to prevent you from doing so, even long after I've gone to the grave."

According to the Wikileaks press release, millions of Java source files have been run through a Perl script that removes all 'final' keywords except those required for hacking around the 15-year-old Java language's "fucking embarrassing lack of closures."

Moreover, the Perl script gives every Java class at least one public constructor, and turns all fields without getters/setters into public fields. "The script yanks out all that @deprecated shit, too," claims the controversial announcement.

Longtime Java programmer Ronnie Lloyd of Austin, TX is offended by the thought of people instantiating his private classes. "It's just common sense," said Lloyd, who is 37. "If I buy you a house and put the title in your name, but I mark some of the doors 'Employees Only', then you're not allowed to open those doors, even though it's your house. Because it's really my house, even though I gave it to you to live in."

Pacing and frowning thoughtfully, Lloyd continued: "Even if I go away forever and you live there for 20 years and you know exactly what's behind the doors — heck, even if it's a matter of life and death — plain old common sense still dictates that you're never, ever allowed to open them for any reason."

"It's for your own protection," Lloyd added.

Wesley Doyle, a Java web developer in Toronto, Canada is merely puzzled by the news. "Why do they think they need to do this? Why can't users of my Open Source Java library simply shake their fists and curse my family name with their dying breaths? That approach has been working well for all the rest of us. Who cares if I have a private helper function they need? What, is their copy/paste function broken?"

Wikileaks founder Julian Assange, who coined the term "Opened Source" to describe the jailbroken open-source Java code, fears he may be arrested by campus security at Oracle or possibly IBM. The Wikileaks founder said: "Today the Eclipse Foundation put out a private briefing calling me a 'non-thread-safe AbstractKeywordRemovalInitiatorFactory'. What the fuck does that even mean? I fear for my safety around these nutjobs."

The removal of '@deprecated' annotations is an especially sore issue for many hardworking Java developers. "I worked hard to deprecate that code that I worked hard to create so I could deprecate some other code that I also worked hard on," said Kelly Bolton, the spokesperson for the League Of Java Programmers For Deprecating The Living Shit Out Of Everything.

"If people could keep using the older, more convenient APIs I made for them, then why the fuck would they use my newer, ridiculously complicated ones? It boggles the imagination," Bolton added.

The Eclipse CDT team was especially hard-hit by the removal of deprecation tags. Morris Baldwin, a part-time developer for the CDT's C++ parsing libraries says: "We have a policy of releasing entire Java packages in which every single class, interface and method is deprecated right out of the box, starting at version 1.0."

"We also take careful steps to ensure that it's impossible to use our pre-deprecated code without running our gigantic fugly framework," the 22-year-old Baldwin added. "Adding public constructors and making stuff non-final would be a serious blow to both non-usability and non-reusability."

The Agile Java community has denounced the Wikileaks move as a form of terrorism. "It was probably instigated by those Aspect-Oriented Programming extremists," speculates Agile Java designer Claudia Hewitt, age 29. "I always knew they wanted to use my code in ways I couldn't predict in advance," she added.

Many Java developers have vowed to fight back against the unwelcome opening of their open source. League of Agile Methodology Experts (LAME) spokesperson Billy Blackburn says that work has begun on a new, even more complicated Java build system that will refuse to link in Opened Source Java code. The new build system will be released as soon as several third-party Java library vendors can refactor their code to make certain classes more reusable. Blackburn declined to describe these refactorings, claiming it was "none of y'all's business."

Guy Faulkner, a 51-year-old Python developer in Seattle, was amused by the Wikileaks announcement. "When Python developers release Open Source code, they are saying: Here, I worked hard on this. I hope you like it. Use it however you think best. Some stuff is documented as being subject to change in the future, but we're all adults here so use your best judgment."

Faulkner shook his head sadly. "Whereas Java developers who release Open Source code are saying: Here, I worked hard on this. I hope you like it. But use it exactly how I tell you to use it, because fuck you, it's my code. I'll decide who's the goddamn grown-up around here."

"But why didn't they write that Perl script in Python?" Faulkner asked.


Mobile-friendly: The mobile web optimization guide

Everyone wants to make their sites “mobile friendly” these days — the mobile web market is becoming big business. This article takes you through the different available strategies for making your websites mobile browser compatible, sharing many tips and tricks along the way.

Mobile payments made easy

This is just in: Google seems to be taking steps to allow operator billing. If that’s true it’s huge news.

Note from the outset that the article doesn’t say in so many words that operator billing is coming, although it certainly gives that impression, and plenty of publications translate it as such.

The basic idea of operator billing is very simple: if you want to buy an app, or access to online content, the price is automatically added to your operator bill (or, I assume, deducted from your pre-paid account).

Now I’m not a mobile billing specialist by any means, but I still want to give you an idea of what we’re talking about. If I make any technical mistakes, please correct them in the comments.

The billing process

Just yesterday I made my first Android Market purchase, and although the process was relatively smooth, I still had to fill in all my credit card stuff, make mistakes, be told off, etc. Besides, when I tried to do the same a few months ago, the Android Market rejected my credit card. Why? Probably because the Dutch market wasn’t active yet, but I thought of that only much later. At the moment I was pretty pissed.

Now with operator billing I wouldn’t have all this hassle. I’d just click on whatever I want to buy, give a one-time confirmation “Yes, I do want to spend € 2.39 on this” and be done. When my next operator bill comes around, the € 2.39 will be added to it.

In addition, operators can verify your identity through your SIM card, without you having to do anything. No more hassle with credit card numbers. (In fact, the only parties that have a lot to lose from operator billing are the credit card companies. Expect resistance from them; they’ll probably say it’s unsafe or something.)

Thus operator billing is by far the most user-friendly way of making mobile purchases. That’s what makes it so important. Besides, it also opens up the mobile marketplace to those that do not have a credit card.

A question of identity

However, Google’s rather vague announcement does leave some questions unanswered. No doubt that’s because Google is still figuring out how to answer those questions. But let’s review them anyway:

The last question is probably the most important one. If I want to make a purchase through operator billing, there are three parties involved: me, the selling party, and the operator. Somehow, the selling party has to connect to the operator to figure out exactly who I am, and to make a request to put € 2.39 on my bill for my purchase. In addition, the operator has to pay that money (maybe minus a fee) to the selling party.

The JIL 1.2 API gives us some clues as to how this system is going to work. This API, which will eventually be implemented in Vodafone 360 phones as well as, one hopes, many others, has two properties that are meant specifically for operator billing (p. 16):


Thus, when purchase time comes around, the store has some grip on your identity. It will have to send off a communication to the specified operator stating that the user with unique ID X wants to make a € 2.39 purchase.

The operator will have to make some effort to verify this information; after all I might be able to hack a phone to send false unique IDs. Thus the operator will probably send me an SMS “Are you sure you want to purchase product X for € 2.39?” Once I reply to that SMS the purchase is made and downloading can commence.

Still, I hope that the process will become even more user-friendly. The same JIL specification defines a way to send an SMS from a widget. Thus, if I want to purchase something the system could automatically generate an SMS for me and send it off to the operator. Thus the operator will be able to verify my intent by comparing my unique user ID with the SIM card through which the SMS was sent. If they match the purchase is made and downloading can commence.

That’s one step less, and thus more user-friendly. Hell, if it’s implemented correctly I don’t even have to switch to my SMS application. (The operator still has to tell the store “Purchase made, proceed with download.” But a proper system will not bother me with the details.)
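
To make the flow above concrete, here is a rough sketch of the verification step in Python. Everything in it is hypothetical: the `PurchaseRequest` and `Operator` names, the method names and the idea of keying pending purchases by user ID and product are assumptions for illustration, since the JIL spec does not define the payment workflow. It only shows the check described above: the operator accepts a purchase when the unique user ID in the request matches the SIM through which the confirmation SMS was sent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PurchaseRequest:
    user_id: str       # unique ID the store read from the phone
    product: str
    price_cents: int   # 239 for a purchase of EUR 2.39


class Operator:
    """Hypothetical operator-side bookkeeping, not a real billing API."""

    def __init__(self, sim_owner):
        self.sim_owner = sim_owner  # SIM id -> unique user ID
        self.pending = {}           # (user_id, product) -> request
        self.bill = {}              # user_id -> cents added to the next bill

    def request_purchase(self, request):
        """Store -> operator: user X wants to buy this product."""
        self.pending[(request.user_id, request.product)] = request

    def confirm_by_sms(self, sim_id, user_id, product):
        """Confirmation SMS arrives; charge only if SIM and user ID match."""
        request = self.pending.get((user_id, product))
        if request is None or self.sim_owner.get(sim_id) != user_id:
            return False  # unknown purchase or a spoofed user ID
        del self.pending[(user_id, product)]
        self.bill[user_id] = self.bill.get(user_id, 0) + request.price_cents
        return True  # operator can now tell the store to start the download
```

A confirmation sent from a SIM that does not belong to the claimed user ID is rejected, which is the protection against a hacked phone sending false IDs.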

Unfortunately the JIL 1.2 spec does not yet contain the methods that will be used for actual payments, nor the exact workflow. Besides, it’s unclear which operators Google is talking to right now. Probably US-based ones, and of those only Verizon is part of the JIL consortium. The others might use other APIs. (Come to think of it, so might Verizon. One never knows.)

Future expansion

Let’s close on a positive note and assume that a system roughly similar to what I describe above will actually be in place in two years or so. Apart from the increased user-friendliness of the purchasing process, what will it bring?

The basic answer is Long Tail. Increased user-friendliness and the scrapping of the requirement to own a credit card may entice more consumers to make a mobile purchase. That would be good.

The real benefit will lie with developers, though. In theory, the system could be set up so that individual developers who offer one or two apps for download on their own site can also use it.

Thus the requirement to offer your wares through one or more app stores might also be scrapped. That could be especially important to cross-platform apps such as W3C Widgets. Whichever phone with whichever operator ends up at the developer’s site, they can all make a purchase, provided they support widgets.

One more nail in the app stores’ coffin would be the opportunity to make in-app purchases; say some articles from a news site or a few new levels for your game. Operator billing is explicitly meant for such purchases, too. And if we can use operator billing in our apps, too, the app store infrastructure is basically not necessary any more.

Picture the following:

  1. I write a news reader app as a W3C Widget. Anyone can download it for free from my website.
  2. If you like my app you can share it with your friends. Just send over the widget via Bluetooth. No more complicated user-unfriendly Send-To-Friend systems necessary.
  3. But how do I make money? By selling access to the actual articles. Every article you want to read costs you, say € 0.02. Alternatively, you can buy a day of unlimited access for, I don’t know, € 0.99.
  4. All the billing is done in-app through the operator. My users never have to do anything beyond saying “Yes, I want to buy this article.”

Where’s the app store in this process? Nowhere. We don’t need it any more. Wouldn’t that be something?

(I should note that although sending widgets via Bluetooth is possible nowadays — I’ve done it — the process is not very user-friendly yet. But this functionality is definitely coming; it’s not a pipe dream.)

Waiting for Google

So I’m impatiently waiting for Google to announce more details. Exactly how will their system work? What does the user have to do? Which operators? Questions, questions.

Anyway, the future of mobile payments has come one step closer.

HTTP login POST -> HTTPS = Bad Idea

Alexey Kapranov
HTTP POST -> HTTPS = Bad Idea® « my 20% - http://paulmakowski.wordpress.com/2009...

Live data for SpeedDial on Yandex.Weather and Yandex.Maps

Shared by arty
This is cool, yes.
I wonder: does Opera send any special headers when it requests a page for SpeedDial?

We have good news for Opera users and fans of visual bookmarks. Once added to SpeedDial, Yandex.Weather and Yandex.Maps will now show you up-to-date information about the weather and the traffic jams in your city.

These bookmarks work for users of the Opera browser, version 9.2 and later, and for Firefox users with the Speed Dial plugin. After adding a bookmark, right-click it and set the refresh interval you want.

Yandex's visual bookmark enthusiasts


OpenStack: a cloud built on open code and open standards

Today in the Rackspace newsletter I spotted something extremely interesting: OpenStack http://www.openstack.org, which NASA and Rackspace are promoting together. On top of that, all the software is open and released under the Apache License 2.0.

They say it is all written in Python with Tornado, Twisted and AMQP. The first release is promised for mid-October; in the meantime the code is available on Launchpad: https://launchpad.net/openstack

It looks very interesting.

Originally published at Ivan Begtin. You can comment here or there.

Quotas on the amount of data in localStorage

Most browsers now support localStorage, so it is fairly safe to use it for storing data on the client. But how big is this storage? The specification talks about a "mostly arbitrary limit of five megabytes". But it is not that simple.

Most applications will store characters rather than bytes. In UTF-8, every character outside ASCII takes at least two bytes (a Cyrillic letter takes exactly two). Some implementations use UTF-16, which spends two bytes even on ASCII characters.
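
A quick way to see the difference between characters and bytes is to encode the same strings both ways. The sketch below uses Python's built-in codecs; the helper names and the 5 MB constant are assumptions for illustration, and how a given browser actually encodes stored values is up to that browser.

```python
QUOTA_BYTES = 5 * 1024 * 1024  # the "five megabytes" from the spec, in bytes


def storage_bytes(text, encoding):
    """Number of bytes `text` occupies in the given encoding."""
    return len(text.encode(encoding))


def chars_that_fit(bytes_per_char, quota=QUOTA_BYTES):
    """How many characters of a fixed width fit into the quota."""
    return quota // bytes_per_char


if __name__ == "__main__":
    for sample in ("hello", "привет"):
        print(sample,
              storage_bytes(sample, "utf-8"),
              storage_bytes(sample, "utf-16-le"))  # "-le" avoids the 2-byte BOM
    print(chars_that_fit(2))  # characters in 5 MB at two bytes per character
```

With two bytes per character, a 5 MB byte quota holds about 2.6 million characters, so "5 megabytes" and "5 million characters" are quite different limits.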

Every browser vendor made its own decision. Chrome limits the size of the database to exactly five megabytes. Firefox lets you store about 5 million characters. Internet Explorer allows slightly fewer than 5 million. And only Opera, when the limit is reached, already simply offers the user the option of allocating more space to the application, up to the whole disk!

These limits will most likely change over time. To make it easy to check the current limits at any moment, I made a test of the localStorage data quota.

Test of localStorage limits/quota

The script constructs very long strings and tries to save them to window.localStorage. When that fails, it reports last successful length and current failing length.
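
The probing strategy can be sketched outside the browser as well. The Python below is not the test script itself: `FakeStorage` is a hypothetical in-memory stand-in for window.localStorage with a quota counted in characters, and doubling followed by bisection is one assumed way of finding the last successful and first failing lengths.

```python
class QuotaExceeded(Exception):
    """Raised by the stand-in store when a value does not fit."""


class FakeStorage:
    """Hypothetical stand-in for window.localStorage (quota in characters)."""

    def __init__(self, quota_chars):
        self.quota_chars = quota_chars
        self.items = {}

    def set_item(self, key, value):
        # Space used by all other keys; the probed key is overwritten.
        used = sum(len(k) + len(v) for k, v in self.items.items() if k != key)
        if used + len(key) + len(value) > self.quota_chars:
            raise QuotaExceeded(key)
        self.items[key] = value


def probe_quota(storage, key="probe", start=1024):
    """Double the string length until a write fails."""
    last_ok, length = 0, start
    while True:
        try:
            storage.set_item(key, "x" * length)
        except QuotaExceeded:
            return last_ok, length  # last successful, first failing
        last_ok, length = length, length * 2


def refine(storage, last_ok, failing, key="probe"):
    """Bisect between the two lengths to pin the limit down exactly."""
    while failing - last_ok > 1:
        middle = (last_ok + failing) // 2
        try:
            storage.set_item(key, "x" * middle)
            last_ok = middle
        except QuotaExceeded:
            failing = middle
    return last_ok, failing
```

Usage: `refine(storage, *probe_quota(storage))` returns the largest length that was stored and the smallest one that was rejected.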

Diffable: only download the deltas

Diffable: only download the deltas. JavaScript library for detecting and serving diffs to JavaScript rather than downloading large scripts every time a few lines of code are changed. “Using Diffable has reduced page load times in Google Maps by more than 1200 milliseconds (~25%). Note that this benefit only affects users that have an older version of the script in cache. For Google Maps that’s 20-25% of users.”
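
The idea of shipping only a delta can be illustrated with Python's standard difflib. This is a sketch under assumptions, not Diffable's actual delta format or API: `make_delta` and `apply_delta` are names invented here, and the delta is a plain list of "copy this range of the cached version" and "insert this new text" operations.

```python
import difflib


def make_delta(old, new):
    """Server side: describe `new` in terms of the cached `old` version."""
    ops = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))        # reuse cached text
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))  # only this text is sent
        # "delete" needs no operation: the old range is simply not copied
    return ops


def apply_delta(old, ops):
    """Client side: rebuild the new version from the cache plus the delta."""
    parts = []
    for op in ops:
        if op[0] == "copy":
            parts.append(old[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)
```

When only a few lines of a large script change, the inserted text in the delta is a small fraction of the full file, which is where the saving for users with an older cached version comes from.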

Commit message generator

If the user comes from the same IP, don't ask for a captcha on the first attempt

Every time the cookies on Habr get reset, I have to not only click OK on the login page but also show off my impressive pattern-recognition skills by entering a captcha. It far from always works on the first try.

That got me thinking: why am I asked to enter it in this situation at all? There should be a presumption of innocence: I am not a robot, at least until my behaviour gives reason to suspect otherwise. Give me one attempt to enter the password without a captcha! I won't get it wrong, the browser remembers the password.

Update: Let me tweak the scheme a bit to make life harder for botnets that brute-force logins and passwords in such a way that login attempts on any single account are not too frequent. A single captcha-free login attempt should be allowed only from the computer from which the previous login was made.
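
The updated scheme can be sketched as a small policy object. This is only an illustration of the proposal, not how Habr works: the `LoginPolicy` class and its methods are hypothetical, and the client is identified by IP address as in the title, which is itself an assumption about what "the same computer" means.

```python
class LoginPolicy:
    """One captcha-free attempt, only from the address of the previous login."""

    def __init__(self):
        self.last_ip = {}       # login -> IP of the previous successful login
        self.free_used = set()  # (login, ip) pairs that spent the free attempt

    def needs_captcha(self, login, ip):
        if self.last_ip.get(login) != ip:
            return True  # unknown address: presumption of innocence is off
        return (login, ip) in self.free_used

    def record_attempt(self, login, ip, success):
        if success:
            self.last_ip[login] = ip
            self.free_used.discard((login, ip))  # a good login restores the free try
        else:
            self.free_used.add((login, ip))      # the free try is now spent
```

A botnet spreading its guesses over many addresses never gets a captcha-free attempt, because none of its addresses is the one the account last logged in from.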

At readers' request and by my own judgement, I am moving this to the "Habr user support" blog. If there is a more suitable place, please let me know.