Today we're beginning to support the hCard microformat, designed for marking up contact data. This data is used to populate our directory of organizations and is subsequently displayed on Yandex.Maps and on the search results page.
OpenSSL 1.0.0 release, http://www.openssl.org/news/ 29 Mar 2010 (started 23 Dec 1998). 11+ years in the making
pwnat - NAT to NAT client-server communication - http://samy.pl/pwnat/
:-D RT @johlrogge: RT @technomancy Q: What's the difference between Ant and Maven? A: The creator of Ant has apologized.
Microsoft has unfortunately decided to close off development to native applications. Because of this, we won’t be able to provide Firefox for Windows Phone 7 at this time.
Shared by arty
a good introduction to the topic
Video on the Web—Dive Into HTML5. Everything a web developer needs to know about video containers, video codecs, audio containers, audio codecs, h.264, theora, vorbis, licensing, encoding, batch encoding and the html5 video element.
After the <device> element (see What's Next in HTML, episode 1), Ian sketched out an interface in the WHATWG HTML draft that builds on it, which looks quite interesting. Peer-to-peer connections (URL bound to change):
[NoInterfaceObject]
interface AbstractPeer {
  void sendText(in DOMString text);
  attribute Function ontext; // receiving

  void sendBitmap(in HTMLImageElement image);
  attribute Function onbitmap; // receiving

  void sendFile(in File file);
  attribute Function onfile; // receiving

  attribute Stream localStream; // video/audio to send
  readonly attribute Stream remoteStream; // video/audio from remote peer
  attribute Function onstreamchange; // when the remote peer changes whether the video is being sent or not

  attribute Function onconnect;
  attribute Function onerror;
  attribute Function ondisconnect;
};

[Constructor(in DOMString serverConfiguration)]
interface PeerToPeerServer : AbstractPeer {
  void getClientConfiguration(in PeerToPeerConfigurationCallback callback);
  void close(); // disconnects and stops listening
};

[Constructor]
interface PeerToPeerClient : AbstractPeer {
  void addConfiguration(in DOMString configuration);
  void close(); // disconnects
};

[Callback=FunctionOnly, NoInterfaceObject]
interface PeerToPeerConfigurationCallback {
  void handleEvent(in PeerToPeerServer server, in DOMString configuration);
};
You will still need some kind of intermediary (i.e. a server in almost all practical scenarios) to exchange the address, but after that things can get pretty interesting I think. I was hoping people would be willing to share their thoughts on the interface sketch above and the general idea of having access to peer-to-peer connections from Web pages and the Web platform in general.
Originally posted by Molly:
For some background, look up CSS Media Types: "all - Suitable for all devices." Opera on your desktop, your phone, in your truck, on your Wii and coming soon: television. We are the media="all" of the industry.
S/G | D/U | D/G |
---|---|---|
Sparse/Grounded | Dense/Ungrounded | Dense/Grounded |
Scala code that uses only constructs available in Java. The code is tied to relational algebra through the names of variables, functions, types, etc. | Compact Scala code that uses the higher-level constructs available in Scala. The code has no explicit ties to relational algebra. | Compact Scala code that is tied to relational algebra |
Fig. 3. For each programming style, the normalized time spent reading the algorithm's code. | Fig. 4. For each programming style, the mean normalized time spent examining a single lexical token in the program, regardless of its nature. |
Fig. 9. A screenshot of code as shown to subjects during the experiment, with a map of attention distribution. The algorithm on the left is S/G, read by subject 7; the algorithm on the right is D/U, read by subject 12. Note that the fixation on identifier names observed for S/G-style code is absent in the D/U-style code.
The experimental results show that Scala code written using advanced, abstract constructs is understood better than code written in a Java-like style. The difference in subjects' comprehension time is statistically significant despite the small sample size. As for the degree of comprehension achieved, informal remarks made by the subjects give subjective confirmation: using advanced constructs simplifies the task of understanding code. Interestingly, the benefits of using Scala were visible even in a group of programmers with a limited grasp of general Scala concepts.
[…]
An unexpected observation was that the time to understand the meaning of a token did not differ, despite the tokens' differing cognitive content. (Emphasis mine — lionet.) If this property generalizes, it would give language designers a precise target: the shorter the code, the better. It may also explain why domain-specific languages (DSLs) are so effective.
The biggest "brake" was judged to be fb.me, the Facebook social network's service: it needs a whole 2 seconds to redirect to a page.
Shared by arty
in short: you used to be able to learn JavaScript from examples on the real web, but now HTML is a bytecode container, like SWF, and may lose to them
If HTML is just another bytecode container and rendering runtime, we’ll have lost part of what made the web special, and I’m afraid HTML will lose to other formats by willingly giving up its differentiators and playing on their turf.
The document enumerates the set of specifications that constitute the Web Applications WG's Widgets Family of Specifications.
Shared by arty
oh, now Yandex too
Today we launched a new service for web developers: hosting of popular JavaScript libraries on Yandex servers.
By loading libraries from Yandex's CDN, you get the following advantages:
We will host fresh stable versions of libraries immediately after their release; old versions will be kept indefinitely.
Project news will be published in our club, where you can also ask questions and leave feedback.
the html5 spec has a convenient classList interface for working with classes (and other similar strings of space-separated words). Naturally, it builds on what JavaScript libraries long ago made convenient and familiar, so writing an adapter is an easy five-minute diversion:
Element.addMethods({
    getClassList: function(element) {
        element = $(element);
        // cache the adapter on the element; each method delegates to the
        // corresponding Prototype *ClassName method
        return element.classList || (element.classList = {
            has: attach('has'),
            add: attach('add'),
            remove: attach('remove'),
            toggle: attach('toggle')
        });

        // function declarations are hoisted, so attach is usable above
        function attach(name) {
            return element[name + 'ClassName'].bind(element);
        }
    }
});
and if you're really burning to get as close to the spec as possible, even in the part nobody ever uses, you can add a couple more methods:
item: function(index){ return element.classNames()[index]; },
length: function(){ return element.classNames().length; },
on the whole, of course, this is very similar to the adapter for the no less convenient dataset, which I made a couple of years ago.
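The space-separated-word logic the spec describes can also be sketched without any DOM at all. The following toy stand-in (TokenList is an illustrative name, not the real DOMTokenList) shows the semantics the adapter maps onto:

```javascript
// A minimal DOMTokenList-style wrapper over a space-separated string.
// Illustrative sketch only; the real classList lives on DOM elements.
function TokenList(str) {
    this.tokens = str.split(/\s+/).filter(Boolean);
}
TokenList.prototype.contains = function (t) {
    return this.tokens.indexOf(t) !== -1;
};
TokenList.prototype.add = function (t) {
    if (!this.contains(t)) this.tokens.push(t);
};
TokenList.prototype.remove = function (t) {
    this.tokens = this.tokens.filter(function (x) { return x !== t; });
};
TokenList.prototype.toggle = function (t) {
    this.contains(t) ? this.remove(t) : this.add(t);
};
TokenList.prototype.toString = function () {
    return this.tokens.join(' ');
};
```

For example, `new TokenList('a  b')` normalizes repeated whitespace, and `toggle` adds or removes depending on presence.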
Shared by arty
one thing you can't take away from Crockford is his talent for entertainingly criticizing existing technologies and looking at their history with detachment : ) Though, of course, not everything should be taken on blind faith. If you don't like video, here's the transcript: http://developer.yahoo.com/yui/theater/video.php?v=crockonjs-4
Last week, Yahoo! JavaScript architect Douglas Crockford delivered the fourth installment of his Crockford on JavaScript series:
In this session, Douglas tackles the DOM. On the one hand there was JavaScript, he says, and JavaScript is “what made the browser work.”
On the other hand, there was the Document Object Model, also known affectionately as the DOM. It is what most people hate when they say they hate JavaScript. Most of the people who say they hate JavaScript don’t know JavaScript, might have never seen JavaScript, but they’ve felt the DOM alright. If you don’t know what the difference is and you say, “JavaScript is the stupidest thing I’ve ever seen,” you’re not talking about JavaScript, you’re talking about the DOM. The DOM is the browser’s API. It is the interface. It provides JavaScript for manipulating documents.
The DOM may be imperfect, but it’s nonetheless crucial to what frontend engineers do when they write web applications. In this talk, Douglas provides an overview, situated historically, of where the DOM came from, how it achieved ascendance with Ajax, and what the future might hold. In Douglas’s inimitable fashion, this history starts with Sir John Harrington and takes us up to the present day. A few choice words for CSS are among the many applause lines for veteran developers:
I find within the community of people who use CSS great affection for it. They’re totally invested in CSS, they love it. They can’t imagine any other way of doing formatting in a document. It’s it. It’s sort of like watching an episode of Cops where the cops come in and break up the family dispute, and there’s this “CSS ain’t bad, you just don’t understand it like I do. I know it hurts me, but I make mistakes, I’m wrong.” CSS is awful, and it amazes me the way people get invested in it. It’s like once you figure it out, kind of go “oh, OK, I see how I might be able to make it work,” then you flip from hating it to loving it, and despising anybody who hasn’t gone through what you’ve gone through. It doesn’t make sense to me.
If the video embed below doesn’t show up correctly in your RSS reader of choice, be sure to click through to watch the high-resolution version of the video on YUI Theater.
flashblockdetector. Mark Pilgrim’s JavaScript library for detecting if the user has a Flash blocker enabled, such as FlashBlock for Firefox and Chrome or ClickToFlash for Safari. One good use of this would be to inform users that they need to opt-in to Flash for unobtrusive Flash enhancements (such as invisible audio players) to work on that page.
Facebook Adds Code for Clickjacking Prevention. Clever technique: Facebook pages check to see if they are being framed (using window.top) and, if they are, add a div covering the whole page which causes a top level reload should anything be clicked on. They also log framing attempts using an image bug.
Speed Tracer is a tool to help you identify and fix performance problems in your web applications. It visualizes metrics that are taken from low level instrumentation points inside of the browser and analyzes them as your application runs. Speed Tracer is available as a Chrome extension and works on all platforms where extensions are currently supported (Windows and Linux).
Using Speed Tracer you are able to get a better picture of where time is being spent in your application. This includes problems caused by:
Shared by arty
Code Bubbles: an interesting idea for a fundamentally new IDE design. Individual bubbles for methods/classes/data that can easily be grouped and separated, all living on one huge virtual working surface. The link has an 8-minute video that explains everything.
very cool!
Shared by arty
Directed Identity is, naturally, supported: https://i.mydocomo.com/
NTT docomo is now an OpenID Provider | OpenID - http://openid.net/2010...
«Why I switched to Pylons after using Django for six months» (reddit) - http://www.reddit.com/r...
Right now nobody’s interested in a mobile solution that does not contain the words “iPhone” and “app” and that is not submitted to a closed environment where it competes with approximately 2,437 similar mobile solutions.
Compared to the current crop of mobile clients and developers, lemmings marching off a cliff follow a solid, sensible strategy. Startling them out of this obsession requires nothing short of a new buzzword.
Therefore I’d like to re-brand standards-based mobile websites and applications, definitely including W3C Widgets, as “HTML5 apps.” People outside our little technical circle are already aware of the existence of HTML5, and I don’t think it needs much of an effort to elevate it to full buzzwordiness.
Technically, HTML5 apps would encompass all websites as well as all the myriads of (usually locally installed) web-standards-based application systems on mobile. The guiding principle would be to write and maintain one single core application that uses web standards, as well as a mechanism that deploys that core application across a wide range of platforms.
In the two days since the update was released to European Windows users, the number of Opera browser downloads has grown substantially.
Some top-end Philips TVs have a neat feature called Ambilight. Essentially, it's LED backlighting behind the TV that changes color depending on the colors in the picture. Watching a movie on such a TV is a real pleasure.
There are already Flash implementations of this kind of backlight, so why should we front-end developers be any worse? To find out once again what modern browsers are capable of, another experiment was born:
Ambilight for the <video> tag (Firefox 3.5, Opera 10.5, Safari 4, Google Chrome 4)
Below, let's look at how it was done.
Before writing anything, we need to work out the algorithm our backlight will follow.
The real backlight in the TV works roughly like this: on the back panel there's a row of bright LEDs glowing in different colors, with each LED's color roughly matching the color of the image region it sits behind. When the picture changes, the LED smoothly shifts to the new color.
From this description, we need to do the following: determine each LED's color for the current frame and render its glow. Let's get started.
For convenience, let's assume our "TV" has just 5 LEDs per side. Accordingly, we need to take a strip of the frame, divide it into regions (one per LED), and find the average color in each region; these will be the backlight colors:
To get an image of the current video frame, it's enough to draw it into a <canvas> via the drawImage() method:
var canvas = document.createElement('canvas'),
    video = document.getElementsByTagName('video')[0],
    ctx = canvas.getContext('2d');

// be sure to set the canvas size
canvas.width = video.width;
canvas.height = video.height;

// draw the frame
ctx.drawImage(video, 0, 0, video.width, video.height);
We've got the current frame; now we need to find out what color the pixels along the side of the image are. For that, we'll use the getImageData() method:
/** Width of the region we will analyze */
var block_width = 50;
var pixels = ctx.getImageData(0, 0, block_width, canvas.height);
The pixels object has a data property containing the colors of all the pixels. They're stored in a slightly unusual format: a flat array of the RGBA components of every pixel. For example, to get the color and opacity of the first pixel, take the first 4 elements of the data array; for the second pixel, the next 4; and so on:
var pixel1 = {
    r: pixels.data[0],
    g: pixels.data[1],
    b: pixels.data[2],
    a: pixels.data[3]
};
var pixel2 = {
    r: pixels.data[4],
    g: pixels.data[5],
    b: pixels.data[6],
    a: pixels.data[7]
};
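For illustration, this indexing rule can be wrapped in a tiny helper that works on any flat RGBA array (getPixel is my name, not part of the article's code):

```javascript
// Read pixel n from a flat RGBA array: each pixel occupies
// 4 consecutive slots (red, green, blue, alpha).
function getPixel(data, n) {
    var offset = n * 4;
    return {
        r: data[offset],
        g: data[offset + 1],
        b: data[offset + 2],
        a: data[offset + 3]
    };
}

// two pixels: opaque red, then half-transparent dark green
var p = getPixel([255, 0, 0, 255, 0, 128, 0, 64], 1);
// p is { r: 0, g: 128, b: 0, a: 64 }
```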
We need to split all these pixels into 5 groups (the number of LEDs we chose earlier) and analyze each group in turn:
function getMidColors() {
    var width = canvas.width,
        height = canvas.height,
        lamps = 5, // number of LEDs
        block_width = 50, // width of the analyzed region
        block_height = Math.ceil(height / lamps), // height of one analyzed block
        pxl = block_width * block_height * 4, // how many RGBA components one region holds
        result = [],
        img_data = ctx.getImageData(0, 0, block_width, height),
        total = img_data.data.length;

    for (var i = 0; i < lamps; i++) {
        result.push(
            calcMidColor(img_data.data, i * pxl, Math.min((i + 1) * pxl, total))
        );
    }

    return result;
}
In this function we simply walk over the blocks being analyzed and compute each one's average color with the calcMidColor() function. We don't need any clever formulas weighting the intensities of colors across the region; the arithmetic mean of each color component is enough:
function calcMidColor(data, from, to) {
    var result = [0, 0, 0];
    var total_pixels = (to - from) / 4;

    for (var i = from; i < to; i += 4) {
        result[0] += data[i];
        result[1] += data[i + 1];
        result[2] += data[i + 2];
    }

    result[0] = Math.round(result[0] / total_pixels);
    result[1] = Math.round(result[1] / total_pixels);
    result[2] = Math.round(result[2] / total_pixels);

    return result;
}
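As a sanity check, the same averaging logic can be run on a synthetic two-pixel buffer, with no canvas involved; pure red and pure blue average to purple:

```javascript
// Same per-component arithmetic mean as in the article, restated
// standalone so it can run on a plain array.
function calcMidColor(data, from, to) {
    var result = [0, 0, 0];
    var totalPixels = (to - from) / 4;
    for (var i = from; i < to; i += 4) {
        result[0] += data[i];
        result[1] += data[i + 1];
        result[2] += data[i + 2];
    }
    return result.map(function (c) { return Math.round(c / totalPixels); });
}

// two RGBA pixels: red and blue
var buffer = [255, 0, 0, 255, 0, 0, 255, 255];
calcMidColor(buffer, 0, buffer.length); // [128, 0, 128]
```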
So, we've obtained the LED colors, but they're too dim: real LEDs shine very brightly to reach a sufficient level of glow. We need to raise the brightness, and also the saturation to give the glow some depth. The HSV color model (hue, saturation, value) is very convenient for this: just multiply the last two components by some coefficient. But our colors are stored as RGB, so we first convert to HSV, boost brightness and saturation, and then convert back to RGB (the RGB→HSV and reverse formulas are easy to find online):
function adjustColor(color) {
    color = rgb2hsv(color);
    color[1] = Math.min(100, color[1] * 1.4); // saturation
    color[2] = Math.min(100, color[2] * 2.7); // value (brightness)
    return hsv2rgb(color);
}
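The article leaves the conversion functions out; one standard sketch (hue in degrees, saturation and value as percentages, matching the cap of 100 used above) could be:

```javascript
// RGB (0-255 per channel) -> HSV (h: 0-360, s and v: 0-100)
function rgb2hsv(rgb) {
    var r = rgb[0] / 255, g = rgb[1] / 255, b = rgb[2] / 255;
    var max = Math.max(r, g, b), min = Math.min(r, g, b), d = max - min;
    var h = 0;
    if (d !== 0) {
        if (max === r)      h = 60 * (((g - b) / d) % 6);
        else if (max === g) h = 60 * ((b - r) / d + 2);
        else                h = 60 * ((r - g) / d + 4);
        if (h < 0) h += 360;
    }
    var s = max === 0 ? 0 : (d / max) * 100;
    return [h, s, max * 100];
}

// HSV (h: 0-360, s and v: 0-100) -> RGB (0-255 per channel)
function hsv2rgb(hsv) {
    var h = hsv[0], s = hsv[1] / 100, v = hsv[2] / 100;
    var c = v * s;                                    // chroma
    var x = c * (1 - Math.abs((h / 60) % 2 - 1));     // second-largest component
    var m = v - c;
    var r, g, b;
    if (h < 60)       { r = c; g = x; b = 0; }
    else if (h < 120) { r = x; g = c; b = 0; }
    else if (h < 180) { r = 0; g = c; b = x; }
    else if (h < 240) { r = 0; g = x; b = c; }
    else if (h < 300) { r = x; g = 0; b = c; }
    else              { r = c; g = 0; b = x; }
    return [
        Math.round((r + m) * 255),
        Math.round((g + m) * 255),
        Math.round((b + m) * 255)
    ];
}
```

With these in place, adjustColor brightens a dark gray such as [50, 50, 50] into a noticeably lighter [135, 135, 135] (value multiplied by 2.7).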
LEDs are omnidirectional light sources. Radial gradients would depict them best, one gradient per LED. But to achieve a good visual result that way, you'd have to do a lot of tricky calculations: LED position, glow diameter and falloff, blending of neighboring colors, and so on. So we'll cheat a little: draw an ordinary linear gradient and lay a special mask over it that creates the impression of a believable glow.
Drawing the gradient is simple: first create it with createLinearGradient(), then add colors via addColorStop() and paint it:
// create a separate canvas for the glow
var light_canvas = document.createElement('canvas'),
    light_ctx = light_canvas.getContext('2d');

light_canvas.width = 200;
light_canvas.height = 200;

var midcolors = getMidColors(), // get the averaged colors
    grd = ctx.createLinearGradient(0, 0, 0, canvas.height); // the gradient

for (var i = 0, il = midcolors.length; i < il; i++) {
    grd.addColorStop(i / il, 'rgb(' + adjustColor(midcolors[i]).join(',') + ')');
}

// paint the gradient
light_ctx.fillStyle = grd;
light_ctx.fillRect(0, 0, light_canvas.width, light_canvas.height);
We get something like this:
We'll draw the mask in Photoshop. There's a wonderful Lighting Effects filter (Filter→Render→Lighting Effects…) that lets you create light sources. Fill a layer with white and run the filter with roughly these settings:
We get a light spot like this:
Change the blend mode to Lighten, duplicate, rotate, rescale, play with opacity, adjust levels, and we end up with this result:
Since the image is black and white, it's very easy to turn it into a mask where white becomes transparent. And if we lay this mask over the gradient, we get quite a pretty glow:
Best of all, we can easily change the look and intensity of the glow without resorting to programming.
The glow for the left side is done; what remains is to do the same for the right side, add smooth transitions between backlight states, and write a controller that refreshes the backlight at a set interval. Describing all that would be long and tedious; it's easier to look at the source.
UPD: as the experiment showed, HD video (the clip was originally 1280×544) doesn't work well for everyone; lowering the resolution to 592×256 solved the problem.
Some People Can’t Read URLs. Commentary on the recent “facebook login” incident from Jono at Mozilla Labs. I’d guess that most people can’t read URLs, and it worries me more than any other aspect of today’s web. If you want to stay safe from phishing and other forms of online fraud you need at least a basic understanding of a bewildering array of technologies—URLs, paths, domains, subdomains, ports, DNS, SSL as well as fundamental concepts like browsers, web sites and web servers. Misunderstand any of those concepts and you’ll be an easy target for even the most basic phishing attempts. It almost makes me uncomfortable encouraging regular people to use the web because I know they’ll be at massive risk to online fraud.
Internet Explorer: Global Variables, and Stack Overflows. An extremely subtle IE bug—if your recursive JavaScript function is attached directly to the window (global) object, IE won’t let you call it recursively more than 12 times.
Online photo editor Picnik has been acquired by Google, as the Picnik blog announces. The Picnik team is excited, writing that “It means we can think BIG. Google processes petabytes of data every day, and with their worldwide infrastructure and world-class team, it is truly the best home we could have found.” TechCrunch comments that “Interestingly, Picnik is Flickr’s default photo editor”... Flickr being a competitor to Google’s Picasa Web Albums.
A built-in image editor would make some sense in a whole lot of Google tools. Blogger, for instance, or Picasa Web Albums, or Google Presentations (beyond just vector-based editing), even Google image search (for, say, a quick contrast increasing of a pic you’ve found). A stand-alone photo editing app could be interesting too; for one thing, you can’t just install Photoshop on Google Chrome OS. Not sure if we’ll see the existing Picnik app itself surface in Google world, but it seems at least the skill set of the Picnik team could come in handy for Google if they plan any of these efforts.
[Thanks RiyAndroid!]
[By Philipp Lenssen | Origin: Google Acquires Photo Editor Picnik | Comments]
Over the last few years, I've occasionally commented on JavaScript's RegExp API, syntax, and behavior on the ES-Discuss mailing list. Recently, JavaScript inventor Brendan Eich suggested that, in order to get more discussion going, I write up a list of regex changes to consider for future ECMAScript standards (or as he humorously put it, have my "95 [regex] theses nailed to the ES3 cathedral door"). I figured I'd give it a shot, but I'm going to split my response into a few parts. In this post, I'll be discussing issues with the current RegExp API and behavior. I'll be leaving aside new features that I'd like to see added, and merely suggesting ways to make existing capabilities better. I'll discuss possible new features in a follow-up post.
For a language as widely used as JavaScript, any realistic change proposal must strongly consider backward compatibility. For this reason, some of the following proposals might not be particularly realistic, but nevertheless I think that a) it's worthwhile to consider what might change if backward compatibility wasn't a concern, and b) in the long run, all of these changes would improve the ease of use and predictability of how regular expressions work in JavaScript.
Actual proposal: Deprecate RegExp.prototype.lastIndex and add a "pos" argument to the RegExp.prototype.exec/test methods
JavaScript's lastIndex property serves too many purposes at once:

* Specifying where a search should start. This is arguably not lastIndex's intended purpose, but it's nevertheless an important use, since there's no alternative feature that allows this. lastIndex is not very good at this task, though. You need to compile your regex with the /g flag to get lastIndex to be used this way; and even then, it only specifies the starting position for the regexp.exec/test methods. It cannot be used to set the start position for the string.match/replace/search/split methods.
* Indicating where the last match ended. Here lastIndex serves as a convenient and commonly used complement to the index property on match arrays returned by exec. As always, using lastIndex like this works only for regexes compiled with /g.
* The fact that lastIndex is actually set to the end position of the last match rather than the position where the next search should start (unlike its equivalents in practically all other programming languages) causes a problem after zero-length matches, which are easily possible with regexes like /\w*/g or /^/mg. Hence, you're forced to manually increment lastIndex in such cases. I've posted about this issue in more detail before (see: An IE lastIndex Bug with Zero-Length Regex Matches), as has Jan Goyvaerts (Watch Out for Zero-Length Matches).

Unfortunately, lastIndex's versatility results in it not working ideally for any specific use. I think lastIndex is misplaced anyway; if you need to store a search's ending (or next-start) position, it should be a property of the target string and not the regular expression, and that arrangement would work better for several reasons.
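The zero-length-match quirk described above is easy to reproduce in any current engine:

```javascript
// lastIndex is set to the END of the last match; after a zero-length
// match it therefore doesn't advance, and naive exec loops get stuck.
var regex = /\w*/g;
var first = regex.exec('ab');  // matches 'ab'; regex.lastIndex becomes 2
var second = regex.exec('ab'); // matches '' at index 2; lastIndex stays 2
// without a manual regex.lastIndex++ here, the next exec call would
// keep returning the same empty match forever
```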
In fact, Perl uses this approach of storing next-search positions with strings to great effect, and adds various features around it.
So that's my case for lastIndex being misplaced, but I go one further in that I don't think lastIndex should be included in JavaScript at all. Perl's tactic works well for Perl (especially when considered as a complete package), but some other languages (including Python) let you provide a search-start position as an argument when calling regex methods, which I think is an approach that is more natural and easier for developers to understand and use. I'd therefore fix lastIndex by getting rid of it completely. Regex methods and regex-using string methods would use internal search position trackers that are not observable by the user, and the exec and test methods would get a second argument (called pos, for position) that specifies where to start their search. It might be convenient to also give the String methods search, match, replace, and split their own pos arguments, but that is not as important, and the functionality it would provide is not currently possible via lastIndex anyway.
Following are examples of how some common uses of lastIndex could be rewritten if these changes were made.

Start search from position 5, using lastIndex (the status quo):
var regexGlobal = /\w+/g,
    result;
regexGlobal.lastIndex = 5;
result = regexGlobal.test(str);
// must reset lastIndex or future tests will continue from the match-end position (defensive coding)
regexGlobal.lastIndex = 0;

var regexNonglobal = /\w+/;
regexNonglobal.lastIndex = 5;
// no go - lastIndex will be ignored. instead, you have to do this
result = regexNonglobal.test(str.slice(5));
Start search from position 5, using pos:
var regex = /\w+/, // flag /g doesn't matter
result = regex.test(str, 5);
Iteration, using lastIndex:
var regex = /\w*/g,
    matches = [],
    match;
// the /g flag is required for this regex. if your code was provided a non-
// global regex, you'd need to recompile it with /g, and if it already had /g,
// you'd need to reset its lastIndex to 0 before entering the loop
while (match = regex.exec(str)) {
    matches.push(match);
    // avoid an infinite loop on zero-length matches
    if (regex.lastIndex == match.index) {
        regex.lastIndex++;
    }
}
Iteration, using pos:
var regex = /\w*/, // flag /g doesn't matter
pos = 0,
matches = [],
match;
while (match = regex.exec(str, pos)) {
matches.push(match);
pos = match.index + (match[0].length || 1);
}
Of course, you could easily add your own sugar to further simplify match iteration, or JavaScript could add a method dedicated to this purpose similar to Ruby's scan (although JavaScript already sort of has this via the use of replacement functions with String.prototype.replace).
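Such sugar is easy to sketch today on top of exec and lastIndex (scan here is an illustrative name borrowed from Ruby, not an existing API):

```javascript
// Collect every match of a pattern in a string, Ruby-scan style.
function scan(str, regex) {
    // work on a /g copy so the caller's regex (and its lastIndex) is untouched
    var re = new RegExp(
        regex.source,
        'g' + (regex.ignoreCase ? 'i' : '') + (regex.multiline ? 'm' : '')
    );
    var matches = [], match;
    while (match = re.exec(str)) {
        matches.push(match[0]);
        // bump past zero-length matches to avoid an infinite loop
        if (re.lastIndex === match.index) {
            re.lastIndex++;
        }
    }
    return matches;
}

scan('a bb ccc', /\w+/); // ['a', 'bb', 'ccc']
```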
To reiterate, I'm describing what I would do if backward compatibility were irrelevant. I don't think it would be a good idea to add a pos argument to the exec and test methods unless the lastIndex property was deprecated or removed, due to the functionality overlap. If a pos argument existed, people would expect pos to be 0 when it's not specified. Having lastIndex around to sometimes screw up this expectation would be confusing and would probably lead to latent bugs. Hence, if lastIndex was deprecated in favor of pos, it should be a means toward the end of removing lastIndex altogether.
Actual proposal: Deprecate String.prototype.match and add a new matchAll method
String.prototype.match currently works very differently depending on whether the /g (global) flag has been set on the regex provided as the first argument:

* With /g: if no matches are found, null is returned; otherwise an array of simple matches is returned.
* Without /g: the match method operates as an alias of regexp.exec. If a match is not found, null is returned; otherwise you get an array containing the (single) match in key zero, with any backreferences stored in the array's subsequent keys. The array is also assigned special index and input properties.

The match method's non-global mode is confusing and unnecessary. The reason it's unnecessary is obvious: if you want the functionality of exec, just use it (no need for an alias). It's confusing because, as described above, the match method's two modes return very different results. The difference is not merely whether you get one match or all matches; you get a completely different kind of result. And since the result is an array in either case, you have to know the status of the regex's global property to know which type of array you're dealing with.
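The two modes side by side:

```javascript
var str = 'cat bat';

// with /g: an array of the matched strings only
var all = str.match(/[cb]at/g);   // ['cat', 'bat']

// without /g: an exec-style array -- full match, backreferences,
// plus the special index and input properties
var one = str.match(/([cb])at/);  // ['cat', 'c'], one.index === 0, one.input === 'cat bat'
```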
I'd change String.prototype.match by making it always return an array containing all matches in the target string. I'd also make it return an empty array, rather than null, when no matches are found (an idea that comes from Dean Edwards's base2 library). If you want the first match only, or you need backreferences and extra match details, that's what regexp.exec is for.
Unfortunately, if you want to consider this change as a realistic proposal, it would require some kind of language version or mode-based switching of the match method's behavior (unlikely to happen, I would think). So, instead of that, I'd recommend deprecating the match method altogether in favor of a new method (perhaps RegExp.prototype.matchAll) with the changes prescribed above.
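A rough approximation of the proposed semantics can be written in today's JavaScript (matchAll here is the hypothetical method from the proposal, sketched as a plain function):

```javascript
// All matches as an array of strings; an empty array -- never null --
// when nothing matches. Accepts a regex with or without /g.
function matchAll(str, regex) {
    var flags = 'g' +
        (regex.ignoreCase ? 'i' : '') +
        (regex.multiline ? 'm' : '');
    return str.match(new RegExp(regex.source, flags)) || [];
}

matchAll('1 and 22', /\d+/);       // ['1', '22']
matchAll('no digits here', /\d+/); // []
```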
Actual proposal: Deprecate /g and RegExp.prototype.global, and add a boolean replaceAll argument to String.prototype.replace
If the last two proposals were implemented, and therefore regexp.lastIndex and string.match were things of the past (or string.match no longer sometimes served as an alias of regexp.exec), the only method where /g would still have any impact is string.replace. Additionally, although /g follows prior art from Perl, etc., it doesn't really make sense to have something that is not an attribute of a regex stored as a regex flag. Really, /g is more of a statement about how you want methods to apply their own functionality, and it's not uncommon to want to use the same pattern with and without /g (currently you'd have to construct two different regexes to do so). If it were up to me, I'd get rid of the /g flag and its corresponding global property, and instead simply give the string.replace method an additional argument that specifies whether you want to replace the first match only (the default handling) or all matches. This would have the additional benefit of allowing replace-all functionality with nonregex searches.
Note that SpiderMonkey already has a proprietary third argument ("flags") for string.replace that this proposal would conflict with. I doubt this conflict would cause much heartburn, but in any case, a new replaceAll argument would provide the same functionality that SpiderMonkey's flags argument is most useful for (that is, allowing global replacements with nonregex searches).
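The proposed argument can be approximated today with a wrapper (replaceWith is an illustrative name; the replaceAll flag mirrors the proposal, including replace-all for nonregex searches):

```javascript
// replace() with an explicit replaceAll flag instead of the /g flag.
function replaceWith(str, search, replacement, replaceAll) {
    if (search instanceof RegExp) {
        // rebuild the regex with or without /g as requested
        var flags = (replaceAll ? 'g' : '') +
            (search.ignoreCase ? 'i' : '') +
            (search.multiline ? 'm' : '');
        return str.replace(new RegExp(search.source, flags), replacement);
    }
    if (!replaceAll) {
        return str.replace(search, replacement); // first occurrence only
    }
    // replace-all for plain-string searches, which /g can't express
    return str.split(search).join(replacement);
}

replaceWith('a.b.c', '.', '-', true);  // 'a-b-c'
replaceWith('a.b.c', '.', '-', false); // 'a-b.c'
```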
Actual proposal: Make backreferences to nonparticipating groups fail to match
I'll keep this brief since David "liorean" Andersson and I have previously argued for this on ES-Discuss and elsewhere. David posted about this in detail on his blog (see: ECMAScript 3 Regular Expressions: A specification that doesn't make sense), and I've previously touched on it here (ECMAScript 3 Regular Expressions are Defective by Design). On several occasions, Brendan Eich has also stated that he'd like to see this changed. The short explanation of this behavior is that, in JavaScript, backreferences to capturing groups that have not (yet) participated in a match always succeed (i.e., they match the empty string), whereas the opposite is true in all other regex flavors: they fail to match and therefore cause the regex engine to backtrack or fail. JavaScript's behavior means that /(a|(b))\2c/.test("ac") returns true. The negative implications of this behavior reach quite far when pushing the boundaries of JavaScript regular expressions.
I think everyone agrees that changing to the traditional backreferencing behavior would be an improvement—it provides far more intuitive handling, compatibility with other regex flavors, and great potential for creative use. The bigger question is whether it would be safe, in light of backward compatibility. I think it would be, since I imagine that more or less no one uses the nonintuitive JavaScript behavior intentionally. The JavaScript behavior amounts to automatically adding a ?
quantifier after backreferences to nonparticipating groups, which is what people already do explicitly if they actually want backreferences to nonzero-length subpatterns to be optional. Also note that Safari 3 and earlier did not follow the spec on this point and used the more intuitive behavior, although that has changed in more recent versions (notably, this change was due to a write-up on my blog rather than reports of real-world errors).
Finally, it's probably worth noting that .NET's ECMAScript regex mode (enabled via the RegexOptions.ECMAScript
flag) indeed switches .NET to ECMAScript's unconventional backreferencing behavior.
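To make the difference concrete, here is the spec-mandated JavaScript behavior next to the portable way of requesting that same behavior explicitly:

```javascript
// Group 2 never participates when the "a" alternative matches, yet the
// backreference \2 still succeeds by matching the empty string:
var surprising = /(a|(b))\2c/.test("ac"); // true in JavaScript

// In Perl, PCRE, Java, .NET, etc., that same pattern fails on "ac".
// Writing the backreference as explicitly optional gives JavaScript's
// behavior in every flavor:
var portable = /(a|(b))\2?c/.test("ac");  // true everywhere
```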
Actual proposal: Add a /u flag (and corresponding RegExp.prototype.unicode property) that changes the meaning of \d, \w, \b, and related tokens
Unicode-aware digit and word character matching is not an existing JavaScript capability (short of constructing character class monstrosities that are hundreds or thousands of characters long), and since JavaScript lacks lookbehind you can't reproduce a Unicode-aware word boundary. You could therefore say this proposal is outside the scope of this post, but I'm including it here because I consider this more of a fix than a new feature.
According to current JavaScript standards, \s, \S, ., ^, and $ use Unicode-based interpretations of whitespace and newline, whereas \d, \D, \w, \W, \b, and \B use ASCII-only interpretations of digit, word character, and word boundary (e.g., /na\b/.test("naïve") unfortunately returns true). See my post on JavaScript, Regex, and Unicode for further details. Adding Unicode support to these tokens would cause unexpected behavior for thousands of websites, but it could be implemented safely via a new /u
flag (inspired by Python's re.U
or re.UNICODE
flag) and a corresponding RegExp.prototype.unicode
property. Since it's actually fairly common to not want these tokens to be Unicode enabled in particular regex patterns, a new flag that activates Unicode support would offer the best of both worlds.
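A few quick probes illustrate the asymmetry (results per current, spec-conformant engines):

```javascript
// ASCII-only tokens:
var boundary = /na\b/.test("naïve"); // true: \b treats "ï" as a nonword character
var word     = /\w/.test("é");       // false
var digit    = /\d/.test("٣");       // false: Arabic-Indic digit three

// Unicode-aware tokens:
var space    = /\s/.test("\u00A0");  // true: no-break space counts as whitespace
```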
Actual proposal: Never reset backreference values during a match
Like the last backreferencing issue, this too was covered by David Andersson in his post ECMAScript 3 Regular Expressions: A specification that doesn't make sense. The issue here involves the value remembered by capturing groups nested within a quantified, outer group (e.g., /((a)|(b))*/
). According to traditional behavior, the value remembered by a capturing group within a quantified grouping is whatever the group matched the last time it participated in the match. So, the value of $1 after /(?:(a)|(b))*/ is used to match "ab" would be "a". However, according to ES3/ES5, the value of backreferences to nested groupings is reset/erased after the outer grouping is repeated. Hence, /(?:(a)|(b))*/ would still match "ab", but after the match is completed $1 would reference a nonparticipating capturing group, which in JavaScript would match an empty string within the regex itself, and be returned as undefined in, e.g., the array returned by regexp.exec.
My case for change is that current JavaScript behavior breaks from the norm in other regex flavors, does not lend itself to various types of creative patterns (see one example in my post on Capturing Multiple, Optional HTML Attribute Values), and in my opinion is far less intuitive than the more common, alternative regex behavior.
I believe this behavior is safe to change for two reasons. First, IE does not implement this rule and follows the more traditional behavior on this point. And second, this is generally an edge case issue to all but hardcore regex wizards, and I'd be surprised to find regexes that rely on this bit of behavior as currently mandated by JavaScript.
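For reference, this is what the spec-mandated reset looks like in practice:

```javascript
// Repeating the outer group erases group 1's capture from the first
// iteration, so only the last iteration's group survives:
var match = /(?:(a)|(b))*/.exec("ab");
// match[0] is "ab" and match[2] is "b", but match[1] is undefined per
// the spec (IE, following the traditional behavior, returned "a")
```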
Actual proposal: Add an /s flag (and corresponding RegExp.prototype.dotall property) that changes dot to match all characters including newlines
I'll sneak this one in as a change/fix rather than a new feature since it's not exactly difficult to use [\s\S]
in place of a dot when you want the behavior of /s
. I presume the /s
flag has been excluded thus far to save novices from themselves and limit the damage of runaway backtracking, but what ends up happening is that people write horrifically inefficient patterns like (.|\r|\n)*
instead.
Regex searches in JavaScript are seldom line-based, and it's therefore more common to want dot to include newlines than to match anything-but-newlines (although both modes are useful). It makes good sense to keep the default meaning of dot (no newlines) since it is shared by other regex flavors and required for backward compatibility, but adding support for the /s
flag is overdue. A boolean indicating whether this flag was set should show up on regexes as a property named either singleline
(the unfortunate name from Perl, .NET, etc.) or the more descriptive dotall
(used in Java, Python, PCRE, etc.).
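Until such a flag exists, the contrast looks like this:

```javascript
var s = "<b>line1\nline2</b>";

var dot       = /<b>.*?<\/b>/.test(s);         // false: dot won't cross the newline
var charClass = /<b>[\s\S]*?<\/b>/.test(s);    // true: the standard workaround
var slow      = /<b>(.|\r|\n)*?<\/b>/.test(s); // true, but invites runaway backtracking
```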
Following are a few changes that would suit my preferences, although I don't think most people would consider them significant issues:
- Allow an unescaped forward slash within character classes (e.g., /[/]/). This was already included in the abandoned ES4 change proposals.
- Allow an unescaped ] as the first character in character classes (e.g., []] or [^]]). This is allowed in probably every other regex flavor, but creates an empty class followed by a literal ] in JavaScript. I'd like to imagine that no one uses empty classes intentionally, since they don't work consistently cross-browser and there are widely-used/common-sense alternatives ((?!) instead of [], and [\s\S] instead of [^]). Unfortunately, adherence to this JavaScript quirk is tested in Acid3 (test 89), which is likely enough to kill requests for this backward-incompatible but reasonable change.
- Rename the $& token used in replacement strings to $0. It just makes sense. (Equivalents in other replacement text flavors for comparison: Perl: $&; Java: $0; .NET: $0, $&; PHP: $0, \0; Ruby: \0, \&; Python: \g<0>.)
- Remove [\b]. Within character classes, the metasequence \b matches a backspace character (equivalent to \x08). This is a worthless convenience since no one cares about matching backspace characters, and it's confusing given that \b matches a word boundary when used outside of character classes. Even though this would break from regex tradition (which I'd usually advocate following), I think that \b should have no special meaning inside character classes and simply match a literal b.
- ECMAScript 3 removed octal character references from regular expression syntax (although \0 was kept as a convenient exception that allows easily matching a NUL character). However, browsers have generally kept full octal support around for backward compatibility. Octals are very confusing in regular expressions since their syntax overlaps with backreferences and an extra leading zero is allowed outside of character classes. Consider the following regexes:
- /a\1/: \1 is an octal.
- /(a)\1/: \1 is a backreference.
- /(a)[\1]/: \1 is an octal.
- /(a)\1\2/: \1 is a backreference; \2 is an octal.
- /(a)\01\001[\01\001]/: All occurrences of \01 and \001 are octals. However, according to the ES3+ specs, the numbers after each \0 should be treated (barring nonstandard extensions, which are allowed) as literal characters, completely changing what this regex matches.
- /(a)\0001[\0001]/: The \0001 outside the character class is an octal; but inside, the octal ends at the third zero (i.e., the character class matches character index zero or "1"). This regex is therefore equivalent to /(a)\x01[\x00\x31]/; although, as mentioned just above, adherence to ES3 would change the meaning.
- /(a)\00001[\00001]/: Outside the character class, the octal ends at the fourth zero and is followed by a literal "1". Inside, the octal ends at the third zero and is followed by a literal "01". And once again, ES3's exclusion of octals and inclusion of \0 could change the meaning.
- /\1(a)/: Given that, in JavaScript, backreferences to capturing groups that have not (yet) participated match the empty string, does this regex match "a" (i.e., \1 is treated as a backreference since a corresponding capturing group appears in the regex) or does it match "\x01a" (i.e., the \1 is treated as an octal since it appears before its corresponding group)? Unsurprisingly, browsers disagree.
- /(\2(a)){2}/: Now things get really hairy. Does this regex match "aa", "aaa", "\x02aaa", "2aaa", "\x02a\x02a", or "2a2a"? All of these options seem plausible, and browsers disagree on the correct choice.

There are other issues to worry about, too, like whether octal escapes go up to \377 (\xFF, 8-bit) or \777 (\u01FF, 9-bit); but in any case, octals in regular expressions are a confusing cluster-cuss. Even though ECMAScript has already cleaned up this mess by removing support for octals, browsers have not followed suit. I wish they would, because unlike browser makers, I don't have to worry about this bit of legacy (I never use octals in regular expressions, and neither should you).
According to ES3 rules, regex literals did not create a new regex object if a literal with the same pattern/flag combination was already used in the same script or function (although this did not apply to regexes created by the RegExp
constructor). A common side effect of this was that regex literals using the /g
flag did not have their lastIndex
property reset in some cases where most developers would expect it. Several browsers didn't follow the spec on this nonintuitive behavior, but Firefox did, and as a result it became the second most duplicated JavaScript bug report for Mozilla. Fortunately, ES5 got rid of this rule, and now regex literals must be recompiled every time they're encountered (this change is coming in Firefox 3.7).
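The practical effect of the ES5 change can be seen in a function like this (a sketch; under the old ES3 rule, the literal could have been a single cached object shared across calls):

```javascript
function firstDigitPos(str) {
  var re = /\d/g; // ES5: a fresh regex on every call, so lastIndex starts at 0
  return re.test(str) ? re.lastIndex - 1 : -1;
}

firstDigitPos("abc1"); // 3
firstDigitPos("1abc"); // 0 under ES5; an ES3-caching engine could return -1,
                       // since the cached regex would resume at lastIndex 4
```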
———
So there you have it. I've outlined what I think the JavaScript RegExp API got wrong. Do you agree with all of these proposals, or would you if you didn't have to worry about backward compatibility? Are there better ways to fix the issues I've pointed out than what I've proposed? Got any other gripes with existing JavaScript regex features? I'm eager to hear feedback about this.
Since I've been focusing on the negative in this post, I'll note that I find working with regular expressions in JavaScript to be a generally pleasant experience. There's a hell of a lot that JavaScript got right.