To meet the Connect4 challenge, Marc Moskowitz and I chose to work in XQuery. This wasn’t strictly in the spirit of the challenge, which was supposed to be about trying out something new, since both Marc and I have been working in XQuery for about 7 years now. However, we decided to use XQuery because we like it, and it’s not popular (“medieval torture device” was a colleague’s fond description). We saw this as an opportunity to show off XQuery’s utility outside its nominal sweet spot as the “XML database query language.”

About XQuery

Even programmers who are only remotely familiar with XQuery probably have at least some familiarity with its little sister XPath. If you’re new to XQuery, you could think of it as XPath on steroids: all XPath expressions are also XQuery expressions. To XPath’s base, XQuery adds: looping, sorting, variables, functions, conditionals, an explicit type system, the ability to construct XML documents, and a system for organizing code into modules. In short, it is a fairly complete functional programming language. I’m speaking here of XQuery 1.0; XQuery 3.0 will be adding a good deal more.

About Lux

Beyond XQuery itself, we had to choose an application environment and data store. In the XQuery world these two things generally come as a package since the language is so tightly bound to the XML data representation. The natural choice for us would have been MarkLogic; we’ve also used eXist, and I’ve been curious about trying BaseX for a while now. But I couldn’t resist making Marc try out an open source search engine I’ve been working on for a little while now called Lux, which is basically a mashup of Saxon XQuery processing with Solr/Lucene as a data storage engine. I originally developed Lux as a query tool for our developers who were building XML-based applications primarily in Java using Solr with some XML-aware processing at index time.

At its core, Lux provides XML-aware indexing, an XQuery 1.0 optimizer that rewrites queries to use the indexes, and a function library for interacting with Lucene via XQuery. These capabilities are tightly integrated with Solr, and leverage its application framework in order to deliver a REST service and application server. You can read all about Lux on GitHub, where its source code and documentation are freely available. It’s mostly intended as an XQuery extension to Solr, and is not really a complete system for general-purpose web development, but it does include a minimal application server which is easy to set up and provides quite a bit of functionality.

Introductions are always awkward so I’ll just get right to the point. I’m Matt Warren and I work at Safari Books Online’s newly acquired publishing division, PubFactory. Specifically, I was their Lone SysAdmin. I’ve been with PubFactory for slightly over 18 months. But that’s enough about me.

What the Heck is Waldo?

About a year ago one of my coworkers pointed out that we send an awful lot of email about where we are that day. Some people go out on sales trips, some people work from home to take care of home things, some are sick, some leave early… you get the picture.

At the time, these emails went to the whole company, and everyone either read them, promptly deleted them, or skipped over them altogether. For some, it seemed like an inefficient use of time. There were two major proposed fixes. One, send all this email to a different list. Two, send it to some database that would record it for any interested parties.

Solution one was, naturally, incredibly easy. I set up a list named ‘whereami’ and after some discussion over the existential theme of the list, people quickly adopted its use.

Solution two seemed like a fun project so I captured the idea in a Systems ticket and promptly filed it away to be dealt with another day. Then over the September Labor Day weekend, I sat down in my family’s cabin in Maine, decided to learn Ruby on Rails and created Waldo.

Getting Started

I implemented a fair amount of the functionality in that weekend. Ruby on Rails has a fantastic little command scaffold that allows you to create a schema and the necessary pages/functions to create, display, edit, and delete data. Some people will tell you not to use scaffolding to create your app. I was following a guide and it made my life super easy.

Looking back, I’ve made some major changes since creating that scaffolding but it allowed me to demo the project after my return from the weekend. I also used scaffold in the Engineering Programming Challenge and got a serious leg up on the non-cheating competition.

Alpha Release

Anyway, after a weekend of work, I had the basics of a site, something I could call an alpha release. Part of that was a script that would allow users to email the app and feed it into the database. Parsing through raw email headers to get the From, Subject, and Body was a good challenge. Similar to the site layout, I had the basic functionality done and successful tests with some cleansed inputs.

Remaining tasks were to clean up email parsing, get/process the “date out”, disallow duplicate email addresses, style the app, and create a script to clean the database. I didn’t want this to be the de facto solution for time-off. I suppose it could be but the immediate use case was to find out where people were today. Just to be clear, I didn’t style it well. I took that task as a challenge to learn CSS and leave the real styling to someone who knows how red and blue can really coexist on a page.

Development Phase 2

During the recent snowmageddon, I had plenty of time to polish up the project. A large amount of focus went into the email parsing script. I had real inputs now and there were extra headers or unexpected characters that had to be stripped out.

An example of one of the more difficult formats (don’t worry, I asked for permission):

------=_Part_25652_32303157.1360171018160
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Staff meeting first and then getting my hand checked out. I wont be back to=
day but you can reach me by cell if you need me. HUGE thanks to Lao, Melani=
e and Marc for the help.=20
Have a great day guys.=20
--=20
Name Name=20
Some Title=20
More Signature Stuff=20
Even more signature=20
and so on=20

The result isn’t quite ideal but definitely readable.

Staff meeting first and then getting my hand checked out. I wont be back to=day but you can reach me by cell if you need me. HUGE thanks to Lao, Melani=e and Marc for the help.Have a great day guys.
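The stray “to=day” and “Melani=e” in that output are quoted-printable soft line breaks: an “=” at the end of a line means the line continues, and “=20” is an encoded space. As a rough sketch of another way to handle them (this is not the Waldo code, which is Ruby; the function name decode_body and the sample bytes are made up for illustration), Python’s standard quopri module undoes both:

```python
import quopri


def decode_body(raw, charset="utf-8"):
    """Decode a quoted-printable message body into text.

    quopri.decodestring removes soft line breaks ("=" at end of line)
    and turns escapes such as "=20" back into the bytes they encode.
    """
    return quopri.decodestring(raw).decode(charset)


# Hypothetical sample, shaped like the message body shown above.
sample = (
    b"I wont be back to=\n"
    b"day but you can reach me by cell if you need me. HUGE thanks to Lao, Melani=\n"
    b"e and Marc for the help.=20\n"
    b"Have a great day guys.=20\n"
)

print(decode_body(sample))
```

Ruby has an equivalent in String#unpack with the "M" directive, so the same idea should carry over without a new dependency.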

That was all pretty standard Ruby text processing. The fun part came when processing the subjects for what date the user planned to be out. At first, I thought I was going to have to write code to handle today, tomorrow, this Thursday, next Monday, etc. With the power of Google, I found a nice little gem (Ruby joke, ha) by the name of Chronic. Once I had that, it was just a simple regex and function call to get the date I needed. I was even able to catch multiple dates in an email (e.g. Out Tues and Wed). Of course I realize now I have a potential bug.

subjects = subject.scan(/(today|tomorrow|next \w+|this \w+|mon\w*|tue\w*|wed\w*|thur\w*|fri\w*)/i)
if subjects.size > 0
  subjects.each do |dates|
    dates.each do |date|
      active_date = Chronic.parse(date).strftime("%Y-%m-%d").to_date
      Users.create(:name => name, :email => emailaddr, :status => subject, :notes => notes, :active_date => active_date)
    end
  end
else
  active_date = Date.today
  Users.create(:name => name, :email => emailaddr, :status => subject, :notes => notes, :active_date => active_date)
end
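For readers who don’t have Ruby handy, here is a rough Python analogue of the same idea: scan the subject with the same pattern, then turn each match into a date. It is only a sketch. Chronic does the real work in Waldo, and the resolve function below is a made-up stand-in that understands just “today”, “tomorrow”, and weekday names. It treats “next Monday”, “this Monday”, and plain “Monday” alike, as the next Monday after today, which is an assumption and may not match Chronic’s behavior.

```python
import re
from datetime import date, timedelta

# Same alternatives as the Ruby regex above, without the capture group.
PATTERN = re.compile(
    r"today|tomorrow|next \w+|this \w+|mon\w*|tue\w*|wed\w*|thur\w*|fri\w*",
    re.IGNORECASE,
)
WEEKDAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]


def resolve(phrase, today):
    """Return a date for phrase, or None if it is not understood."""
    words = phrase.lower().split()
    if words == ["today"]:
        return today
    if words == ["tomorrow"]:
        return today + timedelta(days=1)
    day = words[-1][:3]  # "tues" -> "tue", "thursday" -> "thu"
    if day not in WEEKDAYS:
        return None
    # Days until the next such weekday; the same weekday means a week ahead.
    ahead = (WEEKDAYS.index(day) - today.weekday()) % 7 or 7
    return today + timedelta(days=ahead)


def dates_out(subject, today):
    """Every date mentioned in the subject, or just today if none is found."""
    found = [resolve(match, today) for match in PATTERN.findall(subject)]
    return [d for d in found if d is not None] or [today]


print(dates_out("Out Tues and Wed", date(2013, 2, 11)))
```

The fallback to today mirrors the else branch of the Ruby snippet.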

I won’t go through the rest of the code right now. If you’re that interested, I encourage you to browse through or fork it! I am happy, however, to announce that the Beta 2 release came out on 11 Feb! I have some bugs to work out and I’d like to get a real Front-end Developer to look at the styles before I call it stable. That being said, I have to publicly thank Dave MacGugan, Ryan Pollock and Robert Hall for helping on ideas for how to get styles started. They will eventually make it into a file in the project.

Marketing

My real motivation for writing this is to encourage people to run my app! I would greatly appreciate feedback on what I’ve done. As a disclaimer, I freely admit that I am a SysAdmin and not a programmer. That explains my confusion about the strange feeling I had in regards to the multiple Users.create calls in the snippet above. Was it just a draft in my apartment or a voice telling me to write a helper function?

Safari’s Content Team has the dubious distinction of having the highest volume of tickets in our company-wide issue management tracking system (we use Atlassian’s JIRA). We easily win this competition, with more than 1,500 open issues on any given day. But do we buckle under the psychic weight of all these tickets? Nah… go ahead, bring ‘em!

Content Issue Pie

Why So Many, You May Ask?

The Content Team has quality-checked 12,729 brand new titles loaded onto Safari Books Online from April 2011 to last week. For the past 6 months, we averaged 753 titles/month, or 177 titles/week. We track only issues that are clearly errors (e.g., a title-cover image mismatch) or issues that seriously impact readability (e.g., all images are random color bitmaps like this one from a real book).

Mangled image

Each time we find an issue like this, we stop the title in the pipeline before it goes live, and follow up one way or another to correct it. We track all of these issues in JIRA, so we can manage the corrections and move each title live as quickly as possible.

At this time, we only check brand new titles, but our publishers are free to update titles at any time without oversight. And, since we only started quality-checking new titles in April 2011, but Safari launched way back in September 2001, there are quite a few titles that we haven’t scrutinized. Various problems get reported: the unavailability of practice files referred to in the text, teeny tiny images too small to make out, or broken links. An average of 200 new content issue tickets are created each month.

Issues Created Monthly

That explains where our issues are coming from. So, how do we manage them?

Standardization, Automation, and Elbow Grease

Well, managing these issues has been an evolving process. We are fortunate to have on staff not just one, but several JIRA experts, who are always willing to help us out with custom fields and productivity brainstorming.

We’ve been working our way up to several key improvements, which are now at a point where we are starting to realize the benefits. With >1,500 issues, global improvements don’t happen overnight. It’s easy to add new fields to help us organize and track issues, but then those fields need to be populated – a daunting task. And of course, in order for this system to work, everyone has to use it the same way — which means a bit of documentation, training, and oversight are needed. Here are the keys to managing this type of issue volume:

  1. Standardization: custom fields, boilerplate language
  2. Automation: QaQ, automated email
  3. Elbow Grease: Monthly issues export & follow up
  4. NEW: Greenhopper

Standardization. Custom JIRA fields help us slice and dice the issues into manageable groups. For example, we added a publisher field, which allows us to export all the open issues for a given publisher. We use a component field, which allows us to sort that publisher’s open issues by whether the issue relates to the source PDF, the source EPUB, the metadata, companion files, etc.

Component Pie

And we have boilerplated the language we use in certain fields, which serves two purposes. First, it saves the ticket writer time – she doesn’t have to consider how to explain a given issue, she can rather just copy/paste the explanatory text from our (constantly updated) JIRA Issue Map. Second, we make sure our boilerplate language is clear enough for publisher-facing communications, even if our primary publisher contact is a rights person who has no need to speak the lingo of CSS or toc.ncx, for example.

Automation. Our stellar engineering team has built us a QA Queue application (we call it the QaQ) to manage our daily load of new titles to quality-check, and this system hooks right into JIRA. After we check a publisher’s new batch of titles, we follow up via email to let the publisher know which titles are live, and which need a little more work before they can go live. The QaQ automates the creation of lovely formatted emails; for titles with associated JIRA tickets, it exports the text from key fields which detail the required fix in easy-to-understand language.

Elbow Grease. We are now rolling out a monthly export of issues for each publisher. When a publisher receives a spreadsheet listing their issues in detail, sorted by issue type, it’s a lot easier for them to follow up en masse, so they can get as many new titles live (or corrected, if they are already live) as quickly as possible. We did a pilot of this new process with a select set of publishers, with very promising results. We don’t want our publishing partners swimming in the JIRA sea, nor should we require them to rely on email alone for making sure all their titles are working well on Safari.

New: Greenhopper. This plug-in to JIRA has us really excited. We are doing a trial run with a Kanban workflow for the subset of Content issues requiring engineering work. In 2010, we were managing the long list of engineering Content issues via JIRA and email alone. Well, that doesn’t work so well once you have more than a handful of issues. So in 2012, we switched to a shared Google doc so we could be sure we were all working off the same songsheet. But even that has its shortcomings – we meant to keep notes in the Google doc and ALSO update each JIRA ticket as we worked. In theory. Often, only one or the other would get updated, and sometimes the priorities in the doc didn’t match the priorities in JIRA.

But with Greenhopper, we plan to kiss the Google spreadsheet goodbye, for the most part. We created a Kanban board with a few key buckets: Pending, In Progress, In SBO QA, and Completed. We are strictly limiting the number of In Progress tickets to 10. (If you go over 10 tickets In Progress, the whole board turns a distressing bloody red.) This way it’s very clear for engineering to know exactly what must be worked on. And the Kanban board is very easy to work with – in our status calls, we can discuss the entire board, and update each individual issue as we discuss it from the same board. No more getting lost in a sea of dozens of browser tabs or windows.

If this Greenhopper experiment works well for our Engineering tickets, we will explore creating boards for other types of Content Issues. The sky seems to be the limit in terms of how you structure your boards; they seem fully customizable based on the fields you want to use.

OK, now that we have these great tools in place and are starting to use them, we can start setting some nice aggressive goals to get our overall numbers down. (The team is going to kill me when they hear this.)  Let’s beat our current created-to-resolved ratio by summer, guys!

30 Day Summary to Beat

Like many nerds, I participate in the MIT Mystery Hunt every January. This is not a post about the hunt, or about the puzzles at the hunt, or about the tools used to solve those puzzles. This is a post about nametags.

My team uses nametags to help us keep track of people, and because it’s fun to make nametags. Because we have the fictional persona of a law firm called “Immoral, Illegal, and Fattening,” every person’s nametag contains something immoral, illegal, or fattening that they are defending. This has led, as intended, to some hilarity in this part of the nametag. Two years ago, my housemate Deborah Kaplan defended “a’;DROP TABLE `puzzles`;” a classic SQL Injection attack. But our nametags aren’t assembled using SQL. They’re assembled by parsing the team roster with Perl and creating a PostScript document that is sent to a printer. So this year, Deborah, my colleague Joy Nicholson, and I wrote a PostScript injection attack.

PostScript

PostScript is a language created in the 1980s by Adobe to allow simple creation of files that are sent to printers. It is generally used as a communications protocol between computers and printers. But it’s a full programming language, written in ASCII, and as such can be very useful as a scriptable language format for producing printed material. Our nametag code, written by my brother Denis Moskowitz, is an example. It produces a single PostScript file that contains a nametag for every team member, with all of their information included. The resulting file looks something like this:

Nametag: Immoral, Illegal, and Fattening. Name: Marc Moskowitz. Defending Sloth

So what did my injection attack look like? It’s simply:


) pop /show { pop } def (

How does this work? It relies on three of the language’s commands (pop, show, and def) and three syntax elements: strings, names, and procedures.

PS uses parentheses to delimit strings, so the close-parenthesis at the beginning simply closes the string that the program opened to contain the item. Similarly, the final open-parenthesis starts a new string for the program’s own close-parenthesis to end. So the two parentheses give us a little space to work some mischief.

PS is a stack-based language. All literal elements simply put an item on the stack. A stack-based language needs stack manipulation commands, and pop is the simplest: it removes the item on the top of the stack and discards it. So the first pop just gets rid of the string we just closed.

PS marks the names of functions and variables with a leading slash, so “/show” puts the name “show” on the stack for definition by later commands. PS delimits procedures, which are groups of commands saved for later use, with braces. In this case, we’ve created a procedure that simply calls “pop” and completes. As you may have guessed, def takes a name and a procedure off the stack and defines a function with that name. So when the processor reads “/show { pop } def” it defines “show” as a function that does the same thing as “pop”.

But “show” already had a definition, which we’ve just overridden. The show command is used by the language to take a string and display it on the printed page. So in 25 characters, we’ve disabled the ability of the program to print text.
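To watch the stack mechanics play out, here is a toy model in Python. It is emphatically not a PostScript interpreter: it understands only string literals, /names, brace-delimited procedures, and the three commands pop, show, and def. It also assumes the nametag generator emits each item as “(item) show”, which is a guess about the template rather than something taken from the real Perl code.

```python
import re

# Tokens: (string), {procedure}, /name, or a bare command word.
TOKEN = re.compile(r"\((?:\\.|[^()\\])*\)|\{[^{}]*\}|/?[A-Za-z]+")


def run(program):
    """Run a tiny PostScript-like program; return the strings that were shown."""
    stack, page, words = [], [], {}

    def execute(source):
        for tok in TOKEN.findall(source):
            if tok.startswith("("):
                stack.append(tok[1:-1])            # string literal
            elif tok.startswith("{"):
                stack.append(("proc", tok[1:-1]))  # procedure body, kept as text
            elif tok.startswith("/"):
                stack.append(("name", tok[1:]))    # literal name
            else:
                action = words[tok]                # look up the command
                if callable(action):
                    action()                       # built-in command
                else:
                    execute(action)                # user-defined procedure

    def define():
        _, body = stack.pop()                      # the procedure is on top
        _, name = stack.pop()                      # the name is beneath it
        words[name] = body

    words["pop"] = stack.pop
    words["show"] = lambda: page.append(stack.pop())
    words["def"] = define
    execute(program)
    return page


def tag(item):
    """Hypothetical template: wrap an item in a string and show it."""
    return "(%s) show " % item


attack = ") pop /show { pop } def ("
print(run(tag("Marc") + tag("Sloth")))
print(run(tag("Marc") + tag(attack) + tag("Deborah")))
```

In this model everything shown before the injected item still reaches the page, and nothing after it does, because every later “show” quietly pops its string instead of printing it.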

Did it work? Technically, yes. The day my brother ran the code to produce the badges, he sent me mail with the subject “Curse you Bobby Tables!” containing a PDF that was multiple pages that looked like:
Multiple nametags with all text missing
So I did pull it off. What I didn’t do is surprise him or inconvenience him. He had noticed the code when I entered it, and after producing this image, he made an easy change to escape parentheses in string literals. But as a fun hack, it definitely passes, and it’s a useful warning that any unsecured system can be affected contrary to the designer’s intentions.
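The fix is easy to picture. PostScript strings treat a backslash as an escape character, so “\(”, “\)”, and “\\” stand for a literal parenthesis or backslash. Below is a minimal sketch of that kind of escaping in Python; the real nametag generator is written in Perl, and the function name ps_string is invented here for illustration.

```python
def ps_string(text):
    """Wrap text as a PostScript string literal, escaping special characters.

    Backslashes are escaped first so that the backslashes added for the
    parentheses are not themselves doubled.
    """
    escaped = (
        text.replace("\\", "\\\\")
            .replace("(", "\\(")
            .replace(")", "\\)")
    )
    return "(" + escaped + ")"


print(ps_string(") pop /show { pop } def ("))
```

With the parentheses escaped, the attack text stays inside a single string and would simply be printed on the nametag.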

TL;DR — Release products at conferences. The products will be better and you’ll be happier.

Test-driven development is a technique that helps programmers build large applications from small, working components. It has been successful enough to unlock developers’ innate love of acronyms, ranging from ATDD and BDD to MDD and UGG. TDD is important in the industry because it forces a mental shift inside the programmer’s mind. Like most humans, programmers are all too willing to succumb to really lame brain bugs. Instead of falling for the trap of designing and implementing a grand cathedral in one single volcano of brilliance, TDD focuses on a continuous stream of achievable, minor, functional bricks. Conference-driven development offers similar rewards for virtuous choices, but works for the whole product development team rather than just programmers.

I wasn’t sure until the last minute whether I was going to Tools of Change 2013. When I ran a publishing startup, TOC was the most important event of the year: we organized our entire product release schedule around it. (Keith calls this “Conference-Driven Development.”) It was often the only opportunity to meet our current customers face-to-face, and giving conference presentations and attending mixers constituted 100% of our marketing and sales effort. Missing it was unthinkable, a potentially catastrophic failure for the company.

This year I still have lots of meetings and not enough time, but the stakes are much lower. In the end, what convinced me to come back was less the urgency of the appointments and instead the opportunity to see friends and colleagues. If I didn’t attend, I’d miss the chance to stay in touch with those who’ve supported and encouraged me in the rollercoaster ride that is 21st-century publishing.

It’s always a crapshoot which sessions I’m able to see — many get preempted by interesting session-break conversations that spill into the next track (and are always well worth the time). Here are the talks I’m hoping to attend, some of which naturally overlap, sigh:

Preparing Content for Next-Generation Learning

Greg Grossmeier (Creative Commons), Michael Jay (Educational Systemics, Inc.)

10:45am Wednesday, 02/13/2013

Safari considers itself as much a learning company as an ebook company, but the “e-learning” industry is one with which I have almost no familiarity. We’re always looking for ways to facilitate professional development and skill-building, and I’m eager to keep on top of the leading edge of the space, especially with regards to web-centric approaches versus traditional learning management systems.

End To End Accessibility: A Journey Through The Supply Chain

Dave Gunn (Royal National Institute of Blind People), Sarah Hilderley (EDItEUR Ltd), Doug Klein (Nook Media, LLC), Rick Johnson (Ingram | VitalSource)

1:40pm Wednesday, 02/13/2013

Though our product has significant accessibility affordances, most of them pre-date advances in accessible content, including EPUB 3 semantics. I want to be ready for us to take advantage of semantically-rich content and ensure that we’re providing a consistent user experience relative to other ereading systems.

Book as API

Hugh McGuire (PressBooks / LibriVox / Iambik ), Alistair Croll (Solve For Interesting)

1:40pm Wednesday, 02/13/2013

Some publishers and book services have had public APIs, but have placed enough restrictions as to make them useless for general purpose use. Consequently the APIs don’t see wide adoption, and then the organization wonders why they’re supporting something nobody uses — supporting a public API is a non-trivial investment. Eventually the API is discarded. I’m interested to see if there’s a way out of this self-defeating cycle.

Information Wants to be Shared

Joshua Gans (Rotman School of Management)

9:20am Thursday, 02/14/2013

Google’s First Click Free and other innovative approaches to search engine discovery are offering publishers more choices in discoverability and sharing that shouldn’t compromise sales or devalue content. This is a critical topic for any web-based aggregator.

The Elusive “Netflix of eBooks”

Travis Alber (ReadSocial and BookGlutton), Christian Damke (Skoobe), Justo Hidalgo (24Symbols), Andrew Savikas (Safari Books Online)

10:35am Thursday, 02/14/2013

I suspect this is relevant to my interests. Also my boss will be there.

Don’t miss

Other sessions likely to be time well-spent: Revamping Editing: The Invisible Art (Maureen Evans & Blaine Cook, Poetica), especially if you missed their Books in Browsers presentation;  Creators and Technology Converging: When Tech Becomes Part of the Story (moderated by Erin Kissane), an interesting line-up of speakers from outside traditional publishing; PubHack: Understanding Industry Barriers, And How To Get Innovating Anyway (moderated by Kristen McLean), a must-see for publishing startups struggling to work with larger organizations.

Last November I wrote an article explaining how to capture temperature data with an Arduino. These data were sent to the serial bus on a computer via USB. Now that the data are on the computer, we can track and monitor their flow.

First, we should store these data in a database. I rewrote the serial reader to dump data into MySQL.

import serial
import time
import MySQLdb

dbhost = 'localhost'
dbname = 'DB_NAME'
dbuser = 'DB_USERNAME'
dbpass = 'DB_PASSWORD'

# On Ubuntu systems, /dev/ttyACM0 is the usual path to an Arduino's serial
# device; yours may be different.
ser = serial.Serial('/dev/ttyACM0', 9600, timeout=1)
while True:
    time.sleep(10)

    # readline() returns bytes; decode before splitting on spaces.
    the_goods = ser.readline().decode('ascii', 'replace')
    str_parts = the_goods.split(' ')

    conn = MySQLdb.connect(host=dbhost,
                           user=dbuser,
                           passwd=dbpass,
                           db=dbname)
    cursor = conn.cursor()
    # Let the driver quote the value instead of formatting it into the SQL.
    sql = "INSERT INTO arduino_temp (temperature) VALUES (%s)"
    try:
        cursor.execute(sql, (str_parts[0],))
    except MySQLdb.Error:
        pass  # skip a bad reading rather than stop the loop
    cursor.close()
    conn.commit()
    conn.close()

    print(the_goods)

Next, I want to track this sensor feed. I have a Nagios server, so I wrote a plugin to check the Arduino output in MySQL.

#!/usr/bin/python

import datetime
import MySQLdb
import sys

dbhost = 'localhost'
dbname = 'DB_NAME'
dbuser = 'DB_USER'
dbpass = 'DB_PASSWORD'

# Make sure this file is executable. You should chmod +x sensor_status.py

def main():
    conn = MySQLdb.connect(host=dbhost,
                           user=dbuser,
                           passwd=dbpass,
                           db=dbname)
    cursor = conn.cursor()
    sql = "select created_at, temperature from arduino_temp where created_at = (select max(created_at) from arduino_temp);"
    cursor.execute(sql)
    row = cursor.fetchone()
    cursor.close()
    conn.close()

    # An empty table means we cannot say anything about the sensor.
    if row is None:
        print("No temperature readings found")
        sys.exit(3)

    mysql_datetime, current_temp = row
    timediff = datetime.datetime.now() - mysql_datetime

    if int(timediff.total_seconds()) < 15:
        print("Shed was %s Fahrenheit and checked only %s seconds ago" % (current_temp, timediff.total_seconds()))
        sys.exit(0)
    else:
        # Nagios shows the first line of output, so say why this is critical.
        print("Last shed reading was %s seconds ago" % timediff.total_seconds())
        sys.exit(2)


if __name__ == "__main__":
    main()

Nagios is monitoring software that uses a really simple algorithm for assigning status. Nagios runs your bash/python/whatever script and looks for an exit code of 0-3. Zero means everything is all right, 1 is a warning, 2 is critical, and 3 is unknown.

Nagios has a simple configuration that defines command and services. The executable sensor_status.py script is defined with the following command.

define command{
	command_name	arduino_temp_sensor
	command_line	/YOUR_PATH_TO_FILE/sensor_status.py  ; run chmod +x sensor_status.py first
	}

The machine connected to the arduino sensor runs a service that calls this command.

define service{
        use                             generic-service         ; Name of service template to use
        host_name                       localhost
        service_description             Shed Temperature
        check_command                   arduino_temp_sensor
        }

sensor_status.py checks whether the current system time and the last MySQL record are more than 15 seconds apart. The Arduino serial_reader_mysql.py inserts data every 10 seconds into MySQL. This fifteen-second check is enough time to know if data are not flowing properly.
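The freshness rule and the exit codes fit together in a few lines. The sketch below separates the decision from the database access so it can be tried without a sensor. The function name check_freshness and the message wording are invented for illustration. Returning UNKNOWN when there is no reading at all is an assumption about what is sensible, not something Nagios requires.

```python
from datetime import datetime, timedelta

# Exit codes Nagios expects from a plugin.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
MAX_AGE_SECONDS = 15  # readings arrive every 10 seconds


def check_freshness(last_reading, now, max_age=MAX_AGE_SECONDS):
    """Return (exit_code, message) for the newest reading's timestamp."""
    if last_reading is None:
        return UNKNOWN, "UNKNOWN - no readings recorded"
    age = (now - last_reading).total_seconds()
    if age < max_age:
        return OK, "OK - last reading %d seconds ago" % age
    return CRITICAL, "CRITICAL - last reading %d seconds ago" % age


now = datetime(2013, 2, 11, 12, 0, 0)
print(check_freshness(now - timedelta(seconds=4), now))
print(check_freshness(now - timedelta(seconds=60), now))
```

A real plugin would print the message and pass the code to sys.exit, as sensor_status.py does.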

If everything is working, Nagios will look like this.

Nagios_screen_shot

As you can see, monitoring your arduino temperature sensor is relatively simple. Nagios is a great way to make it happen. You can get all the source code to this project at https://github.com/muskox/arduino_env_monitoring/

Last year, my group decided to invest a certain amount (5%) of time to unstructured self-directed activities – we call it “investment time.” We do this on every other Friday afternoon.

This isn’t a new idea. The concept has been widely touted, and according to this post, it has been in vogue, or at least in use, as far back as 1948 (at 3M). There are also many well-documented flaws with this idea; see these comments. The comments on coding horror’s post are full of examples of how this can fail.

But there are good reasons to try: In a client-driven consulting organization, engineering work is tightly constrained: by feature requirements and/or by budget. Often it’s an engineer’s role to be the grownup on a project, reminding everyone about technical constraints that will lead to scope growth and cost overruns. It can be tiresome to always be the person that says “no,” and every now and then we need a chance to think expansively, try new things, and exercise our creative juices.

So the idea isn’t new, and 5% of our time isn’t really enough to launch major new initiatives: gmail isn’t going to get invented every other Friday afternoon. But I want to talk about this practice because I think it has had a lot of value for us. We’re a small organization without the resources of a Google or a Microsoft, and this gives us a different perspective. In these sessions, we have a chance to try out new things, to work with different people, and to think creatively, but it hasn’t necessarily been easy to realize these benefits. I’d like to reflect on how this has worked and not worked for us, and then I’ll describe what we did in our most recent session.

Since we started this six months ago, we’ve had mixed experiences. Sometimes interesting things that provide tangible benefits have happened. We built an XSLT code coverage tool and integrated it into our unit tests. We worked through some incompatibilities in our core platform so we could deploy NewRelic (http://newrelic.com) to monitor it (which by the way has been a huge help). Other times people drift off and work on their regular work. A few times people have left early. My feeling is it’s OK that that happened because this is an experiment, but we do want to get the most out of the time we’re spending and so we try to learn from the failures as well as benefit from the successes. Here’s a quick run-down on what seems to work and what doesn’t:

1. Celebrate success. OK this is just like mom and apple pie, but it’s easy to forget to do. We follow up each session with a report out to the whole company about what happened. It’s a good opportunity to let people know what the engineers are wasting their time on so they can see just how useful it is.

2. Do it on a regular basis, at a predetermined time. This is really critical: it ensures that it will actually happen. The whole idea is to be open to new ideas, so you don’t know *what* you are going to do exactly, which lends a bit of an air of unreality to the whole enterprise. To combat that, you have to be very concrete and specific about when you are going to do it, who’s going to be involved, and where it will happen. Beer and food help too.

An important corollary is: no Friday releases. If you can make this part of your company culture, you will benefit regardless of whether you also have investment time. Friday releases are bad because nobody tests them until Monday except your customers – nuff said. But it’s especially important to keep releases away if you are trying to reserve some unscheduled time.

3. Have everybody in the same place together. This has been really important: the best weeks have seen a low hum of activity in the office, and significant exchanges have occurred from people spontaneously walking around the office and looking over other folks’ shoulders. However, one thing we need to get better at is involving our remote developers. There’s a tension around encouraging collaboration: some people work better alone; others in groups. We’re still learning the right balance.

4. Don’t force people to be creative if they don’t want to, or feel like they’re too busy. People shouldn’t be made to feel guilty if they’re not inventing a cure for cancer in their spare time.

Also, sometimes there is a monkey on someone’s back that they need to shake. Ideally this shouldn’t happen, but if it does, delaying work you really absolutely need to do just so you can take time out to come up with other freakin’ stuff to do obviously makes no sense.

5. Find a way to encourage everyone to generate good ideas in advance of the investment time. We’ve run an exercise with colored markers and sticky notes that was fun and generated all kinds of great ideas. Sometimes just going around the room and having everyone say what they’re planning to do is enough.

6. Add competition to the mix. From time to time we like to issue a programming challenge to the engineers at our company. This gets folks thinking about work in new and creative ways, and well – it’s fun. We strove to build the fastest sudoku solver; we competed in robocode matches. Hilarity ensued: programming chops were rewarded.

The Connect4 Challenge

Recently one of our engineers, Mark LeMay, came up with the idea of a framework showdown. The idea was to learn more about different programming languages and web application frameworks by devising a simple problem, applying various tools to it, and comparing notes.

The problem we chose to work on was Connect Four. This is a simple two-player game, familiar to many. You drop disks into a 6×7 array of cells, where they fall to the bottom of their column with the object of getting four in a row of your color before your opponent does the same. It seemed ideal from the perspective of being easy to program, but would require some server side programming in order to manage the game state, since it must be shared by multiple players.
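To give a sense of how little logic the core game needs, here is a minimal sketch in Python. It is not any entrant's actual code; the board representation and the function names (`new_board`, `drop`, `has_won`) are assumptions made for illustration. It covers only dropping a disk and detecting four in a row, not the shared server-side state that made the challenge interesting.

```python
ROWS, COLS = 6, 7  # board size from the text: 6 rows by 7 columns
EMPTY = "."


def new_board():
    """Return an empty board as a list of rows; row 0 is the top."""
    return [[EMPTY] * COLS for _ in range(ROWS)]


def drop(board, col, player):
    """Drop a disk into a column. Return the row it lands in, or None if the column is full."""
    for row in range(ROWS - 1, -1, -1):  # scan from the bottom row upward
        if board[row][col] == EMPTY:
            board[row][col] = player
            return row
    return None


def has_won(board, player):
    """Treat every cell as the possible start of a run of four in each direction."""
    directions = [(0, 1), (1, 0), (1, 1), (1, -1)]  # right, down, two diagonals
    for r in range(ROWS):
        for c in range(COLS):
            for dr, dc in directions:
                cells = [(r + k * dr, c + k * dc) for k in range(4)]
                if all(0 <= rr < ROWS and 0 <= cc < COLS
                       and board[rr][cc] == player
                       for rr, cc in cells):
                    return True
    return False
```

Scanning the whole board after every move is wasteful but easy to get right; a real entry would probably check only the lines passing through the last disk played.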

We came up with a scoring rubric for evaluating entrants that awarded points for everything from “return HTTP OK” to stress testing and unbeatable AI. Pretty ambitious for 4 hours on a Friday afternoon. Yes, well it turns out that 4 hours is not enough to learn a new language, and a new framework, and to code a perfect Connect4 AI that can run while being stress tested by North Korean hacker bots (actually we didn’t try that last one, but I’m pretty sure all the entrants would have failed it).

However, it is almost enough to try out some new tools and get a feel for what it might be like to do some real work with them. Here are the ones we tried: Ruby on Rails, Grails, Google Web Toolkit, Node.js, Lux (XQuery), Go and Google Apps, Clojure, Wicket, and Django. Some newish stuff, a lot of stuff that is not super new, but this was an opportunity for folks who don’t get out that much to try out a newer, younger model, um as it were.

A few things people wanted to try, or suggested trying, but never got to were: Scala, ChicagoBoss/Erlang, and HapiJS. Maybe we’ll check them out later.

I’m not even going to attempt to give you a run-down on all these tools in this post, but in some later posts we’ll cover some of them (we’ve already heard from Robert Hall re: sockets in node.js).

Here at SBO we love our food trucks. Even in the dead of winter we can be found waiting in line for our chicken and rice while enduring wind chills of 20°F and below. When we are not outside freezing our butts off, we can be found at our desks communicating with each other through HipChat, our preferred team chat software. On most days, the question of which Boston Food Truck awaits us comes up. I decided it would be fun to find a way to get HipChat to tell us each day what we want to know before we even ask.

HipChat automation basics

HipChat has a nice API available. To take advantage of it, you will need a group admin account on HipChat. The first step is to create an API Auth Token, a fairly simple process that is explained on the site. For our purposes, it will need to be of type ‘Notification.’ Its label can be whatever you like; I chose FTotD: Food Truck of the Day. (It will be useful to make an admin token as well for testing this next part, but in the end, we want the notification token.)

Now we need to test out the authentication by getting the HipChat API to give us a list of the rooms. HipChat kindly supplies some sample code for us in many languages. We write lots of Python code at SBO, so I chose to write my script in Python. Here is the code that will get us started:

import urllib2

url = "https://api.hipchat.com/v1/rooms/list?auth_token=TOKEN"
request = urllib2.Request(url)
response = urllib2.urlopen(request)

print response.read()

If you put your admin token in the code where indicated, you should get back a list of the rooms as well as the id values for each room. (It would be useful at this point to create a room to test your code. Once you have that ready, re-run this script and take note of the room you just made.)

We want to be able to send messages to HipChat, so we will first need to change the method from rooms/list to rooms/message in the URL above. The rooms/message method has a few required parameters that we will need to pass along: room_id, from, and message. There are two others that I chose to change from their default values: notify and color. Add or change the following lines in the script, substitute your own values for the all-caps text, and give it a shot:

room = "YOUR ROOM NUMBER"
token = "YOUR AUTH TOKEN"
sender = "BostonFoodTruck"
color = "purple"
notify = "1"
message = "Test Message"

url = "https://api.hipchat.com/v1/rooms/message?room_id="+room+"&auth_token="+token+"&from="+sender+"&message="+message+"&color="+color+"&notify="+notify

If all went well, you should have gotten a response that said “sent” and a message in your newly created room.
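Gluing the query string together by hand works until a value contains a space or an ampersand. As a hedged alternative, here is a small sketch that builds the same URL with `urlencode`, which escapes each value. It uses Python 3's `urllib.parse` rather than the Python 2 modules used in this post, and the helper name `build_message_url` is made up for illustration; the endpoint and parameter names are the ones given above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint taken from the post; everything else here is illustrative.
API_BASE = "https://api.hipchat.com/v1/rooms/message"


def build_message_url(room, token, sender, message, color="purple", notify="1"):
    """Return the rooms/message URL with every parameter safely escaped."""
    params = {
        "room_id": room,
        "auth_token": token,
        "from": sender,
        "message": message,
        "color": color,
        "notify": notify,
    }
    return API_BASE + "?" + urlencode(params)


# Placeholder values; substitute your own room ID and token.
url = build_message_url("YOUR ROOM NUMBER", "YOUR AUTH TOKEN",
                        "BostonFoodTruck", "Burgers & fries today")

# Decoding the query string again shows the message survived intact.
decoded = parse_qs(urlsplit(url).query)
print(decoded["message"][0])
```

The sketch only constructs the URL; sending it is the same `urlopen` call as before.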

What’s for lunch?

Now that we have mastered sending a message to a room in HipChat, we need to make it interesting. The City of Boston provides a nice online app to help you figure out which food truck will be stopping by your neighborhood on which days. Check it out! This is great for a human user, but getting a script to extract the info we need from here will not be easy. The good news is that they have a mobile version of this page here that’s marked up as a simple table. I chose to use the urlopen function from Python’s urllib module to grab all the info off this page:

import urllib
url_file = urllib.urlopen("http://www.cityofboston.gov/business/mobile/schedule-app-min.asp")
file_lines = url_file.readlines()

I added that last line to break the file up into an array of lines so that later I can search through them and reference them with simple indexing. If you look at the markup code for the table on the city of Boston webpage, you will see that each line in the table looks something like this:

<td class="map"><a href="#maps" onClick='window.location.href=getMapLink("770074664779-2952731269296",
                        "Roxy's Gourmet Grilled Cheese 1");'>Map</a></td>
<td class="com"><a href="http://www.roxysgrilledcheese.com">Roxy's Gourmet Grilled Cheese 1</a></td>
<td class="dow">Wednesday</td>
<td class="tod">Dinner</td>
<td class="loc"><script type="text/javascript">document.write(getMapInfo("770074664779-2952731269296",
                        "Roxy's Gourmet Grilled Cheese 1"))</script>
                        (25) Innovation District, Seaport Blvd at Thompson</td>

Ultimately we want to get the URL out of the second line above, but we need to make sure it comes from the right part of the table. The last three lines help make that easy. All we need to do is search for a set of lines that contain my location, (25) Innovation District, Seaport Blvd at Thompson, the meal of interest (lunch), and the day of the week. But before we can do the search, we need to determine which day of the week it is. Luckily, Python will just handle this for us:

import datetime
now = datetime.datetime.now()
dotw = now.strftime("%A") # For example, "Thursday"
meal = "Lunch"
location = "(25) Innovation District, Seaport Blvd at Thompson"

Putting that right after the file_lines declaration should set us up well to extract the information we seek. The trick here is to reference the right lines as we loop through all the file lines and cut out the HTML code so that we are only left with the URL to the web page of the food truck we want.

i = 0
for line in file_lines:
    if location in line and meal in file_lines[i-1] and dotw in file_lines[i-2]:
        truck_url = file_lines[i-3].rsplit('href="',1)[1].rsplit('">',1)[0]
    i += 1

[Ed. For a more complicated HTML parsing problem we definitely recommend using a real XML/HTML parser like our perennial favorite lxml, but the sooner Matt could write this bot, the sooner we could get lunch. - Liza]
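For the curious, here is roughly what the parser-based route could look like. This is a sketch only: it uses Python 3's built-in `html.parser` rather than lxml, the sample markup is a simplified, made-up stand-in for the city's table, and it assumes each schedule entry sits in its own `<tr>` with the `com`, `dow`, `tod`, and `loc` cell classes shown above. The names `ScheduleParser` and `find_truck_url` are hypothetical.

```python
from html.parser import HTMLParser


class ScheduleParser(HTMLParser):
    """Collect one dict per table row, keyed by each cell's class attribute."""

    def __init__(self):
        super().__init__()
        self.rows = []           # finished rows
        self._row = None         # row currently being built
        self._cell = None        # class of the <td> currently open
        self._in_script = False  # True while inside a <script> element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tr":
            self._row = {}
        elif tag == "td" and self._row is not None:
            self._cell = attrs.get("class")
            self._row[self._cell] = ""
        elif tag == "a" and self._cell == "com":
            self._row["url"] = attrs.get("href") or ""
        elif tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False
        elif tag == "td":
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append({k: v.strip() for k, v in self._row.items()})
            self._row = None
            self._cell = None

    def handle_data(self, data):
        # Keep visible cell text only; skip the inline JavaScript.
        if self._cell is not None and not self._in_script:
            self._row[self._cell] += data


def find_truck_url(html, location, meal, day):
    """Return the truck URL for the matching row, or None if there is none."""
    parser = ScheduleParser()
    parser.feed(html)
    for row in parser.rows:
        if (row.get("loc") == location and row.get("tod") == meal
                and row.get("dow") == day):
            return row.get("url")
    return None


# Simplified sample data, not the real page.
SAMPLE = """
<table>
<tr>
<td class="com"><a href="http://www.roxysgrilledcheese.com">Roxy's Gourmet Grilled Cheese 1</a></td>
<td class="dow">Wednesday</td>
<td class="tod">Lunch</td>
<td class="loc"><script type="text/javascript">document.write(getMapInfo("x", "y"))</script>
    (25) Innovation District, Seaport Blvd at Thompson</td>
</tr>
<tr>
<td class="com"><a href="http://www.bonmetruck.com">Bon Me</a></td>
<td class="dow">Thursday</td>
<td class="tod">Lunch</td>
<td class="loc">(25) Innovation District, Seaport Blvd at Thompson</td>
</tr>
</table>
"""

LOCATION = "(25) Innovation District, Seaport Blvd at Thompson"
print(find_truck_url(SAMPLE, LOCATION, "Lunch", "Thursday"))
```

Matching on whole rows means the result no longer depends on how the page happens to be broken into lines, and a day with no truck returns None instead of leaving the variable undefined.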

At this point we have the URL for the truck and we only need to add a line for the message as such:

message = truck_url

If you run your script now you should get a message in your room that tells you the URL for today’s truck at your location. But that is pretty boring, especially since HipChat expects an HTML-encoded message by default. We can send out a message that looks good and will do something if you click on it. For this I chose to use the main logo from the website of each food truck that stops by our office, linked to its menu page if it has one. Since all the webpages are written differently, there was no elegant way to code this; I simply had to copy the URLs for every page and put them directly into my code. The last block of code looks like this:

# Each message is a logo image linked to the truck's menu page.
# The parentheses let one string literal continue across two lines.
if truck_url == "http://www.roxysgrilledcheese.com":
    message = ("<a href='http://www.roxysgrilledcheese.com/menu'>"
               "<img height='100' src='http://www.roxysgrilledcheese.com/wp-content/themes/bones/images/header.png'/></a>")
elif truck_url == "http://www.bennyscrepecafe.com":
    message = ("<a href='http://www.bennyscrepecafe.com/menu'>"
               "<img height='100' src='http://www.bennyscrepecafe.com/wp-content/uploads/2012/10/header3.png'/></a>")
elif truck_url == "http://thechickenriceguys.com":
    message = ("<a href='http://thechickenriceguys.com/'>"
               "<img height='100' src='http://thechickenriceguys.com/images/cnrg_logo.jpg'/></a>")
elif truck_url == "http://www.bonmetruck.com":
    message = ("<a href='http://blog.bonmetruck.com/?page_id=9'>"
               "<img height='100' src='http://blog.bonmetruck.com/wp-content/uploads/2012/11/Bon_Me__4colorlogo3-300x235.png'/></a>")
elif truck_url == "http://www.meimeistreetkitchen.com":
    message = ("<a href='http://meimeiboston.com/menu/'>"
               "<img height='100' src='http://meimeiboston.com/wp-content/uploads/2012/03/cropped-meimeiheader1.jpg'/></a>")
else:
    # Unknown truck: fall back to posting the plain URL.
    message = truck_url

While the food trucks that show up here do not typically change from week to week, I added the last condition just in case. If I see a plain URL pop up one day I will have to add a new condition to make up for the new truck. One final line is needed to make the message safe for passing through a URL:

message = urllib.quote(message)
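If the list of trucks grows, one possible tidy-up is to keep the URLs in a lookup table and build the HTML in one place. The sketch below is an illustration rather than the script as written: it uses Python 3 (`urllib.parse.quote` in place of `urllib.quote`), the names `TRUCK_PAGES` and `build_message` are made up, and only two of the trucks above are included to keep it short.

```python
from urllib.parse import quote

# Truck site -> (menu link, logo image), with URLs copied from the post.
TRUCK_PAGES = {
    "http://www.roxysgrilledcheese.com": (
        "http://www.roxysgrilledcheese.com/menu",
        "http://www.roxysgrilledcheese.com/wp-content/themes/bones/images/header.png",
    ),
    "http://thechickenriceguys.com": (
        "http://thechickenriceguys.com/",
        "http://thechickenriceguys.com/images/cnrg_logo.jpg",
    ),
}


def build_message(truck_url):
    """Return a linked logo for a known truck, or the plain URL otherwise."""
    pages = TRUCK_PAGES.get(truck_url)
    if pages is None:
        return truck_url  # same fallback as the final else branch
    menu_url, logo_url = pages
    return "<a href='{0}'><img height='100' src='{1}'/></a>".format(
        menu_url, logo_url)


# Percent-encode the HTML so it can travel inside a query string.
message = quote(build_message("http://thechickenriceguys.com"))
print(message)
```

Adding a truck then means adding one dictionary entry instead of another elif branch.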

At this point you should be able to run this and see a message pop up in your test room with whatever info you put in your message. The Python script is now complete; you just need to edit it to put in the right notification token and room ID.

The next trick will be to get the script to run at a certain time each day and on the days that are important to you. For that we will use cron. I want my script to run Monday through Friday at noon. Log in to your favorite Unix machine that is on all the time. Run the command ‘crontab -e’. If it is your first time, it will ask you which editor you use; choose your favorite. Read the info it gives you and at the end add a line like this:

00 12 * * 1-5 /path/to/your/script.py

The way cron reads this is: run the command “/path/to/your/script.py” any month of the year, any day of the month, Monday – Friday, at the 12th hour, on the 00 minute. You will need to add:

#!/usr/bin/env python

as the first line in your Python script and make the file executable. Last thing to do is wait and enjoy the results.

Well, there you have it, a Python script that will look up food truck info and post it to HipChat five days a week.

Working With WebSockets

I’ve been tinkering around with Node.js quite a bit lately, trying to ramp up on a language that helps a front end engineer like me slice deeper into the back end and create more sophisticated applications. One of the side benefits of learning a new language is the gamut of novel technologies that comes along with it, such as WebSockets.

Node.js has a single-threaded, event-driven core. This translates into a platform that allows for high concurrency, fast event-driven response, and easy round-trip client-to-server(-to-client) communication. Node.js applications are written in the same language that runs in the browser (JavaScript). This means that “real-time” applications that maintain state and communicate to the server without refreshing the browser can rely on the same browser-like listening behavior. In essence, the server acts very much like the browser, waiting for events to be called, and then operating when triggered on the callback.

This sort of call and callback functionality has traditionally been the realm of AJAX-driven applications. Pepper a browser page with some AJAX calls, which are routed on the server in whatever language, with response pushed back up to the browser usually in the form of JSON or XML, and parsed by the browser without ever refreshing the page. Asynchronous HTTP.

Enter WebSockets. If AJAX was the main driver behind the “Web 2.0” phase of the Internet (though definitions differ), maybe WebSockets could fairly be called version 3. Essentially, WebSockets are an HTML5 browser technology that defines a full-duplex socket “handshake” over a single TCP connection. Translated, WebSockets provide a way of connecting the client to the server over a constant shared connection. Think of it like the telephone. Dial the number, place the call, and once connected, voices can talk back and forth without interruption (until the call is ended).
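To make the “handshake” a little more concrete: under the WebSocket protocol (RFC 6455), the browser sends a random Sec-WebSocket-Key header with its upgrade request, and the server proves it understood by replying with a Sec-WebSocket-Accept value derived from that key. The sketch below shows only that one calculation, in Python rather than Node.js; the function name `accept_key` is made up, and a real server (or a library like Socket.io) handles this along with the rest of the protocol.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for computing the accept value.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(client_key):
    """Compute the Sec-WebSocket-Accept value for a given Sec-WebSocket-Key."""
    # Concatenate key and GUID, take the SHA-1 digest, then base64-encode it.
    digest = hashlib.sha1((client_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key from the RFC's own handshake example.
print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))
```

Once the client has checked that value, both sides stop speaking HTTP and the connection stays open for messages in either direction.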

Where does Node.js come into this? Honestly, any server-side language can employ WebSockets. (Scala might do it via “actors,” for example.) What sets Node.js apart is that the event-driven behavior of the browser that WebSockets rely on is also native to Node.js, as both have APIs that rely on JavaScript. Put another way, the event handling API is not only native to Node.js (in Node terms, you can reference it as require('events').EventEmitter), but can be extended to handle back-end WebSocket functionality.
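For readers who haven't met the pattern, here is a toy version of an event emitter, written in Python purely for illustration. It is not Node's implementation; it only mimics the general shape of registering callbacks with `on` and triggering them with `emit`, and the event name "message" is arbitrary.

```python
class EventEmitter:
    """A toy emitter: register callbacks by event name, then fire them."""

    def __init__(self):
        self._listeners = {}  # event name -> list of callbacks

    def on(self, event, callback):
        """Register a callback to run whenever the event is emitted."""
        self._listeners.setdefault(event, []).append(callback)

    def emit(self, event, *args):
        """Call every listener for the event; return True if there were any."""
        callbacks = list(self._listeners.get(event, []))
        for callback in callbacks:
            callback(*args)
        return bool(callbacks)


received = []
chat = EventEmitter()
chat.on("message", received.append)   # listener: remember each message
chat.on("message", print)             # listener: echo it to the console
chat.emit("message", "Red drops a disk in column 4")
```

A socket library layers the network on top of the same idea: an incoming frame from a client becomes an emitted event, and your handler is the callback.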

In fact, this is precisely what Node.js author-coder Guillermo Rauch did with his highly popular Node.js package, “Socket.io.” (For nerds/hipsters looking for a little more, “Engine.io” is its important twin.) Rauch took the Node.js events API and built a socket tool on top, complete with immediate event handlers and so forth. The whole nine yards.

A few of us at PubFactory recently tested Socket.io in a “Connect Four” engineering challenge. For the sake of brevity, I’ll just state that a front-to-back-end connection was extremely simple to set up. In about an hour I was able to set up a very rough chat room that ran across the same IP address, over a fully functional WebSocket. (Unfortunately, Heroku, our deployment environment, does not support sockets, so we had to downgrade to XHR long-polling, which Socket.io easily allows, but WebSockets work locally in full splendor.)

Where will WebSockets take the web? My jest about “Web 3.0” may not actually be too hyperbolic. There’s already a growing list of developers and applications that utilize this stack. In fact, Node.js has become the nucleus for real-time frameworks reliant on WebSockets, including the likes of Meteor.js, Derby.js, and SocketStream, which iterate far beyond Socket.io. These frameworks try to solve problems such as shared code (sockets and Node combined tend to flatten the code stack), authentication over a constant connection, and vertical and horizontal socket scalability. It’ll be interesting to see where it all goes.
