Andy Oram

Andy Oram is an editor at O'Reilly Media. An employee of the company since 1992, Andy currently specializes in open source technologies and software engineering. His work for O'Reilly includes the first books ever released by a U.S. publisher on Linux, the 2001 title Peer-to-Peer, and the 2007 best-seller Beautiful Code.

Current activities at the Electronic Privacy Information Center

Andy Oram @praxagora 2010-03-19

When Marc Rotenberg founded the Electronic Privacy Information Center (EPIC) in 1994, I doubt he realized how fast their scope would swell as more and more of our lives became digitized and networked. Now it seems like everything that happens in society has an electronic component and a privacy component. I had the chance to drop in to their office on Monday and heard about the front-burner items they're working on.

  • Whole-body imaging in airports, a very hot issue right now. While Americans push back against it, the European Union has to vote on it soon.
  • The Smart Grid: a massive upgrade planned for the American system for delivering electricity across the nation as well as over the last mile to your home. Could the Smart Grid tell marketers your lifestyle?
  • Privacy of text messaging. EPIC is very active on City of Ontario v. Quon, where the government asserts that using a city-issued device allows the city to read all of the employee's messages.
  • Freedom of Information Act. Why are government agencies (except for a few exemplary ones) fulfilling a smaller percentage of demands during the Obama administration than they did during the Bush administration?
  • Ballot initiatives. EPIC has argued in Doe v. Reed that signing a petition to put a question on a ballot should be private, like voting.

And if you visit the EPIC home page this week, or the companion privacy.org page, you'll see that they're following even more diverse issues: the FCC broadband proposal, consumer privacy, data retention by ISPs, etc. They were interested to hear what I've been learning recently about privacy in electronic health records.

EPIC has been remarkably effective over the years as an organization with about a dozen staff (mostly young and idealistic rather than canny and seasoned) and no cash-wielding lobbyists. They haven't compromised their principles in the dozen years I've been following them, yet most of the time they not only get to the table but also manage to bend the decision their way.

I attribute this success to single-mindedness (they can nail the privacy chink in any initiative), persistence, coalition-building with like-minded organizations (leading the Privacy Coalition, collaborating with London's Privacy International among other organizations around the world, and working closely with such natural allies as the ACLU), but mostly knowing their stuff cold. They sail into debate with a full understanding of technical details as well as the legal issues that impinge on their position.

The Smart Grid is an excellent example of how EPIC investigates an issue early in its existence and homes in on the dark underside. The Smart Grid is a buzzword covering changes that should save us huge amounts of electricity lost in old, inefficient switches, as well as improve the efficiency of energy delivery in neighborhoods. A key part of the Smart Grid is monitoring and logging our electricity usage, building by building and even machine by machine.

In this futuristic vision, the electric utility would know when you've started your air conditioner or clothes dryer and could send you messages suggesting new patterns of behavior that will relieve pressure on the grid and save you money as well. This is nice, but it also means the electric utility basically knows how you lead your life.

Traffic analysis on your device usage could show who stays home during the day, when kids come home from school, and who plays video games (heavy electricity usage from a home computer) late at night.
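To make the concern concrete, here is a minimal sketch of how little analysis it takes to read a household's routine out of interval meter data. Everything in it is an assumption for illustration: the readings are made up, the 0.10 kWh threshold is arbitrary, and the function names are hypothetical rather than taken from any utility's software.

```python
from datetime import datetime, time, timedelta

def sample_day(day):
    """Build 96 made-up 15-minute readings (timestamp, kWh) for one day."""
    readings = []
    for i in range(96):
        ts = day + timedelta(minutes=15 * i)
        kwh = 0.05                      # standby load: refrigerator, clocks
        if ts.hour == 7:
            kwh = 0.30                  # morning routine
        if time(15, 30) <= ts.time() < time(23, 0):
            kwh = 0.40                  # someone is home and using appliances
        if ts.hour == 23:
            kwh = 0.25                  # late-night computer use
        readings.append((ts, kwh))
    return readings

def quiet_hours(readings, threshold=0.10):
    """Hours of the day in which every interval stays at or below the threshold."""
    busy = {ts.hour for ts, kwh in readings if kwh > threshold}
    return [h for h in range(24) if h not in busy]

def first_active_after(readings, hour, threshold=0.10):
    """Timestamp of the first above-threshold interval at or after the given hour."""
    for ts, kwh in sorted(readings):
        if ts.hour >= hour and kwh > threshold:
            return ts
    return None

if __name__ == "__main__":
    day = sample_day(datetime(2010, 3, 15))
    print("Probably nobody home during hours:", quiet_hours(day))
    print("Afternoon activity starts at:", first_active_after(day, 12))
```

A few lines of arithmetic are enough to guess when the house is empty and when someone gets home, which is why the question of who holds the readings matters.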

Currently no one has discussed who controls this data. Implicitly, it is left in the hands of the utility, which is free to sell it like any other information. There is little doubt that advertisers would love to get their hands on this information. So would the government, I bet--remember when police were scanning homes for evidence of marijuana cultivation? EPIC would like the information to be in the hands of the consumer.

A bill just introduced by Representative Ed Markey, the "Electric Consumer Right to Know Act" (H. R. 4860), would inform electricity users of their energy usage in a form they could process on a computer or other device, typically every 15 minutes. The bill mandates a smart meter that "provides adequate protections for the security of such information and the privacy of such electric consumer." It doesn't go into any more detail about what the utility could do with the information.

The ambiguous ownership of Smart Grid data illustrates why privacy is such a hard turf to defend, once you have declared your jurisdiction over it as EPIC has done. Data flows from one place to another--whether from the electric meter to your cell phone, your camera to Facebook, or your vendor to your bank--and is therefore intrinsically shared. Privacy is an umbrella term that encompasses attempts to set limits or impose rules on all these types of sharing.

In trying to protect our privacy EPIC is swimming against the tide, of course, but what's really challenging is how data collection and dissemination have shifted. When EPIC started, most electronic data was held by large institutions that made ready targets for EPIC's legal challenges. Now each person is his or her own worst enemy, freely sharing personal information, pictures, and videos online--a phenomenon termed Little Brother.

Cameras and sensors are also creating millions of new sources for data, while advances in data mining and analysis allow people to learn more from the data than ever before.

I think EPIC is handling this shift well. They stay focused on policy rather than pursuing the idealistic but impractical course of training people to use privacy safeguards and protect themselves. There are just too many ways to weasel data out of us, some of which will never be under our control, and most people just can't learn everything they need to know to be safe, whether it be about Web proxies, Flash cookies, or document metadata.

EPIC demands that institutions take responsibility for privacy, designing it into their systems. A recent, well-publicized example of this doctrine was their complaint to the FTC about Facebook's changes to privacy settings in December 2009. EPIC doesn't believe it's enough to boast about flexibility and user control--something that endangers the 99.9% of users who don't understand how to change a default is a violation of users' rights.

But EPIC is neither rigid nor abstentionist. They may complain about Facebook, but maintain a Facebook page. They're totally into the new electronic age. But they want it to serve its users rather than a few centralized institutions, and for privacy advocates they're not shy about letting us know what they think.

Report from HIMSS Health IT conference: building or bypassing infrastructure

Andy Oram @praxagora 2010-03-05

Today the Healthcare Information and Management Systems Society (HIMSS) conference wrapped up. In previous blogs, I laid out the benefits of risk-taking in health care IT followed by my main theme, interoperability and openness. This blog will cover a few topics about a third important issue, infrastructure.

Why did I decide this topic was worth a blog? When physicians install electronic systems, they find that they need all kinds of underlying support. Backups and high availability, which might have been optional or haphazard before, now have to be professional. Your patient doesn't want to hear, "You need an antibiotic right away, but we'll order it tomorrow when our IT guy comes in to reboot the system." Your accounts manager would be almost as upset if you told her that billing will be delayed for the same reason.

Network bandwidth

An old sales pitch in the computer field (which I first heard at Apollo Computer in the 1980s) goes, "The network is the computer." In the coming age of EHRs, the network is the clinic. My family practitioner (in an office of five practitioners) had to install a T1 line when they installed an EHR. In eastern Massachusetts, whose soil probably holds more T1 lines than maple tree roots, that was no big deal. It's considerably more problematic in an isolated rural area where the bandwidth is more comparable to what I got in my hotel room during the conference (particularly after 10:30 at night, when I'm guessing a kid in a nearby room joined an MMPG). One provider from the Midwest told me that the incumbent charges $800 per month for a T1. Luckily, he found a cheaper alternative.

So the FCC is involved in health care now. Bandwidth is perhaps their main focus at the moment, and they're explicitly tasked with making sure rural providers are able to get high-speed connections. This is not a totally new concern; the landmark 1996 Telecom Act included rural health care providers in its universal service provisions. I heard one economist deride the provision, asking what was special about rural health care providers that they should get government funding. Nearly fifteen years later, I think rising health care costs and deteriorating lifestyles have answered that question.

Wireless hubs

The last meter is just as important as the rest of your network, and hospitals with modern, technology-soaked staff are depending increasingly on mobile devices. I chatted with the staff of a small wireless company called Aerohive that aims its products at hospitals. Its key features are:

Totally cable-free hubs
Not only do Aerohive's hubs communicate with your wireless endpoints, they communicate with other hubs and switches wirelessly. They just make the hub-to-endpoint traffic and hub-to-hub traffic share the bandwidth in the available 2.4 and 5 GHz ranges. This allows you to put them just about anywhere you want and move them easily.
Dynamic airtime scheduling
The normal 802.11 protocols share the bandwidth on a packet-by-packet basis, so a slow device can cause all the faster devices to go slower even when there is empty airtime. I was told that an 802.11n device can go slower than an 802.11b device if it's remote and its signal has to go around barriers. Aerohive just checks how fast packets are coming in and allocates bandwidth on that ratio, like time-division multiplexing. If your device is ten times faster than someone else's and the bandwidth is available, you can use ten times as much bandwidth.
Dynamic rerouting
Aerohive hubs use mesh networking and an algorithm somewhat like Spanning Tree Protocol to reconfigure the network when a hub is added or removed. Furthermore, when you authenticate with one hub, its neighbors store your access information so they can pick up your traffic without taking time to re-authenticate. This makes roaming easy and allows you to continue a conversation without a hitch if a hub goes down.
Security checking at the endpoint
Each hub has a built-in firewall so that no unauthorized device can attach to the network. This should be of interest in an open, public environment like a hospital where you have no idea who's coming in.
High bandwidth
The top-of-the-line hub has two MIMO radios, each with three directional antennae.
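
The arithmetic behind the dynamic airtime scheduling item above can be sketched in a few lines. This is a simplified model under stated assumptions, not Aerohive's actual algorithm: it ignores protocol overhead and retransmissions, the link rates are made-up numbers, and the function names are hypothetical. It compares sharing the channel packet by packet with sharing it by equal slices of airtime.

```python
def per_packet_throughput(rates_mbps):
    """Each station sends one equal-sized packet per round.

    Every station ends up with the same throughput, and the slowest
    link dominates the time each round takes.
    """
    time_per_round = sum(1.0 / rate for rate in rates_mbps)  # seconds to move 1 Mbit each
    return [1.0 / time_per_round for _ in rates_mbps]

def airtime_throughput(rates_mbps):
    """Each station gets an equal share of airtime.

    Throughput then scales with the station's own link rate.
    """
    share = 1.0 / len(rates_mbps)
    return [rate * share for rate in rates_mbps]

if __name__ == "__main__":
    rates = [54.0, 5.4]  # hypothetical fast and slow clients, in Mbit/s
    print("packet-by-packet:", per_packet_throughput(rates))
    print("equal airtime:   ", airtime_throughput(rates))
```

In this model the fast client is dragged down to roughly the slow client's speed under packet-by-packet sharing, while under equal airtime the client that is ten times faster gets ten times the throughput, which matches the behavior described to me.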

Go virtual, part 1

VMware has customers in health care, as in other industries. In addition, they've incorporated virtualization into several products from medical equipment and service vendors:

Radiology
Hospitals consider these critical devices. Virtualization here supports high availability.
Services
A transcription service could require ten servers. Virtualization can consolidate them onto one or two pieces of hardware.
Roaming desktops
Nurses often move from station to station. Desktop virtualization allows them to pull up the windows just as they were left on the previous workstation.

Go virtual, squared

If all this talk of bandwidth and servers brings pain to your head as well as to the bottom line, consider heading into the cloud. At one talk I attended today on cost analysis, a hospital administrator reported that about 20% of their costs went to server hosting. They saved a lot of money by rigorously eliminating unneeded backups, and a lot on air conditioning by arranging their servers more efficiently. Although she didn't discuss Software as a Service, those are a couple of examples of costs that could go down if functions were outsourced.

Lots of traditional vendors are providing their services over the Web so you don't have to install anything, and several companies at the conference are entirely Software as a Service. I mentioned Practice Fusion in my previous blog. At the conference, I asked them three key questions pertinent to Software as a Service.

Security
This is the biggest question clients ask when using all kinds of cloud services (although I think it's easier to solve than many other architectural issues). Practice Fusion runs on HIPAA-compliant Salesforce.com servers.
Data portability
If you don't like your service, can you get your data out? Practice Fusion hasn't had any customers ask for their data yet, but upon request they will produce a DVD containing your data in CSV files, or in other common formats, overnight.
Extendibility
As I explained in my previous blog, clients increasingly expect a service to be open to enhancements and third-party programs. Practice Fusion has an API in beta, and plans to offer a sandbox on their site for people to develop and play with extensions--which I consider really cool. One of the API's features is to enforce a notice to the clinician before transferring sensitive data.

The big selling point that first attracts providers to Practice Fusion is that it's cost-free. They support the service through ads, which users tell them are unobtrusive and useful. But you can also pay to turn off ads. The service now has 30,000 users and is adding about 100 each day.

Another SaaS company I mentioned in my previous blog is Covisint. Their service is broader than Practice Fusion, covering not only patient records but billing, prescription ordering, etc. Operating also as an HIE, they speed up access to data on patients by indexing all the data on each patient in the extended network. The actual data, for security and storage reasons, stays with the provider. But once you ask about a patient, the system can instantly tell you what sorts of data are available and hook you up with the providers for each data set.
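
The indexing approach can be illustrated with a small sketch of a record locator: the exchange stores only pointers saying which provider holds which kind of data, never the clinical data itself. This is my own illustration, not Covisint's design; the class, method names, and sample providers are all hypothetical.

```python
from collections import defaultdict

class RecordLocator:
    """Index of where a patient's data lives; the data stays with each provider."""

    def __init__(self):
        # patient_id -> set of (provider, data_type) pointers
        self._index = defaultdict(set)

    def register(self, patient_id, provider, data_type):
        """A provider announces that it holds this type of data for the patient."""
        self._index[patient_id].add((provider, data_type))

    def locate(self, patient_id):
        """Return {data_type: [providers]} without touching any clinical data."""
        found = defaultdict(list)
        for provider, data_type in sorted(self._index.get(patient_id, ())):
            found[data_type].append(provider)
        return dict(found)

if __name__ == "__main__":
    locator = RecordLocator()
    locator.register("patient-1", "Lakeside Clinic", "lab results")
    locator.register("patient-1", "City Hospital", "radiology")
    locator.register("patient-1", "City Hospital", "lab results")
    print(locator.locate("patient-1"))
```

A requester learns what exists and whom to ask; fetching the records is a separate step that each provider still controls.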

Finally, I talked to the managers of a nimble new company called CareCloud, which will start serving customers in early April. CareCloud, too, offers a range of services in patient health records, practice management, and revenue cycle management. It was built entirely on open source software--Ruby on Rails and a PostgreSQL database--while using Flex to build their snazzy interface, which can run in any browser (including the iPhone, thanks to Adobe's upcoming translation to native code). Their strategy is based on improving physicians' productivity and the overall patient experience through a social networking platform. The interface has endearing Web 2.0 style touches such as a news feed, SMS and email confirmations, and integration with Google Maps.

And with that reference to Google Maps (which, in my first blog, I complained about mislocating the address 285 International Blvd NW for the Georgia World Congress Center--thanks to the Google Local staff for getting in touch with me right after a tweet) I'll end my coverage of this year's HIMSS.

Report from HIMSS Health IT conference: toward interoperability and openness

Andy Oram @praxagora 2010-03-04

Yesterday and today I spent once again at the Healthcare Information and Management Systems Society (HIMSS) conference in Atlanta, rushing from panel session to vendor booth to interoperability demo and back (or forward--I'm not sure which direction I've been going). All these peregrinations involve a quest to find progress in the areas of interoperability and openness.

The U.S. has a mobile population, bringing their aches and pains to a plethora of institutions and small providers. That's why health care needs interoperability. Furthermore, despite superb medical research, we desperately need to share more information and crunch it in creative new ways. That's why health care needs openness.

My blog yesterday covered risk-taking; today I'll explore the reasons it's so hard to create change.

The health care information exchange architecture

Some of the vendors I talked to boasted of being in the field for 20 years. This gives them time to refine and build on their offerings, but it tends to reinforce approaches to building and selling software that were prominent in the 1980s. These guys certainly know what the rest of the computer field is doing, such as the Web, and they reflect the concerns for interoperability and openness in their own ways. I just feel that what I'm seeing is a kind of hybrid--more marsupial than mammal.

Information exchange in the health care field has evolved the following architecture:

Electronic medical systems and electronic record systems
These do all the heavy labor that makes health care IT work (or fail). They can be divided into many categories, ranging from the simple capturing of clinical observations to incredibly detailed templates listing patient symptoms and treatments. Billing and routine workflow (practice management) are other categories of electronic records that don't strictly speaking fall into the category of health records. Although each provider traditionally has had to buy computer systems to support the software and deal with all the issues of hosting it, Software as a Service has come along in solutions such as Practice Fusion.
Services and value-added applications
As with any complex software problem, nimble development firms partner with the big vendors or offer add-on tools to do what health care providers find too difficult to do on their own.
Health information exchanges (HIEs)
Eventually a patient has to see a specialist or transfer records to a hospital in another city--perhaps urgently. Partly due to a lack of planning, and partly due to privacy concerns and other particular issues caught up in health care, transfer is not as simple as querying Amazon.com or Google. So record transfer is a whole industry of its own. Some institutions can transfer records directly, while others have to use repositories--paper or electronic--maintained by states or other organizations in their geographic regions.
HIE software and Regional Health Information Organizations (RHIOs)
The demands of record exchange create a new information need that's filled by still more companies. States and public agencies have also weighed in with rules and standards through organizations called Regional Health Information Organizations.

Let's see how various companies and agencies fit into this complicated landscape. My first item covered a huge range of products that vendors don't like to have lumped together. Some vendors, such as the Vocera company I mentioned in yesterday's blog and 3M, offer products that capture clinicians' notes, which can be a job in itself, particularly through speech recognition. Emdeon covers billing, and adds validity checking to increase the provider's chances of getting reimbursed the first time they submit a bill. There are many activities in a doctor's office, and some vendors try to cover more than others.

Having captured huge amounts of data--symptoms, diagnoses, tests ordered, results of those tests, procedures performed, medicines ordered and administered--these systems face their first data exchange challenge: retrieving information about conditions and medicines that may make a critical difference to care. For instance, I saw a cool demo at the booth of Epic, one of the leading health record companies. A doctor ordered a diuretic that has the side-effect of lowering potassium levels. So Epic's screen automatically brought up the patient's history of potassium levels along with information about the diuretic.

Since no physician can keep all the side-effects and interactions between drugs in his head, most subscribe to databases that keep track of such things; the most popular company that provides this data is First DataBank. Health record systems simply integrate the information into their user interfaces. As I've heard repeatedly at this conference, the timing and delivery of information are just as important as having the information; the data is not of much value if a clinician or patient has to think about it and go searching for it. And such support is central to the HITECH act's meaningful use criteria, mentioned in yesterday's blog.

So I asked the Epic rep how this information got into the system. When the physicians sign up for the databases, the data is sent in simple CSV files or other text formats. Although different databases are formatted in different ways, the health record vendor can easily read it in and set up a system to handle updates.
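
As a rough sketch of that import path, the following reads a drug-information feed from CSV and uses it to decide which lab history to show when a drug is ordered. The column names, the placeholder drug names, and the functions are all assumptions made for illustration; they don't reflect Epic's or First DataBank's actual formats.

```python
import csv
import io

# Hypothetical feed; a real one would arrive as a file from the data supplier.
DRUG_FEED = """drug,side_effect,lab_to_review
diuretic-x,lowers potassium,potassium
drug-y,placeholder side effect,glucose
"""

def load_drug_table(text):
    """Parse the CSV feed into {drug: [rows]} so updates can simply be reloaded."""
    table = {}
    for row in csv.DictReader(io.StringIO(text)):
        table.setdefault(row["drug"], []).append(row)
    return table

def context_for_order(drug, table, lab_history):
    """Collect the side-effect notes and lab history worth showing for an order."""
    context = []
    for row in table.get(drug, []):
        lab = row["lab_to_review"]
        context.append({
            "side_effect": row["side_effect"],
            "lab": lab,
            "history": lab_history.get(lab, []),
        })
    return context

if __name__ == "__main__":
    table = load_drug_table(DRUG_FEED)
    history = {"potassium": [4.1, 3.9, 3.6]}  # made-up values
    print(context_for_order("diuretic-x", table, history))
```

The lookup happens at the moment of ordering, which is the point made earlier about timing: the clinician doesn't have to go searching.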

Variations on this theme turn up with other vendors. For instance, NextGen Healthcare contracts directly with First DataBank so they can integrate the data intimately with NextGen's screens and database.

So where does First DataBank get this data? They employ about 40 doctors to study available literature, including drug manufacturers' information and medical journals. This leads to a constantly updated, independent, reliable source for doses, side-effects, contraindications, etc.

This leads to an interesting case of data validity. Like any researchers--myself writing this blog, for instance--First DataBank could theoretically make a mistake. Their printed publications include disclaimers, and they require the companies who license the data to reprint the disclaimers in their own literature. But of course, the disclaimer does not pop up on every dialog box the doctor views while using the product. Caveat emptor...

Still, decision support as a data import problem is fairly well solved. When health record systems communicate with each other, however, things are not so simple.

The challenges in health information exchange: identification

When a patient visits another provider who wants to see her records, the first issue the system must face is identifying the patient at the other provider. Many countries have universal IDs, and therefore unique identifiers that can be used to retrieve information on a person wherever she goes, but the United States public finds such forms of control anathema (remember the push-back over Real ID?). There are costs to restraining the information state: in this case, the hospital you visit during a health crisis may have trouble figuring out which patient at your other providers is really you.

HIEs solve the problem by matching information such as name, birth date, age, gender, and even cell phone number. One proponent of the federal government's Nationwide Health Information Network told me it can look for up to 19 fields of personal information to make a match. False positives are effectively eliminated by strict matching rules, but legitimate records may be missed.
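
The trade-off can be seen in a minimal sketch of strict demographic matching. The four fields, the normalization rule, and the sample records are assumptions chosen for illustration; a real exchange would compare many more fields and use more sophisticated rules.

```python
MATCH_FIELDS = ("last_name", "first_name", "birth_date", "gender")

def normalize(value):
    """Lowercase and strip whitespace so trivial formatting differences don't matter."""
    return "" if value is None else "".join(str(value).lower().split())

def strict_match(a, b, fields=MATCH_FIELDS):
    """Match only if every field is present and identical after normalization."""
    for field in fields:
        left, right = normalize(a.get(field)), normalize(b.get(field))
        if not left or left != right:
            return False
    return True

if __name__ == "__main__":
    hospital = {"last_name": "Rivera", "first_name": "Katherine",
                "birth_date": "1970-01-02", "gender": "F"}
    clinic = dict(hospital, last_name="RIVERA", first_name=" katherine ")
    pharmacy = dict(hospital, first_name="Kathy")
    print(strict_match(hospital, clinic))    # formatting differences only
    print(strict_match(hospital, pharmacy))  # same person, different spelling
```

The clinic record matches because only case and spacing differ. The pharmacy record belongs to the same person but is rejected, which is exactly the kind of legitimate record that strict rules can miss.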

Another issue HIEs face is obtaining authorization for health data, which is the most sensitive data that usually concerns ordinary people. When requesting data from another provider, the clinician has to log in securely and then offer information not only about who he is but why he needs the data. The sender, for many reasons, may say no:

  • Someone identified as a VIP, such as a movie star or high-ranking politician, is automatically protected from requests for information.
  • Some types of medical information, such as HIV status, are considered especially sensitive and treated with more care.
  • The state of California allows ordinary individuals to restrict the distribution of information at the granularity of a single institution or even a single clinician, and other states are likely to do the same.

Thus, each clinician needs to register with the HIE that transmits the data, and accompany each request with a personal identifier as well as the type of information requested and the purpose. One service I talked to, Covisint, can query the AMA if necessary to verify the unique number assigned to each physician in the U.S., the Drug Enforcement Administration (DEA) number. (This is not the intended use of a DEA number, of course; it was created to control the spread of pharmaceuticals, not data.)
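
Put together, the checks described above amount to a small decision procedure that the sender runs on each request. The sketch below is hypothetical throughout: the field names, the rule order, and the outcome strings are my own assumptions, not any HIE's actual policy engine.

```python
from dataclasses import dataclass, field

@dataclass
class Patient:
    is_vip: bool = False
    blocked_institutions: set = field(default_factory=set)
    blocked_clinicians: set = field(default_factory=set)

@dataclass
class Request:
    clinician_id: str
    institution: str
    data_type: str
    purpose: str

SENSITIVE_TYPES = {"hiv_status"}  # example of a specially protected category

def decide(request, patient, registered_clinicians):
    """Return 'allow', 'review', or a 'deny: ...' reason for one data request."""
    if request.clinician_id not in registered_clinicians:
        return "deny: clinician not registered with the exchange"
    if not request.purpose.strip():
        return "deny: no purpose stated"
    if patient.is_vip:
        return "deny: patient is protected"
    if (request.institution in patient.blocked_institutions
            or request.clinician_id in patient.blocked_clinicians):
        return "deny: restricted by the patient"
    if request.data_type in SENSITIVE_TYPES:
        return "review"  # sensitive data gets extra care before release
    return "allow"

if __name__ == "__main__":
    registered = {"dr-100"}
    req = Request("dr-100", "City Hospital", "medications", "emergency treatment")
    print(decide(req, Patient(), registered))
    print(decide(req, Patient(blocked_institutions={"City Hospital"}), registered))
```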

One of the positive impacts of all this identification is that some systems can retrieve information about patients from a variety of hospitals, labs, pharmacies, and clinics even if the requester doesn't know where it is. It's still up to the institutions holding the data to determine whether to send it to the requester. Currently, providers exchange a Data Use and Reciprocal Support Agreement (DURSA) to promise that information will be stored properly and used only for the agreed-on purpose. Exchanging these documents is currently cumbersome, and I've been told the government is looking for a way to standardize the agreement so the providers don't need to directly communicate.

The challenges in health information exchange: format

Let's suppose we're at the point where the owner of the record has decided to send it to the requester. Despite the reverence expressed by vendors for HL7 and other standards with which the health care field is rife, documents require a good deal of translation before they can be incorporated into the receiving system. Each vendor presents a slightly different challenge, so to connect n different products a vendor has to implement n² different transformations.
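
To put numbers on that growth, here is a tiny sketch. It counts one translator per ordered pair of distinct products, which is n(n-1) and so roughly n squared, and compares it with a design where every product translates to and from one shared format. The shared format is my assumption for the sake of comparison, not something the vendors described.

```python
def point_to_point(n):
    """One translator for each ordered pair of distinct products: n * (n - 1)."""
    return n * (n - 1)

def via_shared_format(n):
    """Each product converts to and from a single common format: 2 * n."""
    return 2 * n

if __name__ == "__main__":
    for n in (3, 10, 50):
        print(n, point_to_point(n), via_shared_format(n))
```

With ten products the pairwise approach needs 90 translators against 20 for a shared format, and the gap widens quickly as products are added.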

Reasons for this lack of interoperability lie at many levels:

Lack of adherence to standards
Many vendors created their initial offerings before applicable standards existed, and haven't yet upgraded to the standards or still offer new features not covered by standards. The meaningful use criteria discussed in yesterday's blog will accelerate the move to standards.
Fuzzy standards
Like many standards, the ones that are common in the medical field leave details unspecified.
Problems that lie out of scope
The standards tend to cover the easiest aspect of data exchange, the document's format. As an indication of the problem, the 7 in HL7 refers to the seventh (application) layer of the ISO OSI model. Brian Behlendorf of Apache fame, now consulting with the federal government to implement the NHIN, offers the following analogy. "Suppose that we created the Internet by standardizing HTML and CSS but saying nothing about TCP/IP and DNS."
Complex standards
As in other fields, the standards that work best in health records are simple ones. There is currently a debate, for instance, over whether to use the CCR or CCD exchange format for patient data. The trade-off seems to be that the newer CCD is richer and more flexible but a lot harder to support.
Misuse
As one example, the University of Pittsburgh Medical Center tried to harmonize its problem lists and found that a huge number of patients--including many men--were coded as smoking during pregnancy. They should have been coded with a general tobacco disorder. As Dr. William Hogan said, "People have an amazing ability to make a standard do what it's not meant to do, even when it's highly specified and constrained."
So many to choose from
Dell/Perot manager Jack Wankowski told me that even though other countries have digitized their health records far more than the U.S. has, they have a lot fewer published standards. It might seem logical to share standards--given that people are people everywhere--but in fact, that's hard to do because diagnosis and treatment are a lot different in different cultures. Wankowski says, "Unlike other industries such as manufacturing and financial services, where a lot can be replicated, health care is very individual on a country by country basis at the moment. Because of this, change is a lot slower."
Encumbrances
The UPMC coded its problem lists in ICD-9-CM instead of SNOMED, even though SNOMED was far superior in specificity and clarity. Along with historical reasons, they avoided SNOMED because it was a licensed product until 2003 whereas ICD-9-CM was free. As for ICD-9-CM, its official standard is distributed as RTF documents, making correct adoption difficult.

Here are a few examples of how vendors told me they handle interoperability.

InterSystems is a major player in health care. The basis of their offerings is Caché, an object database written in the classic programming language for medical information processing, MUMPS. (MUMPS was also standardized by an ANSI committee under the name M.) Caché can be found in all major hospitals. For data exchange, InterSystems provides an HIE called HealthShare, which they claim can communicate with other vendors' systems by supporting HL7 and other appropriate standards. HealthShare is both communications software and an actual hub that can create the connections for customers.

Medicity is another key HIE vendor. Providers can set up their own hubs or contract with a server set up by Medicity in their geographic area. Having a hub means that a small practice can register just once with the hub and then communicate with all other providers in that region.

Let's turn again to Epic. Two facilities that use it can exchange a wide range of data, including data that is not covered by standards. A facility that uses another product can exchange a narrower set of data with an Epic system over Care Everywhere, using the standards. The Epic rep said they will move more and more fields into Care Everywhere as standards evolve.

What all this comes down to is an enormous redundant infrastructure that adds no value to electronic records, but merely runs a Red Queen's Race to provide the value that already exists in those records. We've already seen that defining more standards has a limited impact on the problem. But a lot of programmers at this point will claim the solution lies in open source, so let's see what's happening in that area.

The open source challengers

The previous sections, like acts of a play, laid out the character of the vendors in the health care space as earnest, hard-working, and sometimes brilliantly accomplished, but ultimately stumbling through a plot whose bad turns overwhelm them. In the current act we turn to a new character, one who is not so well known nor so well tested, one who has shown promise on other stages but is still finding her footing on our proscenium.

The best-known open source projects in health care are OpenMRS, the Veterans Administration's VistA, and the NHIN CONNECT Gateway. I won't say anything more about OpenMRS because it has received high praise but has made few inroads into American health care. I'll devote a few paragraphs to the strengths and weaknesses of VistA and CONNECT.

Buzz in the medical world is that VistA beats commercial offerings for usability and a general fit to the clinicians' needs. But it's tailored to the Veterans Administration and--as a rep for vxVistA put it--has to be deveteranized for general use. This is what vxVistA does, but they are not open source. They make changes to the core and contribute them back, but their own products are proprietary. A community project called WorldVistA also works on a version of VistA for the non-government sector.

One of the hurdles of adapting VistA is that one has to learn its underlying language, MUMPS. Most people who dive in license a MUMPS compiler. The vxVistA rep knows of no significant users of the free software MUMPS compiler GT.M. VistA also runs on the Caché database, mentioned earlier in this article. If you don't want to license Caché from InterSystems, you need to find some other database solution.

So while VistA is a bona fide open source project with a community, its ecosystem does not fit neatly with the habits of most free software developers.

CONNECT is championed by the same Office of the National Coordinator for Health Information Technology that is implementing the HITECH recovery plan and meaningful use. A means for authenticating requests and sending patient data between providers, CONNECT may well be emerging as the HIE solution for our age. But it has some maturing to do as well. It uses a SOAP-based protocol that requires knowledge of typical SOA-based technologies such as SAML.

Two free software companies that have entered the field to make installing CONNECT easier are Axial Exchange, which creates open source libraries and tools to work with the system, and the Mirth Corporation. Jon Teichrow of Mirth told me how a typical CONNECT setup at a rural hospital took just a week to complete, and can run for the cost of just a couple hours of support time per week. The complexities of handling CONNECT that make so many people tremulous, he said, were actually much easier for Mirth than the more typical problem of interpreting the hospital's idiosyncratic data formats.

Just last week, the government announced a simpler interface to the NHIN called NHIN Direct. Hopefully, this will bring in a new level of providers who couldn't afford the costs of negotiating with CONNECT.

CONNECT has certainly built up an active community. Agilex employee Scott E. Borst, who is responsible for a good deal of the testing of CONNECT, tells me that participation in development, testing, and online discussion is intense, and that two people were recently approved as committers without being associated with any company or government agency officially affiliated with CONNECT.

The community is willing to stand up for itself, too. Borst says that when CONNECT was made open source last year, it came with a Sun-based development environment including such components as NetBeans and GlassFish. Many community members wanted to work on CONNECT using other popular free software tools. Accommodating them was tough at first, but the project leaders listened to them and ended up with a much more flexible environment where contributors could use essentially any tools that struck their fancy.

Buried in a major announcement yesterday about certification for meaningful use was an endorsement by the Office of the National Coordinator for open source. My colleague and fellow blogger Brian Ahier points out that rule 4 for certification programs explicitly mentions open source as well as self-developed solutions. This will not magically lead to more open source electronic health record systems like OpenMRS, but it offers an optimistic assessment that they will emerge and will reach maturity.

As I mentioned earlier, traditional vendors are moving more toward openness in the form of APIs that offer their products as platforms. InterSystems does this with a SOAP-based interface called Ensemble, for instance. Eclipsys, offering its own SOAP-based interface called Helios, claims that they want an app store on top of their product--and that they will not kick off applications that compete with their own.

Web-based Practice Fusion has an API in beta, and is also planning an innovation that makes me really excited: a sandbox provided by their web site where developers can work on extensions without having to download and install software.

But to a long-time observer such as Dr. Adrian Gropper, founder of the MedCommons storage service, true open source is the only way forward for health care records. He says we need to replace all those SOAP and WS-* standards with RESTful interfaces, perform authentication over OpenID and OAuth, and use the simplest possible formats. And only an enlightenment among the major users--the health care providers--will bring about the revolution.
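To make Gropper's prescription concrete, here is a minimal sketch of what a RESTful, token-authorized request for a patient record might look like. Everything specific in it is an assumption for illustration: the host records.example.org, the URL layout, the token value, and the JSON fields are hypothetical and do not describe MedCommons or any real service. The bearer-token header is just one common way to carry an OAuth-style credential.

```python
import json
from urllib.request import Request

# Hypothetical endpoint and token, invented for this sketch.
BASE_URL = "https://records.example.org"
ACCESS_TOKEN = "example-oauth-token"


def build_record_request(patient_id: str) -> Request:
    """Build (but do not send) a GET request for one patient's record."""
    return Request(
        url=f"{BASE_URL}/patients/{patient_id}/record",
        headers={
            # The credential travels in a single header; no SOAP envelope.
            "Authorization": f"Bearer {ACCESS_TOKEN}",
            "Accept": "application/json",
        },
        method="GET",
    )


def parse_record(body: str) -> dict:
    """Decode a JSON response body into a plain dictionary."""
    return json.loads(body)


request = build_record_request("12345")

# A made-up response body, standing in for what a server might return.
sample_body = '{"patient_id": "12345", "allergies": ["penicillin"]}'
record = parse_record(sample_body)
```

The point of the sketch is how little machinery is involved: one URL per resource, one header for authorization, and a format any scripting language can parse.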

But at this point in the play, having explored the characters of electronic record vendors and the open source community, we need to round out the drama by introducing yet a third character: the patient. Gropper's MedCommons is a patient-centered service, and thus part of a movement that may bring us openness sooner than OpenMRS, VistA, or CONNECT.

Enter the patient

Most people are familiar with Microsoft's HealthVault and Google Health. Both allow patients to enter data about their own health, and provide APIs that individuals and companies alike are using to provide services. A Journal of Participatory Medicine has just been launched, reflecting the growth of interest in patient-centered or participatory medicine. I saw a book on the subject by HIMSS itself in the conference bookstore.

The promise of personal health records goes far beyond keeping track of data. Like electronic records in clinicians' hands, the data will just be fodder for services with incredible potential to improve health. In a lively session given today by Patricia Brennan of Project HealthDesign, she used the metaphors of "intelligent medicines" and "smart Band-Aids" that reduce errors and help patients follow directions.

Project HealthDesign's research has injected a dose of realism into our understanding of the doctor-patient relationship. For instance, they learned that we can't expect patients to share everything with their doctors. They get embarrassed when they lapse in their behavior, and don't want to admit they take extra medications or do other things not recommended by doctors. So patient-centered health should focus on delivering information so patients can independently evaluate what they're doing.

As critical patient data becomes distributed among a hundred million individual records, instead of being concentrated in the hands of providers, simple formats and frictionless data exchange will emerge to handle them. Electronic record vendors will adapt or die. And a whole generation of products--as well as users--will grow up with no experience of anything but completely open, interoperable systems.

Report from HIMSS Health IT conference: from Silicon Valley technology to Silicon Valley risk-taking

Andy Oram @praxagora 2010-03-02

I'm in Atlanta for the biggest US conference in health care IT, run by the Healthcare Information and Management Systems Society (HIMSS). This organization, along with the branch of the federal government responsible for disbursing funds for a medical records overhaul, has to do a huge job in an extremely short time. I'll report what I hear (and how I interpret it) over the next few days, aiming both at people who care in general about the future of health care and particularly at readers who are wondering whether their next career move may be into health care.

Although many people have been saying that the medical field would benefit from a Silicon Valley approach to technology, it's coming to seem that even more important would be a Silicon Valley approach to risk-taking. I'll look at the events that created this imperative.

Where the pressure comes from

Why are most doctors in the U.S., some thirty years after IT became ubiquitous in American offices, still working with paper records? The main reason is that they work in small offices instead of large institutions, as I describe in an earlier blog about electronic health records. Most economically advanced nations have centralized, government-administered health care systems, and therefore electronic records. Large institutions in the U.S. also have them--the Veterans Administration's VistA is a famous example--and more hospitals have made the move than small physicians' offices.

A crisis in costs and achievements provides a nice impetus for change (an article in last Sunday's New York Times lays out the stakes), but cold cash does even better. Few things can make an industry perk up as much as a sudden infusion of twenty billion dollars. Thus the impact of a provision in the federal stimulus bill (properly known as the American Recovery and Reinvestment Act of 2009 or ARRA) that mandates the adoption of electronic medical records.

The framers and implementers of the stimulus bill were both ambitious and idealistic, but they weren't naive. They know that the adoption of electronic health records, like any computer system, can be botched and can turn out to miss the benefits that it's supposed to bring. So in this part of the act, called Health Information Technology for Economic and Clinical Health (HITECH), they lay out a demanding list of practices that clinics and hospitals must carry out to qualify for government money. In other words, HITECH is really about behavior and workflow, not technology.

What does HITECH call for? Some requirements seem fairly easy to meet, such as tracking key clinical conditions on patients. Others get quite complex, even at Stage 1 of the implementation. For instance, doctors are supposed to use their electronic systems to help check their treatment plans for errors and suggest best practices, a field called clinical decision support. Stages 2 and 3 require complex data exchanges with other organizations, which in turn requires interoperation among different systems from different vendors. I'll return to interoperability in a later blog. Other rules include evidence-based order sets (which bring up suggested treatment regimes based on research) and reporting treatments and results to registries to foster the further development of best practices.

All these requirements are put forward under the buzzword meaningful use, an ironic choice given that the requirements aren't even completely defined yet. Still, whether it's meaningful or not, the term has instantly leapt to the forefront of discussion among vendors and providers, because of the financial rewards attached to it.

In principle, the meaningful use criteria are good. The government bodies creating the requirements got stakeholders involved early and often. It's well understood that these things are needed to improve care and reduce inefficiency. As in any major attempt at social change, doom-sayers predict disaster. (I'm not necessarily saying they're wrong; in this case they include CIO Anthony Guerra and IT company CTO Evan Steele, both featured at the HIMSS conference.) But for the most part, the health care industry is lined up behind meaningful use--or at least behind the promise of the money that health care providers will be paid to implement it. Grumbling among vendors and practitioners focuses on the timetable for implementation, not on the practices themselves.

If you've gotten the impression from this summary that Congress and the Administration have bypassed all the political bickering around health care reform bills and implemented it under the guise of a financial recovery, I'd say you're right. HITECH doesn't directly address the health care industry or other controversial issues such as how to pay doctors. And those issues still need to be resolved. But HITECH does try to reform the health care system around better practices.

So the HIMSS conference is taking place at the height of a suspenseful moment in U.S. health care history. The Administration has released proposed final rules, but they're in the middle of a 60-day comment period. Meanwhile, working from the drafts that have been released all along, vendors are feverishly bringing their tools into conformance and claiming (how could they not?) that the tools will be ready soon for adoption.

The pressures extend to other players all throughout the health care industry. A certification body called the Certification Commission for Health Information Technology (CCHIT) is designing certifications for vendors' systems as well as the hospitals and individual providers who adopt them, matching the meaningful use criteria "no more and no less," as said today by the outgoing CCHIT chair, Dr. Mark Leavitt. The pressure will then be on the providers--and here is where I'm seeing the most resistance.

Suspicion and silos

Given that many clinicians never adopted electronic systems, and others who did regretted doing so, we don't have to be surprised to hear that some don't think it will work or don't believe that they can make the change.

It would seem that health care IT is hot right now. HIMSS is sprawled across three buildings in the Georgia World Congress Center (whose address completely confuses Google Local, by the way). Getting from one place to another between sessions means forcing my way through hundreds and hundreds of attendees. To walk from one corner of one show floor to the other would take several minutes. I picked up half a dozen magazines on health care IT.

But even here--among people who paid to attend a health care IT conference--dissent can be felt. There's a lot of anger at electronic systems and their vendors, complaints ranging from high costs and inflexible templates to user interfaces that slow down busy staff and problems with data exchange. A few observers claim that HIMSS and CCHIT are just vendor-controlled consortia who want to milk providers and walk away with government money. I must say, though, that debate here ranges across many points of view. I was impressed to see an HIMSS book on patient-centered records, which--if made the basis of health care--would produce a bigger revolution than anything discussed so far. (I'll explain why in another blog this week.)

Although attendees want to make the move to electronic records, many talk about other people who won't. And a typical session on clinical decision support was devoted, not to ways of using electronic medical systems, but to persuading the attendees that electronic medical systems would be worthwhile.

The timetable does seem like a forced march. Even though the systems that meet the meaningful use criteria are still under development--and so is the certification--providers will be rewarded as early as 2011 for installing them and using them heavily. (The records have to be used for 80% of some practices in order to get the money.) And each year that providers wait before meeting the meaningful use criteria, they get less money. A stick also accompanies the carrot; providers that accept Medicare and Medicaid will actually be penalized if they don't demonstrate meaningful use.

Furthermore, implementing the rules will require the hiring of more IT staff and telling clinical staff to take time to serve on committees. These considerations contributed to a declaration by a consortium of Chief Medical Officers that the timetable was too aggressive. Some Congressmen have recently made the same request, so you know someone has been talking to them.

But many others in the field--including the vendors, confident in their ability to deliver, and some hospital managers in the forefront of implementing electronic records--urge the government to stay the course. Their attitude is that the need is great (because health care costs are rising so precipitously), the schedule is demanding but still feasible, and "if not now, when?"

I stated earlier that HITECH was more about behavior and workflow than technology. The push to implement the meaningful use rules brings this to the fore. Old silos between IT, doctors, and other staff won't work; neither will silos between doctors, billing, labs, and other departments.

Providers trying to achieve meaningful use must talk to their staff: not just doctors, but also nurses, technicians, and anyone else who touches a record. They have to examine their workflows and be willing to admit when they don't conform to health care standards or are inefficient. They have to make some of the same mistakes offices and factories made when they computerized in the 1980s, and learn from those mistakes. To some extent, implementing an electronic system is a bottom-up activity.

That's why I say that a Silicon Valley approach to risk-taking is even more important for this field than a Silicon Valley approach to technology. I'm not so concerned with the famous Silicon Valley tolerance for failure. Health care is not a social network, and failure there has serious consequences. I'm more interested in a Silicon Valley willingness to cross organizational boundaries and to encourage people's opinions on things where other people are the recognized experts.

Overall, I don't think the money offered by HITECH will really drive the decision to change. I think providers will move as they hear of others running awesome applications to make life easier--and save lives. I've stopped using the phrase "killer app" in the health care field for obvious reasons, but the standards, protocols, and storage mechanisms won't have much impact until applications follow.

So I'll end with a nod toward a company for which I have a fond spot because it happens to be the first health care company I talked about in a blog, over seven years ago. Vocera is still going strong, providing mobile devices with health care applications to over 600 sites. Medical staff can issue orders, call for help, or scan medications using these tiny clip-on devices.

Although Vocera doesn't work on the immensely popular iPhone, it does have partnerships with the makers of several other handhelds, including the Blackberry and one from Motorola. It collects scads of statistics about things such as the number of contacts made and the success of speech recognition software, to help sites judge its effectiveness. I find it an example of the kind of product that will drive electronic medical systems, because it will please not only a Chief Medical Officer, regulator, or insurance claim processor, but someone doing clinical work on the floor of the hospital.

NoSQL conference coming to Boston

Andy Oram @praxagora 2010-02-24

On March 11 Boston will join several other cities that have hosted conferences on the movement broadly known as NoSQL. Cassandra, CouchDB, HBase, HypergraphDB, Hypertable, Memcached, MongoDB, Neo4j, Riak, SimpleDB, Voldemort, and probably other projects as well will be represented at the one-day affair.

It's generally understood that characterizing a movement by what it's not is awkward, and it's hard to find an elevator speech to encompass all the topics of NoSQL Boston. Are these tools for "big data" problems? Usually, but sometimes even small web sites can find them useful. Are the tools meant for processing streams such as log files? Sometimes, but they can be useful for other text and data processing as well. And do they reject relational principles? Well, so you'd think--but different ones reject different principles, so even there it's hard to find commonality. (I compared them to relational databases in a blog last year.)

The interviews I had with various project leaders for this article turned up a recurring usage pattern for NoSQL. I was seeking particular domains or types of data where the tools would be useful, but couldn't see much commonality. What connects the users is that they carry out web-related data crunching, searching, and other Web 2.0 related work. I think these companies use NoSQL tools because they're the companies who understand leading-edge technologies and are willing to take risks in those areas. As the field gets better known, usage will spread.

I had a talk last week with conference organizer Eliot Horowitz, who is the founder and CTO of 10gen, the company that makes MongoDB. He let me know that the conference plans to bypass the head-scratching and launch into practical applications. The day will contain a coding session and a schema design session along with keynotes.

The resilience of open source

One question that intrigues me is why all the offerings in the NoSQL area are open source. Some have commercial add-ons, but the core technology is provided as free software. The few proprietary products and services in the market (such as Citrusleaf) get far less attention. Reasons seem to include:

  • The market is currently too small. Just as most computing innovations start off in research settings, this one is being explored by people looking for solutions to their own problems, more than ways to extract a profit. Numerous in-house projects exist in this space that are not free software (Google's Map/Reduce and BigTable, for instance, and Amazon's SimpleDB and Dynamo) but they aren't commercialized either.
  • Experimentation is moving too fast. Most of the projects are just a couple years old, and are rapidly adding features.
  • The ROI is hard to calculate. Horowitz says, "People won't pay for anything they don't really understand yet." (Nevertheless, 10gen and other companies are commercializing the open source offerings.)
  • Whatever problem an organization is trying to solve, each NoSQL offering tends to be a piece of the solution. It has to be tuned for and integrated into the organization's architecture, and combined with pieces from other places.

The projects in this conference therefore demonstrate the innovative power of free software. CouchDB and Cassandra are particularly interesting in this regard because they are community efforts more than corporate efforts. Both are Apache top-level projects. (Cassandra was just moved from the incubator to a top-level project on February 17.) CouchDB committer J. Chris Anderson tells me that the Apache community process ensures a wide range of voices are heard, leading to (of course) occasional public wrangling but a superior outcome.

The BBC and (according to Anderson) SXSW are among the users of CouchDB. CouchDB has been integrated into Ubuntu, Mozilla Messaging is basing Raindrop (their next-generation messaging platform) on CouchDB, and even mobile handset manufacturers are looking at it. (O'Reilly Media also uses CouchDB.)

I also talked to Alan Hoffman of Cloudant, which offers a CouchDB cloud service that fills in some of the gaps left by bare CouchDB (consistent hashing, partitioning, quorum, etc.). Although a couple companies offer commercial support, no single company takes responsibility for CouchDB. Its community is highly distributed. Anderson listed 10 Apache committers working for 8 different companies, and nearly 40 other people who contribute patches. Support takes place on mailing lists (roughly one thousand messages a month) and IRC channels.
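For readers who haven't met the terms in that parenthesis, here is a minimal sketch of consistent hashing with replica placement and a majority quorum. It is a generic illustration under my own assumptions, not Cloudant's or CouchDB's implementation: the class name HashRing, the node names, and the defaults of 3 replicas and 64 virtual nodes per server are all invented for the example.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring; names and defaults are illustrative only."""

    def __init__(self, nodes, replicas=3, vnodes=64):
        self.replicas = replicas
        # Each node appears at many points ("virtual nodes") on the ring
        # so that keys spread evenly across nodes.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def nodes_for(self, key):
        """Walk clockwise from the key's position, collecting distinct nodes."""
        start = bisect.bisect(self._points, self._hash(key))
        found = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in found:
                found.append(node)
            if len(found) == self.replicas:
                break
        return found


def quorum(replicas):
    """Smallest number of replicas that forms a majority."""
    return replicas // 2 + 1


ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
owners = ring.nodes_for("doc-1")
print(owners, "need", quorum(len(owners)), "acknowledgements")
```

The useful property is that removing a node that does not hold a given key leaves that key's placement untouched, so only a fraction of the data moves when the cluster changes size.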

Jonathan Ellis, project chair of Cassandra, calls it an "open source success story" because it went from a state of near petrification to vibrant regrowth through open sourcing. Facebook invented it and brought it to a state where it satisfied their needs. They made it open and moved it into the Apache Incubator in 2008 but declared that they would not be doing further development. It could easily have receded into obscurity.

Ellis says that he was hired at Rackspace and asked to find a distributed data store that was fast and scaled easily; he decided on Cassandra. Soon after he became a public and enthusiastic advocate, Digg and Twitter joined Rackspace as users and developers. Having multiple QA teams test each release--particularly in very different environments--helps quality immensely. Ellis finds that Eric Raymond's "many eyes" characterization of open source bug fixing applies.

Although Cassandra is found mostly as a backing store for web sites with a lot of users, Ellis thinks it would meet the needs of many academic and commercial sites, and looks forward to someone offering a cloud service based on it.

Justin Sheehy, CTO of Basho, maker of the Riak data store, told me they can confirm the typical advantages cited for open source. Developers at potential customer sites can try out the software without going through a bureaucratic procurement process, and then become internal advocates who function much more effectively than outside salespeople.

He also says that companies such as Basho offer the best of both worlds to tentative customers. The backing of a corporation means that professional services and added tools are available to go along with the product those customers buy. But because the source is open and has a community around it, those customers can feel secure that development and support will continue regardless of the fate of the originating company. 10gen, of course, plays a similar role for MongoDB and Anderson's company Couchio offers support for CouchDB. For projects that are not closely associated with the backing of one company, the Apache Foundation's sponsorship helps to ensure continuity.

What are the fault lines in the NoSQL landscape?

Naturally, the projects I've mentioned in this blog borrow ideas from each other and show tiny variations on common solutions regarding such things as B-tree storage, replication, solutions to locality of reference, etc. Experience will eventually lead to a shake-out and a convergence among surviving projects. In the meanwhile, how can you get your head around them?

We'll pause here for a word from our sponsors, letting you know that O'Reilly has published books on CouchDB and Hadoop and is developing one about MongoDB.

Horowitz offers an initial subdivision of projects based on data model (document, key-value, or tabular), a theme he explored in another interview.
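A rough way to see the difference among those three data models is to store the same fact in each shape. The sketch below uses plain Python dictionaries as stand-ins; the record, the field names, and the helper functions are assumptions made up for illustration and do not reflect the API of any project mentioned here.

```python
import json

# The same fact (user 42 is Ada, who lives in Boston) in three shapes.

# Key-value: one key, one opaque value that only the application interprets.
key_value = {"user:42": json.dumps({"name": "Ada", "city": "Boston"})}

# Document: the store understands nested fields and can filter on them.
documents = [
    {"_id": 42, "name": "Ada", "address": {"city": "Boston"}},
    {"_id": 43, "name": "Grace", "address": {"city": "Atlanta"}},
]

# Tabular: row key -> column family -> column -> value.
tabular = {
    "42": {"profile": {"name": "Ada"}, "location": {"city": "Boston"}},
    "43": {"profile": {"name": "Grace"}, "location": {"city": "Atlanta"}},
}


def kv_lookup(key):
    """Key-value access: fetch by exact key, then decode in the application."""
    return json.loads(key_value[key])


def docs_in_city(city):
    """Document access: filter on a nested field."""
    return [d["name"] for d in documents if d["address"]["city"] == city]


def column(family, name):
    """Tabular access: read one column across all rows."""
    return {row: families[family][name] for row, families in tabular.items()}


print(kv_lookup("user:42")["city"])
print(docs_in_city("Boston"))
print(column("location", "city"))
```

Each shape makes one kind of access cheap: exact-key lookup, queries on fields inside a record, or scans of a single column across many rows.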

Roger Magoulas, a research director with O'Reilly, further subdivides projects into those that crunch large data sets in a batch manner--such as Hadoop--and those that retrieve views of data to fulfill visitor search requests on web pages or similar tasks. He goes on to say that you can compare them on the basis of particular features, such as automatic replication, auto-sharding or partitioning, and in-memory caches.

The most comprehensive attempts I've seen to make sense of this gangly crew of projects from a feature standpoint come in a blog by Ellis and one by Vineet Gupta. (Gupta's blog is labeled "Part 1" and I'd love to see more parts.) But Sheehy says the various features of the offerings interact too strongly and have too many subtle variations to fit into an easy taxonomy. "Many people try to classify the projects, everyone does it differently, and nobody gets it quite right."

Community features

So who uses these things? To take Horowitz's MongoDB again as an example, many web sites gravitate toward it because the document structure makes some things--adding fields to rows, mapping objects to fields--easier than a relational database does. A few scientific sites also use MongoDB.
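The two conveniences named above, adding fields and mapping objects, can be sketched side by side with a relational table. This is only an illustration under stated assumptions: SQLite stands in for a relational database, a list of Python dictionaries stands in for a document collection (it is not MongoDB's API), and the users table, the twitter field, and the Post class are invented for the example.

```python
import sqlite3
from dataclasses import asdict, dataclass, field

# Relational: a new attribute means changing the schema for every row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Ada')")
conn.execute("ALTER TABLE users ADD COLUMN twitter TEXT")
conn.execute("INSERT INTO users VALUES (2, 'Grace', '@grace')")
rows = conn.execute(
    "SELECT id, name, twitter FROM users ORDER BY id"
).fetchall()

# Document style: a new record simply carries the extra field.
users = [{"_id": 1, "name": "Ada"}]
users.append({"_id": 2, "name": "Grace", "twitter": "@grace"})
with_twitter = [u["name"] for u in users if "twitter" in u]


# Mapping objects: a nested object becomes one document, not several tables.
@dataclass
class Post:
    title: str
    tags: list = field(default_factory=list)


post_document = asdict(Post("Hello", ["nosql", "boston"]))

print(rows)
print(with_twitter)
print(post_document)
```

In the relational half, the older row ends up with an empty twitter column; in the document half, the older record simply lacks the field, and the application decides what that absence means.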

Riak also has a large following among web sites and startups, but their customers also include media companies, ad networks, SMS gateways, analytics firms, and many other types of organizations.

Magoulas finds that an organization's bent is determined by the background and expertise of its developers. Programmers with lots of traditional relational database experience tend to be wary of the recent upstarts, a position reinforced by legacy investments in tools that depend on their relational database and are sometimes very expensive.

On the other hand, web programmers look for tools that conform more closely to the data structures and programming techniques they're used to, and can actually be "flummoxed" by relational database logic or abstraction layers on top of the databases. These programmers may think it intuitive to do the kinds of filtering and sorting that seem like reinventing the wheel to a traditional RDBMS programmer. Anderson likes to quote Jacob Kaplan-Moss, the creator of Django, as saying, "Django may be built for the Web, but CouchDB is built of the Web. I've never seen software that so completely embraces the philosophies behind HTTP."

10gen's consultation with MongoDB users includes asking for votes on new features. They also see a great deal of code contributions in the driver layer and adapters (sessions, logging, etc.) but not much in the core. Sheehy said the same is true of Riak: although contributions to the core are rare, half the client libraries are developed by outsiders, as are many of the tools.

Rapid change is part of life for NoSQL developers. Anderson says of CouchDB, "The ancillary APIs have been evolving rapidly in preparation for our 1.0 release, which should come out in the next few months and won't differ much from today's trunk. The new APIs include authentication, authorization, details of Map/Reduce, and functions for transforming and serving JSON documents as other datatypes such as HTML or CSV." Horowitz stressed that MongoDB will roll out a lot of new features over the upcoming year.

One hundred people have signed up for NoSQL Boston so far, and more than 150 are expected. I'll be there to take it in and try to reduce it to some high-level insights for this blog.

Innovation Lessons in "Start-Up Nation"

Andy Oram @praxagora 2010-02-15

One might expect Start-Up Nation: The Story of Israel's Economic Miracle to come from the pen of business school or economics professors, but the biographies of authors Dan Senor and Saul Singer reveal policy backgrounds. Both were advisors in the U.S. Federal Government.

These backgrounds give a clue that Senor and Singer aim beyond questions of how to be a successful entrepreneur or high-tech executive. In fact, their book is a serious investigation of the social, historical, and psychological traits that produce extraordinarily creative people--and significantly, creative people who can translate their cranial light-bulbs into technologies with the potential to change the world.

The book has garnered a fair amount of news coverage, but still not as much as it deserves, in my opinion. It took me only about three hours to read, and I highly recommend it as a refreshing--but not necessarily reassuring--perspective on a country that is profoundly misunderstood and misrepresented by media outside its diminutive borders.

In this blog I'll summarize the traits that the authors find make Israel a successful incubator for innovation, distinguishing between traits that other countries can emulate and traits that seem uniquely embedded in Israel's historical and geographic circumstances. Finally, as I usually do in these book reviews, I'll lay out three observations that came to my mind while following the authors' argument: the importance of hard data, flipping axioms, and the creative role government can play.

Israel's stunning performance in world markets is beyond argument. The country lists more companies on NASDAQ than China, India, Japan, South Korea, Singapore, and all the countries of Europe--combined! Venture capital flows into the country. It's often noted that technically trained Israelis emigrate, but less often reported that many of them return and spectacularly exploit their international connections. And because investment is based on technical progress instead of clever financial book-keeping, Israel has weathered the current recession with some decline in revenues but not a single bank failure.

Start-Up Nation claims to reveal the reasons behind the success of this country that not so long ago was considered a victim of quasi-socialist stagnation and crippling costs related to war and violence. Some of the authors' observations have been made before--such as the importance of forcing young people to take serious responsibility in army service, and the value of accumulating highly trained immigrants from Russia and other places--but these observations have previously been oversimplified and leave out what Senor and Singer consider crucial inputs. Start-Up Nation not only tries to restore the proper balance and show traits from a more productive perspective, but integrates them into a comprehensive picture of a churning economy.

Traits that other countries can emulate

Although Israel has special advantages, some of the elements to which Senor and Singer trace its innovativeness can theoretically be achieved elsewhere. Briefly, these are:

  • A loyalty to the entire community that goes beyond personal success. The authors point out that, for all of Israelis' notorious fractiousness, they expend enormous effort helping total strangers. All of Israel is a single team, even a single family. (Obviously, this family feeling does not extend to non-Jews.) Israeli entrepreneurs who give talks abroad often play up the strengths of their country as well as their company.
  • A sense of dissatisfaction. To innovate, one must be convinced that things are not good enough the way they are now. For Israelis, this drive for change has both Biblical and more recent historical roots, but technology provides a new arena rewarding hopes for improvement.
  • A Do-It-Yourself approach to technology, which perhaps is one manifestation of the afore-mentioned innate dissatisfaction. The authors report that equipment purchased by the army is always being tinkered with. The same interest in taking things apart and jerry-rigging them extends throughout the culture.
  • A culture of challenging authority. The authors point out that this is a deep cultural value (and like many before them, trace it partly to the Jewish intellectual tradition), one that is particularly hard to foster in countries with controlling regimes.
  • A determination to succeed against all odds. Countries that get complacent and rest on their laurels--as most observers think North Americans are doing--eventually lose their privileged places. The authors highlight fascinating stories of Israelis keeping up production in the face of war, and of cheerfully taking on seemingly impossible challenges.
  • Interdisciplinary agility. Israelis tend to learn many skills--partly to survive in the armed forces--and to form companies closely linking people with different areas of expertise. In an age where many challenges require mashups between disciplines, this imparts a strong advantage.
  • A tolerance for failure. Like Silicon Valley, Israel is a place where someone can start a company, manage it through bankruptcy, and then pick up to start another company. A single failure, the authors say, gives the entrepreneur a high chance of succeeding at the next venture. Even in the military, people are rewarded for tackling problems with creative intelligence--not so much for the ultimate success or failure of the attempt.
  • Providing young people with arenas to exert responsibility. In Israel, of course, this arena is its unusually unhierarchical armed forces (and people who don't do army service, such as Arabs and the ultra-orthodox, miss out on critical experiences). But other countries could find other ways to challenge youth in situations where taking charge is a must and where results really matter.
  • A fruitful mentoring relationship between venture capitalists and new entrepreneurs. Injecting money into new ventures (as so many countries do) is not enough; the managers must be guided through the shoals of financial, technical, and human resource challenges. Israel set up a unique program called Yozma in 1993 to bring together all the necessary elements.
  • Government policies friendly to startups. Israel has a decidedly mixed history here. Even after making a historic turn away from government control and toward a free market, its environment is most helpful to computer and high-tech companies. There are certainly innovations in many other areas--notably agriculture--but the authors say these fields encounter hampering regulations.
  • A truly open-arms approach to immigrants, who bring not only fresh perspectives but a high tolerance for risk. Once again, of course, Israel's liberal attitude toward immigrants applies only to Jews (and a lot of haggling goes on around deciding who qualifies). Even for Jews, it can take a long time to assimilate waves of newcomers and turn them into productive employees. But countries that don't make it easy to set down roots suffer economically. Short-term foreign workers never form the sustainable innovative institutions that can be planted by truly committed immigrants.

Traits unique to Israel's history and geography

Israel also possesses unique traits that Senor and Singer draw on to explain its entrepreneurial success. No other culture has undergone thousands of years of persecution culminating in genocide--nor should that be a necessary price. But it's amazing how Israel has turned its liabilities into assets, and these lessons can be inspirational for people suffering in other parts of the world.

  • The country's regional isolation (boycotted by all its neighbors) and its small size have forced it to think internationally from the start. Israelis are also avid travelers, gathering up experiences that allow them to think creatively.
  • Demographic diversity is another trait that has many negative repercussions, but that the Israelis have also mined for its strengths. New ideas always seem less odd in a culture where one has to get along with different types of people.
  • A long intellectual tradition encourages a love of learning that extends to worldly accomplishments (although I have to point out that the same tradition contains many rabbis who railed against the study of mundane things); just as important, it encourages students to examine and re-examine every idea from new points of view--effectively making them ask constantly what would result if something obvious were not true. I will explore this intellectual attitude in the section Flipping axioms.
  • The tight-knit society leads to a situation where "everybody knows everybody" and one can easily find the right mix of staff for a new company. (I wonder whether this cliquishness extends far beyond the economically dominant ethnic group--the Ashkenazi Jews, who trace their ancestry to Northern and Western Europe.)
  • The constant existential threat posed by attacks and bombings calls for hair-trigger creativity. It's worth noting here that the authors present Israel's response to war and terror in a consistently heroic light (along with recognition that they sometimes screw up). But the authors note that every engagement is followed by an assessment to determine what could have been done better.
  • The small size of the population (further reduced, as I noted earlier, by deferrals for the religious and for Arabs) forces each army recruit to take on responsibility right away.

The importance of hard data

At this point I will delve into some more subtle threads that I see running through the book.

The first is the importance of accumulating evidence to build a case for change. Two incidents from the book illustrate this nicely:

  • Israel's Intel facility, which invented the historic 8088 (and would later invent the Core 2 Duo), faced its biggest uphill battle when it approached its parent company to propose the principle behind the Centrino. Senor and Singer note that Israeli staff flew into Intel headquarters insistently to talk about the advantages of the new design--and seemed to deal with management resistance by increasing their own pressure. But it's important to note that they accumulated data to prove the superiority of the design. Mere persistence would probably not have carried the day.
  • An Israeli company called Fraud Sciences pitched a system for identifying untrustworthy customers and vendors to a manager at eBay. The manager was adamantly unwilling to believe that this tiny group of developers could do a better job than all of eBay's experts, and the founder of Fraud Sciences was not a good salesman. Luckily, eBay had a low-risk, low-cost way to test the start-up: eBay would give information about one hundred thousand customers to Fraud Sciences and compare their analyses to eBay's analyses as well as the subsequent history of those customers. Fraud Sciences passed the test with flying colors and ended up being bought by eBay. There was no way even the most determined cynic could argue with the facts.
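The eBay trial described above amounts to a backtest: score the same customers two ways, then check both sets of scores against what later happened. Here is a minimal sketch in Python. The function name, the flags, and the outcomes are all made up for illustration; nothing here comes from the book or from either company.

```python
def precision_recall(flags, outcomes):
    """Compare fraud flags with what later actually happened.

    flags and outcomes are equal-length lists of booleans:
    flags[i] is True if the scorer called customer i risky,
    outcomes[i] is True if customer i later proved fraudulent.
    """
    true_pos = sum(1 for f, o in zip(flags, outcomes) if f and o)
    flagged = sum(flags)      # how many customers the scorer flagged
    actual = sum(outcomes)    # how many customers really were fraudulent
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall

# Hypothetical toy data for eight customers (illustration only).
later_fraud     = [True, False, True, False, False, True, False, False]
incumbent_flag  = [True, True, False, False, False, False, True, False]
challenger_flag = [True, False, True, False, False, True, True, False]

incumbent = precision_recall(incumbent_flag, later_fraud)
challenger = precision_recall(challenger_flag, later_fraud)
```

On this toy data the challenger catches every later fraud and is right about three of its four flags, while the incumbent is right about one flag in three. The point is only that a comparison against subsequent history produces numbers that are hard to argue with.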

We now see data-crunching brought right into the development process, as with a process called "predictive drug development" practiced by the Israeli company Compugen.

Flipping axioms

I mentioned before the oddly subversive style of the Jewish intellectual tradition. I call it flipping an axiom: find the key trait that holds back change--a trait that no one has challenged before because it seems unassailably true--and ask "What would happen if it were false?" This is a kind of non-Euclidean geometry of thought. Two examples of this thinking at work in Israel include:

  • The Centrino project at Intel, mentioned in the previous section, required overturning a tautology accepted throughout a whole industry and customer base. The problem they were facing was that increases in chip speed led to increases in heat dissipation. Laptops with advanced processors could not be designed because the build-up of heat would destroy them.

    So the Israeli team asked: what if we sped up processing without increasing clock speed? To give an indication what a radical thought this was, consider that U.S. Intel management rejected the idea because they had built their entire marketing strategy around clock speed. Clock speed had become the universal measure for computer performance. The Israelis threw out the assumption, found a solution, and sold it to management--and the world.

  • Start-Up Nation begins appropriately enough with the well-known innovator Shai Agassi, who wants to create a network of battery replacement stations to support an infrastructure of electric cars. The basic elements of the plan had all been thought up by auto manufacturers already and rejected as unfeasible. But Agassi found a promising way to cut through the difficulties (although it has not had a chance to be tested yet in everyday use).

    The key to Agassi's thinking, as presented by the authors, was to assume that cars had to stop using fossil fuels. The world is running out of them. Agassi rejected other options such as hydrogen and hybrids, so he decided cars must be electric around the same time manufacturers decided they must use gasoline (although other options such as ethanol are also being explored). Taking electric cars as an axiom, Agassi found all the other propositions to make cars work.

The creative role government can play

I'll end by exploring the positive role of government in Israel. Entrepreneurial government is just as important as entrepreneurial business--but it must do different things. In this regard the Israeli experience matches up somewhat with the positive role Nandan Nilekani proposed in Imagining India, which I reviewed a few months ago.

In the early years, the Labor government kept a firm hand on the economy. The authors suggest that this worked well in the Israel of the 1950s and 1960s, an underdeveloped country needing obvious infrastructure investments. When the country reached a higher level calling for more complex economic relationships, the socialist bent became dysfunctional. (Many economists have found similar trajectories in the Soviet Union and Maoist China.) It was time to retract, so to speak, to permit more creation.

The authors enthusiastically endorse the Chicago School shock treatment carried out by the right-wing Likud government that took over in the 1970s. They don't mention the troublesome income gap created by this turn to the markets, but it does seem to have unleashed the innovative capitalism that has made Israelis relatively affluent for their part of the world, and an attractive location to work for Jews as well as non-Jews from elsewhere.

The government doesn't make direct investments, but to some extent it chooses winners--hooking them up to external investors--through the Yozma program mentioned earlier.

Like all governments, Israel promotes R&D through the military. But it provides one particularly interesting kind of government incubator: Talpiot, an intensive multi-disciplinary training program for promising young recruits.

Talpiot looks over the entire eligible high school population and aggressively encourages applications from the people it believes will make the most creative problem-solvers. It puts them through basic training as well as a technical academic program, and assigns them quickly to real-life problems. The training goes on for forty-one months, and the recruits are required to stay in the military for six more years.

Most Talpiot graduates leave the armed forces when their required six years are up, and this is seen as a strike against the program by its critics. But the authors point out that these highly trained, highly practical experts go on to apply their skills in private industry, so the government has effectively provided an elite corps for the country.

Israel certainly faces problems that technology and organizational savvy cannot solve. But its political, economic, and cultural successes have made it a model for other developing countries--none more than the Arab nations, who are trying to emulate it in many ways. Start-Up Nation is surprisingly broad and deep, for such a short book. I'm persuaded that it has unraveled many of the mysteries behind Israel's business success. And it does its best, in a very uncertain world, to suggest how both Israel and other countries could replicate that success.

One hundred eighty degrees of freedom: signs of how open platforms are spreading

Andy Oram @praxagora 2010-02-05

I was talking recently with Bob Frankston, who has a distinguished history in computing that goes back to work on Multics, VisiCalc, and Lotus Notes. We were discussing some of the dreams of the Internet visionaries, such as total decentralization (no mobile-system walls, no DNS) and bandwidth too cheap to meter. While these seem impossibly far off, I realized that computing and networking have come a long way already, making things normal that not too far in the past would have seemed utopian.

Flat-rate long distance calls
I remember waiting past my bedtime to make long-distance calls, and getting down to business real quick to avoid high charges. Conventional carriers were forced to flat-rate pricing by competition from VoIP (which I'll return to later in the blog). International calls are still overpriced, but with penny-per-minute cards available in any convenience store, I don't imagine any consumers are paying those high prices.
Mobile phone app stores
Not that long ago, the few phones that offered Internet access did so as a novelty. Hardly anybody seriously considered downloading an application to their phones--what are you asking for, spam and fraudulent charges? So the iPhone and Android stores teeming with third-party apps are a 180-degree turn for the mobile field. I attribute the iPhone app store once again to competition: the uncovering of the iPhone SDK by a free software community.
Downloadable TV segments
While the studios strike deals with Internet providers, send out take-down notices by the ream, and calculate how to derive revenue from television-on-demand, people are already getting the most popular segments from Oprah Winfrey or Saturday Night Live whenever they want, wherever they want.
Good-enough generic devices
People no longer look down on cheap, generic tools and devices. Both in software and in hardware, people are realizing that in the long run they can do more with simple, flexible, interchangeable parts than with complex and closed offerings. There will probably always be a market for exquisitely designed premium products--the success of Apple proves that--but the leading edge goes to products that are just "good enough," and the DIY movement especially ensures a growing market for building blocks of that quality.

I won't even start to summarize Frankston's own writings, which start with premises so far from what the Internet is like today that you won't be able to make complete sense of any one article on its own. I'd recommend the mind-blowing Sidewalks: Paying by the Stroll if you want to venture into his world.

But I'll mention one sign of Frankston's optimism: he reminded me that in the early 1990s, technologists were agonizing over arcane quality-of-service systems in the hope of permitting VoIP over ordinary phone connections. Now we take VoIP for granted and are heading toward ubiquitous video. Why? Two things happened in parallel: the technologists figured out much more efficient encodings, and normal demand led to faster transmission technologies even over copper. We didn't need QoS and all the noxious control and overhead it entails. More generally, it's impossible to determine where progress will come from or how fast it can happen.

Innovation Battles Investment as FCC Road Show Returns to Cambridge

Andy Oram @praxagora 2010-01-14

Opponents can shed their rhetoric and reveal new depths to their thought when you bring them together for rapid-fire exchanges, sometimes with their faces literally inches away from each other. That made it worth my while to truck down to the MIT Media Lab for yesterday's Workshop on Innovation, Investment and the Open Internet, sponsored by the Federal Communications Commission. In this article I'll cover:

Context and background

The FCC kicked off its country-wide hearing campaign almost two years ago with a meeting at Harvard Law School, which quickly went wild. I covered the experience in one article and the unstated agendas in another. With a star cast and an introduction by the head of the House's Subcommittee on Telecommunications and the Internet, Ed Markey, the meeting took on such a cachet that the public flocked to the lecture hall, only to find it filled because Comcast recruited people off the street to pack the seats and keep network neutrality proponents from attending. (They had an overflow room instead.)

I therefore took pains to arrive at the Media Lab's Bartos Theater early yesterday, but found it unnecessary. Even though Tim Berners-Lee spoke, along with well-known experts across the industry, only 175 people turned up, in my estimation (I'm not an expert at counting crowds). I also noticed that the meeting wasn't worth a mention today in the Boston Globe.

Perhaps it was the calamitous earthquake yesterday in Haiti, or the bad economy, or the failure of the Copenhagen summit to solve the worst crisis ever facing humanity, or concern over three wars the US is involved in (if you count Yemen), or just fatigue, but it seems that not as many people are concerned with network neutrality as two years ago. I recognized several people in the audience yesterday and surmised that the FCC could have picked out a dozen people at random from their seats, instead of the parade of national experts on the panel, and still have led a pretty darned good discussion.

And network neutrality is definitely the greased pig everyone is sliding around. There are hundreds of things one could discuss in the context of innovation and investment, but various political forces ranging from large companies (AT&T versus Google) to highly visible political campaigners (Huffington Post) have made network neutrality the agenda. The FCC gave several of the movement's leaders rein to speak, but perhaps signaled its direction by sending Meredith Attwell Baker as the commissioner in attendance.

In contrast to FCC chair Julius Genachowski, who publicly calls for network neutrality (a position also taken by Barack Obama during his presidential campaign), Baker has traditionally espoused a free-market stance. She opened the talks yesterday by announcing that she is "unconvinced there is a problem" and posing the question: "Is it broken?" I'll provide my own opinion later in this article.

Two kinds of investment

Investment is the handmaiden, if not the inseminator, of innovation. Despite a few spectacular successes, like the invention of Linux and Apache, most new ideas require funding. Even Linux and Apache are now represented by foundations backed by huge companies.

So why did I title this article "Innovation Battles Investment"? Because investment happens at every level of the Internet, from the cables and cell towers up to the applications you load on your cell phone.

Here I'll pause to highlight an incredible paradigm shift that was visible at this meeting--a shift so conclusive that no one mentioned it. Are you old enough to remember the tussle between "voice" and "data" on telephone lines? Remember the predictions that data would grow in importance at the expense of voice (meaning Plain Old Telephone Service) and the milestones celebrated in the trade press when data pulled ahead of voice?

Well, at the hearing yesterday, the term "Internet" was used to cover the whole communications infrastructure, including wires and cell phone service. This is a mental breakthrough all its own, and one I'll call the Triumph of the Singularity.

But different levels of infrastructure benefit from different incentives. I found that all the participants danced around this. Innovation and investment at the infrastructure level got short shrift from the network neutrality advocates, whether in the bedtime story version delivered by Barbara van Schewick or the deliberately intimidating, breakneck overview by economist Shane Greenstein, who defined openness as "transparency and consistency to facilitate communication between different partners in an independent value chain."

You can explore his papers on your own, but I took this to mean, more or less, that everybody sharing a platform should broadcast their intentions and apprise everybody else of their plans, so that others can make the most rational decisions and invest wisely. Greenstein realized, of course, that firms have little incentive to share their strategies. He said that communication was "costly," which I take as a reference not to an expenditure of money but to a surrender of control and relinquishing of opportunities.

This is just what the cable and phone companies are not going to do. Dot-com innovator Jeffrey Glueck, founder of Skyfire, would like the FCC to require ISPs to give application providers and users at least 60 to 90 days' notice before making any changes to how they treat traffic. This is absurd in an environment where bad actors require responses within a few seconds and the victory goes to the router administrators with the most creative coping strategy. Sometimes network users just have to trust their administrators to do the best thing for them. Network neutrality becomes a political and ethical issue when administrators don't. But I'll return to this point later.

The pocket protector crowd versus the bean counters

If the network neutrality advocates could be accused of trying to emasculate the providers, advocates for network provider prerogative were guilty of taking the "Trust us" doctrine too far. For me, the best part of yesterday's panel was how it revealed the deep gap that still exists between those with an engineering point of view and those with a traditional business point of view.

The engineers, led by Internet designer David Clark, repeated the mantra of user control of quality of service, the vehicle for this being the QoS field added to the IP packet header. Van Schewick postulated a situation where a user increases the QoS on one session because they're interviewing for a job over the Internet, then reduces the QoS to chat with a friend.
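To make the engineers' idea concrete, here is a minimal sketch of what user-controlled marking could look like from an application, assuming the header field in question is the DSCP value carried in the IPv4 TOS byte. The helper names are my own invention; `socket.IP_TOS` is a standard socket option, though not every operating system honors it.

```python
import socket

# Standard DSCP code points: 46 is "Expedited Forwarding" (RFC 3246),
# commonly used for voice and video; 0 is ordinary best effort.
DSCP_EXPEDITED = 46
DSCP_BEST_EFFORT = 0

def dscp_to_tos(dscp):
    """Return the TOS byte value that carries a 6-bit DSCP."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits (0-63)")
    return dscp << 2  # DSCP occupies the upper six bits of the byte

def mark_socket(sock, dscp):
    """Ask the OS to mark this socket's outgoing IPv4 packets (hypothetical helper)."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
```

In van Schewick's scenario, the job-interview session would call something like `mark_socket(sock, DSCP_EXPEDITED)` and the chat with a friend would go back to `DSCP_BEST_EFFORT`. The marking is only a request: each network along the path is free to ignore or rewrite it.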

In the rosy world envisioned by the engineers, we would deal not with the physical reality of a shared network with our neighbors, all converging into a backhaul running from our ISP to its peers, but with the logical mechanism of a limited, dedicated bandwidth pipe (former senator Ted Stevens can enjoy his revenge) that we would spend our time tweaking. One moment we're increasing the allocation for file transfer so we can upload a spreadsheet to our work site; the next moment we're privileging the port we use for an MMPG.

The practicality of such a network service is open to question. Glueck pointed out that users are unlikely ever to ask for lower quality of service (although this is precisely the model that Internet experts have converged on, as I report in my 2002 article A Nice Way to Get Network Quality of Service?). He recommends simple tiers of service--already in effect at many providers--so that someone who wants to carry out a lot of P2P file transfers or high-definition video conferencing can just pay for it.

In contrast, network providers want all the control. Much was made during the panel of a remark by Marcus Weldon of Alcatel-Lucent in support of letting the providers shape traffic. He pointed out that video teleconferencing over the fantastically popular Skype delivered unappealing results over today's best-effort Internet delivery, and suggested a scenario where the provider gives the user a dialog box where the user could increase the QoS for Skype in order to enjoy the video experience.

Others on the panel legitimately flagged this comment as a classic illustration of the problem with providers' traffic shaping: the provider would negotiate with a few popular services such as Skype (which boasts tens of millions of users online whenever you log in) and leave innovative young services to fend for themselves in a best-effort environment.

But the providers can't see doing quality of service any other way. Their business model has always been predicated on designing services around known costs, risks, and opportunities. Before they roll out a service, they need to justify its long-term prospects and reserve control over it for further tweaking. If the pocket protector crowd in Internet standards could present their vision to the providers in a way that showed them the benefits they'd accrue from openness (presumably by creating a bigger pie), we might have progress. But the providers fear, above all else, being reduced to a commodity. I'll pick up this theme in the next section.

Is network competition over?

Law professor Christopher S. Yoo is probably the most often heard (not at this panel, unfortunately, where he was given only a few minutes) of academics in favor of network provider prerogatives. He suggested that competition was changing, and therefore requiring a different approach to providers' funding models, from the Internet we knew in the 1990s. Emerging markets (where growth comes mostly from signing up new customers) differ from saturated markets (where growth comes mainly from wooing away your competitors' customers). With 70% of households using cable or fiber broadband offerings, he suggested the U.S. market was getting saturated, or mature.

Well, only if you accept that current providers' policies will stifle growth. What looks like saturation to an academic in the U.S. telecom field looks like a state of primitive underinvestment to people who enjoy lightning-speed service in other developed nations.

But Yoo's assertion makes us pause for a moment to consider the implications of a mature network. When change becomes predictable and slow, and an infrastructure is a public good--as I think everyone would agree the Internet is--it becomes a candidate for government takeover. Indeed, there have been calls for various forms of government control of our network infrastructure. In some places this is actually happening, as cities and towns create their own networks. A related proposal is to rigidly separate the physical infrastructure from the services, barring companies that provide the physical infrastructure from offering services (and therefore presumably relegating them to a maintenance role--a company in that position wouldn't have much incentive to take on literally ground-breaking new projects).

Such government interventions are politically inconceivable in the United States. Furthermore, experience in other developed nations with more successful networks shows that it is unnecessary.

No one can doubt that we need a massive investment in new infrastructure if we want to use the Internet as flexibly and powerfully as our trading partners. But there was disagreement yesterday about how much of an effort the investment will take, and where it will come from.

Yoo argued that a mature market requires investment to come from operating expenditures (i.e., charging users more money, which presumably is justified by discriminating against some traffic in order to offer enhanced services at a premium) instead of capital expenditures. But Clark believes that current operating expenditures would permit adequate growth. He anticipated a rise in Internet access charges of $20 a month, which could fund the added bandwidth we need to reach the Internet speeds of advanced countries. In exchange for paying that extra $20 per month, we would enjoy all the content we want without paying cable TV fees.

The current understanding by providers is that usage is rising "exponentially" (whatever that means--they don't say what the exponent is) whereas charges are rising slowly. Following some charts from Alcatel-Lucent's Weldon that showed profits disappearing entirely in a couple years--a victim of the squeeze between rising usage and slow income growth--Van Schewick challenged him, arguing that providers can enjoy lower bandwidth costs to the tune of 30% per year. But Weldon pointed out that the only costs going down are equipment, and claimed that after a large initial drop caused by any disruptive new technology, costs of equipment decrease only 10% per year.
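The disagreement between 30% and 10% matters more than it may sound, because both usage growth and cost declines compound. A rough sketch, assuming a usage growth rate of 50% per year (a number I made up, since no exponent was given at the panel) and treating total cost as usage times unit cost:

```python
def cost_multiplier(years, usage_growth, unit_cost_decline):
    """Factor by which total bandwidth cost changes after `years`.

    usage_growth and unit_cost_decline are annual rates expressed as
    fractions (0.30 means 30% per year). Both compound yearly.
    """
    yearly = (1 + usage_growth) * (1 - unit_cost_decline)
    return yearly ** years

ASSUMED_USAGE_GROWTH = 0.50  # hypothetical; not a figure from the panel

for label, decline in (("30% decline", 0.30), ("10% decline", 0.10)):
    print(label, round(cost_multiplier(5, ASSUMED_USAGE_GROWTH, decline), 2))
```

Under that assumed growth rate, a 30% annual cost decline leaves total cost about 1.28 times higher after five years, while a 10% decline leaves it about 4.48 times higher. Different growth assumptions change the numbers but not the lesson: which side is right about the rate of decline largely decides whether there is a squeeze at all.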

Everyone agreed that mobile, the most exciting and innovation-supporting market, is expensive to provide and suffering an investment crisis. It is also the least open part of the Internet and the part most dependent on legacy pricing (high voice and SMS charges), deviating from the Triumph of the Singularity.

So the Internet is like health care in the U.S.: in worse shape than it appears. We have to do something to address rising usage--investment in new infrastructure as well as new applications--just as we have to lower health care costs that have surpassed 17% of the gross domestic product.

Weldon's vision--a rosy one in its own way, complementing the user-friendly pipe I presented earlier from the engineers--is that providers remain free to control the speeds of different Internet streams and strike deals with anyone they want. He presented provider prerogatives as simple extensions of what already happens now, where large companies create private networks where they can impose QoS on their users, and major web sites contract with content delivery networks such as Akamai (represented at yesterday's panel by lawyer Aaron Abola) to host their content for faster response time. Susie Kim Riley of Camiant testified that European providers are offering differentiated services already, and making money by doing so.

What Weldon and Riley left out is what I documented in A Nice Way to Get Network Quality of Service? Managed networks providing QoS are not the Internet. Attempts to provide QoS over the Internet--by getting different providers to cooperate in privileging certain traffic--have floundered. The technical problems may be surmountable, but no one has figured out how to build trust and to design adequate payment models that would motivate providers to cooperate.

It's possible, as Weldon asserts, that providers allowed to manage their networks would invest in infrastructure that would ultimately improve the experience for all sites--those delivered over the Internet by best-effort methods as well as those striking deals. But the change would still represent increased privatization of the public Internet. It would create what application developers such as Glueck and Nabeel Hyatt of Conduit Labs fear most: a thousand different networks with different rules that have to be negotiated with individually. And new risks and costs would be placed in the way of the disruptive innovators we've enjoyed on the Internet.

Competition, not network neutrality, is actually the key issue facing the FCC, and it was central to their Internet discussions in the years following the 1996 Telecom Act. For the first five years or so, the FCC took seriously a commitment to support new entrants by such strategies as requiring incumbent companies to allow interconnection. Then, especially under Michael Powell, the FCC did an about-face.

The question posed during this period was: what leads to greater investment and growth--letting a few big incumbents enter each other's markets, or promoting a horde of new, small entrants? It's pretty clear that in the short term, the former is more effective because the incumbents have resources to throw at the problem, but that in the long term, the latter is required in order to find new solutions and fix problems by working around them in creative ways.

Yet the FCC took the former route, starting in the early 2000s. They explicitly made a deal with incumbents: build more infrastructure, and we'll relax competition rules so you don't have to share it with other companies.

Starting a telecom firm is hard, so it's not clear that pursuing the other route would have saved us from the impasse we're in today. But a lack of competition is integral to our problems--including the one being fought out in the field of "network neutrality."

All the network neutrality advocates I've talked to wish that we had more competition at the infrastructure level, because then we could rely on competition to discipline providers instead of trying to regulate such discipline. I covered this dilemma in a 2006 article, Network Neutrality and an Internet with Vision. But somehow, this kind of competition is now off the FCC agenda. Even in the mobile space, they offer spectrum through auctions that permit the huge incumbents to gather up the best bands. These incumbents then sit on spectrum without doing anything, a strategy known as "foreclosure" (because it forecloses competitors from doing something useful with it).

Because everybody goes off in his own direction, the situation pits two groups against each other that should be cooperating: small ISPs and proponents of an open Internet.

What to regulate

Amy Tykeson, CEO of a small Oregon Internet provider named BendBroadband, forcefully presented the view of an independent provider, similar to the more familiar imprecations by Brett Glass of Lariat. In their world--characterized by paper-thin margins, precarious deals with back-end providers, and the constant pressure to provide superb customer service--flexible traffic management is critical and network neutrality is viewed as a straitjacket.

I agree that many advocates of network neutrality have oversimplified the workings of the Internet and downplayed the day-to-day requirements of administrators. In contrast, as I have shown, large network providers have overstepped their boundaries. But to end this article on a positive note (you see, I'm trying) I'll report that the lively exchange did produce some common ground and a glimmer of hope for resolving the differing positions.

First, in an exchange between Berners-Lee and van Schewick on the pro-regulatory side and Riley on the anti-regulatory side, a more nuanced view of non-discrimination and quality of service emerged. Everybody on the panel offered vociferous exclamations in support of the position that it was unfair discrimination for a network provider to prevent a user from getting legal content or to promote one web site over a competing web site. And this is a major achievement, because those are precisely the practices that providers like AT&T and Verizon claim the right to do--the practices that spawned the current network neutrality controversy.

To complement this consensus, the network neutrality folks approved the concept of quality of service, so long as it was used to improve the user experience instead of to let network providers pick winners. In a context where some network neutrality advocates have made QoS a dirty word, I see progress.

This raises the question of what is regulation. The traffic shaping policies and business deals proposed by AT&T and Verizon are a form of regulation. They claim the same privilege that large corporations--we could look at health care again--have repeatedly tried to claim when they invoke the "free market": the right of corporations to impose their own regulations.

Berners-Lee and others would like the government to step in and issue regulations that suppress the corporate regulations. A wide range of wording has been proposed for the FCC's consideration. Commissioner Baker asked whether, given the international reach of the Internet, the FCC should regulate at all. Van Schewick quite properly responded that the abuses carried out by providers are at the local level and therefore can be controlled by the government.

Two traits of a market are key to innovation, and came up over and over yesterday among dot-com founders and funders (represented by Ajay Agarwal of Bain Capital) alike: a level playing field, and light-handed regulation.

Sometimes, as Berners-Lee pointed out, government regulation is required to level the playing field. The transparency and consistency cited by Greenstein and others are key features of the level playing field. And as I pointed out, a vacuum in government regulation is often filled by even more onerous regulation by large corporations.

One of the most intriguing suggestions of the day came from Clark, who elliptically suggested that the FCC provide "facilitation, not regulation." I take this to mean the kind of process that Comcast and BitTorrent went through, which Sally Shipman Wentworth of ISOC boasted about in her opening remarks. Working with the IETF (which she said created two new working groups to deal with the problem), Comcast and BitTorrent worked out a protocol that should reduce the load of P2P file sharing on networks and end up being a win-win for everybody.

But there are several ways to interpret this history. To free market ideologues, the Comcast/BitTorrent collaboration shows that private actors on the Internet can exploit its infinite extendibility to find their own solutions without government meddling. Free market proponents also call on anti-competition laws to hold back abuses. But those calling for parental controls would claim that Comcast wanted nothing to do with BitTorrent and started to work on technical solutions only after getting tired of the feces being thrown its way by outsiders, including the FCC.

And in any case--as panelists pointed out--the IETF has no enforcement power. The presence of a superior protocol doesn't guarantee that developers and users will adopt it, or that network providers will allow traffic that could be a threat to their business models.

The FCC at Harvard, which I mentioned at the beginning of this article, promised intervention in the market to preserve Internet freedom. What we got after that (as I predicted) was a slap on Comcast's wrist and no clear sense of direction. The continued involvement of the FCC--including these public forums, which I find educational--shows, along with the appointment of the more interventionist Genachowski and the mandate to promote broadband in the American Recovery and Reinvestment Act, that it can't step away from the questions of competition and investment.

Pew Research asks questions about the Internet in 2020

Andy Oram @praxagora 2010-01-07

Pew Research, which seems to be interested in just about everything, conducts a "future of the Internet" survey every few years in which they throw outrageously open-ended and provocative questions at a chosen collection of observers in the areas of technology and society. Pew makes participation fun by finding questions so pointed that they make you choke a bit. You start by wondering, "Could I actually answer that?" and then think, "Hey, the whole concept is so absurd that I could say anything without repercussions!" So I participated in an earlier survey of theirs and did it again this week. The Pew report will aggregate the yes/no responses from the people they asked to participate, but I took the exercise as a chance to hammer home my own choices of issues.

(If you'd like to take the survey, you can currently visit http://www.facebook.com/l/c6596;survey.confirmit.com/wix2/p1075078513.aspx and enter PIN 2000.)

Will Google make us stupid?

This first question is not about a technical or policy issue on the Internet or even how people use the Internet, but a purported risk to human intelligence and methods of inquiry. Usually, questions about how technology affects our learning or practice really concern our values and how we choose technologies, not the technology itself. And that's the basis on which I address such questions. I am not saying technology is neutral, but that it is created, adopted, and developed over time in a dialog with people's desires.

I respect the questions posed by Nicholas Carr in his Atlantic article--although it's hard to take such worries seriously when he suggests that even the typewriter could impoverish writing--and would like to allay his concerns. The question is all about people's choices. If we value introspection as a road to insight, if we believe that long experience with issues contributes to good judgment on those issues, if we (in short) want knowledge that search engines don't give us, we'll maintain our depth of thinking and Google will only enhance it.

There is a trend, of course, toward instant analysis and knee-jerk responses to events that degrades a lot of writing and discussion. We can't blame search engines for that. The urge to scoop our contacts intersects with the starvation of funds for investigative journalism to reduce the value of the reports we receive about things that are important for us. Google is not responsible for that either (unless you blame it for draining advertising revenue from newspapers and magazines, which I don't). In any case, social and business trends like these are the immediate influences on our ability to process information, and searching has nothing to do with them.

What search engines do is provide more information, which we can use either to become dilettantes (Carr's worry) or to bolster our knowledge around the edges and do fact-checking while we rely mostly on information we've gained in more robust ways for our core analyses. Google frees the time we used to spend pulling together the last 10% of facts we need to complete our research. I read Carr's article when The Atlantic first published it, but I used a web search to pull it back up and review it before writing this response. Google is my friend.

Will we live in the cloud or the desktop?

Our computer usage will certainly move more and more to an environment of small devices (probably in our hands rather than on our desks) communicating with large data sets and applications in the cloud. This dual trend, bifurcating our computer resources between the tiny and the truly gargantuan, has many consequences that other people have explored in depth: privacy concerns, the risk that application providers will gather enough data to preclude competition, the consequent slowdown in innovation that could result, questions about data quality, worries about services becoming unavailable (like Twitter's fail whale, which I saw as recently as this morning), and more.

One worry I have is that netbooks, tablets, and cell phones will become so dominant that meaty desktop systems will rise in cost until they are within the reach only of institutions and professionals. That will discourage innovation by the wider populace and reduce us to software consumers. Innovation has benefited a great deal from the ability of ordinary computer users to bulk up their computers with a lot of software and interact with it at high speeds using high quality keyboards and large monitors. That kind of grassroots innovation may go away along with the systems that provide those generous resources.

So I suggest that cloud application providers recognize the value of grassroots innovation--following Eric von Hippel's findings--and solicit changes in their services from their visitors. Make their code open source--but even more than that, set up test environments where visitors can hack on the code without having to download much software. Then anyone with a comfortable keyboard can become part of the development team.

We'll know that software services are on a firm foundation for future success when each one offers a "Develop and share your plugin here" link.

Will social relations get better?

Like the question about Google, this one is more about our choices than our technology. I don't worry about people losing touch with friends and family. I think we'll continue to honor the human needs that have been hard-wired into us over the millions of years of evolution. I do think technologies ranging from email to social networks can help us make new friends and collaborate over long distances.

I do worry, though, that social norms aren't keeping up with technology. For instance, it's hard to turn down a "friend" request on a social network, particularly from someone you know, and even harder to "unfriend" someone. We've got to learn that these things are OK to do. And we have to be able to partition our groups of contacts as we do in real life (work, church, etc.). More sophisticated social networks will probably evolve to reflect our real relationships more closely, but people have to take the lead and refuse to let technical options determine how they conduct their relationships.

Will the state of reading and writing be improved?

Our idea of writing changes over time. The Middle Ages left us lots of horribly written documents. The few people who learned to read and write often learned their Latin (or other language for writing) rather minimally. It took a long time for academies to impose canonical rules for rhetoric on the population. I doubt that a cover letter and resume from Shakespeare would meet the writing standards of a human resources department; he lived in an age before standardization and followed his ear more than rules.

So I can't talk about "improving" reading and writing without addressing the question of norms. I'll write a bit about formalities and then about the more important question of whether we'll be able to communicate with each other (and enjoy what we read).

In many cultures, writing and speech have diverged so greatly that they're almost separate languages. And English in Jamaica is very different from English in the US, although I imagine Jamaicans try hard to speak and write in US style when they're communicating with us. In other words, people do recognize norms, but usage depends on the context.

Increasingly, nowadays, the context for writing is a very short form utterance, with constant interaction. I worry that people will lose the ability to state a thesis in unambiguous terms and a clear logical progression. But because they'll be in instantaneous contact with their audience, they can restate their ideas as needed until ambiguities are cleared up and their reasoning is unveiled. And they'll be learning from others along the way. Making an elegant and persuasive initial statement won't be so important because that statement will be only the first step of many.

Let's admit that dialog is emerging as our generation's way to develop and share knowledge. The notion driving Ibsen's Hedda Gabler--that an independent philosopher such as Ejlert Løvborg could write a masterpiece that would in itself change the world--is passé. A modern Løvborg would release his insights in a series of blogs to which others would make thoughtful replies. If this eviscerated Løvborg's originality and prevented him from reaching the heights of inspiration--well, that would be Løvborg's fault for giving in to pressure from more conventional thinkers.

If the Romantic ideal of the solitary genius is fading, what model for information exchange do we have? Check Plato's Symposium. Thinkers were expected to engage with each other (and to have fun while doing so). Socrates denigrated reading, because one could not interrogate the author. To him, dialog was more fertile and more conducive to truth.

The ancient Jewish scholars also preferred debate to reading. They certainly had some received texts, but the vast majority of their teachings were generated through conversation and were not written down at all until the scholars realized they had to in order to avoid losing them.

So as far as formal writing goes, I do believe we'll lose the subtle inflections and wordplay that come from a widespread knowledge of formal rules. I don't know how many people nowadays can appreciate all the ways Dickens sculpted language, for instance, but I think there will be fewer in the future than there were when Dickens rolled out his novels.

But let's not get stuck on the aesthetics of any one period. Dickens drew on a writing style that was popular in his day. In the next century, Toni Morrison, John Updike, and Vladimir Nabokov wrote in a much less formal manner, but each is considered a beautiful stylist in his or her own way. Human inventiveness is infinite and language is a core skill in which we all take pleasure, so we'll find new ways to play with language that are appropriate to our age.

I believe there will always remain standards for grammar and expression that will prove valuable in certain contexts, and people who take the trouble to learn and practice those standards. As an editor, I encounter lots of authors with wonderful insights and delightful turns of phrase, but with deficits in vocabulary, grammar, and other skills and resources that would enable them to write better. I work with these authors to bring them up to industry-recognized standards.

Will those in GenY share as much information about themselves as they age?

I really can't offer anything but baseless speculation in answer to this question, but my guess is that people will continue to share as much as they do now. After all, once they've put so much about themselves up on their sites, what good would it do to stop? In for a penny, in for a pound.

Social norms will evolve to accept more candor. After all, Ronald Reagan got elected President despite having gone through a divorce, and Bill Clinton got elected despite having smoked marijuana. Society's expectations evolve.

Will our relationship to key institutions change?

I'm sure the survey designers picked this question knowing that its breadth makes it hard to answer, but in consequence it's something of a joy to explore.

The widespread sharing of information and ideas will definitely change the relative power relationships of institutions and the masses, but they could move in two very different directions.

In one scenario offered by many commentators, the ease of whistleblowing and of promulgating news about institutions will combine with the ability of individuals to associate over social networking to create movements for change that hold institutions more accountable and make them more responsive to the public.

In the other scenario, large institutions exploit high-speed communications and large data stores to enforce even greater centralized control, and use surveillance to crush opposition.

I don't know which way things will go. Experts continually urge governments and businesses to open up and accept public input, and those institutions resist doing so despite all the benefits. So I have to admit that in this area I tend toward pessimism.

Will online anonymity still be prevalent?

Yes, I believe people have many reasons to participate in groups and look for information without revealing who they are. Luckily, most new systems (such as U.S. government forums) are evolving in ways that build in privacy and anonymity. Businesses are more eager to attach our online behavior to our identities for marketing purposes, but perhaps we can find a compromise where someone can maintain a pseudonym associated with marketing information but not have it attached to his or her person.

Unfortunately, most people don't appreciate the dangers of being identified. But those who do can take steps to be anonymous or pseudonymous. As for state repression, there is something of an escalating war between individuals doing illegal things and institutions who want to uncover those individuals. So far, anonymity seems to be holding on, thanks to a lot of effort by those who care.

Will the Semantic Web have an impact?

As organizations and news sites put more and more information online, they're learning the value of organizing and cross-linking information. I think the Semantic Web is taking off in a small way on site after site: a better breakdown of terms on one medical site, a taxonomy on a Drupal-powered blog, etc.

But Berners-Lee had a much grander vision of the Semantic Web than better information retrieval on individual sites. He's gunning for content providers and Web designers the world around to pull together and provide easy navigation from one site to another, despite wide differences in their contributors, topics, styles, and viewpoints.

This may happen someday, just as artificial intelligence is looking more feasible than it was ten years ago, but the chasm between the present and the future is enormous. To make the big vision work, we'll all have to use the same (or overlapping) ontologies, with standards for extending and varying the ontologies. We'll need to disambiguate things like webbed feet from the World Wide Web. I'm sure tools to help us do this will get smarter, but they need to get a whole lot smarter.

Even with tools and protocols in place, it will be hard to get billions of web sites to join the project. Here the cloud may be of help. If Google can perform the statistical analysis and create the relevant links, I don't have to do it on my own site. But I bet results would be much better if I had input.

Are the next takeoff technologies evident now?

Yes, I don't believe there's much doubt about the technologies that companies will commercialize and make widespread over the next five years. Many people have listed these technologies: more powerful mobile devices, ever-cheaper netbooks, virtualization and cloud computing, reputation systems for social networking and group collaboration, sensors and other small systems reporting limited amounts of information, do-it-yourself embedded systems, robots, sophisticated algorithms for slurping up data and performing statistical analysis, visualization tools to report the results of that analysis, affective technologies, personalized and location-aware services, excellent facial and voice recognition, electronic paper, anomaly-based security monitoring, self-healing systems--that's a reasonable list to get started with.

Beyond five years, everything is wide open. One thing I'd like to see is a really good visual programming language, or something along those lines that is more closely matched to human strengths than our current languages. An easy high-level programming language would immensely increase productivity, reduce errors (and security flaws), and bring in more people to create a better Internet.

Will the internet still be dominated by the end-to-end principle?

I'll pick up here on the paragraph in my answer about takeoff technologies. The end-to-end principle is central to the Internet. I think everybody would like to change some things about the current essential Internet protocols, but they don't agree what those things should be. So I have no expectation of a top-to-bottom redesign of the Internet at any point in our viewfinder. Furthermore, the inertia created by millions of systems running current protocols would be hard to overcome. So the end-to-end principle is enshrined for the foreseeable future.

Mobile firms and ISPs may put up barriers, but anyone in an area of modern technology who tries to shut the spigot on outside contributions eventually becomes last year's big splash. So unless there's a coordinated assault by central institutions like governments, the inertia of current systems will combine with the momentum of innovation and public demand for new services to keep chokepoints from being serious problems.

(Typo fixed in previous paragraph--see amusing first comment below from Details Matter.)

The fate of WIPO, ACTA, and other intellectual property pushes in the international economy

Andy Oram @praxagora 2010-01-06

Intellectual property wars are fiercer than ever, although the institutions most affected (including the media) prefer not to talk about them. But we may be in for a pendulum shift.

I recently put out a tweet on this topic and was asked to expand on it. The issues are too big and complex for me to give them a proper treatment here, but I'll throw around a few of them and see whether you think the trend I'm talking about shakes out.

Intellectual property has had an international component for a long time (the Berne convention on copyright being the best known). A lot of the international work has been centralized in the World Intellectual Property Organization, a branch of the United Nations. But certain issues are the responsibility of other organizations, such as ICANN's rulings on domain names and trademarks. All these organizations are stringently insulated from the public (see for instance my most recent post on ICANN).

One issue that's currently the buzz of the Internet NGOs is an Anti-Counterfeiting Trade Agreement that would expand the powers of copyright holders, as well as courts and government agencies representing them. You won't find a web page about ACTA at WIPO, or at the US Copyright Office, or at other institutions hammering out the deal, but a casual web search will turn up enough of a diversity of commentary to assure you it's really happening.

Up to now, economic trends in IP were pretty straightforward. Countries with a long history of development tended to export the products of IP, such as machine designs and cultural works, to developing nations. The developed nations always wanted strong control over IP to ensure a flow of revenue from the less developed nations, who in turn would resist.

You could see this dynamic in the nineteenth century when novels by Dickens were widely reprinted in the US without royalty payments, and when the founder of the Industrial Revolution copied English looms in order to start textile factories in Lowell, Massachusetts. Now we have battles over the licensing of Monsanto seeds, concerns over exports of machinery to China, and generic anti-AIDS medications being developed in India.

And that's why the juggernaut of IP may be slowing. The idea behind this blog was suggested in passing by economist Adam S. Posen in an article titled "Who Will Sustain Globalization" for the November 2009 issue of Current History.

As I understand the argument, the institutions responsible for passing new rules respond to the most powerful countries. The US and Europe are on the decline in these organizations. All the countries that benefit from looser IP regimes--China, India, Brazil--are growing in economic strength and are finding themselves in more and more seats at the tables of the world's closed economic institutions. For just one concrete example, look at the shift of responsibility in recent years from the G-7 to the G-20. The G-7 is a familiar set of countries that were powerful from the 1950s through the 1970s. The G-20 is truly diverse, bringing in strong economies from around the world (but still just the ones with some international economic clout).

Of course, what's good for large companies that can spread their works internationally and enforce their IP is bad for innovators elsewhere. I am totally in favor of rewarding inventors, including large established firms, for the time, effort, and expertise they have put into their inventions. But as always, in IP, rewards for past work must be balanced against the promotion of further development. And right now, the world is moving more and more to crowdsourcing. The best ideas will increasingly come from people around the world pooling their ideas--including people with few resources and no connections to major institutions. Those institutions had better learn this lesson before they succeed in choking off inventions that make a life-or-death difference to people in developing countries.

Being online: Conclusion--identity narratives

Andy Oram @praxagora 2009-12-30

An honest tale speeds best being plainly told.

(This is the final post in a series called "Being online: identity, anonymity, and all things in between.")

After viewing in rotation the various facets of that gem that we call identity, it is time for us to polish and view them in one piece. This series has explored what identity means in an online medium, the most salient aspect of which is the digitization of information. Consider what the word digitization denotes: the fragmentation of a whole into infinitesimal, fungible, individually uncommunicative pieces. The computer digitizes everything we post about ourselves not only literally (by storing information in computer-readable formats) but metaphorically, as the computer scatters our information into a meaningless diaspora of data fields, status updates, snapshots, and moments caught on camera or in audio--as Shakespeare might say, signifying nothing.

No computer--only a person--can reassemble and breathe life into these dry bones, creating from them a narrative.

Anthony Giddens, whom I quoted earlier in the section on selves, says that constructing a narrative for oneself is an obligatory part of feeling one has an identity. Giddens does not seem to take the Internet on in his writings. But it's a reasonable stretch to say that we build up narratives online, and others do so for us, through the digitized, disembodied (or to use Giddens's term, disembedded) bits of information posted over time.

In place of the term narrative, some psychologists, who would probably love to do an intake interview on Hamlet, refer to the self as being established through a soliloquy. However you look at identity formation, taking it online extends its reach tremendously. The soliloquies we engage in, and the narratives we create for ourselves, reshape our memories and determine our futures. But these self-interrogations that used to take place in our craniums while we lay in bed at night now happen in full view of the world.

College development staff and others who search for information on us are building up narratives haphazardly based on available data. On blogs and social networks, however, we quite literally provide them with the narrative. Perhaps that's why those media became popular so quickly, and why so many people urge their friends to follow them: social media take some of the anarchy out of our presentation of self.

The next step to gain more control over searches about yourself or your business may be emotionally formidable as well as time-consuming: when someone comments about you on any searchable forum, answer him. The answer can be on the same forum as the original comment or on some site more under your control, such as your blog--use whatever setting is appropriate for what you have to say. You can then only hope that your reply is picked up and treated as important by the search engines.

One indication of Shakespeare's genius was the parallel, distinct narratives he managed to create in Hamlet--or as Goffman might put it, his ability to develop two sophisticated frames that are totally at odds throughout the play. Similar stylistic devices have been worked into thriller movies, spy novels, and thousands of other settings since then.

Everyone except Hamlet himself (and a few sympathetic colleagues) created a narrative as uncompromising as it was terrifying. Hamlet was seen as irrational, brooding, provocative, ungrateful, impulsively amoral, cruel, dangerously violent, and totally out of control.

Only we, the audience, see Hamlet the way he saw himself: brilliant, sensitive, almost telepathically alert, courageous, unambiguously righteous, gifted with a hidden power, blessed by a divine mission--in short, a hero.

Upon all my readers I wish narratives unlike Hamlet's. I hope you never feel the need to construct for yourself a narrative, online or offline, as desperate as the ones he constructed. At the same time, I hope that other people de-digitizing a narrative from your online signals do not see you as Polonius or Laertes saw Hamlet.

But we have to accept that we are constrained in life by how others see us, that many will formulate opinions from the digital trail we are all building just by living in the modern world, and that we can't control how others see this trail. There are just a few things we can do to improve our prospects for surviving and thriving online.

We can assess the economic value of what we reveal: what we are allowing others to do by revealing something, and what we may get back of value. And like economists, we have to think long-term as well as short-term, because the data we reveal is up there forever.

We can also develop tolerance for others, learning not to judge them because we don't know the back story to what we see online, as I have recommended in an earlier article.

Finally, we should accept that we can't bring other people's image of us into conformity with what we feel is our true identity. But at least we can resist bringing our identity into conformity with their image.

The posts in "Being online: identity, anonymity, and all things in between" are:

  1. Introduction
  2. Being online: Your identity in real life--what people know
  3. Your identity online: getting down to basics
  4. Your identity to advertisers: it's not all about you
  5. What you say about yourself, or selves
  6. Forged identities and non-identities
  7. Group identities and social network identities
  8. Conclusion: identity narratives (this post)

Being online: Group identities and social network identities

Andy Oram @praxagora 2009-12-28

So may a thousand actions, once afoot,
End in one purpose, and be all well borne
Without defeat.

(This is the seventh post in a series called "Being online: identity, anonymity, and all things in between.")

Despite all the variations played on the theme of personal identities in the previous sections, remember that identity is a group construct, not an individual one. If we never took part in groups, our personal identities would scarcely matter.

We're all members of certain groups without our choice: the particular race, social class, or gender that other people assign us to. When a woman posts a seductive picture online, she is helping to shape the way men and other women view womanhood in general. The same goes when she posts a demonstration of herself expertly fixing a computer or operating a super-collider. And the image every member of a racial minority puts up of himself or his cohorts, like it or not, determines the way all members of that race are judged.

It seems an invariant of human culture to exploit the image of an individual in order to leave an impression about the entire group to which he or she belongs. It has been done by the arts and mass media ever since they were invented, but the Internet gives millions of ordinary people the chance to inflect the process. This diffusion of influence was recognized by Time Magazine in 2006 when it designated "you" as its Person of the Year.

Going by Goffman's extremely broad definition of "framing"--any assumption or shared knowledge that lies behind a visible act is part of the frame--identity might be the most important frame of all, and the locus around which other frames revolve. Thus, my identity as an English-speaker and US native frames the starting point of this series from the perspective of a world technological and cultural center.

Others, though, may come to the Internet with an identity impaired by its very use. For instance, they may have to sacrifice their languages, or at least the character sets they traditionally use, in order to communicate online in a cost-effective way.

As Lisa Nakamura points out in her book Digitizing Race: Visual Cultures of the Internet (University of Minnesota Press, 2008), individuals can expand or criticize conventional images of women, Asians, Muslims, and others by reusing images and mashing them up in challenging ways. Nakamura even suggests that the typical slicing and recombination of digital images reflects the way people create their identities from fragments of older traditions, which in turn have been shattered by the economics and culture of modern global change.

Technology also groups us. Are we the first to jump on a new medium such as Voice over IP or Google Wave? Just as--to cite Giddens--we express identity through lifestyle choices such as vegetarianism or living in a downtown apartment instead of a house in the suburbs, we express identity through the devices we buy and the Internet services we use. And other people make assumptions about our identity based on these things.

Let's turn now to groups at a more intimate level. Every online forum has the potential to be a small community--and even a small government, with rules backed up by unique punishments--where people train each other to carry out their identities in various ways.

Groups must be explicit and conscious of group identity. Online media rarely provide chances for the equivalent of sitting at a bar with grizzled veterans and hearing their stories. That is why groups often post rules (check out Wikipedia's, which are complicated enough to call for an entire wiki of their own) to deal with churn and the lack of opportunities to pass on norms informally.

This article began with the hope of understanding the current state of the art in online group formation: social networking. The reason social networking sites hold promise is that they augment the individual, an echo of Douglas Engelbart's goal to augment personal achievement through the invention of the mouse and multimedia networking in the 1960s. In a 2004 article (PDF), anthropologist danah michele boyd made the observation--or perhaps just reported a subject's observation--that these networks try to represent each person's identity as the set of connections he or she has. At Friendster, at least (where people look up each other's friends for potential dates), the networks of friends become the main show. The same criticism could be made of LinkedIn, where the chief goal is career-building rather than dating.

Perhaps adding relationships to our definition of identity can humanize the concept, as suggested by Cynthia Kurtz. I explained the importance of sharing information with "friends of friends" in a comment added to an earlier section of this article. But when viewed in the worst light, Friendster and LinkedIn cheapen your identity to the connections you can offer other people.

Just as rudimentary digital cameras--especially when embedded in mobile devices--have confirmed the old notion that a picture is worth a thousand words, the connecting power of social networks will be multiplied a thousandfold if facial recognition improves to the point where it can automatically disseminate information about where we were and whom we met. If automated crawling tools could identify faces in millions of photos taken at parties, conferences, banquets, and even public places, and then combine the information to determine who knows whom, the amount of information that would become publicly available about our habits and associations would be staggering.

For instance, imagine if the recently announced service for photo recognition, Google Goggles, evolves to the point where it can match faces against faces in other photos. And then imagine that Google provides Goggles as an API for use with social networks where people tag photos with names. A single tag by a cousin on your photo at a party could lead to your being associated with everybody else in all other photos of you posted online. These developments, while not imminent, are plausible in the light of past advances in the technologies.
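The chain of inference in this scenario can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the face IDs stand in for the output of some hypothetical face-matching step (no real Goggles API is being described), and the names `build_association_graph` and `associates_of` are invented for this sketch.

```python
from collections import defaultdict
from itertools import combinations

def build_association_graph(photos):
    """Link every pair of faces that appear together in a photo.

    `photos` is an iterable of sets of face IDs. The IDs are assumed to come
    from a hypothetical face-matching step that gives the same person the
    same ID across photos.
    """
    graph = defaultdict(set)
    for faces in photos:
        for a, b in combinations(sorted(faces), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def associates_of(name, tags, graph):
    """Return every face seen alongside the person a single tag names."""
    face = tags[name]
    return set(graph.get(face, set()))

# Hypothetical data: three photos, and one tag added by a cousin.
photos = [
    {"face_1", "face_2"},            # a party
    {"face_1", "face_3", "face_4"},  # a conference
    {"face_5", "face_6"},            # strangers in an unrelated photo
]
tags = {"you": "face_1"}

graph = build_association_graph(photos)
linked = associates_of("you", tags, graph)
print(sorted(linked))
```

One tag is enough to attach a name to every co-occurrence the matcher has already found, which is why the paragraph above treats tagging as the step that turns anonymous faces into a map of associations.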

Social networks create a new personal information economy. We already have such an economy in real life's customer reward cards: we give up valuable information about our long-term purchasing habits in exchange for discounts. Some business experts suggest a similar explicit arrangement for the Internet. Regulations would prohibit the retention of information unrelated to a sale, but allow retailers to offer discounts in exchange for the right to retain certain types of information. This would make privacy a class issue, because the affluent would be most likely to forgo the bribe and withhold their information. And because the affluent are the biggest spenders, businesses are unlikely to find it worth their while to support this compromise.

Everyone on social networks is engaging in the new personal information economy. We choose to post our favorite movies in order to meet fans and learn about new movies we'd like. And we reveal the colleges we attended so we can meet potential business partners from those institutions. We even post jokes and casual observations to earn people's admiration. While we're all having fun, every nugget we release is subjected first, consciously or unconsciously, to a key question: will we get some benefit from the social network commensurate with the value of the information we are about to give our contacts?

This view of social network as economy provides a partial answer to the questions posed at the very start of this series:

Should we post our age and marital status? Should we make our profile private or public? Should we reveal that we're gay?...

The answer is that each of us is responsible for assessing the value of posting at every moment, taking into consideration the tone of the network, how many people are watching our postings, what they can offer us, and more.

The economy extends to sending nude photographs of yourself to current or would-be lovers. A recent report from the Pew Research Center says no less: "Sexually suggestive images sent to the privacy of the phone have become a form of relationship currency." Exhibitionists don't seem to realize that their photos are likely to travel far beyond the person to whom they're entrusted--a bitter truth that, once admitted, would certainly alter the senders' economic calculations.

While filtering our contributions to the network, we also filter those who are entitled to receive them--and here the economy is out of balance. Rampant are the complaints about receiving connection requests you don't want from old boyfriends or the guys who smoked dope with you in high school. Social networking urgently needs to establish a culture in which it's OK to say that you're filtering your connections. (A couple years ago I rejected a connection and got a death threat in return. Looking at the person's profile, I determined that it was a joke--but I still think twice about visiting the city where he lives.)

Although connections on social networks are definitive, no one asks about the identity of the social network itself (except shareholders hoping to increase its popularity and critics trying to change its policies). But some online communities head in a very different direction. Law professor Beth Simone Noveck, in an essay titled A democracy of groups, points out that self-organized groups can mold their own unique identities in order to effect collective action.

Noveck's optimism regarding self-organizing groups led to the current experiments with online democracy pursued by the Obama administration, where Noveck was appointed to both the transition team and a Deputy CTO position to start implementation of the Open Government initiative that Obama released on his first day in office.

In Noveck's theory, a group's effectiveness depends on each member's success in gelling his or her individual identity. "Through visual and graphical representation, this new technology enables people to see themselves and others and to perceive the role they have assumed. Appearing as a defined person--whether by name or in an embodied avatar--makes it easier to sense oneself as part of a group and, arguably, will facilitate the inculcation of the social norms at the heart of a group's culture."

These are intriguing claims, but it's odd that Noveck does not consider the ability to import external markers of identity into the group space, or to check members' assertions of identity against these external markers. For instance, what if visitors to Second Life could receive a token from her law school (through the OAuth protocol, say) that validates her as a professor?

One way to tie individuals more tightly together in online groups, as explained in her article, is to make online forums feel more like real-world places so that people can develop "forms of attachment" to the forums in ways that they feel emotionally attached to their town square, college, or other local "great good place" (to borrow the name of a popular book by Ray Oldenburg). As Noveck writes, "The new generation of technology is reintroducing the concept of space and place online." As an example she cites Second Life, which was growing rapidly in popularity at the time. Effectively, she is granting groups identities, just like individuals, and recommending that a group foment stronger ties among its members by creating a stronger group identity.

No one in the Obama administration has picked up the most aggressive suggestion in A democracy of groups, that the law recognize groups as entities--"new forms of collective legal personhood"--in a similar manner to how it now recognizes corporations. But Vermont has taken a step in that direction by changing its laws to allow virtual corporations, and ultimately we may be dealing with group identity online as much as with individual identity.

The posts in "Being online: identity, anonymity, and all things in between" are:

  1. Introduction
  2. Being online: Your identity in real life--what people know
  3. Your identity online: getting down to basics
  4. Your identity to advertisers: it's not all about you
  5. What you say about yourself, or selves
  6. Forged identities and non-identities
  7. Group identities and social network identities (this post)
  8. Conclusion: identity narratives (to be posted December 30)

Being online: Forged identities and non-identities

Andy Oram @praxagora 2009-12-26

Haply you shall not see me more; or if, a mangled shadow.

(This post is the sixth in a series called "Being online: identity, anonymity, and all things in between.")

One reason Sherry Turkle saw the Internet through the prism of invented identity--or, perhaps, found the aspects of Internet life that corroborated her own interests as a psychologist with a fondness for postmodernism--was her choice to seek out initial contacts from serious players of 1970s multi-user dungeons. These environments were fantasy lands, entirely concerned with forged identities; indeed, it would be well-nigh impossible to create an identity in those environments that was the least bit realistic.

All the old MUDs survive, and have been joined by even more popular ones such as World of Warcraft, along with more general fantasy environments such as Second Life and IMVU. But they no longer set the tone for Internet participation. The momentum has gone to social networks such as MySpace, Facebook, and Orkut, where people are asked to bring their external life online in as genuine a fashion as possible. Disclosure rather than concealment is widely recognized now as the trend, as heard in the conversations of leading Internet watchers at the 2008 Aspen Institute Roundtable on Information Technology.

One can see why modern commerce would prefer social networking to MUDs, because people discussing music, clothes, movies, and sports are much easier to sell things to than orcs and medieval monks. Current social network sites depend for their funding--if they have weaned themselves to any degree from venture capital--on advertising. Ironically, though, they create the kinds of empowered, self-organized communities that can find and disseminate product information on their own and therefore render advertising increasingly redundant.

MySpace has taken advertising to the next stage and become a platform whose members are the ads. Pop musicians don't need billboards and radio spots any more; MySpace is their promotion.

But a few social network visitors still find fantasy more rewarding than the presentation of their real selves. Obvious examples include people who create dummy accounts so as to laud their own organizations or writings and rate them up.

Unlike World of Warcraft players, these forged identities move through a landscape of overwhelmingly real denizens who assume that the forged identities are real. The result can unfortunately be deadlier than the most aggressive World of Warcraft encounter.

Middle-aged men posing as teenagers to snare girls are one real danger. The opportunity to post anonymously or through pseudonymous accounts facilitates cyberbullying and threats, while terrorists are reported to do their recruiting on social networks as well. But perhaps the saddest story of forgery is the suicide of Megan Meier.

Meier was a 13-year-old girl with a history of depression--in fact, she had made suicide attempts before--and apparent difficulties fitting in at school. One of her female peers, along with two older female confederates--one of them being the peer's 49-year-old mother, whom one would have expected to have more sense--created a MySpace account purporting to be a boy named Josh. This forged Josh befriended Meier in 2006, drew close to her in a relationship that could serve in many online communities as a romantic encounter, then abruptly terminated contact--with nasty language eerily echoing the taunts Meier had repeatedly experienced from schoolmates.

Megan Meier, like a modern Ophelia, took the rejection to heart and killed herself. The plot was uncovered and the mother arrested. But now an odd legal twist intervened: no law could be found to apply. Before her case was dismissed on appeal, she was handed a misdemeanor conviction under the Computer Fraud and Abuse Act. The prosecution hung by a thread, however, because the district attorney was reduced to arguing that her "fraud" consisted of violating a routinely ignored clause in the MySpace terms of service that prohibited misrepresenting oneself. (At the time this post is written, their terms of service require that "all registration information you submit is truthful and accurate.")

Had the original ruling been upheld, it would have instantly criminalized thousands of people, including my adult daughter, who created a Facebook account for the stuffed animal she has held on to since childhood. (Like many other people, she defies the research of psychoanalyst Donald Winnicott, who called the child's stuffed animal a "transitional object.")

So it is still legal to masquerade online. But while plenty of people stretch the truth, few go so far as to create an entire persona from whole cloth. Turkle points out that the strain of keeping up appearances is too great. I have reflected the difficulty of lying online by using the term "forged" for such identities--forged not just in the sense that they're fake, but in the sense that creating one recalls the intense exertion of beating a metal artifact out on the anvil.

One person who found the effort worthwhile was amateur economist Park Dae-Sung of South Korea. As profiled in the Washington Post and WIRED, Park frequented popular web forums for financial discussions and tossed his opinions into the stew with hundreds of other casual posters. There is nothing unusual about this (my own brother likes to go on such forums), but Park distinguished himself in two ways: he predicted some of the global financial disasters that hit in late 2008, and he was so authoritative that he gave the impression he was some macher high up in government or finance.

The South Korean government was embarrassed by his accurate criticisms of the finance industry's greed and of the government's own policies. Apparently, however, some of his postings were also incorrect. Once they uncovered his identity, the police found an excuse to arrest him "on charges of spreading false data in public with a harmful intent." He was acquitted, but South Korea still appears to be a place where it's dangerous to be anonymous.

I pointed out in an earlier post in this series that logging in to a coffee shop network effectively renders one anonymous, and that some countries prohibit such logins in an anti-crime posture. Most of what the governments are fighting is the unauthorized exchange of copyrighted music and movies, although they like to claim that they're also trying to prevent violent criminals and terrorists from hiding their tracks.

One would expect restrictions on anonymity in countries that have a history of suppressing free speech or political activity. But one of the strongest controls on identity was set recently by France in a law that combats illegal file-sharing by actually forcing repeat offenders offline. The British government has recently proposed a similar bill. Clearly, to enforce a ban on Internet use, France and Britain must also prevent anonymous logins.

Once you're online, you can hide your activities with a degree of effort. In a country that monitors its residents' visits to web sites, you can run software that connects you to another computer--probably located in a more tolerant country--which requests the web page on your behalf and tunnels it back to you. The monitor sees only your connection to that proxy computer, not the site you ultimately visited.

In the 1990s, a very popular anonymous remailer was run out of Finland under the name anon.penet.fi. Anyone could send email through it; the server would assign a random email address and send it on to the requested destination. Return email would be matched up with the real email address of the original sender and delivered in the other direction.
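The two-way mapping described above can be sketched as a small Python class. This is a toy illustration of the idea, not a reconstruction of the actual anon.penet.fi software: the class name, the `remailer.example` domain, and the alias format are all assumptions made for the sketch, and no real mail is sent.

```python
import secrets

class PseudonymousRemailer:
    """Toy model of a pseudonymous remailer (hypothetical names throughout)."""

    def __init__(self, domain="remailer.example"):
        self.domain = domain
        self._alias_by_real = {}   # real address -> assigned alias
        self._real_by_alias = {}   # assigned alias -> real address

    def alias_for(self, real_address):
        """Return the sender's alias, assigning a random one on first use."""
        alias = self._alias_by_real.get(real_address)
        if alias is None:
            while True:
                alias = f"an{secrets.randbelow(10**6):06d}@{self.domain}"
                if alias not in self._real_by_alias:
                    break
            self._alias_by_real[real_address] = alias
            self._real_by_alias[alias] = real_address
        return alias

    def forward(self, sender, recipient, body):
        """Outbound mail: replace the real sender with the alias."""
        return {"from": self.alias_for(sender), "to": recipient, "body": body}

    def deliver_reply(self, alias, body):
        """Return mail: look up the real address behind the alias."""
        return {"to": self._real_by_alias[alias], "body": body}

remailer = PseudonymousRemailer()
outbound = remailer.forward("alice@example.org", "group@example.net", "hello")
reply = remailer.deliver_reply(outbound["from"], "thanks")
```

The sketch makes one design consequence visible: the lookup table inside the server is the only place where an alias is tied to a real address, so the anonymity of every user depends on that table staying private.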

anon.penet.fi was heavily used by critics of the Church of Scientology to post secret church documents to public news groups. The church finally resorted to Finnish law to force the server's administrator to reveal the email address of one of these posters, and the administrator decided to shut down the server because he could no longer guarantee the anonymity it promised. In a significant historical premonition, the pretext used by the church to squelch the exchange of information was copyright infringement, the same claim that drives most of the current laws and court cases forcing ISPs to reveal their users' identities.

A more formal and sophisticated version of this proxying is provided by onion networks, which route traffic from one randomly chosen computer to another in a series, and send the replies back through the same path. To establish that the two endpoints actually exchanged traffic would be impossible later, unless an investigator could trace every link between every pair of neighbors. (There are also forms of snooping based on timing the traffic leaving and arriving at different systems, so some onion networks go so far as to insert random delays to make these attacks harder.)
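The point about tracing every link can be made concrete with a short Python sketch. It is a deliberately simplified model: it leaves out the layered encryption and the timing attacks that real onion networks deal with, and every name in it (`choose_route`, `link_records`, `trace_back`, the relay labels) is invented for illustration. Each relay is assumed to know only the hop before it and the hop after it.

```python
import random

def choose_route(relays, hops, rng):
    """Pick a series of distinct, randomly chosen relays."""
    return rng.sample(relays, hops)

def link_records(sender, route, destination):
    """What each relay could reveal: only its previous and next hop."""
    path = [sender] + route + [destination]
    return {path[i]: (path[i - 1], path[i + 1]) for i in range(1, len(path) - 1)}

def trace_back(destination, records, relays):
    """Walk from the destination toward the sender using the records an
    investigator has obtained. Returns the sender, or None if any link
    in the chain is missing."""
    hop = next((r for r, (_, nxt) in records.items() if nxt == destination), None)
    while hop is not None:
        prev, _ = records[hop]
        if prev not in relays:
            return prev                     # reached the original sender
        hop = prev if prev in records else None
    return None

relays = ["relay_a", "relay_b", "relay_c", "relay_d", "relay_e"]
route = choose_route(relays, 3, random.Random(0))
records = link_records("alice", route, "site.example")

full_trace = trace_back("site.example", records, set(relays))

# Remove the middle relay's record, as if it kept no logs or sat out of reach.
partial = {r: v for r, v in records.items() if r != route[1]}
broken_trace = trace_back("site.example", partial, set(relays))
```

With every relay's record in hand the trace reaches the sender, but a single missing record breaks the chain, which is the property the paragraph above describes.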

The use of a proxy exposes that proxy to prosecution instead of the person it is protecting, but some proxies operate out of jurisdictions where they can advertise their services without fear, and onion networks tend to be tolerated because even law enforcement and military organizations find them useful for their own purposes. The US Navy, for instance, actively supports the development of onion networks.

As people ask for help online, and respond to that help, as with InnoCentive and Amazon Mechanical Turk, actors with possibly objectionable motives can outsource work without revealing their identities and goals. Law professor Jonathan Zittrain, in a video lecture, listed both actual and potential abuses of crowdsourcers' good will.

The principle behind this form of identity hiding is that many crowdsourcing sites are proxies, and therefore play the role of proxies in masking the identities of those who post tasks to perform. The Internet has often been hailed as a disintermediator--allowing vendors and buyers to communicate directly, for instance--but as sites aggregate tasks, the Internet can also re-intermediate the clients who offer tasks, and hide their identities along the way.

In short, the Internet is not yet MySpace. With some exceptions, you can strut out onto the Internet as anyone you want to be, and duck under the radar of those who want to delve into your real identity.

The posts in "Being online: identity, anonymity, and all things in between" are:

  1. Introduction
  2. Being online: Your identity in real life--what people know
  3. Your identity online: getting down to basics
  4. Your identity to advertisers: it's not all about you
  5. What you say about yourself, or selves
  6. Forged identities and non-identities (this post)
  7. Group identities and social network identities (to be posted December 28)
  8. Conclusion: identity narratives (to be posted December 30)

Being online: What you say about yourself, or selves

Andy Oram @praxagora 2009-12-24

Which is the natural man,
and which the spirit? who deciphers them?

(This post is the fifth in a series called "Being online: identity, anonymity, and all things in between.")

What we've seen so far in this series would be enough to shake anyone's sense of identity. We've found that the technology of the Internet itself fudges identity (but does not totally succeed in hiding it), that companies use fragmented and partial information to categorize you, and that your actual identity is perhaps less important to these companies than your role as snippet of a statistic within a larger group. This post demands an even greater mental stretch: we have to face that what we say about ourselves is also distorted and inconclusive.

Sociologists and psychologists tend to see our activities online as inherently artificial, referring to them as aspects of "the performative self." But the pundits haven't succeeded in getting their point of view across to the wider public. For instance, the millions of people who view personal video weblogs, or vlogs, fervently believe--according to a recent First Monday article by Jean Christian--in the importance of authenticity in people's video self-presentations. Viewers reject vlogs over such telltale signs as overediting or reading from scripts.

The touchstone for discussions of people's appearances and what their appearances say about them is Erving Goffman's classic Presentation of Self in Everyday Life, whose lessons I applied to the Internet in a recent blog. The book suggested that we fashion our appearances not to hide our true selves, but to reveal them in a manner others find meaningful. My blog reinforced this insight, pointing out that, although we do prettify ourselves online as claimed in one newspaper article, we can't compartmentalize aspects of ourselves. In other words, whatever presentation we make in one context or forum is likely to leak out elsewhere.

In another blog about Goffman, I focused on the signals we give out and pick up instinctively about each other in real life, indicating that they have to be specified explicitly in online media (although graphics and video now bring back some instinctive reactions).

Goffman's career ended before the Internet became a topic of sociological analysis, so at this point it's appropriate to bring in the chief researcher in the area of identity and the Internet, psychologist and sociologist Sherry Turkle. She claims that we do maintain multiple online identities, and that this is no simple game but reflects a growing tendency for us to have multiple selves. The fragmentary and divided presentation of self online reflects the truth about ourselves, more than we usually acknowledge.

Turkle's research, unfortunately, got channeled early in the Internet's history into landscapes that don't reflect its later use as a mass medium. She became fascinated, during the early years of popular computing and gaming in the 1980s, with the whims so many people indulged for portraying themselves as someone of a different age, gender, or profession, or just for hiding as much as they could in order to try out a different personality. This orientation colors both of her books on the subject, The Second Self (1984) and Life on the Screen (1995), and relegates her work to a study of psychological deviation.

Still, Turkle's work can make us think about the vistas that the Internet opens up for the Self. Surveying the multiple identities we create online and the ways we represent or misrepresent ourselves, she finds that people don't do this just for play or to maliciously deceive other people. Many do it to don identities that are hard to try on in real life.

A woman pretending to be a man might open up scenarios for practicing assertive behaviors that would produce a backlash if she rolled them out in real life. A shy person might learn, through an invented personality, how to flirt and even to practice mature love. Both of these forms of mimicry, which go back at least as far as Shakespeare's As You Like It, have proven useful to many people online.

But beyond these simple sorts of play-acting (for which real life provides its settings: acting classes, long journeys, spiritual retreats, "What happens in Las Vegas stays in Las Vegas") we glimpse in online personas a contemporary view of the self that is multi-layered and multi-faceted--by no means integrated and consistent.

Turkle also explores the psychological impact of computer interfaces. In particular, programs that act like independent, autonomous decision-makers push us to rethink our own human identities.

In the 1960s, people would spend hours typing confessions into the psychologist persona presented by Joseph Weizenbaum's ELIZA program. Trying out ELIZA now, it's hard to imagine anyone could be enticed into a serious conversation with it. But as we've grown more sophisticated, so have the deceits that programmers toss at us. Turkle reports an interaction with a robot at the MIT AI Lab that drew her in with a veracity that made her uncomfortable. "Despite myself and despite my continuing skepticism about this research project, I had behaved as though in the presence of another being."

Affective technologies have leapt even further ahead since 1995. Someday, robots for the disabled and elderly will try to reflect their feelings in order to provide care that goes beyond washing and feeding. Turkle draws on many strands of psychology, sociology, neurological science, and philosophy to show how our intellectual substrate has been prepared throughout the twentieth century for the challenges to Self that sophisticated computer programs present. Had the field of synthetic biology existed when Turkle wrote her books, it would have provided even more grist for her thesis.

This is one place where I part company with Turkle. I don't believe we're getting more and more confused about the dividing line between Computer Power and Human Reason (the title of a classic book by Weizenbaum, ELIZA's creator). I have more faith in our discernment. Just as we can see through ELIZA nowadays, we'll see through later deceptions as we become familiar with them. Simulated intelligences will not perennially pass the Turing test.

Turkle's view of online behavior is more persuasive. I'm willing to grant that exploring identity on the Internet can help us develop neglected sides of our identity and integrate them into our real selves. She expects us to go even further--to develop these sides without integrating them. We can quite happily and (perhaps) healthily live multiple identities, facilitated by how we present ourselves online.

Let's review the social setting in which Turkle inserts her arguments. Looking over the period during which the technologies and social phenomena Turkle researches have grown--the period from 1970 to the present, when MUDs and other online identity play developed--we see an astonishing expansion of possibilities for identity throughout real life. We have more choices than ever in career, geographic location, religious and spiritual practice, gender identification, and family status--let alone plastic surgery and drugs that alter our minds or muscles. People have reclaimed disappearing ethnic languages and turned vanishing crafts into viable careers. And people are experimenting with these things in countries characterized by repression as well as those considered more open.

Changes in speech and clothing allow us to try out different identities in different real-life settings with relative safety. We can sample a novel spiritual rite without relinquishing our traditional church. But of course, doing all these things online is even safer than doing them in physical settings.

Global information and movement lead to what sociologist Anthony Giddens, in his 1991 book Modernity and Self-Identity, calls reflexivity. I showed in the previous section how reflexivity works in the data collected by advertisers and corporate planners. Toward the cause of producing more of what we want and marketing it to us effectively, the corporations are constantly collecting information on us--purchases, web views and clicks, sentiment analysis--and feeding it back into activities that will, in the next phase, produce more such information. Reflexivity is a fundamental trait of modern institutions. But individuals, as Giddens points out, are also reflexive. We imitate what we see, online as well as offline. Online, it's even easier to try something and learn from the results. Goth clothing and body piercings we pick up online are cheaper and easier to discard than real ones when we have to clean up our image.

However, we've become more circumspect over the past few years as we realize that people will be able to tie our online forays back to us in the future; this may cause the lamentable end to experimentation with the Self.

Turkle refers to a story that was widely circulated and much discussed in an earlier decade, of a male psychiatrist who posed as a disabled but capable woman on CompuServe. He quickly entered supportive online relationships with a number of women. But as the relationships became too deep, he had to extricate himself from his virtual friends' dependencies, leaving a good deal of anger and numerous sociological questions.

But the most interesting aspect of the story to me is that no one can verify it. It appears to be a conflation of various incidents involving different people. In a way, drawing any conclusions at all would be pointless, because we don't know what emotions were involved and can't investigate the participants' positive and negative reactions. Thus does an influential and highly significant case study about Internet identity take on a murky identity of its own.

Today's digital trails are more persistent than the ones that created the legend of the CompuServe psychiatrist. Anyone engaging with strangers today would probably carry on through social networks, blogs, or wikis that do a better job of preserving the trail of logins and postings.

Thus, I return to my assertion that identity is becoming more unified online, not more fragmented. We may not be exactly as we appear online, but for the purposes of public discourse, what we appear to be is adequate.

When college student Jennifer Ringley began her famous webcam of daily life in 1996, it was seen either as a bold experiment in conceptual art or a pathetic bid for attention. Soon, though, the inclusion of cheap cameras in cell phones fostered a youth culture that captured and distributed every trivial moment of their lives, a trend driven further by ease of using Twitter from a cell phone.

Handy access to networks by cameras and video devices made it inevitable that people would impulsively send sexually suggestive photos of themselves to people with whom they were having intimate relationships, or with whom they wanted such relationships. A rather unscientific survey by The National Campaign to Prevent Teen and Unplanned Pregnancy found that 20% of teenagers send nude or semi-nude photos of themselves to other people. A less sensationalistic report from the Pew Research Center finds only 4%, but raises the mystique-shattering admonition that the trust shown by the senders of the photos is routinely violated by their recipients, either right away or later when the relationship is ruptured.

Addressing the safety issue in an earlier article, I suggest that "along the spectrum of risky behaviors young people engage in (eating disorders, piercings in dangerous locations, etc.) to deal with body image problems that are universal at that age, a nude photo isn't so bad." But I would love to see a deeper psychological inquiry into why young (and not always young) people deal such blows to their own privacy. I think such counter-intuitive behavior embodies the very contradictions in image and reality that run through this series of articles.

Perhaps the eroticism of releasing intimate photos over the network reflects the core contradiction people sense in online identity. The nude photo is a unique token of one's deepest identity, without actually being that identity. Like René Magritte's famous pipe painting, the photo of you is not you. But by sending it to someone with whom you want a sexual relationship, you're saying, "Hey bud, this could be me if you follow through in the flesh."

For a long time the Internet was praised as a place to shed the baggage of race and other defining traits ("nobody knows you're a dog"). But as researchers such as Lisa Nakamura point out, postings that brim over with images and videos reintroduce race, gender, and other artifacts of daily life with a vengeance. And research by anthropologist danah michele boyd shows that people self-segregate in social forums, reinforcing rather than breaking down the social divisions that frustrate the prospects for mutual understanding among different races and groups.

One could throw in, as another consequence of the growth of identity, the oft-observed tendency to read only political articles that reinforce one's existing views. Unlike other observers, who look back wistfully at an age where we all got our information from a few official media sources, I have applauded the proliferation of views, but agree that we need to find ways to encourage everyone to read the most cogent arguments of their opponents. Censorship--even self-censorship--does not contribute to identity formation in a healthy manner.

There's also more than a hint of the trend toward asserting identity in the participatory culture chronicled and analyzed by Henry Jenkins: the fan fiction, the commentary sites for X Files and The Matrix, the games and consumer polls held by movie studios, and so forth. This participatory culture is mostly a community affair, which creates a group identity out of many unconnected individuals. But surely, creating an unauthorized sequel or re-interpreting a scene in a movie is also an act of personal expression. I would call it placing a stake in the cultural ground, except that the metaphor would be far too static for an ever-changing media stream. It would be more apt to call the personal contributions a way of inserting a marker with one's identity into the ongoing reel of unfolding culture.

It's a lot easier nowadays to be real when you're on the Internet. But some people still, for many reasons, adopt forged identities or non-identities. We'll explore that phenomenon next.

The posts in "Being online: identity, anonymity, and all things in between" are:

  1. Introduction
  2. Being online: Your identity in real life--what people know
  3. Your identity online: getting down to basics
  4. Your identity to advertisers: it's not all about you
  5. What you say about yourself, or selves (this post)
  6. Forged identities and non-identities (to be posted December 26)
  7. Group identities and social network identities (to be posted December 28)
  8. Conclusion: identity narratives (to be posted December 30)

Peer to Patent Australia recruits volunteer prior art searchers

Andy Oram @praxagora 2009-12-24

The Peer to Patent project has already earned its place in history. It was explicitly cited as inspiration for the open government initiative in the Obama administration, which recently released a comprehensive directive (available as a PDF) covering federal agencies. The founder of the project, law professor Beth Noveck, began implementation of the directive as Deputy CTO in the US government. But I've been wondering, along with many other people, where Peer to Patent itself is going.

It's encouraging to hear that a new pilot has started in Australia and has gathered a small community of volunteer prior art seekers. You can check out the official site and its Wikipedia page. Because Australia is much smaller in population than the US and sees much less patent activity, the scope of the pilot is smaller but seems to be chugging along nicely.

The pilot started on December 9 and plans to run for six months, offering 40 patents for review in the areas of software and business methods (the same ones as the US Peer to Patent project). Among participating patent applicants are IBM, General Electric, Hewlett-Packard, Yahoo!, CSIRO, and Aristocrat. Right now, 15 patents are posted, each has at least one volunteer reviewer, and one boasts two suggestions for potential prior art.

Professor Brian Fitzgerald of the Queensland University of Technology, the Project Leader of Peer to Patent Australia, says, "Peer to Patent allows people from anywhere to plug into the patent examination process and to add what value they can. And from what we have seen in the US, it works: examiners are relying on the Peer to Patent prior art notifications. Our aim is to help build an international platform for the project as well as embed its benefits within the Australian patent system. We ask you to join the Australian project and help contribute to the development of Peer to Patent on a worldwide basis."

While the U.S. pilot is undergoing evaluation, Peer to Patent's executive director Mark Webbink says, "Signs are good for a potential restart of the program some time in 2010. Dave Kappos, the Under Secretary of Commerce and Director of the USPTO, has long been a supporter of Peer to Patent, and the prior art contributions appear to be proving useful. The worldwide economy produced some drag on program expansion when the UK Intellectual Property Office delayed its anticipated pilot. However, the Japan Patent Office, which previously ran its own peer review pilot, now appears interested in expanding its program. IP Australia and Queensland University of Technology are to be commended for moving on the pilot so quickly." Brian Fitzgerald says that China and other Asian countries are watching Japan and Australia with interest.

I have followed Peer to Patent since fairly early drafts of the proposal, have written about it frequently, and believe it is both viable and necessary. The recent ruling against Microsoft Office shows that patents in software, at least, are way out of control. Prior art cannot in itself solve a broken system, but a robust examination process can at least make applicants think twice about trying to exert ownership over routine concepts such as separating a document's markup from its content. (That's the purpose of markup in the first place.) Incidentally, Australia has its own version of the famous Bilski patent case, Grant v Commissioner of Patents.

In fact, the progress Peer to Patent has made in many countries proves my faith in it. Just think about the inertia of government agencies and the impenetrability of both the individual patent application and the patent process as a whole. Who would imagine, putting all those barriers together, that Peer to Patent could have accomplished so much already?

We're not on Internet time here, but on policy time. Peer to Patent is still a baby, and with enough care and feeding it can thrive and grow strong.

Being online: Your identity to advertisers--it's not all about you

Andy Oram @praxagora 2009-12-22

Thy self thou gav'st, thy own worth then not knowing

(This post is the fourth in a series called "Being online: identity, anonymity, and all things in between.")

Voracious data foraging leads advertisers along two paths. One of their aims is to differentiate you from other people. If vendors know what condiments you put in your lunch or what material you like your boots made from, they can pinpoint their ads and promotions more precisely at you. That's why they love it when you volunteer that information on your blog or social network, just as do the college development staff we examined before.

The companies' second aim is to insert you into a group of people for which they can design a unified marketing campaign. That is, in addition to differentiation, they want demographics.

The first aim, differentiation, is fairly easy to understand. Imagine you are browsing web sites about colic. An observer (and I'll discuss in a moment how observations take place) can file away the reasonable deduction that there is a baby in your life, and can load your browser window with ads for diapers and formula. This is called behavioral advertising.

Since behavioral advertising is normally a pretty smooth operator, you may find it fun to try a little experiment that could lift the curtain on it a bit. Hand your computer over for a few hours to a friend or family member who differs from you a great deal in interests, age, gender, or other traits. (Choose somebody you trust, of course.) Let him or her browse the web and carry on his or her normal business. When you return and resume your own regular activities, check the ads in your browser windows, which will probably take on a slant you never saw before. Of course, the marketers reading this article will be annoyed that I asked you to pollute their data this way.

Experiences like this might arouse you to be conscious of every online twitch and scratch, just as you may feel in real life in the presence of a security guard whose suspicion you've aroused, or when on stage, or just being a normal teenager. Online, paranoia is level-headedness. Someone indeed is collecting everything they can about you: the amount of time you spend on one page before moving on to the next, the links you click on, the search terms you enter. But it's all being collected by a computer, and no human eyes are ever likely to gaze upon it.

Your identity in the computerized eyes of the advertiser is a strange pastiche of events from your past. As mentioned at the beginning of the article, Google's Dashboard lets you see what Google knows about you, and even remove items--an impressive concession for a company that has mastered better than any other how to collect information on casual Web users and build a business on it. Of course, you have to establish an identity with them before you can check what they know about your identity. This is not the last irony we'll encounter when exploring identity.

But advertisers do more than direct targeting, and I actually find the other path their tracking takes--demographic analysis--more problematic. Let's return to the colicky baby example. Advertisers add you to their collection of known (or assumed) baby caretakers and tag your record with related information to help them understand the general category of "baby care." Anything they know about your age, income, and other traits helps them understand modern parenting.

As I wrote over a decade ago, this kind of data mining typecasts us and encourages us to head down well-worn paths. Unlike differentiation, demographics affect you whether or not you play the game. Even if you don't go online, the activities of other people like you determine how companies judge your needs.

The latest stage in the evolution of demographic data mining is sentiment analysis, which trawls through social networking messages to measure the pulse of the public on some issue chosen by the researcher. A crude application of sentiment analysis is to search for "love" or "hate" followed by a product trademark, but the natural language processing can become amazingly subtle. Once the data is parsed, companies can track, for instance, the immediate reaction to a product release, and then how that reaction changed after a review or ad was widely disseminated. Results affect not only advertising but product development.
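The crude "love or hate followed by a product name" search can be sketched in a few lines of Python. This is only an illustration of the keyword approach described above, not how any real sentiment-analysis product works; the function name `crude_sentiment`, the sample messages, and the product "Acme skis" are all made up for the example.

```python
import re
from collections import Counter

def crude_sentiment(messages, product):
    """Count occurrences of "love <product>" and "hate <product>"."""
    # Allow an optional "my", "the", or "this" between the verb and the product.
    pattern = re.compile(
        r"\b(love|hate)\s+(?:(?:my|the|this)\s+)?" + re.escape(product) + r"\b",
        re.IGNORECASE,
    )
    counts = Counter()
    for text in messages:
        for match in pattern.finditer(text):
            counts[match.group(1).lower()] += 1
    return counts

# Hypothetical status messages, invented for this sketch.
messages = [
    "I love my Acme skis!",
    "Honestly, I hate Acme skis after today.",
    "Love the Acme skis, hate the lift lines.",
]
print(crude_sentiment(messages, "Acme skis"))
```

Note that the third message counts once for "love" and not at all for "hate," because the hated thing is the lift lines rather than the product. A message such as "I don't love Acme skis" would still be miscounted, which is exactly where subtler natural language processing has to take over.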

Once again, my reaction to sentiment analysis mixes respect for its technical sophistication with worries about what it does to our independence. If you add your voice to the Twittersphere, it may be used by people you'll never know to draw far-reaching conclusions. On the other hand, if you refuse to participate, your opinion will be lost.

Google's Dashboard tells you only what they preserve on you personally, not the aggregated statistics they calculate that presumably include anonymous browsing. But you can peek at those as well, and carry on some rough sentiment analysis of your own, through Google Trends.

Considering all this demographic analysis (behavioral, sentiment, and other) catapults me into a bit of a 21st-century-style existential crisis. If a marketer is able to combine facts about my age, income, place of birth, and purchases to accurately predict that I'll want a particular song or piece of clothing, how can I flaunt my identity as an autonomous individual?

Perhaps we should resolve to face the brave new world stoically and help the companies pursue their goals. Social networking sites are developing APIs and standards that allow you to copy information easily between them. For instance, there are sites that let you simultaneously post the same message instantly to both Twitter and Facebook. I think we should all step up and use these services. After all, if your off-the-cuff Tweet about your skis from the lounge of a ski resort goes into planning a multimillion dollar campaign, wouldn't it be irresponsible to send the advertiser mixed messages?

My call to action sounds silly, of course, because the data gathering and analysis will obviously not be swayed by a single Tweet. In fact, sophisticated forms of data mining depend on the recent upsurge of new members onto the forums where the information is collected. The volume of status messages has to be so high that idiosyncrasies get ironed out. And companies must also trust that the margin of error caused by malicious competitors or other actors will be negligible.

We saw in an earlier section that your online presence is signaled by a slim swath of information. At the low end, marketers know only your approximate location through your IP address. At the other extreme they can feast on the data provided by someone who not only logs into a site--creating a persistent identity--but fills out a form with demographic information (which the vendor hopes is truthful).

As another example of modern data-driven advertising, Facebook delivers ads to you based on the information you enter there, such as age and marital status. A tech journal reported that the Google Droid phone combines contacts from many sources, but I haven't experienced this on my Droid and I don't see technically how it could be done.

Most browsing takes place in an identity zone lying between the IP address and the filled-out profile. We saw this zone in my earlier example from the coffee shop. The visitor does not identify himself, but lets the browser accept a cookie by default from each site.

Each cookie--so long as you don't take action to remove one, as I did in my experiment--is returned to the server that left it on your browser. If you use a different browser, the server doesn't know you're the same person, and if a family member uses your browser to visit the same server, it doesn't know you're different people.

Because the browser returns the cookie only to servers from the same domain--say, yahoo.com--that sent the cookie, your identity is automatically segmented. Whatever yahoo.com knows about you, oreilly.com and google.com do not. Servers can also subdivide domains, so that mail.yahoo.com can use the cookie to keep track of your preferred mail settings while weather.yahoo.com serves meteorological information appropriate for your location.
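A minimal sketch of this segmentation, assuming a much-simplified matching rule: a cookie is returned to a host only if the host equals the cookie's domain or is a subdomain of it. Real browsers apply further rules (paths, expiration, and so on), and the cookie names and values below are invented.

```python
def cookie_applies(cookie_domain, request_host):
    """True if a cookie set for cookie_domain would be returned to request_host."""
    cookie_domain = cookie_domain.lstrip(".").lower()
    request_host = request_host.lower()
    return request_host == cookie_domain or request_host.endswith("." + cookie_domain)

# Hypothetical contents of one browser's cookie store.
jar = [
    {"domain": "yahoo.com", "name": "visitor", "value": "abc123"},
    {"domain": "mail.yahoo.com", "name": "mail_prefs", "value": "compact"},
    {"domain": "oreilly.com", "name": "visitor", "value": "xyz789"},
]

def cookies_for(host):
    """List the name=value pairs the browser would send to this host."""
    return [c["name"] + "=" + c["value"] for c in jar if cookie_applies(c["domain"], host)]

print(cookies_for("mail.yahoo.com"))     # both the yahoo.com and mail.yahoo.com cookies
print(cookies_for("weather.yahoo.com"))  # only the yahoo.com cookie
print(cookies_for("google.com"))         # nothing
```

The mail settings stay with mail.yahoo.com, the shared visitor cookie reaches every yahoo.com host, and google.com sees neither.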

This wall between cookies would seem to protect your browsing and purchasing habits from being dumped into a large vat and served up to advertisers. But for every technical measure protecting privacy, there is another technical trick that clever companies can use to breach privacy. In the case of cookies, the trick exploits the ability of a web page to display content from multiple domains simultaneously. Such flexibility in serving domains is normally used (aside from tweaks to improve performance) to embed images from one domain in a web page sent by another, and in particular to embed advertising images.

Now, if advertisers all contract with a single ad agency, such as DoubleClick (the biggest of the online ad companies), all the ads from different vendors are served under the doubleclick.com domain and can retrieve the same cookie. You don't have to click on an ad for the cookie to be returned. Furthermore, each ad knows the page on which it was displayed.

Therefore, if you visit web pages about colic, skis, and Internet privacy at various times, and if DoubleClick shows an ad on each page, it can tell that the same person viewed those disparate topics and use that information to choose ads for future pages you visit. In the United States, unlike other countries, no laws prohibit DoubleClick from sharing that information with anyone it wants. Furthermore, each advertiser knows whether you click on their ad and what activity you carry on subsequently at their site, including any purchases you make and any personal information you fill out in a form.
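The mechanism can be simulated in a few lines. This is a toy model, not DoubleClick's actual system: the `AdNetwork` class and the `.example` page addresses are hypothetical, and the "cookie" is just a random identifier handed back on each request along with the page that embedded the ad.

```python
import uuid
from collections import defaultdict

class AdNetwork:
    """Toy ad server that keys a browsing history on its own cookie."""

    def __init__(self):
        self.history = defaultdict(list)  # cookie id -> pages where ads were shown

    def serve_ad(self, cookie_id, embedding_page):
        # First contact: no cookie yet, so issue a fresh identifier.
        if cookie_id is None:
            cookie_id = uuid.uuid4().hex
        # The ad request reveals which page embedded the ad.
        self.history[cookie_id].append(embedding_page)
        return cookie_id

network = AdNetwork()
cookie = None  # the browser starts with no cookie from the ad domain
for page in ["parenting.example/colic", "sports.example/skis", "news.example/internet-privacy"]:
    cookie = network.serve_ad(cookie, page)

print(network.history[cookie])
```

Three unrelated sites, none of which shares anything with the others, end up in a single history because every ad request went to the same domain and carried the same cookie.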

Put it all together, and you are probably far from anonymous on the Internet. In addition, a more recent form of persistent data, controlled by the popular Flash environment through a technology called local shared objects, makes promiscuous sharing easy and removing the information much harder.

The purchase of DoubleClick in 2007 by Google, which already had more information on individuals than anybody else, spurred a great protest from the privacy community, and the FTC took a hard look before approving the merger. A similar controversy may surround Google's recently announced purchase of AdMob, which provides a service similar to DoubleClick for advertisers on mobile phones.

So far I've just covered everyday corporate treatment of web browsing and e-commerce. The frontiers of data mining extend far into the rich veins of user content.

Deep packet inspection allows your Internet provider to snoop on your traffic. Normally, the ISP is supposed to look only at the IP address on each packet, but some ISPs check inside the packet's content for various reasons that could redound to your benefit (if it squelches a computer virus) or detriment (if it truncates a file-sharing session). I haven't heard of any ISPs using this kind of inspection for marketing, but many predictions have been aired that we'll cross that frontier.

Governments have been snooping at the hubs that route Internet traffic for years. China simply blocks references to domains, IP addresses, or topics it finds dangerous, and monitors individuals for other suspected behavior. The Bush administration and American telephone companies got into hot water for collecting large gobs of traffic without a court order. But for years before that, the Echelon project was filtering all international traffic that entered or left the US and several of its allies.

One alternative to being tossed on the waves of marketing is to join the experiments in Vendor Relationship Management (VRM), which I covered in a recent blog. Although not really implemented anywhere yet, this movement holds out the promise that we can put out bids for what we want and get back proposals for products and services. Maybe VRM will make us devote more conscious thinking to how we present ourselves online--and how many selves we want to present. These are the subjects of the next section.

The posts in "Being online: identity, anonymity, and all things in between" are:

  1. Introduction
  2. Being online: Your identity in real life--what people know
  3. Your identity online: getting down to basics
  4. Your identity to advertisers: it's not all about you (this post)
  5. What you say about yourself, or selves (to be posted December 24)
  6. Forged identities and non-identities (to be posted December 26)
  7. Group identities and social network identities (to be posted December 28)
  8. Conclusion: identity narratives (to be posted December 30)

Being online: Your identity online--getting down to basics

Andy Oram @praxagora 2009-12-20

What men daily do, not knowing what they do!

(This post is the third in a series called "Being online: identity, anonymity, and all things in between.")

Previous posts in this series explored the various identifiers that track you in real life. Now we can look at the traits that constitute your identity online. A little case study may show how fluid these are.

One day I drove from the Boston area a hundred miles west and logged into the wireless network provided by an Amherst coffee shop in Western Massachusetts. I visited the Yahoo! home page and noticed that I was being served news headlines from my home town. This was a bit disconcerting because I had a Yahoo! account but I wasn't logged into it. Clearly, Yahoo! still knew quite a bit about me, thanks to a cookie it had placed on my browser from previous visits.

[A cookie, in generic computer jargon, is a small piece of data that a program leaves on a system as a marker. The cookie has a special meaning that only the program understands, and can be retrieved later by the program to recall what was done earlier on the system. Browsers allow web sites to leave cookies, and preserve security by serving each cookie only to the web site that left it (we'll see in a later section how this limitation can be subverted by data gatherers).]

Among the ads I saw was one for the local newspaper in my town. Technically, it would be possible for Yahoo! to pass my name to the newspaper so it could check whether I was already a subscriber. However, the Yahoo! privacy policy promises not to do this and I'm sure they don't.

As an experiment, I removed the Yahoo! cookie (it's easy to do if you hunt around in your browser's Options or Preferences menu) and revisited the Yahoo! home page. This time, news headlines for Western Massachusetts were displayed. Yahoo! had no idea who I was, but knew I was logging in from an Internet service provider (ISP) in or near Amherst.

What Yahoo! had on me was a minimal Internet identity: an IP address provided by the Internet Protocol. These addresses, which usually appear in human-readable form as four numbers like 150.0.20.1, bear no intrinsic geographic association. But they are handed out in a hierarchical fashion, which allows a pretty good match-up with location. At the top of the address allocation system stand five registries that cover areas the size of continents. These give out huge blocks of addresses to smaller regions, which further subdivide the blocks of addresses and give them out on a smaller and smaller scale, until local organizations get ranges of addresses for their own use.
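To make the hierarchy concrete, here is a small sketch using Python's standard `ipaddress` module. The three blocks and their holders are invented for illustration and do not reflect who actually holds the 150.0.20.1 address used as an example above.

```python
import ipaddress

# Hypothetical allocations, from largest block to smallest.
allocations = [
    ("continental registry", ipaddress.ip_network("150.0.0.0/8")),
    ("regional provider", ipaddress.ip_network("150.0.0.0/16")),
    ("local organization", ipaddress.ip_network("150.0.20.0/24")),
]

def allocation_chain(address):
    """Return every holder whose block contains the address, largest block first."""
    ip = ipaddress.ip_address(address)
    return [holder for holder, block in allocations if ip in block]

print(allocation_chain("150.0.20.1"))  # falls inside all three blocks
print(allocation_chain("150.0.99.7"))  # inside the two larger blocks only
```

Because each block sits inside the one above it, finding the smallest block that contains an address leads to the organization using it, and that organization's location is usually a good guess at the user's.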

Yahoo! simply had to look up the ISP associated with my particular IP address to determine I was in Western Massachusetts. But the technology is a bit more complicated than that. I was actually associated with three IP addresses--a complexity that shows how the fuzziness of identity on the Internet extends even to the lowest technological levels.

First, when I logged in to the coffee shop's wireless hub, it gave me a randomly chosen IP address that was meaningful only on its own local network. In other words, this IP address could be used only by the hub and anyone logged in to the hub.

The hub used an aged but still vigorous technology known as Network Address Translation to send data from my system out to its ISP. As my traffic emanated from the coffee shop, it bore a new address associated with the coffee shop's wireless hub, not with me personally. All the people in the coffee shop can share a single address, because the hub associates other unique identifiers--port numbers--with our different streams of traffic.

But the ISP treats the coffee shop as the coffee shop treats me. The coffee shop's own address is itself a temporary address that is meaningful to the local network run by the ISP. A second translation occurs to give my traffic an identity associated with the ISP. This third address, finally, is meaningful on a world scale. It is the only one of the three addresses seen by Yahoo!.
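The two translations can be modeled with a toy class. This is a sketch of the general idea of address translation with port numbers, not the behavior of any particular router; the `NatHub` class, the addresses, and the starting port of 40000 are all assumptions made for the example.

```python
import itertools

class NatHub:
    """Toy translator: many inside (address, port) pairs share one outside address."""

    def __init__(self, outside_ip, first_port=40000):
        self.outside_ip = outside_ip
        self._ports = itertools.count(first_port)
        self.outbound = {}  # (inside_ip, inside_port) -> outside_port
        self.inbound = {}   # outside_port -> (inside_ip, inside_port)

    def translate_out(self, inside_ip, inside_port):
        key = (inside_ip, inside_port)
        if key not in self.outbound:
            port = next(self._ports)  # a fresh port distinguishes this stream
            self.outbound[key] = port
            self.inbound[port] = key
        return (self.outside_ip, self.outbound[key])

    def translate_in(self, outside_port):
        return self.inbound[outside_port]

coffee_shop = NatHub("10.20.30.40")  # meaningful only on the ISP's network
isp = NatHub("203.0.113.5")          # the address the rest of the world sees

laptop = ("192.168.1.17", 51000)     # first address: local to the coffee shop
at_shop = coffee_shop.translate_out(*laptop)  # second address
at_isp = isp.translate_out(*at_shop)          # third address, seen by the web site
print(laptop, "->", at_shop, "->", at_isp)

# A reply is routed back by reversing each translation in turn.
back_at_shop = isp.translate_in(at_isp[1])
back_at_laptop = coffee_shop.translate_in(back_at_shop[1])
print(back_at_laptop)
```

A second customer in the coffee shop would get the same outside address but a different port, which is how the hub keeps the streams apart.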

However, an investigator (hopefully after getting a subpoena) could ask an ISP for the identity of any of its customers, submitting the global IP address and port numbers along with the date and time of access. The coffee shop didn't require any personal information before logging me in and therefore could not fulfill an investigator's request, but a person doing illegal file transfers or other socially disapproved activity from a home or office would be known to the hub system and could therefore be identified--so long as logfiles with this information had not been deleted from the hub.
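A sketch of why the request needs all three pieces of information, assuming a hypothetical log format in which each entry records when a customer held a given address and port. The entries, customer labels, and the `who_was` function are invented; real ISP logs differ.

```python
from datetime import datetime

# Hypothetical log entries: (start, end, global_ip, port, customer).
log = [
    (datetime(2009, 12, 1, 9, 0), datetime(2009, 12, 1, 9, 30), "203.0.113.5", 40000, "customer-A"),
    (datetime(2009, 12, 1, 9, 0), datetime(2009, 12, 1, 10, 0), "203.0.113.5", 40001, "customer-B"),
    (datetime(2009, 12, 1, 9, 45), datetime(2009, 12, 1, 10, 15), "203.0.113.5", 40000, "customer-C"),
]

def who_was(ip, port, when):
    """Return the customer holding this address and port at this moment, if logged."""
    for start, end, log_ip, log_port, customer in log:
        if log_ip == ip and log_port == port and start <= when <= end:
            return customer
    return None  # no matching entry, for instance because the log was deleted

print(who_was("203.0.113.5", 40000, datetime(2009, 12, 1, 9, 10)))
print(who_was("203.0.113.5", 40000, datetime(2009, 12, 1, 9, 50)))
print(who_was("203.0.113.5", 40000, datetime(2009, 12, 1, 9, 35)))
```

The same address and port point to different customers at different times, and to no one at all where no log entry survives.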

The combination of IP address, port numbers, and date and time allows the Recording Industry Association of America to catch people who offer copyrighted music without authorization. And this technological mechanism underlies the European Union requirement for ISPs to keep the information they log about customer use, as mentioned in the first section of this article.

If I want to hide this minimal Internet identity--the IP address--I have to use another Internet account as a proxy. In the case of my visit to Western Massachusetts, I was protected by logging in anonymously to a coffee shop, but in some countries I'd be required to use a credit card to gain access, and therefore to bind all my web surfing to a strong real-world identity. Many European countries require this form of identification, outlawing open wireless networks.

To generalize from my Amherst experiment, the information we provide as we use the Internet is very limited, and can be limited even further through simple measures such as removing cookies (a topic covered further in a later section of this article). But what the Internet still allows can be used in a supple manner to respond instantly with ads and other material--such as the nearest coffee shop or geographically relevant weather reports--that are hopefully of greater value than the corresponding material in print publications we peruse.

This post has explored the use of IP addresses metaphorically, as well as illustratively, to show how our Internet identity is context-sensitive and can change utterly from one setting to another. Usually, we provide more of a handle to the people we communicate with over email, instant messaging, forums, and so forth. Here too we have multiple identities and spend hours collecting each other's handles.

Email, the oldest form of personal online communication, ironically has one of the better hacks for combining identities. Your email accounts can be set up to forward mail, so that mail to the address you kept from your alma mater goes automatically to your work address.

In contrast, you can't use your AIM instant message account to contact someone on MSN, so you need a separate account on each IM service and no one will know they all represent you unless you tell them. Twitter is experimenting with ways to assure users that accounts with well-known names are truly associated with the people after whom they're named.

If IM services all agreed to use XMPP (or some other protocol) you could reduce all your IM accounts to one. And if every social network supported OpenSocial, you could do a lot of networking while maintaining an account on just one service.

A widely adopted protocol called OpenID allows one identity to support another: if you have an account on Yahoo! or Blogger you can use it to back up your assertion of identity on another site that accepts their OpenID tokens. OpenID and related technologies such as Information Card don't validate your existence or authenticate the personal traits you have outside the Internet, but allow the identity you've built up on one site to be transferable.

My next post shows how the minimal elements of online identity have been expanded by advertisers and other companies, who combine the various retrievable polyps of our identity. Following that, we'll see how we ourselves manipulate our identities and forge new ones.

The posts in "Being online: identity, anonymity, and all things in between" are:

  1. Introduction
  2. Being online: Your identity in real life--what people know
  3. Your identity online: getting down to basics (this post)
  4. Your identity to advertisers: it's not all about you (to be posted December 22)
  5. What you say about yourself, or selves (to be posted December 24)
  6. Forged identities and non-identities (to be posted December 26)
  7. Group identities and social network identities (to be posted December 28)
  8. Conclusion: identity narratives (to be posted December 30)

Being online: Your identity in real life--what people know

Andy Oram @praxagora 2009-12-18

But he that writes of you, if he can tell
that you are you, so dignifies his story.

(This post is the second in a series called "Being online: identity, anonymity, and all things in between.")

Long before the Internet, much of our private lives were available to those who took an interest, and not just if we were a celebrity chased by paparazzi or a lifelong resident of a small village. Investigators with many good reasons for ferreting out such knowledge--non-profit organizations, college development offices, law enforcement professionals, private detectives--pursued their quarries with incredibly sophisticated strategies for uncovering as much information as they could and shrewdly deducing even more. The Internet has simply infused these methods with new ingredients.

For background, I interviewed a development professional at a private college. The goal of such professionals is to deduce a person's ability to contribute, using publicly available information such as purchases and sales of land, marriage and divorce records, and stock prices for the companies in which prospects hold leading positions. A few golden sources exist for tracking the most attractive fundraising candidates:

  • Publicly traded companies reveal the compensation (salary, bonuses, and stock) of their five highest paid employees.
  • Law journals report the compensation of the partners at the top 200 law firms.
  • Foundations owned by prospective donors file public reports, as Series 990 tax forms, listing the foundation's assets and donations.
  • Salaries of public officials are open records.

More generally, Lexis-Nexis offers easy and powerful searches on articles from which development professionals can glean valuable biographical information and indications of how well the prospects' companies are faring.

If your name is John Smith or Ali Khan, you may be a bit hard to track over the decades. But casual details such as place of residence or number of children can allow the development staff to piece together information sources. If you provide the alumni office with even one or two scraps of such information, you help snap the connecting rods in place.

The Internet has sprung upon the development field like a geyser--with particularly rich pools of information in Zillow.com's real estate listings, corporate biography sites, and donor lists for philanthropic organizations--while the new social networks make fund-raising professionals even giddier. For instance, social network traffic makes it much easier for development offices to keep track of alumni's family members, who offer indications of their financial means. Weblogs where a prospective donor trumpets his or her passions can help shape the right appeal to loosen the purse strings.

If any of this has made you nervous, let me stake out the position that legitimate development research is crucial for social progress. Colleges and non-profits depend on the donations of those fortunate enough to have disposable income. People whose incomes render them subjects of this sort of tracking know the score; dealing with fund-raisers is just part of the responsibility of wealth management. And the fund-raisers have high professional standards, such as the Association of Professional Researchers for Advancement's statement of ethics.

The general population is less well informed than the rich about the public aspects of their private lives, which is why I've chosen this section to begin my survey of identity. I myself run into surprise from ordinary citizens I call up when I'm volunteering for a political campaign and trying to mobilize potential supporters. Some people express annoyance that I know they voted in a Democratic or Republican primary. Indeed, although their choice of candidate on the ballot is a secret, the fact that they voted on that ballot is public information. (Forty-eight states in the US provide it to anybody who asks, while the other two have ways of getting it less directly.)

Democracy relies on the use of voter rolls by campaign workers like me to reach out to our neighbors, drum up the vote, and convey our message. The extensive time we put into these pursuits is one of the few counterbalances to the dominance of TV and radio ads in determining public opinion. Those who don't understand the value of open records in voting might be even more upset to know that anyone can easily find out what candidates they gave money to, and how much. But get used to it: your actions matter to society, and our right to know often trumps your right to be left alone.

Of course, I haven't recounted the ways banks, retail chains, and insurance companies track us; we're all aware of it. A section of this article is devoted to the slice of this activity that makes up behavioral advertising online. When WIRED journalist Evan Ratliff gave up a month of his life to be voluntarily hunted, ditching his identity and trying to hide behind a new one, he discovered that savvy investigators, working with cooperating vendors but with no help from law enforcement, could decipher when and where he got money from ATMs, made routine purchases, and arranged air flights.

Ultimately, you can be most reliably identified through your DNA, but the methodology and data are usually available only to law enforcement. The police used to trace you through fingerprints, but we've learned over the decades how unreliable those are. So DNA is the gold standard for identity.

The British police have been using any excuse to take a DNA sample from everyone they come across. Recently, upon being told by the European Court of Human Rights that preserving samples for indefinite lengths of time was a violation of privacy, the police grudgingly agreed to destroy the samples taken from innocent people after six years.

In many British localities--and a number of American ones as well--your identity is extended to include your automobile. These are areas where governments have installed cameras to capture license plates, and where the traffic ticket will come to you if some other person driving your car goes through a red light or exceeds the speed limit.

To the security system at your workplace, you may be your key card, or the numeric code you enter on a touchpad, or your facial bone structure or iris image. Security experts like to distinguish three kinds of identifying traits that correspond to these security checks: something you possess, something you know, and something you are.

Even anonymized data such as census figures can be associated with individuals through a little--surprisingly little--bit of additional information. In the most famous and dramatic demonstration of the power of joined data, a Carnegie Mellon student obtained the health records of a public figure simply by combining publicly available information. Such exploits are fodder more for identity thieves than for fund-raisers or advertisers, but they show how exposed you can become when tiny pieces of your life float around on public sites. The Internet provides an enormous, integrated platform for retrieving identities.

The next post in this series, turning to our presence on the Internet itself, reduces our focus to the minimal data technically available on the Internet. As we'll see, while it restricts what web servers know about us, it compensates by providing immediate, dynamic exploitation of that information.

The posts in "Being online: identity, anonymity, and all things in between" are:

  1. Introduction
  2. Your identity in real life: what people know (this page)
  3. Your identity online: getting down to basics (to be posted December 20)
  4. Your identity to advertisers: it's not all about you (to be posted December 22)
  5. What you say about yourself, or selves (to be posted December 24)
  6. Forged identities and non-identities (to be posted December 26)
  7. Group identities and social network identities (to be posted December 28)
  8. Conclusion: identity narratives (to be posted December 30)

Being online: identity, anonymity, and all things in between

Andy Oram @praxagora 2009-12-17

To be or not to be: that is the question.

Hamlet's famous utterance plays a trick on theater-goers, a mind game of the same type he inflicted constantly on his family and his court. While diverting his audience's attention with a seemingly simple choice between being and non-being, Hamlet of all people would know very well how these extremes bracket infinite gradations.

Our fascination with Hamlet is precisely his instinct for presenting a different self to almost everyone he met. Scholars have been arguing for four hundred years about Hamlet's moral compass, whether his feigned insanity masked a true mental illness, whether the suffering and death he inflicted on those around him was a deliberate strategy, what psychological complexes fueled his cruel excoriation of Ophelia, and other dilemmas that come down to questions about his identity.

We can appreciate, therefore, why actors up to the present day have to memorize Hamlet's "Speak the speech" passage. As a thespian, Hamlet outshone all the Players.

We can bring this critical perspective on identity into our own 21st-century lives as we populate social networks and join online forums. When people ask who we are, questions multiply far beyond the capacity of a binary "to be" digit.

No matter how candidly we flesh out our digital representations online, they remain skin-deep. They can never reflect how we are known to our families, neighbors, and workmates. Even if we stole a vision from science fiction and preserved a complete scan of our brains, the resulting representations would not be able to demonstrate the dexterity we've built by playing basketball every Saturday, or show the struggles we have to control Tourette's syndrome.

I don't believe anybody has tied down the meaning of online presence, and I don't presume to do so here. But we may find better resolutions to some of the everyday dilemmas we face by exploring, over the course of this article, facets of self that have been discovered and debated in the age of computers.

Before widespread participation in Web 2.0-style forums, the question of online identity was framed as an issue of privacy under assault by large institutions. Only governments and major corporations could install and program the mainframe computers that stored the digital evidence of our identities. Within that framework, starting in the 1970s, European countries that were still shadowed by the history of Nazi round-ups started to limit the sharing of personal information gathered during commerce and other transactions.

But at the same time that these laws, enshrined in a 1995 Data Protection Directive and further extended to transactions that the EU carries out with other countries, set a standard for the regulation of commercial data collection, these same European governments have also, ironically, unleashed surveillance in response to the terror that hit them during this decade. Internet providers are required to retain information about the connections made by their customers for periods of time ranging from six months to many years. London has led the world in putting up more than one million surveillance cameras--which helped to identify the 2005 Underground bombers--and yet, according to the BBC, has fewer cameras per capita than many other cities.

To faceless spies and intrepid marketers, our identity is defined by the web site we just visited about surveillance cameras, the tube of spermicidal jelly we bought on vacation in Florida, or other odds and ends that allow them to differentiate us from other people with similar ordinary profiles. The result may be a knock on the door from Interpol or just a targeted ad for romantic getaways.

But in the age of social networks and Web 2.0, we become the agents of our own undoing. And therefore, discussions about identity must be fashioned with a subtler clay. At every juncture--morning, noon and night--we redefine our own identities.

Should we post our age and marital status? Should we make our profile private or public? Should we reveal that we're gay? (Data-crawling programs can make a pretty good guess about it even if we don't.) Should we boast on Twitter that we applied for a grant? Should we talk about the ravages of chronic Crohn's disease? This article will lead its readers, hopefully, to a fruitful way of thinking about these choices.

Next, what about the elements of our identity that are controlled less by us than by other random individuals? Should we ask that freshman to take down the photo he posted where we lay passed out at a party? Should we respond to the blogger who mangled the facts during a blustering attack on our latest political activity?

And the ultimate arbiter of identity: what turns up when people search for us? Yes, our selves are all in the hands of Google (and for the most wretched of all--the famous--Wikipedia). Admitting its hegemony over identity, Google now lets us store our own profiles to be served up when people search for us. They also reveal (at least some of) how they're tracking us at a service called Dashboard. As we'll see, social networking allows us more control over the image we present--at the cost of entering discussions that are not of our choosing.

Truly, social networking is the Internet phenomenon of the year and deserves an end-of-the-year profile (this post is the first in a series of eight). In a recent 19-month period, Facebook rose from 75 million to 300 million members, and Twitter has gone from perhaps 1.3 million users (depending on how you count them) to an estimated 18 million.

Not only have the sites dedicated to social networking swollen voluminously, but their techniques have been watched carefully by others. Analysts advise corporations that, to maintain their customer bases, it's not enough to offer a good product, not enough to market it adeptly and back it up with good service, not enough even to invite comments and customer reviews on popular web sites--no, the corporation must build community. They have to entice customers to socialize and come to feel that they're part of a common mission--a mission centered on the corporation.

Increasingly, the forward march of social networking can be seen on sites for other services and organizations. It inspires things as trivial as visitor pictures and profiles, or as complex as mechanisms for encouraging visitors to sign up more recruits, mark other members of the site as friends, form affinity groups, post content, and compete for points that harbor some promise of future value.

Although I'd like to drop in to buy a cup of coffee or a shirt without social networking, and many of the ground-breaking techniques for building community turn into gimmicks when reduced too crassly to attention-getting techniques, I think this trend is beneficial. People are more effective when they know each other better. And the basis for knowing each other will be found in personal and group identity.

Before the end of the year, I'll post eight related entries that add up to a treatise titled "Being online: identity, anonymity, and all things in between:"

  1. Introduction
  2. Your identity in real life: what people know (to be posted December 18)
  3. Your identity online: getting down to basics (to be posted December 20)
  4. Your identity to advertisers: it's not all about you (to be posted December 22)
  5. What you say about yourself, or selves (to be posted December 24)
  6. Forged identities and non-identities (to be posted December 26)
  7. Group identities and social network identities (to be posted December 28)
  8. Conclusion: identity narratives (to be posted December 30)

Good News: The Daily Me is a stop on the way to richer discussion

Andy Oram @praxagora 2009-12-03

Recent reports about a preference for reading news and opinion pieces from sources we agree with have raised alarms, including a brief and informative posting by Joshua-Michéle Ross on this Radar blog site. Surveys highlight an undeniable trend: as weblogs continue to post alternatives to the mainstream media and people's viewing habits are shaped more and more by invitations from friends ("gotta check out this video!"), we are cocooning ourselves in worlds of information that reinforce our existing prejudices. Our personal choice to exercise the prior restraint of free speech in news reading has been dubbed the "Daily Me."

I'll plead guilty right away. Sometimes I happen upon a thoughtful article by a conservative commentator that rips away the progressive lenses through which I read up on the issues and (perhaps) jump to conclusions. At such times I think--gee, I should get more of this diet. But usually I let Eric Alterman read and summarize the right-wing press for me.

So I agree we have a problem, but I don't lament the end of the "shared cultural literacy" or "common point of reference" that we've lost. I wonder what the commentators who utter such complaints want us to return to. Are they nostalgic for the years during which Americans got all their news from three TV networks, when papers and magazines across the country slavishly took their cues from Time Magazine and the New York Times (as Noam Chomsky would demonstrate) concerning what news was fit to print?

The government and other established forces managed to cover up enormous crimes in those years, such as the transfer of Nazi leaders to South America by the Office of Strategic Services and its successor, the CIA, to carry out torture and repression. This particular outrage was well-known on the left but completely blacked out in the mainstream press, until John Kerry lifted the veil a bit in the 1980s with his hearings on the drug-contra connection. (And even progressives trailed along with the fiction during the George W. Bush years that his staff introduced torture into American policy. It's time for a revival of Costa Gavras's 1973 film State of Siege.)

I find much to like in the current environment, where investigations and proposals on the left, right, and everywhere in between are easy to find. But putting on blinders does create a breeding ground for irresponsible reporting and junk science, which are reaching epidemic proportions. That's why we need to hoist ourselves out of our comfortable milieux.

Bias is nothing new. Religious and political establishments have been burning books (and their authors) ever since written language was invented. Most of our knowledge about alternative movements (such as heterodox Christian sects) comes from scholars' historical analysis of the vituperative screeds written by the orthodox.

Critics of the current situation don't realize how much we have moved from the orthodox to the orthogonal. We live in different worlds and even speak different languages. The tower of axioms, historical citations, and interpretations built up on each side--the narrative, as social scientists like to call it--has become so powerful that we can't productively read another side's viewpoints because we interpret the language and events through our existing prisms.

In the US, for instance, thinkers on right and left hail the country's founders, but what we take from their writings and behavior is completely different. Madison, Paine, Jefferson, and others provide plenty of grist for both the current left wing and current right wing.

So just following the writings of people with whom you disagree, while a good start, is not enough. To really listen requires a new attitude.

I'm hopeful that this will come about. It's not because we'll all put on humble sackcloth and love our enemies as ourselves. It's because we'll be forced to listen as our opponents exert their power.

In the woefully divided Senate, Republicans have held up laws, administrative staffing, and judicial posts by a variety of tactics. The filibuster is not the most effective tool, although it is the best-known; usually obstructionists rely on novel exploitations of committee rules. This exercise of power to stop government from functioning when they lack the votes to get what they want is a way of forcing their views on the table.

At any particular point one can attribute a particular Congressman's action to a lobbyist's donation, partisan angling, or political deal-making, but a prolonged and widespread campaign of such tactics must reflect a point of view that has some following in the country.

Whereas Congressmen can sabotage a majority agenda through procedural subtleties, less powerful people do it by blocking a door, throwing a stone, or even strapping on a suicide belt.

The lead actors can't necessarily be dissuaded from obstructing doorways or judicial appointments. But we have to understand them and the people whose causes they claim to represent in order to find a way out of the jam. All sides of an issue use media to recruit to their cause, so they must at some point reveal their logic and subject it to debate. Everyone is vulnerable to soft power at the source.

But as I pointed out, it's not enough to hammer on the facts from your point of view, because the entire way you view the language and the context is anathema to your opponents. To change their minds or undercut support among their base, you have to mentally enter their world. In another article I've suggested one technical system that might help.

The necessities of power, then, rather than the weak urges of good will, eventually will get us to listen to each other. And having all the materials at our fingertips will help. Too much news is good news.

Encouraging results from Peer-to-Patent(Peer-to-Patent令人振奋的效果)

Andy Oram 2008/07/02

Congratulations to the organizers of Peer-to-Patent, which is carrying off one of the most audacious experiments in Internet activism in our day. A lot of ink has been spilled about Barack Obama's application of social networking techniques to presidential campaigning (and to Ron Paul's successful fund-raising before that) but Peer-to-Patent makes those achievements seem entirely run-of-the-mill.

The premise behind Peer-to-Patent, which many observers called impractical, was that thousands of experts in technical fields would flock to the site to read patent applications (if you've ever read one, you'd hike the stakes against success several notches right there) and would find prior art that would lead to rejection or restrictions on patent claims.

Well, it's working. A report released by the non-profit project in PDF format presents the data from surveys and an analysis of patents handled during the first year of the project. The sample is small (23 patents) but bears some impressive fruit.

First, people are signing up: over 2,000 so far. Second, they're submitting prior art: 202 pieces. Most important: they're enjoying the work and would volunteer again.

The patent examiners--employees of the US Patent and Trademark Office who are responsible for evaluating applications--also like the project. They overwhelmingly say they appreciate the submissions and would like to work with the community more.

How about the proof of the patent pudding? Nine rejections of patent claims cited prior art found by the Peer-to-Patent volunteers.

Even more significant are the sources of the prior art. When patent examiners reject a patent, they usually cite previous patents as prior art. This has undeniable value by keeping someone who is not truly an inventor from gaining control over an existing technology, but it doesn't perform the crucial role of the Patent Office in protecting public information that is already open for use by everyone.

So when the Peer-to-Patent project finds that volunteers submit a relatively high percentage of non-patent prior art, it suggests that they can really keep free information free: unencumbered by unwarranted patents.

We need a lot more data, of course, before we'll know whether Peer-to-Patent really works. But it's different from other patent-busting projects because it's structured around effective group participation. It's not just a form to fill out or a Slashdot-style, free-for-all comment page. It's a real community, with a clearly defined purpose and a spirit of cooperation.

Even if it becomes institutionalized, Peer-to-Patent can't fix everything that's wrong about the patent system. That will require a close look at laws, at patent office regulations and funding, and even at the structure and incentives in the court system that handles patent litigation.

What Peer-to-Patent does suggest is that governments and volunteers from around the world can work together to solve problems. Government can become more efficient and respond more flexibly to public needs, while individuals can effectively wield power by working together. Technology is central to the effort. Let's watch this project.

翻译:yuwen

祝贺Peer-to-Patent(公众专利审查)的组织者,这一活动正在完成这个时代关于Internet实践最富创意的实验之一。大量报道渲染贝瑞克奥巴马在总统竞选中使用社交网络技术(以及此前Ron Paul成功筹款),但是这些在Peer-to-Patent面前就显得相形见绌了。

Peer-to-Patent项目的前提(很多人认为是不切实际的)是很多相关技术领域的专家将会聚集到项目网站来读那些专利申请,并且会发现先前技术,从而导致该专利申请被驳回或被限制。

现在它起作用了。非盈利组织PDF format的一份报告给出了一些调查数据,还包括Peer-to-Patent项目第一年处理的专利的分析情况。取样小了些(23个专利)但成果斐然。

首先,人们参与进来了:截至目前超过2000人。其次,他们提交了先前技术:202份。更为重要的是:参与的人喜欢这件事并将再次作为志愿者参加工作。

专利审查员——负责审查申请的美国商标与专利局雇员——也拥护这个项目。他们都说感谢那些提交并喜欢和公众社区开展更多合作。

那么专利审查结果如何?9项被驳回的专利申请引用了由Peer-to-Patent志愿者发现的先前技术。

更重要的是这些先前技术的来源。通常专利审查员驳回一个专利申请时他们总是引用先前专利作为先前技术。对于非发明人想要控制一项已经存在的技术的情况这绝对没问题,但是专利局对于保护那些已经被大家使用的公开信息的情况就有些力不从心了。

所以当Peer-to-Patent项目看到志愿者提交了相对比较高的非专利先前技术百分比,这意味着可以保证自由信息的自由了。

当然在明确Peer-to-Patent是否真正有作用之前我们还需要更多数据。但是它与其他那些Patent-busting之流的项目不同,它是围绕有效的团体参与结构构建的。它并不是那种要用户填写的表单或者Slashdot式的开放讨论页面。它是一个真正的公众社区,有定义清晰的目标和合作精神。

即使制度化了Peer-to-Patent也不能解决关于专利制度的一切问题。那将需要仔细研究法律、专利局规章和资金,乃至处理专利诉讼的法庭系统的结构和动机。

Peer-to-Patent让我们看到政府和来自各方面的志愿者能够一起工作来解决问题。当人们有效地合作时政府将变得更加有效率,对公众需求做出更灵活的反馈。技术是这种努力的核心。让我们关注这个项目吧。

Hacking TCP/IP To Support Location Aware Services

Andy Oram 2008/06/19

I just received a simple proposal (which is usually the best type) from Brian McConnell, an O'Reilly author and old phone hand who has founded several telecom companies. His proposal, which follows, represents a creative linking of the GPS/location domain and TCP/IP. If you thought there was no use for IPv6, read on (but it could work with IPv4 now).

Hacking TCP/IP To Support Location Aware Services

by Brian McConnell

18 June 2008

Most of the Internet services we rely on have prospered because they are based on open standards. The Internet itself would not exist in its present form were it not for open services such as DNS. Location-aware services, while they have great potential, have yet to coalesce around a simple, open standard that encourages an ecosystem of products and vendors to develop around it.

I would like to share a rather simple idea for hacking the TCP/IP protocol (specifically IP addresses) to support location-based services; I'm calling it geoIP.

I'll start with IP version 6 which, as most of you know, creates a very large address space (128 bits, enough to support 3.4 x 10^38 unique addresses, vastly more than we can ever use). Because the address space is so large, we can partition it to create a block of location aware addresses. Here's how this would work.

The 128-bit IPv6 address can be broken into smaller sub-addresses to embed latitude and longitude information within the address. We might use a formula like the following:

  • Bits 0-8: Top Level Address (e.g. FF = location-aware subnet)
  • Bits 9-23: Latitude (medium resolution, approximately 1/3 mile or half a kilometer)
  • Bits 24-39: Longitude (medium resolution, approximately 1/3 mile or half a kilometer)
  • Bits 40-127: Remaining 88 bits used to identify devices on subnet

Note: high resolution location information and other metadata would be presented in a separate message, such as an XML file sent during a device-device handshake.
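To make the bit layout concrete, here is a minimal Python sketch that packs a latitude, a longitude, and a device identifier into a 128-bit address using the field widths listed above. Several details are assumptions made only for illustration: bit 0 is treated as the most significant bit, coordinates are quantized linearly, and the 9-bit prefix value 0x1FE (FF followed by a zero bit) and the names quantize, geo_ipv6, and decode_geo_ipv6 are hypothetical rather than part of the proposal.

```python
import ipaddress

# Field widths taken from the layout above (9 + 15 + 16 + 88 = 128 bits).
PREFIX_BITS, LAT_BITS, LON_BITS, DEVICE_BITS = 9, 15, 16, 88


def quantize(value, low, high, bits):
    """Map value in [low, high] linearly onto an integer in [0, 2**bits - 1]."""
    cells = 1 << bits
    index = int((value - low) / (high - low) * cells)
    return min(max(index, 0), cells - 1)  # clamp the upper and lower edges


def geo_ipv6(lat, lon, device_id, prefix=0x1FE):
    """Build a location-aware IPv6 address (prefix value is an assumption)."""
    value = prefix
    value = (value << LAT_BITS) | quantize(lat, -90.0, 90.0, LAT_BITS)
    value = (value << LON_BITS) | quantize(lon, -180.0, 180.0, LON_BITS)
    value = (value << DEVICE_BITS) | (device_id & ((1 << DEVICE_BITS) - 1))
    return ipaddress.IPv6Address(value)


def decode_geo_ipv6(addr):
    """Recover the approximate cell-center latitude and longitude."""
    value = int(ipaddress.IPv6Address(addr))
    lon_q = (value >> DEVICE_BITS) & ((1 << LON_BITS) - 1)
    lat_q = (value >> (DEVICE_BITS + LON_BITS)) & ((1 << LAT_BITS) - 1)
    lat = -90.0 + (lat_q + 0.5) * 180.0 / (1 << LAT_BITS)
    lon = -180.0 + (lon_q + 0.5) * 360.0 / (1 << LON_BITS)
    return lat, lon


address = geo_ipv6(37.46, -122.26, device_id=1)
print(address, decode_geo_ipv6(address))
```

With a linear mapping, each latitude step is 180/2^15 of a degree (about 0.0055°), which is in the neighborhood of the third-of-a-mile resolution mentioned above.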

The idea here is to enable devices to automatically search for other IPv6 addresses that are mapped to physical locations. If a device knows its latitude and longitude, it can scan the address space FFLLLLLLNNNNNNNN* to find other devices in the local zone. This can probably be done most efficiently by sending a multicast message that routers repeat to devices that are nearby with the matching public IP addresses (e.g., "I am trying to contact all mobile phones near 37.46N / 122.26W").

What this address space does is to subdivide a larger address into 1/3 mile × 1/3 mile subnets, each of which can accommodate a very large number of devices. We can then find all devices within this zone via a simple peer-to-peer handshake, and filter the results by device type, precise location, and other characteristics.

We could do this with IPv4 using a lower geographical resolution, plus a requirement for devices to reside behind a NAT interface of some sort. In the case of IPv4, the formula could be as follows: Reserve a top level address (such as 254.*.*.*) for geoIP service. The remaining 24 bits are mapped to a 12-bit × 12-bit latitude/longitude address space, yielding approximately 0.088° accuracy, or about a 7-mile/11-kilometer cell size at the equator.
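The IPv4 variant can be sketched the same way. The snippet below assumes a plain linear mapping, with latitude placed in the upper 12 bits and longitude in the lower 12 bits of the remaining 24; that ordering and the function name geo_ipv4 are my assumptions, not something the proposal specifies.

```python
import ipaddress


def geo_ipv4(lat, lon, top_octet=254):
    """Map a coordinate to an address in the reserved 254.*.*.* block."""
    # Linear 12-bit quantization; clamp so +90 and +180 land in the last cell.
    lat_q = min(max(int((lat + 90.0) / 180.0 * 4096), 0), 4095)
    lon_q = min(max(int((lon + 180.0) / 360.0 * 4096), 0), 4095)
    value = (top_octet << 24) | (lat_q << 12) | lon_q
    return ipaddress.IPv4Address(value)


print(geo_ipv4(37.5, -122.25))  # prints 254.181.82.145
```

Because all 24 remaining bits go to the coordinates, every device in a cell shares one address, which is why this variant needs the NAT arrangement described above.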

A number of tricks could be employed to reduce the grid size, such as weighting the address space so that cells are densest at low and middle latitudes, while becoming larger in high latitudes, using a simple algorithm to map latitude/longitude coordinates into the IP addresses. The same approach can be used to adjust cell width (in longitude) so it is compressed over land masses, and more sparse over oceans. Compression can also be expressed as a simple equation that translates an angular (longitudinal) coordinate into a 12-bit number. With a little bit of fine tuning, it should be possible to adjust cell/grid size so that the average cell is 2 or 3 miles (3 to 5 kilometers) in low/middle latitudes over land masses, yet considerably larger over areas that are, on average, less populated. This is obviously not as simple as the IPv6 approach, but it is not very difficult either, and should work fine.

In both cases, we do not need to embed super high resolution position data in the IP address itself. What we want to enable is a simple peer-to-peer method for discovering nearby devices through direct interrogation. If I know I am at an IP address that maps to 37.5N / 122.25W, I can start scanning the nearby address space in expanding circles to see who else is around, or send a multicast message to the geoIP subnet asking devices to report back to my static IP address. The geoIP address may change frequently, which will update the device via DHCP.
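One way to picture the expanding scan is as a walk over grid cells, ring by ring, starting from the device's own cell. The sketch below works on quantized cell indices for a 12-bit × 12-bit grid and uses square rings rather than true circles to keep the arithmetic simple; the names ring and expanding_search are hypothetical, and the wrap-around at the date line and the cut-off at the poles are my assumptions about how the edges would behave.

```python
def ring(lat_q, lon_q, radius, size=4096):
    """Yield grid cells exactly `radius` steps (square ring) from a cell."""
    seen = set()
    for d_lat in range(-radius, radius + 1):
        for d_lon in range(-radius, radius + 1):
            if max(abs(d_lat), abs(d_lon)) != radius:
                continue  # interior cells belong to smaller rings
            row = lat_q + d_lat
            if not 0 <= row < size:
                continue  # nothing exists beyond the poles
            cell = (row, (lon_q + d_lon) % size)  # longitude wraps around
            if cell not in seen:
                seen.add(cell)
                yield cell


def expanding_search(lat_q, lon_q, max_radius, size=4096):
    """Yield cells nearest-first: own cell, then ring 1, ring 2, and so on."""
    for radius in range(max_radius + 1):
        yield from ring(lat_q, lon_q, radius, size)


for cell in expanding_search(2901, 657, max_radius=1):
    print(cell)
```

Each yielded cell could then be turned back into a geoIP address and probed, stopping as soon as enough nearby devices have answered.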

The idea here is to create a vendor-neutral method for device discovery and search. A device does not need to know anything about the other devices attached to the network except the formula used to embed location information within IP addresses.

These devices can then expose a simple web server on a designated port at the location based address. Other devices, upon discovering active devices in their local zone, can fetch pages or XML files that contain additional information about the device, its capabilities, etc.

The combination of these techniques would make device discovery and search open and vendor-neutral. Once devices have discovered each other, they can communicate using existing Internet services, such as email, XMPP, SIP, etc. via conventional IP addresses. The key idea is to create a peer-to-peer discovery mechanism that is not dependent on centralized services or proprietary vendor APIs. With that in place, any device should be able to find and talk to other devices within a cell or group of cells defined by latitude and longitude coordinates.

Appendix: Sample Algorithms To Map Location Based IP Addresses

IPv4 Latitude

Lat = 12-bit integer ranging from 000 (-90/90S) to FFF (90/90N)

Resolution can be improved by using a non-linear mapping algorithm that emphasizes low and mid latitude addresses.

IPv4 Longitude

Long = 12-bit integer ranging from 000 (-180/180W) to FFF (180/180E)

Note: because latitude covers a 180-degree range while longitude covers a 360-degree range, and because we can reasonably assume we don't need high resolution cells above or below 66° North or South, we could specify 11-bit addresses for latitude (approximately 5.85-mile/9.4-kilometer resolution, better with non-linear mapping), and 13-bit addresses for longitude (2.9-mile/4.67-kilometer resolution).
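The trade-off between the two splits is easy to check in degrees. The sketch below generalizes the packing to any latitude/longitude bit split that adds up to 24 bits; as before, placing latitude in the upper bits is an assumption, and the function names are made up for illustration.

```python
import ipaddress

GEO_PREFIX = 254  # reserved top-level octet from the example above


def cell_degrees(lat_bits, lon_bits):
    """Return (cell height, cell width) in degrees for a linear mapping."""
    return 180.0 / (1 << lat_bits), 360.0 / (1 << lon_bits)


def pack(lat, lon, lat_bits, lon_bits):
    """Pack a position into a geoIP using the given bit split (must total 24)."""
    if lat_bits + lon_bits != 24:
        raise ValueError("latitude and longitude bits must add up to 24")
    lat_cells, lon_cells = 1 << lat_bits, 1 << lon_bits
    lat_idx = min(int((lat + 90.0) / 180.0 * lat_cells), lat_cells - 1)
    lon_idx = min(int((lon + 180.0) / 360.0 * lon_cells), lon_cells - 1)
    packed = (GEO_PREFIX << 24) | (lat_idx << lon_bits) | lon_idx
    return str(ipaddress.IPv4Address(packed))


print(cell_degrees(12, 12))  # even split
print(cell_degrees(11, 13))  # one bit moved from latitude to longitude
print(pack(37.5, -122.25, 11, 13))
```

Moving one bit from latitude to longitude doubles the cell height and halves the cell width. Converting those angles to miles or kilometers depends on the figure assumed for the Earth's circumference, so results may differ by a few percent from the rounded distances quoted above.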

IP Address Assignment Via DHCP

Because a device may be in motion, we should assume that geoIPs are dynamic, with a short time to live. GeoIP-enabled devices would obtain a second location-based IP address from a DHCP server. DHCP already recognizes an optional DHCP client ID, which can be used to transmit location information in a dotted format (e.g. mac_address . latitude . longitude). With this information, the DHCP server can then assign a public or NAT'ed IP address that appears to other devices as a geoIP.
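The dotted client ID might be built and parsed as in the sketch below. The exact encoding is an assumption: it takes the first field to be the device's MAC address and sends the latitude and longitude as integer cell indices rather than decimal degrees, so that the dot remains an unambiguous separator. How a given DHCP client or server would carry this string is not covered here.

```python
def build_client_id(mac, lat_idx, lon_idx):
    """Join MAC address and cell indices into a dotted client ID (assumed format)."""
    return f"{mac.lower()}.{lat_idx}.{lon_idx}"


def parse_client_id(client_id):
    """Split a dotted client ID back into (mac, lat_idx, lon_idx)."""
    mac, lat_text, lon_text = client_id.rsplit(".", 2)
    return mac, int(lat_text), int(lon_text)


client_id = build_client_id("00:1A:2B:3C:4D:5E", 2901, 657)
print(client_id)
print(parse_client_id(client_id))
```

On the server side, the parsed indices would be packed into a geoIP and leased with a short lifetime, since a moving device will soon ask for a new one.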

Ignite Boston shows the way to beat commerce interruptus

Andy Oram 2008/05/30

I felt like I was drifting back to the dot-com boom last night during Ignite Boston. Movements that I saw getting stalled seven years ago seem to be finding their way forward again.

Ignite Boston, a party held every few months by O'Reilly, draws people from around the region who are interested in technology and socializing. Last night, the approximately 325 attendees packed two floors of a bar, and it's a good thing the street outside was closed off because there were plenty of celebrants out there as well, escaping the noise inside to have a conversation.


All the formal talks were intriguing and delivered well. Several could be filed under the category "socially beneficial applications of Web 2.0." For instance, HealthMap, which tracks reports of disease outbreaks around the world, serves as an important resource for the Centers for Disease Control and government agencies. CO2Stats determines how much your web site contributes to global warming, estimating your energy usage as well as that of your visitors and the networks they traverse.

Microsoft's WorldWide Telescope makes it so easy to create a video of space objects that a sophisticated six-year-old can use it. This falls under the category of "make science exciting" projects I praised in a blog about Maker Faire.

Other presentations recalled the experiments O'Reilly documented in the book Peer to Peer. Here's where I felt technologists were picking up again on the themes of the dot-com era.

Tool developer Jesse Vincent is promoting a distributed database system called Prophet as a way to break out of the walled gardens maintained by portals and social networks. His idea is that those services disempower their users by holding on to their data, and that users can create their own networks without giving up control.

Noting that the popular Twitter site goes down from time to time (including this week), causing all twitterers to be disconnected during such periods, Joe Cascio proposed a Distributed Twitter service based on communicating servers. He compared it to the distributed server approach in Jabber (XMPP). His diagrams also reminded me of the superpeer approach added to Gnutella as it grew.

Our Ignite Boston events regularly fulfill their goals, one of which is to show that Boston has a lot of inventive technologists doing cool stuff. I think such projects, nationwide, will pull us out of the slump that left so many dreams in the bit bucket after 2001. The question is whether the upcoming recession will trash the tech recovery. But I don't think it will.

The costs of developing software tools and web presences have come way down since 2001, thanks to advances in infrastructures. Open source projects and peer production (which I highlighted in an article two weeks ago) lower the barriers to successful projects even more.

The recession can actually inject new life into small-scale projects. Knowing that some paid jobs are out of reach, people may turn to open source and do things that seize their imaginations instead. (The dearth of computing job opportunities that Europe provides, relative to North America, is often credited for the greater participation in open source projects there.)

People are also turning away from the pursuit of glossy fashion and unnecessary material things. They are taking to heart the realization that consumption for its own sake is bad for the planet.

And they might be reading the psychological studies showing that you're happier if you spend money to help somebody else than to buy something for yourself. When we all learn this, advertisers will turn from glorifying luxury and envy to urging investments in social causes.

Commerce will continue, and it will be better commerce. We'll still enjoy seeing people such as Shava Nerad--who has given so much of her career to helping the world through Tor and other projects--express a child-like glee to find herself earning money from a machinima project she started with friends for fun. We'll share more of what we have, and appreciate it more too.

The wiretapping accusation against P2P and copyright filtering: evidence that we need more user/provider discussion

Andy Oram 2008/05/24

I would by no means argue with celebrated law expert Paul Ohm when he suggests that cable companies and other ISPs might be breaking the federal wiretap law by doing deep packet inspection. This was the recent news from a WIRED reporter blogging from Computers, Freedom & Privacy.

I will leave it up to the lawyers to decide whether the wiretap law was passed with the intent to keep providers from reducing traffic that strains their bandwidth, or from complying with requests from movie studios to prevent the unauthorized exchange of first-run films. I'll also let lawyers decide whether the ISPs are shielded by the exemption that allows them to protect their service.

But I can't help observing that the same kinds of deep inspection that Ohm decries (and that permit China and other governments to censor content) are also used for spam and virus filtering. Superficial traffic analysis could perhaps, someday, identify spam and viruses, but it's currently critical to check for the signatures of malicious content. Would Professor Ohm like to personally handle the 2000% increase in email he'd get if he forced his ISP to stop filtering?

On the other hand, I wonder whether web mail services such as Hotmail, Yahoo! and Google would be guilty of wiretapping if they check traffic. After all, they are not delivering traffic to another system as Comcast is; they are terminating the traffic on their own systems, where their users access it. I'd think they have a much stronger defense, partly because the data is technically on their own systems, and partly through the claim that they need to run filters to protect these systems from viruses, or even just excessive traffic.

These dilemmas suggest to me that the relationship between ISPs (or mail service providers) and customers has to change, and perhaps that the wiretap statute has to adapt. What we want is that most perplexing of legal solutions: to screen out malicious behavior and impacts that users don't like, while leaving positive and desired behavior alone.

Many have called on providers to publish (at least in broad terms) what kinds of filtering they're doing, and to make it an explicit part of their contracts with users. To extend this idea, users could explicitly request what they want blocked.

It could be done on a fine-grained level; for instance, you could implicitly grant your ISP a right to filter out Korean messages (assuming you don't understand Korean and consider the messages spam) by checking a box on your service agreement that says, "Please block anything containing Korean characters." Or it could be done on a more coarse-grained level, by granting your provider the discretion to look for viruses.

Laws regarding notice and consent would make it harder for providers to toss in practices that users don't want. They could still do so by insisting on it as part of their contracts. My suggestion is that we revamp our philosophy about filtering. That would still leave the difficult task of balancing adequate notice and consent with the need of ISPs to respond with agility to ever-changing conditions.

Yochai Benkler, others at Harvard map current and future Internet

Andy Oram 2008/05/15

Harvard's world-renowned Berkman Center for Internet & Society is celebrating its tenth anniversary with a conference called Berkman@10. I'll report here on today's sessions, which were organized as a fairly conventional symposium (although as loosely as one could run it with 450 attendees). Tomorrow will be set up as an unconference, where the audience defines most of the topics and self-organizes into small-group discussions.


Whither peer production--wither peer production?

The Internet is not monolithic--as speaker after speaker today recounted--so it's not fair to expect an organization studying it to be monolithic either. The Berkman Center is diverse to the point of being hard to characterize, as I'll detail later, but one theme that echoed through the day was the collaborative production of value, or "peer production" as the economically-minded like to call it.

Yochai Benkler, author of key Internet analyses such as The Wealth of Networks (available in both printed form and as a PDF) and Coase's Penguin, or Linux and the Nature of the Firm, set out his stake near the beginning of the day when he called on the Center to move from creating tools to examining social change, such as what it's like for dissidents around the world to be able to work together.

Wikipedia is a central piece of evidence in Benkler's case--along with Linux, a connection I'll explore later--and also forms Exhibit A in the recent book The Future of the Internet (And How to Stop It) by Jonathan Zittrain, cofounder of the Berkman Center and keynoter at today's conference. (I won't cover the keynote because I have already reviewed the book at length.)

So it's quite in keeping that Jimmy Wales joined Benkler in an afternoon session at the conference. Benkler laid out the traits distinguishing both the process and product of peer production from what we're used to getting in the market: peer production is unpredictable, unstable, loose, and people-driven. Wikipedia matches those criteria so well that Wales admitted its fate is still up in the air.

A member of the audience described how science--another, very different culture of peer production--has been corroded during the past few decades by commercialization and well-meaning over-regulation (which he did not specify), then asked whether Wikipedia's turn for regulation by government will come. Wales didn't address the precise question, but admitted that "humans will eventually screw up Wikipedia just as we manage to screw up everything else," yet promised to keep it true to its mission as long as he can.

The challenge for peer production, according to Zittrain, Benkler, and Wales, is to avoid seizing up and imposing new controls when things go wrong. Instead, one must learn to deal with the damage through side channels.

Money is a secondary question. Benkler pointed out that companies now pay employees to contribute to peer-produced products (Linux, where most development is now done by the employees of various companies, provides an obvious example) and said the ultimate impact of this trend is unknown. Possibly, the current in-rush of volunteer labor is a temporary phase in the evolution of peer production. In any case, we should carefully examine the process so that the projects can remain fair and keep people motivated when some are paid and some are not.

Besides production for money and production for fun, I noticed another motivation for contributions when Wales mentioned that India has become an increasing source of Wikipedia pages as computers and Internet access spread there. I sense that regional and cultural pride can drive many efforts--the feeling that "if that city over there can do it, why can't we?" This suggestion shows the need for models, which I'll describe in another section.

The conscious commons

Although Wikipedia was described by conference participants from many angles--as an amazing example of volunteerism, a triumph of people's ability to resolve conflicts, and so on--I think one key trait has gone unremarked: Wikipedia has reached such a high level of value that participants are willing to put its success above any other considerations. No matter how much someone desires to express opinions, they know that fighting hard enough to damage the entire venture would be counter-productive. So people usually settle among themselves. In other words, they are conscious about protecting their commons.

Think, as a metaphor, of a town commons where people not only graze their cattle, but water the grass and spread around the manure so that it's properly fertilized.

As civilization develops, we tend to get lazy about maintaining our commons. We discover the benefits of turning functions over to large companies (they gain efficiencies from scale and from the use of professionals) or to governments (who provide transparency and equitable distribution of resources, when done right).

This trend is not limited to advanced economies. Esther Dyson reported that, after she encouraged Internet users in one developing African nation to share wireless networks with their neighbors, someone complained to her that it was up to the government to reduce costs and provide wider Internet access.

But nowadays, sophisticated manufacturing methods reduce gains from scaling, and the benefits of training ordinary people to perform useful roles outrank the value of employing a small professional elite over and over. In fact, Benkler mentioned that learning is a key part of peer production and a driver of its success.

Meanwhile, transparency has become more available to all actors, including governments, through communications networks.

So we've started to turn back to ourselves in order to support what we hold in common. The modern equivalents of barn raisings are the groups in the 1990s who came together to wire their local schools (before WiFi made that less necessary), or people who made the news recently by installing solar panels on each other's homes.

Most notable, for the sheer size of the effort, is the self-mobilization of communities, both locally and nationally, in the wake of the Hurricane Katrina floods and the failure of government response. My own synagogue has sent two building teams to the New Orleans area over the past two months; hundreds of others have made similar donations.

The remaining problem to solve is the equitable distribution of resources mentioned earlier. For this, peer production and the conscious commons have to go global. We need to feel an immediate connection to all creatures around the globe--and that leads to the most audacious proposal that came up at the Berkman Center today.

What will a Harvard for six billion people look like?

This bold discussion began with a taunt lobbed into the arena from a surprising corner, former FCC chair Reed Hundt. In my opinion, Hundt's tenure in the mid-1990s stood out for the FCC's recognition that its landscape would be overwhelmingly changed by the evolution of networks and the media transferred across them, but in the end proved too timid and compromised to pursue the implications of these insights.

There was no timidity, though, to Hundt's proposal that well-endowed universities such as Harvard help the six billion people who are now deprived of the education they need to make a decent living.

Charles Nesson, a famous attorney and cofounder of the Berkman Center, picked up the tune without missing a beat. He talked of the enormous amount of high-quality online material that Harvard is making available. The Faculty of Arts and Sciences recently voted to open access to all scholarly articles, and the law school soon followed suit.

(Ironically, Harvard is one of the few Boston-area colleges that doesn't allow the public into its brick-and-mortar libraries. This policy is understandable though, because to make them open would overwhelm them with the throngs of odd creatures that circulate among the literate classes who inhabit Cambridge.)

Dyson then interrupted with an astute distinction between content and helping people to teach themselves, a goal that is people-intensive and requires a lot of side activities such as making sure children have enough to eat.

Nesson countered by saying that the goal was not to provide sterile content, but to provide content that would stimulate children's interest and sense of play, which in turn would lead to a peer production of education.

As an aside, Dyson mentioned an invention she saw at Microsoft's Bangalore facility. Using software that allows a USB port to be multiplexed, one computer through a single USB port can support up to eight mice. Thus, eight children can play a game or manipulate items on a screen. This doesn't turn a Windows system into an XO (One Laptop Per Child) network, but it's an advance for needy communities.

Designing for cooperation

There was much worth retelling in Benkler and Wales's session, although a good deal of it can be found in other works of theirs. Benkler pointed out that our economic system is designed around the notion of human beings as "selfish rationalists," but that no society ever studied has many people who actually behave that way. He said at most 30% are primarily motivated by material rewards.

Now, as we know, the people so motivated can be extraordinarily productive, creating some of the most important technological changes in history. However, we also know that a large part of that 30% lie, cheat, and steal. Anyway, Benkler seems ready to try something different.

His talk involved questions instead of answers--a research agenda rather than a curriculum. He suggested we draw on the disciplines of organizational sociology and experimental economics. He laid out the intrinsic motivations we want to encourage for peer production--solidarity, empathy, trust, fairness--and started an exploration of extrinsic motivations.

The extrinsic motivations include rewards and punishments, along with transparency. The latter leads in turn to reputation systems, which embody twin goals: control (so others know whom to trust) and motivation (because contributors expect future rewards).

Wales, in a private conversation, demonstrated the cooperative spirit in his comparison of Wikia Search (which I described in an article yesterday) with traditional search engines. He said we tend to place too much faith in algorithms. Good algorithms are certainly valuable--particularly in searching for the long tail, as when someone knows only a few phrases in a book or song--but don't have to be sophisticated enough to prevent all gaming of the search engine.

"If the community decides something is spam, they can simply block it outright," he said. You don't have to insist on creating a search algorithm so smart that it pushes spam down in the results list.

In his presentation, Wales said that a majority vote is not enough to ensure quality content. If only 70% of editors like a Wikipedia page, something is still wrong with it. Therefore, discussion continues until everybody is happy except a few unreasonable people who are usually disruptive in other ways as well. He said that unanimity is not the goal, but consensus.

His contrast of unanimity and consensus struck home with me, because the exact same distinction (using the same words) is made by community organizers in the international network built by the historic Industrial Areas Foundation, which Saul Alinsky founded in 1940. As a volunteer for a local community organization, I know its power to build consensus as well as its success at building power.

And it's worth noting that Barack Obama spent years as a community organizer with the IAF, while Hillary Clinton wrote a thesis on it as a young student (and turned down a job offer from Saul Alinsky). Someday, community organizing experience could well become a prerequisite for a management job in any business or government position.

Seeding and modeling

Another aspect of peer production escaped discussion today. Zittrain mentioned in passing that Wikipedia began as a set of comment forums on Nupedia, which had reached the limits of its growth with seven articles from paid experts. Small though this starting point was, I believe these seven seeds were critical to show what could be done and give volunteers models to emulate.

Consider also that a worldwide, Internet-based development effort on an operating system could not begin (although all the variants of BSD were produced by volunteers using more traditional team methods) until Linus Torvalds seeded the effort by publicizing his budding kernel.

So peer production requires models. Not coincidentally, Dyson said that social change also requires what she called models for courage. An Asian journalist from the audience claimed that most Chinese Internet users think government censorship is a good thing. People will push for change--but most of them want to see someone else start.

The dilemma of openness

"Open" was probably the most frequently uttered word of the day. Nesson, in his opening remarks, chanted of "open talk, open access, open education..."

But the problem, as I explained eight years ago, is that openness, in a context of unequal power, just puts more power in the hands of those who already hold it--those with the guns, the funds, or other ways of controlling the public agenda.

In small ways, blogging and efforts such as the Sunlight Foundation (represented by its head Ellen Miller at the conference) take power out of the hands of its current possessors and distribute it more widely among the public. The Internet can also help democratize fund-raising, as both the Ron Paul and Barack Obama campaigns proved. But there is still a lot more that governments and large institutions can do with information than ordinary people.

The Berkman Center broadens

Although Berkman's ten-year anniversary formed the occasion for this conference, celebrating the anniversary was not its main goal. Thus, I saved a description of the Berkman Center for the end of this article.

As I mentioned earlier, the center is far from monolithic. It is a conglomeration of many people, both lawyers and non-lawyers, who study the Internet and add their efforts to empower its users.

Legal studies of the Internet were by no means a new field when the Berkman Center was founded. In fact, current director Terry Fisher says such studies were already a "fad." The Berkman Center is distinguished in many ways, such as by its independence (although it has corporate sponsors in addition to Jack N. and Lillian R. Berkman's gift) and the caliber of its professors and fellows.

But in my opinion, the most salient contribution of the Berkman Center is its devotion to new research instead of pure theory. (Another such research center is Do Tank.)

At the conference, one example of this valuable approach was a fascinating visualization of blogs in Iran--the fourth largest blogging community in the world--and of which sites are blocked by the Iranian government. As one would expect, most blocked sites are written by secularists, reformists, and expatriates.

It's also impressive, however, that most sites by these groups are allowed through the government's filters. Too much blocking, as the Berkman researcher said, would lead to a loss of legitimacy for the government.

Among the major research and production activities at Berkman are:

  • The OpenNet Initiative--a tracking system that reports government censorship worldwide
  • Global Voices Online--a blog for people who previously had no way of reaching the public outside their nations
  • StopBadware--a service that recognizes infected web sites and (in cooperation with Google) interpolates warnings when users try to visit them

These projects go far beyond the field of law, and in fact, law school head Elena Kagan announced at the conference that the center was moving outside the law school to become a general Harvard institution.

The Berkman Center also exemplifies the openness they speak about. Long-term and temporary associates mingle with invited guests and passers-by. If you're in the Boston area and are interested in where digital networks and media are heading--technically as well as politically--Berkman events are among the best places to spend your time.

Google Friend Connect and limits to sharing

Andy Oram 2008/05/14

We're all tired of acquaintances tugging on us to sign up for new social networks, and of the torque we feel bouncing between the networks we're on if we can't resist the herding instinct that brings us to join them. But we wouldn't want to have just one big social network, either. That would inhibit innovation and prevent people from enjoying a site's special features and cultural uniqueness.

Google's Friend Connect, which was announced on Monday and covered by Radar as well as other sites, represents a small step toward a middle ground. It could be considered the natural succession to Google's OpenSocial, also discussed extensively on Radar. The OpenSocial API forms the basis for communications between Friend Connect widgets and the site hosting them, using lightweight Ajax and JSON protocols. Friend Connect uses the APIs provided by other sites for communication with them.

I had a little tour of Friend Connect last night at the party celebrating the opening of Google's new Cambridge office, covered in another blog.

Previously, if you wanted to advertise a cool video and ask people to pass it on to all their friends on Facebook, you'd have to be a member of Facebook and post the link on Facebook (or ask your friends to do all the heavy lifting manually). Now you can put the notice on your own site and still leverage the powerful viral information spreading features of Facebook--and Orkut, and Yahoo!, and other sites whose APIs Friend Connect supports.

There's still some friction preventing this publicity machine from attaining perpetual motion. You have to individually invite each of your friends (even if they're already connected to you on Facebook, Orkut, etc.) to your new site, and they have to explicitly log in to your site, although OpenID reduces this to a one-time step.

This explicit signing up is probably a design choice, negotiating the traditional tension between sociability and privacy. The easier Google made information sharing, the more risk there would be of having it take place when someone doesn't want it.

The same can be said for the initial disappointment some reviewers expressed that Friend Connect doesn't do more. They were hoping it would allow seamless access from any web site to data and API functions on various social networks.

But to do so would mean mingling your data and social networking functions fully with any site that supports a Friend Connect widget. Once you logged in to a friend's site, it would be able to do anything it wanted--and attacks would soon materialize that exploit innocent people's sites. A compromise to a single provider hosting a few hundred web sites could quickly become an epidemic of data theft.

Instead, each Friend Connect widget runs within an IFrame, so that data from the social networks are not available to the surrounding web page. By logging in, you still trust the widgets, but this is the same level of trust (which some people don't have) as accepting a Facebook application.

One way to allow more sharing might be to expose data and functions between sites but provide immediate feedback to each user about running processes and data transfers. This would require substantial changes to web architectures and interfaces, and I'm not sure what it could look like.

I also doubt that we could achieve seamless integration of social network functions because different networks offer different features, backed up by different architectures and data structures. Sharing among networks is limited to certain features they have in common, just as cooperation between different programming languages is limited by the interfaces they expose to let functions in one language call functions in another.

Although I think an explicit login is a good security feature, I fear it will hold back adoption of Friend Connect. It becomes tedious to log in, even once, to every page someone invites me to. On the other hand, it's an acceptable burden if I really intend to leave ratings and comments, play games, or engage in the other activities offered by widgets.

Maker Faire mimesis and open speculation

Andy Oram 2008/05/03

O'Reilly's Make magazine and the Maker Faire that we're hosting today and tomorrow in San Mateo, California have been described in many ways, ranging from a revival of the mid-20th-century love for Popular Mechanics magazine to an exciting new impetus for teaching children about science. During my six hours there today, I noted its strong connections to powerful and fundamental human urges toward creation, mastery, and the reproduction of our own culture.

Some of the Maker Faire centers are devoted to the kind of do-it-yourself projects shown in our magazine. Anyone from a four-year-old to a mechanically adept adult can find challenge and satisfaction at these tables. Projects in another building took a big step up, showcasing the brain children of engineers who devoted their spare time to building games and toys or aiding their communities with research projects. A number of the booths seemed to be run by Renaissance men and women who were making a living from their creative combinations of art and technology.

In this regard, I found many science projects at Maker Faire more aesthetically satisfying than the self-consciously mind-altering artworks I've seen at some contemporary art shows. Many artists seem to lose their intuition for balance and beauty when trying to make a point, and their explorations of the promising channels offered by technology can end up clogged in its pipes. There is some computer-generated and networked art that is beautiful, thought-provoking, or both, but I've been disappointed too often by art shows. Maker Faire focused on the fun first of all, the achievement second, and the aesthetics third. Ironically, this worked better.

The difference between the more modest DIY tables and the advanced displays was like the difference between shooting off a toy rocket and planning a trip to the moon. Both of the latter activities were represented at the show, incidentally. I talked to the lunar project, which had already produced a tiny rover robot and was competing for the . They offered attendees the chance to record a message to leave on the moon, using a solid-state storage chip. I asked what database they used, expecting something such as BDB or Derby, but found out it was good old MySQL. So I wrote a message saying that I hoped relational logic was consistent throughout the universe.

Maker Faire is a string-and-duct-tape combination of O'Reilly's Emerging Technology, Open Source, and Money:Tech conferences. It features a fair number of expected hacks, such as a 1956 Ford Truck retrofitted with a Navy boat diesel engine and upgraded to run biodiesel, or an industrial-sized version of the old Diet Coke and Mentos fountain. But its core commitment to pushing the boundaries of science and engineering is clear, and many of the satellite booths cover such topics as organic gardening and solar energy. It also showcases people reviving obsolete technologies such as blacksmithing. The very first Make project was there (a camera suspended from a kite to take aerial photos), right next to a more formal and sophisticated approach that has been on sale since 1989.

The open source facet of Maker Faire comes in the publishing and teaching of techniques. It's a kind of shared speculation about the future and what we could all do if we tried. The ultimate impact, like that of the free software movement, is to enhance everyone's mastery of their environments and give them both the tools and the confidence to solve their own problems.

This kind of training is particularly important for children, who get turned off from science early in conventional schooling and rarely even encounter the joys of engineering. O'Reilly's Make division is involved in many projects, at Maker Faire and elsewhere, to change the way children learn science. This process--which reflects the way most of the great scientists became their mature selves--can not only increase the number of scientists and engineers, but alter the kinds of scientists and engineers they are.

And as a movement, Maker Faire offers a complete social and business environment. One building was given over to companies offering DIY tools such as laser cutters.

As MIT professor Neil Gershenfeld wrote in his book FAB: The Coming Revolution on Your Desktop--From Personal Computers to Personal Fabrication, the spread of DIY knowledge internationally can let people in communities everywhere create the tools they need to build their economies and fix their environmental problems. Maker Faire stands at the center of a movement that can save the world.

If that sounds grandiose, let me argue that there is no shortage of grand ideas at the show. I was struck by how many Maker Faire participants loved to create images of people, animals, or (especially in the case of the fabulous Flaming Lotus Girls, who are not all female) plants. Many of them (including again the Flaming Lotus Girls) also have a fascination for setting their creations on fire or blowing them up in other ways. Thus do the intensely inspired tinkerers show their awe toward the universe's most intense creative and destructive powers.

Another psychological grounding for many of the projects was mimesis, a Greek word often used to describe the attempts of artists to reflect reality. Maker Faire participants loved to use new and idiosyncratic materials to build familiar objects, or the reverse.

As an illustration, one of the most popular and highly visible projects was a hundred-foot-wide, fifteen-foot-tall reproduction of the old children's Mousetrap game, built out of spare parts and discarded planks. The mad scientist behind the whole thing called it both Weapon of Mouse Destruction and Life Size Mousetrap. The latter was an understatement, because the scale was more on the size of humans than mice. Unlike the original game version, the Life Size Mousetrap almost always works, presumably because its creators are truly trained engineers and the larger scale and masses allow them to calculate the components' behavior accurately.

As I already explained, many of the Maker Faire exhibits were artistic as well as feats of engineering, so it was fun to see the Life Size Mousetrap accompanied by Esmerelda Strange, the one-woman band, and a cat-and-mouse skit.

I can't hide the pleasure I had today at Maker Faire; it was perhaps the most effective combination I've ever seen of fun, education, and appreciation for a job well done. It must be thrilling for people who have spent evenings and weekends for the past fifteen years working on some project with intense personal meaning to be able to show it off to thousands.

The 50,000 expected visitors to Maker Faire probably add up to more people than ever read all the books I've edited for O'Reilly in my fifteen years here. Of course, several of my books have had ripple effects through society, as Maker Faire does. But to anyone who's attended, seen what it does for children, and felt its effects on oneself, there's really nothing more to say.

Book review: "The Future of the Internet (And How to Stop It)"

Andy Oram 2008/04/14

Most of us in the computer field have heard more than our fill about the free software movement, the copyright wars, the scourge of spyware and SQL injection attacks, the Great Firewall of China, and other battles for the control of our computers and networks. But your education is stifled until you have absorbed the insights offered by comprehensive thinkers such as Jonathan Zittrain, who presents in this brand new book some critical and welcome anchor points for discussions of Internet policy. Now we have a definitive statement from a leading law professor at Harvard and Oxford, who combines a scholar's insight into legal doctrines with a nitty-gritty knowledge of life on the Internet.

You can read Zittrain for cogent discussions of key issues in copyright, filtering, licensing, censorship, and other pressing issues in computing and networking. But you're rewarded even more if you read this book to grasp fundamental questions of law and society, such as:

  • What determines the legitimacy of laws and those who make and enforce them?
  • What relationship does the law on the books bear to the law as enforced, and how does the gray area between them affect the evolution of society?
  • What is the proper attitude of citizens toward law-makers and regulators, and how much power is healthy for either side to have?
  • How can community self-organization stave off the need for heavy-handed legislation–and how, in contrast, can premature legislation preclude constructive solutions by self-organized communities?

Core questions such as these power Zittrain's tour of technology and law on today's networks. “The Future of the Internet” takes us briskly down familiar paths, offering valuable summaries of current debates, but Zittrain also tries always to hack away at the brambles that block the end of each path. Thanks to his unusually informed perspective, he usually–although not always–succeeds in pushing us forward a few meticulously footnoted footsteps.

Zittrain has summarized the points in this book in an online article, but reading the whole book pays off because of its depth of legal reasoning.

Informed recommendations

One of Zittrain's most applicable suggestions–and one that exemplifies the positive philosophy he brings to his subject–is his solution for handling computer viruses. Currently, non-expert computer users are either helpless in the face of viruses or employ inadequate firewall products that block useful programs along with infections. When Internet service providers scramble to block malware at the router, proponents of network neutrality complain that they're violating the end-to-end principle. The dilemma seems unsolvable.

Zittrain cuts the Gordian knot by suggesting user empowerment. Experts who know how to track and identify viruses or spyware can label them as such, and less expert users can check ratings on every download. Tools are urgently needed that aggregate widely distributed ratings and present them to users in a very simple screen of information whenever they initiate something potentially dangerous. (Zittrain cites, as a model, the partnership between Google and the StopBadware project run by his colleagues at the Berkman Center.)

Users could have a choice of proxies to help them decide what to put on their computers. Additionally, instead of politely hiding network activity from users, mass-market operating systems can show the information in a manner that is easy to grasp, so that the user has a clue when the computer is at risk of turning into a zombie. Zittrain would probably be gratified by a simple security enhancement recommended in the February issue of Communications of the ACM: a suggestion that a wireless router notify each host using the router how many hosts are currently using it, so that wardriving could immediately be detected by users.

Other people have suggested distributed self-defending security systems, but Zittrain links the whole endeavor to the hope provided by the Internet's ability to bring together people who share positive goals. If software vendors and Internet security researchers gathered around this vision, a self-interested and self-organized community could protect itself, with more able members educating the less able ones.

As an alternative to restrictive software that sinks roots deep into the operating system and locks down computers, such tools could actually improve Internet users' knowledge and sense of community while putting a dent in identity theft, spam, and distributed denial of service attacks.

Throughout the wide range of topics described in his book, Zittrain looks first to technically powered solutions that unite people of good will and encourage potential malefactors to renounce anti-social behavior. But his tone lies far from that of cocky cyberpunk hackers who boast that their technological solutions can protect them from all cyberharm (and damned be less savvy cybercitizens). Zittrain is too good a lawyer to dismiss the power of governments, or to assume that such power can only be oppressive. Thus:

  • He calls for a new Manhattan Project that would draw in government, research institutions, and individual programmers to solve the afore-mentioned malware problem.
  • He accepts that the government should be allowed a lower threshold for access to financial data than for access to other personal data.
  • He suggests regulation to enforce data portability, so that user data stored by online services could be retrieved by the owners when they wanted to switch services or when the services failed. (This is the online equivalent to the historic endorsement of open office standards that has been passed by governments in several countries and was nearly hatched in the state of Massachusetts, before a careless legislature ran an off-road vehicle over it.)

Zittrain is not a fan of network neutrality as most proponents describe it, but he sympathizes with the end-to-end principle and would like the principle of neutrality applied to APIs offered by web services such as Google's. If web service providers claim that their data is available for creative uses by outsiders, they should not be allowed to arbitrarily cut off those outsiders that happen to be competitively successful or disruptive to their business models.

I find this recommendation particularly intriguing, because the promising area of web services is currently fraught with uncertainty that's clearly holding back socially beneficial uses. Traditional PCs seem a rock of stability in comparison to the services exploited by modern web services, which vendors can whisk away like apparitions in the night.

You probably know, from such scandals as Yahoo!'s cooperation with the Chinese government in tracking down dissidents and Microsoft's release of search data for a “research project” at the Department of Justice, that data stored at an online service is intrinsically less secure than data stored on your computer. But did you know that the law itself in the U.S. grants substantially less protection against search and seizure to your data when it's stored at a service? Zittrain's elucidation of this legal limbo, although it demands close reading, is a valuable window into the issues of technology and policy for lay readers.

Concerning medical privacy, in particular, the World Privacy Forum noted in a February report (PDF) that personal health records stored by generic organizations such as Microsoft or Google are not protected by the Health Insurance Portability and Accountability Act (HIPAA). Therefore, the records will probably be fair game for subpoenas in divorce cases, lawsuits, etc. The individual also has fewer rights when trying to correct entries.

Well, I've given you the quick tour of Zittrain's book, which is like doing the Smithsonian National Museum of Natural History in an hour. Now we'll meet back in the lobby by the elephant statue, as it were, and examine the key concept that runs through his book.

Generativity: the new battle cry

We've all heard so much in the past decade about “innovation” that I'm in danger of having my readers snap the browser tab shut on this web page when they see the word. (I remember when the fingers-down-the-throat word in the business world was “synergy.” That word finally disappeared along with the businesses that invoked it to justify their mergers.)

Zittrain has coined a term that captures with more richness and potential what's happening in our economy: generativity, a measure of how many new, unexpected, and (occasionally) useful things can be developed thanks to an available platform. He lists a number of famous generative technologies, ranging from duct tape and Lego bricks to the all-time heavyweight champion of generativity, the core Internet protocols. But the effects of the Internet are predicated on many other generative technologies that have contributed to the wave of innovation over the past fifteen years or so:

  • Personal computer hardware, which accepts an unlimited variety of devices
  • Personal computer operating systems, which let ordinary consumers load any program that's compiled to run on them
  • Free software, which encourages infinite extensions

The boon of generativity is threatened in two major ways: network restrictions and locked-down devices such as the Xbox, TiVo, and iPhone, which Zittrain calls tethered appliances. The network and the endpoint are symbiotically linked in their power: freedom in one can help keep the flame of freedom burning on the other, while correspondingly, dousing the embers on one can dim generativity on the other.

Appliances are not bad. The Xbox, TiVo, and iPhone have their place, and Zittrain points out that even the trenchantly open One Laptop Per Child system embeds a trusted computing substrate called Bitfrost that combines digital signatures, sandboxing, and mandatory access controls to prevent downloads from harming the system. Unlike trusted computing platforms in proprietary products, Bitfrost can be overridden by a sophisticated user, but doing so requires a BIOS reflash.

The degree to which a system is “appliancized” is inversely related to its generativity. We need to make sure that at least some of the population can preserve generativity in order to create technology at new levels. Furthermore, everyone needs generative systems in order to prevent vendors from choking off mass adoption of innovations.

Many of the Internet's dangers stem from the attributes of a good generative system. Zittrain, in addition to highlighting ease of mastery and accessibility, points out that a highly generative system makes it easy to transfer capabilities from highly sophisticated developers to untrained users. This is not entirely sweet. For instance, security guru Bruce Schneier has repeatedly pointed out that easy transferability is the bane of Internet security.

It's bad enough, Schneier says, that systems inevitably contain bugs that can be fatally exploited by top-notch coders and cryptography experts. What really threatens the Internet is that these experts can bundle the exploits into kits that script kiddies can download and use with minimal education. Sharing tools that perform intrusions is not in itself malicious; these tools are important for system administrators, programmers who reverse engineer applications (another skill with both good and evil applications), and other users. But the practice definitely swells the number of malicious programs foraging the Internet for victims.

Once we accept the value of generativity, technical solutions can allow us to preserve it while protecting ourselves from the bugs and intrusions that it makes it so easy for us to succumb to. For instance, instead of adopting a fortress mentality, public libraries and other institutions could run virtual operating systems on computers they want to protect. In our homes, our computers could have one operating system open to experimental applications (and instantly reloadable if compromised), side by side with another that is locked down. This would allow ordinary people the same generative freedom as programmers, who typically maintain work platforms and development platforms.

Value at the fringe

Among Zittrain's most alarming insights is how calls for a safer Internet, and for one more friendly to copyright and trademark holders, can feed into general governmental control over [...]

Every business has suffered from the hammerlock of a new computer system that turns out to prevent employees from making the tiny exceptions to rules that previously allowed smooth operations. Perfect control on operating systems or the Internet could cause similar disasters, which range from the added costs of DRM in schools to clamp-downs by repressive regimes. Zittrain lays out several interesting legal considerations that aren't usually raised, overtly in defense of deliberately leaky enforcement regimes.

Concurring and dissenting opinions

I should mention before going further that Zittrain showed me an early paper on the subject underlying his book, and cited me in his acknowledgments as one of the people whose conversations with him influenced the book. Had I the chance to discuss the following issues with him, I would have advised a few changes to the text.

The intractability of privacy violations

Zittrain's last chapter focuses on privacy, which is widely understood to have passed a threshold in the past few years. Given cell phone cameras, the complex data-sharing services on popular social networks, and other tools in the hands of ordinary computer users, privacy can now be violated by irresponsible crowds in addition to large companies and governments.

First, I think Zittrain exaggerates the shift. If he believes that government and corporate abuses are now only a tiny sliver of a larger problem created by peer production on the Internet, I wonder whether he's ever been barred from an airplane by the TSA or denied coverage by an insurance company.

But the problems he points to in privacy-violating activities that have suddenly become everyday behaviors–such as tagging photos on Flickr with people's names–are real. He tries to apply lessons from an earlier chapter focusing on the checks and balances that make Wikipedia successful. Unfortunately, I think the analogy is weak.

Wikipedia, as Zittrain points out, remains a centralized institution under the ultimate control of one man. Authority fans out from creator Jimbo Wales in an admirably broad and flexible spread, but creativity and control at each level depend on the backstop provided at the next higher level. I agree with Zittrain that some of the solutions found here can be translated to the wider and wilder Internet, but in the area of privacy I don't find the analogy persuasive.

Even appliances depend on generative systems

The forward thrust created by generative technologies is so powerful that one finds them in even supposedly non-generative appliances. Most embedded devices with non-trivial capabilities (devices that need more than a while-loop for an operating system) use general-purpose operating systems, often Linux or the reduced-fat version of Windows known as Windows CE.

Zittrain contrasts generative PCs and free software to appliances such as the TiVo, Xbox, and iPhone. The irony is that these are all based on generative technologies. The manufacturers could not resist the opportunity to cut development costs by using robust and freely available platforms.

TiVo uses Linux as its operating system, the Xbox runs on general-purpose hardware that has been successfully hacked to run Linux, and the iPhone–which epitomizes to Zittrain the supreme tethered appliance–has BSD inside. Because of its innately generative qualities (including the relatively transparent language of its API, Objective-C), the iPhone was opened up just a few months after its release in a textbook kind of collaboration among self-organized hackers, leading to a free software toolkit that lets any programmer create new applications using all the features of the iPhone.

These examples underline the challenge Tim O'Reilly used to pose to Microsoft: without open platforms, where will its next wave of technology come from? It looks like Microsoft listened, considering its current tentative support for a few free software projects. An industry of appliances would be poorer without generative technology.

The tether chafes

One of the central points of Zittrain's book is that embattled computer users, worn down by the onslaught of malware, tend to retreat and give up control to centers of authority, whether by installing restrictive firewalls or buying tethered appliances that were built from the ground up to be closed.

Zittrain has several wonderful sections laying out the long-term detriment of this choice, not only for obvious topics such as technological innovation and fair use of copyrighted material, but for the balance between government and individual rights. He's on top of all the abuses caused by manufacturers who keep control of their devices and send them automated updates–sometimes updates that deliberately disable previously available features. Tethered appliances respond to their vendors with the same flexible slavishness as computers taken over by roving bots.

But Zittrain does not use available evidence to rebut the seductive claim that choosing appliances over applications leads to more safety for the user and the overall community. Does it?

I think we have plenty of evidence to resist the tethering of previously open computers. For instance, what would most computer users trust more than a CD from Sony? And to ward off the dangers of the open Internet, should we turn to telephone companies to protect our privacy and personal data? I need say no more.

Among web services, the same worries apply. The dominant Internet appliance is Google, and every service it unveils seems to raise such fears about privacy that it has to perennially trot out its “don't be evil” motto.

But nowhere has the trust in appliances been more dangerous than the calamitous rush to electronic voting machines without paper output, which cannot be adequately audited after deployment. We need to say loudly: closing down open systems is no solution to security risks. (Richard M. Stallman made similar points in response to Zittrain's article, and Susan Crawford in her response.)

Web 2.0 extends generativity

The wide-area-network equivalent of a tethered appliance is “software as a service,” also known as an Application Service Provider. Here, I have to insist that Zittrain gets his terminology wrong. In place of these common industry terms, he refers to the phenomenon as Web 2.0.

Controversy has always surrounded the term Web 2.0, to be sure, despite attempts to define the phrase by Tim O'Reilly, who is credited with inventing it. Although everybody reads his own biases into the term, I don't see any meaningful definition of Web 2.0 that includes web sites where users just log in to run an application remotely. I did see one other speaker misunderstand the term this way, but we have to resist the trend to “mash up” useful terms to the point where they lose their value and all come out in some bland uniformity.

Web 2.0 features–such as simple APIs and ways to incorporate user-submitted content–extend generativity as much as blogs and wikis do. They're a critical stage in the ongoing evolution of the Internet. But Zittrain does offer some important critiques. Google Maps can discourage competition by co-opting it through its powerful API. And this ultimately means more control for Google–control it could leverage to artificially set the direction for mapping applications.

Thus, Web 2.0 technologies can be seen as enablers that open up the data and applications controlled by corporations, but also as the soft glove that allows the corporate fist to push itself further and further into their clients' lives.

My glosses and musings on “The Future of the Internet” show how much meat it provides for analysis and discussion. Anyone who can make it through this long review would get a lot from the book. In addition to drawing links among useful recommendations for preserving our freedom, Zittrain proves that the legal frameworks for making such decisions are more complex than most technologists and policy makers credit them for.

user/andy_oram.txt · Last modified: 2010/01/02 by radarman