CrisisCamps and the Pattern of Disaster Technology Innovation
Jesse Robbins @jesserobbins 2010-01-23
Upcoming Crisis Camps
|
Jesse Robbins (@jesserobbins) is CEO of Opscode and a recognized expert in Infrastructure, Web Operations, and Emergency Management.
He serves as co-chair of the Velocity Web Performance & Operations Conference and contributes to the O’Reilly Radar. Prior to co-founding Opscode, he worked at Amazon.com with a title of “Master of Disaster” where he was responsible for Website Availability for every property bearing the Amazon brand. Robbins is a volunteer Firefighter/EMT and Emergency Manager, and led a task force deployed in Operation Hurricane Katrina. His experiences in the fire service profoundly influence his efforts in technology, and he strives to distill his knowledge from these two worlds and apply it in service of both. |
Jesse Robbins @jesserobbins 2009-11-24
|
We're entering our third year of Velocity, the Web Performance & Operations Conference. Velocity 2010 will be June 22-24, 2010 in Santa Clara, CA. It's going to be another incredible year. Steve & I have set a new theme this year, "Fast by Default". We want the broader Velocity community to adopt it as a shared mission & mantra. The reason for this is simple...
Fast isn't a Feature. Fast is a Requirement.
At Velocity earlier this year Marissa Mayer explained why performance mattered so much to Google. Then Eric Schurman (Bing & Velocity Program Committee member) and Jake Brutlag (Google Search) made history with a co-presentation on just how crucial performance is to revenue. Phil Dixon of Shopzilla explained that a 5 second performance improvement increased their revenue by 7-12 percent (http://velocityconference.blip.tv/file/2290648/) while reducing hardware spend by 50%!!!
Fast means Client, Server, Infrastructure, Operations, & Organizations
Getting to Fast isn't just about any one part of the system. Browser & Client performance is crucial, and requires an equally fast server & infrastructure to support it. When load increases, infrastructure must scale quickly or performance suffers. The operational tools and processes for managing software & infrastructure must support rapid changes in a dynamic environment, and be backed by an organization & culture that embraces it.
We're Looking for Speakers - Submit your Proposal by January 11
One more thing…
Quite a few people have asked us to have Velocity conferences more frequently & beyond the SF Bay Area, and so we're going to try something new. On December 8 we'll be running our first ever Velocity Online Conference. Past Velocity Conference participants get a 50% discount & get a 25% discount off Velocity 2010.
See the full schedule after the jump...
Tuesday, December 8, 2009 9:00am-12:40pm PT |
Jesse Robbins @jesserobbins 2009-10-01
|
At Velocity this year Microsoft, Google and Shopzilla each presented data on how web performance directly impacts revenue. Their data showed that slow sites get fewer search queries per user, less revenue per visitor, fewer clicks, fewer searches, and lower search engine rankings. They found that in some cases even after site performance was improved users continued to interact as if it was slow. Bad experiences have a lasting influence on customer behavior.
What about smaller websites that aren't yet at this scale?
Alistair Croll and Sean Power, the authors of the new book Complete Web Monitoring, have continued this research for sites at smaller scale. They used a Strangeloop Networks web acceleration appliance to optimize half the sessions to a smaller production website, tagging optimized and unoptimized visitors so they could be analyzed in Google Analytics. The Strangeloop device applies many of Steve Souders' performance rules to an existing site automatically (a kind of "Steve-in-a-Box" ;-). The results of their analysis show how significant a reduction in page latency can be. In addition to reducing bounce rates, and increasing pages per visit & time on site, they found a 16.07% increase in conversion rates and a 5.50% increase in average order value. Check out the full post on the Watching Websites blog. |
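To get a feel for what those two percentages mean together, here is a rough sketch. It assumes revenue per visitor is simply conversion rate times average order value, which the study itself does not state; the function names are made up for illustration.

```python
def relative_lift(control: float, treated: float) -> float:
    """Relative change of the treated group over the control group (0.10 means +10%)."""
    if control <= 0:
        raise ValueError("control must be positive")
    return (treated - control) / control


def revenue_per_visitor_lift(conversion_lift: float, order_value_lift: float) -> float:
    """Combine two relative lifts, ASSUMING revenue per visitor =
    conversion rate * average order value (an assumption, not from the study)."""
    return (1 + conversion_lift) * (1 + order_value_lift) - 1


# Figures quoted in the post: +16.07% conversion rate, +5.50% average order value.
combined = revenue_per_visitor_lift(0.1607, 0.0550)
print(f"Estimated revenue-per-visitor lift: {combined:.2%}")  # about 22.45%
```

Under that assumption the two effects compound rather than add, so the optimized sessions would be worth a bit more than the sum of the two numbers suggests.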
Jesse Robbins @jesserobbins 2009-08-06
|
Twitter is suffering outages today as they fend off a Denial of Service attack, and so I thought it would be helpful to post John Adams’ exceptional Velocity session about Operations at Twitter. Good luck today John & team… I know it’s going to be a long day! Update: Apparently Facebook & Livejournal have had similar attacks today. Rich Miller from Data Center Knowledge reminds us that this is just the latest in a series of major attacks. |
Jesse Robbins @jesserobbins 2009-06-24
|
We were honored to have Jonathan Heiliger, Facebook’s VP of Technology Operations, as our opening keynote speaker at Velocity. Jonathan is one of the most accomplished leaders in our field, and is a master of the craft. Here is his keynote in its entirety: Note: Other videos from Velocity are being posted to VelocityConference.blip.tv |
Jesse Robbins @jesserobbins 2009-06-08
|
The deadline for talks is May 11th, so submit your talks now!
As with all Ignites, each speaker will only get 20 slides that each auto-advance every 15 seconds for a total of five minutes. We'll be looking for fun geek topics like hacks, how-to's, and insights. (Talks don't have to be Velocity-related!) If you're not sure what an Ignite talk looks like check out the Ignite Show. |
Jesse Robbins @jesserobbins 2009-06-07
|
CrisisCamp Ignite! Session Kick Off
Time: Friday, June 12, 2009 from 7:30-9PM
CrisisCamp - Saturday, June 13 & Sunday, June 14th
Start Time: 9:00am both days |
Jesse Robbins @jesserobbins 2009-05-21
|
Galactic Center of Milky Way Rises over Texas Star Party from William Castleman.
[via the Primary Tentacle @ Laughing Squid] |
Jesse Robbins @jesserobbins 2009-05-17
from nasa hq photostream [via slashdot] |
Jesse Robbins @jesserobbins 2009-05-08
|
(tag cloud created from Velocity session & speaker information using wordle.net)
My favorite interview question to ask candidates is: "What happens when you type www.(amazon|google|yahoo).com in your browser and press return?"
While the actual process of serving and rendering a page takes seconds to complete, describing it in real detail can take an hour. A good answer spans every part of the Internet from the client browser & operating system, DNS, through the network, to load balancers, servers, services, storage, down to the operating system & hardware, and all the way back again to the browser. It requires an understanding of TCP/IP, HTTP, & SSL deep enough to describe how connections are managed, how load-balancers work, and how certificates are exchanged and validated... and that's just the first request!
Web Performance & Operations is an emerging discipline which requires incredible breadth, focusing less on specific technologies and more on how the entire system works together. While people often specialize in particular components, great engineers always think of that component in relation to the whole. The best engineers are able to fly to the 50,000 foot view and see the entire system in motion and then zoom in to microscopic levels and examine the tiny movements of an individual part. John Allspaw recently described this interconnectedness on his blog:
Working with these systems requires an understanding not only of the way technology interacts, but the way that people do as well. The structure, operation, and development of a website mirrors the organization that creates it, which is why so many people in WebOps focus on understanding and improving management culture & process.
Organizing a conference like Velocity is a wonderful challenge because it requires the same sort of thinking. We focus on the big concepts that everyone needs to know and then go deep into the technologies that change our understanding of the system. We find ways to share the unique experience that can only be gained by operating at scale. We make it safe to share as much of the "Secret Sauce" as we can.
Please join us at Velocity this year; we have an amazing lineup of speakers & participants. Early registration ends on Monday, May 11th at 11:59 PM Pacific. (Radar readers can use "vel09cmb" for an additional 15% discount.) |
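For readers who want to poke at the interview question from this post, the client side of that first request can be walked through by hand. This is only a minimal sketch using Python's standard `socket` and `ssl` modules: it covers DNS, TCP, TLS, and a bare HTTP request, and leaves out everything a real browser and a real site add (caching, load balancers, redirects, rendering). The function names `build_head_request` and `first_request` are invented for illustration.

```python
import socket
import ssl


def build_head_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 HEAD request as raw bytes."""
    lines = [
        f"HEAD {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def first_request(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Walk one HTTPS request step by step and return the response status line."""
    # 1. DNS: resolve the hostname to one or more socket addresses.
    addrinfo = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    family, socktype, proto, _canonname, sockaddr = addrinfo[0]

    # 2. TCP: the three-way handshake happens inside connect().
    with socket.socket(family, socktype, proto) as raw_sock:
        raw_sock.settimeout(timeout)
        raw_sock.connect(sockaddr)

        # 3. TLS: negotiate keys and validate the server certificate
        #    against the system trust store.
        context = ssl.create_default_context()
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            # 4. HTTP: send the request and read until the server closes.
            tls_sock.sendall(build_head_request(host))
            response = b""
            while chunk := tls_sock.recv(4096):
                response += chunk

    return response.split(b"\r\n", 1)[0].decode("ascii", errors="replace")
```

Calling `first_request("www.example.com")` needs network access and returns whatever status line that server sends. Each numbered comment is a place where a candidate's answer could go much deeper.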
Jesse Robbins 2009-04-20
Since then the BarCampBank idea has turned into a movement. There have been over 14 events all over the world, and many of the ideas generated are beginning to turn into action. To me, the global financial system is a platform, and a platform must always create more value than it captures. Tim explained this in his Work on Stuff that Matters post, saying:
There has never been a more important time to bring meaningful innovation into the financial system, and there has never been more opportunity for our community to make it happen. The next event is occurring this weekend (April 25-26, 2009) on Treasure Island in San Francisco.
|
Jesse Robbins 2008-11-29
|
James Hamilton is one of the smartest and most accomplished engineers I know. He now leads Microsoft's Data Center Futures Team, and has been pushing the opportunities in data center efficiency and internet scale services both inside & outside Microsoft. His most recent post explores misconceptions about the Cost of Power in Large-Scale Data Centers:
[link]
|
Jesse Robbins 2008-11-21
Last year's Velocity conference was an incredible success. We expected around 400 people and we ended up maxing out the facility with over 600. This year we're moving the conference to a bigger space and extending it to 3 days to accommodate workshops and longer sessions.
Velocity 2009 will be on June 22-24th, 2009 at the Fairmont Hotel in San Jose, CA.
This year's conference will be especially important. I've said many times that Web Performance and Operations is critical to the success of every company that depends on the web. In the current economic situation, it's becoming a matter of survival. The competitive advantage comes from the ability to do two things:
I'm excited to announce that joining Steve Souders & me on this year's program committee are John Allspaw, Artur Bergman, Scott Ruthfield, Eric Schurman, and Mandi Walls. We've already started working on the program, and have just opened the Call for Participation. We're especially interested in the following topics:
The submission deadline is January 5th, so get your talks in. If you have any questions or suggestions for the committee, send them to velocity-idea@oreilly.com.
|
Jesse Robbins 2008-11-03
|
Congratulations! |
Jesse Robbins 2008-11-01
|
One of the most interesting DisasterTech projects I've been following is "Decisions for Heroes" led by developer and Irish Coast Guard volunteer Robin Blandford. Decisions is like Basecamp for volunteer Search & Rescue teams. The focus is on providing "just enough" process to complement the real-world workflow of a rescue team, without unnecessary complexity. One of Robin's design goals is that:
This is the winning approach for building systems that "serve those that serve others", and is echoed by InSTEDD's design philosophy and the Sahana disaster management system. Teams begin by entering their responses to incidents and training exercises. They then tag them with things like the weather conditions, the tools and skills required, and who from the team was deployed. As a team's incident database grows this information can be used to show heatmaps, and provide powerful insight on the locations, weather conditions, and times of year that various incidents occur. Over time this kind of data could be analyzed in aggregate across multiple teams and regions and create an incredibly powerful resource for Emergency Managers. This is very similar to what Wesabe does for consumers with financial transaction data today (disclosure: OATV investment).
Rescue team members enter training dates and levels. The system tracks certification expiration dates and prompts team members & leaders to plan classes and remain current. This is a huge issue for volunteers who have to manage professional-level training requirements with the demands of a regular career. As more incidents are entered into the system, it compares the skills required for each of the rescues with the team training exercises. This allows teams to identify areas to focus, train, and develop new skills.
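To make the idea of comparing incident skills against training concrete, here is a small sketch. It is not how Decisions for Heroes is built; the `Incident` record, its fields, and the `skill_gaps` function are all hypothetical, chosen only to mirror the tagging described above.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Incident:
    """Hypothetical incident record with the kinds of tags described in the post."""
    location: str
    weather: str
    skills_required: list = field(default_factory=list)


def skill_gaps(incidents, trained_skills):
    """Return (skill, times_needed) pairs for skills that real incidents
    required but that no training exercise has covered, most needed first."""
    required = Counter(
        skill for incident in incidents for skill in incident.skills_required
    )
    return [
        (skill, count)
        for skill, count in required.most_common()
        if skill not in trained_skills
    ]


incidents = [
    Incident("sea cliffs", "gale", ["rope access", "first aid"]),
    Incident("harbour", "fog", ["rope access", "boat handling"]),
]
gaps = skill_gaps(incidents, trained_skills={"first aid"})
print(gaps)  # [('rope access', 2), ('boat handling', 1)]
```

The same counting approach would work for weather or location tags, which is roughly the kind of aggregate view a heatmap needs.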
This is an innovative project with tremendous potential, and hopefully an early signal of coming changes in Emergency Management.
(Note: "How to Serve those that Serve Others" will be the theme of my "High Order Bit" session at the Web2.0 Summit. I'll be sure to post video/slides/notes when they are available.)
|
Jesse Robbins 2008-10-31
|
It appears that Sprint has stopped routing traffic (called "depeering") from Cogent as a result of some sort of legal dispute. Sprint customers cannot reach Cogent customers, and vice versa. The effect is similar to what would happen if Sprint were to block voice phone calls to AT&T customers. Here's a graph that shows the outage, courtesy of Keynote:
Rich Miller at DataCenterKnowledge has a great summary of the issues behind the incident, which has happened with Cogent before. Rich says:
I think this is particularly Radar-worthy because it provides an example of the complex issues around Net Neutrality. In this case customers are harmed and most (especially Sprint wireless customers) will have no immediate recourse. Todd Underwood of Renesys has posted an incredibly detailed explanation of the scope and impact of this issue. Here is a summary:
Another way to look at the scope of this event is to identify the number, size and ownership of the network prefixes affected by the outage. [...] So, in total, at least 3500 networks on the Internet have less than full connectivity right now. [...]
These same kinds of issues will likely happen with cloud service providers as well. As we've already learned from the evolution of VoIP, you become what you disrupt. |
Jesse Robbins 2008-10-24
|
Amazon announced a new SLA for EC2, similar to the one for S3. This is a notable step for Amazon and cloud computing as a whole, as it establishes a new bar for utility computing services. Amazon is committing to 99.95% availability for the EC2 service on a yearly basis, which corresponds to approximately four hours and twenty-three minutes of downtime per year. It's important to remember that an SLA is just a contract that provides a commitment to a certain level of performance and some form of compensation when a provider fails to meet it. Here's the summary of the EC2 SLA (emphasis added):
Service Commitment: AWS will use commercially reasonable efforts to make Amazon EC2 available with an Annual Uptime Percentage (defined below) of at least 99.95% during the Service Year. In the event Amazon EC2 does not meet the Annual Uptime Percentage commitment, you will be eligible to receive a Service Credit as described below. [...]
This new SLA does not appear to address the reliability of server instances individually or in aggregate. For example, if half of a customer's EC2 instances lose their connections or die every 6 minutes, EC2 would still be considered "available" even if it is essentially unusable. If the entire EC2 service is down a cumulative four hours and twenty minutes, customers must furnish proof of the outage to Amazon to be eligible for the 10% credit. This seems like an onerous process for very little compensation, and isn't in line with Amazon's famous "Relentless Customer Obsession". Amazon takes monitoring very seriously and should take the lead by tracking, reporting, and proactively compensating customers when it lets them down. |
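The downtime figure quoted above is easy to check. This is a quick sketch that assumes a plain 365-day year; the SLA defines its own "Service Year", so treat the period as an assumption, and `downtime_budget` as an illustrative name rather than anything from Amazon.

```python
from datetime import timedelta


def downtime_budget(availability_percent: float,
                    period: timedelta = timedelta(days=365)) -> timedelta:
    """Maximum downtime allowed over `period` at the given availability.

    The default 365-day period is an assumption made for illustration.
    """
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return period * (1 - availability_percent / 100)


yearly = downtime_budget(99.95)
print(yearly)  # roughly 4 hours, 22 minutes, 48 seconds

# The same commitment measured over a 30-day window is far tighter.
monthly = downtime_budget(99.95, timedelta(days=30))
print(monthly)  # a little over 21 minutes
```

Note how much the measurement window matters: a provider can absorb a single multi-hour outage in a yearly budget that would blow through a monthly one many times over.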
Jesse Robbins 2008-10-15
|
The Boston Globe has assembled a beautiful gallery of images of the Sun. This LASCO C2 image, taken 8 January 2002, shows a widely spreading coronal mass ejection (CME) as it blasts more than a billion tons of matter out into space at millions of kilometers per hour. The C2 image was turned 90 degrees so that the blast seems to be pointing down. An EIT 304 Angstrom image from a different day was enlarged and superimposed on the C2 image so that it filled the occulting disk for effect (Courtesy of SOHO/LASCO consortium) [link courtesy Barry Brumitt] |
Jesse Robbins 2008-09-24
|
When Apple announced the iPhone SDK last year I said:
[...] Jobs makes it clear that the platform won't be completely open. While he says that this is to balance the benefits of an open platform with user security protection, it's unclear where Apple will draw those lines. Will there be a Skype client? Third-party media apps?
Almost a year later Apple is using their control of the App Store to block innovative developers from reaching their customers. The most recent example is the "Podcaster" iPhone app, which allows you to download and manage podcasts on the iPhone directly, without having to boot your computer to sync in iTunes. According to the developer, Apple blocked this application from the App Store, saying:
If you want to build a platform, you have to compete fairly with the developers on your platform (if you must compete at all). By restricting developers, Apple is stifling innovation and their long-term growth. Frustrated customers and developers who "think different" are Jailbreaking their iPhones and getting excited about Google's Android. Remember: Successful platforms create more value than they capture. |
Jesse Robbins 2008-08-07
|
Dan Kaminsky has posted the details of the widespread DNS vulnerability. Clarified Networks created this visualization of DNS patch deployment over the past month: Red = Unpatched |
Jesse Robbins 2008-06-28
|
Theo Schlossnagle, author of Scalable Internet Architectures, gave a great explanation of how internet traffic spikes are shifting: Lately, I see more sudden eyeballs and what used to be an established trend seems to fall into a more chaotic pattern that is the aggregate of different spike signatures around a smooth curve. This graph is from two consecutive days where we have a beautiful comparison of a relatively uneventful day followed by long-exposure spike (nytimes.com) compounded by a short-exposure spike (digg.com): [Link] |