Nat Torkington

Nat has chaired the O'Reilly Open Source Convention and other O'Reilly conferences for over a decade. He ran the first web server in New Zealand, co-wrote the best-selling Perl Cookbook, and was one of the founding Radar bloggers. He lives in New Zealand and consults in the Asia-Pacific region.

    Four short links: 16 March 2010

    Nat Torkington @gnat 2010-03-16

    1. Government is an Elephant (Public Strategist) -- if Government is to be a platform, it will end up competing with the members of its ecosystems (the same way Apple's Dashboard competed with Konfabulator, and Google's MyMaps competed with Platial). If you think people squawk when a company competes, just wait until the competition is taxpayer-funded ....
    2. Recordings from NoSQL Live Boston -- also available in podcasts.
    3. Modeling Scale Usage Heterogeneity the Bayesian Way -- people use 1-5 scales in different ways (some cluster around the middle, some choose extremes, etc.). This shows how to identify the types of users, compensate for their interpretation of the scale, and how it leads to more accurate results.
    4. Building a Better Teacher -- fascinating discussion about classroom management that applies to parenting, training, leading a meeting, and many other activities that take place outside of the school classroom. (via Mind Hacks)

    Open Data Pointers

    Nat Torkington @gnat 2010-03-16

    When I blogged about truly open data, readers sent me a lot of interesting links. I've collected them all below. Enjoy!

    1. The Centre for Environmental Data Archival (CEDA) -- hosts a range of activities associated with environmental data archives. (Director is on Twitter, @bnlawrence)
    2. CONNECT -- open source healthcare data exchange being developed with Brian Behlendorf, one of the original developers of the Apache web server.
    3. Phil Agre's Living Data -- prescient article in Wired from 1994.
    4. Factual -- web database that permits multiple values in tables, and you can apply different functions to choose which values you'll use when you work with the data (e.g., "most recent", "most popular", ...).
    5. HDF -- BSD-licensed toolkit and format for storing and organising large numeric datasets.
    6. NetCDF -- software libraries and self-describing, machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data.
    7. Madagascar -- an open-source software package for multidimensional data analysis and reproducible computational experiments with digital image and data processing in geophysics and related fields.
    8. Data Documentation Initiative -- standard for metadata of social science datasets.
    9. The Memento Project -- project to incorporate versioned web pages into regular browsing.
    10. Data Sharing: A Look at the Issues -- presentation from Science Commons data manager Kaitlin Thaney.
    11. SIFN Datasprint -- these folks are planning a sprint around data, the same way coders often have sprints around code.
    12. Get Your Database Under Version Control -- 2008 piece by Jeff Atwood on the need to version control your database.
    13. CKAN -- Comprehensive Knowledge Archive Network. The database of open datasets is itself an open dataset, managed by a versioned database.
    14. Componentization and Open Data, Collaborative Development of Data: OKFN are figuring out packaging and structure of distributed data development. They seem closest to building what I was talking about.
    15. Open Data Maturity Model (Slideshare) -- I like the idea of progressing from amateur to professional and identifying milestones along the continuum, but I'm not convinced that the last two stages are based on existing projects. I'm a big fan of building frameworks from successful projects, rather than building the framework in isolation.
    16. Data RSS -- proposal for an API for data feeds.
    17. Fedora Commons -- open source for managing, indexing, and delivering datasets. Islandora integrates that into Drupal.
    18. gEDA -- GPL'd suite of Electronic Design Automation tools, some of which are applicable to non-electronics data projects.
    19. You Cannot Run an Open Data Project Like an Open Source Project, Unless… -- not always coherent disagreement with my post. His most lucid moment comes pointing out that government datasets have a single owner. This is the difference between intrinsic data (crimes reported to police) where the data is about the operations of an agency, and extrinsic data (wild horse populations in Arizona, tree ring climate records) where an agency sends people into the field or otherwise collects and possibly processes data from others' labour. But even intrinsic data can be more collaboratively maintained: take bug reports and corrections from users (there is no 800 block of Main, did you mean the 80 block of Main?). It's true: I can't imagine a lot of collaboration around the preparation and distribution of pure sensor data (e.g., traffic data), but my post talked about more than collaborative generation: revision management, documentation, etc.

    Four short links: 15 March 2010

    Nat Torkington @gnat 2010-03-15

    1. A German Library for the 21st Century (Der Spiegel) -- But browsing in Europeana is just not very pleasurable. The results are displayed in thumbnail images the size of postage stamps. And if you click through for a closer look, you're taken to the corresponding institute. Soon you're wandering helplessly around a dozen different museum and library Web sites -- and you end up lost somewhere between the "Vlaamse Kunstcollectie" and the "Wielkopolska Biblioteka Cyfrowa." Would it not be preferable to incorporate all the exhibits within the familiar scope of Europeana? "We would have preferred that," says Gradmann. "But then the museums would not have participated." They insist on presenting their own treasures. This is a problem encountered everywhere around the world: users hate silos but institutions hate the thought of letting go of their content. We're going to have to let go to win. (via Penny Carnaby)
    2. StoryGarden -- a web-based tool for gathering and analyzing a large number of stories contributed by the public. The content of the stories, along with some associated survey questions, are processed in an automated semantic computing process for an immediate, interactive display for the lay public, and in a more thorough manual process for expert analysis.
    3. Google Apps Script -- VBA for the 2010s. Currently mainly for spreadsheets, but some hooks into Gmail and Google Calendar.
    4. There's a Rootkit in the Closet -- lovely explanation of finding and isolating a rootkit, reconstructing how it got there and deconstructing the rootkit to figure out what it did. It's a detective story, no less exciting than when Cliff Stoll wrote The Cuckoo's Egg.

    Four short links: 12 March 2010

    Nat Torkington @gnat 2010-03-12

    1. Flickr Flow -- a "season wheel", showing the relative popularity of colours in Flickr photos at different times of the year. Beautiful. (via gurneyjourney)
    2. Light Peak -- optical peripheral cabling and motherboard connections. (via timoreilly on twitter)
    3. British Museum Pilots "Wikipedian in Residence" -- Liam's underlying task will be to build a relationship between the Museum and the Wikipedian community through a range of activities both internally and public-facing. (via straup on Delicious)
    4. Twitter's Location Policy -- If you chose to tweet with a place, but not to share your exact coordinates, Twitter still needs to use your coordinates to determine your Place. In order to improve the accuracy of our geolocation systems (for example, the way we define neighborhoods and places), Twitter will temporarily store those coordinates for 6 months. Because how could anything go wrong if there's a database containing 6 months of my precise locations stored on the Internet even when I've chosen not to share my precise location? (via straup on Delicious)

    Four short links: 11 March 2010

    Nat Torkington @gnat 2010-03-11

    1. Digital Inclusion: How Do You Tell? -- [N]either means nor skills are simple binary states. A while ago, I was talking to a young man looking for a job, and asked him why he didn’t look online. Because it’s two buses to get to the public library and you only get half an hour, was his reply. Or being in a library myself and watching an older man asking a bit tentatively if he could use one of the computers and being firmly told that he could book a slot for three days time. He turned away looking crestfallen and without making a booking. It didn’t look as though he would be back. Remote, uncertain, and limited access is better than none. But it is hardly inclusion.
    2. The Participatory Museum Process -- inside look at the writing of the book, and the surprises she received writing it. People preferred to comment on a finished draft rather than the work in progress. At the time, I thought people would be MORE excited to comment and help shape the book as I was first writing it than to comment on a complete draft. I was wrong. The second draft was offered to participants with a much more specific, time-limited ask, and it was much more successful than the open-ended "help me as I write it" approach to draft one. This makes sense - the second draft experience was much better-scaffolded - and it made me reconsider the extent to which participants want to be involved in the early development of other peoples' projects.
    3. Finding Pin 1 (Evil Mad Scientist) -- some interesting knowledge about hardware that'll make you more informed the next time you peer quizzically at a printed circuit board.
    4. January 2010 US Mobile Subscriber Market Share (ComScore) -- Android just overtook Palm, and is growing faster than the other smartphone platforms. And for a reality check, 28% of mobile customers used a browser on their phone. (via phandroid)

    Four short links: 10 March 2010

    Nat Torkington @gnat 2010-03-10

    1. The Future of Book Publishing Business Models (Stephen Walli) -- some good thoughts about the book publishing industry and ebooks. When does Amazon create the iPhone/Android app and the programme that will allow bookstores to receive a cut of every Kindle edition they sell? I scan the book's in-store barcode with my smartphone, and I get the Kindle edition delivered, and the store gets its cut. Why is this different in concept than Borders on-line store being run on Amazon, or any of the independent book sellers that front through Amazon? It's not the normal book mark-up, but people already browse bookstores and buy on Amazon. This is better than no revenue. (When was the last time you went to a travel agent?)
    2. Google Apps Enterprise Marketplace -- this is sweet. It looks like the play is to become the home page for authenticated apps rather than to make commissions from selling the apps themselves. This may be the Google business model vs the Apple business model in a nutshell. (via Marc Hedlund)
    3. iPad Application Design -- some fantastic notes about the kinds of UI design that iPad encourages. I've avoided covering The Second Coming of The JesusPhone but this is interesting because of the middle ground it stakes out between phone and laptop. The primary warning about designing for the iPad is: more screen space doesn’t mean more UI. You’ll be tempted to violate that principle, and you need to resist the temptation. It’s OK to have UI available to cover your app’s functionality, but a bigger screen doesn’t mean it should all be visible at once. Hide configuration UI until needed. Look like a viewer, and behave like an editor ... There’s been a history of modes getting some bad press on the desktop. The issue is that they trade stability (things always being in exactly the same place in the UI, and not changing) for simplicity (not having too many controls to look through at once). On the iPad, it’s clear where the winning side of the balance is: simplicity. Modes are completely appropriate on this device. (via Marc Hedlund)
    4. The Howtoons Visual Creation Guide -- we teach grammar and spelling in schools but not visual communication. This short booklet is a good start to remedying that. (via BoingBoing)

    Four short links: 9 March 2010

    Nat Torkington @gnat 2010-03-09

    1. Cooperative Behaviour Spreads Through a Group, But So Does Cheating (Not Exactly Rocket Science) -- Fowler and Christakis suggest that people tend to mimic the actions of those they played with. They could be directly imitating the actions of other players, or they could be looking out for cues that tell them the 'right' or 'normal' way of behaving. Whether it's specific actions or social norms that are spreading, the result is the same - a ripple effect that causes groups of people to act in similar ways. People copy the modeled behaviour that they see. This is why, when you start a new social site, you should seed it with people who behave the way that you wish newcomers to behave.
    2. Tulip -- open source 3D visualisation software of large graphs, homepage here. (via hjl on Delicious)
    3. Six Months of Hacker News Front Page Data -- half a million archived records from the Hacker News front page, captured every 15m.
    4. Internet Freedom: Beyond Circumvention (World Changing) -- a very thought-provoking post that challenges the idea that all we need to do to help the citizens of (insert censored country here) is to have more people using Tor. I wonder whether we’re looking closely enough at the fundamental limitations of circumvention as a strategy and asking ourselves what we’re hoping internet freedom will do for users in closed societies. [...] To figure out how to promote internet freedom, I believe we need to start addressing the question: “How do we think the Internet changes closed societies?” In other words, do we have a “theory of change” behind our desire to ensure people in Iran, Burma, China, etc. can access the internet? Why do we believe this is a priority for the State Department or for public diplomacy as a whole? (via BoingBoing)

    Truly Open Data

    Nat Torkington @gnat 2010-03-09

    I'm kicking myself. I have spent a non-trivial number of hours talking to government departments and scientists about open data, talking up an "open source approach" to data, pushing hard to get them to release datasets in machine readable formats with reuse-friendly licenses. I've had more successes than failures, met and helped some wonderful people, and now have more mail about open data in my inbox than about open source. So why am I kicking myself?

    I'm kicking myself because I've been taking far too narrow an interpretation of "an open source approach". I've been focused on getting people to release data. That's the data analogue of tossing code over the wall, and we know it takes more than a tarball on an FTP server to get the benefits of open source. The same is true of data.

    Open source discourages laziness (because everyone can see the corners you've cut), it can get bugs fixed or at least identified much faster (many eyes), it promotes collaboration, and it's a great training ground for skills development. I see no reason why open data shouldn't bring the same opportunities to data projects.

    And a lot of data projects need these things. From talking to government folks and scientists, it's become obvious that serious problems exist in some datasets. Sometimes corners were cut in gathering the data, or there's a poor chain of provenance for the data so it's impossible to figure out what's trustworthy and what's not. Sometimes the dataset is delivered as a tarball, then immediately forks as all the users add their new records to their own copy and don't share the additions. Sometimes the dataset is delivered as a tarball but nobody has provided a way for users to collaborate even if they want to.

    So lately I've been asking myself: What if we applied the best thinking and practices from open source to open data? What if we ran an open data project like an open source project? What would this look like?

    First, we'd collaboratively build the dataset. This means we'd have a curator who is the equivalent of a project leader, taking patches and filtering for quality. Successful open source project leaders foster a group of developers of different skills, rewarding on merit while fostering new talent. Like open source projects, the nirvana state is to have a project that can survive the retirement or death of its founder.

    But collaboration takes more than leadership--open source projects have tools that help. An open data project would need a mailing list to collaborate on, IRC or equivalent to chat in real-time, and a bug-tracker to identify what needs work and ensure that the users' needs are being met. The official dataset of New Zealand school zones has errors but there's nobody to report them to, much less a way to submit a fix to a maintainer. Oh, and don't forget a way to acknowledge and credit contributors—think not just of credits.txt but also of the difference between patch submitter, committer, and project maintainer.

    Open source software developers have a powerful set of tools to make distributed authoring of software possible: diff to identify what's changed, patch to apply those changes elsewhere, version control to track changes over time and show provenance. Patch management would be just as important in a collaborative open data project, where users and other researchers might be submitting new or revised data. What would git for data look like? Heck, what would a local branch look like? I have a new attribute, you have a different projection, she has new rows, how does this all tie back together? (I eagerly await claims that RDF will solve this problem and all others)
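    As a rough illustration of what diff and patch might mean for tabular data, here is a minimal sketch. It assumes every row has a stable primary key and that a dataset fits in memory as a dictionary of rows; the names diff_datasets and apply_patch are hypothetical, not taken from any existing tool. The patch records the old value of each changed field, so applying it to a local branch that has diverged raises a conflict instead of silently overwriting someone's work.

```python
from typing import Any, Dict

Row = Dict[str, Any]
Dataset = Dict[str, Row]  # primary key -> row (assumes stable keys)

_MISSING = object()  # marks a field that is absent from a row


def diff_datasets(old: Dataset, new: Dataset) -> Dict[str, dict]:
    """Describe how to turn `old` into `new`: added rows, removed rows, changed fields."""
    added = {k: dict(new[k]) for k in new if k not in old}
    removed = {k: dict(old[k]) for k in old if k not in new}
    changed: Dict[str, dict] = {}
    for key in old.keys() & new.keys():
        delta = {}
        for field in set(old[key]) | set(new[key]):
            before = old[key].get(field, _MISSING)
            after = new[key].get(field, _MISSING)
            if before != after:
                delta[field] = (before, after)
        if delta:
            changed[key] = delta
    return {"added": added, "removed": removed, "changed": changed}


def apply_patch(base: Dataset, patch: Dict[str, dict]) -> Dataset:
    """Apply a patch to a copy of `base`; raise ValueError when the base has diverged."""
    result = {k: dict(v) for k, v in base.items()}
    for key in patch["removed"]:
        result.pop(key, None)
    for key, row in patch["added"].items():
        if key in result:
            raise ValueError(f"conflict: row {key!r} already exists")
        result[key] = dict(row)
    for key, delta in patch["changed"].items():
        if key not in result:
            raise ValueError(f"conflict: row {key!r} is missing")
        for field, (before, after) in delta.items():
            if result[key].get(field, _MISSING) != before:
                raise ValueError(f"conflict: {key!r}.{field} changed locally")
            if after is _MISSING:
                result[key].pop(field, None)
            else:
                result[key][field] = after
    return result


# A user reports that there is no 800 block of Main and submits a correction.
published = {"r1": {"street": "Main", "block": 800}}
corrected = {"r1": {"street": "Main", "block": 80}}
patch = diff_datasets(published, corrected)
```

    This only covers the "she has new rows" and "he fixed a value" cases. A new attribute shows up as a changed field on every row, and a different projection would rewrite every coordinate, so a real tool would want schema-level and transformation-level patches as well.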

    That's just development. The interface between developers and users is the release. State of the art for a lot of government data is the equivalent of source.tar.gz. No version numbers, much less the ability to download older versions of the datasets or separate stable and development branches.

    Why would we want to download the historic version of a dataset? Because a paper used it and we want to test the analysis software that the paper used to ensure we get the same answer. Or because we want to see what our analysis technique would have shown with the knowledge that was available back then. Or simply to be able to track defects.
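    One way to make "the version the paper used" a checkable claim is to publish each release under a version number together with a checksum. The sketch below is an assumption-laden toy: DatasetArchive is a hypothetical in-memory class, and a real archive would sit on disk or behind a web server. It uses only hashlib from the Python standard library.

```python
import hashlib
from typing import Dict, Optional, Tuple


class DatasetArchive:
    """Hypothetical in-memory archive of immutable, versioned dataset releases."""

    def __init__(self) -> None:
        self._releases: Dict[str, Tuple[str, bytes]] = {}  # version -> (sha256, data)

    def release(self, version: str, data: bytes) -> str:
        """Publish a new version and return its SHA-256 digest."""
        if version in self._releases:
            # Released versions never change; corrections get a new version number.
            raise ValueError(f"version {version} has already been released")
        digest = hashlib.sha256(data).hexdigest()
        self._releases[version] = (digest, data)
        return digest

    def fetch(self, version: str, expected_digest: Optional[str] = None) -> bytes:
        """Return the bytes of a release, optionally checking a digest cited elsewhere."""
        digest, data = self._releases[version]
        if expected_digest is not None and digest != expected_digest:
            raise ValueError(f"version {version} does not match the cited digest")
        return data


archive = DatasetArchive()
cited = archive.release("1.0", b"block,street\n800,Main\n")
archive.release("1.1", b"block,street\n80,Main\n")

# Years later, rerun the paper's analysis against exactly what it used.
historic = archive.fetch("1.0", expected_digest=cited)
```

    A paper that cites both the version and the digest lets a reader confirm they are rerunning the analysis on the same bytes, even after several later releases.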

    The users of data will have to adapt to the idea of versions, like the users of software have. The maintainers of the dataset might release five different versions of it while you're writing your analysis code, so it can't be a painful process to incorporate the revised data into your project. With software we have shared libraries and dynamic libraries, supported by autotools and such packages. Our code has interfaces and a branch that promises backwards compatibility. What would that look like for data? And what is the data version of the dependency hell that software developers know all-too-well (M 1.5 depends on N 1.7 and P 2.0, but P 2.0 requires N 2.0, and upgrading N to 2.0 breaks M which expects the 1.x set of interfaces from N ...).
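    The M, N, and P example can be made concrete with a small checker. This is a sketch under a simplifying assumption: compatibility is decided by major version alone, so "needs N 1.x" means any 1.something release of N. The names major and find_conflicts are hypothetical. For data, read M as an analysis and N and P as datasets whose major version changes when the schema does.

```python
from typing import Dict, List


def major(version: str) -> int:
    """Return the major component of a dotted version string such as '1.7'."""
    return int(version.split(".")[0])


def find_conflicts(installed: Dict[str, str],
                   requires: Dict[str, Dict[str, int]]) -> List[str]:
    """List every requirement that the installed set of versions fails to meet.

    `requires` maps a package name to the major version it needs of each dependency.
    """
    conflicts = []
    for pkg, deps in sorted(requires.items()):
        if pkg not in installed:
            continue
        for dep, needed in sorted(deps.items()):
            have = installed.get(dep)
            if have is None:
                conflicts.append(f"{pkg} needs {dep} {needed}.x, which is not installed")
            elif major(have) != needed:
                conflicts.append(f"{pkg} needs {dep} {needed}.x but {dep} {have} is installed")
    return conflicts


# M 1.5 needs N 1.x and P 2.x; P 2.0 needs N 2.x.
requires = {"M": {"N": 1, "P": 2}, "P": {"N": 2}}

with_old_n = find_conflicts({"M": "1.5", "N": "1.7", "P": "2.0"}, requires)
with_new_n = find_conflicts({"M": "1.5", "N": "2.0", "P": "2.0"}, requires)
```

    Both calls report exactly one conflict: keeping N at 1.7 breaks P, and upgrading N to 2.0 breaks M. No single version of N satisfies everyone, which is the bind described above.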

    And, of course, there's documentation. As with software, I imagine we'll see some docs structured and some unstructured. The state of the art isn't great for government datasets, it has to be said: if you're lucky you get a "code X means ABCD" but rarely are you told exactly how the data were generated, the limits on its accuracy, situations where it shouldn't be used, etc.

    Finally, we need to change attitudes and social systems. Data is produced as the product of work done, and is rarely conceived of as having a life outside the original work that produced it. Some datasets will (some won't--think of how many projects fail to interest anyone but the person who started them). This means thinking of yourself not just as the person who does the work, but the person who leads a project of interested outsiders and (in some cases) collaborators and who is building something that will last beyond their time. This is not a natural mindset within government nor, in many cases, science. Funding and budgeting systems at the moment may prevent this, and would need to change.

    The good news is that while government datasets are rarely generated collaboratively, science is a little further along. PubMed and GenBank are just two examples of great science collaborations that we can learn from, and I'm sure there are more. Beyond science, OpenStreetMap is an important example of collaborative data gathering and the Open Knowledge Foundation folks may have work in this area already. I'm keen to learn more about the open data projects that are more than just data-over-the-wall and share what I find. Time to stop kicking myself and start learning!

    Four short links: 8 March 2010

    Nat Torkington @gnat 2010-03-08

    1. China's Cyberposse (NY Times) -- is vigilante justice ok if the cause is right? Is it okay if there wouldn't be justice without it? Does the end justify the means? Many interesting questions raised by this large-scale Internet-based "human-flesh-search" in China. In the future we are all 4chan. (via waxy, who also recommended this article on the same subject)
    2. Questioning "Born Digital" (The Economist) -- an interesting collection of healthy skepticism about how the "born digital" folks will change everything. [...] many of his incoming students have only a superficial familiarity with the digital tools that they use regularly, especially when it comes to the tools’ social and political potential. Only a small fraction of students may count as true digital natives, in other words. The rest are no better or worse at using technology than the rest of the population.
    3. The Participatory Museum -- a new book by the mighty museum mind, Nina Simon. The ideas are very usable outside of the museum world: raid this for social and engagement ideas for your own situation.
    4. Designing for Digital: What Print-Book Designers Should Know About Ebooks -- course notes covering format choice, tools, and (yes) typesetting. (via liza on Twitter)

    Amazon Fires Its Colorado Associates

    Nat Torkington @gnat 2010-03-08

    I just got interesting email from Amazon: the Colorado government recently enacted a law to impose sales tax regulations on online retailers [...] We and many others strongly opposed this legislation, known as HB 10-1193, but it was enacted anyway. Regrettably, as a result of the new law, we have decided to stop advertising through Associates based in Colorado. We plan to continue to sell to Colorado residents, however, and will advertise through other channels, including through Associates based in other states. The message goes on to say that they'll pay out all the money they owe me but I won't earn any more money for referring people to them.

    Interesting! So let me get this straight: I've done nothing, and Amazon just fired me? Now, I haven't used referrals a whole lot so it doesn't hit me in the pocketbook but this should send chills down the spine of anyone who thought they were building a business, or at least an income, around Amazon services. It's one thing to be fired for something you did (hey doofus, don't cause a heap of MPAA infringement notices to land on Amazon's desk because you were running the new Pirate Bay on EC2) but it's entirely another to be fired for something outside your control.

    A farmer friend told me that the goats to keep are female goats: when one doe headbutts another, the recipient then turns to the next in the hierarchy and headbutts them. With male goats, though, you get prolonged headbutt battles that are loud, intimidating, and potentially damaging. Amazon is obviously hoping the female goat scenario plays out: Amazon headbutts me, so I'll go headbutt my representative. In other words, punish Amazon's associates and hope they'll pass the pain on. I wonder whether any of Amazon's (former) Colorado associates will turn out to be male goats who, grumpy at being set upon, retaliate....

    The full text of the letter follows, and TechFlash covered the new law.

    Dear Colorado-based Amazon Associate:

    We are writing from the Amazon Associates Program to inform you that the Colorado government recently enacted a law to impose sales tax regulations on online retailers. The regulations are burdensome and no other state has similar rules. The new regulations do not require online retailers to collect sales tax. Instead, they are clearly intended to increase the compliance burden to a point where online retailers will be induced to "voluntarily" collect Colorado sales tax -- a course we won't take.

    We and many others strongly opposed this legislation, known as HB 10-1193, but it was enacted anyway. Regrettably, as a result of the new law, we have decided to stop advertising through Associates based in Colorado. We plan to continue to sell to Colorado residents, however, and will advertise through other channels, including through Associates based in other states.

    There is a right way for Colorado to pursue its revenue goals, but this new law is a wrong way. As we repeatedly communicated to Colorado legislators, including those who sponsored and supported the new law, we are not opposed to collecting sales tax within a constitutionally-permissible system applied even-handedly. The US Supreme Court has defined what would be constitutional, and if Colorado would repeal the current law or follow the constitutional approach to collection, we would welcome the opportunity to reinstate Colorado-based Associates.

    You may express your views of Colorado's new law to members of the General Assembly and to Governor Ritter, who signed the bill.

    Your Associates account has been closed as of March 8, 2010, and we will no longer pay advertising fees for customers you refer to Amazon.com after that date. Please be assured that all qualifying advertising fees earned prior to March 8, 2010, will be processed and paid in accordance with our regular payment schedule. Based on your account closure date of March 8, any final payments will be paid by May 31, 2010.

    We have enjoyed working with you and other Colorado-based participants in the Amazon Associates Program, and wish you all the best in your future.

    Best Regards,

    The Amazon Associates Team

    Four short links: 31 December 2009

    Nat Torkington @gnat 2009-12-31

    1. Botnets and the Global Infection Rate (PDF) -- fascinating insights into botnets, control tools, and business models.
    2. Atlassian Uses OpenSocial for Internal Integration -- they use it inside their firewall to build a better dashboard. OpenSocial defines two concepts--an API for defining and working with social data (profiles, attributes, relationships) and specification for gadgets. OpenSocial's fundamental promise was interoperability--write an application once and host it in multiple social networks. Sound familiar? That's what we wanted to do with our own products.
    3. Professional Conference Video with Semi-Professional Equipment -- How to make a great video of yourself giving a presentation, without having a cameraman to track you on stage. (I tried to tell my wife that I had semi-professional equipment, by the way, and it took a quarter of an hour for her to stop laughing.)
    4. Thoughts to Speech -- tested on a stroke victim in his 20s who was able to think but not move, electrodes and a small FM transmitter were implanted between speech and motor centres of his brain. Neurites grew into the electrodes, and the signals sent to them are broadcast by the transmitter to an external receiver. From there a desktop computer runs software to figure out which muscles were being moved, and then makes the corresponding sound. It requires training, but is an exciting breakthrough in brain-computer connection.

    Four short links: 30 December 2009

    Nat Torkington @gnat 2009-12-30

    1. How to Run a Meeting Like Google (BusinessWeek) -- the temptation is to mock things like "even five minute meetings must have an agenda", but my sympathy with Marissa Mayer is high. The more I try to cram into a work day, the more I have to be able to justify every part of it. If you can't tell me why you want to see me for five minutes, then I probably have better things to be doing. There may be false culls (missing something important because the "process" is too high) but I bet these are far outweighed by the missed opportunities if time isn't so structured.
    2. Computer Science Education Week -- December 5-11, 2010, recognizes that computing: Touches everyone's daily lives and plays a critical role in society; Drives innovation and economic growth; Provides rewarding job opportunities; Prepares students with the knowledge and skills they need for the 21st century. Worthy, but there's no mention of the fact that it's FUN. The brilliant people in this field love what they do. They're not brilliant 9-5, then heading home to scan the Jobs Wanted to see whether they could earn more as dumptruck drivers in Uranium mines in Australia. CS isn't for everyone, but it won't be for anyone unless we help them find the bits they find fun.
    3. Installing EtherPad -- step-by-step instructions for installing EtherPad, the open-source real-time text editor recently acquired by Google.
    4. Victorian Infographics -- animals, time, and space from the Victorians. It's beautiful, it's meaningful, it must be infoengravings.

    Four short links: 29 December 2009

    Nat Torkington @gnat 2009-12-29

    1. Turning The Page Online -- historic science books in high-resolution online. Hooke's Micrographia was the first view of the microscopic world, and his astonishingly detailed and beautiful illustrations are there to view and print.
    2. Detailed Psychology of Trolls -- You might be surprised to learn that Trolls readily engage in long debates with fellow Trolls - people, that is, whom they know to be perverse and cunning conversation hackers. Apparently, this does not detract them from wasting hours on fruitless debates that are blatantly rigged and full of sophistry. Few Trolls would be happy with debating only fellow Trolls (semi-literate teenagers and hard-boiled fundamentalists are so much tastier - even though they, too, might be trolling you). Yet most of them, every once in a while, enjoy having an absurd argument with another pig-head. Good on the "know your enemy" basis. (via MindHacks)
    3. Theme Issue -- a Royal Society publication ran a special open access issue focusing on "personal perspectives of the life sciences", where top scientists write about what they think is important. It's good to see more toes dipped into open access, but I'd love to see more journals (particularly those of professions and associations) move to an entirely open access model. (via SciBlogs)
    4. Invent Your Own Computer Games with Python (2ed) -- free ebook that teaches how to program in Python, using games as the motivating examples. Nominally for 10-12 year old children, but (naturally) accessible to adults too. I have not read it, but approve of the attempt.

    Four short links: 28 December 2009

    Nat Torkington @gnat 2009-12-28

    1. GTFS Data Exchange -- site for sharing the files that Google Transit collects from public transit agencies. This lets third party developers write apps that don't involve Google.
    2. Tenureometer -- if you are what you measure, let's build good measures. This is one for higher education, designed to measure scholars' impact on their fields by counting how much they have contributed to the literature and how frequently those articles have been cited.
    3. The Known Universe -- rendered according to the best data science has. Beautiful.
    4. 100 Incredible Lectures from the World's Top Scientists -- it's an astounding collection for everyone to have access to. I'm cheekily delighted by the thought that TED talks will become the next generation's equivalent of the cheesy 16mm educational film: "oh no, not another famous person giving a 20 minute presentation on a life-changing approach to something! It's as naff as Spongebob and that silly multicolour Google logo!"

    Four short links: 25 December 2009

    Nat Torkington @gnat 2009-12-25

    1. One Billionth Spam Message Stats -- from the honeypot project comes a pile of stats about which countries spam, what they spam for, when they spam, etc. One intriguing insight our data provides is that bad guys take vacations too. For example, there is a 21% decrease in spam on Christmas Day and a 32% decrease on New Year's Day. Monday is the biggest day of the week for spam, while Saturday receives only about 60% of the volume of Monday's messages. Enjoy your day off spam. (via Bruce Schneier)
    2. Flowing Data's Five Best Data Visualization Projects of 2009 -- I think I listed at least four of these in this year's Four Short Links. You're welcome!
    3. Six Degrees of Separation -- tiring of "Sound of Music"? This BBC documentary on the science of social connection may help.
    4. Nanoscale Snowmen -- The snowman is 10 µm across, 1/5th the width of a human hair. (via BoingBoing)

    Four short links: 24 December 2009

    Nat Torkington @gnat 2009-12-24

    1. Jonathan Zittrain on "Minds for Sale" -- video of a presentation he gave at the Computer History Museum about crowdsourcing. In the words of one attendee, Zittrain focuses on the potential alienation and opportunities for abuse that can arise with the growth of distributed online production. He also contemplates the thin line that separates exploitation from volunteering in the context of online communities and collaboration. Video embedded below.
    2. Anatomy of a Bad Search Result -- Physicists tell us that the 2nd law of Thermodynamics predicts that eventually everything in the universe will be the same temperature, the way a hot bath in a cold room ends up being a lukewarm bath in a lukewarm room. The web is entering its own heat death as SEO scum build fake sites with stolen content from elsewhere on the web. If this continues, we won't be able to find good content for all the bullshit. The key is to have enough dishwasher-related text to look like it’s a blog about dishwashers, while also having enough text diversity to avoid being detected by Google as duplicative or automatically generated content. So who created this fake blog? It could have been Consumersearch, or a “black hat” SEO consultant, or someone in an affiliate program that Consumersearch doesn’t even know. I’m not trying to imply that Consumersearch did anything wrong. The problem is systematic. When you have a multibillion dollar economy built around keywords and links, the ultimate “products” optimize for just that: keywords and links. The incentive to create quality content diminishes.
    3. Magplus -- gorgeous prototyping for how magazines might work on new handheld devices.
    4. Glasgow's Joking Computer -- The Glasgow Science Centre in Scotland is exhibiting a computer that makes up jokes using its database of simple language rules and a large vocabulary. It's doing better than most 8 year old children. In fact, if we were perfectly honest, most adults can't pun to save themselves. Q: What do you call a shout with a window? A: A computer scream. (via Physorg News)

    Four short links: 23 December 2009

    Nat Torkington @gnat 2009-12-23

    1. Blippy -- Automatically share your favorite purchases from iTunes, Amazon, Zappos, Visa, MasterCard, and more. See what your friends are buying. Interesting premise, and interesting possibilities for buyers to influence each other.
    2. Thousands of lost Durham health records spark probe -- not remarkable in itself but rather indicative that the lost USB key is the new vector for data loss, whereas five years ago it was the "lost laptop". Each data loss incident like this represents a failure to follow simple protocols to encrypt data placed on moveable media. (via scilib on Twitter)
    3. The Meaning of Open (Google Blog) -- a Google exec writes up what he thinks Open should mean for Google. The open source argument is fairly conventional, but it heats up at the open data section. For a rebuttal, see Daring Fireball.
    4. On Blogging Tools -- Joshua Schachter wonders whether blogging tools can be rebuilt as small pieces loosely joined. I wonder if there is a way to define loose interfaces between these systems so that they could both work together but also not set APIs in concrete solid enough to stop innovation. Because the various pieces of the systems currently are all tightly bound together, it is very hard for the parts to move forward separately. For example, I've wanted to be able to specifically reply to comments in place in a visually differentiated way as the publisher, rather than just as another commenter. But this feature hasn't emerged, and if I hacked it into one platform via plugins, I'd be stuck with it forever.

    Four short links: 22 December 2009

    Nat Torkington @gnat 2009-12-22

    1. Trading Shares in Milliseconds (Technology Review) -- With the rise of automation, the bulk of U.S. stock trading has moved from the once-crowded floor of Manhattan's New York Stock Exchange (NYSE) to silent server farms run by exchanges and broker-dealers across the country: the proportion of all trades that the NYSE handles has shrunk from 80 percent in 2005 to 40 percent today. Trading is now essentially a virtual art, and its practitioners put such a premium on speed that NASDAQ has considered issuing equal 100-foot lengths of cable to the brokers who send orders to its exchange servers. (via Hacker News)
    2. Stream iTunes Over SSH -- short script that lets you tunnel iTunes from one machine to another over SSH (by default, iTunes only shares on the local network).
    3. Doodle -- simple way to schedule a common meeting time. (via joshua on Delicious)
    4. Crowdsourcing -- Simon Willison's thoughtful "lessons learned" from his crowdsourcing projects at the Guardian. Crowdsourcing is not as simple as "give them a wiki and they will fill it" (this is related to the failed "everyone in the world wants to work on my broken payroll system" theory of open source), and Simon explains some of the subtleties. The reviewing experience the first time round was actually quite lonely. We deliberately avoided showing people how others had marked each page because we didn’t want to bias the results. Unfortunately this meant the site felt like a bit of a ghost town, even when hundreds of other people were actively reviewing things at the same time. For the new version, we tried to provide a much better feeling of activity around the site. We added “top reviewer” tables to every assignment, MP and political party as well as a “most active reviewers in the past 48 hours” table on the homepage (this feature was added to the first project several days too late). User profile pages got a lot more attention, with more of a feel that users were collecting their favourite pages in to tag buckets within their profile.

    Four short links: 21 December 2009

    Nat Torkington @gnat 2009-12-21

    1. A Taxonomy of Social Networking Data (Bruce Schneier) -- he divides information by who gave it, why, and who controls it. Useful to remember that not all social data are equal.
    2. Five Ways to Revolutionise Computer Memory (New Scientist) -- the physics and economics of new memory technology.
    3. News at Seven -- project to automatically generate news reports, complete with Flash-animated news readers and text-to-speech voices. A project from the Intelligent Information Lab at Northwestern University.
    4. Bacteria-Powered Micro-Machines -- A few hundred bacteria are working together in order to turn the gear. When multiple gears are placed in the solution with the spokes connected like in a clock, the bacteria will begin turning both gears in opposite directions and it will cause the gears to rotate in synchrony for a long time. Video embedded below. (via BoingBoing)

    Four short links: 18 December 2009

    Nat Torkington @gnat 2009-12-18

    1. In Character -- a journal that addresses a different virtue each quarter. I've been thinking of practical philosophy a lot, lately, as we see ever-more-dodgy behaviour. (via bengebre on Delicious)
    2. Lessons from Parallelizing Matrix Multiplication -- a reminder why low-level knowledge of your platform matters, and why motivating examples should be carefully chosen.
    3. MathJax -- MathJax is an open source, Ajax-based math display solution designed with a goal of consolidating advances in many web technologies in a single definitive math-on-the-web platform supporting all major browsers. (via Hacker News)
    4. EtherPad Source -- released as part of their Google acquisition. The announcement says: Our goal with this release is to let the world run their own etherpad servers so that the functionality can live on even after we shut down etherpad.com. This is the resolution to the bad reception of the news that EtherPad would close in March with no plan B for users. The cult of entrepreneurship worshipped the customers only as a vehicle to an exit, but I don't believe that it's moral to do well personally but leave your customers high and dry. This is a message that the EtherPad founders seem to have got loud and clear.

    user/nat_torkington.txt · Last modified: 2010/01/01 by radarman