Kevin,
I think actually you head off in the wrong direction when you mention “probabilities and statistics” here. We could always do probabilities and statistics on large quantities of things - the link between thermodynamics and statistical physics for example, or macro-economics, demographics, etc. It’s just a matter of multiplying something small by something big.
The really interesting thing about dealing with enormous quantities of things with modern information technology is that, while you have a huge amount of data, each piece in that is a true individual, to whatever extent the data has captured that. That is what makes genetics so exciting and different from the probabilistic/digital science that came before: every link in that DNA chain has a meaning, and changing one base pair has real impact on the organism. When you have enormous quantities of information, millions of books for instance - each of those books is an individual. It’s not the statistical properties of the books that matters, it’s the individual impact they have when the person using them finds something they’re looking for, reads it, and changes their own life in response.
The scientific concepts relating to “chaos” are somewhat related, but they also are I believe too much at the probabilistic/statistical end of things. What is really interesting going forward is this issue of the behavior of very large collections in which each individual constituent is fundamentally different. There definitely are emergent properties as we know from biology - can they even be imagined, let alone predicted, just knowing the constituent parts?
I think there’s an analogy in commerce too: the difference between mass production and mass personalization. With databases, we now have many examples of websites out there that “know” the individual, and present data tailored to that person, even while in principle allowing anyone with an internet connection to have that same personalization. Where is this all leading? I’m not really sure, but I am sure it’s important!
Posted by Arthur Smith on April 22, 2008 at 7:31 AM
is this something similar to the Economy of Abundance raised by Chris Anderson?
Posted by dd on April 19, 2008 at 7:16 AM
There is also a downside to zillionics.
One car doesn’t cause much pollution, for example, whereas a billion cars can change the climate. One person throwing cigarettes wherever they happen to be standing when they’re done smoking isn’t that big a deal, but when everyone does it…
Posted by Mike Lingle on April 19, 2008 at 6:51 AM
“A zillion data points will give you insight that a mere hundred thousand would never.”
Haha. If the internet has ever proved anything incorrect, it is this.
Insight requires a form of creativity that sheer amounts of data simply do not provide. Ten correct “data points” would be way more helpful than a zillion incorrect ones.
Truth does not naturally come from complexity. If anything, complexity can hide insight, as a million lies can be more powerful than one truth. We’ve been witnessing this in civilization since its inception, why would technology change that?
Posted by Nick on April 18, 2008 at 5:01 PM
i’d also point out cosma shalizi’s work on statistical inference, complexity & self-organisation, e.g. http://www.cscs.umich.edu/%7Ecrshalizi/thesis/ - “To find a decent formalization of self-organization, we need to pin down what we mean by organization. The best answer is that the organization of a process is its causal architecture -- its internal, possibly hidden, causal states and their interconnections. Computational mechanics is a method for inferring causal architecture -- represented by a mathematical object called the \epsilon-machine -- from observed behavior. The \epsilon-machine captures all patterns in the process which have any predictive power, so computational mechanics is also a method for pattern discovery.”
i think he’d be a great person for you to interview for your book, btw :P
cheers!
Posted by glory on April 18, 2008 at 10:49 AM
Thanks, glory. When you say “These ‘centroid’ estimators identify not the single most probable solution, but the solution that is most representative of all the data in a set.” - that’s the kind of new tool set I was imagining.
Posted by Kevin Kelly on April 18, 2008 at 8:59 AM
How do we prevent being paralyzed by zillionic choice?
i kinda argued here — http://avc.blogs.com/avc/2008/04/the-declining-p.html#comment-308467 — that a new kinda statistics is needed; viz. http://www.eurekalert.org/pubreleases/2008-02/bu-bmp022808.php
“How do you sift through hundreds of billions of bits of information and make accurate inferences from such gargantuan sets of data? … Lawrence and Carvalho describe a new class of statistical estimators and prove four theorems concerning their properties. Their work shows that these ‘centroid’ estimators allow for better statistical predictions — and, as a result, better ways to extract information from the immense data sets used in computational biology, information technology, banking and finance, medicine and engineering…
“For more than 80 years, one of the most common methods of statistical prediction has been maximum likelihood estimation (MLE). This method is used to find the single most probable solution, or estimate, from a set of data.
“But new technologies that capture enormous amounts of data — human genome sequencing, Internet transaction tracking, instruments that beam high-resolution images from outer space — have opened opportunities to predict discrete ‘high dimensional’ or ‘high-D’ unknowns. The huge number of combinations of these ‘high-D’ unknowns produces enormous statistical uncertainty. Data has outgrown data analysis.
“This discrepancy creates a paradox. Instead of producing more precise predictions about gene activity, shopping habits or the presence of faraway stars, these large data sets are producing more unreliable predictions, given current procedures. That’s because maximum likelihood estimators use data to identify the single most probable solution. But because any one data point swims in an increasingly immense sea, it’s not likely to be representative…
“‘Using maximum likelihood estimation, the most likely outcome would be very, very, very unlikely,’ Lawrence said, ‘so we knew we needed a better estimation method.’
“Lawrence and Carvalho used statistical decision theory to understand the limitations of the old procedure when faced with new ‘high-D’ problems. They also used statistical decision-making theory to find an estimation procedure that applies to a broad range of statistical problems. These ‘centroid’ estimators identify not the single most probable solution, but the solution that is most representative of all the data in a set.”
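A toy sketch may make the contrast in the quoted passage concrete. It assumes that "most representative" means the configuration with the smallest expected Hamming distance to the rest of the distribution; that is one reading of the quote, not something the press release spells out. The three-bit distribution and all function names below are made up for illustration.

```python
from itertools import product

# Hypothetical probability distribution over 3-bit configurations
# (invented numbers, chosen only to make the two estimators disagree).
posterior = {
    (0, 0, 0): 0.4,
    (1, 1, 1): 0.3,
    (1, 1, 0): 0.3,
}


def most_probable(dist):
    """Return the single configuration with the highest probability."""
    return max(dist, key=dist.get)


def hamming(a, b):
    """Count the positions at which two configurations differ."""
    return sum(x != y for x, y in zip(a, b))


def expected_hamming(candidate, dist):
    """Average Hamming distance from candidate to a draw from dist."""
    return sum(p * hamming(candidate, state) for state, p in dist.items())


def centroid(dist):
    """Return the configuration minimizing expected Hamming distance.

    Brute force over all 2**n bit strings, so only usable for tiny n.
    """
    n = len(next(iter(dist)))
    return min(product((0, 1), repeat=n),
               key=lambda c: expected_hamming(c, dist))


if __name__ == "__main__":
    print("most probable:", most_probable(posterior))
    print("centroid:     ", centroid(posterior))
```

Here the most probable single configuration is (0, 0, 0), yet 60% of the probability sits on configurations that start with 1, 1, so the Hamming centroid is (1, 1, 0). The brute-force search is only there to keep the sketch short; it would not scale to the "high-D" problems the quote describes.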
Posted by glory on April 17, 2008 at 9:51 PM
By way of “Hello!”
What happens if we apply that thinking to ?what? manifold phase-space? Ok, that’s meaningless.
Quantitative / qualitative was a core aspect of Trotsky’s realpolitik … no, that’s a dead-end too.
My point is that symbiosis (tensegrity?) is non-linear, so you get that quant/qual shift with changes that are orders of magnitude less than otherwise required for the same effect. I’ve never fantasized about such stuff. Having done SigInt directly after having dropped out of high school, I’ve always seen that sort of consideration as an operational requirement.
For example, as gedanken: if we in-folded the 108,000 web pages talking about a major issue, say the conflict in Iraq, in a manner reminiscent of Hesse’s glasperlenspiel, what would be the effect on the discourse? (There might be 10M posts in 108K forums, but there’s /1/ discourse, actually.)
regards —bentrem
p.s. I only just now discovered your blog. Cheers!
Posted by Ben Tremblay on April 17, 2008 at 8:41 PM
Managing Zillionics with more computer power is not the answer, at least not for me. Too many choices is daunting; compartmentalizing is helping my knowledge hunger.
The contrast of two different types of programmers working on a project a few years back taught me a valuable lesson. When reviewing a programming tool as a solution, one would read the whole manual and the other would zoom in on the chapter. Zooming in with blinders is a healthy, expeditious approach.
Another of my tendencies was to follow the trail of crumbs. An article catches my attention and I get sucked in. Avoiding items of interest that are not urgent and/or important has helped.
In my humble opinion, Zillionics is a major contributor to adult ADD and ADHD. Not unlike trying to compute Pi to the last digit.
Posted by Gary S. Hart on April 17, 2008 at 7:11 PM
If you’re talking about zillions, it’s not so much biology or genetics, but biochemistry - during a single cell’s lifetime a single one of your 30k~odd genes gives rise to maybe hundreds of thousands of copies of a protein. That’s the zillions scale, and it’s truly beautiful. An organised chaos, with more feedback and feedforward loops than you can imagine, and yet emergent properties and visible, predictable patterns, despite the chaos within.
Posted by Phil Bradley on April 17, 2008 at 3:52 PM


Hmm, well the term zillion has no well defined meaning (see http://mathworld.wolfram.com/Zillion.html) so we are left to guess how big is big enough to make a difference.
I’ll just mention the move from 32 bit processors to 64 bit processors here… With the currently popular 32-bit processors you have a maximum of 2^32 bytes of memory, or about 4 GB, in a fully loaded PC. Addressing a trillion bytes requires a register with 40 bits. In contrast to these numbers, we are now moving to architectures that have address spaces of 2^64 bytes, which is 16.8 million terabytes or 16 exabytes. OK, it’s pretty unlikely anyone will actually have that much memory anytime soon, but the point is that we can begin thinking of much larger memory spaces now.
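For anyone who wants to check the arithmetic, here is a small sketch. It assumes byte-addressable memory and binary units (1 terabyte taken as 2^40 bytes, 1 exabyte as 2^60 bytes); the function names are mine, not from any library.

```python
def address_bits(num_bytes: int) -> int:
    """Smallest register width, in bits, that can address num_bytes bytes.

    Addresses run from 0 to num_bytes - 1, so the width is the bit length
    of the largest address.
    """
    if num_bytes < 1:
        raise ValueError("num_bytes must be positive")
    return max(1, (num_bytes - 1).bit_length())


def address_space_bytes(bits: int) -> int:
    """Number of distinct byte addresses available with a given width."""
    return 2 ** bits


if __name__ == "__main__":
    GIB, TIB, EIB = 2 ** 30, 2 ** 40, 2 ** 60  # binary units (assumption)

    print("32-bit space in GB:", address_space_bytes(32) // GIB)
    print("bits for a trillion bytes:", address_bits(10 ** 12))
    print("64-bit space in TB:", address_space_bytes(64) // TIB)
    print("64-bit space in EB:", address_space_bytes(64) // EIB)
```

With decimal units instead (1 TB as 10^12 bytes) the 64-bit figure comes out nearer 18.4 million terabytes, so the numbers depend on which convention you pick.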
We’re already applying this ability to use much bigger memory spaces in practice in my research group. While we’re still far away from exabyte memories, it is not unreasonable to start thinking about multi-terabyte memories in some server-side applications right now. E.g. http://lwn.net/Articles/272534/
Therefore the move from 32-bit architectures to 64 bits is an example of zillionics.
Posted by dangrsmind on April 30, 2008 at 10:53 AM