Category Archives: Science

Signal to noise

When people talk about information analysis, there’s often a lot of worry about noise in the data, and the reliability of the data sources. So when you’re building an information analysis system, there are often requirements that have to do with “filtering out bad data” or assigning “reliability scores” to data sources.

But in practice this isn’t usually necessary. With enough data, noise suffers from destructive interference, and signal interferes constructively. I first learned about this in physics class in school, and first encountered it in practice while doing radio astronomy. I was a research assistant on a team that realized we had an opportunity to make the highest quality radio maps ever made of certain galaxies. See, when big stars go supernova, astronomers like to aim the Very Large Array of radio telescopes in New Mexico at them to study how the massive explosions grow and change. They get a lot of data, write their papers, and move on. But supernovae happen in galaxies. And scientists share data. So our PIs realized they had very long exposures of the galaxies containing the supernovae.

Radio astronomy “pictures” tend to be noisier than pictures taken by optical telescopes, and noise in a picture of a distant galaxy can overwhelm the detail of star formation regions and the like. However, noise is random, whereas the bright spots (except for the supernova) don’t change noticeably. So if you take the sum of enough radio data, the random noise largely cancels out, and the actual bright regions become clear. Note, this is not the same thing as astronomical interferometry, although that was a technique we made use of.

The same thing is true of many other types of noisy data. If you only ever look at the data at a point in time, or watch or listen to it as it changes, it’s difficult to see the signal through the noise, but if you have a system that allows this summation to happen, and you can look at the sum, suddenly the picture becomes clear.

Suppose the “ground truth” looks like this:

A draftsman I'm not.

But we have noisy data that looks like this:

I hope that's not an EKG.

If we layer lots of noisy data, we can start to see that the signal’s there…

I drew this better on the whiteboard last week.

But if we can sum the data it looks something like this:

"The hands acquire shakes, the shakes become a warning. It is by caffeine alone I set my mind in motion." --

Now we can clearly see the signal! Is it truth? Not necessarily, but the analyst can now see that there is agreement across the data. If you want more information, you also need a system that gives you a way to dive into the details of what data contributed to the peaks. And now you also have guidance as to where to collect more data, ideally from additional sources. More on that in another post.
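If you'd rather see the effect in numbers than in my whiteboard scribbles, here is a minimal sketch of the summing idea. It is not the pipeline we used on the radio data. The "ground truth" shape, the noise level, and the function names `stack_exposures` and `rms_error` are all made up for illustration, and it averages (a sum divided by the count) so the result stays on the same scale as the truth.

```python
import numpy as np

def stack_exposures(n_exposures, noise_sigma=5.0, seed=0):
    """Average n noisy copies of one fixed signal; return (x, truth, stacked)."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, 200)
    # Hypothetical ground truth: two bright regions on a dark background.
    truth = np.exp(-((x - 0.3) / 0.03) ** 2) + 0.6 * np.exp(-((x - 0.7) / 0.05) ** 2)
    # Every exposure is the same truth plus independent Gaussian noise
    # that is several times stronger than the brightest peak.
    noise = rng.normal(0.0, noise_sigma, size=(n_exposures, x.size))
    exposures = truth + noise
    # Averaging is just the sum rescaled: signal stays put, noise shrinks.
    return x, truth, exposures.mean(axis=0)

def rms_error(truth, estimate):
    """Root-mean-square difference between the stack and the truth."""
    return float(np.sqrt(np.mean((estimate - truth) ** 2)))

for n in (1, 100, 10_000):
    x, truth, stacked = stack_exposures(n)
    print(f"{n:>6} exposures: rms error {rms_error(truth, stacked):.3f}")
```

For independent noise like this, the leftover error shrinks roughly as one over the square root of the number of exposures, which is why "enough data" matters so much: a single exposure is hopeless, while ten thousand of them make the two peaks obvious.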

This is related to why Google is so good and your organization’s internal Search is so bad. Even though Google’s data source, the Internet, is way noisier than your organization’s intranet (I hope), Google is still better. This is true even if you have an in-house Google appliance. It’s because of Google’s second big data source (equally noisy): the billions of user clicks. Google doesn’t show you the sum of that data, but it does use that aggregation to decide what to show you. In essence, Google finds the peaks of agreement among billions of clicks and shows you just the peaks. Your IT department doesn’t have enough click-data to do that for your organization, no matter how good their software is.

Reliability Scores

You also probably don’t need to assign reliability scores to your data sources, even though it seems like a perfectly logical, even prudent, thing to do. The problem is that the scores will be fairly arbitrary, hard to agree on, and may present a false sense of rigor where there isn’t any. There are lots of ways a data source can be unreliable, but we’ve found different ways to handle them that avoid these problems. For example:

Problem: There’s hardly any signal (useful information) in the data.
Solution: You’re not going to get any peaks even when you sum it all together. Don’t score it, ditch it.

Problem: There are a lot of mistakes or inconsistencies in the data.
Solution: That’s the kind of noise that cancels out if you have enough data. If you do, then don’t sweat it.

Problem: The data has been deliberately redacted to remove what you’re looking for.
Solution: The more data there is, the harder it is to do this perfectly. If you get enough of it, you can find what was missed. Also, if you have enough of it, you’ll see mysterious quiet areas of data, because not only is the signal gone, but so is the noise. So you can detect the obfuscation, and you might even be able to catch the deceivers in a mistake.

Problem: The data is out-of-date.
Solution: This is absolutely relevant to the analysts, and it should be documented, but it’s not something you score. The analysts just need to know because data timeliness matters more for some questions than others.

Problem: There are gaps in the data coverage.
Solution: Again, it’s relevant, and should be documented, but it’s not a “reliability” issue. Maybe there weren’t enough 18-25 year-olds in the medical study you’re analyzing. Even so, if there’s a statistically significant result visible for 25-35 year-olds, you’ve still found something; you just don’t know if it works for young people.

Problem: The data’s useful, but its noise is obscuring the signal of other, cleaner data sets.
Solution: Let the users turn data sets on and off as they choose. See, this is actually what people have in mind with reliability scores – they imagine they’ll either weight more reliable data higher, or the user will use the score to decide what to look at. It’s true, you might end up doing some form of weighting. For example, a search engine might weight clicks higher for users who appear to come from a part of the world that speaks the language the results are in, so clicks from Italy have more effect on ranking Italian search results. But you don’t want to do this before you’ve had a chance to work with the data in the system. As for showing the reliability scores to the analysts…believe me, your analysts already have strong opinions on the reliability of individual data sources, and they will ignore your well-intentioned ratings. If you just give them the ability to turn them on and off, they’ll be happy and productive.

In short, signal reveals itself in noisy data if you have enough of it and have tools that let you work with all of it in aggregate, while still letting you quickly get the details of the revealed signal.
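To make the "turn data sets on and off" and "get the details of the peaks" ideas concrete, here is a rough sketch, assuming the data can be boiled down to (source, bin, weight) records. The `aggregate` function, the source names, and the region names are all hypothetical, and a real system would be far more involved.

```python
from collections import defaultdict

def aggregate(records, enabled):
    """Sum weights per bin over the enabled sources, remembering who contributed."""
    totals = defaultdict(float)
    contributors = defaultdict(list)
    for source, bin_key, weight in records:
        if source in enabled:  # the user's on/off switch for each data set
            totals[bin_key] += weight
            contributors[bin_key].append((source, weight))
    return dict(totals), dict(contributors)

# Hypothetical records: (source name, bin, weight).
records = [
    ("sensor_a", "region_1", 1.0),
    ("sensor_b", "region_1", 1.0),
    ("rumor_feed", "region_2", 1.0),
    ("sensor_a", "region_3", 1.0),
    ("rumor_feed", "region_1", 1.0),
    ("rumor_feed", "region_4", 1.0),
]

# Everything switched on: find the peak, then drill into what built it.
totals, contributors = aggregate(records, {"sensor_a", "sensor_b", "rumor_feed"})
peak = max(totals, key=totals.get)
print(peak, totals[peak], contributors[peak])

# The analyst distrusts the rumor feed and switches it off.
totals_clean, _ = aggregate(records, {"sensor_a", "sensor_b"})
print(totals_clean)
```

Notice there is no reliability score anywhere. The analyst's opinion of a source is expressed by leaving it in or out, and the peak in region_1 survives either way.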

Don’t Put Limits on God

It’s natural that our comprehension of God is limited by our imagination. That’s why scientists get rankled at the notion that science takes the wonder out of the world. An astronomer spends her life wrapping her mind around the biggest, and most wondrous, concepts in the universe; a biologist spends his life wrapping his mind around the most intricate, and wondrous, details of the universe; and so on.

Often, scientists are agnostic. Their concept of the natural world is so amazing, the supernatural holds no attraction for them.

But many of these intellectual types do have spiritual, and even religious, beliefs. And, as a result of the mind-expanding concepts they deal with on a daily basis, their concept of God (by any name) is HUGE. They are mystified by the conflicts about keeping Christ in Christmas, keeping God in schools, whether or not God blesses America, who gets married, and whether our national pledge also affirms God. And that’s the American perspective. They are equally mystified that God cares whether men wear beards, women drive, or a religious figure is depicted in a picture.

From that perspective, God doesn’t have a country. He doesn’t even have a planet. Earth is a mote of dust in a mote of dust in a mote of dust in a mote of dust in God’s full creation. He doesn’t have a holiday…in fact the whole of human history is an eye-blink in His creation. God is present in school and Christmas and a foxhole because God is everywhere and everywhen to an unfathomable degree, not because of national policy.

You don’t have to be a scientist yourself to understand this, but anyone reading this has a responsibility to keep a proper sense of perspective. If you truly grasp the grandeur of His creation, you can’t help but glimpse that these conflicts are insignificant. Irrelevant. Petty. Needlessly fearful.

If you’re worried about whether God is in…anything…you’ve forgotten Who you’re talking about.

Still think God cares about who the US president is? ...what's printed on US currency? ...what's on the lawn at city hall?

Fun science fact about the sky

Fun science fact about the sky: did you know that it has several huge galaxies in it? I don’t mean the universe, everybody knows there are zillions (that’s a technical term) of galaxies out there. I mean the sky. The Andromeda Galaxy, which is the one that stands in for our own galaxy in probably every “You are here” T-shirt and poster you’ve ever seen, covers 190′ x 60′ (arcminutes) of sky. What does that mean? Well, the moon covers about 30′ of sky, so…

Andromeda Galaxy with moon inset to show relative size

You just can’t see it like that because it’s too dim! But I still love mentally picturing it.
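If you want the arithmetic behind that picture, here is a tiny sketch using only the two figures quoted above (190′ x 60′ for Andromeda and about 30′ for the moon). The helper name `moon_widths` is just something I made up for this example.

```python
ANDROMEDA_ARCMIN = (190.0, 60.0)  # long and short axis, as quoted in the post
MOON_ARCMIN = 30.0                # approximate width of the full moon

def moon_widths(arcmin, moon=MOON_ARCMIN):
    """Express an angular size in units of full-moon widths."""
    return arcmin / moon

long_axis, short_axis = (moon_widths(a) for a in ANDROMEDA_ARCMIN)
print(f"Andromeda spans about {long_axis:.1f} x {short_axis:.1f} full moons")
# prints: Andromeda spans about 6.3 x 2.0 full moons
```

So the galaxy stretches across a bit more than six moons laid side by side, and is about two moons tall.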

The scene at the end of The Empire Strikes Back is actually similar to our own sky – it’s just that the galaxy they’re looking at would also be too dim to show up like that, even from a spacecraft:

C-3PO, R2D2, Luke, and Leia looking out the window of a spacecraft at a galaxy at the end of The Empire Strikes Back.

Thinking Critically Means Not Being Publicly Foolish

Fake picture purporting to show rare alignment of planets with the pyramids of Giza

What’s wrong with this picture?

Seriously, you should be able to debunk this yourself, without looking anything up, right away, if you think about it.

If you know how and are willing to think critically.

Critical thinking happens after “Wow, neat,” and before “Let me Like/Share/Tweet this!”

What’s special about the picture? The planets are over the pyramids, but couldn’t you just figure out where to stand to do that almost any time? The apparent heights don’t match at all. The planets are in a nice line, but aren’t there a bunch of planets moving around the sky all the time? That’s probably not so rare. Internet hoaxes are common, so this is probably just that.

At this point, you could also check Snopes, but you know you probably needn’t bother. You haven’t proven anything yet, but you can tell the likelihood that the statement is true is pretty low.

If you won’t think critically, you will believe and repeat myth.

If you combine critical thinking with just a little bit of knowledge about reputable sites (yes, Wikipedia counts 99% of the time), you’ll be safe from believing myth. You’ll be able to be impressed by things which really are amazing.

Pop quiz: what if you saw this in your news-feed: a picture with the planets right at the tip of each pyramid and the caption “At midnight on 12/21/2012, the eyes of the Great Sphinx will be looking at this!”

Now, that would be a truly impressive claim. The details like the exact time, the more precise planetary positioning, and the fixed position and angle of the observer…that would be amazing! And there’s nothing inherently wrong about the statement on the face of it. But you should be able to dispense with it in about 30 seconds with a quick fact check. Hint: go here and just look…no need to read.

The best thing about thinking critically is that it gets faster and easier. Like any muscle, you can develop it, and the result is having a built-in BS detector. You use it like a filter, and your experience in life is more pleasurable, because there’s less noise in your Facebook newsfeed, your Twitter feed, your email, or the ads you see.

…but we don’t understand tides

A couple of weeks ago I wrote about how we live in the future. And I commented, as an aside, that in light of the truly amazing degree to which we have been able to figure out the universe, it boggles the mind that there are people who are apparently impervious to that understanding. Worse, they base their whole belief systems on that ignorance.

Case in point:

Transit of Venus photographed by Solar Dynamics Observatory

  1. Things like this truly amazing video appear, not on some obscure scientific website, but on Time’s Newsfeed.
  2. Bill O’Reilly recently said that we haven’t figured out what causes the tide to go in and out with great regularity.
