Signal to noise

When people talk about information analysis, there’s often a lot of worry about noise in the data, and the reliability of the data sources. So when you’re building an information analysis system, there are often requirements that have to do with “filtering out bad data” or assigning “reliability scores” to data sources.

But in practice this usually isn’t necessary. With enough data, noise interferes destructively and signal interferes constructively. I first learned about this in physics class in school, and first saw it in action while doing radio astronomy. I was a research assistant on a team that realized we had an opportunity to make the highest-quality radio maps ever made of certain galaxies. See, when big stars go supernova, astronomers like to aim the Very Large Array of radio telescopes in New Mexico at them to study how the massive explosions grow and change. They get a lot of data, write their papers, and move on. But supernovae happen in galaxies. And scientists share data. So our PIs realized they had very long exposures of the galaxies containing the supernovae.

Radio astronomy “pictures” tend to be noisier than pictures taken by optical telescopes, and noise in a picture of a distant galaxy can overwhelm the detail of star-formation regions and the like. However, noise is random, whereas the bright spots (except for the supernova) don’t change noticeably. So if you sum enough radio data, the random fluctuations tend to cancel each other out while the steady bright regions keep adding up, and the actual bright regions become clear. Note that this is not the same thing as astronomical interferometry, although that was a technique we made use of.

The same thing is true of many other types of noisy data. If you only ever look at the data at a single point in time, or watch or listen to it as it changes, it’s difficult to see the signal through the noise. But if you have a system that allows this summation to happen, and you can look at the sum, suddenly the picture becomes clear.
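To make the “sum enough data” idea concrete, here is a minimal Python sketch. It assumes an unchanging one-dimensional “sky” (the hypothetical `TRUE_SIGNAL` list) observed many times with independent Gaussian noise that is much larger than any single peak. It averages rather than sums, which doesn’t change the signal-to-noise ratio but keeps the numbers on the same scale as a single exposure. The names and noise level are illustrative, not taken from any real radio-astronomy pipeline.

```python
import random
import statistics

# Hypothetical scene: a constant "true" brightness per pixel.
TRUE_SIGNAL = [0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0]
NOISE_SIGMA = 5.0  # assumed noise level, far larger than any single peak


def exposure(rng):
    """One noisy observation of the same unchanging scene."""
    return [s + rng.gauss(0.0, NOISE_SIGMA) for s in TRUE_SIGNAL]


def stack(n, seed=0):
    """Average n independent exposures, pixel by pixel."""
    rng = random.Random(seed)
    total = [0.0] * len(TRUE_SIGNAL)
    for _ in range(n):
        for i, value in enumerate(exposure(rng)):
            total[i] += value
    return [t / n for t in total]


def residual_noise(n, trials=200):
    """Estimate leftover noise: spread of the stacked value at a pixel whose true signal is 0."""
    samples = [stack(n, seed=t)[0] for t in range(trials)]
    return statistics.stdev(samples)


if __name__ == "__main__":
    print("noise after 1 exposure:   ", round(residual_noise(1), 2))
    print("noise after 100 exposures:", round(residual_noise(100), 2))
    print("stack of 2000 exposures:  ", [round(v, 1) for v in stack(2000)])
```

Because independent noise grows only as the square root of the number of exposures while a steady signal grows linearly, averaging N exposures should shrink the leftover noise by roughly √N: about tenfold for 100 exposures. That only holds if the noise really is independent from one exposure to the next; systematic errors don’t cancel this way.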

Suppose the “ground truth” looks like this:

[Figure: the ground-truth signal. Caption: “A draftsman I'm not.”]

But we have noisy data that looks like this:

[Figure: one noisy data series. Caption: “I hope that's not an EKG.”]

If we layer lots of noisy data, we can start to see that the signal’s there…

[Figure: many noisy series layered on top of each other. Caption: “I drew this better on the whiteboard last week.”]

But if we can sum the data it looks something like this:

[Figure: the summed data. Caption: “The hands acquire shakes, the shakes become a warning. It is by caffeine alone I set my mind in motion.” --]

Now we can clearly see the signal! Is it truth? Not necessarily, but the analyst can now see that there is agreement across the data. If you want more information, you also need a system that gives you a way to dive into the details of what data contributed to the peaks. And now you also have guidance as to where to collect more data, ideally from additional sources. More on that in another post.
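Here is a minimal Python sketch of that “sum, then drill down” workflow, assuming each piece of data can be reduced to a hypothetical `(source, position, weight)` record. The record format, source names, and the fixed threshold used to call something a peak are all illustrative; a real system would more likely compare each bin against its typical background level.

```python
from collections import defaultdict


def aggregate(records):
    """Sum records per position and remember how much each source contributed.

    records: iterable of hypothetical (source_name, position, weight) tuples.
    "position" is whatever you bin by: a date, a map cell, a topic ID.
    """
    totals = defaultdict(float)
    contributions = defaultdict(lambda: defaultdict(float))
    for source, position, weight in records:
        totals[position] += weight
        contributions[position][source] += weight
    return totals, contributions


def peaks(totals, threshold):
    """Positions whose summed value reaches the (illustrative) threshold."""
    return sorted(p for p, v in totals.items() if v >= threshold)


def drill_down(contributions, position):
    """List the sources behind one position, biggest contributor first."""
    by_source = contributions.get(position, {})
    return sorted(by_source.items(), key=lambda kv: kv[1], reverse=True)


# Example: three sources agree on position 3; everything else is scattered.
records = [
    ("report_a", 3, 1.0), ("report_b", 3, 1.0), ("sensor_c", 3, 2.0),
    ("report_a", 7, 1.0), ("sensor_c", 1, 1.0), ("report_b", 9, 1.0),
]
totals, contributions = aggregate(records)
print(peaks(totals, threshold=3.0))  # [3]
print(drill_down(contributions, 3))  # [('sensor_c', 2.0), ('report_a', 1.0), ('report_b', 1.0)]
```

Keeping the per-source breakdown alongside the totals is what lets an analyst go from “there’s a peak here” to “these are the reports behind it” without re-scanning the raw data.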

This is related to why Google is so good and your organization’s internal Search is so bad. Even though Google’s data source, the Internet, is way noisier than your organization’s intranet (I hope), Google is still better. This is true even if you have an in-house Google appliance. It’s because of Google’s second big data source (equally noisy): the billions of user clicks. Google doesn’t show you the sum of that data, but it does use that aggregation to decide what to show you. In essence, Google finds the peaks of agreement among billions of clicks and shows you just the peaks. Your IT department doesn’t have enough click-data to do that for your organization, no matter how good their software is.
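As a toy illustration of “finding the peaks of agreement among clicks” (emphatically not how Google actually ranks anything), here is a Python sketch that orders results for a query purely by click counts. The click-log format, queries, and page names are made up for the example.

```python
from collections import Counter


def rank_by_clicks(click_log, query):
    """Order results for `query` by how many users clicked each one.

    click_log: hypothetical list of (query, clicked_result) pairs. Any one
    click is noisy (misclicks, curiosity, bots), but counts across many
    users tend to converge on the results people actually wanted.
    """
    counts = Counter(result for q, result in click_log if q == query)
    # most_common() sorts by count, highest first; ties keep first-seen order.
    return [result for result, _ in counts.most_common()]


click_log = [
    ("vacation policy", "hr/leave.html"),
    ("vacation policy", "news/party.html"),
    ("vacation policy", "hr/leave.html"),
    ("expense form", "finance/forms.html"),
    ("vacation policy", "hr/leave.html"),
]
print(rank_by_clicks(click_log, "vacation policy"))  # ['hr/leave.html', 'news/party.html']
```

With only a handful of clicks per query, as on a typical intranet, a couple of stray clicks can reorder the list; with billions, the strays wash out.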

Reliability Scores

You also probably don’t need to assign reliability scores to your data sources, even though it seems like a perfectly logical, even prudent, thing to do. The problem is that the scores will be fairly arbitrary and hard to agree on, and they may present a false sense of rigor where there isn’t any. There are lots of ways a data source can be unreliable, but we’ve found different ways to handle them that avoid these problems. For example:

Problem: There’s hardly any signal (useful information) in the data.
Solution: You’re not going to get any peaks even when you sum it all together. Don’t score it, ditch it.

Problem: There are a lot of mistakes or inconsistencies in the data.
Solution: That’s the kind of noise that cancels out if you have enough data. If you do, then don’t sweat it.

Problem: The data has been deliberately redacted to remove what you’re looking for.
Solution: The more data there is, the harder it is to do this perfectly. If you get enough of it, you can find what was missed. Also, if you have enough of it, you’ll see mysterious quiet areas of data, because not only is the signal gone, but so is the noise. So you can detect the obfuscation, and you might even be able to catch the deceivers in a mistake.

Problem: The data is out-of-date.
Solution: This is absolutely relevant to the analysts, and it should be documented, but it’s not something you score. The analysts just need to know because data timeliness matters more for some questions than others.

Problem: There are gaps in the data coverage.
Solution: Again, it’s relevant, and should be documented, but it’s not a “reliability” issue. Maybe there weren’t enough 18-to-25-year-olds in the medical study you’re analyzing. Even so, if there’s a statistically significant result for 25-to-35-year-olds, you’ve still found something; you just don’t know whether it holds for young people.

Problem: The data’s useful, but its noise is obscuring the signal of other, cleaner data sets.
Solution: Let the users turn data sets on and off as they choose. See, this is actually what people have in mind with reliability scores: they imagine they’ll either weight more reliable data higher, or the user will use the score to decide what to look at. It’s true, you might end up doing some form of weighting. For example, a search engine might weight clicks higher for users who appear to come from a part of the world that speaks the language the results are in, so clicks from Italy have more effect on ranking Italian search results. But you don’t want to do this before you’ve had a chance to work with the data in the system. As for showing the reliability scores to the analysts…believe me, your analysts already have strong opinions on the reliability of individual data sources, and they will ignore your well-intentioned ratings. If you just give them the ability to turn data sets on and off, they’ll be happy and productive.
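The “turn data sets on and off” approach, with weighting left optional, might look like this minimal Python sketch. The data-set names, the `{position: value}` shape, and the `weights` mapping are all hypothetical; the point is that every data set counts equally unless someone deliberately decides otherwise after working with the data, as with the language-based click weighting described above.

```python
from collections import defaultdict


def combined_view(datasets, enabled, weights=None):
    """Sum only the data sets an analyst has switched on.

    datasets: hypothetical mapping of name -> {position: value}
    enabled:  names of the data sets currently switched on
    weights:  optional name -> multiplier; anything not listed counts 1.0,
              so no up-front "reliability score" is imposed
    """
    weights = weights or {}
    total = defaultdict(float)
    for name in enabled:
        w = weights.get(name, 1.0)
        for position, value in datasets[name].items():
            total[position] += w * value
    # Sort by position (assumed sortable) so the output order is stable.
    return dict(sorted(total.items()))


datasets = {
    "clean_survey": {1: 5.0, 2: 1.0},
    "noisy_feed": {1: 2.0, 2: 4.0, 3: 3.0},
}
print(combined_view(datasets, {"clean_survey"}))                # {1: 5.0, 2: 1.0}
print(combined_view(datasets, {"clean_survey", "noisy_feed"}))  # {1: 7.0, 2: 5.0, 3: 3.0}
print(combined_view(datasets, {"clean_survey", "noisy_feed"},
                    {"noisy_feed": 0.5}))                       # {1: 6.0, 2: 3.0, 3: 1.5}
```

Leaving `weights` empty by default keeps the system honest: the analyst’s on/off choices do the filtering, and any weighting is an explicit, later decision rather than a score baked in at ingestion.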

In short, signal reveals itself in noisy data if you have enough of it, and if you have tools that let you work with all of it in aggregate while still letting you quickly get to the details of the revealed signal.


One year ago today:

[Embedded tweet: Washington Post article on the closing of Sweet Briar College]

The news spread, and Sweet Briar alumnae all over the world reeled in shock and sadness. How could this have happened? “Insurmountable challenges,” responded President Jones and the board. They had left “no stone unturned” to avoid the lamentable conclusion that Sweet Briar must close.

And, in a thousand ways, the leaders that Sweet Briar had been quietly producing in the Virginia countryside for the last century showed us all how to surmount great challenges.

A powerful social media campaign commenced. It was never viral; a small liberal arts women’s college closing is not sensational. The campaign was simply sustained by relentless effort, to the point that the persistence itself became noteworthy. Major media outlets began to pick it up, and #savesweetbriar stayed in the news for months.

Months was all that was needed, because #savesweetbriar wasn’t just a hashtag:

[Embedded headline: “Saving Sweet Briar raises $2.6 million”]

Three months and 17 days later, the alumnae had fought the closure all the way to the Virginia Supreme Court and won a settlement that would prevent the closure and oust the president and much of the board. Legal battles are expensive, but the legal fight was only half the battle. The other half was raising enough money to prove that the college was viable. Small schools (SBC averages about 500 students) don’t actually have that many alumnae, so this wasn’t easy. Worse, the settlement imposed a strict deadline: $12 million in 60 days or the deal was off. That’s a million every 5 days.

They delivered it early.

A year later, the school is revitalized and shows every sign of having a bright future.

A year later, we can be grateful for the gift given to us by former president Jones and the board: every Sweet Briar vixen alive knows they graduated from Sweet Briar College, the small liberal arts women’s college of kick-ass leadership.

5 Ways To Be an Awesome Manager

  1. Approve all leave requests immediately. They’ve already earned it; playing gatekeeper over when they use it is patronizing.
  2. Take responsibility for problems. If you’re late, don’t blame your team. If someone’s not performing, it’s your job to handle it with them directly.
  3. Be a good shit shield. If this job were easy, you wouldn’t get paid so much to do it. That means there are going to be political battles, competition for resources, complaints, demands for you to justify your methods or even your existence, et cetera. Your team needs to know they can trust you to take care of the external turmoil, so they can concentrate on building the thing.
  4. Don’t waste their time. If you’re holding weekly hour-long meetings, you’re probably doing it wrong. If you need to plan or review a sprint, solve a problem, make a decision as a team, fine. Status meetings, if needed, should be fast, which is why many teams do them standing up. Meet with purpose.
  5. Spend time with them. Ideally you’ll all work together in a big room, but that’s rare. Answer emailed questions by walking over and talking about it. Sit with them, in your office or theirs, to solve problems. Call remote team members every couple of days, even if only for a few minutes. Ask for feedback on your own work. This is how great leaders always seem to know what’s going on.

Remember that even though you’re all a team, you’re the one who decides what it’s like to be on the team.


Things to do/read when I want to goof off or am bored that would actually be productive without being painful

So when I’m tired, either physically or emotionally, I find that I tend to seek easy distractions. It reminds me of vegging out to whatever was on “the tube” back when TVs were actually tubes and you had to watch “what was on.” Except now it tends to be Facebook or Instagram. Not that there’s anything wrong with social media, but I’d rather use it on purpose, not because I’m too decision-fatigued to come up with something better to do.

A lot of the time, when I quit swiping through whatever app it is, I regret that I didn’t do something just as easy but more enriching instead, like going through articles I’ve bookmarked to read later, or reading up on some of the tips and tricks linked from my favorite podcast. Call it goofing on. Sometimes the only reason I didn’t is that I would have had to decide on something and then find it.

So this page is my equivalent of setting out my running shoes the night before: something I do while motivation and willpower are high that lowers the barrier later, when I know they’ll be low. Give some of these a try if you want, and let me know if you have any top links, resources, or activities you’d add!

In the vein of and (formerly LiftApp), the habit I’m trying to build is:

When I want to play a game, check FB, etc., I open this list first.

The list of Goof Ons:

Things to do when I have no energy at all (these are listed first for a reason!)

  • Check my calendar
  • Look at the family photo stream
  • Check Safari reading list
  • Check Kindle and/or iBooks for non-fiction I might feel like reading a page from
  • Read show notes from favorite episodes of The Tim Ferriss Show

Things to do when I’m frustrated and need to do something else for a minute

  • Duolingo
  • Process email (either business or personal)
  • Think of one thing I’ve learned from the most recent thing I’ve read and put one or more action items in my calendar that will help me put it into practice.
  • Read The 4-Hour Chef
  • Do one push-up: 5 sec. down and 5 sec. up
  • Do one L pull-up
  • Do one air squat
  • Check FB saved links (but not when I’m too fatigued, or I’ll open FB and get stuck on the news feed)

[Photo: Three of us goofing off]

Habits I’ve removed from LiftApp


Opinions vary on how long to track a new habit, but I think the best use of Lift is to establish a new positive habit. And, given our limited willpower and cognitive space, it works best to focus on just a few at a time. So I’ve adopted the strategy of no longer tracking some habits once they’re established.

Examples of habits I no longer track in Lift:
Continue reading

How I made my family a hot and cold breakfast in 20 minutes on the first try

Inspired by Cory Doctorow

So I read

and Cory answers the question “What everyday thing are you better at than anyone else?” with “Making breakfast. I make my family a 3-4 course, hot/cold tailor-made breakfast every morning, in 20 minutes flat, with handmade coffees.”

I am no cook, and anytime I do try to make a hot breakfast for my family, I’m always in the kitchen for at least 45 minutes no matter how quick I try to be. So it’s strictly a weekend thing, and even so, we’ve gotten out of the habit because it doesn’t really make sense for Emily to wrangle two rambunctious hungry children alone for 45 minutes while I do it.

I had to know, so I asked him. His response was immediate:

Continue reading

Daily Routine

Zihang H., who I follow on Lift, posted the following question recently:

How to make our lives more interesting since most of our daily lives consists [sic] of endless routine?

It’s been a long time since I’ve been bored, but I thought about it and posted this response:

During the times in my life when my central activities bored me, I was happiest when I spent as much time as possible on active personal development. Other times, like now, my central activities are incredibly challenging (no boredom possible), so my routine is all about setting up a good foundation for the central activities, getting the mundane stuff done as efficiently as possible, and carving out small-but-workable slices of time for personal development.

My story?

My dad bought a TRS-80 Color Computer when I was about 5. I didn’t learn to code, but I saw a modem, heard binary being played on our cassette drive, and learned what a kilobyte is.

Later, I learned Logo and BASIC when I was 8 and 9. Just very simple toy programs. I learned more sophisticated programming in Pascal in high school. I did have books, and Dad got me started, but my schools’ programming classes get at least half the credit.

I started getting paid to work with computers while still in high school, and I have made money from working with computers ever since. Even in the years I taught ballroom dancing full-time, I wrote software part-time and brought in new revenue for the studio by setting up its website and its first online gift-certificate sales.

Today I live in Vienna and manage a significant software project at the International Atomic Energy Agency. As a job, it’s amazing, and the work is important. I’m writing this from a lovely apartment in Venice where I’m vacationing with my family while the team works without me.

It’s a good life, and I’m incredibly grateful. I wouldn’t be here if I didn’t know how to code.

But it’s not just about the good work you can do and the good life you can have. It’s fun. The things we can do now with software are amazing. A programmer in the 80s would be awed by what’s possible for coders now. It’s not just faster computers; it’s the fact that so much of the world is now online. Take something simple like flight bookings: they were computerized in the 80s (probably earlier), but in closed systems. Today, there are so many ways to tie that information together that travel booking sites abound, and the best ones are so good that we can be near-omniscient about our options. We think little of booking vacations from our couch with airlines and hotels we’ve never heard of.

Coders regularly produce apps which do things that weren’t possible a few years ago. My phone (an anachronistic name for the hyper-connected supercomputer I carry in my pocket) can augment my reality in countless ways, but the latest is holding it up and looking through it so that all the Italian writing is replaced with English.

What’s next? Imagine writing code to do this:

  • social apps that allow you to point your finger and write in the sky…where all your friends can see it through their glasses or contact lenses.
  • designing toys and selling them online where buyers click to print them out on their 3D printers
  • building the apps to do the designing I just mentioned or building the site to broker the transactions
  • writing code to control swarms of tiny flying/crawling robots to…well, frankly, the first ones will all have military or intelligence applications, which may appeal to some, but after that there will be plenty of environmental and scientific uses.

I wouldn’t stop for red lights

I’m posting this in a public place where my children will be able to see it forever.

If any of you are ever in the hospital, I’m not stopping for red lights.

I will be there as fast as humanly possible. I will drive, get on a plane, hail a cab (hansom, motorized, whatever), run, or whatever combination gets me to you. That’s true if you tell me I don’t need to come, if you tell me not to come, if I’m estranged from you, if your spouse doesn’t like me, if I have a big meeting the next day, if I’m in a big meeting right then, if I’m not a doctor and can’t fix anything, if your mother is already there, if I’m with your mother who is in a different hospital, if I’m sick and that means they won’t let me anywhere near you, if you’re in the hospital because you did something dumb, illegal, or embarrassing, if I was just there, if I was already planning on going in a few weeks, if I’m going to a wedding, if I’m in a wedding, if I’m at a funeral, if I don’t have any vacation time, if I have to borrow the money from a credit card or a friend. You get the idea.

I won’t wait for you to ask. If you are hurt, scared, or need help, I will be there.

Just sayin’