Welcome to step three of the Eightfold Way to Data-driven Business. So far, you may have enjoyed Step 1, Right Identification, and Step 2, Right Collection.

You decided in step 1 you needed, say, a user’s telephone number. Fine.

Then, you decided in step 2 that number needed to be collected during the single-dialog subscription procedure.

Everything is going to be all right, no?


No. And not because your users are especially mean or anything. They simply do not share your concern for the quality of the data they give you. Exactly like your own employees.

Run a test: check the CRM client records and see if they look legit; if they do, see how they relate to the corresponding records in Accounting. OK, stop cursing now. See what I mean?

So your dialog asks for a telephone number:


Gotta problem widdat?

But of course your programmers are smart, so they made sure the input consists of actual digits.


Hmmm. I have a feeling it would be of little help in our next campaign. Let’s make sure the input matches telephone number formats for our country, eh? Like (XXX) XXX-XXXX

(123) 456-7890

Not yet. Let’s make sure the area code is really an area code. Of course, area codes in the UK, say, can be anywhere from 2 to 5 digits long, not counting the leading 0. Meh.

(020) 123-4567
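For what it's worth, here is a minimal Python sketch of the escalating checks described above: digits only, then the (XXX) XXX-XXXX format, then a lookup of the area code. The function name `check_phone` and the tiny `KNOWN_AREA_CODES` set are made up for illustration; a real system would load its own country's numbering plan.

```python
import re

# Hypothetical sample of area codes, for illustration only.
KNOWN_AREA_CODES = {"212", "415", "617"}

DIGITS_ONLY = re.compile(r"\d+")
NATIONAL_FORMAT = re.compile(r"\((\d{3})\) (\d{3})-(\d{4})")


def check_phone(raw: str) -> str:
    """Return the strictest check level the input passes."""
    text = raw.strip()
    match = NATIONAL_FORMAT.fullmatch(text)
    if match:
        # Well-formed; now see whether the area code is one we know.
        if match.group(1) in KNOWN_AREA_CODES:
            return "known area code"
        return "well-formed"
    if DIGITS_ONLY.fullmatch(text):
        return "digits only"
    return "rejected"


for sample in ["asdf", "1111111", "(123) 456-7890", "(212) 456-7890"]:
    print(sample, "->", check_phone(sample))
```

Note that even the strictest level only tells you the number could exist, not that it belongs to the person who typed it.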

We could go on. What’s the bottom line? This is:

validation is hard.

Hard because, from the user’s point of view, typing babble or skipping a field is the most economical alternative.

We have known for years that the only way to guarantee the input of a valid email address is to verify it. It works: today you can hardly open an account anywhere without responding to an email sent to the address you gave at registration.

My advice is: if there is a way for you to verify an input, do it. Banks in London can tell you if you get your postcode wrong, for Heaven’s sake.

Even if you are not a bank, the cost of linking to an existing database and verifying the data at input time is peanuts compared to the cost of collecting missing data or cleaning up your dirty database.
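The email round trip mentioned above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the names `start_verification` and `confirm` are hypothetical, the store is an in-memory dictionary, and actually sending the message is left out.

```python
import secrets

# In-memory stores for the sketch; a real service would persist these.
pending = {}      # token -> email address awaiting confirmation
verified = set()  # addresses whose owner clicked the link


def start_verification(email: str) -> str:
    """Create a one-time token to be mailed to the given address."""
    token = secrets.token_urlsafe(16)
    pending[token] = email
    return token


def confirm(token: str) -> bool:
    """Mark the address as verified if the token is known. Tokens are single use."""
    email = pending.pop(token, None)
    if email is None:
        return False
    verified.add(email)
    return True


token = start_verification("user@example.com")
print(confirm(token))    # the user followed the link in the email
print(confirm("bogus"))  # somebody guessing
```

The point of the design is that the address is only trusted once somebody who can read its inbox has sent the token back.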

Unfortunately, verification cannot be used for all fields. You’ll have to rely on two sub-optimal solutions:

  1. smart programming (i.e., checking values as much as possible at input time, forcing well-formed formats when possible)
  2. smart collection (i.e., making it in the user’s interest to provide the data).
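As an example of the first solution, "forcing well-formed formats" can mean accepting whatever separators the user likes and storing a single canonical form. A minimal Python sketch, assuming ten-digit national numbers and a hypothetical helper called `normalize_phone`:

```python
import re
from typing import Optional

# Separators users commonly type: spaces, parentheses, dots, dashes.
SEPARATORS = re.compile(r"[\s().-]")


def normalize_phone(raw: str) -> Optional[str]:
    """Return the number as (XXX) XXX-XXXX, or None if it cannot be one."""
    digits = SEPARATORS.sub("", raw)
    # Assumption for this sketch: a valid number has exactly ten digits.
    if not re.fullmatch(r"\d{10}", digits):
        return None
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"


for sample in ["123.456.7890", "123 456 7890", "(123) 456-7890", "12345"]:
    print(repr(sample), "->", normalize_phone(sample))
```

The user is spared a lecture about parentheses, and the database ends up with one format instead of five.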

You don’t need all those data, certainly not at the same time

OK, I know, you will never have the budget to validate all input. Which is actually good news, because it tells you that you are trying to collect information that is not cost-effective. Quick, go back two steps to Right Identification!

We are so used to drowning in data that we lose sight of the fact that each single piece of data has a cost: paid by the user to provide the data and by you to collect it.

Let me share with you the two Small Big Truths of Data Collection I have learned over the years:

  1. you do not need all the data Marketing says you do (holds if you are in Marketing, as well)
  2. you do not need all the data at the same time.
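The second truth can be put into practice by asking for very little at registration and deferring the rest to later sessions. Here is a rough Python sketch; the field names, the session thresholds in `FIELD_PLAN`, and the helper `fields_to_ask` are all invented for illustration.

```python
# Hypothetical plan: (field name, first session in which it may be asked).
FIELD_PLAN = [
    ("email", 0),           # needed at registration
    ("phone", 1),           # can wait until the first return visit
    ("postal_address", 2),  # can wait even longer
]


def fields_to_ask(profile: dict, session_number: int, max_questions: int = 1) -> list:
    """Return at most max_questions missing fields that are due by now."""
    due = [
        name
        for name, first_session in FIELD_PLAN
        if first_session <= session_number and not profile.get(name)
    ]
    return due[:max_questions]


print(fields_to_ask({}, session_number=0))
print(fields_to_ask({"email": "user@example.com"}, session_number=2))
```

Capping the number of questions per session keeps each request small enough that users are more likely to answer it properly.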

A client once called me to assess the usability of its registration procedure. It consisted of 12 one-page forms, with over 60 required fields. Not kidding. The procedure had been designed by Marketing and all those data had been deemed vital. There was no way they would admit that the bulk of data collection could be postponed until after registration and spread over the course of many sessions. So they kept it as it was, and it eventually killed them.

The above holds for any instance where data collection takes place: user registration, call centres, tech support, CRM.

It is much better to collect a few valid data than many “dirty” ones.

Do all you can to validate your data, and if it’s too costly, do question their usefulness.

Action Items

  1. design a validation procedure at collection time for each piece of data
  2. if you cannot validate, verify
  3. if you can neither validate nor verify, beg (aka give the user an incentive)
  4. if steps 1 through 3 above are too costly, the data is not cost-effective, hence useless
  5. only ask for data that are strictly necessary
  6. spread data collection out as much as possible; long collection sessions are error- and rejection-prone
  7. routinely give incentives to users (even internal ones) to clean the data. Here, gamification works wonders.