30 May

Kolev Says...

Data obfuscation, or de-identification, is an interesting adventure to undertake. The problem initially disguises itself as simple: “Give me data that is just as good as my production data; however, I don’t want anyone to be able to reverse the process or figure out what the original data was.” The simplest, and perhaps most naive, solution would be to replace everything with a random set of characters. The data would certainly be obfuscated, but it would not be very useful.
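As a rough illustration of why the naive approach falls short, here is a minimal Python sketch of "replace everything with random characters". It is not the author's method; the function name `naive_obfuscate` and the sample record fields are hypothetical, chosen only to show what gets lost.

```python
import random
import string


def naive_obfuscate(value: str, rng: random.Random) -> str:
    """Naive masking: replace every character with a random letter or digit.

    Hypothetical helper for illustration only. The length is kept, but the
    format (separators, '@', spaces) and any consistency are thrown away.
    """
    alphabet = string.ascii_letters + string.digits
    return "".join(rng.choice(alphabet) for _ in value)


# Hypothetical sample record; any production-like fields would do.
rng = random.Random(42)
record = {"name": "Jane Doe", "phone": "555-0142", "email": "jane@example.com"}
masked = {field: naive_obfuscate(value, rng) for field, value in record.items()}
print(masked)

# Masking the same value twice yields two unrelated strings, so the same
# customer would no longer match across tables or across runs.
print(naive_obfuscate("Jane Doe", rng), naive_obfuscate("Jane Doe", rng))
```

The masked values no longer look like a phone number or an email address, so any application logic that validates formats would reject them. Because each call draws fresh random characters, values that used to match across tables stop matching. The data is obfuscated, but it no longer behaves like production data.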

As we begin the process, factors such as existing application logic, industry legislation and standards, and the uniqueness of the data can either work to our advantage or against us. How the data is used, and how it travels through the application before it is shown to the user, also has considerable implications.

To illustrate the possible complexity, here are some examples:

  • No data-tier to the data source: An application that has no centralized data-tier…

