Archive | May, 2013

Data Obfuscation in MVC

30 May

Kolev Says...

Data obfuscation or de-identification is an interesting adventure to undertake. The problem initially disguises itself as being simple: “Give me data that is just as good as my production data; however, I don’t want anyone to be able to reverse the process or figure out what the original data was.” The simplest, and perhaps naive, solution would be to just replace everything with a random set of characters. The data would certainly be obfuscated but would not be very useful.
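
As a purely illustrative example of why, here is what that naive approach might look like in T-SQL; the Customers table and its columns are hypothetical:

-- Naive obfuscation: overwrite each value with random characters of the same length.
-- dbo.Customers and its columns are hypothetical, for illustration only.
UPDATE dbo.Customers
SET LastName = LEFT(REPLACE(CONVERT(VARCHAR(36), NEWID()), '-', ''), LEN(LastName)),
    Email    = LEFT(REPLACE(CONVERT(VARCHAR(36), NEWID()), '-', ''), LEN(Email));
-- The data is certainly unrecognizable now, but Email no longer looks like an
-- e-mail address, values that used to match no longer match, and any validation
-- or reporting logic that depends on the shape of the original data will break.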

As we begin the process, things like current application logic, industry legislation and standards, and uniqueness of the data can work either to our advantage or against us. The way data is used, and how it travels through the application until it is shown to the user, has considerable implications to consider.

To illustrate the possible complexity, here are some examples:

  • No data-tier to the data source: An application that has no centralized data-tier…

View original post 626 more words

Can You CrowdSource a Program?

30 May


Fortigent Engineering is always looking to innovate, not only in what we create, but in how we create it. It may be a bit cliché, but I really believe that ‘thinking inside the box’ constrains a team to produce box-shaped results.

On Wednesday, the ever-growing Engineering team was locked in a room together for a no-holds-barred development experiment to stretch the walls of that metaphorical box. Luckily, brain food was on the menu.

In teams of 6 we were challenged to invent a novel presentation of user data given an expansive set of variables. Basically, the teams were tasked with creating a formula that would distill the most pertinent pieces of information into one aggregate product.

Without further ado, the formula we came up with is:

[Formula]

OK, so that’s NOT the formula. In fact, we didn’t come up with a very definitive answer at all. As it turns out, 25 engineers can’t always agree with one another. We did, however, carve out a few victories. We were able to identify which variables we believed were essential to solving the problem. We were also able to obtain feedback from almost every individual and evaluate their approach. To that end, my views on the problem were swayed half a dozen times in the span of an hour because of all the unique ways the team was going at it.


So maybe we didn’t come to a solution once and for all, but we sure are closer. We’ve also proven that exercises like this can peel away at an amorphous set of requirements by harnessing the enormous brain power of our entire team at once. I believe that we have pioneered a new means of problem-solving in our group. Sure, it will require some refinement, but I think that large-group development can evolve into a powerful tool for solving the increasingly difficult problems we face as a team.

Not to mention it was a pretty sweet way to spend my birthday if it had to be on a work day.

The Power and Simplicity of the Data Warehouse

23 May
“In many ways, dimensional modeling amounts to holding the fort against assaults on simplicity”
– Ralph Kimball, The Data Warehouse Toolkit


Although there are many reasons that an organization may consider building a “data warehouse”, most of the time the overwhelming reason is performance-related… a report or web page is just taking too long to run and they want it to be faster. If that data could be pre-computed, then that page would be really fast!

I’m here to tell you that speed is not the most important reason for building the warehouse.

When you’ve got a system where multiple consumers are reading the data from the transactional store and doing their own calculations, you create a whole bunch of other problems beyond just the speed issue:

  • You create the potential for multiple different and conflicting results in that system. At least a few of those consumers will inevitably do something wrong.
  • You put a considerable burden on your transactional system, forcing it to recalculate that data over and over again for each custom request.
  • While those consumers are running their long-running queries, that data is being simultaneously updated by a multitude of data collection and transformation processes… the consumers are not only reading inconsistent sets of data, they are blocking the collection and transformation processes from doing their job, getting in their way and slowing them down… and sometimes even causing them to fail.
  • You’ve created a multitude of intertwined dependencies in that system. That makes it extremely difficult to improve or otherwise change the transactional system without breaking things… and even the smallest change can require massive system-wide changes to accommodate it.
  • The bottom line is this: You’ve just got a greatly over-complicated system that is much more prone to performance problems and defects. As Ralph Kimball states so eloquently, data warehouse efforts are a giant move towards simplicity. And simpler systems are better systems.

We recently launched a major warehouse initiative to, once and for all, pre-compute and store all our portfolio-level data. Although that data is already computed from data that’s been pre-aggregated at the account level, there is still considerable additional work required to aggregate that data further to the portfolio level.

Primarily, that additional work is a major performance problem: pulling a single portfolio’s data can take as long as 5-7 minutes for larger portfolios. That’s a serious problem for our scalability and an overall burden on our system.

I’m happy to report that the portfolio warehouse initiative is nearing its conclusion, and I am confident it will do things for us far beyond the performance improvements we hoped to gain:

  • With every consumer pulling from the same warehouse for portfolio level information, we can guarantee they will get the same results… they are all “drinking water from the same well.”
  • The portfolio data can now be processed “incrementally”… i.e. rather than having to recalculate that data from the beginning of time for every request, we can reprocess only the data that has changed (see the sketch after this list). This pays huge dividends on overall system performance and greatly decreases the burden on the system.
  • Our data will now be pulled from a snapshot-enabled data warehouse. This guarantees clean and consistent reads without blocking the transactional store from doing its work.
  • By having one system that reads transactional data and compiles and stores the portfolio data, we only have one system to change when we want to change something in the transactional store. This is hugely liberating to us when we want to modify those underlying systems.
  • The new published warehouse structure for portfolios is simple and easy to understand. It therefore opens up consumption of that data in new ways with less effort, and it opens doors to possibilities that were otherwise impossible. Looking at data for all the portfolios in a firm in one pass, or performing cross-firm analytics that were unthinkable before, is now within reach. This gives us a myriad of options that we intend to take advantage of.
  • Oh, and the speed is nice also… it’s really fast!
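
To make the “incremental” and “snapshot” points above concrete, here is a minimal T-SQL sketch of the kind of refresh step involved. Every table and column name in it (AccountDailyValues, PortfolioDailyValues, WarehouseLoadLog, the ModifiedDate watermark, the PortfolioWarehouse database) is a hypothetical stand-in rather than our actual schema; the idea is simply that only the portfolios touched since the last load get re-aggregated and merged into the warehouse, and that readers see consistent data without blocking the loaders.

-- Hypothetical incremental refresh: re-aggregate only portfolios whose
-- account-level rows changed since the last load, then merge the results
-- into the portfolio-level warehouse table.
DECLARE @LastRun DATETIME2 = (SELECT MAX(LoadCompletedAt) FROM dbo.WarehouseLoadLog);

WITH ChangedPortfolios AS
(
    SELECT DISTINCT PortfolioId
    FROM dbo.AccountDailyValues
    WHERE ModifiedDate > @LastRun
),
Recalculated AS
(
    SELECT a.PortfolioId, a.ValuationDate, SUM(a.MarketValue) AS MarketValue
    FROM dbo.AccountDailyValues a
    JOIN ChangedPortfolios c ON c.PortfolioId = a.PortfolioId
    GROUP BY a.PortfolioId, a.ValuationDate
)
MERGE dbo.PortfolioDailyValues AS target
USING Recalculated AS source
    ON target.PortfolioId = source.PortfolioId
   AND target.ValuationDate = source.ValuationDate
WHEN MATCHED THEN
    UPDATE SET MarketValue = source.MarketValue
WHEN NOT MATCHED THEN
    INSERT (PortfolioId, ValuationDate, MarketValue)
    VALUES (source.PortfolioId, source.ValuationDate, source.MarketValue);

-- The "snapshot-enabled" part is a one-time database setting (shown here on the
-- hypothetical warehouse database), so consumers read a consistent view of the
-- data without blocking the load processes:
ALTER DATABASE PortfolioWarehouse SET READ_COMMITTED_SNAPSHOT ON;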

While we are still in the final stages of implementation, we hope to bring this system fully into production over the next few months and are very excited about the possibilities… we hope you are too!

And if you’d like to read about data warehousing, I highly recommend what is, in my opinion, the bible of data warehousing:

The Data Warehouse Toolkit – By Ralph Kimball and Margy Ross

The coolest SQL Server function you never heard of

8 May

Brett W. Green's On the Contrary

Ever heard of the SQL_VARIANT_PROPERTY function? I didn’t think so.

SQL Server developers very often make the mistake of making their NUMERIC fields too large. When faced with a choice of how to size the column, they’ll often think “make it way larger than I need to be safe”.

This works OK as long as you simply store and read these values, but if you ever have to perform math with these columns, particularly some form of division or multiplication, you may find your results mysteriously losing precision.

This is because SQL Server can only store a maximum of 38 digits per number… if the result of your mathematical expression may yield a number larger than that, SQL Server will be forced to downsize it and remove digits from the mantissa as a result.
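
A small sketch of both points (the column sizes are arbitrary examples, not from the original post): multiplying two DECIMAL(38,10) values would need 77 digits of precision, so SQL Server caps the result at 38 digits and quietly cuts the scale down to 6 decimal places, and SQL_VARIANT_PROPERTY is how you can see the precision and scale the engine actually chose.

-- Two "oversized" variables: each holds a tiny number but is declared DECIMAL(38,10).
DECLARE @a DECIMAL(38,10) = 1.0000000001;
DECLARE @b DECIMAL(38,10) = 1.0000000001;

-- The true product is 1.00000000020000000001, but the result type of the
-- multiplication would need 77 digits, so SQL Server trims the scale to 6.
SELECT @a * @b AS Product,                                            -- 1.000000
       SQL_VARIANT_PROPERTY(@a * @b, 'BaseType')  AS ResultBaseType,  -- decimal
       SQL_VARIANT_PROPERTY(@a * @b, 'Precision') AS ResultPrecision, -- 38
       SQL_VARIANT_PROPERTY(@a * @b, 'Scale')     AS ResultScale;     -- 6

-- Size the variables to what the data actually needs and the same math keeps its digits.
DECLARE @c DECIMAL(18,10) = 1.0000000001;
DECLARE @d DECIMAL(18,10) = 1.0000000001;
SELECT @c * @d AS Product,                                            -- 1.00000000020000000001
       SQL_VARIANT_PROPERTY(@c * @d, 'Scale') AS ResultScale;         -- 20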

View original post 193 more words

Dynamically create a function or a stored procedure

8 May

Dynamically create a function or a stored procedure in “some other” database from within a stored procedure running in “another” database.

Note: for the time being, we will forget about possible security issues caused by dynamic SQL.

First of all, we all know that the CREATE and ALTER PROCEDURE statements do not allow specifying the database name as a prefix to the object name. For instance, if you try to run the following statement, you will get an error:

CREATE PROCEDURE A.dbo.sp_bar
AS
BEGIN
SELECT 'a'
END;

That brings us to the usual way of setting the database and then executing the CREATE statement in the script, like this:

USE A
GO
CREATE PROCEDURE dbo.sp_bar
AS
BEGIN
SELECT 'a'
END
GO

Don’t forget about “GO” either, because CREATE or ALTER PROCEDURE must be the first statement in a query batch. That, in fact, brings a little challenge when trying to dynamically create a function or a stored procedure in database “A” from within a stored procedure running in database “B”, since sp_executesql does not accept “GO”… if you try the following code:

DECLARE @sql NVARCHAR(100) = 'USE A GO'
EXEC sp_executesql @sql

or even this one

DECLARE @sql NVARCHAR(100)
SET @sql = 'USE A
GO'
EXEC sp_executesql @sql

both will return an “Incorrect syntax near ‘GO’” error message.

So, to get around all those limitations, create a little stored procedure in the target database (assuming that you have control over the database, that you have thought about all the security concerns, and that you have limited access to such a stored procedure; a sketch of that lockdown follows the procedure below) that will look like this:

USE A
GO
-- A small wrapper that executes whatever statement it is given in the context of database A.
CREATE PROCEDURE dbo.sp_executesql_inTHISDatabase
(
@sql NVARCHAR(MAX)
)
AS
BEGIN
SET NOCOUNT ON;
EXEC sp_executesql @sql;
END
GO
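
As an aside on the “limited access” caveat above, here is a minimal sketch of the kind of lockdown I have in mind; the role and user names are hypothetical, and your own security model may well call for something stricter:

USE A
GO
-- By default only the owner and sysadmins can execute a new procedure;
-- grant EXECUTE explicitly, and only to a dedicated deployment role.
CREATE ROLE DeploymentScripts;
GRANT EXECUTE ON OBJECT::dbo.sp_executesql_inTHISDatabase TO DeploymentScripts;
EXEC sp_addrolemember 'DeploymentScripts', 'DeployUser'; -- DeployUser is a hypothetical database user
GO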

All we need to do now is what we originally intended:

USE B
GO
DECLARE @sql NVARCHAR(800)
SET @sql=
'CREATE PROCEDURE dbo.sp_bar
AS
BEGIN
SELECT ''a''
END;
';
EXEC A.dbo.sp_executesql_inTHISDatabase @sql;

Hurray! It works.

But now… please do remember all those security concerns we were trying to forget for a little while.