We've all been there, walk into a project that involve some sort of data aggregation, reporting, analytics or some such thing and thought "easy, no problem" only to later realize the data you are to query against is basically an Excel Spreadsheet tossed into a single database table, and over the years has evolved into a table with hundreds of thousands, in not millions of rows that have just been put there for archiving, with never a thought of actually using it.
Yep, always a grand job.
If such data is only a few hundred, or few thousand rows, it's no big deal to have a couple sloppy queries written quickly for the reporting, but, if there is an archive of millions of rows, you are in for a much more challenging experience.
The first step is to check out the data, take note of what is repetitive within it. Most likely, this will be the same data that will eventually make up a "parent" table in a normalized schema. In the instance of something like point of sale data, for example, repeating data might be the store number, department code and maybe a date. Child data would be perhaps a name, item sold and amount.
A simple query to create the parent table from the ugly source table would be, for example:
INSERT INTO parentData (store_number, department_code, sale_date) SELECT DISTINCT store_number, department_code, sale_date FROM POS_History
That would create the parent table, with a single row for each store and department for each date in one large dump.
To create a child table would be something similar in the simplest way:
INSERT INTO childData (name, item, amount, store_number, department_code, sale_date) SELECT name, item, amount, store_number, department_code, sale_date FROM POS_History
Now, this creates a child table that can be related back to the parent via store number, department code and sale date.
However, this could make for a slow query time after time selecting on 3 columns of possibly numbers, strings and dates...so we could do something tricky.
The parent table could have one extra field that would contain a concatenation of the three fields, and do the same thing in the child table, plus add a parentDate_id field to act as a foreign key to a primary key in the parentData table named "id", while removing the three individual fields.
Then, after the two large dumps are done to create those two tables, do an update to match up data based on that concatenation field like so:
UPDATE childData SET parentData_id = parentData.id FROM childData INNER JOIN parentData ON childData.concatField = parentData.concatField WHERE childData_id IS NULL;
By going this route you can do two bulk inserts and an update to determine relationship later, rather than one bulk update then loops through that data set (generally involving a cursor or other horribly expensive sin) and get the relationship thru the loop. In the interest of saving database disk space, you can also null out those temporary key fields after processing, and then this can also work as a flag to say "yep, already been processed" for future processes.
I have, in a couple instances, cut the process of normalizing from 14 hours to 7 minutes...literally.
Musings from the mind of a long-time web entrepreneur in no particular order and for no particular reason.
Tuesday, May 24, 2011
Normalizing Abnormal Data
Tuesday, May 17, 2011
A Quick IE6 Z-Index Fix
Yeah, IE6, old news, 5+ years old, you'd think it wasn't a problem, but alas, it is still around 10% of the average usage, and in some b2b situations, much higher if you are dealing with a web app that has corporate clients that still have it installed across their network.The later is the issue I am working with. Combine that with the fact that we are also using a lot of AJAX and floating divs and modal dialogs that the problem is cropping up.
When fighting IE6 these days it boils down to how much you plan to invest in fixing, and how to do it without lessening the rest of the application. The z-index issues tend to need fixing. The biggest problem is the z-index of elements that IE considers ActiveX object, windowed opject like form select fields. Those are always considered the highest z-index of everything on the page, which makes modal dialogs or other floating divs a real nightmare to work with.
I chose the route of the quickest and easiest fix...there are options of using iFrames, which I immediately throw out, and various, ugly hacks of CSS and JavaScript, just seems cludgy.
My fix that took minutes to install and works great...nothing fancy; just before the modal dialog pops up, hide the form with the style.visibility CSS function to hide the form without moving all the rest of the elements around. If you use style.display the rest of the page will move around to replace that now empty space, with visibility, the space is still "used" it's just invisible. Then, when the dialog/div is closed, make it visible again.
Seemed to me to be the best option, with the least time involved in implementing. Of course, you can set up the functions to only make the form disappear if it's an IE6 browser visiting, otherwise, skip it and have it behave as typical of any browser created in the last few years.
Then, after all that is done, look in to a plan to sunset support for IE6.
Then, after all that is done, look in to a plan to sunset support for IE6.
Thursday, May 12, 2011
Google Chromebook Released
So, as everyone has likely heard, Google announced the final release of it's new laptop, Chromebook, during it's developer conference this week. For educational institutions, they have contract plans for $20 a month, and for business it's $28 a month. These contracts include various services, and hardware upgrades periodically. The prices for consumers varies from $349 to $499 depending on manufacturer and screen size.I have been testing a Google laptop, beta test named CR 48, I've been using it for a few months, and it is nice, many problems have been vastly improved over the months and I very much enjoy using it, but I gotta say, that seems a bit pricey for my blood.
Considering that even your average netbook, or even eReaders, are much lower than this, and the Chromebook is actually a netbook, by it's truest definition, I do not see how Google can justify that type of price tag for the product they are offering.
I had been testing it, assuming that the price point was going to be much lower than that. I mean, it is nice, it feels solid and is well built, but, at the end of the day, it's just a netbook, nothing more, nothing less. There is little to no local file control, all applications are cloud based such as email, office apps, etc.
I gotta admit, I am curious to see how this venture works out for Google. I never begrudge success and hope it works out well, but I know this, I won't be buying one when I can get a full laptop with Windows 7 for about the price of the higher end Chromebook.
Subscribe to:
Posts (Atom)