Tag Archives: #sql-server

Interesting Links #1
Since I manage to read so much on the train I think readers will find some of the articles useful so I plan on listing up the best ones each month.

Business Intelligence

Design Tip #162 Leverage Data Visualization Tools, But Avoid Anarchy - This month’s Kimball Group design tip and incredibly timely considering how we are using Tableau at work. I think we should make it required reading for all business users of Tableau.

Databases

The Baker’s Dozen: 13 Differences Between Analysis Services OLAP and Tabular - An in-depth look at the functional and usage differences between the two flavours of Analysis Services.

Clustered Indexes vs. Heaps - Not a lot of people know that… Thomas Kejser goes in-depth on clustered index performance relative to heaps for both OLTP and OLAP workloads. I bet there is something for everyone to learn in this article.

Indexing a PK GUID in SQL Server 2012 - Again Thomas debunks some myths about GUID keys and scalability in OLTP systems.

Code

Complete Guide to Lazy Loading in C# - The Lazy type in C# 4.0 is a useful tool for performance optimising applications. This article describes its use and various threading options.

F#, Deedle and Computational Investing - Another example of how concise F# is; stock correlation charts in under 75 lines of code.

Testing

Patterns of Effective Test Setup - A set of techniques for avoiding complete unit test setup code; ensuring test clarity and reducing brittleness. This is just the start really and libraries such as AutoFixture can help even more once you have the basics right.

Unit Testing SQL Server OLAP Cubes Using C# - Not really unit testing by most definitions, more like regression testing. In many ways similar to what we do at work with some interesting additions.

Test SAML with #Tableau Server on the cheap - If you end up having to configure and test Tableau SAML this will help.

Development Process

Workflows of Refactoring - A great little slide deck from Martin Fowler about the various refactoring workflows (hint: it is to never refactor and add functionality at the same time).

When Should Y ou Refactor - Everyone remembers the conversation with your manager “We really need to spend some time refactoring; can we add some time in the schedule?”. This article discussing this “Big Bang Refactor” with a far better “Incremental Refactor”.

The Value of Persistent Chat in Incident Management, Support and Business Continuity - We have talked about persistent chat a lot in our sprint retrospectives. This is a bit salesy but good points on the value it brings.

When is it a Good Idea to write Bad Code? - Discusses the trade-offs you make when introducing technical debt into the code base.

How to Run a Successful Open Source Project - Good advice for all successful projects, not just open source ones.

Personal Development

Best development book I’ve read, has no code in it - Looks like one of those must read books for those that take their career seriously. Love the quotes “If you’re worried that your current job is rotting your brain, it probably is.” and “Expose Your Ignorance. Tomorrow I need to look stupider and feel better about it. This staying quiet and trying to guess what’s going on isn’t working so well.”

Don’t Get Me Started: The Steam Drill - Learn to recognise when your skills are out of date and need refreshing to stay competitive.

Uber-Architects: The Building Metaphor Is Dead - Not what you think. The role of the architect is changing for the better.

Organisational Behaviour

The Open-Office Trap - New Yorker article rounding up all the research done one open space workplace productivity. Some interesting results among the expected ones.

Can-Do vs. Can’t-Do Culture - “The trouble with innovation is that truly innovative ideas often look like bad ideas at the time.” Next time you are thinking why something won’t work, take a moment to consider if you are stopping innovation.

Don’t interrupt developers - Absolutely nails why you should not interrupt developers.

Are Your Programmers Working Hard, Or Are They Lazy? - “the appearance of hard work is often an indication of failure” - a must read for both developers and managers.
This entry was posted in reference and tagged #business-intelligence #code #indexes #organisational-behaviour #personal-development #software-process #sql-server #tableau #testing on Jan 15, 2014 .
Discuss this on Twitter or LinkedIn
Mental Health Functional Architecture
Context is everything with architecture. I’ve often had conversations which started with the phrase “how come you didn’t…” – once the context is explained the decision is usually obvious.

The context for this architecture is purely my own since there are no real customers. I want to satisfy the concerns and requirements as simply as possible but leave room to swap out parts of the architecture to investigate new approaches and technologies.

The diagram below is a pretty standard set of data warehouse components. If you are a traditional Microsoft guy then, from left to right, the components would be Integration Services, SQL Server, Integration Services again, SQL Server, Analysis Services and Excel or Reporting Services respectively. Alternatively you might be using Hadoop as the source mirror and Tableau for the data mart and consume components or some other combination.

I always try to set firewalls within an architecture so that any problems can be isolated and replaced without too much disruption. In this instance I’m going to use those firewalls so that I can try out new ideas and technologies.

The main synchronisation points are the three data stores – Source Mirror, Data Warehouse and Data Mart. (In this instance I am using the term data mart to mean a prepared, subject area specific store optimised for analytical/aggregate queries.)

The responsibilities for each stage are as follows:

Acquire > Source Mirror: receive data from source, ensure minimal load on source via high watermarks or another strategy, archive data for historical accuracy. A key point here is that the source mirror has the same metadata as the source itself. No joins or transforms on the way. Sometimes simple data conversions are useful but less is more.

Load > Data Warehouse: apply business logic and transform the source data into dimensional model, update data warehouse.

Summarise > Data Mart: aggregate generation or cube processing.

Consume & Act: covers any output tool such as Reporting Services, Excel, Tableau Dashboard, R, F# etc.

I consider the SQL/relational store to be “the data warehouse” and not Analysis Services which is better suited to a data mart role.

It’s quite hard to be succinct when talking about architecture and this post is quite lengthy so I’ll split it and talk about risk driven architecture in the next post.
This entry was posted in sample-solution and tagged #apache-hadoop #architecture #data-warehouse #excel #microsoft #microsoft-excel #source-data #sql-server #sql-server-integration-services on Apr 18, 2013 .
Discuss this on Twitter or LinkedIn
Integration Services Design Principals

Whilst doing some design work today for a customer project I realised there are a set of principals I try and adhere to when creating SQL Server Integration Services packages. The list is no doubt incomplete but this is what I have so far.

Minimise IO

This is a general data processing principal. Usually disk and, to a lesser extent, network performance determine the overall processing speed. Reducing the amount of IO in a solution will therefore increase performance.

Solutions that consist of multiple read-process-write steps should be redesigned into a single read-process-process-process-write step.

Prefer Sequential IO to Random IO

Disks perform at their best when sequentially reading or writing large chunks of data. Random IO (and poor performance) manifests when procedural style programming occurs - signs to look out for are SQL statements modifying/returning only few rows but being executed repeatedly.

Watch out for hidden random IO - for example, if you are reading from one table and writing to another in a sequential manor then disk access will still be random if both tables are stored on the same spindles.

Avoid data flow components that pool data

Data flow components work on batches of data called buffers. In most instances buffers are modified in place and passed down stream. Some components, such as “Sort” cannot process data like this and effectively hang on to buffers until the entire data stream is in memory (or spooled to disk in low memory situations). This increased memory pressure will affect performance.

Sometimes SQL is the better solution

Whilst the SSIS data flow has lots of useful and flexible components, it is sometimes more efficient to perform the equivalent processing in a SQL batch. SQL Server is extremely good at sorting, grouping and data manipulation (insert, update, delete) so it is unlikely you will match it for raw performance on a single read-process-write step.

SSIS does not handle hierarchical data well

Integration Services is a tabular data processing system. Buffers are tabular and the components and associated APIs are tabular. Consequently it is difficult to process hierarchical data such as the contents of an XML document. There is an XML source component but it’s output is a collection of tabular data streams that need to joined to make sense.

Execute SSIS close to where you wish to write your data

Reading data is relatively easy and possible from a wide variety of locations. Writing data, on the other hand, can involve complex locking and other issues which are difficult to optimise on a network protocol. In particular when writing data to a local SQL Server instance, SSIS automatically used the Shared Memory transport for direct inter-process transfer.

Don’t mess with the data flow metadata at runtime

It’s very difficult to do this anyway but worth mentioning that SSIS gets it’s stellar performance from being able to setup a data flow at runtime safe in the knowledge that buffers are of a fixed format and component dependencies will not change.

The only time this is acceptable is when you need to build a custom data flow programmatically. You should use the SSIS API’s and not attempt to write the package XML directly.

This entry was posted in data-warehousing and tagged #integration-services #sql-server #sql-server-integration-services #ssis on Jun 30, 2009 .
Discuss this on Twitter or LinkedIn

Tag Archives: #sql-server

Interesting Links #1

Business Intelligence

Databases

Code

Testing

Development Process

Personal Development

Organisational Behaviour

Mental Health Functional Architecture

Integration Services Design Principals

Minimise IO

Prefer Sequential IO to Random IO

Avoid data flow components that pool data

Sometimes SQL is the better solution

SSIS does not handle hierarchical data well

Execute SSIS close to where you wish to write your data

Don’t mess with the data flow metadata at runtime