Representing MultiValue Data In an ADO.NET DataSet
Aug
10
Written by:
8/10/2010 2:29 PM
While it supports non-relational data sources, Microsoft’s ADO.NET technology is built first and foremost to work with relational data. When retrieving data that is non-relational, it is necessary to map it to a first-normal-form (FNF) structure. MultiValued systems allow you to create a table that has up to two levels of nesting contained within it. For instance, you can have an invoice item that contains invoice lines, and those invoice lines can in turn have fields that track options. So, for instance, you could have
Invoice Date: 1 Aug 2010 Invoice # 10234 Customer: ABC Company
Line 1: XYZ Radio
Option 1: Name = Color, Value = Black
Option 2: Name = Voltage, Value = 110
Line 2: Headphones
In a relational database, the 2 levels of 1 to many relationships would force you to have a minimum of 3 tables to support this. In a MultiValued system, this can all be contained in a single table, known in MultiValued terminology as a “file”.
Although Microsoft supports non-relational data sources, they ultimately require you to normalize the data at least to FNF. i.e. Multiple Values need to be extracted and flattened into a separate view for each grouping, for all levels of nesting.
Microsoft also insists that for any column (attribute) the data type that is used to represent it be consistent and unchanging.
In this document, we describe how the DbDataAdapter Fill method automatically creates multiple tables and both keys and foreign keys to link them to each other, based on the relationships between these when MultiValued data is presented to it.
The FusionWare Direct DAS account “GAXOLEDB” includes a file named “CUST” which includes 2 MultiValued sets, each of which contains a SubValued set. It contains all the data types supported by your MultiValue system and also includes an example of a record with non-conforming data ( item-id “BAD.3”).
Here is a dump of an item from the CUST file:
>CT CUST 1
1
001 JOHN DOE
002 55
003 9567
004 10324753
005 13266
006 S]C
007 SAVINGS]CHEQUING
008 12335\23447]59866
009 1237\125618]325067
010 14001
011 DONT LOSE THIS
012 TRANSFER FROM ANOTHER\BRANCH
013 HERE'S A BUNCH OF TRAILING
014 STUFF THAT WE WANT TO MAKE
015 SURE WE DON'T LOSE.
016 14001\14001]14001
Here is that same data annotated:
1 The Item Id of the file (CustomerId)
001 JOHN DOE Customer Name
002 55 Age (in years)
003 9567 Birth Date (I know, they don’t match up!)
004 10324753 Net Worth (MD2 – 2 stored decimal places)
005 13266 Birth Time (odd, but lets us work with a time field)
006 S]C MultiValued field: Account Type Code(s)
007 SAVINGS]CHEQUING MultiValued field: Name of account type
008 12335\23447]59866 SubValued field: Account number(s)
009 1237\125618]325067 SubValued field: Account balance(s) (MD2 – 2 decimals)
010 14001 MultiValued field: Visit date
011 DONT LOSE THIS Ignored field
012 TRANSFER FROM ANOTHER\BRANCH SubValued field: Visit notes, multiple lines per visit
013 HERE'S A BUNCH OF TRAILING This and all subsequent fields are unmapped and ignored.
014 STUFF THAT WE WANT TO MAKE
015 SURE WE DON'T LOSE.
016 14001\14001]14001
We actually have two MultiValued groupings, and each of these has one SubValued grouping nested within it.
The first grouping is a grouping of account types. We have “S”avings and “C”hequing in this item, but some of our records also have “P”ersonal Line of Credit types.
For each account type multivalue, there are two attributes that are part of a nested SubValued grouping. This grouping contains a list of all the account numbers and balances for the controlling multivalue’s account types. So, for instance this item has two savings accounts with balances and only one chequing account.
The second multivalued grouping is a list of dates on which the customer visited the branch. Associated with this one field is a SubValued grouping with a single field, which contains multiple lines of text explaining what occurred during the visit.
There are 4 items in this file. Their item ids are 1, 2, 3 and BAD.3. The last one “BAD.3” is an example of non-conforming data. i.e. The fields defined as DATE, TIME, INTEGER and DECIMAL all have invalid data in them. Invariable, large MultiValued tables wind up with bogus records in them. This enables us to test and demonstrate how our products allow you to deal with these real-world anomalies.
The file contains dictionaries for all the relevant fields.
We also provide a mapping of this file that exposes some of the fields:
>CT DICT CUST %OLEDB.T.CUST%
%OLEDB.T.CUST%
001 VARCHAR]CUSTID]0
002 VARCHAR]CustName]0
003 INTEGER\]Age]0
004 DATE]Birthdate]0
005 DECIMAL\2]NetWorth]0
006 TIME]Birthtime]0
007 VARCHAR]AccountType]1\AccountTypes
008 VARCHAR]AccountTypeName]1\AccountTypes
009 VARCHAR]AccountNumber]2\AccountTypes\Accounts
010 DECIMAL\2]AccountBalance]2\AccountTypes\Accounts
011 DATE]VisitDate]1\Visits
012 IGNORE]]
013 VARCHAR]VisitNotes]2\Visits\VisitNoteLines
The first attribute of this mapping item represents the item id (sometimes called attribute 0). The second attribute represents data in attribute 1 of the item, and so on.
Note that attribute 12 is marked “IGNORE” for data type, which means we have not mapped attribute 11 of the file’s data. Attribute 12 *is* mapped, but everything after that is not. We put data in there to allow us to test that the Update method does not lose this data.
The ‘]’ characters are actually value delimiters and the ‘\’ characters are subvalue delimiters. The structure of this mapping file is:
Data type\Scale]Field name]Level\Multivalue group name\Subvalue group name
Scale (and the subvalue delimiter for it) are optional unless the data type is DECIMAL.
Level is 0 for single valued fields, 1 for multi valued fields, and 2 for sub valued fields.
Multivalue group name is optional if level is 0.
Subvalue group name is optional if level is 0 or 1.
ADO.NET represents nested as a series of IDataReader instances. The top level one is the parent of all the others, and in our case has all the single valued fields. For each multivalued group there is a field in the top level IDataReader, which is itself an IDataReader. This field is named based on the multivalue group name. Within this nested, multivalued IDataReader, for each subvalued group that is contained within the multivalued group, there is, again, another field that is itself a nested IDataReader.
When you ask the DbDataAdapter implementation to “Fill” the DataSet object based on the IDataReader, it asks for (or defaults) the top level table name. Let’s say that we name it “CUST”. The top level IDataReader gets a table named CUST, in this case.
This table initially contains all the single valued fields. For every nested IDataReader that this contains, a table is created, which is named “CUST” suffixed with the name of that IDataReader field, which in this case is the name of the multivalued group. In our example we have a multivalued group name of AccountTypes and another of “Visits”, so we wind up with two tables: CUSTAccountTypes and CUSTVisits. In order to allow linking of rows in the multivalued tables to their parent rows in the top level table, an integer field named after the IDataReader field name is created with a unique numeric sequential, zero-based row id. So we have two fields, named AccountTypes and Visits in the CUST table.
Furthermore, the CUSTAccountTypes table has a matching foreign key field named “CUSTAccountTypes” which is the link back to the AccountTypes field in the CUST table, and the CustVisits table has a matching foreign key field named “CustVisits” which is the link back to the Visits field in the CUST table.
This exercise is repeated to the subvalue level.
Tables Generated for the CUST File Mapping
|
Nesting Name
|
DataTable Name
|
Description
|
|
CUST
|
CUST
|
This is the table of single valued data
|
|
AccountTypes
|
CUSTAccountTypes
|
This is the table with all multivalued fields from the AcountTypes multivalued group.
|
|
Accounts
|
CUSTAccountTypesAccounts
|
This is the table with all subvalues in the Accounts subvalued group which is nested within the AccountTypes subvalued group.
|
|
Visits
|
CUSTVisits
|
This is the table with all the multivalued fields from the Visits multivalued group.
|
|
VisitNoteLines
|
CUSTVisitsVisitNoteLines
|
This is the table with all the subvalued fields from the VisitNoteLines subvalued group which is nested within the Visits multivalued group.
|
|
Table Name
|
Key Field Name
|
Description
|
|
CUST
|
AccountTypes
|
Key to identify the rows in the CUST table so that the AccountTypes table can link back to the appropriate single-valued row (item).
|
|
CUST
|
Visits
|
Key to identify the rows in the CUST table so that the Visits table can link back to the appropriate single-valued row (item).
Note that the values in AccountTypes and Visits will always be the same. This is redundant, but it works.
|
|
CUSTAccountTypes
|
CUSTAccountTypes
|
Foreign key to link back to the AccountTypes field of the singlevalued row that is the parent of this row.
|
|
CUSTAccountTypes
|
Accounts
|
Key to identify the row in the CUSTAccountTypes table so that the CUSTAccountTypesAccount table can link back to this table.
|
|
CUSTAccountTypesAccounts
|
CUSTAccountTypesAccounts
|
Foreign key to link back to the multivalued row that is the parent of this row.
|
|
CUSTVisits
|
CUSTVisits
|
Foreign key to link back to the Visits field for the singlevalued row that is the parent of this row.
|
|
CUSTVisits
|
VisitNoteLines
|
Key to identify the row in the CUSTVisits table so that the CUSTVisitsVisitNoteLines table can link back to this table.
|
|
CUSTVisitsVisitNoteLines
|
CUSTVisitsVisitNoteLines
|
Foreign key to link back to the multivalued row that is the parent of this row.
|
- These key fields are zero-based. Multivalue and subvalue numbers are one-based.
- Multivalue numbers for each item restart at 1. These key fields however are sequential for all items represented in the table. If the first item has 2 multivalues (0,1) then the next item starts at 2. The point here is that these key fields are not the same as a multivalue or subvalue number.
- These should NEVER be accessible to an end user. You should not normally even show them. They should be hidden from view.
FusionWare Direct Product Family
The FusionWare Direct product family includes client providers that enable access to your MultiValue data from Java, .NET and Legacy COM/COM+ environments.
Client Platforms Supported
Client Platforms supported for .NET and COM clients include:
- Windows 2000, 2003 and 2008, 32 bit and 64 bit
- Windows XP, Vista and Windows 7, 32 bit and 64 bit
Client Platforms Supported for Java client include:
- Windows 2000, 2003 and 2008, 32 bit and 64 bit
- Windows XP, Vista and Windows 7, 32 bit and 64 bit
- IBM Platforms:
- IBM i (AS/400)
- System z (mainframe)
- System p (AIX)
- Certified "Ready for IBM Systems with Linux"
- Linux
- Unix
- and more...
MultiValue Platforms Supported
- Rocket (formerly IBM)
- U2 (Universe and Unidata) including Universe back to version 5
- PI Open
- Raining Data
- Northgate
- Temenos Group
- Ladybridge Systems (coming soon)
- Others (older or unsupported MultiValue Systems
- UltPlus
- Power95
- and more...
Copyright ©2010 Robert Houben, CTO