This episode of HBR is brought to you by AnanasToast.com. Get 15% discount on all shared hosting with the offer code HBR15. That's HBR15. Bit your web hosting that's Aniston Fair at AnanasToast.com. Hello and welcome to HAKA Public Radio. My name is Mike Ray. In this show I'm going to explain the best, whicing is the best and possibly the only way to implement a many-to-many relationship in a database. This has been triggered by some discussion between Ken and Dave about many-to-many relationships and databases, which has been seen and heard on community podcasts and emails. I believe there was some discussion, some good nature of ribbing between them about the way in which Ken implemented a many-to-many relationship in some mechanism to do with the website. One of the first tasks in designing a database is the identification of the entities which will be contained and managed by that database. And more often than not an entity will have its own table, there will be a one-to-one relationship between an entity and a table in a database. Exceptions to this rule will be where the software demands otherwise, sometimes for example there are e-commerce systems which are designed using object-oriented technology and they demand a greater degree of granularity in the data and that might give rise to a situation where a single entity could occupy more than one table. What do I mean by entity? If you can imagine designing a database for an e-commerce or a sales ledger, some kind of mechanism by which you are setting products either by a web via a website or by a more traditional means your database will probably contain such things as customer data, invoice data, invoice line data because an invoice can contain more than one entry, products and other related things, each of these is an entity. When you've identified the entities in a database, they are often represented on a diagram by a diagram which contains rectangles, one for each entity, they contain a label. For example, it'll be a rectangle with a word customer inside it which represents the customer entity, perhaps another box which represents the invoice entity and whatever entities exist in a database will be joined together by lines which join the boxes. These lines indicate the nature of the join between the tables. For example, a single customer can obviously make more than one visit to a website or a shop and make more than one purchase, so one customer could give rise to many invoices. So the relationship between a customer entity and an invoice entity is a one to many relationship and that is represented on a diagram by a line drawn between customer and invoice and at the invoice end which is the many end. The line splits into what is generally known as a cross foot which is a little three-pronged shape, just like the foot of a crow or chicken and that touches the box which contains the entity of which there are many. In fact, there could be just one row in the invoice table because of course the customer at some point will make his first purchase. So a one to many could be more correctly described as a one to one or more relationship. You could also describe a one to many relationship as a many to one if you're looking from the other end if you're standing if you like at the invoice end looking towards the customer then what you have is a many to one many invoices. A collection of invoices can have or will have the same customer ID. If you find that in your database you have a one to one relationship then there is probably an error with your data analysis. The rules of database normalization state that all of the entries all of the columns in a single row in a database relate to the primary key. The primary identifier of that entity so in a customer obviously has a name but the name is often not unique so the customer is issued with a customer ID and in the customer name, customers gender age, a number of other things related to the customer will be stored against or keyed by that index and if you have another table which is a one to one relationship with a customer it probably belongs in the customer table and combine two tables into one. So what about many to many relationships? Well these are not as common far less common than a one to many relationship. One example and an actually example I use here is in the implementation of a music database in this case the database contains two entities an artist and a genre. Now it's obviously there's going to be multiple artists and multiple genres and it's conceivable that a single artist may appear in more than one genre and it's absolutely certain that a single genre will probably have more than one artist so this gives rise to a many to many relationship. But in defining a many to many relationship in the database we do not and cannot simply join the artist and the genre table and have a close foot at both ends of the line which joins the tables and the reason for this should become clear when I explain about foreign keys and trying very hard not to violate the rules of data normalization. A foreign key is a column contained in what is often called a child in the relationship so in this in the case again for customer and the invoice. The customer can be thought of as the parent entity and the invoice as a child entity and an invoice row will contain a column which corresponds to the customer ID and this is indexed not uniquely indexed but indexed and contains the customer ID of the customer making the purchase which is recorded by the invoice. This column, the custom ID column in the invoice table is known as a foreign key to the customer table. Now if we were to try to join two tables together and have a close foot at both ends of the line and join artist and genre in a many to many then this would suggest that the genre table will contain a foreign key for the artist ID and the artist ID and the artist table will contain a column which is a foreign key into the genre table. This is nonsense and it's a mess because it absolutely busts wide open the rules about data normalization because now the artist entity, the artist table contains a column, a data item which does not relate to the artist. It doesn't identify the artist. It identifies the genre to which the artist can belong but it does not identify the artist and that breaks it kind of creating this kind of circular relationship just breaks the rules of data normalization so how to explain but it does. So how do we generate or represent a many to many relationship in a database? What if you can visualize from in it the two tables artist and genre joined together by a straight line with a crow's foot at each end. Now cut that line in the middle and plop another table there and then spin the two halves of the lines through 180 degrees so now you've got custom that the central table that table in the middle is called a cross reference table and I always suffix these tables with X-REF shortfall cross reference so in a worked example there would be an artist table genre table and an artist artist genre X-REF table so the artist now has a one-to-many relationship with the artist genre X-REF table and the genre table has a one-to-many relationship with the artist genre X-REF table what does the artist genre X-REF table contain? It simply contains two columns an artist ID and a genre ID and these two columns are not permitted to contain null so they're defined as not null constraints in the data definition and there is a single index which is a compound unique index using both of those columns so you would create an index give it a unique constraint and you use both of the columns in the creation of the index and what that does is to ensure that there can only be a single row where an artist and a genre are contained now this is all getting a bit a satiric and a bit abstract and confusing so in the show notes you will find all of the code which implement this real-world example of artist genre in a partial music database system implementing a many to many to many many to many relationships and what I have used as an RDBMS in this case is SQLite SQLite which is the world's widest most used RDBMS that's a very grand claim but it's a claim which is perfectly justified because if you have a smartphone or a blackberry android phone or a eye device in your pocket or if you have a satellite TV receiver or a TV at home and chances are each of those devices containing an embedded operating system be it some flavor of Linux or whatever the app or operating system is or window C or whatever windows phones now contain they all probably use SQLite in fact they're almost definitely do so that's why SQLite can and does develop to the claim that it's the most widely used RDBMS in the world now SQLite is very easy to install on Linux and in fact a lot of packages that are contained or installed on Linux platform will already make use of SQLite so you probably have SQLite libraries and development libraries but you may need to install the interactive prompt on Arch Linux it's called SQLite 3 in the show I can't remember exactly what it is on Debian and Ubuntu but it's in the show notes anyway so once you've installed SQLite 3 you then have an interactive prompt which you can enter create the database and merely create and populate tables and run queries the show notes contain all of the files which I created to simulate the many-to-money relationship data definition language CSV data comma separated values data to load into the three tables that we're going to create load scripts SQL scripts to select data to demonstrate how to select artists in given genres or super sets of genres and scripts for dumping those results sets to other CSV files so I have a good look at the show notes it's a very very comprehensive as I say all of the files that I used in generating the example of there in each case each file is topped and tailed with the string dash-snip dash dash this is because in a SQLite SQL command or query file two dashes begins a line comment so dash-snip dash dash will not affect the contents of the file the only example where you don't need to take out the dash-snip dash dash will be in the CSV data files because of course you need the data to be there on its own I've included a couple of artists which appear in more than one genre and Irish folk rock band called horse lips which crosses from the description it both into folk and rock a Scottish band called run rig that also belong in folk and rock there are some artists there which belong only in rock a couple which belong only in folk and a couple of classical artists just for good measure and all of the mechanisms that you'll need to test or demonstrate the the workings of a many to many relationship but to emphasize again this I've written or I've been involved in a design and implementation of some quite large client server oracle and my SQL database systems and it's been my task to implement many to many relationships on several occasions and I can I have never found a better solution than this a better solution that does not compromise normalization or lead to an even worse situation which is where you need to drag big record sets back across the network probably containing records that you will ultimately drop because that's wasteful in bandwidth and it's wasteful in client side processing cycles and as for doing things like putting comma separated lists of fields into a single column in a database oh no no no no no don't do it it's bad you know who you are so there we are I sort of concluded it's been a probably a difficult explanation to follow but look at the show notes because the show notes are very very comprehensive and as I said they contain a complete worked example implemented with SQLite 3 which despite being small and compact and fast is an exceptionally powerful RDBMS for its size and for its it's not even I don't think it's even GPL it's just free it's just in the public domain it's free as a bird it's a very realistic very realistic till don't use it if you intend to write a relational database system which is going to have multiple concurrent users if you're going to start an airline and have booking clerks all around the world who need to access the database at the same time SQLite ain't going to do it SQLite really is more suited to single user embedded systems or small small one-off programs you know embedded inside a software program for use by only the programmer or by only only by use for use only by the person who is using the software at that time in fact was one really good feature of SQLite which is the ability to create a database in memory that's a database that does not exist on disk if you name the database colon memory colon it exists only in memory and then it's possible to define tables and columns and relationships perhaps into which to load the contents of an XML file or a bunch of XML files or JSON files or whatever in order to make blisteringly fast queries of configuration data which if it's changed anyway can then be dumped back out to an XML file at a later time so I think that's it long and long-winded boring technical difficulty grasp just by listening to what I'm saying it's really one of those things where you're going to need to look at a diagram or peruse at the very least the textual diagram in order to be able to understand exactly what these relationships are and exactly how the mechanism works but trust me this is as far as I can as far as my experience tells me the only way in which to implement a many to many relationship in a relational database system you've been listening to heck of public radio and heck of public radio dot org we are a community podcast network that release the shows every week day Monday through Friday today showed like all our shows was contributed by a HDR listener like yourself if you ever thought of recording a podcast and click on our contribute in to find out how easy it really is heck of public radio was found by the digital dot pound and the informomicon computer club and it's part of the binary revolution of being rev.com if you have comments on today show please email the host directly leave a comment on the website or record a follow up episode yourself on their otherwise status today show is released on the creative comments attribution share it like the dot org license