Solving the Detached Many-to-Many Problem with the Entity Framework
Introduction
This article is part of the ongoing series I’ve been writing recently, but can be read as a standalone article. I’m going to do a better job of integrating the changes documented here into the ongoing solution I’ve been building.
However, considering how much time and effort I put into solving this issue, I’ve decided to document the approach independently in case it is of use to others in the interim.
The Problem Defined
This issue presents itself when you are dealing with disconnected/detached Entity Framework POCO objects, as the DbContext doesn’t track changes to detached entities. Specifically, trouble occurs with entities participating in a many-to-many relationship, where the EF has hidden a “join table” from the model itself.
The problem with detached entities is that the data context has no way of knowing what changes have been made to an object graph without fetching the data from the data store and doing an entity-by-entity comparison, and that’s assuming it’s possible to fetch the data the same way it was originally fetched.
In this solution, all the entities are detached, don’t use proxy types and are designed to move between WCF service boundaries.
Some Inspiration
There are no out-of-the-box solutions that I’m aware of which can process POCO object graphs that are detached.
- I did find an interesting solution called GraphDiff which is available from github and also as a NuGet package, but it didn’t work with the latest RC version of the Entity Framework (v6).
- I also found a very comprehensive article on how to implement a generic repository pattern with the Entity Framework, but it was unable to handle detached many-to-many relationships. In any case, I highly recommend reading this article; it was the inspiration for some of the approach I’ve ended up taking with my own design.
The Approach
This morning I put together a simple data model with the relationships that I wanted to support with detached entities. I’ve attached the solution with a sample schema and test data at the bottom of this article. If you prefer to open and play with it, be sure to add the Entity Framework (v6 RC) via NuGet (I’ve omitted it for file size and licensing reasons).
Here’s a logical view of the model I wanted to support:
Here’s the schema view from SQL Server:
Here’s the Entity Model which is generated from the above SQL schema:
In the spirit of punching myself in the head, I’ve elected to have one table implement an identity specification (meaning the underlying schema allocates PK ID values), whereas for the other two tables the ID must be specified.
Theoretically, if I can handle the entity types in a generic fashion, then this solution can scale out to larger and more complex models.
The scenarios I’m specifically looking to solve in this solution with detached object graphs are as follows:
- Add a relationship (many-to-many)
- Add a relationship (FK-based)
- Update a related entity (many-to-many)
- Update a related entity (FK-based)
- Remove a relationship (many-to-many)
- Remove a relationship (FK-based)
Per the above, here are the scenarios within the context of the above data model:
- Add a new Secondary entity to a Primary entity
- Add an Other entity to a Secondary entity
- Update a Secondary entity by updating a Primary entity
- Update an Other entity from a Secondary entity (or Primary entity)
- Remove (but not delete!) a Secondary entity from a Primary entity
- Remove (but not delete) an Other entity from a Secondary entity
Establishing Test Data
Just to give myself a baseline, the data model is populated (by default) with the following data. This gives us some “existing entities” to query and modify.
More work for the consumer
Although I tried my best, I couldn’t come to a design which didn’t require the consuming client to do slightly more work to enable this to work properly. Unfortunately the best place for change tracking to occur with disconnected entities is with the layer making changes – be it a business layer or something downstream.
To this effect, entities will need to implement a property which reflects the state of the entity (added, modified, deleted etc.). For the object graph to be updated/managed successfully, the consumer of the entities needs to set the entity state properly. This isn’t at all as bad as it sounds, but it’s not nothing.
Establishing some Scaffolding
After generating the data model, the first thing to be done is to ensure each entity derives from the same base class (“EntityBase”). This is used later to establish the active state of an entity when it needs to be processed. I’ve also created an enum (“ObjectState”), which is exposed as a property of the base class, and a helper function which maps ObjectState to an EF EntityState. In case this isn’t clear, here’s a class view:
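Since the class view is only an image, here is a minimal sketch of what that scaffolding might look like. The names EntityBase, ObjectState, HelperFunctions.ConvertState and the enum values all appear in this article; the order of the enum members and the choice to map both Unchanged and Processed to EntityState.Unchanged are assumptions on my part, not necessarily what the attached solution does.

```csharp
using System.Data.Entity; // EF6: the EntityState enum lives in this namespace

// Client-side state flag carried by every entity.
// Assumption: Unchanged comes first so that it is the default value.
public enum ObjectState
{
    Unchanged,
    Added,
    Modified,
    Deleted,
    Processed
}

// Common base class; each generated entity would derive from this.
public abstract class EntityBase
{
    public ObjectState State { get; set; }
}

public static class HelperFunctions
{
    // Maps the client-side flag onto the state the Entity Framework understands.
    public static EntityState ConvertState(ObjectState state)
    {
        switch (state)
        {
            case ObjectState.Added:
                return EntityState.Added;
            case ObjectState.Modified:
                return EntityState.Modified;
            case ObjectState.Deleted:
                return EntityState.Deleted;
            default:
                // Assumption: Unchanged and Processed both mean "nothing left to do".
                return EntityState.Unchanged;
        }
    }
}
```

If the entities cross a WCF boundary, the State property would presumably also need to be part of the data contract so the flag survives serialization.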
Constructing Data Access
To ensure that the usage is consistent, I’ve defined a single Data Access class, mainly to establish the pattern for handling detached object graphs. I can’t stress enough that this is not intended as a guide to an appropriate way to structure your data access – I’ll be updating my ongoing series of articles to go into more detail – this is only to articulate a design approach to handling detached object graphs.
Having said all that, here’s a look at my “DataAccessor” class, which can be used with any entity type (by way of generics):
As with my ongoing project, the Entity Framework DbContext is instantiated by this class on construction, and the class implements IDisposable to ensure the DbContext is disposed properly when the accessor itself is disposed. Here’s the constructor showing the EF configuration options I’m using:
    public DataAccessor()
    {
        _accessor = new SampleEntities();
        _accessor.Configuration.LazyLoadingEnabled = false;
        _accessor.Configuration.ProxyCreationEnabled = false;
    }
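The disposal side isn’t shown above, so here is a rough sketch of how the constructor and Dispose might fit together. DataAccessorSketch and the TContext type parameter are hypothetical stand-ins (TContext plays the role of the generated SampleEntities context) so that the snippet compiles on its own against EF6; the real class in the attached solution may differ.

```csharp
using System;
using System.Data.Entity;

// Hypothetical stand-in for the DataAccessor class described in this article.
public class DataAccessorSketch<TContext> : IDisposable
    where TContext : DbContext, new()
{
    private TContext _accessor;

    public DataAccessorSketch()
    {
        _accessor = new TContext();
        // Detached POCOs: no lazy loading and no proxy types.
        _accessor.Configuration.LazyLoadingEnabled = false;
        _accessor.Configuration.ProxyCreationEnabled = false;
    }

    // Exposed so callers can query directly, as the unit tests do.
    public TContext DataContext
    {
        get { return _accessor; }
    }

    // Releases the underlying DbContext when the using block ends.
    public void Dispose()
    {
        if (_accessor != null)
        {
            _accessor.Dispose();
            _accessor = null;
        }
    }
}
```

Wrapping each unit of work in its own using block, as the tests in this article do, gives every operation a fresh DbContext, which is what makes the entities genuinely detached between calls.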
Updating an Entity
We start with a basic scenario to ensure that the scaffolding has been implemented properly. The scenario is to query for a Primary entity and then change a property and update the entity in the data store.
    [TestMethod]
    public void UpdateSingleEntity()
    {
        Primary existing = null;
        String existingValue = String.Empty;

        using (DataAccessor a = new DataAccessor())
        {
            existing = a.DataContext.Primaries.Include("Secondaries").First();
            Assert.IsNotNull(existing);
            existingValue = existing.Title;
            existing.Title = "Unit " + DateTime.Now.ToString("MMdd hh:mm:ss");
        }
        using (DataAccessor b = new DataAccessor())
        {
            existing.State = ObjectState.Modified;
            b.InsertOrUpdate<Primary>(existing);
        }
        using (DataAccessor c = new DataAccessor())
        {
            existing.Title = existingValue;
            existing.State = ObjectState.Modified;
            c.InsertOrUpdate<Primary>(existing);
        }
    }
You’ll notice that there is nothing particularly significant here, except that the object’s State is reset to Modified between operations. This is necessary because processing an entity marks it as Processed.
Updating a Many-to-Many Relationship
Now things get interesting. I’m going to query for a Primary entity, then I’ll update both a property of the Primary entity itself, and a property of one of the entity’s relationships.
    [TestMethod]
    public void UpdateManyToMany()
    {
        Primary existing = null;
        Secondary other = null;
        String existingValue = String.Empty;
        String existingOtherValue = String.Empty;

        using (DataAccessor a = new DataAccessor())
        {
            //Note that we include the navigation property in the query
            existing = a.DataContext.Primaries.Include("Secondaries").First();
            //">= 1" so the condition matches the message (the test only needs one)
            Assert.IsTrue(existing.Secondaries.Count() >= 1,
                "Should be at least 1 linked item");
        }
        //save the original description
        existingValue = existing.Description;
        //set a new dummy value (with a date/time so we can see it working)
        existing.Description = "Edit "
            + DateTime.Now.ToString("yyyyMMdd hh:mm:ss");
        existing.State = ObjectState.Modified;

        other = existing.Secondaries.First();
        //save the original value
        existingOtherValue = other.AlternateDescription;
        //set a new value
        other.AlternateDescription = "Edit "
            + DateTime.Now.ToString("yyyyMMdd hh:mm:ss");
        other.State = ObjectState.Modified;

        //a new data access class (new DbContext)
        using (DataAccessor b = new DataAccessor())
        {
            //single method to handle inserts and updates

            //set a breakpoint here to see the result in the DB
            b.InsertOrUpdate<Primary>(existing);
        }

        //return the values to the original ones
        existing.Description = existingValue;
        other.AlternateDescription = existingOtherValue;
        existing.State = ObjectState.Modified;
        other.State = ObjectState.Modified;

        using (DataAccessor c = new DataAccessor())
        {
            //update the entities back to normal
            //set a breakpoint here to see the data before it reverts back
            c.InsertOrUpdate<Primary>(existing);
        }
    }
If we actually run this unit test and set the breakpoints accordingly, you’ll see the following in the database:
Database at Breakpoint #1 / Database at Breakpoint #2
Database when Unit Test completes
You’ll notice at the second breakpoint that the descriptions of both entities have been updated.
Examining the Insert/Update code
The function exposed by the “data access” class really just passes through to another private function which does the heavy lifting. This is mainly in case we need to reuse the logic, since it essentially processes state action on attached entities.
    public void InsertOrUpdate<T>(params T[] entities) where T : EntityBase
    {
        ApplyStateChanges(entities);
        DataContext.SaveChanges();
    }
Here’s the definition of the ApplyStateChanges function, which I’ll discuss below:
    private void ApplyStateChanges<T>(params T[] items) where T : EntityBase
    {
        DbSet<T> dbSet = DataContext.Set<T>();
        foreach (T item in items)
        {
            //loads related entities into the current context
            dbSet.Attach(item);
            if (item.State == ObjectState.Added ||
                item.State == ObjectState.Modified)
            {
                dbSet.AddOrUpdate(item);
            }
            else if (item.State == ObjectState.Deleted)
            {
                dbSet.Remove(item);
            }
            foreach (DbEntityEntry<EntityBase> entry in
                DataContext.ChangeTracker.Entries<EntityBase>()
                    .Where(c => c.Entity.State != ObjectState.Processed
                        && c.Entity.State != ObjectState.Unchanged))
            {
                var y = DataContext.Entry(entry.Entity);
                y.State = HelperFunctions.ConvertState(entry.Entity.State);
                entry.Entity.State = ObjectState.Processed;
            }
        }
    }
Notes on this implementation
What this function does is to iterate through the items to be examined, attach them to the current Data Context (which also attaches their children), act on each item accordingly (add/update/remove) and then process new entities which have been added to the Data Context’s change tracker.
For each newly “discovered” entity (and ignoring entities which are unchanged or have already been examined), each entity’s DbEntityEntry is set according to the entity’s ObjectState (which is set by the calling client). Doing this allows the Entity Framework to understand what actions it needs to perform on the entities when SaveChanges() is invoked later.
You’ll also note that I set the entity’s state to “Processed” when it has been examined, so we don’t act on it more than once (for performance purposes).
Fun note: the AddOrUpdate extension method is something I found in the System.Data.Entity.Migrations namespace and it acts as an ‘Upsert’ operation, inserting or updating entities depending on whether they exist or not already. Bonus!
That’s it for adding and updating, believe it or not.
Corresponding Unit Test
The following unit test creates a new many-to-many entity, which is then removed (by relationship) and finally deleted altogether from the database:
    [TestMethod]
    public void AddRemoveRelationship()
    {
        Primary existing = null;

        using (DataAccessor a = new DataAccessor())
        {
            existing = a.DataContext.Primaries.Include("Secondaries")
                .FirstOrDefault();
            Assert.IsNotNull(existing);
        }

        Secondary newEntity = new Secondary();
        newEntity.State = ObjectState.Added;
        newEntity.AlternateTitle = "Unit";
        newEntity.AlternateDescription = "Test";
        newEntity.SecondaryId = 1000;

        existing.Secondaries.Add(newEntity);

        using (DataAccessor a = new DataAccessor())
        {
            //breakpoint #1 here
            a.InsertOrUpdate<Primary>(existing);
        }

        newEntity.State = ObjectState.Unchanged;
        existing.State = ObjectState.Modified;

        using (DataAccessor b = new DataAccessor())
        {
            //breakpoint #2 here
            b.RemoveEntities<Primary, Secondary>(existing,
                x => x.Secondaries, newEntity);
        }

        using (DataAccessor c = new DataAccessor())
        {
            //breakpoint #3 here
            c.Delete<Secondary>(newEntity);
        }
    }
Test Results:
Pre-Test – Breakpoint #1 / Breakpoint #2
Breakpoint #3 / Post execution (new entity deleted)
SQL Profile Trace
Removing a many-to-many Relationship
Now this is where it gets tricky. I’d like to have something a little more polished, but the best I have come up with to date is a separate operation on the data provider which exposes functionality akin to “remove relationship”.
The fundamental problem with how unmodified EF POCO entities work is that, once they are detached, removing a many-to-many relationship means physically removing the related entity from the collection.
When the object graph is sent back for processing, there’s a missing related entity, and the service or data context would have to make an assumption that the omission was on purpose, not to mention that it would have to compare against data currently in the data store.
To make this easier, I’ve implemented a function called “RemoveEntities” which alters the relationship between the parent and the child/children. The one big catch is that you need to specify the navigation property or collection, which might make it slightly undesirable to implement generically. In any case, I’ve provided two options, with the navigation property as a string parameter or as a LINQ expression; they both do the same thing.
    public void RemoveEntities<T, T2>(T parent,
        Expression<Func<T, object>> expression, params T2[] children)
        where T : EntityBase
        where T2 : EntityBase
    {
        DataContext.Set<T>().Attach(parent);
        ObjectContext obj = DataContext.ToObjectContext();
        foreach (T2 child in children)
        {
            DataContext.Set<T2>().Attach(child);
            obj.ObjectStateManager.ChangeRelationshipState(parent,
                child, expression, EntityState.Deleted);
        }
        DataContext.SaveChanges();
    }
Notes on this implementation
The “ToObjectContext” is an extension method, and is akin to (DataContext as IObjectContextAdapter).ObjectContext. This is to expose a more fundamental part of the Entity Framework’s object model. We need this level of access to get to the functionality which controls relationships.
For each child to be removed (note: not deleted from the physical database), we nominate the parent object, the child, the navigation property (collection) and the nature of the relationship change (delete).
Note that this will NOT WORK for Foreign Key defined relationships – more on that below.
To delete entities which have active relationships, you’ll need to drop the relationship before attempting to delete or else you’ll have data integrity/referential integrity errors, unless you have accounted for cascading deletion (which I haven’t).
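For reference, here is a minimal sketch of what an extension method like “ToObjectContext” might look like. The class name DbContextExtensions and the null check are my own assumptions; the cast through IObjectContextAdapter is the standard EF6 way to reach the underlying ObjectContext.

```csharp
using System;
using System.Data.Entity;
using System.Data.Entity.Core.Objects;
using System.Data.Entity.Infrastructure;

public static class DbContextExtensions
{
    // Exposes the lower-level ObjectContext that sits behind a DbContext.
    public static ObjectContext ToObjectContext(this DbContext context)
    {
        if (context == null)
        {
            throw new ArgumentNullException("context");
        }
        // DbContext implements IObjectContextAdapter explicitly, hence the cast.
        return ((IObjectContextAdapter)context).ObjectContext;
    }
}
```

The ObjectStateManager used in RemoveEntities is only reachable through this ObjectContext, which is why the extra hop is needed.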
Example execution:
    using (DataAccessor c = new DataAccessor())
    {
        //c.RemoveEntities<Primary, Secondary>(existing, "Secondaries", s);
        //(or can use an expression):
        c.RemoveEntities<Primary, Secondary>(existing, x => x.Secondaries, s);
    }
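Only the expression-based version of the function is listed in this article, so here is a rough sketch of how a string-based variant could be written. It is shown as a standalone extension method on DbContext so it compiles on its own against EF6; the names RelationshipSketch and RemoveRelationship are hypothetical, and the body simply mirrors the expression-based version using the ChangeRelationshipState overload that takes the navigation property name.

```csharp
using System.Data.Entity;
using System.Data.Entity.Core.Objects;
using System.Data.Entity.Infrastructure;

public static class RelationshipSketch
{
    // Hypothetical string-based counterpart of RemoveEntities: the navigation
    // property is passed by name instead of as a lambda expression.
    public static void RemoveRelationship<T, T2>(this DbContext context,
        T parent, string navigationProperty, params T2[] children)
        where T : class
        where T2 : class
    {
        context.Set<T>().Attach(parent);
        ObjectContext obj = ((IObjectContextAdapter)context).ObjectContext;
        foreach (T2 child in children)
        {
            context.Set<T2>().Attach(child);
            // Marks only the link as deleted; the child row itself is kept.
            obj.ObjectStateManager.ChangeRelationshipState(
                parent, child, navigationProperty, EntityState.Deleted);
        }
        context.SaveChanges();
    }
}
```

The string version avoids building an expression tree but loses compile-time checking, so a typo in the property name only surfaces at run time.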
Removing FK Relationships
As mentioned above, you can’t just edit the relationship to remove an FK-based relationship. Instead, you have to follow the EF practice of setting the FK entity to NULL. Here’s a Unit Test which demonstrates how this is achieved:
    Secondary s = ExistingEntity();
    using (DataAccessor c = new DataAccessor())
    {
        //assumption: "o" is the related Other entity, captured before the link is cleared
        var o = s.Other;

        s.Other = null;
        s.OtherId = null;
        s.State = ObjectState.Modified;
        o.State = ObjectState.Unchanged;
        c.InsertOrUpdate<Secondary>(s);
    }
We use the same “Insert or Update” call, being aware that you have to set the ObjectState properties accordingly.
Note: I’m in the process of testing the reverse removal, i.e. what happens if you want to remove a Secondary entity from an Other entity’s collection.
Deleting Entities
This is fairly straightforward, but I’ve taken a few more precautions to ensure that the entity to be deleted is valid on the server side.
    public void Delete<T>(params T[] entities) where T : EntityBase
    {
        foreach (T entity in entities)
        {
            T attachedEntity = Exists<T>(entity);

            if (attachedEntity != null)
            {
                var attachedEntry = DataContext.Entry(attachedEntity);
                attachedEntry.State = EntityState.Deleted;
            }
        }
        DataContext.SaveChanges();
    }
To understand the above, you should take a look at the implementation of the “Exists” function which essentially checks the data store and local cache to see if there is an attached representation:
    protected T Exists<T>(T entity) where T : EntityBase
    {
        var objContext = ((IObjectContextAdapter)this.DataContext)
            .ObjectContext;
        var objSet = objContext.CreateObjectSet<T>();
        var entityKey = objContext.CreateEntityKey(objSet.EntitySet.Name,
            entity);

        DbSet<T> set = DataContext.Set<T>();
        var keys = (from x in entityKey.EntityKeyValues
                    select x.Value).ToArray();

        //Remember, there can be composite keys, so don't assume there's
        //just one column/one value
        //If a composite key isn't ordered properly, the Set<T>().Find()
        //method will fail, use attributes on the entity to determine the
        //proper order.

        //context.Configuration.AutoDetectChangesEnabled = false;

        return set.Find(keys);
    }
This is a fairly expensive operation which is why it’s pretty much reserved for deletes and not more frequent operations. It essentially determines the target entity’s primary key and then checks whether the entity exists or not.
Note: I haven’t tested this on entities with composite (multi-column) keys, but I’ll get to it at some point. If you have tables with composite keys, you can define the PK key order using attributes on the model entity, but I haven’t done this (yet).
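As a rough illustration of what ordering a multi-column key with attributes could look like, here is a small sketch using the standard data annotation attributes. The OrderLine entity and its properties are entirely hypothetical (they are not part of the sample model), and this style applies to entity classes you control; with a model generated from a database, the key order may instead come from the model itself.

```csharp
using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;

// Hypothetical entity with a two-column primary key.
public class OrderLine
{
    // First part of the key.
    [Key, Column(Order = 0)]
    public int OrderId { get; set; }

    // Second part of the key.
    [Key, Column(Order = 1)]
    public int LineNumber { get; set; }

    public string Notes { get; set; }
}
```

With the order declared this way, the values passed to Find would need to follow the same sequence, e.g. Find(orderId, lineNumber).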
Summary
This article is the culmination of about two days of heavy analysis and investigation. I’ve got a whole lot more to contribute on this topic, but for now, I felt it was worthy enough to post as-is. What you’ve got here is still incredibly rough, and I haven’t done nearly enough testing.
To be honest, I was quite excited by the initial results, which is why I decided to write this post. There’s an incredibly good chance that I’ve missed something in the design and implementation, so please be aware of that. I’ll be continuing to refine this approach in my main series of articles with a much cleaner implementation.
In the meantime though, if any of this helps anyone out there struggling with detached entities, I’ll be glad. There are precious few articles and samples that are up to date, and very few that seem to work. This is provided without any warranty of any kind!
If you find any issues please e-mail me at rob.sanders@sanderstechnology.com and I’ll attempt to refactor/debug and find ways around some of the inherent limitations. In the meantime, there are a few helpful links I’ve come across in my travels on the WWW. See below.
Example Solution Files [ Files ]
Note: you’ll need to add the Entity Framework v6 RC package via NuGet, as I haven’t included it in the archive.
Helpful Links
- http://blog.magnusmontin.net/2013/05/30/generic-dal-using-entity-framework/
- https://github.com/refactorthis/GraphDiff
- http://stackoverflow.com/questions/11686225/dbset-find-method-ridiculously-slow-compared-to-singleordefault-on-id
- http://stackoverflow.com/questions/10381106/cannot-update-many-to-many-relationships-in-entity-framework
- http://stackoverflow.com/questions/8413248/how-to-save-an-updated-many-to-many-collection-on-detached-entity-framework-4-1
- http://stackoverflow.com/questions/6018711/generic-way-to-check-if-entity-exists-in-entity-framework
Published at DZone with permission of Rob Sanders, author and DZone MVB. (source)