
Faking your LINQ provider part 1
I recently tried to figure out how to write testable code while using LINQ to SQL as my O/RM of choice, without losing the ability to use LINQ to Expression trees! In this post I describe the design I ended up with.
Friday, someone at Stack Overflow asked how to hide LINQ-enabled persistence frameworks behind an abstraction (the repository pattern). I pointed him to a question on Stack Overflow that I answered a few days earlier, in which I explained how to make your LINQ to SQL project unit testable. Friday’s question, however, was about being able to easily change the O/RM later on, specifically when multiple data stores / databases are involved.
Let me start by saying that, due to the current differences in behavior and quality between open source and commercial LINQ provider implementations, it is hard to completely abstract away such an implementation while still using LINQ to Expression queries that are effectively translated to database queries. If you want to be able to easily switch O/RM technology later on, you should not only have enough unit tests, but especially enough integration tests that verify the interaction between your LINQ queries and the database (with the persistence technology in between). Business transactions that involve deletes are also something you should cover with integration tests, because the behavior of deleting objects can differ between O/RM frameworks in surprising ways.
For the project I'm currently working on, I use LINQ to SQL as my O/RM tool, and I was faced with the problem of unit testing. I wrote this answer on Stack Overflow with the experience gained on this project. That answer explains a simplified model of what I designed. What it didn’t show, however, was that my actual solution was specifically designed to deal with multiple data sources. Friday’s question triggered me to write this actual design down. What I tried to achieve was the following:
- Abstract LINQ to SQL away enough to allow unit testing.
- Create the abstractions in such a way that adding new entities and new data sources takes very little code, both in the application and in the tests.
- Keep it as DRY as a bone.
- Have a model that closely mimics the API of LINQ to SQL.
The goal of my design was to have one unit of work per data source, but without the need for different implementations during testing. What I ended up with was a design where the persistence technology is abstracted away behind an IDataMapper interface. The unit of work class depends on the IDataMapper, but not on the specific O/RM.
In my design the unit of work would be a container of repositories. These repositories would simply implement IQueryable<T> (such as IQueryable<Customer> for a customer repository), because this allows us to use LINQ queries over the repository.
The typical caller of those data source specific unit of work classes would be a business command or service class. While I could easily configure my favorite IoC framework to create a new instance of a unit of work and inject it into a command, I preferred the creation of a new unit of work to be more explicit. Units of work must be committed and disposed, which makes ownership important. I wanted to make the creation and disposal of those objects very explicit in my code. For this reason I decided to write factories for the creation of units of work.
Let’s go through the different parts of the code, starting with the Units of work:
Unit of Work
The Unit of Work pattern describes a way to coordinate the writing out of changes in a business transaction. It allows you to make a series of changes in memory and commit them atomically. In my design I have a class per data source. For instance, below is an example of a unit of work for the Northwind database:
public sealed class NorthwindUnitOfWork : IDisposable
{
    private readonly IDataMapper mapper;

    public NorthwindUnitOfWork(IDataMapper mapper)
    {
        this.mapper = mapper;
    }

    public Repository<Customer> Customers
    {
        [DebuggerStepThrough]
        get { return this.mapper.GetRepository<Customer>(); }
    }

    public Repository<Employee> Employees
    {
        [DebuggerStepThrough]
        get { return this.mapper.GetRepository<Employee>(); }
    }

    public Repository<Order> Orders
    {
        [DebuggerStepThrough]
        get { return this.mapper.GetRepository<Order>(); }
    }

    [DebuggerStepThrough]
    public void SubmitChanges()
    {
        this.mapper.Save();
    }

    [DebuggerStepThrough]
    public void Dispose()
    {
        this.mapper.Dispose();
    }
}
The NorthwindUnitOfWork wraps an IDataMapper and forwards the Dispose and SubmitChanges calls to the mapper. The NorthwindUnitOfWork also contains a set of properties that represent the different repositories. A repository implements IQueryable<T> and allows the retrieval of entities of a certain type. As you can see, all properties return the generic Repository<T> class. Here is the definition of Repository<T>:
public abstract class Repository<T> : IQueryable<T>
    where T : class
{
    private readonly IQueryable<T> query;

    protected Repository(IQueryable<T> query)
    {
        this.query = query;
    }

    public Type ElementType
    {
        get { return this.query.ElementType; }
    }

    public Expression Expression
    {
        get { return this.query.Expression; }
    }

    public virtual IQueryProvider Provider
    {
        get { return this.query.Provider; }
    }

    public abstract void InsertOnSubmit(T entity);

    public abstract void DeleteOnSubmit(T entity);

    public void InsertAllOnSubmit(IEnumerable<T> entities)
    {
        foreach (var entity in entities)
        {
            this.InsertOnSubmit(entity);
        }
    }

    public void DeleteAllOnSubmit(IEnumerable<T> entities)
    {
        foreach (var entity in entities)
        {
            this.DeleteOnSubmit(entity);
        }
    }

    public IEnumerator<T> GetEnumerator()
    {
        return this.query.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return this.query.GetEnumerator();
    }
}
The repository decorates an IQueryable<T>. The sole reason to have a Repository<T> instead of simply returning IQueryable<T> is the InsertOnSubmit and DeleteOnSubmit methods. In my initial design I implemented InsertOnSubmit and DeleteOnSubmit as instance methods on the unit of work (as Entity Framework does). However, when I added the InsertAllOnSubmit and DeleteAllOnSubmit methods, things went wrong. I wrote code that looked a bit like this:
var customer =
    context.Customers.GetById(this.CustomerId);

context.DeleteOnSubmit(customer.Orders);
Note that I incorrectly called DeleteOnSubmit instead of DeleteAllOnSubmit. This compiled fine, because DeleteOnSubmit accepted an object, and a collection of orders is of course an object. I think this was one of the reasons why the C# team decided to put those insert and delete methods on LINQ to SQL’s Table<T> class. So I wanted to have those methods on the repository, and that led to a design with the Repository<T> class.
Together, the NorthwindUnitOfWork and the Repository<T> now mimic the LINQ to SQL API very closely. Because of this, for the most part I didn’t have to change my code to get this to work, which is a big plus when you want to add this to an existing project.
Having an API close to that of LINQ to SQL wasn’t enough for me, however. The idea of the Repository pattern is to have entity specific operations that allow fetching or deleting entities of a certain type, such as GetOrderById or FindCustomerByName. In the past I solved this problem by writing these operations as extension methods on Table<Order> and Table<Customer>. By writing extension methods on IQueryable<Order> and IQueryable<Customer>, it is, from a user’s perspective, just as if these methods were defined on the repository itself:
public static class NorthwindRepositoryExtensions
{
    public static Customer GetById(
        this IQueryable<Customer> repository, string id)
    {
        return GetSingle(repository, e => e.Id == id, id);
    }

    public static Employee GetById(
        this IQueryable<Employee> repository, int id)
    {
        return GetSingle(repository, e => e.Id == id, id);
    }

    public static Order GetById(
        this IQueryable<Order> repository, int id)
    {
        return GetSingle(repository, e => e.Id == id, id);
    }

    // TODO: More GetById methods here.

    // Allow reporting more descriptive error messages.
    private static T GetSingle<T>(IQueryable<T> collection,
        Expression<Func<T, bool>> predicate, object id)
        where T : class
    {
        T entity;

        try
        {
            entity = collection.SingleOrDefault(predicate);
        }
        catch (Exception ex)
        {
            throw new InvalidOperationException(string.Format(
                "There was an error retrieving an {0} with " +
                "id {1}. {2}",
                typeof(T).Name, id ?? "{null}", ex.Message), ex);
        }

        if (entity == null)
        {
            throw new KeyNotFoundException(string.Format(
                "{0} with id {1} was not found.",
                typeof(T).Name, id ?? "{null}"));
        }

        return entity;
    }
}
While it would be sufficient to implement those GetById methods with return repository.Single(e => e.Id == id), I quickly found out that this would give very little information on failure. A private method that throws a more descriptive exception was the solution.
Data Mapper
To keep the unit of work classes, such as NorthwindUnitOfWork, ignorant of the chosen technology, a mapping must be in place between that type and the persistence framework in use. This is what the IDataMapper interface is for:
public interface IDataMapper : IDisposable
{
    Repository<T> GetRepository<T>() where T : class;

    void Save();
}
If you look closely at the definition of this interface, you’ll notice that the data mapper is in fact a unit of work. As a matter of fact, if you wish, you could use the IDataMapper directly in your business layer instead of a NorthwindUnitOfWork and its extension methods. The main reason for them to exist is syntactic sugar; it makes the code much cleaner. In fact, it’s only because C# lacks extension properties that we need a NorthwindUnitOfWork at all.
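To make that concrete, here is a quick sketch that contrasts the two (CreateMapper() is just a placeholder for whatever produces the IDataMapper in your application; the customer id is an arbitrary example):

// Using the IDataMapper directly: this works, but reads noisier.
using (IDataMapper mapper = CreateMapper())
{
    var customer = mapper.GetRepository<Customer>().GetById("ALFKI");
    mapper.Save();
}

// Using the NorthwindUnitOfWork: the same operations, but the
// repository properties make the code read much more naturally.
using (var context = new NorthwindUnitOfWork(CreateMapper()))
{
    var customer = context.Customers.GetById("ALFKI");
    context.SubmitChanges();
}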
With the definition of the IDataMapper, it is easy to create implementations for specific O/RM frameworks. Here is an implementation for LINQ to SQL:
public sealed class LinqToSqlDataMapper : IDataMapper
{
    private readonly DataContext context;
    private readonly Dictionary<Type, object> repositories =
        new Dictionary<Type, object>();

    public LinqToSqlDataMapper(DataContext context)
    {
        this.context = context;
    }

    public void Save()
    {
        this.context.SubmitChanges();
    }

    public void Dispose()
    {
        this.context.Dispose();
    }

    public Repository<T> GetRepository<T>() where T : class
    {
        object rep;

        if (!this.repositories.TryGetValue(typeof(T), out rep))
        {
            var table = this.context.GetTable<T>();
            rep = new LinqToSqlRepository<T>(table);
            this.repositories[typeof(T)] = rep;
        }

        return (Repository<T>)rep;
    }

    private sealed class LinqToSqlRepository<T> : Repository<T>
        where T : class
    {
        private readonly Table<T> table;

        public LinqToSqlRepository(Table<T> table)
            : base(table)
        {
            this.table = table;
        }

        public override void InsertOnSubmit(T entity)
        {
            this.table.InsertOnSubmit(entity);
        }

        public override void DeleteOnSubmit(T entity)
        {
            this.table.DeleteOnSubmit(entity);
        }
    }
}
The LinqToSqlDataMapper wraps a LINQ to SQL DataContext class and forwards its Save and Dispose calls to the DataContext. Its GetRepository<T> method returns LinqToSqlRepository<T> instances, which wrap Table<T> instances. As you can see, the code is fairly simple.
The implementation for Entity Framework is a bit more tricky:
public sealed class EntityFrameworkDataMapper : IDataMapper
{
    private readonly ObjectContext context;
    private readonly Dictionary<Type, object> repositories =
        new Dictionary<Type, object>();

    public EntityFrameworkDataMapper(ObjectContext context)
    {
        this.context = context;
    }

    public void Save()
    {
        this.context.SaveChanges();
    }

    public void Dispose()
    {
        this.context.Dispose();
    }

    public Repository<T> GetRepository<T>() where T : class
    {
        object rep;

        if (!this.repositories.TryGetValue(typeof(T), out rep))
        {
            string setName = this.GetEntitySetName<T>();
            var query = this.context.CreateQuery<T>(setName);
            rep = new EntityRepository<T>(query, setName);
            this.repositories[typeof(T)] = rep;
        }

        return (Repository<T>)rep;
    }

    private string GetEntitySetName<T>()
    {
        EntityContainer container =
            this.context.MetadataWorkspace.GetEntityContainer(
                this.context.DefaultContainerName, DataSpace.CSpace);

        return (
            from item in container.BaseEntitySets
            where item.ElementType.Name == typeof(T).Name
            select item.Name).First();
    }

    private sealed class EntityRepository<T>
        : Repository<T> where T : class
    {
        private readonly ObjectQuery<T> query;
        private readonly string entitySetName;

        public EntityRepository(ObjectQuery<T> query,
            string entitySetName)
            : base(query)
        {
            this.query = query;
            this.entitySetName = entitySetName;
        }

        public override void InsertOnSubmit(T entity)
        {
            this.query.Context.AddObject(entitySetName, entity);
        }

        public override void DeleteOnSubmit(T entity)
        {
            this.query.Context.DeleteObject(entity);
        }
    }
}
For Entity Framework you need to do a bit more work to create repositories, because Entity Framework expects you to supply the entity set name, which is normally the plural form of the entity name, but can in fact be anything. This entity set name is also needed for inserting entities. As you can see, the internal EntityRepository<T> forwards insert and delete calls back to the context, because that’s how Entity Framework likes it (but I don’t).
UPDATE: Note that the ObjectSet<T> class of Entity Framework 4.0 now contains an AddObject(T) method (which was missing in .NET 3.5), which makes writing the EntityRepository<T> much easier.
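For those on .NET 4.0, here is a rough sketch of what that simplified repository could look like; it assumes you create the set with ObjectContext.CreateObjectSet<T>() in GetRepository<T>, instead of calling CreateQuery<T> with an entity set name:

// Sketch for EF 4.0 only: ObjectSet<T> knows its own entity
// set, so the set name no longer needs to be passed around.
private sealed class EntityRepository<T> : Repository<T>
    where T : class
{
    private readonly ObjectSet<T> set;

    public EntityRepository(ObjectSet<T> set)
        : base(set)
    {
        this.set = set;
    }

    public override void InsertOnSubmit(T entity)
    {
        this.set.AddObject(entity);
    }

    public override void DeleteOnSubmit(T entity)
    {
        this.set.DeleteObject(entity);
    }
}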
For unit testing we of course want to have an in-memory representation of the objects:
public class InMemoryDataMapper : IDataMapper
{
    private List<object> committed = new List<object>();
    private List<object> uncommittedInserts = new List<object>();
    private List<object> uncommittedDeletes = new List<object>();

    public bool Saved { get; private set; }

    public bool Disposed { get; private set; }

    // Get a list with all committed objects of type T.
    public IEnumerable<T> Committed<T>() where T : class
    {
        return this.committed.OfType<T>();
    }

    public void AddCommitted(object entity)
    {
        this.committed.Add(entity);
    }

    public Repository<T> GetRepository<T>() where T : class
    {
        return new InMemoryRepository<T>(this);
    }

    public void Save()
    {
        this.committed.AddRange(this.uncommittedInserts);
        this.uncommittedInserts.Clear();

        this.committed.RemoveAll(
            e => this.uncommittedDeletes.Contains(e));
        this.uncommittedDeletes.Clear();

        this.Saved = true;
    }

    public void Dispose()
    {
        this.Disposed = true;
    }

    private sealed class InMemoryRepository<T> : Repository<T>
        where T : class
    {
        private readonly InMemoryDataMapper mapper;

        public InMemoryRepository(InMemoryDataMapper mapper)
            : base(mapper.committed.OfType<T>().AsQueryable())
        {
            this.mapper = mapper;
        }

        public override void InsertOnSubmit(T entity)
        {
            if (this.mapper.committed.Contains(entity))
                Assert.Fail("Entity already exists.");

            this.mapper.uncommittedInserts.Add(entity);
        }

        public override void DeleteOnSubmit(T entity)
        {
            if (!this.mapper.committed.Contains(entity))
                Assert.Fail("Entity does not exist.");

            this.mapper.uncommittedDeletes.Add(entity);
        }
    }
}
The AddCommitted method is especially useful during test setup. You typically want to configure the InMemoryDataMapper with a set of committed objects. This correctly mimics how LINQ to SQL and Entity Framework work. The Committed<T>() method is useful during the assertion phase of your tests. With this method you can check if the objects you expect are indeed committed.
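For instance, a test could be arranged and asserted like this (a sketch; it assumes a Customer test entity with a settable Id property):

// Arrange: start the test with one customer already 'committed'.
var mapper = new InMemoryDataMapper();
mapper.AddCommitted(new Customer { Id = "ALFKI" });

// ... run the code under test here ...

// Assert: verify that the expected objects are indeed committed.
Assert.AreEqual(1, mapper.Committed<Customer>().Count());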
Unit of Work factories
The last piece of the puzzle is the unit of work factories. Each unit of work gets its own factory. As I explained, I like the creation of objects that implement IDisposable to be very explicit. Factories help with this.
In my project I had to deal with multiple data stores. To minimize the number of needed interfaces I decided to define a single generic interface for creating unit of work classes:
public interface IUnitOfWorkFactory<TUnitOfWork>
{
    TUnitOfWork CreateNew();
}
While there is one generic interface, you’ll still need to have one concrete factory for the chosen persistence technology. For instance, here is the factory for creating NorthwindUnitOfWork instances with LINQ to SQL:
public class LinqToSqlNorthwindUnitOfWorkFactory
    : IUnitOfWorkFactory<NorthwindUnitOfWork>
{
    private static MappingSource Source =
        new AttributeMappingSource();

    private readonly string conStr;

    public LinqToSqlNorthwindUnitOfWorkFactory(string conStr)
    {
        this.conStr = conStr;
    }

    public NorthwindUnitOfWork CreateNew()
    {
        var db = new DataContext(this.conStr, Source);
        var mapper = new LinqToSqlDataMapper(db);
        return new NorthwindUnitOfWork(mapper);
    }
}
What you might notice is that the factory is not only data store specific, but also O/RM specific. I tried to define a technology ignorant NorthwindUnitOfWorkFactory by creating an IDataMapperFactory interface, but unfortunately this didn’t work out. For performance reasons, the unit of work factory needs a static MappingSource that is specific to the actual data store. Entity Framework has the same sort of constraint: it needs a default container name, which is specific to the data store. Because of this it isn’t possible to extract this code to an IDataMapperFactory implementation.
Here is the factory when using Entity Framework:
public class EntityFrameworkNorthwindUnitOfWorkFactory
    : IUnitOfWorkFactory<NorthwindUnitOfWork>
{
    public NorthwindUnitOfWork CreateNew()
    {
        var db = new ObjectContext("name=NorthwindEntities");
        db.DefaultContainerName = "NorthwindEntities";
        var mapper = new EntityFrameworkDataMapper(db);
        return new NorthwindUnitOfWork(mapper);
    }
}
Because constructing units of work in a unit testing environment is much simpler, we don’t need a factory per data store. We can simply create a factory for creating unit of work factories :-). The CreateFactory method creates a factory that returns the supplied unit of work:
public static class FakeUnitOfWorkFactoryFactory
{
    public static IUnitOfWorkFactory<TUnitOfWork>
        CreateFactory<TUnitOfWork>(TUnitOfWork uow)
    {
        return new FakeFactory<TUnitOfWork>()
        {
            UnitOfWork = uow
        };
    }

    private sealed class FakeFactory<TUnitOfWork>
        : IUnitOfWorkFactory<TUnitOfWork>
    {
        public TUnitOfWork UnitOfWork { get; set; }

        public TUnitOfWork CreateNew()
        {
            return this.UnitOfWork;
        }
    }
}
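In a test, wiring this up then becomes a matter of a few lines (a sketch, using the InMemoryDataMapper from the previous section):

var mapper = new InMemoryDataMapper();

var factory = FakeUnitOfWorkFactoryFactory.CreateFactory(
    new NorthwindUnitOfWork(mapper));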
IoC Configuration
With all this plumbing in place we can now configure our IoC framework. The configuration is really straightforward, because we only need to register the concrete unit of work factories, as follows:
string northwindConnection = GetConStr("Northwind");

container.RegisterSingle<IUnitOfWorkFactory<NorthwindUnitOfWork>>(
    new LinqToSqlNorthwindUnitOfWorkFactory(northwindConnection));

container.RegisterSingle<IUnitOfWorkFactory<SalesUnitOfWork>>(
    new EntityFrameworkSalesUnitOfWorkFactory());
Application code
I almost forgot the reason why we developers get paid: to create code that helps the business. This is what some business command might look like when we use this code:
public class SomeBusinessCommand
{
    public string CustomerId { get; set; }
}

public class SomeBusinessCommandHandler
    : IHandle<SomeBusinessCommand>
{
    private IUnitOfWorkFactory<NorthwindUnitOfWork> factory;

    public SomeBusinessCommandHandler(
        IUnitOfWorkFactory<NorthwindUnitOfWork> factory)
    {
        this.factory = factory;
    }

    public void Handle(SomeBusinessCommand command)
    {
        // Create a new context using the factory.
        using (var context = this.factory.CreateNew())
        {
            // Use the extension methods on Repository<T>.
            var customer =
                context.Customers.GetById(command.CustomerId);

            // Use LINQ queries to effectively filter data.
            var ordersToDelete =
                from order in context.Orders
                where order.Customer == customer
                where order.ShippedDate == null
                select order;

            // Use the delete operation on Repository<T>.
            context.Orders.DeleteAllOnSubmit(ordersToDelete);

            // Save the changes to the database.
            context.SubmitChanges();
        }
    }
}
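And to show how all the pieces come together, here is a sketch of a unit test for this handler (it assumes Customer and Order test entities with settable Id, Customer, and ShippedDate properties, matching what the query above uses):

[TestMethod]
public void Handle_WithUnshippedOrders_DeletesThoseOrders()
{
    // Arrange
    var mapper = new InMemoryDataMapper();

    var customer = new Customer { Id = "ALFKI" };
    mapper.AddCommitted(customer);
    mapper.AddCommitted(new Order
    {
        Customer = customer,
        ShippedDate = null
    });

    var factory = FakeUnitOfWorkFactoryFactory.CreateFactory(
        new NorthwindUnitOfWork(mapper));

    var handler = new SomeBusinessCommandHandler(factory);

    // Act
    handler.Handle(new SomeBusinessCommand { CustomerId = "ALFKI" });

    // Assert
    Assert.IsFalse(mapper.Committed<Order>().Any());
    Assert.IsTrue(mapper.Saved);
}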
UPDATE 2011-12-01: Although letting handlers create unit of work instances (using a factory) is an explicit model that is easy to grasp, I came to the conclusion that it does not scale well. When the complexity of the business logic increases, you will find yourself passing the unit of work on to other classes, which makes the code harder to follow. Currently, I prefer to let the unit of work be created and controlled outside the scope of the handlers, and to configure my DI container in such a way that, within a certain scope, the same unit of work is always injected. This doesn't invalidate the use of unit of work factories, since they can still be used by the parts of the code that control the unit of work, but keep in mind that you need more infrastructural code (DI wiring) to get this to work correctly.
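To give an idea of what that looks like, here is a sketch of the same handler in that model (the DI container, not shown here, is configured to inject the same NorthwindUnitOfWork instance within a scope, and to commit and dispose it when the scope ends):

public class SomeBusinessCommandHandler
    : IHandle<SomeBusinessCommand>
{
    private readonly NorthwindUnitOfWork context;

    // The container injects the unit of work that is active
    // for the current scope (for instance, the current request).
    public SomeBusinessCommandHandler(NorthwindUnitOfWork context)
    {
        this.context = context;
    }

    public void Handle(SomeBusinessCommand command)
    {
        var customer =
            this.context.Customers.GetById(command.CustomerId);

        // ... business logic ...

        // Note: no SubmitChanges and no using block here; the
        // infrastructure that controls the scope commits and
        // disposes the unit of work.
    }
}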
Limitations
Although this code has worked very well for me, there are a few shortcomings that might interest you. First of all, this code doesn’t check whether the chosen LINQ provider (LINQ to SQL, Entity Framework) can even execute your queries. During unit testing we actually use the LINQ provider on top of LINQ to Objects. This provider just compiles the expression tree down to delegates and is able to execute practically any query you give it. Providers that translate the query to SQL (or anything else for that matter) can’t do this. You need to be aware of this and write integration tests, or test manually, if being able to migrate to another persistence framework is a concern. This shortcoming is in fact caused by the concept of LINQ to Expression trees itself. To prevent it, you could have your repositories not implement IQueryable<T> and locate the LINQ queries inside a technology specific repository (which is the usual thing to do). This, however, doesn’t solve anything, because you now have code that never runs in unit tests at all, and you would need to rewrite those classes when you migrate to another technology, or at least you still have to test this code manually. There might be a way around this, by letting your LINQ provider run in the background and verifying the executed queries, but I never managed to create a working proof of concept of this.
A better solution is to give custom queries their own abstraction in the system. This gives you a lot of flexibility, although it still forces you to do integration testing on those queries.
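To give an impression of what such an abstraction could look like (this is just an illustration, not part of the design above; it assumes the Order entity used earlier), a query can be modeled as a message with its own handler, so that the LINQ implementation can later be swapped for a stored procedure or an in-memory stub:

// A message that describes the query.
public class FindUnshippedOrdersQuery
{
    public string CustomerId { get; set; }
}

// An abstraction whose implementations can be swapped per
// technology; only the implementations need integration tests.
public interface IQueryHandler<TQuery, TResult>
{
    TResult Handle(TQuery query);
}

public class FindUnshippedOrdersHandler
    : IQueryHandler<FindUnshippedOrdersQuery, Order[]>
{
    private readonly IUnitOfWorkFactory<NorthwindUnitOfWork> factory;

    public FindUnshippedOrdersHandler(
        IUnitOfWorkFactory<NorthwindUnitOfWork> factory)
    {
        this.factory = factory;
    }

    public Order[] Handle(FindUnshippedOrdersQuery query)
    {
        using (var context = this.factory.CreateNew())
        {
            return (
                from order in context.Orders
                where order.Customer.Id == query.CustomerId
                where order.ShippedDate == null
                select order).ToArray();
        }
    }
}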
The second shortcoming has to do with the difference in delete behavior between frameworks. NHibernate, for instance, runs inserts and deletes in the order they are registered. While this is sometimes impractical, at least it is predictable. LINQ to SQL, on the other hand, always executes delete statements last, which can be quite annoying. Therefore, what works during unit testing or while running with one framework might fail with another.
Third, this design does not completely hide the O/RM tool. One of the differences that might bite you is the lazy loading behavior of sub entities. When accessing the Employee property of an Order, LINQ to SQL will load it lazily from the database. Entity Framework 3.5 returns null, unless you explicitly tell it to include it. (This design really stinks, and because of this the default behavior has changed in EF 4.0.) This will practically prevent you from using EF 3.5 with this design. A good way to prevent this is by using POCO objects. LINQ to SQL allows you to use POCOs, but again, EF 3.5 does not. Also don’t forget that LINQ to SQL only supports a one-to-one mapping between a CLR object and a database table, while other O/RM tools allow very complex mappings. For this reason, migrating from LINQ to SQL to EF 4.0 would be much easier than the other way around.
A fourth shortcoming is the lack of possibilities to tune performance. Remember that there is one single Repository<T> and all entity specific methods are extension methods. Those extension methods need to operate on IQueryable<T>. Because those methods will also be called during testing, you can’t tune performance by calling some stored procedure at that point. Doing so would ‘promote’ your unit tests to integration tests. UPDATE 2011-01-27: Contrary to what I thought before, performance tuning is possible by injecting fetching strategies into the application. UPDATE 2011-06-19: Please read part 2 of this series if you're interested in optimizing performance using fetching strategies.
Yet another shortcoming is that this design does not handle concurrency conflicts. All modern O/RM frameworks have a way to handle optimistic concurrency conflicts (especially offline ones). The way they report errors, however, and the way you have to deal with conflicts, differs. If you look closely at the designs of these tools, I think you can come up with an API that allows not only reporting conflicts (throwing exceptions is the simple part), but also fixing them. This, however, is not something I’ve dealt with. I usually let the application blow up right in my face in the case of a concurrency conflict and log that failure. I think the best way around this is to wrap your business operations with a decorator that handles these concurrency conflicts for you. You can only do this when you give business operations their own abstraction, as I've written about here.
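To sketch the decorator idea (the single retry is a naive policy of my own, and ChangeConflictException is the LINQ to SQL flavor; Entity Framework throws an OptimisticConcurrencyException instead):

// Retries a business operation once when a concurrency conflict
// occurs. This only works when the handler creates and owns its
// unit of work, so that the retry starts from fresh data.
public class ConcurrencyRetryDecorator<TCommand>
    : IHandle<TCommand>
{
    private readonly IHandle<TCommand> decorated;

    public ConcurrencyRetryDecorator(IHandle<TCommand> decorated)
    {
        this.decorated = decorated;
    }

    public void Handle(TCommand command)
    {
        try
        {
            this.decorated.Handle(command);
        }
        catch (ChangeConflictException)
        {
            // One retry; a real policy needs more thought.
            this.decorated.Handle(command);
        }
    }
}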
A last shortcoming I’d like to mention is that this approach will only help you to replace one LINQ provider with another. When you move to Azure, for instance, you will probably have no (or very limited) LINQ support, and the transition will fail. If you want to prevent this you should probably hide the LINQ queries behind interfaces.
As you might have noticed, most shortcomings have to do with LINQ being a leaky abstraction.
UPDATE 2010-11-18: Earlier this year, Damien wrote an interesting extension method that allows you to do eager loading in a persistence ignorant (read: testable) way. You should definitely check it out.
UPDATE 2010-12-01: Dennis Doomen used the design described in this article successfully in one of his applications. He also used this design in his Silverlight Cookbook reference architecture.
Conclusion
As you know, there is not a problem in software design that can’t be solved by adding a layer of abstraction, except of course the problem of too many layers of abstraction :-). While this might seem like a lot of code, don’t forget that you only need the code for the persistence framework you're using (I included code for both LINQ to SQL and Entity Framework). Also, when you don’t need multiple data stores, as I do, the design can be simplified. One of the big plusses of this design, for me, is the amount of code it saves me when writing unit tests.
Cheers
Comments
Hi Steven,
nice post, although I do have a few questions:
- Why have you chosen LINQ to SQL as your provider (if it was a choice at all)?
- Why don't you restrict the type of Repository (apart from the class constraint) to an interface that decorates all your business entities (I assume you're fetching business entities)?
- You abstract the O/RM from your business layer, but a lot of frameworks (which I tend to like) have a dependency on their own entities. How can you handle those kinds of entities in your model, as you can't project LINQ queries to those objects without losing the 'deferred execution' part (as a LINQ query executes on a projection to a different type)?
Hans (URL) - 15 11 10 - 19:57
Hi Steven,
Great article that inspires me to refactor my current project a bit :-)
Some remarks and questions:
* How do you propose to handle specific Repository functionality that needs to rely on the ORM specific API? E.g. using NHibernate's ICriteria to optimize a specific query.
* Since a repository mimics a collection, I would use methods like Add and Remove, rather than Save and Delete
* I would never accept the potential ability to switch ORMs as a reason for introducing abstractions. Testability IS a good reason though.
Dennis Doomen (URL) - 16 11 10 - 19:10
Hans,
Thanks for your response. I couldn't choose my framework version; it is fixed at .NET 2.0/3.5, and the client dictates Microsoft stuff, so the choice was between L2S and EF. And as I said before, EF 3.5 stinks, so I picked LINQ to SQL.
While implementing the entities with an IEntity interface would at least make the roles of those objects explicit (as Udi Dahan preaches), it wouldn't gain much with this particular design. First of all, because calls to the IDataMapper.GetRepository method are abstracted by a unit of work class, compile time support is not an issue. Further, to implement that interface, I would have to write a partial class for every entity that I need a repository for, because, call me old-fashioned, I let L2S generate my entities ;-). Besides that, having an Id property on the IEntity interface would unfortunately not work, because, as far as I know, no LINQ provider can handle interfaces. You must work with concrete types. For that reason I need a GetById method per entity.
While I use an abstraction over LINQ providers, my code uses the default L2S generated classes. While these classes do not have a base type, they have EntitySet sub collections, which are L2S specific. For the most part, letting your entities be generated by another framework wouldn't be much of a problem, as long as you don't use framework specific features on those entities (and of course there will be some specific issues you might run into). Of course it depends on the framework. Deferred execution still works, even when using projections. I do this all the time. LINQ to SQL effectively generates efficient database queries for me. No problem at all.
Steven (URL) - 16 11 10 - 20:09
Dennis,
Handling O/RM specific functionality is a problem with this design, mainly because the repository specific methods are in fact static methods, and those static methods must be persistence ignorant, because they will get called during testing. This design makes it hard to use something like Udi Dahan's IFetchingStrategy (I think you remember his talk from last year's SDN conference).
I'm not sure about having Add and Remove methods, because an IQueryable does not really mimic a collection. The idea of a collection would be that an element is directly visible in the collection after adding it. After calling InsertOnSubmit on LINQ to SQL's Table, however, the inserted element will not be part of the Table until the changes are submitted to the database. An IQueryable is a view over persisted objects; in other words, it mimics a database table. Because of this, the LINQ to SQL API designers cleverly chose the name 'InsertOnSubmit', which clearly describes that the element will only be visible after calling SubmitChanges.
I must admit that the sole reason for me to introduce this abstraction was testability. It was that question on Stack Overflow that triggered me to think about the switchability of this design. However, when you look at the given list of shortcomings, I think we can conclude that it is pretty hard to create an abstraction that doesn't leak and allows you to easily switch from one persistence technology to another, while allowing LINQ over expression tree queries.
Cheers
Steven (URL) - 16 11 10 - 20:41
Steven,
Why don't you implement a DefaultRepository which derives from Repository and tell your IoC framework to return that one if there's no specific repository defined? This way you'll only have to implement the default behavior once and create explicit classes (which immediately makes it nice and clear where to look for specific code) for repositories that need specific code.
Hans (URL) - 16 11 10 - 21:53
That's why they introduced the "Leaky Abstraction" anti-pattern :-) Unfortunately, nobody has solved it, yet.
I'm going to think a bit more about your design. I hate the fact that I currently have to use three or more repository stubs to unit test my business command handlers.
Dennis Doomen (URL) - 17 11 10 - 08:46
Steven,
I posted the following comment on the StackOverflow thread as well for continuity:
Sorry it took so long to respond, but I do appreciate the detailed explanation in your blog post. A couple quick follow-ups:
First, why extension methods versus a strongly-typed subclass?
And second, how would you handle an entity (in business object terms) that required data from two data contexts (e.g. part of the data is stored in a SQL Server db with the rest in a DB2 or Oracle db)? In other words, the business command or service needs to return an object (or list of objects) that is an Aggregate Root (to steal the DDD term) with its data stored across two distinct (and unlinkable) data stores. Would it be up to the service or command to manage two UoW or would there be a different UoW for this operation?
SonOfPirate (URL) - 29 11 10 - 19:35
SonOfPirate,
Using extension methods the way I described in my post allows you to have a single repository class. Having methods like GetById and FindPersonsByName on the repository means we would have to create a class for each entity. To make things worse, you would probably end up defining a test repository for each entity as well. This results in a lot of extra code and thus a lot of work creating and maintaining those classes. That's why I chose extension methods. However, it depends on your project whether this will work for you.
Steven - 29 11 10 - 22:33
Steven,
I guess I'm still not sure why it is better to use extension methods. I disagree that it saves you any code, since you still have to implement and maintain the extension methods. Plus, we use a mocking framework, so I don't manually create test repositories. And, with a dependency injection container, instantiating strongly-typed repositories isn't that big of a deal.
Don't get me wrong, I love extension methods and use them to solve all kinds of problems. But, I develop code that is consumed, used and/or maintained by other developers that might not be on the same page so I'd like to be able to explain and justify the approach.
Here's how I would have created the sub-classes:
public class CustomerRepository : Repository<Customer>
{
    public Customer GetById(string id)
    {
        return Single(Where(e => e.Id == id), id);
    }
}

public class EmployeeRepository : Repository<Employee>
{
    public Employee GetById(int id)
    {
        return Single(Where(e => e.Id == id), id);
    }
}

public class OrderRepository : Repository<Order>
{
    public Order GetById(int id)
    {
        return Single(Where(e => e.Id == id), id);
    }
}
Does it just boil down to personal preference or is there a concrete benefit to using extension methods that I'm simply missing?
SonOfPirate - 30 11 10 - 13:43
No single design is good for every solution. I'm just describing a model that worked well in my situation. You have other tools, other developers, other requirements; it all influences the design. If you're not sure whether to use extension methods or go with sub types, I recommend you write a test project to try it out. Find out how to write your UnitOfWork classes, how to write your tests, and how to hook everything up in your IoC container. Pick what works best for you.
Steven (URL) - 30 11 10 - 15:25
What is missing from your example is true persistence ignorance. Everything you've described works great to decouple and abstract the data access code (and thereby the technology), except that you are strongly typing everything to the LINQ to SQL entity types.
I've had good success accomplishing this by introducing interfaces for the entities that the Unit of Work and Repository classes (as well as calling code) will use. Then I simply create a partial for each generated entity class that implements this interface. And this works great for a simple model. However, ...
Where I get stuck is accomplishing this with one-to-many parent-child relationships. For instance, in your example each Customer entity would implement an ICustomer interface with an Orders property of type ICollection<IOrder>. This will generate a compilation error, as EntitySet can be cast to ICollection<Order> but not to a collection of the interfaces. And using the LINQ Cast<>() method returns an IEnumerable. The only way I've found to satisfy the interface is to implement the property explicitly and return Orders.Cast<IOrder>().ToList(), which, of course, forces the LINQ statement to execute, and any further clauses are then performed against the in-memory copy of the list. I'd prefer a way that returned the list so that additional clauses can be added before the query is executed.
Any thoughts on this?
SonOfPirate - 15 03 11 - 14:41
Hiding your entities behind an interface will not work, because most (if not all) LINQ providers can only work with queries over concrete entities. With interfaces, they just don't know what to do. If you want true persistence ignorance, you should go with POCO entities. Both EF4 and L2S allow you to work with POCOs.
Steven (URL) - 15 03 11 - 15:24
This post is rather old. There is no "RegisterSingle" in the current Unity.
Blaise - 30 10 12 - 16:18
@Blaise, the article doesn't mention Unity. The registration examples are for a hypothetical DI container. For Unity, you will have to use RegisterType and supply a ContainerControlledLifetimeManager. But you might want to try another container, such as http://simpleinjector.codeplex.com/.
Steven (URL) - 30 10 12 - 16:31
Steven,
Considering the changes in EF, do you still defend your proposed design, or is this post old and no longer defensible?
Alireza - 31 01 13 - 09:53