Thursday, October 4, 2012

Yield-Return Makes Batches Happy


      For those of you who have not yet met .net's yield-return sorcery, allow me to introduce you to the best friend foreach will ever have.

     Conventional foreach blocks just run through IEnumerables; no feedback- no user warm fuzzies for the progress bar. Even a parallel.foreach is like the cable guy- you call, and wait.

    When processing big data- especially in conjunction with DB persistence- it is critical to provide incremental feedback at runtime. 
    Thanks to a post by webdev_hb on TheCodeProject, I fell in love with this unappreciated tool- and used it to peek inside the ubiquitous  foreach block.

     Here's how.

The ForEach Block:

     Say you have a .net method that populates a DataTable. You want to commit these records to your database. Maybe you use a stored procedure; maybe you lock the table for a BCP. Either way- you're eating server resources, playing try-catch roulette, and your client is stuck in black-box purgatory until you can report back to the UI thread.
    In processing and committing large data sets, you can easily overflow local memory, nuke transaction logs, and cause other avoidable unpleasantness.

    So, you refactor your code to bite off a manageable number of records at a time; let's say 100k- that's a good start. You commit each batch with a method that looks something like this:

public void ToSQL(List<DataTable> dataTables)
{
  foreach (DataTable dtMyDataTable in dataTables)
  {
     // Call your method for committing a DataTable to the Database
     CommitDT(dtMyDataTable, _ConnectionString, _TableOrSpName);
  }
}

 

The Pathetic UI: 

      Great: now you've fixed all of the runtime issues by bulk-copying friendly blocks of data- you have a happy DB, and a happy app server. 
      Not Great: your UI is playing hold music, and your user has gone to lunch.

 

The Fix:

      Now, let's get feedback from the foreach block so the user has something to root for.

public IEnumerable<Double> ToSQL(List<DataTable> dataTables)
{
  foreach (DataTable dtMyDataTable in dataTables)
  {
      double dProgress = 
           (double)dtMyDataTable.Rows.Count / (double)_TotalRecordCount;

     // Call your method for committing a DataTable to the Database

     CommitDT(dtMyDataTable, _ConnectionString, _TableOrSpName);

     // Return a progress update after each iteration with yield return
     yield return dProgress;
  }
  
  // We're done processing, so let's tell the caller explicitly with yield break
  yield break;


 

The Result:

       Where you once had an arbitrarily long wait and a static UI, you now have feedback at runtime. You can use the Double return type used here to update a progress bar, or you can get creative and return the entire DataTable for more detailed feedback. While you're at it, mark a timestamp, or recover gracefully from an exception, or just go bother your coworkers with your new passion for yield-return.