Friday, November 14, 2014

Asynchronous Programming Concepts (.NET related)

Async execution - It's all about responsiveness.
Old code scenarios
Using a WaitHandle with a Task.
Implementing the APM pattern with the FromAsync methods and a "Tasks Forest".
Implementing the EAP (event-based) pattern with TaskCompletionSource.
Asynchronous Task
means a threadless task (no thread is consumed while it is pending).
An asynchronous task can end only in one of these statuses:
Faulted, RanToCompletion, Canceled.
Continuation for Asynchronous Tasks
Use the TaskCompletionSource<T> class.
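As a sketch of the EAP-to-Task idea above, TaskCompletionSource<T> can turn an event-based operation into a threadless task (WebClient is used here purely as a familiar EAP example; the method name is illustrative):

```csharp
using System;
using System.Net;
using System.Threading.Tasks;

static class EapToTask
{
    // Wraps WebClient's event-based (EAP) DownloadStringAsync
    // into a threadless Task<string> via TaskCompletionSource.
    public static Task<string> DownloadStringTaskAsync(Uri address)
    {
        var tcs = new TaskCompletionSource<string>();
        var client = new WebClient();
        client.DownloadStringCompleted += (s, e) =>
        {
            if (e.Error != null)  tcs.TrySetException(e.Error);  // -> Faulted
            else if (e.Cancelled) tcs.TrySetCanceled();          // -> Canceled
            else                  tcs.TrySetResult(e.Result);    // -> RanToCompletion
            client.Dispose();
        };
        client.DownloadStringAsync(address);
        return tcs.Task;
    }
}
```

Note that the three TrySet* calls map exactly onto the three terminal task statuses listed above, and no thread is blocked while the download is in flight.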
async await
Syntactic sugar for an awful amount of work.
The compiler does the heavy lifting behind the scenes.
It handles enumeration, try, finally, using blocks, etc.
Uses the SynchronizationContext implicitly.
Unwrap multiple exceptions
await rethrows only the first exception from an AggregateException.
There is a useful trick to unwrap multiple exceptions:
await task.ContinueWith(
    t => { },
    TaskContinuationOptions.ExecuteSynchronously
);
var results = task.Result; // Result (or Wait) throws the full AggregateException
Multiple await
Preventing lost exceptions when awaiting multiple tasks (await in a Parallel Programming fashion):
- Use the trick from "Unwrap multiple exceptions" above.
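A minimal sketch of both points, using two deliberately throwing tasks: awaiting Task.WhenAll surfaces only the first exception, while the no-op-continuation trick exposes all of them:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

class MultiExceptionDemo
{
    static async Task Run()
    {
        Task first  = Task.Run(() => { throw new InvalidOperationException("A"); });
        Task second = Task.Run(() => { throw new ArgumentException("B"); });
        Task all = Task.WhenAll(first, second);

        try
        {
            await all; // rethrows only the first exception
        }
        catch (InvalidOperationException) { /* "B" is not observed here */ }

        // The trick: await a no-op continuation (which never faults),
        // then inspect the original task to see every exception.
        await all.ContinueWith(t => { }, TaskContinuationOptions.ExecuteSynchronously);
        if (all.Exception != null)
            Console.WriteLine(string.Join(", ",
                all.Exception.InnerExceptions.Select(e => e.Message))); // both messages
    }
}
```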

Wednesday, November 5, 2014

Parallel Programming Concepts (.NET related)

Parallel execution - It's all about compute-heavy work.
Common Types of Parallelism
Data
    amount of data processed by one operation in parallel.
Patterns
    Parallel Loops
Implementations
    Parallel.ForEach
    Parallel.For
Task
    The same data (or pieces of it) processed by many different operations in parallel.
Patterns
    Parallel tasks
Implementations
    Parallel.Invoke
Dataflow
    Operations processed in some flow or order.
Patterns
    Futures
    Pipeline
Embarrassingly parallel
    computations entirely independent of one another. No dataflow between the operations.

Patterns
Parallel Loops
Parallel.For / Parallel.ForEach
Parallel Tasks
Parallel.Invoke
Parallel Aggregation
Map-Reduce
Producer-Consumer
By Parallel Loops with a thread-local state parameter (the localInit/localFinally overloads)
or by Wait All One By One
Futures
Dynamic Task Parallelism
Pipelines
+ Concurrent Data Structures.
Dataflow
(many to one + one to many, ContinueWhenAll)
+Concurrent Data Structures.
Speculative Execution
Wait All One By One
Facade Task (thread saving via APM)
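The Parallel Aggregation pattern with the thread-local state parameter might look like this sketch (SumOfSquares is an illustrative name):

```csharp
using System.Threading;
using System.Threading.Tasks;

class ParallelSum
{
    // Parallel Aggregation: each worker accumulates into its own
    // thread-local subtotal (no sharing inside the loop body), and
    // only the final merge of subtotals needs synchronization.
    static long SumOfSquares(int n)
    {
        long total = 0;
        Parallel.For(0, n,
            () => 0L,                                           // localInit
            (i, state, subtotal) => subtotal + (long)i * i,     // loop body
            subtotal => Interlocked.Add(ref total, subtotal));  // localFinally
        return total;
    }
}
```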

The TPL's main pattern-approach
Fork-join
Custom Partitioning Strategies
range (static)
stride (static)
chunk (dynamic)
hash (dynamic)
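A sketch of two of these strategies via Partitioner.Create (the per-element work is illustrative):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class PartitioningDemo
{
    static void Main()
    {
        var data = new double[1000000];

        // Range (static) partitioning: each worker receives a large
        // contiguous slice, so per-element delegate overhead disappears.
        Parallel.ForEach(Partitioner.Create(0, data.Length), range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
                data[i] = Math.Sqrt(i);
        });

        // Chunk (dynamic) partitioning: workers grab chunks on demand,
        // which balances the load when element costs vary.
        Parallel.ForEach(Partitioner.Create(data, loadBalance: true),
            x => { /* process x */ });
    }
}
```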

Wednesday, January 1, 2014

Hashing for collection's keys


Key objects must be immutable as long as they are used as keys in the Hashtable.

When an element is added to the Hashtable, the element is placed into a bucket based on the hash code of the key. Subsequent lookups of the key use the hash code of the key to search in only one particular bucket, thus substantially reducing the number of key comparisons required to find an element.
GetBucket(item.GetHashCode())

Any object that produces a hash value should change that value when the object changes, but it must not - absolutely must not - allow any changes to itself while it is used inside a Hashtable (or any other hash-using object, of course).

So, just make the GetHashCode result change when the object's data changes, and if use of the object inside hash-using lists or objects is intended (or just possible), then make the object either immutable or add a read-only flag to set for the lifetime of any hashed list containing the object.

(By the way: none of this is C# or .NET specific - it is in the nature of all hashtable implementations, or more generally of any indexed list, that the identifying data of objects must never change while the object is in the list. Unexpected and unpredictable behaviour will occur if this rule is broken. Somewhere there may be list implementations that monitor all elements inside the list and reindex it automatically - but their performance will surely be gruesome at best.)

What is GetHashCode used for?
It is by design useful for only one thing: putting an object in a hash table. Hence the name.

Remember, objects can be put into hash tables in ways that you didn't expect. A lot of the LINQ sequence operators use hash tables internally. Don't go dangerously mutating objects while enumerating a LINQ query that returns them!

GetHashCode may return different values over time, across processes, and across .NET versions.

GetHashCode is designed to do only one thing: balance a hash table. Do not use it for anything else.

See all thoughts at the Guidelines and rules for GetHashCode

From:
MSDN
GetHashCode Guidelines in C#
Guidelines and rules for GetHashCode

Choosing Collections in .NET Framework

(In progress...)

Big O Notation
    http://stackoverflow.com/questions/2307283/what-does-olog-n-mean-exactly
The Generic Collections: System.Collections.Generic namespace
    Associative Collection Classes (stores a value by a key, associates the value with the key)
        useful to lookup/manipulate a collection using a key value
        The Dictionary<TKey,TValue>
            Best for high performance lookups.
                is the fastest class for associative lookups/inserts/deletes because it uses a hash table under the covers
                The speed of retrieval depends on the quality of the hashing algorithm of the type specified for TKey.
                the key type should correctly implement GetHashCode() and Equals() appropriately
                or you should provide an external IEqualityComparer to the dictionary on construction.
                O(1)
                    insert/delete/lookup time of items in the dictionary is amortized constant time - O(1)
                        - which means no matter how big the dictionary gets, the time it takes to find something remains relatively constant
                        This is highly desirable for high-speed lookups
                downside is that the dictionary, by nature of using a hash table, is unordered, so you cannot easily traverse the items in a Dictionary in order
        The SortedDictionary<TKey,TValue>
            Compromise of Dictionary speed and ordering, uses binary search tree.
                is similar to the Dictionary<TKey,TValue> in usage
                differs in implementation
                gives fast lookups (slower than Dictionary<TKey,TValue>) but also the ability to maintain the collection in order by the key
                    (logarithmic growth is slow: a binary-search lookup among ~1,000 items takes about 10 steps, and doubling to ~2,000 items adds only one more)
                O(log n)
                    requires a little bit of lookup time for the ability to maintain the items in order, thus insert/delete/lookup times in a sorted dictionary are logarithmic - O(log n).
                has a cost of maintaining a constant sorting
                uses a binary tree under the covers to maintain the items in order by the key
                As a consequence of sorting, the type used for the key must correctly implement IComparable<TKey> so that the keys can be correctly sorted.
        The SortedList<TKey,TValue>
            Very similar to SortedDictionary, except tree is implemented in an array, so has faster lookup on preloaded data, but slower loads.
                is implemented as an array of key/value pairs, sorted by the key.
                uses less memory than SortedDictionary<TKey, TValue>
                SortedDictionary<TKey, TValue> has faster insertion and removal operations for unsorted data: O(log n), versus O(n) for SortedList<TKey, TValue>
                Lookup - O(log n)
                supports efficient indexed retrieval of keys and values through the collections returned by the Keys and Values properties.
                Key objects must be immutable as long as they are used as keys in the SortedList<TKey, TValue>.
                Has capacity management.
    The Non-Associative Containers
        The List<T>
            Best for smaller lists where direct access required and no sorting.
                items are stored contiguously as an array, you can access items in the List<T> by index very quickly
                inserting and removing in the beginning or middle of the List<T> are very costly
                    because you must shift all the items up or down as you delete or insert respectively.
                adding and removing at the end of a List<T> is an amortized constant operation - O(1)
                typically we favor a List<T> even over arrays unless we are sure the size will remain absolutely fixed.
        The LinkedList<T>
            Best for lists where inserting/deleting in middle is common and no direct access required.
                is a basic implementation of a doubly-linked list.
                Because of this you can add or remove items in the middle of a linked list very quickly
                    (there are no items to shift up or down in contiguous memory)
                but it loses the ability to index items by position quickly
                We favor List<T> over LinkedList<T> unless you are doing a lot of adding and removing.
        The HashSet<T>
            Unique unordered collection, like a Dictionary except key and value are same object.
                is an unordered collection of unique items.
                Logically, this is very similar to having a Dictionary<TKey,TValue> where the TKey and TValue both refer to the same object.
                is useful for super-quick lookups
                This collection is very useful for maintaining a collection of items you wish to check membership against.
                like in Dictionary, the type T should have a valid implementation of GetHashCode() and Equals(),
                    or you should provide an appropriate IEqualityComparer<T> to the HashSet<T> on construction.
        The SortedSet<T>
            Unique sorted collection, like SortedDictionary except key and value are same object.
                is a binary tree where the key and value are the same object.
                The SortedSet<T> is to HashSet<T> what the SortedDictionary<TKey,TValue> is to Dictionary<TKey,TValue>.
                adding/removing/lookups are logarithmic - O(log n)
                the ability to iterate over the items in order.
                !!! to be effective, type T must implement IComparable<T> or you need to supply an external IComparer<T>.
        the Stack<T> and Queue<T>
            Essentially same as List<T>
                sequential collections
                Stack<T> - last-in-first-out (LIFO)
                Queue<T> - first-in-first-out (FIFO)
    The summarize table at the
        http://geekswithblogs.net/BlackRabbitCoder/archive/2011/06/16/c.net-fundamentals-choosing-the-right-collection-class.aspx
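One of the recurring notes above - supplying an external IEqualityComparer at construction instead of changing the key type - in a minimal sketch:

```csharp
using System;
using System.Collections.Generic;

class ComparerDemo
{
    static void Main()
    {
        // The comparer passed at construction controls both hashing and
        // equality; here it makes string keys case-insensitive without
        // touching the key type itself.
        var headers = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "Content-Type", "text/html" }
        };
        Console.WriteLine(headers["content-type"]); // text/html
    }
}
```

The same idea applies to HashSet<T> (IEqualityComparer<T>) and to the sorted collections (IComparer<T>).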
The Original Collections: System.Collections namespace
    considered deprecated by developers and by Microsoft itself.
    mostly used when you are dealing with legacy .NET code
    in some cases less performant than their generic counterparts.
        ArrayList
            A dynamic, contiguous collection of objects.
            use the List<T> instead.
        Hashtable
            Associative, unordered collection of key-value pairs of objects.
            use the Dictionary<TKey,TValue> instead.
        SortedList
            Associative, ordered collection of key-value pairs of objects.
            use the SortedList<TKey,TValue> instead.
        Queue
        Stack
The Concurrent Collections: System.Collections.Concurrent namespace
    They ignore ICollection's IsSynchronized (always returns false); the SyncRoot property is not supported (it throws NotSupportedException)
    all the concurrent containers are safe for enumeration even while being modified
    The main collections:
        ConcurrentDictionary
            Optimized for many concurrent readers: reads are lock-free, while writes use fine-grained locking.
        ConcurrentBag
            unordered collection of objects.
            Optimized for situations where the same thread is both reader and writer.
        BlockingCollection
            A wrapper collection that implements the producer-consumer paradigm.
            Readers can block until items are available to read.
            Writers can block until space is available to write (if bounded).
        ConcurrentStack
            achieves thread-safe access by using System.Threading.Interlocked operations.
            This means that the multi-threaded access to the stack requires no traditional locking and is very, very fast!
            Interesting scenario: We may iterate collection in the foreach and change it on the fly - see the GetEnumerator() below.
                Pop() was replaced by TryPop()
                PushRange() and TryPopRange() were added
                    Allows to deal with multiple items atomically.
                Push() works just like you’d expect
                Count takes a snapshot of the stack and then counts the items.
                    it is a O(n) operation, and if you just want to check for an empty stack, call IsEmpty instead which is O(1).
                ToArray() and GetEnumerator() both also take snapshots.
                    iteration over a stack will give you a static view at the time of the call and will not reflect updates.
                Peek() method has been removed in favor of a TryPeek()
        ConcurrentQueue
            Most notable changes:
                Dequeue() was replaced by TryDequeue()
                    Returns true if an item existed and was dequeued and false if empty.
                Count does not take a snapshot
                    it subtracts the head index from the tail index to get the count.
                    This results in O(1) complexity overall, which is quite good.
                    It’s still recommended, however, that for empty checks you call IsEmpty instead of comparing Count to zero.
                ToArray() and GetEnumerator() both take snapshots.
                Peek() method has been removed in favor of a TryPeek()
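A minimal producer-consumer sketch with BlockingCollection (the bound of 2 is arbitrary):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class ProducerConsumerDemo
{
    static void Main()
    {
        // Bounded to 2 items: the producer blocks in Add when the
        // collection is full; the consumer blocks inside
        // GetConsumingEnumerable when it is empty.
        using (var queue = new BlockingCollection<int>(boundedCapacity: 2))
        {
            var producer = Task.Run(() =>
            {
                for (int i = 0; i < 5; i++) queue.Add(i);
                queue.CompleteAdding(); // lets the consumer's loop finish
            });

            foreach (var item in queue.GetConsumingEnumerable())
                Console.WriteLine(item);

            producer.Wait();
        }
    }
}
```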
       
   
From:
    http://geekswithblogs.net/BlackRabbitCoder/archive/2011/06/16/c.net-fundamentals-choosing-the-right-collection-class.aspx
    MSDN
    http://stackoverflow.com/questions/2307283/what-does-olog-n-mean-exactly