Interestingly enough, I debugged the threads and yeah, they were disposed really fast, but I removed them anyway. I'll see how it performs and add a queue if it is needed.
As for the benchmarking, it performs exactly the same (apart from the initial Dictionary loading, which happens on startup anyway), but it is just much easier to maintain and debug (which is very important for a project that is still adding packets).
I can add a packet in one line instead of 5, and stepping through the code is way better.
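
To illustrate the "one line per packet" point, here is a rough sketch of a dictionary-based dispatch table. It is not the project's actual code: it is written in Python for brevity, and the opcodes, handler names (`handle_login`, `handle_chat`), `PACKET_HANDLERS` and `dispatch` are all made up for the example. The idea is just that the table is built once at startup and registering a new packet means adding one entry.

```python
from typing import Callable, Dict


# Hypothetical handlers; names and return values are invented for illustration.
def handle_login(payload: bytes) -> str:
    return f"login:{payload.decode()}"


def handle_chat(payload: bytes) -> str:
    return f"chat:{payload.decode()}"


# Built once at startup: maps a (made-up) opcode to its handler.
PACKET_HANDLERS: Dict[int, Callable[[bytes], str]] = {
    0x01: handle_login,
    0x02: handle_chat,  # adding a new packet is one new line here
}


def dispatch(opcode: int, payload: bytes) -> str:
    """Look up the handler for an opcode and run it on the payload."""
    handler = PACKET_HANDLERS.get(opcode)
    if handler is None:
        raise KeyError(f"unknown opcode 0x{opcode:02x}")
    return handler(payload)
```

Stepping through this in a debugger goes straight from `dispatch` into the handler, which is roughly what I mean by it being easier to follow.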