Rewrite your for loops in C style, and benchmark whether it is faster to split the list of players that, for example, a spell is being broadcast to into smaller segments and send the packets from different threads. Choose servers that provide NVMe SSDs. I would also consider not sending a packet to certain players when the number of players it needs to be broadcast to is greater than X. Yes, you will have players watching other players PvPing without moving, but the performance gain will be significant.
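
To make the segment splitting and the "greater than X" cap concrete, here is a minimal sketch, written in Python only to keep it short. Everything in it is an assumption for illustration: the `(player_id, x, y)` tuple layout, the function names `cap_recipients`, `split_segments` and `broadcast`, and the `send` callback are all hypothetical, and dropping the players farthest from the caster is just one possible policy for choosing who gets skipped. Whether the threaded version is actually faster is exactly what you would need to benchmark on your own server.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

# Hypothetical player record: (player_id, x, y).
Player = Tuple[int, float, float]


def cap_recipients(origin: Tuple[float, float],
                   players: Sequence[Player],
                   max_recipients: int) -> List[Player]:
    """Keep at most max_recipients players, nearest to origin first.

    Assumed policy: when the crowd is larger than the cap (the "X" above),
    the farthest players are the ones that do not receive the packet.
    """
    if len(players) <= max_recipients:
        return list(players)
    ox, oy = origin
    by_distance = sorted(
        players, key=lambda p: (p[1] - ox) ** 2 + (p[2] - oy) ** 2
    )
    return by_distance[:max_recipients]


def split_segments(items: Sequence, segment_size: int) -> List[list]:
    """Split items into consecutive segments of at most segment_size."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    return [
        list(items[i:i + segment_size])
        for i in range(0, len(items), segment_size)
    ]


def broadcast(packet: bytes,
              recipients: Sequence[Player],
              send: Callable[[int, bytes], None],
              segment_size: int,
              workers: int) -> int:
    """Send packet to every recipient, one segment per worker task.

    send(player_id, packet) stands in for the real socket write and must be
    safe to call from several threads. Returns the number of packets sent.
    """
    segments = split_segments(recipients, segment_size)

    def send_segment(segment: List[Player]) -> int:
        for player in segment:
            send(player[0], packet)
        return len(segment)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(send_segment, segments))
```

You would call `cap_recipients` first and pass its result to `broadcast`, then time that against a plain single-threaded loop over the same list with realistic player counts. For small crowds the thread hand-off may well cost more than it saves.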