We have a very compute-intensive program which runs several subroutines a very large number of times.
The routines are written in C and already optimized by a good compiler.
We already use OpenMP for parallelization outside the loop, so what we need now is to optimize single-core performance.
We need an assembly language specialist to convert the code from C to assembly in order to speed up the calculation.
The first test program (which can be sent on request) includes a few nested for loops.
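
As a rough illustration only (this is not the actual test program), the overall structure might look like the C sketch below. The names `kernel` and `run_all`, the array arguments, and the arithmetic inside the loops are all hypothetical. It is meant to show where OpenMP sits and which part would be the target of single-core tuning.

```c
#include <stddef.h>

/* Hypothetical hot routine: a pair of nested for loops that runs on a
   single core. This is the kind of code that would be hand-tuned. */
double kernel(const double *a, const double *b, size_t n, size_t m)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < m; ++j) {
            sum += a[i] * b[j];
        }
    }
    return sum;
}

/* Hypothetical outer driver: OpenMP spreads the independent calls over
   the available cores, so each call to kernel() stays single-threaded.
   Without an OpenMP flag the pragma is ignored and the loop runs serially. */
void run_all(const double *a, const double *b, size_t n, size_t m,
             double *out, long calls)
{
    #pragma omp parallel for
    for (long c = 0; c < calls; ++c) {
        out[c] = kernel(a, b, n, m) * (double)(c + 1);
    }
}
```

With GCC or Clang this would typically be built with something like `-O3 -fopenmp`. Because the outer loop is already parallel, any further gain has to come from the body of `kernel` itself.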