Matrix-Vector multiplication JIT compiler
Committed to daNeuralNet a first working version of a JIT for matrix-vector multiplication that relies on the FMA (Fused Multiply-Add) instruction set.
This version generates code that is up to twice as fast as OpenBLAS for matrices that fit in the CPU cache (usually 100×100 to 200×200), and it maintains a marginal lead for larger sizes, although those are bound by memory bandwidth. The performance profile is similar on both AMD and Intel CPUs.
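For readers unfamiliar with the operation, here is a minimal scalar sketch in C++ of a matrix-vector product built on fused multiply-add. It is not the JIT from daNeuralNet and does not reflect its generated code. It only illustrates the arithmetic that FMA instructions perform, one multiply and one add per element with a single rounding. The function name `matVecFma`, the row-major layout and the single-precision type are assumptions made for the example.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference routine: computes y = A * x, where A is a
// rows x cols matrix stored row-major in a flat vector (assumed layout).
std::vector<float> matVecFma(const std::vector<float>& a,
                             const std::vector<float>& x,
                             std::size_t rows, std::size_t cols) {
    std::vector<float> y(rows, 0.0f);
    for (std::size_t r = 0; r < rows; ++r) {
        const float* row = a.data() + r * cols;
        float acc = 0.0f;
        for (std::size_t c = 0; c < cols; ++c) {
            // acc = row[c] * x[c] + acc, rounded once (fused multiply-add)
            acc = std::fma(row[c], x[c], acc);
        }
        y[r] = acc;
    }
    return y;
}
```

A vectorized or JIT-generated version would process several elements per instruction and keep multiple accumulators in registers. This loop handles one element at a time and serves only as a correctness baseline.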
 
 A test version of SamplingProfiler 64bit is available
Just created a new repository with a “LibCBLAS” unit meant to use the OpenBLAS library in its Windows 64bit incarnation from Delphi 10.3+