See the environment file for the full list of prerequisites. Tutorial implementations use TensorFlow > 2.0 (Keras) or PyTorch, but versions for TensorFlow 1.x users based on the deprecated tf.contrib module ...
Both are tutorials, and the current MMult_4x4_17.c reaches 70% of the peak performance of an ARMv8.1 CPU. Boundary handling is not implemented yet: only the case where M, N, and K are all multiples of 4 is considered; ...
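To make the multiple-of-4 restriction concrete, here is a minimal scalar sketch of a 4x4-blocked matrix multiply. It is not the contents of MMult_4x4_17.c: the function names `add_dot_4x4` and `mmult_4x4`, the row-major layout, the `float` element type, and the `-1` error return are all assumptions made for illustration, and the sketch uses no NEON intrinsics or packing, so it will not approach the quoted performance.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the source): accumulates one 4x4 block
 * of C, i.e. C[0..3][0..3] += A[0..3][0..k-1] * B[0..k-1][0..3].
 * All matrices are assumed row-major with the given leading dimensions. */
static void add_dot_4x4(size_t k,
                        const float *a, size_t lda,
                        const float *b, size_t ldb,
                        float *c, size_t ldc)
{
    float acc[4][4] = {{0.0f}};

    /* Keep the 16 partial sums in a small local array so the compiler
     * can hold them in registers across the whole k loop. */
    for (size_t p = 0; p < k; ++p) {
        for (size_t r = 0; r < 4; ++r) {
            float a_rp = a[r * lda + p];
            for (size_t s = 0; s < 4; ++s)
                acc[r][s] += a_rp * b[p * ldb + s];
        }
    }

    /* Write the finished block back to C once. */
    for (size_t r = 0; r < 4; ++r)
        for (size_t s = 0; s < 4; ++s)
            c[r * ldc + s] += acc[r][s];
}

/* Hypothetical driver: C (m x n) += A (m x k) * B (k x n), row-major.
 * Returns 0 on success, or -1 if any dimension is not a multiple of 4,
 * since edge tiles (the "boundary problem") are not handled. */
int mmult_4x4(size_t m, size_t n, size_t k,
              const float *a, const float *b, float *c)
{
    assert(a != NULL && b != NULL && c != NULL);

    /* K is checked only to mirror the stated restriction; this scalar
     * sketch does not itself require it. */
    if (m % 4 != 0 || n % 4 != 0 || k % 4 != 0)
        return -1;

    for (size_t i = 0; i < m; i += 4)
        for (size_t j = 0; j < n; j += 4)
            add_dot_4x4(k, a + i * k, k, b + j, n, c + i * n + j, n);

    return 0;
}
```

Because the loops step by 4 with no remainder handling, a caller with arbitrary sizes would have to pad the matrices up to the next multiple of 4 or add separate edge-tile code. Rejecting such sizes up front, as assumed here, at least avoids silent out-of-bounds access.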