Implementing Matrix Multiplication on CPU from Scratch

The goal of this project isn't to write a competitive BLAS implementation, but rather to learn about performance optimization. Starting with a naive approach, I applied various techniques step-by-step to significantly improve performance. I was assisted by an AI coding assistant during this process.
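
For readers who want a concrete picture of the "naive approach" starting point, here is a minimal sketch of a triple-loop multiply. It is an illustration only, not the code from the linked repository: the function name `matmul_naive`, the use of `float`, and the row-major layout are all assumptions made for this example.

```c
#include <stddef.h>

/* Hypothetical baseline (not taken from the linked repository).
 * Computes C = A * B, where A is m x k, B is k x n, and C is m x n.
 * All matrices are assumed to be stored contiguously in row-major order. */
void matmul_naive(const float *a, const float *b, float *c,
                  size_t m, size_t n, size_t k)
{
    for (size_t i = 0; i < m; ++i) {          /* rows of A and C */
        for (size_t j = 0; j < n; ++j) {      /* columns of B and C */
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p) {  /* shared inner dimension */
                /* b is read with stride n here, which tends to be
                 * cache-unfriendly for large matrices. */
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = acc;
        }
    }
}
```

A version like this is usually kept as a correctness reference and as the baseline that faster variants are timed against.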

The complete code can be found at the link below:

Matmul implementation from scratch

References

I've curated a list of excellent articles to help learn these concepts. More details can be found in the links below:
