This is a Triton implementation of the Flash Attention v2 algorithm from Tri Dao (https://tridao.me/publications/flash2/flash2.pdf) ...
𝟲 𝗕𝗶𝗼𝗶𝗻𝗳𝗼𝗿𝗺𝗮𝘁𝗶𝗰𝘀 𝗦𝗸𝗶𝗹𝗹𝘀 𝗥𝗲𝗰𝗿𝘂𝗶𝘁𝗲𝗿𝘀 𝗖𝗮𝗻 ...