And here comes multimodal Chain of Thought.

Amazon researchers paired text with images as input to drive LLM performance ever higher, and they did it with a model of only 770 million parameters (versus the 175 billion in GPT-3.5): “Our method achieves new state-of-the-art performance on the ScienceQA benchmark, outperforming accuracy of GPT-3.5 by 16% and even surpassing human performance.”
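
The paper's core recipe is a two-stage pipeline: first generate a rationale from the question text plus the image, then feed that rationale back in to infer the final answer. Here's a minimal Python sketch of that flow; the `VisionLanguageModel` class, its `generate` method, and the `multimodal_cot` helper are hypothetical placeholders for illustration, not the authors' released code:

```python
class VisionLanguageModel:
    """Hypothetical stand-in for the small vision-language model the
    paper fine-tunes; the real implementation lives in the authors'
    release, not here."""

    def generate(self, text: str, vision, task: str) -> str:
        # Placeholder: a real model would fuse text tokens with image
        # features and decode a sequence conditioned on the task prompt.
        return f"<{task} generated from: {text[:40]}>"


def multimodal_cot(model: VisionLanguageModel, question: str, image_features):
    # Stage 1: generate a rationale from the question plus image features.
    rationale = model.generate(text=question, vision=image_features,
                               task="rationale")
    # Stage 2: append the rationale to the question and infer the answer,
    # so the model reasons over both modalities before committing.
    answer = model.generate(text=f"{question} {rationale}",
                            vision=image_features, task="answer")
    return rationale, answer


if __name__ == "__main__":
    model = VisionLanguageModel()
    rationale, answer = multimodal_cot(
        model, "Which property do these objects share?", image_features=None)
    print(rationale)
    print(answer)
```

Separating rationale generation from answer inference is what lets such a small model benefit from chain-of-thought: the answer stage conditions on a grounded, image-informed rationale instead of having to reason in one shot.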
