Microsoft's Phi-4 Multimodal – Best Small Model Out There?



A walkthrough and thorough test of the recently released Phi-4 multimodal model (5.6B parameters).

Code here – https://github.com/designingwithml/blogposts/blob/main/notebooks/modelevals/phi4/phi-4-multimodal.ipynb

0:00 Introduction
01:21 Setup
02:30 Text Conversation
03:53 Image Description
05:42 OCR
06:42 Tool Calling
07:32 Audio…

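For orientation, here is a minimal sketch of the kind of setup the video covers: loading Phi-4 multimodal with Hugging Face transformers and querying it with text and optional images. The model id (`microsoft/Phi-4-multimodal-instruct`) and the `<|user|>`/`<|image_1|>`/`<|end|>`/`<|assistant|>` chat tokens are taken from the model card and may change; the linked notebook is the authoritative version.

```python
# Sketch: running Phi-4 multimodal locally with Hugging Face transformers.
# Model id and prompt tokens are assumptions from the model card -- verify
# against the card and the linked notebook before relying on them.

def build_prompt(user_text, n_images=0):
    """Build a Phi-4 chat prompt; <|image_k|> placeholders mark image slots."""
    image_tags = "".join(f"<|image_{k}|>" for k in range(1, n_images + 1))
    return f"<|user|>{image_tags}{user_text}<|end|><|assistant|>"

def generate(user_text, images=None, max_new_tokens=256):
    """Load the model (heavy: 5.6B params) and answer a text/image prompt."""
    # Imported lazily so build_prompt stays dependency-free.
    from transformers import AutoModelForCausalLM, AutoProcessor

    model_id = "microsoft/Phi-4-multimodal-instruct"  # assumed Hugging Face id
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, trust_remote_code=True, torch_dtype="auto", device_map="auto"
    )
    prompt = build_prompt(user_text, n_images=len(images or []))
    inputs = processor(text=prompt, images=images, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return processor.batch_decode(
        out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )[0]
```

The same `generate` call pattern covers the text, image-description, and OCR segments of the video; only the prompt and the attached images change.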


6 Comments

  1. Got it up and running on my somewhat outdated local PC. It was a bit of a struggle, but the model is impressive, and I will check it out further. Thank you very much for your video, the creative use cases, and the snippets to run them. Subscribed.

  2. Short story: the model's great at text generation (e.g., "summarize x"), multimodal understanding ("what does the author speak about in this audio file, and how is it related to the image provided?"), audio transcription ("give me a verbatim transcription of this audio file"), OCR ("give me ALL the text in this image as a tidy markdown file"), and function calling.

    If you are doing any of this and would like a small/local model (e.g., for latency, privacy, or compliance reasons), definitely try Phi-4 multimodal.