We release InternViT-6B-448px-V1-0, which is integrated into [InternVL-Chat-V1-1](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-1).
- **Pretrain Dataset:** LAION-en, LAION-COCO, COYO, CC12M, CC3M, SBU, Wukong, LAION-multi, OCR-related datasets.

- **Note:** This model has 48 blocks, and we found that using the output after the fourth-to-last block worked best for MLLMs. Therefore, when building an MLLM with this model, **please use the features from the fourth-to-last layer.**
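As a sketch of the indexing only (a hypothetical illustration, not a call into the real model): in the standard 🤗 Transformers layout, requesting `output_hidden_states=True` returns one hidden state for the embeddings plus one per block, so a 48-block encoder yields 49 entries and the fourth-to-last block's features sit at index `-4`:

```python
# Hypothetical illustration of selecting the fourth-to-last block's features.
# A 48-block encoder returns 49 hidden states: the embedding output plus
# one output per block (mirroring the usual `hidden_states` tuple layout).
num_blocks = 48
hidden_states = ["embeddings"] + [f"block_{i}" for i in range(1, num_blocks + 1)]

features = hidden_states[-4]  # output after block 45, i.e. the fourth-to-last block
print(features)
```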
## Released Models

### Vision Foundation model

| Model                   | Date       | Download                                                               | Note                             |
| ----------------------- | ---------- | ---------------------------------------------------------------------- | -------------------------------- |
| InternViT-6B-448px-V1-5 | 2024.04.20 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-5) | support dynamic resolution, super strong OCR (🔥new) |
| InternViT-6B-448px-V1-2 | 2024.02.11 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-2) | 448 resolution                   |
| InternViT-6B-448px-V1-0 | 2024.01.30 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-0) | 448 resolution                   |
| InternViT-6B-224px      | 2023.12.22 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-224px)      | vision foundation model          |
| InternVL-14B-224px      | 2023.12.22 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-14B-224px)      | vision-language foundation model |
### Multimodal Large Language Model (MLLM)

| Model                   | Date       | Download                                                                | Note                               |
| ----------------------- | ---------- | ----------------------------------------------------------------------- | ---------------------------------- |
| InternVL-Chat-V1-5      | 2024.04.18 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5)       | supports 4K images; super strong OCR; approaches the performance of GPT-4V and Gemini Pro on benchmarks such as MMMU, DocVQA, ChartQA, and MathVista (🔥new) |
| InternVL-Chat-V1-2-Plus | 2024.02.21 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2-Plus)  | more SFT data and stronger performance |
| InternVL-Chat-V1-2      | 2024.02.11 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2)       | scales the LLM up to 34B           |
| InternVL-Chat-V1-1      | 2024.01.24 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-1)       | supports Chinese and stronger OCR  |
## Model Usage (Image Embeddings)

The snippet below is a minimal sketch of extracting image embeddings with 🤗 Transformers; the image path is a placeholder, and the model requires `trust_remote_code=True` and a CUDA device:

```python
import torch
from PIL import Image
from transformers import AutoModel, CLIPImageProcessor

# Load the vision encoder in bfloat16 on the GPU.
model = AutoModel.from_pretrained(
    'OpenGVLab/InternViT-6B-448px-V1-0',
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True).cuda().eval()

# Preprocess an input image (the path is a placeholder).
image = Image.open('./examples/image1.jpg').convert('RGB')
image_processor = CLIPImageProcessor.from_pretrained('OpenGVLab/InternViT-6B-448px-V1-0')
pixel_values = image_processor(images=image, return_tensors='pt').pixel_values
pixel_values = pixel_values.to(torch.bfloat16).cuda()

# Forward pass; the pooled output is the image embedding.
outputs = model(pixel_values)
```