
Chapter 22. Image Generative Models

Recommended Reading : 【Algorithm】 Algorithm Index


1. DIP

2. Computer Vision Foundation Models

3. Image Generative Models



1. DIP (deep image prior)

⑴ Features : Fits (deliberately overfits) a CNN to a single input image without any training data; the network architecture itself acts as an image prior, and the fitted network is used to generate or restore images
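The idea above can be sketched with a toy stand-in for the CNN: fit a randomly initialized generator to one target image by gradient descent, with no training dataset involved. The linear generator, shapes, and learning rate below are illustrative assumptions, not the original DIP architecture.

```python
import numpy as np

# Toy sketch of the deep-image-prior idea: optimize generator parameters
# against a SINGLE target image; no training set is used. A real DIP uses
# a CNN; an element-wise linear generator stands in here so the example
# stays self-contained.

rng = np.random.default_rng(0)
target = rng.random((8, 8))        # the single "input image"
z = rng.random((8, 8)) + 0.5       # fixed random noise input to the generator
W = np.zeros((8, 8))               # generator parameters (stand-in for CNN weights)

lr = 0.1
for step in range(500):
    out = W * z                    # generator output f_W(z)
    grad = (out - target) * z      # gradient of 0.5 * ||f_W(z) - target||^2 w.r.t. W
    W -= lr * grad                 # gradient-descent update

loss = 0.5 * np.sum((W * z - target) ** 2)
```

After a few hundred steps the generator reproduces the target image almost exactly; in the actual DIP setting, stopping this optimization early yields a denoised or inpainted image because the CNN structure fits natural image content before it fits noise.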



2. Computer Vision Foundation Models

⑴ Vision Transformer (ViT)

① ViT uses only the transformer encoder structure

Step 1. Split the image into multiple small patches and treat each patch as a token for input into the transformer

Step 2. Linearly embed each patch and process the patch embeddings with the transformer encoder

Step 3. Just as word embeddings are aggregated into a sentence embedding that represents the sentence’s meaning, ViT learns the relationships between patches and outputs a feature representing the whole image.

Limitation: The cost of self-attention scales with the square of the number of patches, making it difficult to process high-resolution images in a single pass.

Solution 1: Divide the given image into smaller patches and apply ViT independently to each patch (e.g., iSTAR).

Solution 2: Introduce an extended self-attention mechanism, such as dilated self-attention using models like LongNet (e.g., Prov-GigaPath).
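The patch tokenization in Steps 1–2 and the quadratic attention cost noted in the limitation above can be sketched as follows; the image size, patch size, and the single unparameterized attention round are all illustrative assumptions.

```python
import numpy as np

# Sketch of ViT-style patch tokenization: split an image into
# non-overlapping patches, flatten each patch into a token, and apply one
# round of self-attention so the quadratic cost in the number of patches
# is visible.

def image_to_patches(img, p):
    """Split an (H, W) image into flattened (p x p) patch tokens."""
    h, w = img.shape
    return (img.reshape(h // p, p, w // p, p)
               .transpose(0, 2, 1, 3)
               .reshape(-1, p * p))            # (num_patches, p*p)

rng = np.random.default_rng(0)
img = rng.random((16, 24))
tokens = image_to_patches(img, 4)              # 24 patches of 16 values each

# Self-attention scores: one weight per (token, token) pair -> N^2 entries.
scores = tokens @ tokens.T / np.sqrt(tokens.shape[1])
attn = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
out = attn @ tokens                            # attended patch features

print(tokens.shape, scores.shape)              # (24, 16) (24, 24)
```

With N patches the score matrix has N² entries, which is why doubling image resolution (4× more patches) makes attention roughly 16× more expensive, motivating the two solutions above.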

⑵ Types

① BEiT: A ViT variant that adopts the idea of the BERT model, trained similarly to masked language modeling.

iSTAR: Used to enhance the resolution of spatial transcriptomics. It utilizes a BEiT-based model trained with the DINO method.

② Swin Transformer: A ViT variant that uses window-based local self-attention.

CTransPath : Wang et al., Medical Image Analysis (2022)

UNI : Chen et al., Nature Medicine (2024)

CONCH (CONtrastive learning from Captions for Histopathology) : Lu et al., Nature Medicine (2024)

Virchow : Vorontsov et al., arXiv (2023)

RudolfV : Dippel et al., arXiv (2024)

Campanella : Campanella et al., arXiv (2023)

Prov-GigaPath: A vision foundation model announced by Microsoft, trained on about 170,000 pathology slides (1.3 billion image tiles) (2024).
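The window-based local self-attention used by Swin-style models above can be sketched as follows; the window size, token shapes, and unparameterized attention are illustrative assumptions.

```python
import numpy as np

# Sketch of window-based local self-attention (Swin-style): instead of
# attending over all N tokens at once (N^2 cost), tokens are grouped into
# fixed-size windows and attention is computed independently inside each
# window.

def windowed_attention(tokens, win):
    """tokens: (N, D) with N divisible by win. Attention within each window."""
    n, d = tokens.shape
    out = np.empty_like(tokens)
    for start in range(0, n, win):
        w = tokens[start:start + win]            # (win, D) window of tokens
        scores = w @ w.T / np.sqrt(d)            # (win, win), never (N, N)
        attn = np.exp(scores)
        attn /= attn.sum(axis=1, keepdims=True)  # softmax over the window
        out[start:start + win] = attn @ w
    return out

rng = np.random.default_rng(0)
tokens = rng.random((64, 8))
out = windowed_attention(tokens, 8)              # 8 windows of 8 tokens each
```

Each window attends only within itself: here that is 8 windows × 8² = 512 score entries instead of 64² = 4096 for global attention, so for a fixed window size the cost grows linearly with the number of tokens.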



3. Image Generative Models

⑴ Types

① DALL·E 3 (OpenAI)

② Midjourney

③ Stable Diffusion

④ Sora (OpenAI)

⑤ Video LLM



Input: 2024.04.22 14:08
