This article explores how different embedding approaches perform in medical image retrieval tasks. Self-supervised models slightly edge out supervised ones, though the performance gap across architectures is narrow. Surprisingly, pretraining on natural images (ImageNet) outperforms domain-specific sets (RadImageNet), while fractal-based embeddings achieve unexpectedly strong results given their synthetic origins. DreamSim, an ensemble of ViT embeddings fine-tuned with synthetic data, delivers the best recall overall, making it the current leader in embedding generation. Isolated anomalies, such as poor recall for certain anatomies, remain unexplained, pointing to fertile ground for future research.

DreamSim and the Future of Embedding Models in Radiology AI


Abstract and 1. Introduction

  2. Materials and Methods

    2.1 Vector Database and Indexing

    2.2 Feature Extractors

    2.3 Dataset and Pre-processing

    2.4 Search and Retrieval

    2.5 Re-ranking retrieval and evaluation

  3. Evaluation and 3.1 Search and Retrieval

    3.2 Re-ranking

  4. Discussion

    4.1 Dataset and 4.2 Re-ranking

    4.3 Embeddings

    4.4 Volume-based, Region-based and Localized Retrieval and 4.5 Localization-ratio

  5. Conclusion, Acknowledgement, and References

4.3 Embeddings

Embeddings generated from self-supervised models were shown to be slightly better for image retrieval tasks than those derived from regular supervised models. This holds for the coarse anatomical regions with 29 labels (see Table 20) as well as for the fine-grained set of 104 anatomical regions (see Table 21), and it is roughly preserved across all modes of retrieval (i.e. slice-wise, volume-based, region-based, and localized retrieval). More generally, the differences in recall across differently pre-trained models (except the model pre-trained on fractal images) are very small. In practice, the exact choice of feature extractor should not be noticeable to a potential user in a downstream application. Further, it can be concluded that pre-training on general natural images (i.e. ImageNet) resulted in slightly more performant embedding vectors than pre-training on domain-specific images (i.e. RadImageNet). This is unexpected and subject to further research.
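To make the comparison of recall values concrete, the following is a minimal sketch of how retrieval recall can be computed from embedding vectors. It assumes cosine similarity as the ranking metric and counts a query as a hit when any of its top-k neighbours carries the same anatomical label; the function names, the toy vectors, and the labels are illustrative assumptions and are not taken from the paper's evaluation code.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recall_at_k(queries, query_labels, database, db_labels, k=1):
    """Fraction of queries whose top-k neighbours contain the query's label."""
    hits = 0
    for q, q_label in zip(queries, query_labels):
        # Rank all database entries by similarity to the query, best first.
        ranked = sorted(range(len(database)),
                        key=lambda i: cosine(q, database[i]),
                        reverse=True)
        if q_label in {db_labels[i] for i in ranked[:k]}:
            hits += 1
    return hits / len(queries)

# Toy 2-D "embeddings" (hypothetical values, for illustration only).
database = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
db_labels = ["liver", "brain", "liver"]
queries = [[0.8, 0.2], [0.1, 0.9], [0.6, 0.4]]
query_labels = ["liver", "brain", "brain"]

toy_recall = recall_at_k(queries, query_labels, database, db_labels, k=1)
```

In this toy setup two of the three queries retrieve a neighbour with the correct label at k=1. Swapping the feature extractor only changes the vectors passed in, which is why small recall differences between extractors translate into nearly identical behaviour for a downstream user.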

Although the model pre-trained on formula-derived synthetic images of fractals (i.e. FractalDB) showed the lowest recall accuracy, the absolute values are surprisingly high considering that the model learned visual primitives from rendered fractals. This is very encouraging, as Formula-Driven Supervised Learning (FDSL) can easily be extended to a very high number of data points per class and also to several virtual classes within one family of formulas [Kataoka et al., 2022]. Additionally, the mathematical space of formulas for producing visual primitives is virtually infinite, so it is a subject of further research whether radiology-specific visual primitives can be created that outperform natural image-based pre-training. Moreover, FDSL does not require the effort of data collection, curation, and annotation. It can scale to a large number of samples and classes, which potentially results in a very smooth and evenly covered latent space.
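As an illustration of why formula-driven data is cheap to scale, the sketch below renders a fractal from a small set of affine maps using the random-iteration ("chaos game") method for iterated function systems. It is a simplified stand-in rather than the FractalDB generation pipeline: the function name `render_ifs`, the uniform choice among transforms, the grid size, and the Sierpinski parameters are all assumptions made for the example.

```python
import random

def render_ifs(transforms, n_points=5000, size=32, seed=0):
    """Render the attractor of an iterated function system on a binary grid.

    Each transform is (a, b, c, d, e, f) and maps (x, y) to
    (a*x + b*y + e, c*x + d*y + f).
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    points = []
    for step in range(n_points):
        a, b, c, d, e, f = rng.choice(transforms)  # uniform choice: a simplification
        x, y = a * x + b * y + e, c * x + d * y + f
        if step >= 20:  # discard a short burn-in before recording points
            points.append((x, y))

    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    span_x = (max(xs) - min_x) or 1.0
    span_y = (max(ys) - min_y) or 1.0

    grid = [[0] * size for _ in range(size)]
    for px, py in points:
        col = min(size - 1, int((px - min_x) / span_x * size))
        row = min(size - 1, int((py - min_y) / span_y * size))
        grid[size - 1 - row][col] = 1  # flip so larger y is drawn higher
    return grid

# One "virtual class": three contractions that yield a Sierpinski triangle.
SIERPINSKI = [
    (0.5, 0.0, 0.0, 0.5, 0.0, 0.0),
    (0.5, 0.0, 0.0, 0.5, 0.5, 0.0),
    (0.5, 0.0, 0.0, 0.5, 0.25, 0.5),
]

image = render_ifs(SIERPINSKI)
filled = sum(map(sum, image))  # number of pixels touched by the attractor
```

Changing the six coefficients of each transform defines a new class, and changing the seed or point count produces new samples within that class, so no collection, curation, or annotation step is involved.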

Embeddings derived from the DreamSim architecture showed the highest overall retrieval recall in region-based and localized evaluations. DreamSim is an ensemble architecture that uses multiple ViT embeddings with additional fine-tuning on synthetic images. It is plausible that an ensemble approach outperforms single-architecture embeddings (i.e. DINOv1, DINOv2, SwinTransformer, and ResNet50). Therefore, DreamSim is currently the preferred method of embedding generation.
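The ensemble idea can be sketched in a few lines. The example assumes that the per-backbone embeddings are L2-normalized and then concatenated into one vector, which is one common way to combine feature extractors; it does not reproduce DreamSim's fine-tuning step, and the helper names and toy vectors are hypothetical.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else list(vec)

def ensemble_embedding(per_model_embeddings):
    """Concatenate L2-normalized embeddings from several backbones."""
    combined = []
    for emb in per_model_embeddings:
        combined.extend(l2_normalize(emb))
    return combined

# Toy outputs of two hypothetical backbones with different dimensions.
emb_a = [3.0, 4.0]
emb_b = [0.0, 2.0, 0.0]
combined = ensemble_embedding([emb_a, emb_b])
```

Normalizing before concatenation keeps a backbone with large activation magnitudes from dominating the similarity computation, so each member of the ensemble contributes on a comparable scale.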

One observation worth discussing appears in all tables presenting recall values. Across all model architectures (columns), there are usually a few anatomies or regions (rows) that show lower recall on average (see the "Average" column). For example, in Table 2 "gallbladder" showed poor retrieval accuracy, whereas in Table 4 "brain" and "face" showed lower recall. These isolated low-recall patterns can be seen across all modes of retrieval and aggregation. The authors cannot explain why certain anatomies perform worse in certain retrieval configurations yet achieve high recall in many others. This will be the subject of future research.
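Such isolated low-recall rows can be located programmatically. The sketch below averages each row of a recall table across model columns and flags rows that fall more than one standard deviation below the mean of the row averages. The threshold rule, the function name, and the table values are assumptions chosen for illustration; the numbers are invented placeholders and are not results from the paper.

```python
from statistics import mean, pstdev

def flag_low_recall(recall_table, n_std=1.0):
    """Return regions whose average recall lies more than n_std standard
    deviations below the mean of all per-region averages."""
    row_avg = {region: mean(per_model.values())
               for region, per_model in recall_table.items()}
    overall = mean(row_avg.values())
    spread = pstdev(row_avg.values())
    threshold = overall - n_std * spread
    return sorted(region for region, avg in row_avg.items() if avg < threshold)

# Placeholder table: rows are regions, columns are models (made-up values).
toy_table = {
    "region_a": {"model_1": 0.90, "model_2": 0.92},
    "region_b": {"model_1": 0.88, "model_2": 0.90},
    "region_c": {"model_1": 0.40, "model_2": 0.50},
    "region_d": {"model_1": 0.91, "model_2": 0.93},
}

flagged = flag_low_recall(toy_table)
```

Running the same check on each retrieval mode's table would show whether a flagged anatomy is consistently weak or only weak in specific configurations, which is the pattern the text describes.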

Figure 9: Overview of average recall vs. mean anatomical region size for 29 anatomical regions for (a) slice-wise, (b) volume-based, (c) volume-based and re-ranking, (d) region-based, (e) region-based and re-ranking, (f) localized, (g) localized and re-ranking retrieval.

Figure 10: Overview of average recall vs. mean anatomical region size for 104 anatomical regions for (a) slice-wise, (b) volume-based, (c) volume-based and re-ranking, (d) region-based, (e) region-based and re-ranking, (f) localized, (g) localized and re-ranking retrieval.


:::info Authors:

(1) Farnaz Khun Jush, Bayer AG, Berlin, Germany;

(2) Steffen Vogler, Bayer AG, Berlin, Germany;

(3) Tuan Truong, Bayer AG, Berlin, Germany;

(4) Matthias Lenga, Bayer AG, Berlin, Germany.

:::


:::info This paper is available on arXiv under the CC BY 4.0 DEED license.

:::
