The Language-Guided Navigation module leverages an LLM (like ChatGPT) and the open-set O3D-SIM.

VLN: LLM and CLIP for Instance-Specific Navigation on 3D Maps

Abstract and 1 Introduction

  2. Related Works

    2.1. Vision-and-Language Navigation

    2.2. Semantic Scene Understanding and Instance Segmentation

    2.3. 3D Scene Reconstruction

  3. Methodology

    3.1. Data Collection

    3.2. Open-set Semantic Information from Images

    3.3. Creating the Open-set 3D Representation

    3.4. Language-Guided Navigation

  4. Experiments

    4.1. Quantitative Evaluation

    4.2. Qualitative Results

  5. Conclusion and Future Work, Disclosure statement, and References

3.4. Language-Guided Navigation

In this section, we leverage the LLM-based approach from [1], which uses ChatGPT [35] to interpret language commands and map them to pre-defined function primitives that the robot can execute. Our approach differs from [1] in two respects: how the LLM is used and how the function primitives are implemented. In [1], the LLM supplied open-set understanding by mapping general queries onto the known closed-set class labels produced by Mask2Former [7].
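
One way to picture the mapping from a command to function primitives is that the LLM emits short calls such as `go_to_object("red chair", 2)`, which the robot side validates and then dispatches. The sketch below assumes exactly that; the primitive name `go_to_object`, its `instance` argument, the registry, and the one-call-per-line output format are hypothetical and are not taken from [1]. Parsing with Python's `ast` module instead of calling `eval` restricts execution to registered primitives with literal arguments.

```python
import ast
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical registry of robot-side primitives; names are illustrative only.
PRIMITIVES: Dict[str, Callable[..., Any]] = {}


def primitive(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function so that LLM-generated code may call it."""
    PRIMITIVES[fn.__name__] = fn
    return fn


@primitive
def go_to_object(description: str, instance: int = 1) -> Tuple[str, str, int]:
    # Placeholder body: a real system would look up the object and send a goal.
    return ("go_to_object", description, instance)


def run_llm_program(program: str) -> List[Any]:
    """Run LLM output as a sequence of primitive calls.

    Only bare calls to registered primitives with literal arguments are
    accepted; anything else (imports, attribute access, assignments) is rejected.
    """
    results: List[Any] = []
    for stmt in ast.parse(program).body:
        if not (isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Call)):
            raise ValueError(f"unsupported statement: {type(stmt).__name__}")
        call = stmt.value
        if not isinstance(call.func, ast.Name) or call.func.id not in PRIMITIVES:
            raise ValueError("call to an unregistered function")
        # literal_eval on AST nodes only accepts constants, lists, tuples, etc.
        args = [ast.literal_eval(arg) for arg in call.args]
        kwargs = {}
        for kw in call.keywords:
            if kw.arg is None:
                raise ValueError("**kwargs expansion is not allowed")
            kwargs[kw.arg] = ast.literal_eval(kw.value)
        results.append(PRIMITIVES[call.func.id](*args, **kwargs))
    return results


if __name__ == "__main__":
    llm_output = 'go_to_object("red chair", 2)\ngo_to_object("sofa")'
    print(run_llm_program(llm_output))
    # [('go_to_object', 'red chair', 2), ('go_to_object', 'sofa', 1)]
```

Whitelisting primitives keeps the LLM's role limited to choosing *which* action to take and *with what arguments*, while the robot-side code stays in control of what actually executes.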

However, because our new representation, O3D-SIM, is itself open-set, the LLM no longer needs to perform this mapping. Figure 4 shows how the code output of the two approaches differs. The function primitives work as in the earlier approach, taking the desired object type and its instance as input. Now, however, the desired object is not drawn from a pre-defined set of classes but is specified by a short free-form description, so the way the target location is found changes. We exploit the fact that CLIP embeddings align text and images: the input description is passed to CLIP, and the resulting embedding is used to locate the object in O3D-SIM.

We compute the cosine similarity between the embedding of the description and every instance embedding in our representation, rank the instances in decreasing order of similarity, and select the desired instance. Once the instance is determined, a goal corresponding to it is generated and passed to the navigation stack, which drives the robot there autonomously, thereby achieving Language-Guided Navigation.
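
The retrieval step can be sketched as follows. Small hand-written vectors stand in for CLIP embeddings, and each O3D-SIM instance is assumed to carry one embedding and a 3D centroid. The `Instance` record, reading the instance argument as a 1-based position in the similarity ranking, and projecting the centroid onto the ground plane to obtain the navigation goal are all assumptions made for illustration, not details given in the text.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Instance:
    # Hypothetical record for one object instance in the 3D map.
    instance_id: int
    embedding: Sequence[float]             # stand-in for a CLIP image embedding
    centroid: Tuple[float, float, float]   # object position in the map frame


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def rank_instances(query: Sequence[float],
                   instances: Sequence[Instance]) -> List[Tuple[float, Instance]]:
    """Score every instance against the query, sorted by decreasing similarity."""
    scored = [(cosine_similarity(query, inst.embedding), inst) for inst in instances]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


def select_goal(query: Sequence[float], instances: Sequence[Instance],
                rank: int = 1) -> Tuple[int, Tuple[float, float]]:
    """Pick the instance at the given 1-based rank and return its id and an
    (x, y) goal, here simply the centroid projected onto the ground plane."""
    ranked = rank_instances(query, instances)
    if not 1 <= rank <= len(ranked):
        raise IndexError("requested instance rank is out of range")
    _, chosen = ranked[rank - 1]
    x, y, _ = chosen.centroid
    return chosen.instance_id, (x, y)


if __name__ == "__main__":
    instances = [
        Instance(0, [0.9, 0.1, 0.0], (1.0, 2.0, 0.4)),  # e.g. a chair
        Instance(1, [0.1, 0.9, 0.0], (4.0, 0.5, 0.5)),  # e.g. a sofa
        Instance(2, [0.8, 0.2, 0.1], (3.0, 3.0, 0.4)),  # e.g. another chair
    ]
    query = [1.0, 0.0, 0.0]  # stand-in for the text embedding of "chair"
    print(select_goal(query, instances, rank=1))  # (0, (1.0, 2.0))
    print(select_goal(query, instances, rank=2))  # (2, (3.0, 3.0))
```

In a real system the goal would also need a reachable pose (for example, a free-space point near the object rather than its centroid) before being sent to the navigation stack.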


:::info Authors:

(1) Laksh Nanwani, International Institute of Information Technology, Hyderabad, India; this author contributed equally to this work;

(2) Kumaraditya Gupta, International Institute of Information Technology, Hyderabad, India;

(3) Aditya Mathur, International Institute of Information Technology, Hyderabad, India; this author contributed equally to this work;

(4) Swayam Agrawal, International Institute of Information Technology, Hyderabad, India;

(5) A.H. Abdul Hafez, Hasan Kalyoncu University, Sahinbey, Gaziantep, Turkey;

(6) K. Madhava Krishna, International Institute of Information Technology, Hyderabad, India.

:::


:::info This paper is available on arXiv under CC BY-SA 4.0 Deed (Attribution-ShareAlike 4.0 International) license.

:::
