MCTS DPO for VLM Hallucination Mitigation

posted on 18 January 2025

baby VLMs hallucination mitigation methods

Project: MCTS DPO for VLM Hallucination Mitigation

Inspiration: It came to me during the winter holidays, after skating. Also this.

Update: Nothing quite like this has been done yet, but there are papers with similar concepts: 1, 2, 3

Yes, the idea sounds ludicrous, but it makes sense if the reward can be broken down step-wise. It is also highly appealing because the visual modality is dense: an image can be described at many levels of abstraction, so search is needed to find the level that the user asking the query actually wants. DPO is there to push the VLM to try to reach that level of abstraction itself. That is my TED talk for today!
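To make the step-wise part a bit more concrete, here is a minimal sketch of how a search tree could be turned into DPO training signal. It is only an illustration under my own assumptions, not a finished method: `Node`, `step_preference_pairs`, and `min_gap` are hypothetical names, and the per-step `value` is assumed to come from some hallucination-aware reward averaged over MCTS rollouts (not shown). The loss is the standard DPO objective for one chosen/rejected pair, written with plain floats so that it runs without any ML library.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One step of a response in the search tree (hypothetical structure)."""
    step: str                      # text of this step
    value: float = 0.0             # assumed: mean step-wise reward from rollouts
    children: list["Node"] = field(default_factory=list)


def step_preference_pairs(root, min_gap=0.1):
    """Collect (prefix, chosen_step, rejected_step) triples from the tree.

    At every node with at least two children, the highest-value child is
    treated as "chosen" and the lowest-value child as "rejected", provided
    their values differ by at least min_gap.
    """
    pairs = []
    stack = [(root, [])]
    while stack:
        node, prefix = stack.pop()
        path = prefix + [node.step]
        if len(node.children) >= 2:
            best = max(node.children, key=lambda c: c.value)
            worst = min(node.children, key=lambda c: c.value)
            if best.value - worst.value >= min_gap:
                pairs.append((path, best.step, worst.step))
        for child in node.children:
            stack.append((child, path))
    return pairs


def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one pair: -log(sigmoid(margin)).

    The log-probabilities are those of the chosen/rejected step under the
    policy being trained and under the frozen reference model.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return math.log1p(math.exp(-margin))  # equals -log(sigmoid(margin))


# Toy tree: one query, two candidate first steps with different rewards.
root = Node("What is in the picture?", children=[
    Node("There is a red ball.", value=0.9),   # grounded step (assumed reward)
    Node("There is a blue dog.", value=0.2),   # hallucinated step (assumed reward)
])
pairs = step_preference_pairs(root)
```

In a real setup the log-probabilities would come from the VLM and a frozen copy of it, conditioned on the image plus the shared prefix, and the loss would be averaged over all pairs collected from the tree. Taking only the best and worst child per node is just one simple choice; other pairing rules are equally plausible.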