MCTS DPO for VLM Hallucination Mitigation
Inspiration: It came to me during the winter holidays, after skating. Also this.
Update: Nothing quite like this has been done yet, but there are papers with similar concepts: 1, 2, 3
Yes, the idea sounds ludicrous, but it makes sense if the rewards can be broken down step-wise. It is also appealing because the visual modality is so dense: working out the level of abstraction the user is actually asking for calls for search. DPO then pushes the VLM to reason at that level of abstraction. That is my TED talk for today!
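To make "break the rewards down step-wise" a bit more concrete, here is a minimal sketch, assuming the search tree already stores a value estimate for each candidate reasoning step (the hypothetical `q_value`, e.g. the mean backed-up MCTS reward). It turns sibling steps into step-level preference pairs (best vs. worst sibling, which is just one possible pairing rule) and scores a pair with the standard DPO objective. `StepNode`, `step_preference_pairs` and `dpo_loss` are illustrative names, not taken from any library; in a real setup the log-probabilities would come from the policy VLM and a frozen reference VLM, conditioned on the image, the query and the step prefix.

```python
import math
from dataclasses import dataclass, field


@dataclass
class StepNode:
    """A candidate reasoning step in a hypothetical MCTS tree over VLM outputs."""
    text: str
    q_value: float = 0.0  # assumed step-wise value, e.g. mean backed-up MCTS reward
    children: list = field(default_factory=list)


def step_preference_pairs(node, prefix=()):
    """Collect (context, chosen_step, rejected_step) triples from sibling steps.

    At every node with two or more children, the highest-valued child is
    "chosen" and the lowest-valued one is "rejected"; exact ties are skipped.
    """
    pairs = []
    if len(node.children) >= 2:
        ranked = sorted(node.children, key=lambda c: c.q_value, reverse=True)
        best, worst = ranked[0], ranked[-1]
        if best.q_value > worst.q_value:
            pairs.append((prefix, best.text, worst.text))
    for child in node.children:
        # Recurse with this child's step appended to the shared context.
        pairs.extend(step_preference_pairs(child, prefix + (child.text,)))
    return pairs


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one pair: -log sigmoid(beta * log-ratio margin).

    Each logp_* is the summed log-probability of a step under the policy
    (or frozen reference) model, given the image, query and step prefix.
    """
    z = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(z)) == softplus(-z), split by sign to avoid overflow in exp().
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


if __name__ == "__main__":
    # Toy tree: the root stands for the (image, query) prompt.
    root = StepNode("<image + query>", children=[
        StepNode("The sign is red.", 0.9, [
            StepNode("It reads STOP.", 0.2),
            StepNode("It is octagonal.", 0.7),
        ]),
        StepNode("There is a dog.", 0.1),  # e.g. a hallucinated step
    ])
    for context, chosen, rejected in step_preference_pairs(root):
        print(context, "|", chosen, ">", rejected)
    print(dpo_loss(-1.0, -5.0, -3.0, -3.0))
```

Pairing only the best and worst siblings keeps the sharpest contrast per node; using every ordered sibling pair would produce more training pairs, at the cost of noisier margins between near-equal steps.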