Description:
Visual Question Answering (VQA) typically takes a visual input, such as an image or a video, together with a natural language question about that input, and generates a natural language answer. It is by nature a multidisciplinary research problem, involving computer vision (CV), natural language processing (NLP), and knowledge representation and reasoning (KR), among other fields.
1. Introduction.- 2. Deep Learning Basics.- 3. Question Answering (QA) Basics.- 4. The Classical Visual Question Answering.- 5. Knowledge-based VQA.