PhD Student, Intelligent Systems Program, University of Pittsburgh
23 Oct 2020 - Arun Balajiee
Today we had two speakers talking about the broader domain of Visual Annotation using NLP and Computer Vision – with different approaches to two different problem spaces. The first speaker was Tristan Maidment, a PhD student in the Intelligent Systems Program, who presented a survey of six models, their characteristics, and their performance in a domain-adapted setup. His approach deals with adapting models from the domain they were originally trained on to a target domain, on which they are re-trained to perform with high accuracy. A subset of these models, the neuro-symbolic models, performed especially well when run as domain-adapted models.
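The talk did not go into implementation details, but the core idea of domain adaptation by re-training can be sketched very simply: train a classifier on a labelled source domain, then continue training it on a small labelled sample from a shifted target domain. The toy data, the logistic-regression model, and all hyperparameters below are my own illustrative assumptions, not anything from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, shift):
    # Two Gaussian blobs per domain; `shift` moves the whole domain,
    # simulating a distribution shift between source and target.
    x0 = rng.normal([0, 0], 0.5, (n, 2)) + shift
    x1 = rng.normal([2, 2], 0.5, (n, 2)) + shift
    X = np.vstack([x0, x1])
    y = np.array([0] * n + [1] * n)
    return X, y

def train(X, y, w=None, b=0.0, lr=0.1, epochs=200):
    # Logistic regression by gradient descent.
    # Passing in (w, b) from a previous run = re-training / fine-tuning.
    if w is None:
        w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        g = p - y                                # per-example gradient signal
        w = w - lr * (X.T @ g) / len(y)
        b = b - lr * g.mean()
    return w, b

def accuracy(X, y, w, b):
    return ((X @ w + b > 0).astype(int) == y).mean()

# Large labelled source domain; small labelled target sample, shifted.
Xs, ys = make_data(200, [0.0, 0.0])
Xt, yt = make_data(20, [4.0, 0.0])
Xt_test, yt_test = make_data(200, [4.0, 0.0])

w, b = train(Xs, ys)                       # source-only model
src_acc = accuracy(Xt_test, yt_test, w, b)
w2, b2 = train(Xt, yt, w.copy(), b)        # re-train on target domain
ada_acc = accuracy(Xt_test, yt_test, w2, b2)
```

On this toy setup the source-only model does poorly on the shifted target data, while the re-trained model recovers high accuracy – the same before/after comparison the survey ran, at much larger scale, over its six models.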
The second speaker was Dr. Malihe Alikhani, a professor in the department. She covered coherence resolution and the creation of coherence relations in discourse – essentially identifying the flow of logic and concepts across the sentences of a text – and applying these principles to visual captioning, multimodal visual communication, and common sense in models. With a multimodal approach, disambiguating text and captioning abstract entities in an image both become easier. The multimodal approach is also applicable to human-robot collaborative tasks, where the interaction between a human and a robot can involve gestures that add depth and meaning to a conversation. Further, conversational agents can incorporate multimodal machine learning for natural language understanding and better dialog systems. The key takeaway from the talk is that all of this is possible using the simple ideas from research and progress made in cognitive science on coherence resolution and coherence relations in human dialog.