Generative Forms of Multimedia Content

Keynote presentation

Schedule: Tuesday, July 7; 09:30 – 10:30 London (BST time zone)

Abstract

Multimedia analysis has recently made spectacular improvements in both quality and sophistication. Over the last half-decade we have seen extreme progress in tasks like image and video tagging, object detection and activity recognition, generating descriptive captions, and more. Some of these have been deployed and are in widespread use on our smartphones and social media platforms. We have also seen recent research work, including our own, on computing more abstract features of multimedia, such as person-counting from CCTV, computing visual salience, estimating the aesthetics of images and videos, and computing video memorability. The common methodology used across most of these applications is of course machine learning, in all its forms, from convolutional neural networks to simple regression and support vector machines. Much of the research in our field is about wrestling with machine learning to optimise its performance in multimedia analysis tasks, and this recent run of extreme progress does not look like ending anytime soon, though it will eventually reach its high water mark. When it reaches the point at which it cannot get any better, what then? Generative machine learning (ML) is a recent form of media analysis which turns the conventional approach on its head: its methodology is to train a model and then generate new data. Example applications of generative ML include DeOldify, which colourises black-and-white images and video clips, and Generative Adversarial Networks (GANs), which can generate DNA sequences, 3D models of replacement teeth, impressionist paintings, and of course video clips, some known as deepfakes. Putting aside the more nefarious applications of deepfakes, what is the potential for generative forms of multimedia? In the short to medium term we can speculate that it would include things like movie augmentation, but how far can it go, and could it replicate human creativity?
In this talk I will introduce some of the recent forms of generative multimedia and discuss how far I believe we could go with this exciting new technology.

Alan Smeaton is Professor of Computing at Dublin City University, where he has previously been Head of School and Executive Dean of Faculty. His early research interests covered the application of natural language processing to information seeking tasks, and this evolved into the analysis and indexing of visual media (image and video) to support user tasks such as video searching, browsing and summarisation. Currently Alan’s research is around technology to support people in information seeking tasks and using this to complement the frailties of our own human memory. Alan has a particular focus on lifelogging: automatically recording information about yourself and your everyday life, where the recording is done by yourself and for your own personal use. In 2001 Alan was a co-founder of TRECVid, the largest collaborative benchmarking activity in content-based tasks on digital video, and TRECVid has continued annually since then with over 2,000 researchers having contributed overall. Alan is a member of the Royal Irish Academy and an Academy Gold Medallist in Engineering Science, an award given only once every three years as “testament to a lifetime of passionate commitment to the highest standards in scholarship”. He is a Fellow of the IEEE and the current Chair of ACM SIGMM (the Special Interest Group on Multimedia).