|10:15||10:45||Pavel Shkljudov (IBM), Slides,IBM Watson – the uncharted territory|
|10:45||11:15||Alexander Gasnikov (MIPT), Slides,
Universal fast gradient method for strictly convex stochastic composite optimization problems
We propose a new, simple variant of the Fast Gradient Method that requires only one projection per iteration. We call this method the Triangle Method (TM) because it has a corresponding geometric description. We generalize TM to convex and strictly convex composite optimization problems. Then we propose the Universal Triangle Method (UTM) for convex and strictly convex composite optimization problems (see Yu. Nesterov, Math. Program. 2015 for more details on the Universal Fast Gradient Method). Finally, based on the mini-batch technique, we propose the Stochastic Universal Triangle Method (SUTM). SUTM can be applied to stochastic convex and strictly convex composite optimization problems. Note that all of the methods TM, UTM, and SUTM are continuous in the strict convexity parameter, and all of them reach the known lower bounds. With additional assumptions about the structure of the problems, these methods work better than the lower bounds.
|11:15||11:45||Dmitry Vetrov (Skoltech), Slides,
Modelling multiple word meanings using Adaptive Skip-gram
We present an extension of the well-known Skip-gram model used in word2vec that builds vector representations for the meanings of words rather than for the words themselves. This is an excellent example of latent variable models (LVM), which are widely used for learning from weakly labeled data. An elegant combination of LVM, stochastic optimization, non-parametric Bayesian inference, and neural networks provides a new computationally efficient tool for word sense disambiguation, allowing us to identify the number of different meanings a word may have and to find their vector representations.
|11:45||12:15||Sergey Nikolenko (NRU HSE St. Petersburg, PDMI RAS), Slides,
Deep learning for natural language processing
The deep learning revolution has not left natural language processing alone. New advances in text modeling and generation, sentiment analysis, machine translation, dialog and conversation, question answering, and other NLP tasks arrive not by the month but rather by the week. Moreover, although DL in NLP started with standard neural architectures (RNNs and CNNs), it has branched out into several quite different directions, from recursive networks for syntactic parsing to attention-based models for machine translation and memory networks for question answering. In the talk, we give a ((very-)very) brief overview of the most interesting and promising directions in modern NLP based on deep learning.
|13:15||13:45||Ivan Laptev (INRIA), Slides,
Computer vision in CNN era: New challenges and opportunities
Recent progress in visual recognition goes hand in hand with supervised learning and large-scale training data. While the amount of existing images and videos is huge, their detailed annotation is expensive and often prohibitive. To address this problem, in this talk we will focus on weakly-supervised learning methods that use incomplete and noisy annotation for training. In the first part I will discuss recognition from still images and will describe our work on weakly-supervised convolutional neural networks for recognizing objects and human actions. The second part of the talk will focus on learning human actions from videos and corresponding textual descriptions in the form of movie scripts or narrations. We will conclude with future challenges in visual recognition.
|13:45||14:15||Andrey Ustyuzhanin (Yandex), Slides,
Reconstruction of long-lived particle tracks using deep learning
We present an approach to building end-to-end track reconstruction for a particle flying through a detector using deep learning methods. Recent advances in deep learning have made it possible to extract high-level representations from raw input data. Such methods work well for 2D samples, whereas raw data in particle detectors is represented by a set of 3D voxels (hits). Long-lived particles are of special interest to physicists since they may lead to extensions of the Standard Model, but they are not well studied. We attempt to use existing deep learning methods to reconstruct tracks of $K^0_s$ particles in the LHCb detector given only raw subdetector hits. The results of this attempt indicate that the proposed approach can outperform standard reconstruction techniques.
|14:15||14:45||Alexander Chigorin (Yandex), Slides,
Improving Yandex image search using deep learning
Using image content for relevance prediction is a long-standing goal of any image search engine. In this talk I will share how we use deep learning to incorporate image content into the relevance prediction pipeline. The use cases covered include search by text query and search by image query.
Alexander Panin, Alexey Rogozhnikov (Yandex), Slides,
AgentNet - reinforcement learning toolkit for humans
Since the first attempts to train neural networks as MDP solvers back in 1992, many methods have been developed to tackle the issues of such solvers. Experience replay techniques and delayed learning targets are widely used to stabilize convergence and improve the final optima. Several value-based and hybrid methods were invented to deal with continuous and/or parameterized action spaces. Finally, some recent advances have allowed deep RL methods to deal with near-“real world” problems by introducing hierarchical MDPs with several abstraction levels. In this talk we give an overview of major advances and new trends in the Deep Reinforcement Learning domain and present an open-source toolkit designed for human-friendly creation of deep reinforcement learning systems in a variety of specific domains, ranging from Atari games to motion planning for mobile robots.
|17:05||17:35||Ruslan Salakhutdinov (Carnegie Mellon University), Slides,Deep Learning|
|10:15||10:45||Ivan Oseledets (Skoltech), Slides,Tensors and deep architectures|
|10:45||11:15||Mikhail Burtsev (MIPT DeepHack), Slides,
FSNet learning in partially-observable stochastic environments
In real-life settings an agent usually does not have full information about the current state of the world, so the agent is required to generate alternative solutions and switch between them on the fly. The FSNet algorithm learns goal-directed behavior in partially-observable stochastic environments by accumulating alternative actions in its network. FSNet performance was studied in stochastic state spaces for different schedules of transition sampling. The results demonstrate that it was able to solve the problem independently of the transition update schedule for a number of environment sizes and topologies.
|11:15||11:45||Alexander Khanin (VisionLabs), Slides,
Computer vision and deep learning in practical business cases
This talk presents real examples of how VisionLabs clients use software solutions based on computer vision algorithms and deep learning.
|11:45||12:15||Ekaterina Lobacheva (Kaspersky Lab), Slides,
Recurrent neural networks for malware detection
In this talk I will cover some basic principles of recurrent neural network architectures and their usage for sequence classification tasks. In addition, I will present our ideas on using RNNs for several practical tasks in malware detection.
|13:15||13:45||Vadim Lebedev (Skoltech), Slides,
Texture Networks: Feed-forward Synthesis of Textures and Stylized Images
Gatys et al. recently demonstrated that deep networks can generate beautiful textures and stylized images from a single texture example. However, their method requires a slow and memory-consuming optimization process. We propose here an alternative approach that moves the computational burden to a learning stage. Given a single example of a texture, our approach trains compact feed-forward convolutional networks to generate multiple samples of the same texture of arbitrary size and to transfer artistic style from a given image to any other image. The resulting networks are remarkably light-weight and can generate textures of quality comparable to Gatys et al., but hundreds of times faster. More generally, our approach highlights the power and flexibility of generative feed-forward models trained with complex and expressive loss functions. Joint work with Dmitry Ulianov, Andrea Vedaldi, and Victor Lempitsky. To be presented at ICML 2016.
|13:45||14:15||Victor Lempitsky (Skoltech), Slides,
DeepWarp: Photorealistic Image Resynthesis for Gaze Manipulation
I will present our ongoing project on photorealistic gaze redirection, in which we use machine learning to synthesize an image of a given face with an altered gaze direction. Our primary motivating application is enabling gaze contact during video-conferencing. I will focus on our recent results obtained with deep learning, where deep networks allowed us to obtain a boost in image photorealism. The developed method can be extended to other image editing applications. This is joint work with Yaroslav Ganin, Daniil Kononenko, Diana Sungatullina, and Leonid Ekimov.
|14:15||14:45||Andrey Afanasyev (iBinom),
Generalising better: applying deep learning to prioritise deleterious point mutations
Over the past fifteen years researchers have developed a plethora of individual deleteriousness scoring systems, such as PolyPhen and SIFT. Lately, the focus has been shifting from creating novel standalone tools towards combining available scoring systems into ensembles or meta-scores… The shortage of reference data, combined with the bias towards intensively studied model systems, calls for improved generalisation ability… We applied several deep learning strategies and compared their performance with all ACMG-recommended scoring systems as well as with two novel scores that employ deep learning. Our results show that even relatively simple modern neural networks significantly improve both prediction accuracy and coverage.
|14:45||15:15||Anton Rodomanov (FCS HSE), Slides,
Optimization Methods for Big Sums of Functions
We consider the problem of minimizing a big sum of smooth functions. A typical example of such a problem is empirical risk minimization for training a machine learning algorithm, e.g. Linear/Logistic Regression, SVM, deep neural network etc. One of the most popular methods for minimizing big sums of functions is Stochastic Gradient Descent (SGD). In this talk we are going to look at SGD from the optimization perspective and understand what its advantages and disadvantages are. After that we will discuss a possible alternative to SGD that the modern theory of optimization can offer.
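As a rough illustration of the setting described in this abstract (not taken from the talk itself), the sketch below applies plain SGD to a small empirical risk minimization problem: a one-dimensional least-squares fit written as an average of per-sample losses. The function name `sgd_least_squares`, the step size, the number of epochs, and the toy data are all assumptions chosen for the example.

```python
import random


def sgd_least_squares(xs, ys, lr=0.02, epochs=500, seed=0):
    """Minimize (1/n) * sum_i (w * x_i + b - y_i)^2 with plain SGD.

    Hypothetical helper for illustration; lr, epochs and seed are
    arbitrary choices, not values from the talk.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        # Visit the terms of the sum in a fresh random order each epoch.
        rng.shuffle(indices)
        for i in indices:
            # Gradient of the single term (w * x_i + b - y_i)^2.
            residual = w * xs[i] + b - ys[i]
            w -= lr * 2.0 * residual * xs[i]
            b -= lr * 2.0 * residual
    return w, b


# Toy noiseless data generated from y = 2x + 1 (an assumption for the demo).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b = sgd_least_squares(xs, ys)
print(w, b)
```

Each update touches only one term of the sum, which is what makes SGD attractive when the number of terms is large; the price is a noisy search direction, so on noisy data a fixed step size would typically leave the iterates fluctuating around the minimizer rather than converging to it.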
Automatic Relevance Determination via Variational Dropout
Variational dropout is a way to learn dropout weights for Gaussian dropout, a popular regularization method. In our work we discovered that this method can lead to the so-called ARD effect: some weights of the model are driven to zero. This yields a simpler, sparse solution and can be used for feature selection. During the talk I will briefly introduce the reparametrization trick and the doubly stochastic variational inference procedure, which are important techniques for non-conjugate Bayesian inference. Then I will introduce variational dropout and show how the ARD effect can be achieved on some simple models.
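For readers unfamiliar with the terms, here is a minimal sketch of Gaussian dropout written in reparametrized form, assuming the common convention of multiplicative noise drawn from N(1, alpha). The function name `gaussian_dropout` and the parameter name `alpha` are illustrative assumptions, and the sketch does not reproduce the variational procedure discussed in the talk.

```python
import random


def gaussian_dropout(values, alpha, rng):
    """Multiply each value by noise drawn from N(1, alpha).

    The noise is written as 1 + sqrt(alpha) * eps with eps ~ N(0, 1),
    so all randomness sits in eps and the result is a deterministic,
    differentiable function of alpha (the reparametrization trick).
    Hypothetical helper for illustration only.
    """
    scale = alpha ** 0.5
    return [v * (1.0 + scale * rng.gauss(0.0, 1.0)) for v in values]


rng = random.Random(0)
activations = [0.5, -1.0, 2.0]
noisy = gaussian_dropout(activations, alpha=0.25, rng=rng)
print(noisy)
```

Because the noise enters as a fixed-distribution `eps` scaled by `sqrt(alpha)`, a gradient with respect to `alpha` can in principle be estimated from samples, which is what makes learning the dropout rate feasible.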
|15:45||16:15||Artem Babenko (Yandex), Slides,
Deep learning for image retrieval
Image descriptors based on the activations within deep convolutional neural networks have emerged as state-of-the-art generic descriptors for visual recognition. Recent works have also successfully applied these descriptors to the problem of query-by-image retrieval. Some methods use neural networks pretrained for an unrelated classification task (e.g. on ImageNet), while others adapt the networks to the particular retrieval task via fine-tuning. In this talk I will give an overview of several recent papers and demonstrate the results on standard benchmarks.
|16:15||16:45||Lyubov Podoynitsina (Samsung R&D Institute Russia), Slides,
Veles - Deep learning platform from Samsung
Veles is a machine learning project that started at Samsung in 2013 and is currently open source. It was envisioned as a platform that facilitates the creation of applications by non-expert developers. It supports deep neural networks and distributed training, and provides more than 200 pre-defined units that can be combined into working models. New directions of the project in the rapidly changing technological landscape will be discussed.
|16:45||17:15||Alexey Artemov (Yandex Data Factory), Slides,
Signal filtration with trend in disorders detection problems
Over the past ten years, a new type of high-tech system has emerged: software-intensive systems. According to studies, the dominant cause of failures in such systems is software faults. Ensuring their uninterrupted and efficient operation is therefore a major problem, whose solution requires, above all, preventing failures of their software component and, in particular, detecting such failures quickly and accurately. Existing methods for detecting failures (disorders, faults) have a number of limitations owing to several features of signals from real systems: load cycles on several time scales, load bursts due to long memory, and the impossibility of specifying a mathematical model of the emerging failure. High data volumes exacerbate the problem of highly efficient automated disorder detection in modern large systems.