AAAI (Association for the Advancement of Artificial Intelligence) 2016 Conference

Phoenix, Arizona, USA, 2016

The Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16) was held on February 12–17 at the Phoenix Convention Center, Phoenix, Arizona, USA. I was lucky enough to attend.

First of all, it was my first time in the USA and I have to admit that I was impressed. The buildings, the streets, the coffee, the food, the life!

Concerning the conference itself, it attracted a lot of people and celebrity academics working in the area of Artificial Intelligence. During the first couple of days I attended interesting tutorials, from traditional heuristic search to the trendy and hot topic of Deep Learning!! I will just mention that the deep learning tutorial was meant for students or people who were not familiar with the concepts. Even though the speakers started from scratch, they managed to discuss complex concepts by the end. They presented what is included in their new book (check it on Amazon), which was available for pre-order at the time of writing this post.

On the days of the actual conference I had the opportunity to attend a number of talks about machine learning, security and multi-agent systems. I also attended the educational panel, where Stuart Russell and Peter Norvig announced the 4th edition of their famous book, Artificial Intelligence: A Modern Approach. The book will include material on Deep Learning, Monte Carlo Tree Search and meta-heuristics.

On the evening of the first day of the conference I attended an invited talk by Andreas Krause, who currently leads the Learning & Adaptive Systems Group at ETH Zurich. He covered a lot of application themes, as the presentation’s title was From Proteins to Robots: Learning to Optimize with Confidence. The focus, though, was on Bayesian Optimization and how submodularity can come in handy in such approaches by providing guarantees and bounds on the solutions.

The next day, I attended another important talk. Demis Hassabis was there, the CEO of DeepMind, which was acquired by Google for hundreds of millions back in 2014. There, it was announced that DeepMind’s AlphaGo will face the world champion in the game of Go in mid-March (starting on March 9th 2016) and that the matches will be live streamed. Besides advertising and promoting DeepMind’s brain child, Demis went into some detail about the algorithms they used, giving us intuition on how they approached the problem and why and how their algorithm works. Concretely, he said that, contrary to popular belief, not everything at DeepMind has to do with Deep Learning; instead they rely heavily on reinforcement learning, an area inspired by behavioural psychology and our attempt to understand how the brain works and ultimately how we learn. Put simply and briefly, reinforcement learning is about rewarding behaviour that is beneficial and penalising behaviour that is detrimental in some context.
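To make that last idea a little more concrete, here is a tiny, self-contained sketch of the reward/penalty principle, using tabular Q-learning on a made-up five-state corridor. This is only an illustration of the general idea of reinforcement learning, not the actual algorithms behind AlphaGo.

```python
# Toy reinforcement learning: tabular Q-learning on a 5-state corridor.
# Reaching the rightmost state is rewarded; bumping the left wall is penalised.
import random

n_states, n_actions = 5, 2          # states 0..4; actions: 0 = left, 1 = right
alpha, gamma, epsilon = 0.1, 0.9, 0.2
Q = [[0.0] * n_actions for _ in range(n_states)]

def step(state, action):
    """Toy environment dynamics: move left/right, reward the goal, penalise the wall."""
    nxt = max(0, min(n_states - 1, state + (1 if action == 1 else -1)))
    if nxt == n_states - 1:
        return nxt, 1.0, True       # beneficial behaviour is rewarded
    if state == 0 and action == 0:
        return nxt, -1.0, False     # detrimental behaviour is penalised
    return nxt, 0.0, False

for _ in range(500):                # episodes
    s, done = 2, False
    for _ in range(100):            # cap the episode length
        if random.random() < epsilon:
            a = random.randrange(n_actions)                      # explore
        else:
            a = max(range(n_actions), key=lambda i: Q[s][i])     # exploit current estimate
        s2, r, done = step(s, a)
        # Nudge Q(s, a) towards the observed reward plus the discounted future value.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2
        if done:
            break

print(Q)  # the learned values favour moving right, the rewarded behaviour
```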

My talk was just after lunch on the 3rd day of the conference. I was in the Computational Sustainability session and I only had a spotlight talk, which means I had to speak for 2 minutes to advertise my work and invite people to the poster session in the evening, where I could present and talk more about it. The session was interesting overall. The key speaker, in my opinion, was Pascal Van Hentenryck. He and his students had 2 or 3 presentations in that session, about evacuation plans for cities that are vulnerable to flooding and natural disasters. Pascal is very well known in the area and I have personally attended his online optimization course on Coursera.

The evening of that day I presented my poster in the allocated session. A lot of people stopped by and asked about my work and I am glad about it.

All in all, AAAI was a very good experience and I feel lucky to have had the opportunity to be there! If you want to know more details about it or talk about specific papers, please get in touch with me.


Crowdsourcing categories

In this post I attempt to describe different types of crowdsourcing. This post will be continuously updated with examples, descriptions and potentially new categories.

Participatory sensing is about people carrying special equipment with them and taking measurements to monitor, for example, an environmental phenomenon.
Crowdsensing is about sharing data collected by sensing devices.
Crowdsourcing is an umbrella term that encapsulates a number of crowd-related activities. Wikipedia has the following definition: Crowdsourcing is the process of obtaining needed services, ideas, or content by soliciting contributions from a large group of people, and especially from an online community, rather than from traditional employees or suppliers.
Online crowdsourcing is about outsourcing online tasks to people. For example, people completing tasks for micropayments on Amazon Mechanical Turk.
Citizen science is about assisting scientists with tasks that are complex (for machines) and time-consuming (for the scientists). For example, identifying fossils in hundreds of pictures of rocky environments or classifying the galaxies of the universe. More interesting projects can be found at zooniverse.org.
Spatial crowdsourcing is about tasks that require participants to go to specific locations. For instance, taking a photo of a plant that grows in a specific location requires participants to physically go to that location to complete their task.
Mobile crowdsourcing describes crowdsourcing activities that are processed on mobile devices such as smartphones and tablets.
Human computation is a type of collective intelligence and crowdsourcing where humans assist machines in performing tasks that are difficult for machines to do on their own.
Opportunistic sensing is about collecting data without the user’s active contribution. For example, taking a measurement automatically when the device is near some location.

 

If you want to add to the descriptions or disagree with something above feel free to comment below.

Research Internship – Data science/Machine Learning

This post aims to describe my experiences from my three-month research internship at Toshiba Research Labs, Bristol, UK, and the project I worked on (September – December 2015).

I remember the day I first went there for my interview. The building was between a wonderful small square park and a river, and it was just a 5-minute walk from the city centre. But this was not the only thing I really liked. Working there, I realised the importance of the culture in a firm. I appreciated the importance of collaboration, brainstorming and creativity. It was an academia-like environment; friendly, down-to-earth people with lots of ideas and knowledge on a variety of subjects. Everyone was approachable and you could discuss anything with them. I could communicate with colleagues effectively without having to worry about business formalities.

The project I worked on was intriguing. That was the main reason I applied for this research internship in the first place. It combined my academic interest in Machine Learning and my personal interest in human wellbeing. In short, the project was about Mood Recognition at Work using Wearable Devices. In other words, understand, learn and attempt to predict someone’s mood (happiness/sadness/anger/stress/boredom/tiredness/excitement/calmness) using just a wearable device (it could be a smart wristband, a chest sensor or anything able to capture vital signs). Sounds impossible, right? How can you predict something as complicated as human emotions? We, as humans, often struggle to understand our own mood. For example, how would you say you feel right now? Happy? Sad? OK? This is indicative of the complexity of the problem we were facing. However, we wanted to do unscripted experiments, meaning we did not want to induce any emotions in the participants of our study. We rather wanted them to wear a smart device and log their mood at 2-hour intervals while they were at work, as accurately as they could. Surprisingly, at least for me, there was variation in their responses: some higher, some lower, but all of them varied. That was encouraging.

We had to study the literature and do some research to answer the following question: how could we extract meaningful features from vital signs and accelerometer signals that would have predictive capabilities in terms of emotions? After some digging around, we found the relevant literature. It was not a new concept. There were studies both in the medical literature and in Computer Science associating heart rate with stress and skin temperature with fatigue. We wanted to take this further and check whether a combination of all these signals could have a more powerful predictive ability. Intuitively, think about the times you felt stressed. Your heart might pump faster, but sometimes your foot or hand might be shaking as well. The shaking could be captured by the accelerometer, and together the signals could be used as an additional indicator of a stressful situation.

We ended up with hundreds of features, and tested a number of basic machine learning techniques, such as Decision Trees and SVMs.
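As a rough illustration of what that pipeline looks like, here is a minimal sketch: a few hand-crafted features per time window fed into a decision tree and an SVM. The feature names, the synthetic data and the labels below are invented for illustration; they are not the actual dataset or feature set from the project.

```python
# Hedged sketch: hand-crafted features from wearable signals -> basic classifiers.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n_windows = 200  # e.g. 2-hour windows of wearable data

# Hypothetical features per window: mean heart rate, a heart-rate-variability proxy,
# mean skin temperature and accelerometer energy.
X = np.column_stack([
    rng.normal(75, 10, n_windows),    # mean HR (bpm)
    rng.normal(50, 15, n_windows),    # HRV proxy (ms)
    rng.normal(33, 1, n_windows),     # skin temperature (deg C)
    rng.gamma(2.0, 1.0, n_windows),   # accelerometer energy
])
y = rng.integers(0, 2, n_windows)     # toy labels, e.g. 0 = calm, 1 = stressed

for name, clf in [("decision tree", DecisionTreeClassifier(max_depth=4)),
                  ("SVM (RBF)", SVC(kernel="rbf", C=1.0, gamma="scale"))]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.2f}")
```

With random labels the accuracies hover around chance, of course; the point is only the shape of the pipeline, not the numbers.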

Our results were good enough, comparable to those in the literature. Thus, we decided to publish our findings in the PerCom 2016 conference proceedings (WristSense Workshop): http://ieeexplore.ieee.org/document/7457166/.

Furthermore, a number of ideas for patents were discussed and exciting new avenues for potential work were sketched out.

Overall, I would recommend an internship during a PhD programme as it is a very rewarding experience.

I would like to take this opportunity to thank all the employees, managers and directors there for the unique experience and their confidence in me.

Intelligent Express – Making your everyday coach more intelligent

If you live in the UK, you’ve travelled at least once with N. E. Well… if you haven’t, N.E. is a British multinational public transport company. I used to take their coaches to travel across the country. I still do. They connect places with frequent journeys to hundreds of destinations, at least within the UK. So, what is this post about? Well, as I said, I use their services mainly because of their prices. It is usually much cheaper than taking the train. However, there is one frustrating thing. Delays!! Waiting for the coaches seems never-ending. Other times you expect to reach your destination within a couple of hours and it takes four or more. It has happened to me. I know, traffic. We can’t do anything about it. But yes, we can. We can at least know the schedule. We can know that the coach will actually take four hours to reach its destination and thus be prepared for the long journey.

The timetable is actually given along with the expected time to the destination… and, in my experience, it is usually wrong! So, here I propose a simple solution that could be beneficial both for customers and for the company.

Machine learning is a fast-evolving field lying between computer science and statistics, and it could come in handy here. We can train intelligent algorithms to find patterns in the schedules of coaches. Specifically, we can learn their departure and arrival times and provide better estimates of each journey’s duration. So, we can know in advance that a trip is going to take longer than expected or that it is going to depart late!

Now for the practical bit. I believe that Gaussian Processes are ideal for this task. A periodic kernel could be used, since we already know that duration depends on the day and the time of day. Departure and arrival times can be noted down by the drivers and added to the system. Thus, a history of journey times and durations can be created. Then, for any journey requested, an accurate estimate of the duration and departure time can be provided, as well as the risk, confidence interval or uncertainty of that prediction.
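Here is a minimal sketch of what I have in mind, using scikit-learn’s GP regressor with a periodic kernel. The choice of library and the synthetic journey data are my own assumptions, purely for illustration:

```python
# GP regression with a periodic kernel: journey duration as a function of departure hour.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, ExpSineSquared, WhiteKernel

rng = np.random.default_rng(1)
hours = rng.uniform(0, 24, 80).reshape(-1, 1)                 # departure hours of past journeys
durations = 120 + 30 * np.sin(2 * np.pi * hours.ravel() / 24) \
            + rng.normal(0, 5, 80)                            # minutes, with noise

# Periodic kernel: we already know the duration depends on the time of day.
kernel = ConstantKernel(1.0) * ExpSineSquared(length_scale=1.0, periodicity=24.0) \
         + WhiteKernel(noise_level=1.0)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(hours, durations)

# Predicted duration and uncertainty for a 17:30 departure.
mean, std = gp.predict(np.array([[17.5]]), return_std=True)
print(f"expected duration: {mean[0]:.0f} min (+/- {1.96 * std[0]:.0f} min, 95% interval)")
```

The predictive standard deviation is exactly the “risk or confidence” part of the idea: the company could publish not just an estimate but how sure it is about it.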

Inference vs Prediction: What do we mean, where are they used, and how?

A lot of people seem to confuse the two terms in the machine learning and statistics domain. This post will try to clarify what we mean by each, where each one is useful and how they are applied. I personally understood the difference when I took a class called Intelligent Data and Probabilistic Inference (by Duncan Gillies) during my Master’s degree. Here, I will present a couple of examples to build an intuitive understanding of the difference.

Inference:

You observe the grass in your backyard. It is wet. You observe the sky. It is cloudy. You infer it has rained. You then turn on the TV and watch the weather channel. It has been cloudy but with no rain for a couple of days. You remember the sprinkler was on a timer a few hours ago. You infer that this is the cause of the grass being wet.

(The creepy example) Imagine you are staring at an object in the evening, a bit far away in a corner. Getting closer… you observe that the object is staring back at you. You infer that it is an animal. You are brave enough and you get closer. You can now see the eyes, the fur, the legs and other characteristics of the animal. You infer that it is a cat. A simple procedure for your brain, right? It feels trivial to you and probably stupid to even discuss it. You can of course recognise a cat. But in fact this is a form of inference. Say the cat has some features: eyes, fur, shape, etc. As you get closer to it, you assign different values to these variables. For example, initially the eyes variable was set to 0, as you couldn’t see them. As you move closer, you become more certain of what you observe. Your brain takes these observations and converts them into the probability that the object is a cat. Say we have a catness variable that represents the probability of the object being a cat. Initially, this variable could be near zero. Catness increases as you move closer to the object. Inference takes place and updates your belief about the catness of the object. A similar example can be found here: http://www.doc.ic.ac.uk/~dfg/ProbabilisticInference/IDAPISlides01.pdf
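For the more programmatically inclined, here is a tiny sketch of the catness update as a sequence of Bayes’ rule steps. The prior and the likelihood numbers are made up purely to show how each observation shifts the belief:

```python
# Sequential Bayesian updating of a "catness" belief from noisy observations.

def update(prior, p_obs_given_cat, p_obs_given_not_cat):
    """One step of Bayes' rule: return P(cat | observation)."""
    numerator = p_obs_given_cat * prior
    evidence = numerator + p_obs_given_not_cat * (1.0 - prior)
    return numerator / evidence

catness = 0.05  # initial belief: probably not a cat (hypothetical prior)

# Each tuple: (observation, P(obs | cat), P(obs | not cat)) -- all assumed values.
observations = [
    ("eyes staring back", 0.9, 0.2),
    ("fur",               0.8, 0.3),
    ("four legs",         0.9, 0.4),
]

for name, p_if_cat, p_if_not_cat in observations:
    catness = update(catness, p_if_cat, p_if_not_cat)
    print(f"after observing {name}: catness = {catness:.2f}")
```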

Prediction:

You observe the sky. It is cloudy. You predict that it is going to rain. You hear in the news that the chances of rain, despite the clouds, are low. You revise and predict that it is most probably not going to rain.

Given the fact that you own a cat, you predict that when you come home, you will find it running around.

Final Example:

Understanding the behaviour of humans in terms of their daily routine or their daily mobility patterns requires the inference of latent variables that control the dynamics of their behaviour. Knowing where people will be in the future is prediction. However, predictions cannot be made if we have not inferred the relationships and dynamics of, let’s say, human mobility.

Verdict:

Inference and prediction answer different questions. Prediction could be a simple guess or an informed guess based on evidence. Inference is about understanding the facts that are available to you. It is about utilising the information available to you in order to make sense of what is going on in the world. In one sense, prediction is about what is going to happen, while inference is about what happened. In the book “An Introduction to Statistical Learning” you can find a more detailed explanation. But the point is that, given some random variables (X1, X2, …, Xn), or features, or, for simplicity, facts, if you are interested in estimating something (Y), then this is prediction. If you want to understand how Y changes as the random variables change, then it is inference.

In a short sentence:  Inference is about understanding while prediction is about “guessing”.

Submodularity in Sensor Placement Problems

Many problems in Artificial Intelligence, and in computer science in general, are hard to solve. What this means in practice is that it would probably take a computer hundreds, thousands or millions of years of computation to solve them exactly. Thus, many scientists create algorithms that solve difficult problems approximately, but in a sensible time period, i.e., seconds, minutes or hours.

One such problem is the sensor placement problem. The key question here is to find a number of locations at which to place some sensors in order to achieve the best coverage of the area of interest. To solve this problem exactly, the computer has to consider all possible combinations of placing the available sensors in the different locations. To give some numbers, with 5 sensors and 100 possible locations, one has to try 75,287,520 combinations in order to find the best arrangement. Imagine what happens when the problem is about placing hundreds of sensors in a city where there are hundreds or thousands of candidate locations.
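That number is simply the number of ways of choosing 5 locations out of 100, which you can check in a couple of lines:

```python
# Number of ways to choose 5 sensor locations out of 100 candidate locations.
import math
print(math.comb(100, 5))  # 75287520
```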

In such problems, submodularity comes in handy. It is an extremely important property used in many sensor placement problems. It is a property of set functions that describes their behaviour. In particular, the main idea is that adding an element to a small set has a higher return/utility/value than adding the same element to a larger set. This can be better understood with an example. Imagine having 10,000 sensors scattered in a big room, taking measurements of the temperature every 2 hours. Now imagine adding another sensor to that room. Have we really gained much by doing so? So, we have a large set and we add something to it. Now imagine the same room with only 1 sensor. Adding 1 more can give us a better picture of some corner, say, or a better estimate of the true average temperature of the room. So, this sensor was much more valuable than in the previous case. This is what I mean by saying that adding something to a smaller set has a higher utility.
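For reference, this diminishing-returns idea is usually written as follows, where f is the value (e.g. coverage) of a set of chosen locations, A is the smaller set, B the larger one with A a subset of B, and x is a candidate location not already in B:

$$ f(A \cup \{x\}) - f(A) \;\ge\; f(B \cup \{x\}) - f(B) $$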

It turns out that this property is very useful in maths, and in computer science and AI in particular, as it allows us to build algorithms that have theoretical guarantees. It has been proved that a greedy algorithm achieves at least 63% (more precisely, 1 − 1/e) of the optimal value. This was initially proved by Nemhauser et al. in the mathematics literature and later used by Krause et al. in computer science, especially for the sensor placement problem. The image below shows this property in diagram form, to give a better feeling of what it is about.

Submodularity (taken from Meliou et al.’s PowerPoint presentation)
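To make the greedy idea concrete, here is a small sketch of greedy placement with a simple coverage objective; the grid, the sensing radius and the candidate locations are invented for the example. Coverage functions of this kind are monotone and submodular, which is exactly the setting in which the 63% guarantee mentioned above applies.

```python
# Greedy sensor placement with a toy submodular coverage objective.
import itertools

grid = list(itertools.product(range(10), range(10)))   # 10x10 area we want to cover
candidates = grid                                      # every cell is a candidate location
RADIUS = 2                                             # sensing range (Manhattan distance)

def coverage(sensors):
    """Number of grid cells within RADIUS of at least one chosen sensor (submodular)."""
    return sum(
        1 for (cx, cy) in grid
        if any(abs(cx - sx) + abs(cy - sy) <= RADIUS for (sx, sy) in sensors)
    )

def greedy_placement(k):
    """Repeatedly add the location with the largest marginal coverage gain.
    For monotone submodular objectives this achieves at least (1 - 1/e) ~ 63% of optimal."""
    chosen = []
    for _ in range(k):
        best = max((c for c in candidates if c not in chosen),
                   key=lambda c: coverage(chosen + [c]))
        chosen.append(best)
    return chosen

placement = greedy_placement(5)
print(placement, coverage(placement))
```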

Gaussian Process Summer School

Last September I had the opportunity to attend the Gaussian Process Summer School in Sheffield, UK. It is a twice-yearly event that lasts 3–4 days. First of all, I have to say that it was an awesome experience, even though I did not have much time to explore the city. Besides, it was raining heavily most of my time there. Well, we did have an excursion to a local brewery.

Anyway, the event was structured as full-day lectures, every day, given by experts in the field. And by experts I mean people like Rasmussen, who co-wrote the famous book on Gaussian Processes (GPs) that is cited in virtually any paper that includes these two words nowadays, and of course Neil Lawrence, who has a whole lab in Sheffield working on Gaussian Processes and organises this School.

What I enjoyed the most, though, were the lab sessions scheduled between lectures. They were the perfect time to get our hands dirty. It was a chance to use GPy, a Python library developed in Sheffield that includes almost everything related to GPs. I have to admit that GPy seems a much more powerful tool to have than GPML, which I currently use (it is a GP library for Matlab). Anyway, the exercises were perfectly suited to playing around with the features of GPy, as well as discovering the potential of GPy and Gaussian Processes in general. In fact, the exercises were given as IPython notebooks. An IPython notebook is an interactive computational environment in which you can combine code execution, rich text and mathematics. Specifically, we were given snippets of code that had some crucial parts missing, which we were supposed to fill in.
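For a flavour of what the exercises involved, here is a minimal GPy regression example in the same spirit. The actual notebook snippets from the school are not reproduced here; the toy data are my own:

```python
# Minimal GPy example: fit a GP with an RBF kernel to noisy sine data.
import numpy as np
import GPy

X = np.random.uniform(0.0, 10.0, (50, 1))
Y = np.sin(X) + 0.1 * np.random.randn(50, 1)

kernel = GPy.kern.RBF(input_dim=1, variance=1.0, lengthscale=1.0)
model = GPy.models.GPRegression(X, Y, kernel)
model.optimize()                                     # optimise the kernel hyperparameters

mean, variance = model.predict(np.array([[5.0]]))    # predictive mean and variance at x = 5
print(mean, variance)
```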

Another memory from the GP school was that of Joaquin Quiñonero Candela, who gave lectures at the summer school as well as at the University of Sheffield. Joaquin was previously a researcher at Microsoft and is now director of research in Applied Machine Learning at Facebook, where they apparently make use of advanced machine learning techniques and push the field to its limits. Importantly, Joaquin has co-authored papers with Rasmussen on Gaussian Processes, and he seemed to me a brilliant guy.

That is pretty much my experience from this school. In another post, I will introduce GPs and explain, as intuitively as I can, their usefulness and applicability.
