Learning at Scale & The End of "If-Then" Logic.

In 2001, a group of physicists was awarded the Nobel Prize in Physics for an experiment that produced the Bose-Einstein condensate (BEC).

A BEC is a state of matter that exists only at temperatures extremely close to absolute zero (very near 0 K, or −273.15 °C). It was first theorized by Satyendra Nath Bose and Albert Einstein in 1925.

In the Nobel Prize-winning experiment, the physicists created the first BEC in a lab by firing multiple lasers at gas particles from different directions. After meticulous calculation and planning, they carefully calibrated a series of lasers to achieve this. It was a huge deal.

However, last year, a group of researchers handed control of the same kind of lasers to an AI system that taught itself to create a BEC in about an hour.

It is important to note that this wasn't just a computer program with complex instructions on how to create a BEC.

“A simple computer program would have taken longer than the age of the universe to run through all the combinations and work this out.”
wrote co-lead researcher Paul Wigley of the Australian National University Research School of Physics and Engineering, the team behind the BEC AI experiment.

[Photo: Paul Wigley and Michael Hush, two of the researchers behind the experiment. Credit: Stuart Hay]

Instead, the researchers used a machine learning technique called reinforcement learning: a process in which the computer learns from continuous, relentless trial and error and gets better with each trial.

The improvement gained from each individual trial can be extremely small.

However, for machines that never tire or forget, this incremental learning is relentless, continuous and, thanks to ever-decreasing computing costs, highly scalable.
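To make that loop concrete, here is a minimal sketch of trial-and-error learning in Python: an "epsilon-greedy" agent that discovers which of three actions pays off best. The reward probabilities are invented for illustration and have nothing to do with the BEC experiment's actual control problem.

```python
import random

# Trial-and-error learning: the agent never sees REWARD_PROBS directly;
# it only observes the reward each trial produces. All numbers are made up.
REWARD_PROBS = [0.2, 0.5, 0.8]   # hidden from the agent
EPSILON = 0.1                    # how often the agent explores at random

value_estimates = [0.0, 0.0, 0.0]
action_counts = [0, 0, 0]

for trial in range(10_000):
    # Explore occasionally; otherwise exploit the best-known action.
    if random.random() < EPSILON:
        action = random.randrange(3)
    else:
        action = value_estimates.index(max(value_estimates))

    # The environment returns a reward for the chosen action.
    reward = 1.0 if random.random() < REWARD_PROBS[action] else 0.0

    # Each trial nudges the estimate only slightly: incremental learning.
    action_counts[action] += 1
    value_estimates[action] += (reward - value_estimates[action]) / action_counts[action]

print(value_estimates)  # converges toward [0.2, 0.5, 0.8]
```

Each individual update is tiny, but after ten thousand tireless trials the agent reliably finds the best action.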

Earlier this year, two researchers from Carnegie Mellon University created an AI poker player called "Libratus" using reinforcement learning. At Rivers Casino in Pittsburgh, it walked away with $1.8 million in chips by beating four of the top poker players in the world.

"We give the AI a description of the game. We don't tell it how to play," said Noam Brown, one of the two people behind the system.

[Photo: Rivers Casino. Credit: cardschat.com]

Here's a summary of how they did it. They used an algorithm called Counterfactual Regret Minimization, which is a fancy name for a reinforcement learning algorithm designed to minimize an error value. The error value could be, for example, the number of chips lost after each hand of poker. They replicated the system to create two AI entities and let Libratus play itself continuously, for months, through trillions of hands of poker. It went from placing random bets in the beginning to beating world champions after a few trillion hands of trial and error.
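Libratus's actual Counterfactual Regret Minimization implementation is far too large for a snippet, but its core idea, regret matching through self-play, fits in a few lines. Here is a hedged sketch applied to rock-paper-scissors instead of poker; the game and all numbers are stand-ins.

```python
import random

# Regret matching through self-play: a toy version of the idea behind
# Counterfactual Regret Minimization (the real algorithm also handles
# imperfect information and is vastly more complex).
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # rock, paper, scissors

def strategy(regrets):
    # Play each action in proportion to its positive cumulative regret.
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    return [p / total for p in positive] if total > 0 else [1 / 3] * 3

regrets = [[0.0] * 3, [0.0] * 3]        # cumulative regret per player
strategy_sums = [[0.0] * 3, [0.0] * 3]  # running sums give the average strategy

for _ in range(100_000):
    strategies = [strategy(r) for r in regrets]
    moves = [random.choices(range(3), weights=s)[0] for s in strategies]
    for p in range(2):
        me, opp = moves[p], moves[1 - p]
        actual = PAYOFF[me][opp]
        for a in range(3):
            # Regret: how much better action a would have done than our move.
            regrets[p][a] += PAYOFF[a][opp] - actual
            strategy_sums[p][a] += strategies[p][a]

# The average strategy converges to the Nash equilibrium (1/3, 1/3, 1/3).
print([round(s / sum(strategy_sums[0]), 3) for s in strategy_sums[0]])
```

Just as with Libratus, nobody tells the program how to play; the two copies start with random moves and grind each other toward unexploitable play.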

Usually, computer programming involves writing specific instructions for the computer to execute a particular action. For example, in 1997, IBM's champion chess-playing program Deep Blue used sheer computing power to play out a large number of possible scenarios at every point in the game, ensuring it avoided moves that led to a loss. It was able to consider 200 million positions per second. This strategy worked because chess has a finite (if enormous) number of unique scenarios, so it was possible for programmers to write a specific set of instructions to play the game.
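Here is a hedged sketch of that style of exhaustive game-tree search (minimax). To keep it self-contained it plays Nim rather than chess; Deep Blue's real engine added alpha-beta pruning, handcrafted evaluation functions and custom hardware on top of this basic idea.

```python
# Exhaustive game-tree search on Nim: players alternate taking 1-3 stones,
# and whoever takes the last stone wins. The program "plays out" every
# possible continuation before choosing a move, Deep Blue style.

def minimax(stones, maximizing):
    """Score a position by playing out every possible line to the end."""
    if stones == 0:
        # The player who just moved took the last stone and won.
        return -1 if maximizing else 1
    scores = [minimax(stones - take, not maximizing)
              for take in (1, 2, 3) if take <= stones]
    return max(scores) if maximizing else min(scores)

def best_move(stones):
    """Pick the move whose subtree guarantees the best outcome."""
    return max((take for take in (1, 2, 3) if take <= stones),
               key=lambda take: minimax(stones - take, False))

print(best_move(10))  # -> 2, leaving a multiple of 4 for the opponent
```

Notice that every rule of the game, and nothing else, is spelled out by the programmer. That is exactly the approach that breaks down in the next paragraph.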

It's a lot harder to write specific instructions for games or tasks like poker, where the AI has incomplete information about the game state. On top of that, in poker as in real-world tasks, there isn't one specific optimal move at any given point: repeated strategies can be identified and exploited by opponents as circumstances change. Finally, as with many other complex tasks performed by humans, we don't fully understand how exactly decisions are made in a human brain.

“We don’t know what program to write because we don’t know how it is done in our brains”

said Professor Geoffrey Hinton of the University of Toronto, regarded as the father of modern neural networks, during his Neural Networks for Machine Learning lecture.

It turns out that for most complex real-life tasks, where we are trying to optimize an output that is a function of a large number of known or unknown, related or unrelated input variables, writing specific "if ___ then ___ unless ___" logic is highly inefficient.

Imagine having to write millions of if-then rules while designing the algorithm for a self-driving car. We don't know how to write programs for complicated tasks performed by humans because we don't know exactly how we do those tasks ourselves.
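As a hypothetical illustration (these rules and thresholds are entirely made up), here is what the first few of those millions of rules might look like:

```python
# A hypothetical (and hopeless) attempt at hand-coded driving logic,
# illustrating why explicit if-then rules don't scale to messy tasks.
def should_brake(speed_kmh, obstacle_distance_m, is_raining, is_night):
    if obstacle_distance_m < 10:
        return True
    if is_raining and obstacle_distance_m < 25:
        return True
    if is_night and is_raining and speed_kmh > 60:
        return True
    # ...millions more rules for fog, cyclists, glare, potholes, deer,
    # plastic bags that look like rocks, rocks that look like plastic bags...
    return False
```

No matter how many rules we add, the world keeps producing cases we never wrote down.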

It took 3.8 billion years of adversarial reinforcement learning against the elements of the earth for us to evolve from single-cell organisms into what we are today.

All those years of slow evolution are difficult to decipher, let alone program.

But now, we have new tools to solve this problem.

Modeled after the basic ideas of evolution (reinforcement learning) and the human brain (deep neural nets), we now have access to a variety of technology stacks that can be used to perform almost ANY task done by humans, with superhuman performance. Enter artificial neural networks.

Artificial Neural Networks: How Do They Work?

I feel it is important to note that the following description is an idealized version of the actual technology and biology in question. Idealization lets us grasp the key ideas behind a concept while ignoring the exceptions, to make understanding easier to begin with.

To start, we need a general understanding of how the human brain and its neurons work. This is important because artificial neural nets are built on key principles of the neurons in the human brain.

For simplicity, we will look at a basic form of cortical neuron (the neurons found in the neocortex of the human brain) and a basic type of artificial neural network called a feed-forward neural network.

[Photo: Neurons connected to each other, passing signals on to other connected neurons. Credit: Denis Ezannino]

In short, the brain is made up of cells called neurons that are connected to other neurons and can "fire" (change their electrical charge) if the conditions are just right.

If a neuron fires hard enough, some of the other neurons connected to it may fire themselves, each based on its own standard for firing. Each neuron can fire at varying levels of intensity and can even change its own standard for firing. Each neuron is sensitive to a certain pattern of external input, and therefore acts like a tiny pattern recognizer: it looks for a certain combination of external inputs and fires when a matching combination is found. In this way, a neuron creates inputs for the neurons connected to it, which in turn decide whether or not to fire based on the aggregated inputs they receive from their neighbors. This is a gross simplification of how neurons actually work, but it will help us understand a standard feed-forward artificial neural network.
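A single artificial neuron captures this pattern-recognizer idea in a few lines. The weights and inputs below are made-up numbers, purely for illustration:

```python
import math

# A single artificial "neuron": it aggregates weighted inputs from its
# neighbors and fires more strongly the better the input matches the
# pattern encoded in its weights.
def neuron(inputs, weights, bias):
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))   # firing intensity in (0, 1)

weights = [2.0, -1.5, 0.5]   # the "pattern" this neuron is sensitive to
bias = -0.5                  # its "standard for firing"

print(neuron([1.0, 0.0, 1.0], weights, bias))  # strong match: fires hard (~0.88)
print(neuron([0.0, 1.0, 0.0], weights, bias))  # poor match: barely fires (~0.12)
```

Changing the weights or the bias changes what pattern the neuron responds to, which is exactly what "learning" will mean below.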

[Image: A simple feed-forward neural network. Credit: CoderonCode]

As you can see, there is an input layer, an output layer and a hidden layer in the middle. The circles represent nodes (like neurons), and the activity of each layer of nodes is a nonlinear function of the activity in the layer to its left.
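In code, that sentence is just matrix multiplications with a nonlinearity in between. A minimal sketch with arbitrary layer sizes and random weights:

```python
import numpy as np

# Forward pass of a feed-forward network: each layer's activity is a
# nonlinear function of the layer before it. Sizes and values are arbitrary.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # input (3) -> hidden (4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)   # hidden (4) -> output (2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0])      # activities of the input layer
hidden = sigmoid(W1 @ x + b1)       # nonlinear function of the input layer
output = sigmoid(W2 @ hidden + b2)  # nonlinear function of the hidden layer
print(output)
```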

With Multiple Hidden Layers: Deep Neural Networks

If a network has more than one hidden layer in the middle (which is typical), it is called a deep neural network.

The nodes in the middle layers can adjust their weights during “training”.

These artificial neural networks can be trained using a data set of sample inputs and corresponding outputs as examples for the network to learn from.

Based on this training data, each node in the middle adjusts its weights (and thus its nonlinear function) so that each training input, once propagated through the whole network, ends up producing its corresponding training output.

After the neural network is trained, if it is fed an input it has never seen before, it can predict or classify an output based on what it learned from the training data. In effect, just like neurons, these nodes are small pattern recognizers, mapping inputs to outputs based on the patterns in the training data they "learned" from.
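Here is a minimal training sketch under those same simplifications: a tiny network learning XOR by gradient descent. The learning rate, layer size and iteration count are arbitrary choices, and real frameworks automate all of this:

```python
import numpy as np

# Training: the network adjusts its weights so that the training inputs
# produce their corresponding training outputs (here, the XOR function).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(20_000):
    # Forward pass: propagate the inputs through the whole network.
    hidden = sigmoid(X @ W1 + b1)
    pred = sigmoid(hidden @ W2 + b2)

    # Backward pass: nudge every weight to reduce the prediction error.
    grad_pred = (pred - y) * pred * (1 - pred)
    grad_hidden = grad_pred @ W2.T * hidden * (1 - hidden)
    W2 -= 0.5 * hidden.T @ grad_pred
    b2 -= 0.5 * grad_pred.sum(axis=0)
    W1 -= 0.5 * X.T @ grad_hidden
    b1 -= 0.5 * grad_hidden.sum(axis=0)

print(pred.round(2))  # typically close to [[0], [1], [1], [0]] after training
```

Nobody writes an if-then rule for XOR here; the weights discover it through thousands of tiny corrections.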

From a functional perspective, learning can be sorted into three piles.

1. Supervised Learning: learns to predict an output based on an input.

It includes two key functions (a minimal code sketch follows this list):

a. Classification: the output is a class label.

Example: diagnosing a patient's condition based on input variables (symptoms, bloodwork, etc.).

b. Regression: the output is a real number or a vector of real numbers.

Example: the price of crude oil in three months.

2. Unsupervised Learning: sorts, classifies or clusters unstructured raw data on its own, so that the result can be used for other supervised or reinforcement learning tasks.

3. Reinforcement Learning: learns to make moves that maximize or minimize a variable.

Example: DeepMind used reinforcement learning to create AlphaGo, which went on to beat the reigning champion of the board game Go.
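Before returning to AlphaGo, here is the promised sketch of the two supervised-learning functions, using scikit-learn and toy, made-up data:

```python
from sklearn.linear_model import LinearRegression, LogisticRegression

# Classification: the output is a class label (a toy, made-up "diagnosis").
symptoms = [[1, 0], [1, 1], [0, 1], [0, 0]]   # [fever, cough] per patient
diagnosis = ["flu", "flu", "cold", "healthy"]
clf = LogisticRegression().fit(symptoms, diagnosis)
print(clf.predict([[1, 1]]))   # -> a class label, e.g. 'flu'

# Regression: the output is a real number (a toy price forecast, not real data).
months_ahead = [[1], [2], [3], [4]]
price = [50.0, 52.0, 54.5, 56.0]
reg = LinearRegression().fit(months_ahead, price)
print(reg.predict([[6]]))      # -> a real number
```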

Here's how the team behind AlphaGo summarized their feat:

The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. Here we introduce a new approach to computer Go that uses ‘value networks’ to evaluate board positions and ‘policy networks’ to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play. We also introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks. Using this search algorithm, our program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.

From the paper published by the AlphaGo team.

Reinforcement learning is exciting because of its inherent "self-play" property, which mimics evolution, the only process proven to have created an intelligent being.

We can think of our DNA as a reinforcement learning algorithm that evolution refined using small mutations, i.e. trial and error, to maximize the chances of survival and reproduction. Reproduction ensures good code (DNA) is passed on for another round of trial and error. Mutation ensures new changes to the code (DNA) are developed, tested in the real world and passed on to the next generation if they further optimize the output. Note that these mutations are random and often fail. But nature has scale and a lot of time to test things out, something we humans never had until now.
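That mutate, test, keep loop is easy to caricature in code. Here is a toy "evolution" that mutates a genome at random and keeps only the changes that help; the target string and fitness function are, of course, invented:

```python
import random

# Toy evolution: random mutations to a genome, kept only when they do not
# hurt fitness. Real evolution has no target, but the loop is the same shape.
TARGET = "SURVIVE AND REPRODUCE"
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "

def fitness(genome):
    return sum(a == b for a, b in zip(genome, TARGET))

genome = [random.choice(ALPHABET) for _ in TARGET]
generation = 0
while fitness(genome) < len(TARGET):
    child = list(genome)
    child[random.randrange(len(child))] = random.choice(ALPHABET)  # mutation
    if fitness(child) >= fitness(genome):   # selection: keep what works
        genome = child
    generation += 1

print("".join(genome), "after", generation, "generations")
```

Most mutations fail and are discarded; the few that help accumulate, exactly as described above.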

Thanks to reinforcement learning techniques and ever-decreasing computing costs, today we can create computing systems that simulate properties of evolution to teach machines complex tasks without the need for specific instructions. By tirelessly running trial after trial to maximize or minimize an output variable, these AI systems can learn almost ANY task done by humans and outperform us at it.

This is a fundamental change in how we write software because now we can automate tasks we don’t fully understand.

There are tons of paid and open-source deep learning platforms out there. Almost every major tech company has a paid deep learning platform. They are easy to handle and, used properly, can save us from writing thousands of lines of code AND surpass human performance in a variety of complex tasks. Examples include IBM's Watson, Microsoft's Azure Machine Learning and Google's Cloud Machine Learning services. There are also plenty of open-source options, with all the usual issues and blessings of any open-source platform. Top open-source platforms include TensorFlow, Theano, Keras, Torch and Caffe2. The abundance of platforms with fairly similar yet powerful functionality shows how rapidly these advances are being standardized for us to build on.
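To give a sense of how little code these platforms require, here is a minimal sketch using Keras (via TensorFlow): a small network defined, compiled and trained in about ten lines. The data here is random noise standing in for a real dataset, and the layer sizes are arbitrary:

```python
import numpy as np
from tensorflow import keras

# Placeholder data: 1,000 samples of 20 features with binary labels.
X = np.random.rand(1000, 20)
y = np.random.randint(0, 2, size=(1000,))

# A small deep network: two hidden layers and a sigmoid output.
model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32)
```

The forward pass, backpropagation and weight updates we sketched by hand earlier are all handled by the framework.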

If you'd like to explore ideas for applications you can build using deep learning tech, check out my essay, Stop Sitting On All That Data & Do Something With It.

Conclusion: The Orangutans on BBC

There is some really cool footage of Sir David Attenborough with an orangutan on YouTube. It was filmed at an animal sanctuary that shelters orangutans who have spent their lives with humans.

In the footage you can see an old orangutan interacting with some tools: a hammer, some nails, a saw, some pieces of wood. This orangutan was captive-born and had obviously seen humans "do things" with these tools. Now she had access to them to do as she pleased.

In the video the orangutan is seen holding the hammer just like a human would. She tries to use it by banging a nail she has placed on top of a piece of wood. In another scene she gets hold of a saw and uses it to try to cut a piece of wood.

It seemed that she had some understanding of how the tools worked. Her curiosity while experimenting with them showed an appreciation for the technology. But she didn't end up doing anything with the tools. She experimented with enthusiasm, and that was it.

But then, in a different scene, she is seen using a tool at a completely different level. She is on a small boat in the middle of a pond, using her arms to paddle and expertly navigate the calm waters to reach the vegetation she craved. Perfectly in tune with her tool, the boat, with her baby on her lap.

It seemed that the usefulness derived from each tool depended on the existence of a purpose.

With the hammer and the saw, the orangutan didn't really have a purpose driving her to use the tools. Although she was handling them the way humans do, she could not come up with a purpose for them. With the boat, however, her purpose was clear: the need to float and forage on the water with her offspring without getting wet.

I think the moral of the clip is that without purpose, technology is useless. And that orangutans are majestic animals.

Here is the clip.
