A deeper look into the complex behaviors, collective dynamics, and reinforcement learning behind VICTOR.
Emergent properties arise when interactions between the agents of a system produce behaviors that are not predictable from the properties of the individual agents [1].
The phenomenon occurs in biological networks, social systems, and ecological dynamics. Emergent properties in complex systems reflect nonlinear self-organization across interactions at varying scales; familiar examples include the flocking of birds and the schooling of fish. In engineered systems, emergent properties can be either beneficial or hazardous.
The VICTOR was designed with the intent to exhibit emergent behaviors. The first design consideration was to make movement stochastic, ensuring a random outcome with each vibration. This led to a lightweight design with the motors mounted in orthogonal positions so that no single direction is favored. An adhesive element then gives the robots some cohesion, allowing them to form larger groupings. These groupings are optimized by reinforcement learning: Q-learning lets the robots walk their optimized paths, incentivizing emergent behaviors while each robot acts independently.
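To illustrate the first design consideration, here is a minimal sketch of how two orthogonally mounted vibration motors can produce direction-agnostic stochastic motion. This is not VICTOR's actual firmware; the `vibration_step` helper, duty-cycle parameters, and zero-mean noise model are illustrative assumptions.

```python
# A minimal sketch (not VICTOR's firmware) of stochastic motion from two
# orthogonally mounted vibration motors. Parameters are assumptions.
import random

def vibration_step(duty_x: float, duty_y: float, noise: float = 0.5):
    """Return a random (dx, dy) displacement for one vibration pulse.

    Each motor drives one axis; zero-mean Gaussian noise keeps the net
    motion stochastic so neither axis is systematically favored.
    """
    dx = duty_x * random.gauss(0.0, noise)
    dy = duty_y * random.gauss(0.0, noise)
    return dx, dy

# Example: 100 pulses at equal duty cycle yield a random walk about the origin.
x = y = 0.0
for _ in range(100):
    dx, dy = vibration_step(duty_x=1.0, duty_y=1.0)
    x, y = x + dx, y + dy
print(f"net displacement after 100 pulses: ({x:.2f}, {y:.2f})")
```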
Reinforcement learning is a machine learning technique that trains an agent to interact with its environment, learning the best policy through trial and error [2].
Q-learning is a model-free reinforcement learning algorithm: it allows agents to learn optimal actions by experiencing the consequences of those actions, without needing to map the environment [3]. Given its current state, an agent evaluates its available actions against the reward or penalty each one yields. Once it has computed the best action for each state, it can act optimally no matter where in the environment it finds itself. It remains the observer's responsibility, however, to choose the policy, reward, and other hyperparameters that make the learning algorithm most effective for the specific application.
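To make the mechanics concrete, here is a minimal sketch of the tabular Q-learning update described above, assuming a discretized state space. The action names, epsilon-greedy exploration, and hyperparameter values are illustrative assumptions, not VICTOR's implementation.

```python
# A minimal tabular Q-learning sketch: observe a state, pick an action
# (epsilon-greedy), then update Q(s, a) from the observed reward.
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # learning rate, discount, exploration
ACTIONS = ["vibrate_x", "vibrate_y", "vibrate_both", "idle"]  # illustrative

Q = defaultdict(float)  # maps (state, action) -> estimated return

def choose_action(state):
    """Epsilon-greedy policy: explore occasionally, otherwise exploit."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    """Bellman update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# Example: one learning step with illustrative state labels.
s = "edge"
a = choose_action(s)
update(s, a, reward=-1.0, next_state="interior")
```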
The VICTOR will use Q-learning to interact with other robots inside an arena. An accelerometer and IMU supply its state space, so with each vibration the VICTOR estimates where it is within the environment. The VICTOR is then rewarded for being linked with another VICTOR and penalized for choosing to stay on the edge of the arena. Over time we will observe what emergent behaviors this policy produces.
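As a hedged illustration of this reward scheme, the sketch below computes a scalar reward from an IMU-derived position estimate and a link flag. The `reward` helper, edge margin, and reward magnitudes are hypothetical choices, not the project's tuned values.

```python
# A hedged sketch of the reward scheme: positive reward for being linked to
# another VICTOR, a penalty for staying at the arena edge. Values are assumed.

EDGE_MARGIN = 0.05      # fraction of arena width treated as "the edge"
LINK_REWARD = 1.0       # reward for being adhered to another robot
EDGE_PENALTY = -1.0     # penalty for sitting on the arena boundary

def reward(linked: bool, x: float, y: float, arena_size: float) -> float:
    """Compute the scalar reward from one IMU-derived state estimate."""
    r = LINK_REWARD if linked else 0.0
    near_edge = (min(x, arena_size - x) < EDGE_MARGIN * arena_size
                 or min(y, arena_size - y) < EDGE_MARGIN * arena_size)
    if near_edge:
        r += EDGE_PENALTY
    return r

# Example: linked but loitering at the edge, so the two terms cancel.
print(reward(linked=True, x=0.02, y=0.5, arena_size=1.0))  # prints 0.0
```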