Reward or Ignore – are they the only options?

The saying ‘Reward the good and ignore the bad’ has a lot to answer for in how people view reward-based training (and trainers). Some people seem to think that reward-based trainers will ignore all sorts of bad behaviour (such as barking, aggression, chewing inappropriate things, jumping up at people etc.) and will just wait patiently for the dog to stop doing that behaviour and do something that can be rewarded. This just isn’t the case, and the saying is grossly oversimplified (have a look also at my blogs Proactive not Passive and Changing Challenging Behaviour).

All too often, trainers just focus on using the 4 quadrants of Operant conditioning and forget about all the other ways that organisms learn, and that can be a real handicap for a trainer. Yes, we know that rewarding appropriate behaviour with something the dog wants will lead to an increase in that behaviour (R+) and that withholding something the dog wants will reduce undesired behaviour (P-). We also know that positively punishing the dog (P+), by applying something that the dog will actively work to avoid, will reduce behaviour. We also know that removing something that the dog will work to avoid will increase a desired behaviour (R-). What else do we need to know?

All too often, Classical Conditioning gets forgotten about (it does tend to go hand in hand with Operant conditioning; it is difficult to split them completely). Classical conditioning deals with reflexes and conditioned emotional responses. A dog that is fearful of something needs to be classically conditioned to learn that the scary thing isn’t that scary, so the most fabulous rewards appear in the presence of the scary thing. I’m sure if someone offered me enough chocolate, I could eventually learn not to be frightened of earwigs (shudder).

The opposite side of the coin is sensitisation, where a dog becomes more and more worried by something. This is a natural trait that helps keep animals safe from predators, but it also occurs in pets. They can become worried by something after being overwhelmed in their socialisation experiences, or sensitisation can be a by-product of using positive punishment or flooding.

Another understated method is Premack or Grandma’s Law (eat your greens and you’ll get dessert). Basically, Premack makes a less probable behaviour become more probable by following it with the chance to do a more probable one. Let’s use Beau as an example. She is a ball-obsessed spaniel that really finds it difficult to ignore a tennis ball, even if she already has one in her mouth. The behaviour that I want to make more likely is her letting go of/leaving the ball (she would much rather hang onto the ball, so this is a low probability behaviour). I reward her leaving the ball by letting her go and get the ball (this is a highly desirable behaviour as far as she is concerned). By using a highly desirable activity to reward a much less desirable behaviour (as far as the dog is concerned), we are gradually building a more reliable leave.

Beau Leave

The same principle can be used to increase the reliability of the recall. If your dog chases critters, then you can use that to help the recall away from critters. Chasing critters is a highly desirable behaviour to the dog (high probability) and recalling away from the critter is less desirable to the dog (low probability), but allowing the dog to return to chasing after it has recalled will make that recall from critters much stronger. Just bear in mind that by the time your dog has been called away and you send it back, the critter will be long gone. The dog still gets the fun of sniffing where it was.

How many other examples of Premack can you think of?

Then there is good old Habituation. Basically, this just means being exposed to something and getting used to it. It should be a non-event (neutral) really, with no positive or negative emotional responses. I used to live in a house next to a church with a chiming clock. When we first moved in, I heard that darn clock chime every quarter of an hour. It didn’t take long for that sound to become just background noise, and I had to really listen for that clock chiming if I wanted to check the time. The same happens with people that live next to busy roads or next to a railway line. Allowing a dog to explore an environment before asking them to work is a form of habituation (or acclimatisation). The more environments they are used to being in, the faster they will habituate to new ones.

Lara habituating to a new area

Socialising a puppy is basically habituation, as we want the puppy to be used to everyday things. It should be a neutral or mildly positive process (see Keep those experiences positive).

We could also talk about Flooding (the sink or swim approach), but I really hope that no one uses this approach with dogs any more, as it is not the most humane approach and just results in a dog shutting down through excessive stress and learned helplessness (if you can’t escape something, you just give in to the inevitable).

Extinction is another way for an organism to learn. A previously reinforced behaviour is no longer reinforced (rewarded) and gradually disappears. This often happens by accident, when the pet owner forgets to reward a desired behaviour and, over time, the dog stops doing that behaviour and does something else that does earn them reinforcement. Recall and loose lead walking are common casualties. Extinction can result in a large amount of frustration. Just try not feeding a dog titbits from the table when it has had a long history of being fed titbits that way. You will see the frustration build, and if you persist (many owners will give up), you will see an extinction burst and then the behaviour goes away. Take note though, you only have to reinforce that behaviour again and it will be back to full strength very quickly, and this time it will be harder to extinguish.

A better way to extinguish behaviour is to couple extinction with Differential reinforcement, where a different behaviour is reinforced and the undesired one is extinguished. There are several approaches to using differential reinforcement: DRI, DRO, DRA and DRL.

DRI – differential reinforcement of an incompatible behaviour. Your dog can’t jump up on someone if he is taught to sit, because sitting is incompatible with jumping up. Training your dog to go to its bed or to a mat when the doorbell rings is another form of DRI. I’m using DRI to teach Lara to leave me be whilst I am training another dog. In the video clip, she is being rewarded for staying on a platform while the other dog is working.

Lara DRI

You could also use DRI to teach a puppy not to nip, by reinforcing for them carrying a toy, for example.

DRO – differential reinforcement of other behaviour, provided that the undesirable one doesn’t occur within a defined, fixed period of time. So if our puppy doesn’t mouth us within 5 seconds of being stroked or played with, then they are rewarded, no matter what behaviour they are exhibiting. You do need to know how frequently the mouthing occurs, so that you can choose a time period the puppy can realistically succeed in.

DRA – differential reinforcement of alternative behaviour. This is useful when it is difficult to find a behaviour that is incompatible with the undesired one, so another behaviour is chosen that can be reinforced.

DRL – differential reinforcement of lower frequency. The aim is to decrease the frequency of the undesired behaviour, but not necessarily to remove it altogether. It doesn’t tend to get used a lot in dog training. Some trainers have defined DRL as differential reinforcement of lower intensity.

We also have Insight learning, Latent learning, Social learning, Counter conditioning, Systematic desensitisation and Observational learning to consider.

Learning theory and positive dog training are so much more than just rewarding the good and ignoring the bad.