Voice is captured and converted into a binary (digital) format. This can be done at different levels of quality: the higher the quality, the closer the result is to the actual sound of the speaker’s voice, the more data is required, and therefore the more bandwidth is needed.
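To make the trade-off concrete, here is a minimal sketch that assumes uncompressed PCM encoding. In PCM, the bitrate is the sample rate multiplied by the bits per sample and the number of channels. The helper name `pcm_bitrate` is hypothetical. Real voice codecs compress the signal and add packet overhead, so the bandwidth they actually use will differ from these raw figures.

```python
def pcm_bitrate(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Return the bitrate, in bits per second, of uncompressed PCM audio.

    Hypothetical helper for illustration: ignores compression and packet headers.
    """
    # Each second contains sample_rate_hz samples per channel,
    # and each sample takes bits_per_sample bits.
    return sample_rate_hz * bits_per_sample * channels


# Illustrative quality levels, from lowest to highest (uncompressed PCM).
examples = {
    "narrowband telephony (8 kHz, 8-bit, mono)": pcm_bitrate(8000, 8),
    "wideband voice (16 kHz, 16-bit, mono)": pcm_bitrate(16000, 16),
    "CD-quality audio (44.1 kHz, 16-bit, stereo)": pcm_bitrate(44100, 16, 2),
}

for label, bps in examples.items():
    print(f"{label}: {bps / 1000:.1f} kbit/s")
```

The narrowband case works out to 64 kbit/s, which matches the G.711 telephony codec. Raising the sampling rate or the bit depth multiplies the required bandwidth accordingly.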