An audio recognizer analyzes audio so that frequency, speech-to-text, and text-to-speech can be handled. Apple provides a few frameworks for detecting speech, and they are useful in many places. Speech recognition converts real-time audio into text so the app can understand what people are trying to say or convey, and the text, pitch, frequency, and amplitude of the audio can all be detected. When a developer at an iPhone App Development Company implements audio transcription, they need to use built-in frameworks such as AudioKit and Speech, which recognize the audio and populate the data.
To set up the audio recognizer, the developer needs to follow these steps:
1. The developer needs to add the privacy usage keys below to Info.plist so the app can ask the user to grant audio permission:
- NSSpeechRecognitionUsageDescription
- NSMicrophoneUsageDescription
2. The developer needs to import all of the frameworks below for the speech recognizer, along with frequency and amplitude detection and speech-to-text conversion:
- AudioKit
- Foundation
- Speech
- AudioToolbox
- Accelerate
- MediaPlayer
- OpenGLES
3. The developer needs to create the controller, class, and functions that can detect the speech.
Step 1:
The developer needs to create a controller to detect speech and import all the necessary libraries and frameworks there.
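A minimal sketch of the import block, assuming the AudioKit dependency is already added to the project; AVFoundation is included as well because AVAudioEngine and AVAudioSession are used below:
import UIKit
import Foundation
import Speech
import AudioKit
import AudioToolbox
import Accelerate
import MediaPlayer
import OpenGLES
import AVFoundation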
class ViewController: UIViewController, SFSpeechRecognizerDelegate {
Step 2:
Define all the local and global variables along with the required objects:
let speechRecognizer: SFSpeechRecognizer? = SFSpeechRecognizer(locale: Locale(identifier: "en-GB"))
var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
var recognitionTask: SFSpeechRecognitionTask?
let audioEngine = AVAudioEngine()
Step 3:
Define the audio objects in viewDidLoad, where the developer requests permission from the app side before recording the audio. If the user does not grant permission in the prompt dialog, the audio cannot be recorded or captured for analysis.
override func viewDidLoad() {
    super.viewDidLoad()
    speechRecognizer?.delegate = self

    SFSpeechRecognizer.requestAuthorization { status in
        var audiobtnVariable = false
        switch status {
        case .authorized:
            audiobtnVariable = true
            print("Permission received")
        case .denied:
            audiobtnVariable = false
            print("Permission not granted by the user")
        case .notDetermined:
            audiobtnVariable = false
            print("Speech recognition authorization not determined yet")
        case .restricted:
            audiobtnVariable = false
            print("Speech recognition is not supported on this device")
        @unknown default:
            audiobtnVariable = false
        }
        DispatchQueue.main.async {
            self.speechRecognitionButton.isEnabled = audiobtnVariable
        }
    }
    self.speechRecognitionLabel.frame.size.width = view.bounds.width - 64
}
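The authorization above covers only speech recognition (NSSpeechRecognitionUsageDescription). The microphone permission declared with NSMicrophoneUsageDescription is prompted separately; a minimal sketch using AVAudioSession, assuming the same speechRecognitionButton outlet:
// Request microphone access as well; iOS shows the NSMicrophoneUsageDescription prompt here
AVAudioSession.sharedInstance().requestRecordPermission { granted in
    DispatchQueue.main.async {
        if !granted {
            print("Microphone permission not granted by the user")
            self.speechRecognitionButton.isEnabled = false
        }
    }
}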
Step 4:
SFSpeechRecognizer has a lot of capability: it handles audio recognition and audio buffering, allocates speech requests, and the developer can also feed it preloaded audio files for recognition. A recognition task can be cancelled, paused, stopped, and resumed while the audio session is being captured and parsed for analysis. Internally, it transforms and analyzes the audio to convert it into text; the audio's frequency and amplitude can also be captured, as shown in the AudioKit section below.
func startSpeechRecording() {
    // If a task is already running, cancel it before starting a new one
    if recognitionTask != nil { // used to track the progress of a transcription or cancel it
        recognitionTask?.cancel()
        recognitionTask = nil
    }

    // Configure the shared audio session for recording and measurement
    let audioRecordSession = AVAudioSession.sharedInstance()
    do {
        try audioRecordSession.setCategory(.record, mode: .measurement, options: [])
        try audioRecordSession.setActive(true, options: .notifyOthersOnDeactivation)
    } catch {
        print("Failed to set up audio session")
    }

    recognitionRequest = SFSpeechAudioBufferRecognitionRequest() // read from buffer
    let inputNode = audioEngine.inputNode
    guard let recognitionRequest = recognitionRequest else {
        fatalError("Could not create request instance")
    }
    recognitionRequest.shouldReportPartialResults = true

    recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest) { res, err in
        var isLast = false
        if res != nil { // res contains the transcription of a chunk of audio, usually a single word
            isLast = (res?.isFinal)!
        }
        if err != nil || isLast {
            self.audioEngine.stop()
            inputNode.removeTap(onBus: 0)
            self.recognitionRequest = nil
            self.recognitionTask = nil
            self.speechRecognitionButton.isEnabled = true

            let bestStr = res?.bestTranscription.formattedString
            let inDict = self.speechDict.contains { $0.key == bestStr }
            if inDict {
                self.speechRecognitionLabel.text = bestStr
                self.audioinput = self.speechDict[bestStr!]!
            } else {
                self.speechRecognitionLabel.text = "can't find it in the dictionary"
            }
        }
    }

    // Install a tap on the input node to stream audio buffers into the recognition request
    let format = inputNode.outputFormat(forBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
        self.recognitionRequest?.append(buffer)
    }

    audioEngine.prepare()
    do {
        try audioEngine.start()
    } catch {
        print("Can't start the engine")
    }
}
Step 5:
The developer needs to create a button action that starts the audio recording and, on the next tap, stops it so the recording can be analyzed.
@IBAction func speechRecognitionButtonClicked(_ sender: Any) {
    if audioEngine.isRunning {
        audioEngine.stop()
        recognitionRequest?.endAudio()
        speechRecognitionButton.isEnabled = false
        self.speechRecognitionButton.setTitle("Record", for: .normal)
    } else {
        startSpeechRecording()
        speechRecognitionButton.setTitle("Stop", for: .normal)
    }
}
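Because the controller conforms to SFSpeechRecognizerDelegate, it can also react when recognition becomes temporarily unavailable (for example, when connectivity drops). A short sketch of that optional delegate callback, assuming the same button outlet:
// SFSpeechRecognizerDelegate callback: enable the record button only while recognition is available
func speechRecognizer(_ speechRecognizer: SFSpeechRecognizer, availabilityDidChange available: Bool) {
    speechRecognitionButton.isEnabled = available
}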
Audio Frequency, Amplitude and Pitch Detector
The developer can utilize AudioKit; it provides many properties, classes, and objects for detecting the frequency and amplitude of the recognized audio. The developer needs to create a microphone object and capture the audio session. It captures the audio and detects the frequency and pitch of the signal between a lower and an upper bound.
// Creation of the microphone object
var microphoneObject: AKMicrophone!
// Tracker that detects frequency and amplitude from the microphone input
var frequencyDetector: AKFrequencyTracker!
// Silenced booster node that keeps the tracker in the signal chain
var mixerNodeDetector: AKBooster!

// The developer needs to wire these up inside viewDidLoad
override func viewDidLoad() {
    super.viewDidLoad()
    microphoneObject = AKMicrophone()
    frequencyDetector = AKFrequencyTracker(microphoneObject, minimumFrequency: 200, maximumFrequency: 2000)
    mixerNodeDetector = AKBooster(frequencyDetector, gain: 0)
    AudioKit.output = mixerNodeDetector
    AudioKit.start()
    findfrequency()
}
func findfrequency() {
    if frequencyDetector.amplitude > 0.1 {
        frequencyLabel.text = String(format: "%0.1f", frequencyDetector.frequency)
        var frequency = Float(frequencyDetector.frequency)
        // Fold the detected frequency into the base octave covered by noteFrequencies
        while frequency > Float(noteFrequencies[noteFrequencies.count - 1]) {
            frequency = frequency / 2.0
        }
        while frequency < Float(noteFrequencies[0]) {
            frequency = frequency * 2.0
        }
        // Find the reference note closest to the folded frequency
        var minDistance: Float = 10000.0
        var index = 0
        for i in 0..<noteFrequencies.count {
            let distance = fabsf(Float(noteFrequencies[i]) - frequency)
            if distance < minDistance {
                index = i
                minDistance = distance
            }
        }
        let octave = Int(log2f(Float(frequencyDetector.frequency) / frequency))
        // Labels assumed to display the nearest note name with sharps and flats
        noteNameWithSharpsLabel.text = "\(noteNamesWithSharps[index])\(octave)"
        noteNameWithFlatsLabel.text = "\(noteNamesWithFlats[index])\(octave)"
    }
    amplitudeLabel.text = String(format: "%0.2f", frequencyDetector.amplitude)
}
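findfrequency() compares against noteFrequencies, noteNamesWithSharps, and noteNamesWithFlats, which are not defined above, and it only samples the tracker once. A minimal sketch of both pieces, assuming octave-0 reference frequencies and a 0.1-second polling timer (the values and the timer are illustrative, not part of the original project):
// Reference data assumed by findfrequency(): one octave of note frequencies in Hz (octave 0)
let noteFrequencies = [16.35, 17.32, 18.35, 19.45, 20.60, 21.83, 23.12, 24.50, 25.96, 27.50, 29.14, 30.87]
let noteNamesWithSharps = ["C", "C♯", "D", "D♯", "E", "F", "F♯", "G", "G♯", "A", "A♯", "B"]
let noteNamesWithFlats = ["C", "D♭", "D", "E♭", "E", "F", "G♭", "G", "A♭", "A", "B♭", "B"]

// Poll the tracker repeatedly instead of calling findfrequency() once (place this in viewDidLoad)
Timer.scheduledTimer(withTimeInterval: 0.1, repeats: true) { _ in
    self.findfrequency()
}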