An audio recognizer analyzes audio so that frequency, speech-to-text, and text-to-speech can be handled. Apple provides a few frameworks for detecting speech, and they are useful in many places. Speech recognition converts real-time audio into text so the app can understand what people are trying to say or convey, and the text, pitch, frequency, and amplitude of the audio can all be detected. When a developer at an iPhone App Development Company implements audio transcription, they need to use built-in frameworks such as AudioKit and Speech, which recognize the audio and populate the data.
To set up the audio recognizer, the developer needs to follow these steps:
1. The developer needs to add the privacy usage keys below to Info.plist so the app can ask the user to grant audio permission:
- NSSpeechRecognitionUsageDescription
- NSMicrophoneUsageDescription
2. The developer needs to import all of the frameworks below for the speech recognizer, along with frequency and amplitude detection and speech-to-text conversion:
- AudioKit
- Foundation
- Speech
- AudioToolbox
- Accelerate
- MediaPlayer
- OpenGLES
3. The developer needs to create the controller, class, and functions that can detect the speech.
Step 1:
The developer needs to create a controller to detect speech and import all the necessary libraries and frameworks there.
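A minimal sketch of the import block, assuming the AudioKit dependency is already added to the project; AVFoundation is included as well because AVAudioEngine and AVAudioSession are used below:
import UIKit
import Foundation
import Speech
import AudioKit
import AudioToolbox
import Accelerate
import MediaPlayer
import OpenGLES
import AVFoundation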
class ViewController: UIViewController, SFSpeechRecognizerDelegate {
Step 2:
Define all the local and global variables along with the required objects:
let speechRecognizer: SFSpeechRecognizer? = SFSpeechRecognizer(locale: Locale(identifier: "en-GB"))
var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
var recognitionTask: SFSpeechRecognitionTask?
let audioEngine = AVAudioEngine()
Step 3:
Define the audio objects in viewDidLoad, where the developer requests permission from the app side before recording the audio. If the user does not grant permission in the prompt dialog, the audio cannot be recorded or captured for analysis.
override func viewDidLoad() {
    super.viewDidLoad()
    speechRecognizer?.delegate = self

    SFSpeechRecognizer.requestAuthorization { status in
        var audiobtnVariable = false
        switch status {
        case .authorized:
            audiobtnVariable = true
            print("Permission received")
        case .denied:
            audiobtnVariable = false
            print("Permission not granted by the user")
        case .notDetermined:
            audiobtnVariable = false
            print("Speech recognition authorization not determined yet")
        case .restricted:
            audiobtnVariable = false
            print("Speech recognition is not supported on this device")
        @unknown default:
            audiobtnVariable = false
        }
        DispatchQueue.main.async {
            self.speechRecognitionButton.isEnabled = audiobtnVariable
        }
    }
    self.speechRecognitionLabel.frame.size.width = view.bounds.width - 64
}
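The authorization above covers only speech recognition (NSSpeechRecognitionUsageDescription). The microphone permission declared with NSMicrophoneUsageDescription is prompted separately; a minimal sketch using AVAudioSession, assuming the same speechRecognitionButton outlet:
// Request microphone access as well; iOS shows the NSMicrophoneUsageDescription prompt here
AVAudioSession.sharedInstance().requestRecordPermission { granted in
    DispatchQueue.main.async {
        if !granted {
            print("Microphone permission not granted by the user")
            self.speechRecognitionButton.isEnabled = false
        }
    }
}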
Step 4:
SFSpeechRecognizer has a lot of capability: it handles audio recognition and audio buffering, allocates speech requests, and the developer can also feed it preloaded audio files for recognition. A recognition task can be cancelled, paused, stopped, and resumed while the audio session is being captured and parsed for analysis. Internally, it transforms and analyzes the audio to convert it into text; the audio's frequency and amplitude can also be captured, as shown in the AudioKit section below.
func startSpeechRecording() {
    // If a task is already running, cancel it before starting a new one
    if recognitionTask != nil { // used to track the progress of a transcription or cancel it
        recognitionTask?.cancel()
        recognitionTask = nil
    }

    // Configure the shared audio session for recording and measurement
    let audioRecordSession = AVAudioSession.sharedInstance()
    do {
        try audioRecordSession.setCategory(.record, mode: .measurement, options: [])
        try audioRecordSession.setActive(true, options: .notifyOthersOnDeactivation)
    } catch {
        print("Failed to set up audio session")
    }

    recognitionRequest = SFSpeechAudioBufferRecognitionRequest() // read from buffer
    let inputNode = audioEngine.inputNode
    guard let recognitionRequest = recognitionRequest else {
        fatalError("Could not create request instance")
    }
    recognitionRequest.shouldReportPartialResults = true

    recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest) { res, err in
        var isLast = false
        if res != nil { // res contains the transcription of a chunk of audio, usually a single word
            isLast = (res?.isFinal)!
        }
        if err != nil || isLast {
            self.audioEngine.stop()
            inputNode.removeTap(onBus: 0)
            self.recognitionRequest = nil
            self.recognitionTask = nil
            self.speechRecognitionButton.isEnabled = true

            let bestStr = res?.bestTranscription.formattedString
            let inDict = self.speechDict.contains { $0.key == bestStr }
            if inDict {
                self.speechRecognitionLabel.text = bestStr
                self.audioinput = self.speechDict[bestStr!]!
            } else {
                self.speechRecognitionLabel.text = "can't find it in the dictionary"
            }
        }
    }

    // Install a tap on the input node to stream audio buffers into the recognition request
    let format = inputNode.outputFormat(forBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
        self.recognitionRequest?.append(buffer)
    }

    audioEngine.prepare()
    do {
        try audioEngine.start()
    } catch {
        print("Can't start the engine")
    }
}
Step 5:
The developer needs to create a button action that starts the audio recording and, on the next tap, stops it so the recording can be analyzed.
@IBAction func speechRecognitionButtonClicked(_ sender: Any) {
    if audioEngine.isRunning {
        audioEngine.stop()
        recognitionRequest?.endAudio()
        speechRecognitionButton.isEnabled = false
        self.speechRecognitionButton.setTitle("Record", for: .normal)
    } else {
        startSpeechRecording()
        speechRecognitionButton.setTitle("Stop", for: .normal)
    }
}
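Because the controller conforms to SFSpeechRecognizerDelegate, it can also react when recognition becomes temporarily unavailable (for example, when connectivity drops). A short sketch of that optional delegate callback, assuming the same button outlet:
// SFSpeechRecognizerDelegate callback: enable the record button only while recognition is available
func speechRecognizer(_ speechRecognizer: SFSpeechRecognizer, availabilityDidChange available: Bool) {
    speechRecognitionButton.isEnabled = available
}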
Audio Frequency, Amplitude and Pitch Detector
The developer can utilize AudioKit; it provides many properties, classes, and objects for detecting the frequency and amplitude of the recognized audio. The developer needs to create a microphone object and capture the audio session. It captures the audio and detects the frequency and pitch of the signal between a lower and an upper bound.
// Creation of the microphone object
var microphoneObject: AKMicrophone!
// Tracker that detects frequency and amplitude from the microphone input
var frequencyDetector: AKFrequencyTracker!
// Silenced booster node that keeps the tracker in the signal chain
var mixerNodeDetector: AKBooster!

// The developer needs to wire these up inside viewDidLoad
override func viewDidLoad() {
    super.viewDidLoad()
    microphoneObject = AKMicrophone()
    frequencyDetector = AKFrequencyTracker(microphoneObject, minimumFrequency: 200, maximumFrequency: 2000)
    mixerNodeDetector = AKBooster(frequencyDetector, gain: 0)
    AudioKit.output = mixerNodeDetector
    AudioKit.start()
    findfrequency()
}
func findfrequency() {
    if frequencyDetector.amplitude > 0.1 {
        frequencyLabel.text = String(format: "%0.1f", frequencyDetector.frequency)
        var frequency = Float(frequencyDetector.frequency)
        // Fold the detected frequency into the base octave covered by noteFrequencies
        while frequency > Float(noteFrequencies[noteFrequencies.count - 1]) {
            frequency = frequency / 2.0
        }
        while frequency < Float(noteFrequencies[0]) {
            frequency = frequency * 2.0
        }
        // Find the reference note closest to the folded frequency
        var minDistance: Float = 10000.0
        var index = 0
        for i in 0..<noteFrequencies.count {
            let distance = fabsf(Float(noteFrequencies[i]) - frequency)
            if distance < minDistance {
                index = i
                minDistance = distance
            }
        }
        let octave = Int(log2f(Float(frequencyDetector.frequency) / frequency))
        // Labels assumed to display the nearest note name with sharps and flats
        noteNameWithSharpsLabel.text = "\(noteNamesWithSharps[index])\(octave)"
        noteNameWithFlatsLabel.text = "\(noteNamesWithFlats[index])\(octave)"
    }
    amplitudeLabel.text = String(format: "%0.2f", frequencyDetector.amplitude)
}
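findfrequency() compares against noteFrequencies, noteNamesWithSharps, and noteNamesWithFlats, which are not defined above, and it only samples the tracker once. A minimal sketch of both pieces, assuming octave-0 reference frequencies and a 0.1-second polling timer (the values and the timer are illustrative, not part of the original project):
// Reference data assumed by findfrequency(): one octave of note frequencies in Hz (octave 0)
let noteFrequencies = [16.35, 17.32, 18.35, 19.45, 20.60, 21.83, 23.12, 24.50, 25.96, 27.50, 29.14, 30.87]
let noteNamesWithSharps = ["C", "C♯", "D", "D♯", "E", "F", "F♯", "G", "G♯", "A", "A♯", "B"]
let noteNamesWithFlats = ["C", "D♭", "D", "E♭", "E", "F", "G♭", "G", "A♭", "A", "B♭", "B"]

// Poll the tracker repeatedly instead of calling findfrequency() once (place this in viewDidLoad)
Timer.scheduledTimer(withTimeInterval: 0.1, repeats: true) { _ in
    self.findfrequency()
}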