Over the weekend I started looking into voice recognition software and how Microsoft currently implements it. Most of the inspiration came from wanting to avoid the depth buffer Avatar issue in the XNA game I'm developing (trying to prevent burnout). The rest came from Jarvis in the new Iron Man movie. I was curious to see how feasible it would be to set up a system for recognizing and dispatching commands. I also thought it would be a good way to get caught up on WCF's capabilities.
I started by buying a $20 USB Logitech microphone, which will be useful for Skype in the future anyway, and downloaded the recent Microsoft Speech SAPI5. I put off learning what's in these assemblies and perusing code examples, which was kind of a mistake. Instead I started from the service side in WCF. My idea was to expose the WCF service as a web service, allowing XML requests from several sources/clients: one client can make a voice-to-XML request, whereas another can use a web interface to send an XML request to the same service. This was a good idea in principle; however, once I discovered how the voice recognition functionality worked, I had to modify the request and make subtle design changes as well.
My idea for the WCF service was to inspect an incoming request and verify it against every single rule. The rule that best matches the request would be returned and its execute command then invoked. The rules are very simple: they just check for a certain type of phrase or semantics in the request, and the matching rule then calls a library to actually do something. The libraries are invoked by the rules and do the major groundwork. The reason is that I intend these rules to do a whole lot of different things, so I need each library to be simple in order to scale with ease. I also need a central point of intelligence to control the verification of the rules, so I went with a singleton object.
This is how the WCF service is currently listening…
[ServiceBehavior(InstanceContextMode = InstanceContextMode.Single)]
public class CommunicateAIVE : ICommunicateAIVE
{
    public string GetData(string value)
    {
        ICommandEngine engine = CommandEngine.Instance;
        CommandEngineRuleReturn result = engine.FindRequestType(value);
        return result.Response;
    }
    ...
This is how I’m currently handling the Command Engine… I probably should make it thread safe.
public class CommandEngine : ICommandEngine
{
    // Rules are loaded once and shared by every request.
    static List<ICommandEngineRule> ruleSet = new List<ICommandEngineRule>();

    static CommandEngine()
    {
    }

    private static CommandEngine _instance;

    public static ICommandEngine Instance
    {
        get
        {
            // Lazy initialization; not thread safe as written.
            if (_instance == null)
            {
                _instance = new CommandEngine();
                InitializeRuleSet();
                _instance.InitializeStateInformation();
                _instance.InitializeGrammar();
            }
            return _instance;
        }
    }
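On the thread safety point, here is a minimal sketch of one common way to do it, assuming .NET 4's Lazy<T> is available. ThreadSafeCommandEngine and InitializeCount are made-up names for illustration only; the real rule set, state and grammar initialization would go where the constructor comment is.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for CommandEngine, only to illustrate the pattern.
public sealed class ThreadSafeCommandEngine
{
    // Counts how many times the constructor ran (illustration only).
    private static int _initializeCount;
    public static int InitializeCount { get { return _initializeCount; } }

    // Lazy<T> runs the factory at most once, even if several threads
    // read Instance at the same time (ExecutionAndPublication mode).
    private static readonly Lazy<ThreadSafeCommandEngine> _lazy =
        new Lazy<ThreadSafeCommandEngine>(
            () => new ThreadSafeCommandEngine(),
            LazyThreadSafetyMode.ExecutionAndPublication);

    public static ThreadSafeCommandEngine Instance { get { return _lazy.Value; } }

    private ThreadSafeCommandEngine()
    {
        // Rule set, state information and grammar setup would go here.
        Interlocked.Increment(ref _initializeCount);
    }
}
```

One catch with this approach: reading Instance from inside the constructor makes Lazy<T> throw an InvalidOperationException, so rules that need the engine would have to be handed `this` instead.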
InitializeRuleSet is where I actually load all the rules, by loading a DLL and invoking each rule’s constructor via reflection. With this approach I thought it would be cool to branch the rules and libraries off into their own solution and use them as a support library.
private static void InitializeRuleSet()
{
    Assembly assembly = Assembly.LoadFile(@"C:\dev\Service\ProjectAIVE\SupportLibraries\CommandEngineRulesLibrary.dll");
    Type[] types = assembly.GetInterfacedTypes("ICommandEngineRule");
    // Every rule receives the command engine as its only constructor argument.
    object[] constructorParameters = { CommandEngine.Instance };
    foreach (Type toConstructType in types)
    {
        // Avoid abstract classes; they cannot be instantiated.
        if (!toConstructType.IsAbstract)
        {
            ConstructorInfo[] ci = toConstructType.GetConstructors();
            // Find the constructor that takes exactly one parameter.
            foreach (ConstructorInfo c in ci)
            {
                if (c.GetParameters().Length == 1)
                {
                    ruleSet.Add((ICommandEngineRule)c.Invoke(constructorParameters));
                    break;
                }
            }
        }
    }
}
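GetInterfacedTypes is not a framework method on Assembly, so it is presumably a custom extension method whose source isn't shown here. The following is only a guess at its shape: a sketch that assumes it matches on the interface's simple name.

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class AssemblyExtensions
{
    // Returns every non-interface type in the assembly that implements an
    // interface with the given simple name (for example "ICommandEngineRule").
    public static Type[] GetInterfacedTypes(this Assembly assembly, string interfaceName)
    {
        return assembly.GetTypes()
            .Where(t => !t.IsInterface && t.GetInterface(interfaceName) != null)
            .ToArray();
    }
}
```

Matching by name rather than by typeof(ICommandEngineRule) keeps the caller free of a compile-time reference to the interface, at the cost of possible name clashes between namespaces.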
Also here’s a sample command rule which doesn’t use a library since the functionality just returns something in the response.
public class SayHelloRule : BaseCommandRule
{
    public SayHelloRule(ICommandEngine commandEngine) : base(commandEngine) { }

    public override CommandEngineRuleReturn PerformActualResponseQuery(string request)
    {
        if (request.ToUpper().IndexOf("HELLO") != -1)
        {
            this.CommandEngineRuleReturn.Heuristic = 100;
            this.CommandEngineRuleReturn.CommandEngineRuleReturnType = CommandEngineRuleReturnType.RequestSufficient;
            this.CommandEngineRuleReturn.Response = string.Format("Hello {0}", this.CommandEngine.StateInfo["CommanderChief"]);
        }
        return base.PerformActualResponseQuery(request);
    }

    public override void ExecuteCommand(string request)
    {
        base.ExecuteCommand(request);
    }
}
On every request, the rules are traversed and the best match is executed by the following code.
public CommandEngineRuleReturn FindRequestType(string request)
{
    foreach (ICommandEngineRule rule in ruleSet)
    {
        rule.GetResponseInfo(request);
    }
    ruleSet.Sort();
    ICommandEngineRule bestRule = ruleSet.GetFirstElement();
    ExecuteBestRuleCommand(bestRule, request);
    switch (bestRule.CommandEngineRuleReturn.CommandEngineRuleReturnType)
    {
        case CommandEngineRuleReturnType.RequestCausedError:
            break;
        case CommandEngineRuleReturnType.RequestInsufficient:
            this.IncompleteRequest = string.Format("{0}!{1}", this.IncompleteRequest, request);
            break;
        case CommandEngineRuleReturnType.RequestSufficient:
            CommandEngineMemoryStruct newMemory = new CommandEngineMemoryStruct();
            newMemory.Request = request;
            newMemory.Response = bestRule.CommandEngineRuleReturn.Response;
            this.Memory.Add(newMemory);
            break;
    }
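The parameterless ruleSet.Sort() only works if the rules are comparable to each other. How the real rules implement that isn't shown, so this is a sketch under the assumption that they compare on the heuristic, highest first. ScoredRule is a made-up, trimmed-down type used only to show the ordering.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical rule stripped down to the parts needed for sorting.
public class ScoredRule : IComparable<ScoredRule>
{
    public string Name { get; private set; }
    public int Heuristic { get; set; }

    public ScoredRule(string name, int heuristic)
    {
        Name = name;
        Heuristic = heuristic;
    }

    // Higher heuristic sorts first, so index 0 is the best match.
    // Null entries are pushed to the end of the list.
    public int CompareTo(ScoredRule other)
    {
        if (other == null) return -1;
        return other.Heuristic.CompareTo(Heuristic);
    }
}
```

With this in place, calling Sort() on a List<ScoredRule> puts the highest-scoring rule at index 0, which is the element a helper like GetFirstElement would then pick up.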
I realize now that I haven’t talked about voice recognition at all yet, but the post is getting long, so I’m making this a multi-part series and I’ll get to the VR in the next part. There is one other thing I thought I should mention, though.
So anyway, I was working on unit tests while coming up with extension methods and creating new classes, and it occurred to me that there might be a better way to verify these request/response pairs from the big picture, or from the user’s point of view. Say you execute a dozen voice commands correctly and the application is working like a dream. You then get the idea to add more functionality; you do so and it works, but all of a sudden certain commands are no longer dispatching correctly. What happened? After some scrambling, it becomes clear that one of the command rules you added recently is picking up the request instead of the old one you intended. Most of your unit tests are written for coverage within a class, not for integration. Also, why not base the tests on real “production” data? So I thought to myself: create a Feedback rule and a FeedbackRegression. The Feedback rule would check whether the user politely said Thanks or Thank you to the WCF Voice Recognition service (need a better name). It would take the last request and its response, which is now assumed to be correct, and log the pair to a data source. The FeedbackRegression would read the logged request/response pairs from the data source, run each request through the Command Engine as if it came in from the client, and validate that the response is still correct.
I thought that was a cool automated integration test, and it let me keep my regular unit tests at the scope they were already at.
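A rough sketch of what the regression half could look like. None of this exists yet: FeedbackEntry, FindRegressions and the delegate standing in for the Command Engine are all hypothetical names, and the data source is reduced to an in-memory collection.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record of a request/response pair the user confirmed with "Thanks".
public class FeedbackEntry
{
    public string Request { get; set; }
    public string Response { get; set; }
}

public static class FeedbackRegression
{
    // Replays every logged request through the engine (passed in as a
    // delegate) and returns the entries whose response no longer matches.
    public static List<FeedbackEntry> FindRegressions(
        IEnumerable<FeedbackEntry> log, Func<string, string> engine)
    {
        var failures = new List<FeedbackEntry>();
        foreach (FeedbackEntry entry in log)
        {
            string actual = engine(entry.Request);
            if (!string.Equals(actual, entry.Response, StringComparison.Ordinal))
            {
                failures.Add(entry);
            }
        }
        return failures;
    }
}
```

Passing the engine in as a Func<string, string> means the same check could be pointed at the in-process engine or at the service's GetData call; an empty result means every confirmed command still dispatches the way it did when the user said thanks.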
Next part: creating the voice client using Speech SAPI5…
For info on WCF, there is a ton of material at Channel9.msdn.com.