I have to say, I'm more impressed with the voice recognition than anything else. I actually didn't know the Kinect could do it that well. It should help sidestep some of the problems with menu navigation in the finished product, though you'll probably need a more versatile method eventually. Do you plan on somehow standardizing the recognition sample for distribution, or having each user train it on their own voice?
Also, while I like most of the interface so far, I'm still not sold on the walking part. It'll be hard enough to stand for a decent length of time while flailing your arms like a madman, and the gestures for moving forward and turning just look unnatural to me. That's the sort of thing I think is still best left to a controller. The bow and third-person camera movements you made seemed a bit strange to me too, though that might just be incidental, and if it isn't, it can be improved.
Will there be a way to choose which parts of the system handle which actions? For example, will the user be able to use a USB controller for most actions, but the Kinect for shouts and magic?