Yes; the voice recognition is done entirely by the Kinect, draining no cycles from the Xbox. It also has several microphones, coupled with some audio processing, so that it can isolate the speaker, compensating for background noise and room dynamics.
That is part of the whole just-you-in-the-middle-of-the-room-without-any-wiring concept that they modelled the motion control component around.
There is some rather neat stuff in that thing, well engineered for a low price point. Unfortunately they may have targeted it at an unsuitable market, although I can easily imagine a brainstorming meeting where they struggled to come up with an application and ended up with games as the only viable one.
Too bad that, even without the inevitable gimmicky impression, inherent latency issues somewhat nerf it as a controller for anything fast-paced (at least until the users get well adjusted to having to anticipate the action earlier. EDIT: ...and are willing to overlook the delay between action and response, which may also require developers to give cues earlier and make them less subtle).
The next thing they will try (...or are trying, I suppose) is control of TV sets, and then, who knows. The current sort of setup will probably only work in a controlled environment, since I expect any other IR light source in the room would mess up the Kinect's projected scatter of dots.
I'm still planning to get one, mind - probably one of the new ones, with close-up optics (EDIT: ...which should be able to do a fairly detailed 3D capture of, say, a face - instantly and cheaply).
Ignoring the pitiful attempts to market it as something cool and trendy, under the hood it consists of some rather nice nerd-shinies, for which bedroom developers are coming up with various applications: some simply silly exercises, some kind of useful (at least as proof-of-concept prototypes), such as spatial awareness for robotics and aids for the visually impaired, not to mention the whole motion-capture-on-a-shoestring-budget thing.