The original Kinect camera has a resolution of 320 x 240 at 30 fps. I think this could be increased to at least 8 megapixels at 30 fps, which would make HD video chat possible.
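To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python. The function names and the 3 bytes per pixel (uncompressed 24-bit RGB) are my own assumptions for illustration, not anything from the Kinect specs, and real video chat would be heavily compressed.

```python
# Rough pixel-count and raw-bandwidth arithmetic for the figures above.
# Assumption: uncompressed 24-bit RGB (3 bytes per pixel); names are illustrative.

def megapixels(width, height):
    """Pixel count of one frame, in millions of pixels."""
    return width * height / 1_000_000


def raw_mb_per_second(width, height, fps, bytes_per_pixel=3):
    """Uncompressed data rate in megabytes (10^6 bytes) per second."""
    return width * height * bytes_per_pixel * fps / 1_000_000


print(megapixels(320, 240))     # 0.0768
print(megapixels(1920, 1080))   # 2.0736 (1080p)
print(megapixels(3840, 2160))   # 8.2944 (roughly the 8 MP suggested above)

# Raw data rate at 30 fps, before any compression.
print(round(raw_mb_per_second(320, 240, 30), 3))
print(round(raw_mb_per_second(3840, 2160, 30), 3))
```

So 8 megapixels is about 108 times the pixel count of 320 x 240, and already well beyond 1080p, so even a smaller bump would be enough for HD video chat.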
The new Kinect also has lip detection, so it can process voice commands much better. I think it could go a step further and recognize facial features such as eyebrows, eye shape, mouth, and nose, so that it can detect sighing, pouting, anger, smiling, and so on. The new Kinect also detects voice tone, which I think is great. Hopefully this will make the voice recognition more accurate. Perhaps they could also use a keyword other than "Xbox" for voice commands, something like "X-Command ... PLAY MOVIE".
I also think it should scan the room in 3D to understand it a lot better, including how sound bounces around the room, which it does already.