Instead of relying on specialized APIs, the system uses screenshots for visual input and virtual mouse and keyboard actions to complete tasks.