You could update only a fraction of the skeletons each frame. For example, when running at 60fps, if you update half the skeletons on even frames and the other half on odd frames, your skeletons will animate at 30fps for half the Animation update CPU cost.
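Here is a rough sketch of that staggering, just to show the bookkeeping. `Animated` and `StaggeredUpdater` are names I made up for illustration; they are not part of the runtime. Each group accumulates the time that passed since it was last updated, so the animations still play at the right speed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for whatever applies an animation to one skeleton. */
interface Animated {
    void update(float delta);
}

/** Updates one group of skeletons per frame, round robin. */
class StaggeredUpdater {
    private final List<Animated> skeletons = new ArrayList<>();
    private final float[] elapsed; // Time each group has accumulated since its last update.
    private int frame;

    StaggeredUpdater(int groups) {
        elapsed = new float[groups];
    }

    void add(Animated skeleton) {
        skeletons.add(skeleton);
    }

    void update(float delta) {
        int groups = elapsed.length;
        for (int g = 0; g < groups; g++)
            elapsed[g] += delta;

        // Skeleton i belongs to group (i % groups). Only the active group is updated this frame.
        int active = frame % groups;
        for (int i = active; i < skeletons.size(); i += groups)
            skeletons.get(i).update(elapsed[active]);

        elapsed[active] = 0;
        frame++;
    }
}
```

With 2 groups at 60fps, each skeleton is updated 30 times per second with a delta of roughly 1/30. More groups means less CPU cost but choppier motion.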
You could have multiple rendered skeletons use the same pose by sharing skeletons. For example, if you have 50 skeletons to draw, you might use only 10 actual skeleton instances. This means each pose is shared by 5 of the drawn skeletons, but this may not be noticeable when mixed into a large group. In this example both the Animation update and Skeleton updateWorldTransform CPU costs are reduced by 80% (1 - 10 / 50).
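To make the sharing idea concrete, here is a minimal sketch. `Pose`, `Drawn` and `SharedPoseCrowd` are hypothetical classes, not runtime API, and how each drawn character gets positioned at render time depends on your renderer, so that part is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a posed skeleton. It only counts how often it is updated. */
class Pose {
    int updates;

    void update(float delta) {
        updates++; // The Animation update and updateWorldTransform work would happen here.
    }
}

/** Something drawn on screen: it has its own position but borrows a pose. */
class Drawn {
    final Pose pose;
    final float x, y;

    Drawn(Pose pose, float x, float y) {
        this.pose = pose;
        this.x = x;
        this.y = y;
    }
}

class SharedPoseCrowd {
    final List<Pose> poses = new ArrayList<>();
    final List<Drawn> drawn = new ArrayList<>();

    SharedPoseCrowd(int poseCount, int drawnCount) {
        for (int i = 0; i < poseCount; i++)
            poses.add(new Pose());
        // Hand out the poses round robin so neighbors don't share one.
        for (int i = 0; i < drawnCount; i++)
            drawn.add(new Drawn(poses.get(i % poseCount), i * 10f, 0));
    }

    /** Cost scales with the number of poses, not the number of drawn characters. */
    void update(float delta) {
        for (Pose pose : poses)
            pose.update(delta);
    }
}
```

With `new SharedPoseCrowd(10, 50)` one call to `update` does 10 pose updates instead of 50. Starting each shared pose at a different animation time would probably help hide the repetition.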
Another way to reduce the Animation update time is to use fewer timelines. Some animators tend to key everything, which can mean significantly more timelines. Also, Animation uses a binary search. You could try a linear search, which may perform better depending on your animations.
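For comparing the two searches, something like the following could be used as a starting point. This is not the runtime's actual code, just a sketch assuming the key times are stored in a sorted `float[]`. Both methods find the last key at or before the given time.

```java
class FrameSearch {
    /** Returns the index of the last frame time <= time, or -1 if time is before the first frame. */
    static int binary(float[] frames, float time) {
        int low = 0, high = frames.length - 1, result = -1;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            if (frames[mid] <= time) {
                result = mid; // Candidate found, but a later frame may also qualify.
                low = mid + 1;
            } else
                high = mid - 1;
        }
        return result;
    }

    /** Same result as binary(), found by walking the frames from the start. */
    static int linear(float[] frames, float time) {
        int i = 0;
        while (i < frames.length && frames[i] <= time)
            i++;
        return i - 1;
    }
}
```

A linear search tends to win when timelines have only a handful of keys, since there is less branching and the memory access is sequential. With many keys per timeline the binary search should come out ahead, so it is worth profiling with your own animations.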
Using multiple threads to apply animations is possible. The animations are stateless, so the threads can share them. The skeleton instances are not, so each thread would get a number of skeleton instances to apply animations to. However, I would try other optimizations first.
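If you do go the threading route, the split could look roughly like this. `Clip` and `Puppet` are hypothetical stand-ins for a stateless animation and a stateful skeleton instance. The only real API used is `java.util.concurrent`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stateless animation: it reads its own data and writes only to the puppet passed in. */
class Clip {
    final float speed;

    Clip(float speed) {
        this.speed = speed;
    }

    void apply(Puppet puppet, float time) {
        puppet.rotation = time * speed;
    }
}

/** Hypothetical stateful skeleton: must only be touched by one thread at a time. */
class Puppet {
    float rotation;
}

class ParallelApply {
    /** Splits the puppets into one slice per thread and applies the shared clip to each slice. */
    static void applyAll(Clip clip, List<Puppet> puppets, float time, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> tasks = new ArrayList<>();
            int chunk = (puppets.size() + threads - 1) / threads; // Slice size, rounded up.
            for (int start = 0; start < puppets.size(); start += chunk) {
                List<Puppet> slice = puppets.subList(start, Math.min(start + chunk, puppets.size()));
                tasks.add(pool.submit(() -> {
                    for (Puppet puppet : slice)
                        clip.apply(puppet, time);
                }));
            }
            for (Future<?> task : tasks)
                task.get(); // Wait until every slice is done before rendering.
        } catch (InterruptedException | ExecutionException ex) {
            throw new RuntimeException(ex);
        } finally {
            pool.shutdown();
        }
    }
}
```

In a real game you would keep the thread pool around instead of creating one per call. It is created here only to keep the sketch self-contained.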
You could apply an animation and record the values set by each timeline for each frame. You can see BonePlotting.java for a rough idea of how to use the API to do this. I would probably have the code look at the animation timelines so you record only the keyed values. Or maybe you know what all the keyed values are and can just store those for each frame.
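The recording idea boils down to sampling once up front and doing an array lookup at runtime. Here is a minimal sketch for a single value. `Curve` and `BakedCurve` are hypothetical names, with `Curve` standing in for "apply the animation at this time and read back one value", which BonePlotting.java shows how to do with the real API.

```java
/** Hypothetical source of values, eg one bone's rotation read back after applying the animation. */
interface Curve {
    float valueAt(float time);
}

/** Samples a curve at a fixed rate so playback is only an array lookup. */
class BakedCurve {
    final float[] values;
    final float fps;

    BakedCurve(Curve curve, float duration, float fps) {
        this.fps = fps;
        int count = (int) Math.floor(duration * fps) + 1; // One sample per frame, including time 0.
        values = new float[count];
        for (int i = 0; i < count; i++)
            values[i] = curve.valueAt(i / fps);
    }

    /** Returns the recorded value nearest to the given time, clamped to the recorded range. */
    float valueAt(float time) {
        int index = Math.round(time * fps);
        index = Math.max(0, Math.min(index, values.length - 1));
        return values[index];
    }
}
```

This trades memory for CPU: every recorded value costs one float per frame, which is why recording only the keyed values matters.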