
Comments (15)

NathanSweet commented on May 17, 2024

First, are you CPU or GPU bound? Most mobile devices are fillrate limited, so the number of pixels (size of the images) you are drawing makes all the difference. Note that transparent pixels still count against you. If you need to use less CPU, fewer bones will help, but you won't know to what degree until you identify the CPU hotspots more accurately. tk2dSpineSkeleton#Update mostly calls other methods if you are using the latest source. Does the Unity profiler allow you to dig any deeper?

from spine-runtimes.

hourglasseye commented on May 17, 2024

CPU or GPU Bound?

Okay. I enabled the "iPhone Unity internal profiler" to determine how CPU or GPU bound the stress test is. I am working with a 5th Gen iPod Touch. Still 100 skeletons, 46 bones each, the same as the one I sent you before (the obfuscated dragon) except scaled down to 1/4th of its size to minimize overdraw. None of the instances are animating.

Here are partial results I got (last 10 profiler updates): http://gonzogamesdev.com/internal_profiler_ipodtouch5.txt

I am looking at cpu-ogles-drv entries to determine how GPU-bound the stress test is. Here is the summary of cpu-ogles-drv entries (numbers are in milliseconds):

cpu-ogles-drv> min:  2.4   max:  3.2   avg:  2.5
cpu-ogles-drv> min:  2.4   max:  3.0   avg:  2.5
cpu-ogles-drv> min:  2.4   max:  3.5   avg:  2.6
cpu-ogles-drv> min:  2.4   max:  2.8   avg:  2.5
cpu-ogles-drv> min:  2.4   max:  2.8   avg:  2.5
cpu-ogles-drv> min:  2.4   max:  2.7   avg:  2.5
cpu-ogles-drv> min:  2.4   max:  3.7   avg:  2.6
cpu-ogles-drv> min:  2.4   max:  3.0   avg:  2.5
cpu-ogles-drv> min:  2.4   max:  3.2   avg:  2.5
cpu-ogles-drv> min:  2.4   max:  3.0   avg:  2.5

Based on the Unity docs, values between 2 and 3 ms are okay. Anything more than 3 ms would mean that the simulation is GPU bound (docs here: https://docs.unity3d.com/Documentation/Manual/iphone-InternalProfiler.html). Based on the averages, it seems safe to say that we are not very GPU bound.

If I understand correctly, the stress test is quite CPU bound, with cpu-player values consuming most of the frame time:

cpu-player>    min: 264.1   max: 290.3   avg: 278.1
frametime>     min: 282.9   max: 312.0   avg: 300.0
----------------------------------------
cpu-player>    min: 261.1   max: 294.3   avg: 276.5
frametime>     min: 280.5   max: 317.4   avg: 298.8
----------------------------------------
cpu-player>    min: 258.2   max: 300.4   avg: 283.3
frametime>     min: 277.1   max: 320.4   avg: 303.7
----------------------------------------
cpu-player>    min: 272.0   max: 302.4   avg: 286.9
frametime>     min: 292.6   max: 323.2   avg: 307.7
----------------------------------------
cpu-player>    min: 271.5   max: 300.1   avg: 287.3
frametime>     min: 290.3   max: 320.4   avg: 307.9
----------------------------------------
cpu-player>    min: 258.7   max: 305.3   avg: 286.0
frametime>     min: 278.5   max: 336.7   avg: 307.1
----------------------------------------
cpu-player>    min: 245.6   max: 320.6   avg: 280.5
frametime>     min: 264.6   max: 339.9   avg: 301.6
----------------------------------------
cpu-player>    min: 256.2   max: 297.4   avg: 279.1
frametime>     min: 274.7   max: 318.0   avg: 298.8
----------------------------------------
cpu-player>    min: 268.9   max: 304.0   avg: 281.4
frametime>     min: 287.2   max: 323.5   avg: 301.2
----------------------------------------
cpu-player>    min: 259.6   max: 290.4   avg: 278.5
frametime>     min: 280.2   max: 310.6   avg: 297.7
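
To put the two counters side by side, the share of each frame spent in cpu-player can be computed from the avg columns. This is only a minimal sketch, assuming the log lines keep the "name> min: max: avg:" layout shown above. ProfilerLine and its methods are hypothetical helper names, not part of Unity or the Spine runtimes.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper for the internal profiler text output; not a Unity API.
public static class ProfilerLine {
    // Matches e.g. "cpu-player>    min: 264.1   max: 290.3   avg: 278.1"
    static readonly Regex Pattern = new Regex(
        @"^(?<name>[\w-]+)>\s+min:\s*(?<min>[\d.]+)\s+max:\s*(?<max>[\d.]+)\s+avg:\s*(?<avg>[\d.]+)");

    // Returns the "avg" value in milliseconds, or throws if the line does not match.
    public static double Avg(string line) {
        Match m = Pattern.Match(line.TrimStart());
        if (!m.Success) throw new FormatException("Unrecognized profiler line: " + line);
        return double.Parse(m.Groups["avg"].Value, CultureInfo.InvariantCulture);
    }

    // Fraction (0..1) of the average frame time taken by the given counter.
    public static double Share(string counterLine, string frametimeLine) {
        return Avg(counterLine) / Avg(frametimeLine);
    }

    // Frames per second implied by an average frame time in milliseconds.
    public static double Fps(double frametimeMs) {
        return 1000.0 / frametimeMs;
    }
}
```

For the first sample above, Share gives 278.1 / 300.0, or roughly 93% of the frame, and Fps(300.0) comes to about 3.3 frames per second.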

Deep Profiling
As for deeper profiling, Unity does have a feature called "Deep Profiling", but it does not seem to work on mobile devices given the strain it puts on simulations (it records all function/method calls). I ran a deep profiling test on desktop to get a good idea of which functions consume the most milliseconds:
[Screenshot of the deep profiling results, taken 2013-07-31 at 5:10:07 pm]
I managed to isolate the following functions:

tk2dSpineSkeleton.Update() - 0.2%
    Skeleton.UpdateWorldTransform() - 2.5%
        Bone.UpdateWorldTransform() - 28.1%
    tk2dSpineSkeleton.UpdateCache() - 2.6%
    tk2dSpineSkeleton.UpdateMesh() - 21.3%
        RegionAttachment.ComputeVertices() - 4.6%
    tk2dSpineSkeleton.UpdateEditorGizmo() - 12.3%
        Vector3.Max() - 9.0%
        Vector3.Min() - 9.0%

Because deep profiling is unavailable on mobile, I made use of Profiler.BeginSample and Profiler.EndSample to perform "deep profiling" manually on the functions I isolated. I got the following results:
[Screenshot of the manual profiling results, taken 2013-07-31 at 6:06:14 pm]

Based on this, the major culprits seem to be:

tk2dSpineSkeleton.UpdateMesh
RegionAttachment.ComputeVertices
Skeleton.UpdateWorldTransform
Bone.UpdateWorldTransform

I am looking at these functions in isolation, and all I can think of at the moment is micro-optimizations. Is it possible to somehow improve the runtime's performance based on the information provided here?

Suggestion: Dirty-flag Checking
One high-level optimization I would like to suggest is to not run tk2dSpineSkeleton.UpdateMesh and Skeleton.UpdateWorldTransform functions when no changes to the skeleton are being made. Maybe dirty-flag checking is possible? All of these numbers I showed here are for a simulation with non-animated skeletons. The skeletons are just sitting there, but lots of operations are running on each skeleton even though no data has changed.
However, once these skeletons start animating, the runtime might end up just running the updates all the time anyway, so I am not sure if this is a good investment.
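
For reference, the dirty-flag idea could look roughly like the sketch below. It assumes a simplified stand-in for a skeleton: DirtyFlagSkeleton, SetPose, and RebuildCount are hypothetical names for illustration and do not exist in the Spine runtimes.

```csharp
// Hypothetical stand-in for a skeleton; none of these names come from the Spine runtimes.
public class DirtyFlagSkeleton {
    float x, y, rotation;
    bool dirty = true;  // start dirty so the first Update does the work once

    // Counts how often the expensive work actually ran.
    public int RebuildCount { get; private set; }

    public void SetPose(float x, float y, float rotation) {
        // Only mark the skeleton dirty when a value actually changes.
        if (x == this.x && y == this.y && rotation == this.rotation) return;
        this.x = x;
        this.y = y;
        this.rotation = rotation;
        dirty = true;
    }

    // Called once per frame.
    public void Update() {
        if (!dirty) return;  // nothing changed: skip world transforms and the mesh rebuild
        RebuildCount++;      // placeholder for UpdateWorldTransform / UpdateMesh style work
        dirty = false;
    }
}
```

A static skeleton would then pay only for the flag check after the first frame, while a skeleton whose pose changes every frame pays for the check plus the full update.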
Nonetheless, to test if this will improve things, I attached this component to each skeleton instance:

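using System.Collections;
using UnityEngine;

// Disables the tk2dSpineSkeleton component shortly after startup so that its
// per-frame update stops running once the skeleton has been set up.
public class SpineDisabler : MonoBehaviour {

    private IEnumerator DisableSpine() {
        // Yield once so the skeleton gets a chance to initialize before it is disabled.
        yield return new WaitForSeconds(0);
        GetComponent<tk2dSpineSkeleton>().enabled = false;
    }

    private void Start() {
        StartCoroutine(DisableSpine());
    }
}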

This disables the skeleton component once the skeleton has been set up, which improves the frame rate significantly:

iPhone Unity internal profiler stats:
cpu-player>    min: 17.6   max: 19.4   avg: 18.5
cpu-ogles-drv> min:  2.4   max:  2.8   avg:  2.5
cpu-present>   min:  0.7   max:  3.3   avg:  1.1
frametime>     min: 29.2   max: 36.4   avg: 33.3
draw-call #>   min:   6    max:   6    avg:   6     | batched:   100
tris #>        min: 10312  max: 10312  avg: 10312   | batched:  9200
verts #>       min: 20624  max: 20624  avg: 20624   | batched: 18400
player-detail> physx:  0.3 animation:  0.0 culling  0.0 skinning:  0.0 batching:  4.1 render: 13.2 fixed-update-count: 1 .. 2
mono-scripts>  update:  0.3   fixedUpdate:  0.0 coroutines:  0.0 
mono-memory>   used heap: 2093056 allocated heap: 2502656  max number of collections: 30 collection total duration: 234.7

From 200+ms per frame to ~20ms per frame. 20ms still seems pretty large, and it seems that the slowdown is caused by this FPS Graph plugin we're showing. I took it out and here is the result:

cpu-player>    min:  4.9   max:  6.1   avg:  5.3
cpu-ogles-drv> min:  2.3   max:  2.6   avg:  2.4
cpu-present>   min:  0.7   max:  3.3   avg:  1.1
frametime>     min: 30.7   max: 36.0   avg: 33.3
draw-call #>   min:   3    max:   3    avg:   3     | batched:   100
tris #>        min:  9204  max:  9204  avg:  9204   | batched:  9200
verts #>       min: 18408  max: 18408  avg: 18408   | batched: 18400
player-detail> physx:  0.3 animation:  0.0 culling  0.0 skinning:  0.0 batching:  4.2 render: -0.0 fixed-update-count: 1 .. 2
mono-scripts>  update:  0.2   fixedUpdate:  0.0 coroutines:  0.0 
mono-memory>   used heap: 2105344 allocated heap: 2502656  max number of collections: 0 collection total duration:  0.0

Maybe the problem stems from the runtime trying to update 18k vertices every frame? Is there any way around that?

NathanSweet commented on May 17, 2024

Thanks for digging into it further!

Since most or all skeletons will be animated in a real app, a dirty flag would just add a little more overhead. If you had skeletons that aren't animated, you could avoid calling Skeleton/Bone#updateWorldTransform. We could probably do this for SkeletonAnimation, but I doubt it will help real apps. You could even write a class that avoids calling computeVertices if you really had a use case where your skeletons don't animate.

You are drawing about 18k vertices, which is about 4600 images. That is a lot of images for a mobile game. There is no way around computing the vertices each frame, as it is expected they change each frame.

You probably have about 4600 bones and it is the same story there, we have to compute the world transforms as it is expected the local transforms change each frame.
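
The two estimates follow from the numbers reported earlier in the thread. Below is a small sketch of the arithmetic, assuming each image is drawn as one quad with 4 vertices. StressTestCounts is a hypothetical name used only for this illustration.

```csharp
// Hypothetical helper that reproduces the estimates above; not part of any runtime.
public static class StressTestCounts {
    // Assumption: each image (region attachment) is drawn as a single quad.
    public const int VerticesPerQuad = 4;

    // Number of images implied by a batched vertex count.
    public static int Images(int batchedVertices) {
        return batchedVertices / VerticesPerQuad;
    }

    // Total number of bones updated per frame.
    public static int Bones(int skeletons, int bonesPerSkeleton) {
        return skeletons * bonesPerSkeleton;
    }
}
```

With the figures from the stress test, Images(18400) and Bones(100, 46) both come to 4600.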

There may be some micro optimizations we can do, but I doubt we will find a large chunk of work that can be avoided entirely.

hourglasseye commented on May 17, 2024

We really are going to have to cut back on bones. One really sad case is that the HTC Inspire (same as HTC Desire HD) can only run 12 skeletons to stay within 30 fps. If we cut the number of bones in half, we'll still just have ~24 skeletons.

Suggestion: COLLADA Export?

Seeing as there is no way around iterating through each vertex using C#, perhaps the best way to get Spine animations running faster is through Unity's built-in animation capability. To test this idea, I asked one of our artists for an animated 2D strip with 46 bones and 184 vertices. I can run 100 instances of it on a 5th Gen iPod Touch and 50 instances on the HTC Inspire while staying at 30fps.

I was going to suggest that the Spine Editor have a .dae (COLLADA) export option where the .dae file would contain information from both skeleton.json and atlas.json. This way, Unity can treat the Spine animation as an animated 3d model and run it through its (apparently) optimized animator.

However, I took a peek at the Spine Editor Trello and saw that there is still a lot in the backlog. I also see a new "events" feature in the works, so that might not be translatable to the COLLADA format. There's also the possibility of not being able to use this along with 2D Toolkit. Is this idea interesting to you? If so, is there a place where I can submit Spine Editor-based suggestions? Either way, I don't think this is something I can expect to happen anytime soon.

Idea: COLLADA Converter?

An idea I have is that maybe I can write a converter that will consume skeleton.json and atlas.json to produce a .dae file. How often can I expect the format of skeleton.json and atlas.json files to change? Is this an approach you would endorse?

NathanSweet commented on May 17, 2024

You could be fill rate limited on the HTC Inspire. Have you tried disabling rendering entirely while still updating the bone world transforms?

46 bones per character with so many characters is quite a bit.

If Collada is acceptable for Unity, I think it would be best to have something to convert from Spine's JSON format to Collada. The JSON format is intended to be consumed externally (e.g., by all runtimes) and also imported back into Spine, so it is relatively stable. New features are usually additive and could be ignored by the converter.

Every time you show more or fewer images than the last frame (causing the number of vertices to change) then spine-tk2d will reallocate some arrays in UpdateCache. This is pretty nasty and I've fixed this in spine-unity. If the number of vertices doesn't change, then this won't be affecting your benchmarks. I think with your dragon you are ok.
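
The thread does not show how spine-unity avoids the reallocation. One common approach, sketched here purely as an assumption, is a grow-only buffer that is reused across frames and replaced only when it is too small. VertexBuffer and its members are hypothetical names, not code from either runtime.

```csharp
using System;

// Hypothetical grow-only buffer; not the actual spine-unity or spine-tk2d code.
public class VertexBuffer {
    float[] data = new float[0];

    // Number of floats in use this frame.
    public int Count { get; private set; }

    // How many times the backing array has been replaced.
    public int Allocations { get; private set; }

    // Ensures room for `count` floats, allocating only when capacity is too small.
    // Old contents are not copied, since the vertices are rewritten every frame anyway.
    public float[] Resize(int count) {
        if (count > data.Length) {
            data = new float[Math.Max(count, data.Length * 2)];
            Allocations++;
        }
        Count = count;
        return data;
    }
}
```

With this scheme, showing fewer images than the previous frame allocates nothing, and showing more allocates only when the count exceeds the largest capacity reached so far.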

I made some changes just now, not sure how much they will impact your benchmarks:
43d572c
0a7c47f

NathanSweet commented on May 17, 2024

Any chance those changes helped your benchmarks?

hourglasseye commented on May 17, 2024

Oh wait, what. I think I clicked the "Comment and Close" button by mistake last night while drafting a reply. I scrapped the draft and probably clicked it thinking it was some cancel button while sleepy. My bad.

I got performance improvement on the 5th Gen iPod Touch. Milliseconds consumed were cut in half. Also, it does seem that the simulation was fillrate limited on the HTC Inspire. I will post stats in this space come Monday.

I have been looking around for an easy way to generate collada files and pycollada almost fit the bill. Turns out generating animation data is not supported.

I am also wondering if generating AnimationClips and using those to animate generated meshes would be an effective way to make use of Unity's built-in animation (as an alternative to generating collada files). That would be one AnimationClip per animation entry in skeleton.json, and one quad mesh per bone. Then maybe nest each quad based on the skeleton. I am unsure of how well this would work, but I am thinking of trying this soon. I am hoping that this way, we will only need to run code on each bone, rather than on each vertex (except maybe for UVs?). Will this approach work with Spine animations, or do you think I would hit a wall somewhere while implementing it?

Also, is there documentation on how to understand skeleton.json files? Like what each key in skeleton.json means?

NathanSweet commented on May 17, 2024

Sounds like it helped, sweet! How did it improve the number of dragons you can have at once?

Spine's JSON format is documented here:
http://esotericsoftware.com/spine-json-format/

I'm not familiar with AnimationClip, but you'll need to support applying SRT transforms, changing vertex colors to tint images, and changing texture coordinates for switching attached images.
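
As a rough illustration of the SRT part only (tinting and texture coordinate changes are not covered), here is a generic 2D scale-rotate-translate composition down a bone hierarchy. It is a sketch under the assumption of a plain parent-times-local matrix product. Bone2D and its members are hypothetical and are not the runtime's Bone.UpdateWorldTransform.

```csharp
using System;

// Hypothetical 2D bone showing a generic scale-rotate-translate composition;
// not the actual Bone class from the Spine runtimes.
public class Bone2D {
    public Bone2D Parent;
    public float X, Y, RotationDegrees, ScaleX = 1, ScaleY = 1;  // local SRT

    // World transform: 2x2 matrix (A B / C D) plus translation (WorldX, WorldY).
    public float A = 1, B, C, D = 1, WorldX, WorldY;

    // Parents must be updated before their children.
    public void UpdateWorldTransform() {
        double r = RotationDegrees * Math.PI / 180.0;
        float cos = (float)Math.Cos(r), sin = (float)Math.Sin(r);
        // Local matrix = rotation * scale.
        float la = cos * ScaleX, lb = -sin * ScaleY;
        float lc = sin * ScaleX, ld = cos * ScaleY;
        if (Parent == null) {
            A = la; B = lb; C = lc; D = ld;
            WorldX = X; WorldY = Y;
            return;
        }
        // World matrix = parent world matrix * local matrix.
        A = Parent.A * la + Parent.B * lc;
        B = Parent.A * lb + Parent.B * ld;
        C = Parent.C * la + Parent.D * lc;
        D = Parent.C * lb + Parent.D * ld;
        // The local translation is expressed in the parent's space.
        WorldX = Parent.A * X + Parent.B * Y + Parent.WorldX;
        WorldY = Parent.C * X + Parent.D * Y + Parent.WorldY;
    }

    // Maps a point from this bone's local space to world space.
    public void LocalToWorld(float lx, float ly, out float wx, out float wy) {
        wx = A * lx + B * ly + WorldX;
        wy = C * lx + D * ly + WorldY;
    }
}
```

Any converter targeting Unity's animation system would need to reproduce an equivalent parent-to-child composition, whether through nested transforms or through precomputed world-space keys.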

NathanSweet commented on May 17, 2024

I just did a big commit for both Unity runtimes, spine-unity and spine-tk2d. The code for each is almost identical now (but duplicated in each project for simplicity). spine-unity gains support for using an atlas with multiple pages. spine-tk2d gains optimizations that spine-unity has had for a while. A lot of spine-tk2d code was replaced, so it may now perform differently in your benchmarks. The source is a lot nicer now.

I wonder if there are Unity configurations you can make to eke out more performance? E.g., uncheck "casts shadows" or whatever.

I was thinking, you could write a runtime for Unity that is based on spine-c. This would likely outperform C# on mobile.

NathanSweet commented on May 17, 2024

Note one of the optimizations in the new code is "mesh.MarkDynamic();", which might give nice results since we update the mesh every frame.

hourglasseye commented on May 17, 2024

Cool. Big performance boost while testing on the 5th gen iPod Touch. See results here: http://gonzogamesdev.com/stresstest_ipodtouch5th.txt

Same test, but with animated dragons instead of just static.

Frame time consumption dropped to ~51 ms. In summary:

cpu-player>    min: 50.0   max: 54.9   avg: 52.1
cpu-player>    min: 50.6   max: 59.1   avg: 52.0
cpu-player>    min: 49.9   max: 52.9   avg: 51.3
cpu-player>    min: 49.9   max: 53.9   avg: 51.6
cpu-player>    min: 49.6   max: 53.6   avg: 51.7
cpu-player>    min: 48.9   max: 53.5   avg: 51.4
cpu-player>    min: 49.5   max: 53.8   avg: 51.8
cpu-player>    min: 49.2   max: 53.8   avg: 51.7
cpu-player>    min: 49.4   max: 59.0   avg: 52.2
cpu-player>    min: 49.9   max: 55.2   avg: 51.5

I attempted to remove all default-implementation properties (getters and setters), then replaced them with public variables out of curiosity. It shaved off ~4 ms:

cpu-player>    min: 44.8   max: 49.3   avg: 47.1
cpu-player>    min: 45.2   max: 50.3   avg: 47.0
cpu-player>    min: 45.1   max: 54.0   avg: 47.4
cpu-player>    min: 45.3   max: 51.7   avg: 47.1
cpu-player>    min: 45.4   max: 49.6   avg: 47.2
cpu-player>    min: 45.5   max: 48.7   avg: 46.9
cpu-player>    min: 45.2   max: 48.9   avg: 46.9
cpu-player>    min: 45.4   max: 49.1   avg: 47.3
cpu-player>    min: 44.8   max: 49.5   avg: 46.7
cpu-player>    min: 45.8   max: 50.9   avg: 47.2

I read somewhere that using properties seems to be best practice, however, so I am not sure if sacrificing the flexibility they offer is worth the optimization.
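
The change described above amounts to swapping one member form for the other, as in the sketch below. The class names are hypothetical and not copied from spine-csharp, and whether the two forms differ in speed depends on the runtime and compiler, so the ~4 ms figure should be read as specific to this test.

```csharp
// Hypothetical classes illustrating the change; not copied from spine-csharp.
public class BoneWithProperties {
    // Auto-implemented properties: each access is a get/set call unless the JIT inlines it.
    public float X { get; set; }
    public float Y { get; set; }
}

public class BoneWithFields {
    // Public fields: each access reads or writes the memory directly.
    public float X;
    public float Y;
}

public static class BoneSum {
    // Hot loop reading through property getters.
    public static float OfProperties(BoneWithProperties[] bones) {
        float total = 0f;
        for (int i = 0; i < bones.Length; i++) total += bones[i].X + bones[i].Y;
        return total;
    }

    // The same loop reading fields; the calling code is unchanged.
    public static float OfFields(BoneWithFields[] bones) {
        float total = 0f;
        for (int i = 0; i < bones.Length; i++) total += bones[i].X + bones[i].Y;
        return total;
    }
}
```

Callers read and write both forms with the same syntax, which is why the swap is mechanical. What is given up is the ability to add validation or change notification later without touching the public surface.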

I am following up on a skeleton with fewer bones to test with. Hopefully, our artists can halve the number of bones at least. I will be back with updated stats once they hand the new skeleton in.

Thanks for the awesome update 😄

NathanSweet commented on May 17, 2024

Great to hear! :) Too bad about the properties. I vaguely remember reading something that they could be optimized to be just as fast as fields, but it seems not.

pharan commented on May 17, 2024

Yeah, I read something like that too on MSDN and elsewhere; that auto-implemented properties are supposed to get inlined by the compiler: http://msdn.microsoft.com/library/ms973852.aspx (see Properties)

Sad that it works out that way though. But I don't think it's any loss of flexibility or stability (or awfully difficult to do) if the client decides to turn auto-implemented properties into fields in the spine-csharp classes. But, personally, I'd leave the properties as is on the official thing.

EDIT: oh. according to that table in the article, for properties vs fields: setting is equal. getting is slower.

NathanSweet commented on May 17, 2024

I've just committed not using properties internally. Yay micro optimizations!

hourglasseye commented on May 17, 2024

Closing this for now. We are doing okay with the current runtime version.
