Pages

Monday, September 17, 2012

Cudafy Me: Part 4 of 4

These posts are meant to inspire you to enter into the world of graphics processor programming.

cube_edge1

Posts in This Series

Full Source Code

http://cudafytsp.codeplex.com/

Benchmarks

Computing an 11 city solution with 40 million permutations
on a Dell T7500 (Xeon X5690 @ 3.47GHz)
with an NVidia Tesla C2050:

  1. Single CPU: 47.340 seconds
  2. Multi threaded: 7.965 seconds
  3. GPGPU: 0.117 seconds
    Note: 12 cities - 480 million permutations – it took 2.206 seconds on the GPGPU

Computing an 11 city solution with 40 million permutations
on a 2009 HP Pavilion dv7 (i7 CPU Q 720 @ 1.60GHz)
with an NVidia GeForce GT 230M (48 CUDA cores):

  1. Single CPU: 162.535 seconds
  2. Multi threaded: 36.240 seconds
  3. GPGPU: 2.181 seconds

Computing a 10 city solution with 3.6 million permutations
on a Dell T3400 (Core 2 Duo E6850 @ 3.00GHz)
with an NVidia Quadro NVS 290:

  1. Single CPU: 5.707 seconds
  2. Multi threaded: 3.097 seconds
  3. GPGPU: 0.888 seconds

Nuances

There are a few things that took me a while to figure out that might be helpful to others.

  • I used VS2010 Pro. I heard you can use the express editions if you install both the C# and C++ versions. It seems Microsoft is making harder to find the VS2010 though.
  • You cannot use VS2012 yet, and probably never be able to use the express edition of VS2012 since I doubt express will support desktop applications.
  • It really is not that hard to add CL.EXE to your path. Just type “environment” into the Windows 7 start box that states “search programs and files” and pick the “Edits system and environment variables”.
  • If your video card is older, try this:
    CudafyTranslator.Cudafy(eArchitecture.sm_11);
  • If the GPGPU call takes more than a few seconds then the video driver resets. I think you can tweak the timeout somewhere.
  • Things go a bit wonky if you call functions in your Cudafied code that return void.
  • It is a bit faster to call the version of launch that takes the function to call as a string.
  • I think calling CopyFromDevice forces a call to Synchronize first.
  • You cannot allocate arrays in Cudafied code, so get used to calling AllocateShared for all threads.
  • Recursion is not supported in Cudafied code.

Can We Go Faster? Oh Yeah!

The GPGPU algorithm described in this series is an introductory primer meant to inspire you to enter into the world of graphics processor programming. For those who are looking for phonemically fast solutions to the traveling salesman problem, you might consider starting with the link from GPU Science below. The author Dr. Kamil Rocki of the University of Tokyo taunts us: "Anyone faster out there?"

Essential Resources

11 comments:

  1. John,

    Thanks for the article. I had fun yesterday downloading and runnign your example to test it out. One quick question: using 10 cities, I am getting times of about 6000 ms / 1500 ms / 50 ms, but the GPU compile and load is taking another 5-6 seconds. Is that normal, or have I missed something?

    Pieter

    ReplyDelete
  2. Pieter,
    I'm glad you were able to run the sample successfully. The start-up delay is normal. It is the result of Cudafy generating the cuda.cu code and then running the compiler.
    - John

    ReplyDelete
  3. Hi John,

    Thank you for the quick response. I decided to follow up a bit, as the overhead seemed high, and discovered that much of the delay seems to be due to the presence of CPU/MPU code in the TSP class. I experimented with running three separate sub-classes off of a shared abstrat class, and got the follwoing times, showing much improved compile/load times:

    Starting with no cache
    Permutations 3628800 with single class
    CpuTsp 2496 1904189 644.0571
    MpuTsp 726 1904189 644.0571
    GpuTsp 43 1904189 644.0571
    Load 5615

    Permutations 3628800 with separate classes
    CpuTsp 2392 1904189 644.0571
    MpuTsp 725 1904189 644.0571
    GpuTsp 34 1904189 644.0571
    Load 1064

    Starting with cache
    Permutations 3628800 with single class
    CpuTsp 2421 1904189 644.0571
    MpuTsp 717 1904189 644.0571
    GpuTsp 34 1904189 644.0571
    Load 7

    Permutations 3628800 with separate classes
    CpuTsp 2381 1904189 644.0571
    MpuTsp 716 1904189 644.0571
    GpuTsp 34 1904189 644.0571
    Load 2

    Done ...

    One surprise was that I could not leave the shared cudafy code in the abstract class, but had to copy-paste it into the GpuTsp class and leave it in AbstractTsp; otherwise Cudafy could not seemt o find it on compile.

    Pieter

    ReplyDelete
  4. Helpful hint re Nuance bullet 7: I found that this code:

    gpu.Launch(BlocksPerGrid, ThreadsPerBlock,
    ( (Action) GpuFindPathDistance ).Method.Name,
    _permutations, _cities, gpuLatitudes, gpuLongitudes, gpuAnswer);

    ran just as fast as your original call using the method-name string, and faster by about 10% than a direct type-safe call to the function.

    ReplyDelete
  5. Distance formula!

    Shouldn't the formula for distance be this:
    distance += (float)Math.Sqrt(Square(latitude - previousLatitude)
    +
    Square(longitude - previousLongitude));
    with a + instead of a * between the two terms?

    ;-)

    Pieter

    ReplyDelete
    Replies
    1. Many years ago I learned to use whitespace to highlight terms separately from factors, for just this reason. I only spotted this when I re-formatted the routine to track something else down.

      Delete
    2. Thank you again. I will update the code as time permits.

      Delete
  6. John,

    I referenced your article here on Code-Plex (http://cudafytuningtutorial.codeplex.com/) while documenting the results of playing around with CUDAfy. If you have an opportunity to peruse it, any comments or suggestiosn would be appreciated.

    Pieter

    ReplyDelete
  7. I downloaded your source, and loaded it in VS2010 C# Express, but when I try to compile and run it tells me:

    Error 1 The type or namespace name 'Cudafy' could not be found (are you missing a using directive or an assembly reference?) C:\Users\TT\Desktop\CudafyTsp\CudafyTsp\Program.cs 5 7 CudafyTsp


    Along with many other errors complaining about namespace blah could not be found.

    Any ideas?

    ReplyDelete
    Replies
    1. Not sure. Did you install Cudafy first? Did any of the Cudafy examples work?

      Delete

Note: Only a member of this blog may post a comment.