NVIDIA Nsight Eclipse Edition for Jetson TK1

NVIDIA® Nsight™ Eclipse Edition is a full-featured, integrated development environment that lets you easily develop CUDA® applications for either your local (x86) system or a remote (x86 or ARM) target. In this post, I will walk you through the process of remote-developing CUDA applications for the NVIDIA Jetson TK1, an ARM-based development kit.

Nsight supports two remote development modes: cross-compilation and “synchronize projects” mode. Cross-compiling for ARM on your x86 host system requires that all of the ARM libraries with which you will link your application be present on your host system. In synchronize-projects mode, on the other hand, your source code is synchronized between host and target systems and compiled and linked directly on the remote target, which has the advantage that all your libraries get resolved on the target system and need not be present on the host. Neither of these remote development modes requires an NVIDIA GPU to be present in your host system.

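Note: CUDA cross-compilation tools for ARM are available only in the Ubuntu 12.04 DEB package of the CUDA 6 Toolkit. If your host system is running a Linux distribution other than Ubuntu 12.04, I recommend the synchronize-projects remote development mode, which I will cover in detail in a later blog post.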

CUDA toolkit setup

The first step involved in cross-compilation is installing the CUDA 6 Toolkit on your host system. To get started, let’s download the required Ubuntu 12.04 DEB package from the CUDA download page. Installation instructions can be found in the Getting Started Guide for Linux, but I will summarize them below for CUDA 6.

1. Enable armhf as a foreign architecture to get the cross-armhf packages installed:

$ sudo sh -c 'echo "foreign-architecture armhf" >> /etc/dpkg/dpkg.cfg.d/multiarch'
$ sudo apt-get update

2. Run dpkg to install and update the repo meta-data:

$ sudo dpkg -i cuda-repo-ubuntu1204_6.0-37_amd64.deb
$ sudo apt-get update

3. Install the CUDA cross-compilation and ARM GNU packages (these will be linked in future toolkit versions):

$ sudo apt-get install cuda-cross-armhf
$ sudo apt-get install g++-4.6-arm-linux-gnueabihf

4. OPTIONAL – if you also wish to do native x86 CUDA development and have an NVIDIA GPU in your host system then you can install the full toolchain and driver:

$ sudo apt-get install cuda

If you installed the driver, reboot your system so that the NVIDIA driver gets loaded. Then update paths to the toolkit install location as follows:

$ export PATH=/usr/local/cuda/bin:$PATH
$ export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

At the end of these steps you should see the armv7-linux-gnueabihf folder and, if you installed the optional native toolchain, the x86_64-linux folder under /usr/local/cuda/targets/.
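To confirm the ARM target folder is in place, a quick shell check can help. The following is a minimal sketch: `check_cuda_target` is a hypothetical helper name, and it assumes the toolkit lives under /usr/local/cuda unless another root is passed as the second argument.

```shell
#!/bin/sh
# Hypothetical helper: report whether a CUDA target folder exists.
#   $1 = target folder name (e.g. armv7-linux-gnueabihf)
#   $2 = toolkit root (assumed default: /usr/local/cuda)
check_cuda_target() {
    target="$1"
    root="${2:-/usr/local/cuda}"
    if [ -d "$root/targets/$target" ]; then
        echo "found: $target"
    else
        echo "missing: $target"
    fi
}

# Check the ARM cross-compilation target installed by cuda-cross-armhf
check_cuda_target armv7-linux-gnueabihf
```

The same function can be called with the name of the optional native target folder if you installed the full toolchain.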

For your cross-development needs, Jetson TK1 comes prepopulated with Linux for Tegra (L4T), a modified Ubuntu (13.04 or higher) Linux distribution provided by NVIDIA. NVIDIA provides the board support package and a software stack that includes the CUDA Toolkit, OpenGL 4.4 drivers, and the NVIDIA VisionWorks™ Toolkit. You can download all of these, as well as examples and documentation, from the Jetson TK1 Support Page.

Importing Your First Jetson TK1 CUDA Sample into Nsight

With the CUDA Toolkit installed and the paths set up on the host system, launch Nsight by typing “nsight” (without the quotes) at the command line or by finding the Nsight icon in the Ubuntu dashboard. Once Nsight is loaded, navigate to File->New->CUDA C/C++ Project to start the Project Creation wizard, where you will import an existing CUDA sample. For the project name, enter “boxfilter-arm” and select “Import CUDA Sample” as the project type and “CUDA Toolkit 6.0” as the toolchain. Next, choose the Boxfilter sample, which can be found under the Imaging category.

The remaining options in the wizard let you choose which GPU and CPU architectures to generate code for. First, choose the GPU code that the nvcc compiler should generate. Since Jetson TK1 includes an NVIDIA Kepler™ GPU, choose SM32 GPU binary code and SM30 PTX intermediate code. (The latter is so that any Kepler-class GPU can run this application.)

The next page in the wizard lets you decide whether to do native x86 development or cross-compile for an ARM system. To cross-compile for ARM, choose the ARM architecture in the CPU architecture drop-down box.

[Figure: nsight_arm_cross_compiler_selection]
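The GPU choices made in the wizard end up as -gencode options on the nvcc command line, where `code=sm_XX` requests GPU binary code and `code=compute_XX` requests PTX. As a rough sketch (the `gencode_flags` helper is hypothetical and only prints the option string; it does not invoke nvcc):

```shell
#!/bin/sh
# Hypothetical helper: print the nvcc -gencode options for one binary target
# and one PTX target. It only builds the option string; nvcc is not run.
#   $1 = SM version for GPU binary code (e.g. 32)
#   $2 = SM version for PTX intermediate code (e.g. 30)
gencode_flags() {
    binary_sm="$1"
    ptx_sm="$2"
    echo "-gencode arch=compute_${ptx_sm},code=compute_${ptx_sm} -gencode arch=compute_${binary_sm},code=sm_${binary_sm}"
}

# SM32 binary plus SM30 PTX, as chosen for the Jetson TK1
gencode_flags 32 30
```

The same pair of options appears in the nvcc compile command in the build log later in this post.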

Building Your First Jetson TK1 Application from Nsight

CUDA samples are generic code samples that can be imported and run on various hardware configurations. For this cross-build exercise, the ARM library dependencies used by this application have to be resolved first. Here’s how you can resolve them:

1. Right click on the project and navigate to Properties->Build->Settings->Tool Settings->NVCC Linker->Libraries and update the paths to point to linux/armv7l instead of linux/x86_64. This will resolve the libGLEW library dependencies. Also remove the entry for GLU since that library is unused.

[Figure: nsight_cuda_samples_lib_updates]

2. Click on the Miscellaneous tab and add a new -Xlinker option “--unresolved-symbols=ignore-in-shared-libs” (without the quotes).

3. In the terminal window use the scp utility to copy the remaining libraries from your Jetson TK1:

scp <user>@<jetson.ip.address>:/usr/lib/arm-linux-gnueabihf/libglut.so.3 /usr/arm-linux-gnueabihf/lib
scp <user>@<jetson.ip.address>:/usr/lib/arm-linux-gnueabihf/tegra/libGL.so.1 /usr/arm-linux-gnueabihf/lib
scp <user>@<jetson.ip.address>:/usr/lib/arm-linux-gnueabihf/libX11.so.6 /usr/arm-linux-gnueabihf/lib

Each library goes into the /usr/arm-linux-gnueabihf/lib folder on the host, with an unversioned symlink next to it (libglut.so, libGL.so, and libX11.so respectively). Replace <user> and <jetson.ip.address> with your Jetson TK1 login and IP address.

Note: You need to copy these ARM libraries only for the first CUDA sample. You may need additional libraries for other samples.
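Once the files are copied, the symlinks can be created with ln. Here is a minimal sketch, assuming all three libraries were copied into /usr/arm-linux-gnueabihf/lib; `link_arm_libs` is a hypothetical helper name, and writing to that folder typically requires root.

```shell
#!/bin/sh
# Hypothetical helper: create the unversioned symlinks (libglut.so, libGL.so,
# libX11.so) next to the versioned libraries copied from the Jetson TK1.
#   $1 = library folder (assumed default: /usr/arm-linux-gnueabihf/lib)
link_arm_libs() {
    libdir="${1:-/usr/arm-linux-gnueabihf/lib}"
    for lib in libglut.so.3 libGL.so.1 libX11.so.6; do
        if [ -e "$libdir/$lib" ]; then
            # Strip the trailing version number: libglut.so.3 -> libglut.so
            ln -sf "$lib" "$libdir/${lib%.*}"
        else
            echo "skipping $lib: not found in $libdir" >&2
        fi
    done
}
```

Calling `link_arm_libs` with no argument targets the default folder; pass another directory to try it out on a scratch copy first.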

The build process for ARM cross-development is similar to the local build process. Just click on the build “hammer” icon in the toolbar menu to build a debug ARM binary.  As part of the compilation process, Nsight will launch nvcc for the GPU code and the arm-linux-gnueabihf-g++-4.6 cross-compiler for the CPU code as follows:

Building file: ../src/boxFilter_kernel.cu
Invoking: NVCC Compiler
/usr/local/cuda-6.0/bin/nvcc -I"/usr/local/cuda-6.0/samples/3_Imaging" -I"/usr/local/cuda-6.0/samples/common/inc" -I"/home/satish/cuda-workspace_new/boxfilter-arm" -G -g -O0 -ccbin arm-linux-gnueabihf-g++-4.6 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_32,code=sm_32 --target-cpu-architecture ARM -m32 -odir "src" -M -o "src/boxFilter_kernel.d" "../src/boxFilter_kernel.cu"
/usr/local/cuda-6.0/bin/nvcc --compile -G -I"/usr/local/cuda-6.0/samples/3_Imaging" -I"/usr/local/cuda-6.0/samples/common/inc" -I"/home/satish/cuda-workspace_new/boxfilter-arm" -O0 -g -gencode arch=compute_30,code=compute_30 -gencode arch=compute_32,code=sm_32 --target-cpu-architecture ARM -m32 -ccbin arm-linux-gnueabihf-g++-4.6 -x cu -o "src/boxFilter_kernel.o" "../src/boxFilter_kernel.cu"
Finished building: ../src/boxFilter_kernel.cu

After the compilation steps, the linker will resolve all library references, giving you a boxfilter-arm binary that is ready to run.

Running Your First Jetson TK1 Application from Nsight

To run the code on the target Jetson TK1 system, click on Run As->Remote C/C++ Application to set up the target system user and host address.

[Figure: nsight_remote_run]

Once you finish the remote target system configuration setup, click on the Run icon and you will see a new entry to run the boxfilter-arm binary on the Jetson TK1.

Note: The Box Filter application relies on data files that reside in the data/ subfolder of the application, which need to be copied to the target system. Use the scp utility to copy those files into the /tmp/nsight-debug/data/ folder on your Jetson TK1.

Next, edit the boxFilter.cpp file as follows:
1. To ensure that the application runs on the correct display device, add this line to the top of the main function:

setenv("DISPLAY", ":0", 0);

2. Add the following lines to the top of the display function so that the app auto-terminates after a few seconds. This is required to gather deterministic execution data across multiple runs of the application, which we will need later in the profiling section:

static int icnt = 120;
// Counts down once per display call; exits once the counter reaches zero.
while (!icnt--) {
    cudaDeviceReset();
    _exit(EXIT_SUCCESS);
}

Click on Run to execute the modified Box Filter application on your Jetson TK1.

Debugging Your First Jetson TK1 Application in Nsight

The remote target system configuration that you set up in Nsight earlier will also be visible under the debugger icon in the toolbar.

Before you launch the debugger, note that by default Jetson TK1 does not allow any application to solely occupy the GPU 100% of the time. In order to run the debugger, we need to fix this. On your Jetson TK1, log in as root (sudo su) and then disable the timeout as follows (in future releases of CUDA, the debugger will handle this automatically):

# echo N > /sys/kernel/debug/gk20a.0/timeouts_enabled
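After writing N, you can read the same file back to confirm the change took effect. This is a minimal sketch, assuming the debugfs entry reads back as Y or N (typical for kernel boolean debugfs files, but not confirmed here for this driver). `gpu_timeouts_state` is a hypothetical helper name, and reading debugfs normally requires root.

```shell
#!/bin/sh
# Hypothetical helper: report whether the GPU timeout is still enabled.
# Assumes the debugfs file contains Y or N; anything else is reported as unknown.
#   $1 = file to read (default: the gk20a debugfs entry)
gpu_timeouts_state() {
    file="${1:-/sys/kernel/debug/gk20a.0/timeouts_enabled}"
    if [ ! -r "$file" ]; then
        echo "unknown"
        return 0
    fi
    case "$(cat "$file")" in
        N*) echo "disabled" ;;
        Y*) echo "enabled" ;;
        *)  echo "unknown" ;;
    esac
}

gpu_timeouts_state
```

If this reports "enabled", repeat the echo command above before starting the debugger.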

Now we can launch the debugger using the debug icon back on the host system. Nsight will switch you to its debugger perspective and break on the first instruction in the CPU code. You can single-step a bit there to see the execution on the CPU and watch the variables and registers as they are updated.

To break on any and all CUDA kernels executing on the GPU, go to the breakpoint tab in the top-right pane of Nsight and click on the cube icon dropdown. Then select the “break on application kernel launches” feature to break on the first instruction of a CUDA kernel launch. You can now resume the application, which will run until the first breakpoint is hit in the CUDA kernel. From here, you can browse the CPU and GPU call stack in the top-left pane. You can also view the variables, registers and HW state in the top-right pane. In addition, you can see that the Jetson TK1’s GPU is executing 16 blocks of 64 threads each running on the single Streaming Multiprocessor (SMX) of this GK20A GPU.

You can also switch to disassembly view and watch the register values being updated by clicking on the i-> icon to do GPU instruction-level single-stepping.

[Figure: nsight_debug_view]

To “pin” (focus on) specific GPU threads, double click the thread(s) of interest in the CUDA tab in the top-right pane. The pinned CUDA threads will appear in the top-left pane, allowing you to select and single-step just those threads. (Keep in mind, however, that single-stepping a given thread causes the remaining threads of the same warp to step as well, since they share a program counter.)  You can experiment and watch this by pinning threads that belong to different warps.

There are more useful debug features that you will find by going into the debug configuration settings from the debug icon drop down, such as enabling cuda-memcheck and attaching to a running process (on the host system only).

To quit the application you are debugging, click the red stop button in the debugger perspective.

Profiling Your First Jetson TK1 Application in Nsight

Let’s switch back to the C++ project editor view to start the profiler run. The remote target system configuration you set up in Nsight earlier will also be visible to you under the profiler icon in the toolbar.

Before you launch the profiler, note that you need to create a release build with -lineinfo included in the compile options. This tells the compiler to generate information on source-to-instruction correlation. To do this, first go to the project settings by right-clicking on the project in the left pane. Then navigate to Properties->Build->Settings->Tool Settings->Debugging and check the box that says “Generate line-number…” and click Apply.

Back in the main window, click on the build hammer dropdown menu to create a release build. Resolve any build issues as you did during the first run above, then click on Run As->Remote C/C++ Application to run the release build of the application. At this point Nsight will overwrite the binary on the Jetson TK1 with the release binary you want to profile and run it once.

Next click on the profile icon dropdown and choose Profile Configurations where you must select “Profile Remote Application” since the binary is already on the Jetson TK1. Nsight will then switch you to the profiler perspective while it runs the application to gather an execution timeline view of all the CUDA Runtime and Driver API calls and of the kernels that executed on the GPU. The properties tab displays details of any event you select from this timeline; the details of the events can also be viewed in text form in the Details tab in the lower pane.

[Figure: nsight_profile_view]

Below the timeline view in the lower pane, there is also an Analysis tab that is very useful for performance tuning. It guides you through a step-by-step approach on resolving performance bottlenecks in your application. You can switch between guided and unguided analysis by clicking on their icons under the Analysis tab.

You can also get a source-to-instruction correlation view, with hot spots (where the instructions-executed count was particularly high) identified in red as shown in the figure below. You get this view from within the guided analysis mode by first clicking on “Examine Individual Kernels” and selecting the highest ranked (100) kernel from the list of examined kernels, then clicking “Perform Kernel Analysis” followed by “Perform Compute Analysis.” From there, clicking “Show Kernel Profile” will show d_boxfilter_rgba_a kernel in the right pane. Double-click on the kernel name to see the source-to-instruction view. Clicking on a given line of source code highlights the corresponding GPU instructions.

[Figure: nsight_rc_to_sass]

As you can see, whether you are new to NVIDIA® Nsight™ Eclipse Edition or an avid Nsight user, Nsight makes it just as easy and straightforward to create CUDA applications for the Jetson TK1 platform as for all your CUDA-enabled GPUs.
