You don't need -s because summary mode is already the default.
-o specifies the output file, which is not human-readable but can be imported later or opened in the NVIDIA Visual Profiler.
If you need to compute occupancy, you can use the occupancy calculator provided by NVIDIA. There's also an article about it.
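To give a rough idea of what an occupancy calculator computes, here is a minimal Python sketch. It is not NVIDIA's tool: the function name theoretical_occupancy is hypothetical, the default per-SM limits are assumed example values rather than the limits of any particular GPU, and the sketch ignores register and shared-memory allocation granularity, which the real calculator accounts for.

```python
def theoretical_occupancy(threads_per_block, regs_per_thread, smem_per_block,
                          max_threads_per_sm=2048, max_blocks_per_sm=16,
                          regs_per_sm=65536, smem_per_sm=49152, warp_size=32):
    """Simplified estimate: resident warps / maximum warps per SM.

    All per-SM limits are ASSUMED example values; look up the real ones
    for your device before trusting the result.
    """
    # A partially filled warp still occupies a whole warp slot (ceiling division).
    warps_per_block = -(-threads_per_block // warp_size)
    max_warps_per_sm = max_threads_per_sm // warp_size

    # Each resource caps how many blocks can be resident on one SM.
    limits = [max_blocks_per_sm, max_warps_per_sm // warps_per_block]
    if regs_per_thread > 0:
        regs_per_block = regs_per_thread * warps_per_block * warp_size
        limits.append(regs_per_sm // regs_per_block)
    if smem_per_block > 0:
        limits.append(smem_per_sm // smem_per_block)

    resident_blocks = min(limits)
    return resident_blocks * warps_per_block / max_warps_per_sm


# 256 threads per block, 64 registers per thread, no shared memory.
print(theoretical_occupancy(256, 64, 0))  # prints 0.5 with the assumed limits
```

With these assumed limits, doubling the registers per thread from 32 to 64 halves the number of resident blocks, which is the kind of trade-off the calculator is meant to expose.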
If you need to trace branches, you can profile in trace mode and collect the branch events.
You can open your output file in the NVIDIA Visual Profiler (usually included in the CUDA SDK).
There's also one more way to produce human-readable files: you can specify the --log-file human-readable-output.log option for nvprof (where human-readable-output.log is the name of your output file).
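If you launch the profiler from a script, the two output options can be assembled as in the sketch below. The helper name nvprof_command and the application path ./my_app are hypothetical; only the -o and --log-file flags come from nvprof itself. The sketch only builds and prints the command, so it runs even on a machine without nvprof.

```python
import shlex


def nvprof_command(app_args, log_file=None, export_file=None):
    """Build an nvprof argument list for the given application command."""
    cmd = ["nvprof"]
    if export_file is not None:
        # -o writes a profile for later import or for the Visual Profiler
        # (not human-readable).
        cmd += ["-o", export_file]
    if log_file is not None:
        # --log-file redirects nvprof's text output to a readable file.
        cmd += ["--log-file", log_file]
    return cmd + list(app_args)


# ./my_app is a placeholder for your own CUDA executable.
cmd = nvprof_command(["./my_app"], log_file="human-readable-output.log")
print(shlex.join(cmd))
# prints: nvprof --log-file human-readable-output.log ./my_app
```

To actually run it, pass the list to subprocess.run(cmd, check=True) on a machine where nvprof is on the PATH.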
You can enable some other nvprof options for your log output:
- --print-gpu-trace for GPU trace;
- --events for collecting events (for example branch, number of launched warps, etc.);
- --metrics for some custom metrics (like shared load transactions, DRAM utilization, etc.; you can view the full list of metrics by typing nvprof --query-metrics in your command line).
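As an illustration of combining these options, the following sketch builds one command line that writes a GPU trace with selected events and metrics to a log file. The helper name nvprof_detail_command and the path ./my_app are hypothetical, and the event and metric names used here (branch, warps_launched, dram_utilization) are assumptions: availability depends on your GPU, so check nvprof --query-events and nvprof --query-metrics first.

```python
import shlex


def nvprof_detail_command(app_args, log_file, gpu_trace=False, events=(), metrics=()):
    """Build an nvprof command that logs trace, event, and metric output."""
    cmd = ["nvprof", "--log-file", log_file]
    if gpu_trace:
        cmd.append("--print-gpu-trace")
    if events:
        # nvprof takes a comma-separated list of event names.
        cmd += ["--events", ",".join(events)]
    if metrics:
        # Same convention for metric names.
        cmd += ["--metrics", ",".join(metrics)]
    return cmd + list(app_args)


# Event and metric names are ASSUMED examples; they vary by device.
cmd = nvprof_detail_command(
    ["./my_app"],
    log_file="human-readable-output.log",
    gpu_trace=True,
    events=("branch", "warps_launched"),
    metrics=("dram_utilization",),
)
print(shlex.join(cmd))
```

Collecting many events or metrics at once can slow the run considerably, so it may be worth requesting only the few you need.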
You can find the full list of options in the NVIDIA nvprof documentation.