
TensorBoard callback without profile_batch setting causes errors CUPTI_ERROR_INSUFFICIENT_PRIVILEGES and CUPTI_ERROR_INVALID_PARAMETER #35860

Closed · gawain-git-code opened this issue Jan 14, 2020 · 56 comments
Labels: comp:keras, comp:tensorboard, stale, stat:awaiting response, TF 2.1, type:bug

Comments

@gawain-git-code commented Jan 14, 2020


System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Stateless LSTM from Keras tutorial using tf backend
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): 2.1.0
  • Python version: 3.7.4
  • CUDA/cuDNN version: 10.1
  • GPU model and memory: MX150 10GB

Describe the current behavior
When using tf.keras.callbacks.TensorBoard() without setting profile_batch, it emits CUPTI_ERROR_INSUFFICIENT_PRIVILEGES and CUPTI_ERROR_INVALID_PARAMETER errors from tensorflow/core/profiler/internal/gpu/cupti_tracer.cc.

Describe the expected behavior
With profile_batch = 0, these two errors disappear, but they come back when profile_batch = 1 or any other non-zero value.
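For reference, a minimal sketch of the callback configuration that avoids the profiler entirely (the log directory name is arbitrary); profile_batch=0 turns batch profiling off, so CUPTI is never initialized:

import tensorflow as tf

# profile_batch=0 disables batch profiling; the default value profiles a batch
# during training and is what triggers the CUPTI calls shown in the logs below.
tensorboard_callback = tf.keras.callbacks.TensorBoard(
    log_dir='callback_tests',  # any writable directory
    profile_batch=0,
)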

Code to reproduce the issue


from __future__ import print_function
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM


input_len = 1000
tsteps = 2
lahead = 1
batch_size = 1
epochs = 5

print("*" * 33)
if lahead >= tsteps:
    print("STATELESS LSTM WILL ALSO CONVERGE")
else:
    print("STATELESS LSTM WILL NOT CONVERGE")
print("*" * 33)

np.random.seed(1986)

print('Generating Data...')


def gen_uniform_amp(amp=1, xn=10000):

    data_input = np.random.uniform(-1 * amp, +1 * amp, xn)
    data_input = pd.DataFrame(data_input)
    return data_input


to_drop = max(tsteps - 1, lahead - 1)
data_input = gen_uniform_amp(amp=0.1, xn=input_len + to_drop)

expected_output = data_input.rolling(window=tsteps, center=False).mean()

if lahead > 1:
    data_input = np.repeat(data_input.values, repeats=lahead, axis=1)
    data_input = pd.DataFrame(data_input)
    for i, c in enumerate(data_input.columns):
        data_input[c] = data_input[c].shift(i)

expected_output = expected_output[to_drop:]
data_input = data_input[to_drop:]


def create_model(stateful):
    model = Sequential()
    model.add(LSTM(20,
              input_shape=(lahead, 1),
              batch_size=batch_size,
              stateful=stateful))
    model.add(Dense(1))
    model.compile(loss='mse', optimizer='adam')
    return model

print('Creating Stateful Model...')
model_stateful = create_model(stateful=True)


def split_data(x, y, ratio=0.8):
    to_train = int(input_len * ratio)
    to_train -= to_train % batch_size
    x_train = x[:to_train]
    y_train = y[:to_train]
    x_test = x[to_train:]
    y_test = y[to_train:]

    # tweak to match with batch_size
    to_drop = x.shape[0] % batch_size
    if to_drop > 0:
        x_test = x_test[:-1 * to_drop]
        y_test = y_test[:-1 * to_drop]

    # some reshaping
    reshape_3 = lambda x: x.values.reshape((x.shape[0], x.shape[1], 1))
    x_train = reshape_3(x_train)
    x_test = reshape_3(x_test)

    reshape_2 = lambda x: x.values.reshape((x.shape[0], 1))
    y_train = reshape_2(y_train)
    y_test = reshape_2(y_test)

    return (x_train, y_train), (x_test, y_test)


(x_train, y_train), (x_test, y_test) = split_data(data_input, expected_output)
print('x_train.shape: ', x_train.shape)
print('y_train.shape: ', y_train.shape)
print('x_test.shape: ', x_test.shape)
print('y_test.shape: ', y_test.shape)

print('Creating Stateless Model...')
model_stateless = create_model(stateful=False)

import os
import datetime
ROOT_DIR = os.getcwd()
log_dir = os.path.join('callback_tests')
if not os.path.exists(log_dir):
    os.makedirs(log_dir)
print(log_dir)
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir)
                                       
print('Training')
history = model_stateless.fit(x_train,
                    y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_test, y_test),
                    shuffle=False,
                    callbacks=[tensorboard_callback]
                    )


Other info / logs
Train on 800 samples, validate on 200 samples
2020-01-14 21:30:27.591905: I tensorflow/core/profiler/lib/profiler_session.cc:225] Profiler session started.
2020-01-14 21:30:27.594743: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1259] Profiler found 1 GPUs
2020-01-14 21:30:27.599172: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cupti64_101.dll
2020-01-14 21:30:27.704083: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1307] function cupti_interface_->Subscribe( &subscriber_, (CUpti_CallbackFunc)ApiCallback, this)failed with error CUPTI_ERROR_INSUFFICIENT_PRIVILEGES
2020-01-14 21:30:27.716790: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1346] function cupti_interface_->ActivityRegisterCallbacks( AllocCuptiActivityBuffer, FreeCuptiActivityBuffer)failed with error CUPTI_ERROR_INSUFFICIENT_PRIVILEGES
Epoch 1/5
2020-01-14 21:30:28.370429: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-01-14 21:30:28.651767: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-01-14 21:30:29.662864: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1329] function cupti_interface_->EnableCallback( 0 , subscriber_, CUPTI_CB_DOMAIN_DRIVER_API, cbid)failed with error CUPTI_ERROR_INVALID_PARAMETER
2020-01-14 21:30:29.670282: I tensorflow/core/profiler/internal/gpu/device_tracer.cc:88] GpuTracer has collected 0 callback api events and 0 activity events.
800/800 [==============================] - 5s 6ms/sample - loss: 0.0011 - val_loss: 0.0011
Epoch 2/5
800/800 [==============================] - 3s 4ms/sample - loss: 8.5921e-04 - val_loss: 0.0010
Epoch 3/5
800/800 [==============================] - 3s 3ms/sample - loss: 8.5613e-04 - val_loss: 0.0010
Epoch 4/5
800/800 [==============================] - 3s 4ms/sample - loss: 8.5458e-04 - val_loss: 9.9713e-04
Epoch 5/5
800/800 [==============================] - 3s 4ms/sample - loss: 8.5345e-04 - val_loss: 9.8825e-04

@airMeng commented Jan 15, 2020

Facing the same problem.

@kevin-hartman

Same issue. Different code sample.

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
TensorFlow installed from (source or binary): binary
TensorFlow version (use command below): 2.1.0
Python version: 3.7.6
CUDA/cuDNN version: 10.1
GPU model and memory: RTX 2080 Ti

@eduardofv commented Jan 15, 2020

Same (INSUFFICIENT PRIVILEGES):

2020-01-15 20:28:38.181667: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1307] function cupti_interface_->Subscribe( &subscriber_, (CUpti_CallbackFunc)ApiCallback, this)failed with error CUPTI_ERROR_INSUFFICIENT_PRIVILEGES
2020-01-15 20:28:38.183369: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1346] function cupti_interface_->ActivityRegisterCallbacks( AllocCuptiActivityBuffer, FreeCuptiActivityBuffer)failed with error CUPTI_ERROR_INSUFFICIENT_PRIVILEGES

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
TensorFlow installed from (source or binary): binary

CUDA/cuDNN version: 10.1

Host: ttmagpie_d99d3f105d0a
Python: 3.6.9 (default, Nov 7 2019, 10:44:02)
[GCC 8.3.0]
Tensorflow: 2.1.0
GPU: available
GPU 0: GeForce GTX 960M (UUID: GPU-c604cc5b-50da-5483-bde7-f562fb1c3420)
GPU Memory:4GB
Keras: 2.2.4-tf
Hub: 0.7.0

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 435.21       Driver Version: 435.21       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 960M    Off  | 00000000:02:00.0 Off |                  N/A |
| N/A   46C    P5    N/A /  N/A |   3869MiB /  4046MiB |     26%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

oanush self-assigned this Jan 16, 2020
oanush added comp:tensorboard, TF 2.1, and type:support labels Jan 16, 2020
@oanush commented Jan 16, 2020

@gawain-git-code,
I tried running the code in Colab and it ran successfully; please find the gist for reference. Thanks!

oanush added the stat:awaiting response label Jan 16, 2020
@gawain-git-code (Author)

@airMeng @kevin-hartman @eduardofv Are you satisfied with the answer from @oanush?

I personally am not, because it only tells me that the profile_batch setting can be used safely in Google Colab rather than on my own machine setup.

It does not address the root cause of why cupti_tracer signals these errors, but thank you @oanush for taking the time to help out.

oanush assigned gowthamkpr and unassigned oanush Jan 17, 2020
tensorflowbutler removed the stat:awaiting response label Jan 17, 2020
@eduardofv

I am still receiving the error but have moved to another environment. It may have to do with driver updates on Ubuntu. Will try to check again and get back to you.

gowthamkpr added the type:bug label and removed the type:support label Jan 22, 2020
@gowthamkpr

The error states that

User doesn't have sufficient privileges which are required to start the profiling session. One possible reason for this may be that the NVIDIA driver or your system administrator may have restricted access to the NVIDIA GPU performance counters.

Maybe it's an error with the configuration itself.

gowthamkpr added the stat:awaiting response label Jan 23, 2020
@shaywinter

Facing the same issue when capturing profile data through TensorBoard over gRPC. I tried the solution from NVIDIA (enabling non-privileged access to the GPU performance counters) to no avail; my training already runs as root inside a container, which was the other solution NVIDIA suggested.
TF 2.1.0, NVIDIA driver version 418.56, CUDA version 10.1

@seongmoon729 commented Jan 29, 2020

I have the same issue when I use the official TensorFlow Docker image (tensorflow/tensorflow:2.1.0-gpu-py3).
I used tf.keras.callbacks.TensorBoard with the model.fit() method, and the following errors came out during the first epoch.

E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1307] function cupti_interface_->Subscribe( &subscriber_, (CUpti_CallbackFunc)ApiCallback, this)failed with error CUPTI_ERROR_INSUFFICIENT_PRIVILEGES

E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1329] function cupti_interface_->EnableCallback( 0 , subscriber_, CUPTI_CB_DOMAIN_DRIVER_API, cbid)failed with error CUPTI_ERROR_INVALID_PARAMETER

So I went back to the tensorflow/tensorflow:2.0.0-gpu-py3 image.

gowthamkpr added the comp:keras label Feb 3, 2020
gowthamkpr assigned omalleyt12 and unassigned gowthamkpr Feb 3, 2020
gowthamkpr added the stat:awaiting tensorflower label and removed the stat:awaiting response label Feb 3, 2020
@tamaramiteva commented Feb 4, 2020

@dartlune Did going back to the tensorflow/tensorflow:2.0.0-gpu-py3 image help?

I cannot save the model in either version of the Docker image. The weird thing is that when running the model in a Jupyter notebook, it saves the model at each iteration, but not when run with python3.

Any suggestions?

@seongmoon729

@tamaramiteva Actually I also had an error with tensorflow/tensorflow:2.0.0-gpu-py3.
It was slightly different from the ones above.
After a little searching I found a solution that adds some paths:

LD_LIBRARY_PATH=:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
LD_INCLUDE_PATH=:/usr/local/cuda/include:/usr/local/cuda/extras/CUPTI/include
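A quick way to sanity-check whether the CUPTI library is actually reachable after adding these paths is to try loading it from Python (just a sketch; the exact library file names are assumptions and vary by CUDA version):

import ctypes

# Try a few common CUPTI soname variants; success means the dynamic loader
# can find the library via LD_LIBRARY_PATH (or the default search paths).
for name in ("libcupti.so", "libcupti.so.10.1", "libcupti.so.10.2"):
    try:
        ctypes.CDLL(name)
        print("loaded", name)
        break
    except OSError as err:
        print("could not load", name, "->", err)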

@Liu-Da commented Feb 10, 2020

Any update about this problem?

@vlasenkoalexey

This is due to an NVIDIA CUPTI library API change: https://developer.nvidia.com/nvidia-development-tools-solutions-err-nvgpuctrperm-cupti
In order for GPU profiling to work you need to run your job with sudo.
On Kubernetes you'll need allowPrivilegeEscalation: true; you can see an example here:
https://github.com/vlasenkoalexey/criteo/blob/05e2aa4c5a15b9e437a364295b7f1e5e2653a22b/scripts/template.yaml.jinja#L136
It is not convenient and won't work for all use cases; I hope there is a better solution.

Also note that tensorboard profiler plugin got broken by Chrome 80 update, see tensorflow/tensorboard#3209

The suggested workaround works: run Chrome with the --enable-blink-features=ShadowDOMV0,CustomElementsV0,HTMLImports flags,

like:
/usr/bin/google-chrome-stable --enable-blink-features=ShadowDOMV0,CustomElementsV0,HTMLImports

@trisolaran (Contributor)

Adding options nvidia "NVreg_RestrictProfilingToAdminUsers=0" to /etc/modprobe.d/nvidia-kernel-common.conf and rebooting should resolve the permission issue.
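After rebooting, you can check whether the setting took effect by reading the NVIDIA kernel module parameters; a rough sketch (the exact parameter name in /proc may differ by driver version, but the restriction flag should read 0):

# Check the NVIDIA kernel module parameters after reboot; the profiling
# restriction flag (often reported as RmProfilingAdminOnly) should be 0.
with open("/proc/driver/nvidia/params") as params:
    for line in params:
        if "Profiling" in line:
            print(line.strip())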

@mikechen66 commented Jul 28, 2020

I have tested the above-mentioned solutions. It seems there is no quick way out of the "CUPTI_ERROR_INSUFFICIENT_PRIVILEGES" error.

1. The ad-hoc solution

Even though NVIDIA offers CAP_SYS_ADMIN as a temporary workaround, it is an ad hoc solution: it sometimes works and sometimes does not.

$ python abc.py --cap-add=CAP_SYS_ADMIN

2. LD_LIBRARY_PATH is not reliable

The following settings sometimes work and sometimes do not.

LD_LIBRARY_PATH=:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
LD_INCLUDE_PATH=:/usr/local/cuda/include:/usr/local/cuda/extras/CUPTI/include

or

LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.1/extras/CUPTI/lib64

3. No /etc/modprobe.d/nvidia-kernel-common.conf file

My modprobe.d directory does not contain nvidia-kernel-common.conf, so I could not add "NVreg_RestrictProfilingToAdminUsers=0" to /etc/modprobe.d/nvidia-kernel-common.conf.

NVIDIA does give an explanation of the error, so it is quite strange that none of these fixes works reliably.

@Liu-Da commented Sep 16, 2020

Adding options nvidia "NVreg_RestrictProfilingToAdminUsers=0" to /etc/modprobe.d/nvidia-kernel-common.conf and rebooting should resolve the permission issue.

Thanks, it worked for me.

@bonryu commented Oct 19, 2020

None of the solutions offered here or anywhere else has worked for me. Perhaps it would work after upgrading from Ubuntu 16.04 to Ubuntu 18.04, but since I'm on a shared server, the upgrade may take some time. I have not tried Docker yet.

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
TensorFlow installed from (source or binary): binary
TensorFlow version (use command below): 2.3.0 (installed with pip install -U tensorflow)
Python version: 3.8.5
CUDA/cuDNN version: 10.1
GPU model and memory: GTX 1080 Ti 11 GB

@SarfarazHabib

I am having the same error in an Anaconda environment, and none of the solutions posted above work for me. Does anyone have any idea what can be done?
Also, what does this error actually mean, if someone is kind enough to explain it to a noob?

@trisolaran (Contributor)

TensorFlow uses the NVIDIA-provided libcupti for GPU tracing support. Since CUDA 10, however, that functionality requires the CAP_SYS_ADMIN privilege, or you have to change /etc/modprobe.d/nvidia-kernel-common.conf (which also requires sudo, but only once).
More information, including how to do this on Windows, can be found at
https://developer.nvidia.com/nvidia-development-tools-solutions-err-nvgpuctrperm-cupti

I believe NVIDIA enforces this restriction because some research papers showed that user secrets can be stolen by probing performance counters.

@SarfarazHabib

Hey @trisolaran, thanks for the brief intro. The thing is, I do not have any such file as /etc/modprobe.d/nvidia-kernel-common.conf. I am using a conda environment.

@kunihik0

@SarfarazHabib Hi, I am using a conda environment too, and I solved this problem by adding
options nvidia "NVreg_RestrictProfilingToAdminUsers=0" to /etc/modprobe.d/nvidia-kernel-common.conf. I did not have the file either, so you should create it.

@SarfarazHabib

@kunihik0 Thanks a lot for the help. The error is now gone, but my training gets stuck after a random number of epochs. I am using TensorFlow 2.3 on Ubuntu 18.04. Can anyone point me in any direction with respect to this new problem?

@solarflarefx

Anyone with the same issue on Windows 10? The two offered solutions only work for Linux.

This solved the issue for me:
Right-click on your desktop for quick access to the NVIDIA Control Panel.
Windows step 1: Open the NVIDIA Control Panel, select 'Desktop', and ensure 'Enable Developer Settings' is checked.
Windows step 2: Under 'Developer' > 'Manage GPU Performance Counters', select 'Allow access to the GPU performance counters to all users' to enable unrestricted profiling.

This solution works great for Windows 10 systems, but what about Windows Server 2019? It seems Microsoft now requires you to get the NVIDIA Control Panel from the Microsoft Store, which is not available on Windows Server 2019. Is there an alternative way to grant these permissions on Windows Server 2019?

@Ruomei commented Nov 29, 2020

Adding options nvidia "NVreg_RestrictProfilingToAdminUsers=0" to /etc/modprobe.d/nvidia-kernel-common.conf and rebooting should resolve the permission issue.

This works for me after switching from conda to virtualenv; I also need to use sudo inside my virtualenv with the Python under /venv/bin/python. Other dependencies are:

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
TensorFlow installed from (source or binary): binary
TensorFlow version (use command below): 2.3.1
Python version: 3.7.6
CUDA/cuDNN version: 11.0
GPU model: GeForce GTX TITAN X

Now I can profile with, for example, --profile_steps=1000,1005 (5 steps), but if I increase it to 10 steps, a non-deterministic segfault appears. Not sure whether this has happened to anyone else?

@d-miketa

Now I can profile with, for example, --profile_steps=1000,1005 (5 steps), but if I increase it to 10 steps, a non-deterministic segfault appears. Not sure whether this has happened to anyone else?

Yes, I get that segfault too; I think the overhead of profiling, on top of the regular GPU computation, causes GPU memory to overflow.
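If memory pressure is indeed the cause, one thing that might help (untested here, just a sketch) is enabling memory growth so TensorFlow does not grab the whole GPU up front, leaving some headroom for the profiler:

import tensorflow as tf

# Allocate GPU memory on demand instead of reserving it all at startup;
# this must run before any GPU op is executed.
for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)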

@dhiren-hamal

To run the Docker container with the necessary privileges:
nvidia-docker run '--privileged=true' -d -it --name retina_net -v /home/readib/Experiments/:/home -p 8000:8888 -v /tmp/.X11-unix/:/tmp/.X11-unix -e DISPLAY=$DISPLAY retina_net:latest /bin/bash

@lyw615 commented May 19, 2021

@vlasenkoalexey Do you mean that the NVIDIA CUPTI library version with the API change causes the error? Will an older version, where the API did not change, run normally?

@vlasenkoalexey

The CUPTI library is part of CUDA; before CUDA 10.x, profiling didn't require admin privileges. See the NVIDIA doc for details: https://developer.nvidia.com/nvidia-development-tools-solutions-err-nvgpuctrperm-cupti

@lyw615 commented May 20, 2021

@vlasenkoalexey But CUDA 10.x also hits the privileges problem on my local machines, as it does for many people on this issue. My local configuration: Ubuntu 18.04, Python 3.7, CUDA 10.1 / CUDA 10.2 (two machines).

@sushreebarsa (Contributor)

I was able to reproduce the issue in TF v2.5; please find the gist here. Thanks!

@castielhzh

I also hit this problem. My OS is CentOS 7; adding the conf file under /etc/modprobe.d/, rebuilding the initial RAM disk with sudo dracut --force, and rebooting solved the problem.

mohantym self-assigned this Sep 13, 2021
@mohantym (Contributor)

Hi @gawain-git-code, could you look at this thread for an answer?

mohantym added the stat:awaiting response label Sep 20, 2021
@google-ml-butler

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

google-ml-butler bot added the stale label Sep 27, 2021
@google-ml-butler

Closing as stale. Please reopen if you'd like to work on this further.


@whybeyoung

How about training on CPU?

@corneliusschroeder

I'm running a TensorFlow application in a Docker container on a Windows machine with WSL2. I get the following errors:

tensorflow/core/profiler/lib/profiler_session.cc:159] Profiler session started.
I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1363] Profiler found 1 GPUs
I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcupti.so.10.1
E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1408] function cupti_interface_->Subscribe( &subscriber_, (CUpti_CallbackFunc)ApiCallback, this)failed with error CUPTI_ERROR_NOT_INITIALIZED
E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1447] function cupti_interface_->ActivityRegisterCallbacks( AllocCuptiActivityBuffer, FreeCuptiActivityBuffer)failed with error CUPTI_ERROR_NOT_INITIALIZED
E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1430] function cupti_interface_->EnableCallback( 0 , subscriber_, CUPTI_CB_DOMAIN_DRIVER_API, cbid)failed with error CUPTI_ERROR_INVALID_PARAMETER

I changed the /etc/modprobe.d/nvidia-kernel-common.conf file as suggested and run the Docker container as the root user.
Does anyone have an idea what to try next?
