A NumPy-compatible matrix library accelerated by CUDA

CuPy is an open-source matrix library accelerated with NVIDIA CUDA. It also uses CUDA-related libraries, including cuBLAS, cuDNN, cuRAND, cuSOLVER and NCCL, to make full use of the GPU architecture.

CuPy's interface is highly compatible with NumPy; in most cases it can be used as a drop-in replacement.
All you need to do is replace `numpy` with `cupy` in your Python code.
It supports various methods, indexing, data types, broadcasting and more.

```
>>> import cupy as cp
>>> x = cp.arange(6).reshape(2, 3).astype('f')
>>> x
array([[ 0.,  1.,  2.],
       [ 3.,  4.,  5.]], dtype=float32)
>>> x.sum(axis=1)
array([ 3., 12.], dtype=float32)
```
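Because the interface mirrors NumPy's, the same function body can often run on either library. The following is a minimal sketch of that pattern, not an official CuPy recipe: the alias `xp` and the helper `row_softmax` are illustrative names, and the fallback to NumPy when CuPy or a CUDA device is unavailable is an assumption made so the snippet runs anywhere.

```python
import numpy as np

try:
    import cupy as cp
    # Importing can succeed on a machine without a usable GPU, so probe for one.
    cp.cuda.runtime.getDeviceCount()
    xp = cp
except Exception:
    # Assumption for this sketch: fall back to NumPy when CuPy or a GPU is missing.
    xp = np


def row_softmax(x):
    """Numerically stable softmax over axis 1, using whichever module xp refers to."""
    shifted = x - x.max(axis=1, keepdims=True)
    e = xp.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)


x = xp.arange(6).reshape(2, 3).astype('f')
probs = row_softmax(x)

# Copy the result to a host (NumPy) array for printing or NumPy-only libraries.
probs_host = cp.asnumpy(probs) if xp is not np else probs
print(probs_host.sum(axis=1))  # each row sums to approximately 1
```

Only the array creation and the final copy depend on which library is active; `row_softmax` itself is unchanged between CPU and GPU.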

The easiest way to install CuPy is with pip. CuPy supports various versions of CUDA and cuDNN, and the installer automatically detects the versions installed in your environment. See the CuPy installation guide for details.

```
% pip install cupy
```

If you want your code to run faster, you can write a custom CUDA kernel with only a small snippet of C++. CuPy automatically wraps and compiles it into a CUDA binary. Compiled binaries are cached and reused in subsequent runs. See the CuPy documentation on user-defined kernels for details.

```
>>> x = cp.arange(6, dtype='f').reshape(2, 3)
>>> y = cp.arange(3, dtype='f')
>>> kernel = cp.ElementwiseKernel(
...     'float32 x, float32 y', 'float32 z',
...     '''if (x - 2 > y) {
...       z = x * y;
...     } else {
...       z = x + y;
...     }''', 'my_kernel')
>>> kernel(x, y)
array([[ 0.,  2.,  4.],
       [ 0.,  4., 10.]], dtype=float32)
```
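To see what the kernel above computes, the same element-wise logic can be written as a CPU reference with NumPy. This is only a sketch for checking results, not how CuPy implements the kernel: `np.where` stands in for the `if`/`else` in the C++ snippet, and `y` is broadcast across the rows of `x` just as in the kernel call.

```python
import numpy as np

x = np.arange(6, dtype='f').reshape(2, 3)
y = np.arange(3, dtype='f')

# y has shape (3,) and is broadcast across both rows of x.
# Where x - 2 > y the result is x * y, otherwise x + y.
z = np.where(x - 2 > y, x * y, x + y)
print(z)
```

In the first row the condition is false everywhere, so the values are sums (0, 2, 4); in the second row it is true everywhere, so the values are products (0, 4, 10), matching the kernel output shown above.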