"""=============================
Subclassing ndarray in Python
=============================
Introduction
------------
Subclassing ndarray is relatively simple, but it has some complications
compared to other Python objects. On this page we explain the machinery
that allows you to subclass ndarray, and the implications for
implementing a subclass.
ndarrays and object creation
============================
Subclassing ndarray is complicated by the fact that new instances of
ndarray classes can come about in three different ways. These are:
#. Explicit constructor call - as in ``MySubClass(params)``. This is
   the usual route to Python instance creation.
#. View casting - casting an existing ndarray as a given subclass.
#. New from template - creating a new instance from a template
   instance. Examples include returning slices from a subclassed array,
   creating return types from ufuncs, and copying arrays. See
   :ref:`new-from-template` for more details.
The last two are characteristics of ndarrays - in order to support
things like array slicing. The complications of subclassing ndarray are
due to the mechanisms numpy has to support these latter two routes of
instance creation.
.. _view-casting:
View casting
------------
*View casting* is the standard ndarray mechanism by which you take an
ndarray of any subclass, and return a view of the array as another
(specified) subclass:
>>> import numpy as np
>>> # create a completely useless ndarray subclass
>>> class C(np.ndarray): pass
>>> # create a standard ndarray
>>> arr = np.zeros((3,))
>>> # take a view of it, as our useless subclass
>>> c_arr = arr.view(C)
>>> type(c_arr)
<class 'C'>
.. _new-from-template:
Creating new from template
--------------------------
New instances of an ndarray subclass can also come about by a very
similar mechanism to :ref:`view-casting`, when numpy finds it needs to
create a new instance from a template instance. The most obvious place
this has to happen is when you are taking slices of subclassed arrays.
For example:
>>> v = c_arr[1:]
>>> type(v) # the view is of type 'C'
<class 'C'>
>>> v is c_arr # but it's a new instance
False
The slice is a *view* onto the original ``c_arr`` data. So, when we
take a view from the ndarray, we return a new ndarray, of the same
class, that points to the data in the original.
There are other points in the use of ndarrays where we need such views,
such as copying arrays (``c_arr.copy()``), creating ufunc output arrays
(see also :ref:`array-wrap`), and reducing methods (like
``c_arr.mean()``).
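As a quick check of two of the cases listed above, the following sketch redefines the do-nothing subclass ``C`` so that it runs on its own, then inspects the type returned by a copy and by a ufunc call. The variable names are illustrative only.

```python
import numpy as np

# Same do-nothing subclass as in the view casting example.
class C(np.ndarray):
    pass

c_arr = np.zeros((3,)).view(C)

# Copying goes through the new-from-template route.
copied = c_arr.copy()
# The ufunc output array is created from the template as well.
shifted = np.add(c_arr, 1)

print(type(copied).__name__, type(shifted).__name__)
```

In both cases the result is a new object of class ``C``, even though ``C`` defines no methods of its own.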
Relationship of view casting and new-from-template
--------------------------------------------------
These paths both use the same machinery. We make the distinction here,
because they result in different input to your methods. Specifically,
:ref:`view-casting` means you have created a new instance of your array
type from any potential subclass of ndarray. :ref:`new-from-template`
means you have created a new instance of your class from a pre-existing
instance, allowing you - for example - to copy across attributes that
are particular to your subclass.
Implications for subclassing
----------------------------
If we subclass ndarray, we need to deal not only with explicit
construction of our array type, but also :ref:`view-casting` or
:ref:`new-from-template`. NumPy has the machinery to do this, and it is
this machinery that makes subclassing slightly non-standard.
There are two aspects to the machinery that ndarray uses to support
views and new-from-template in subclasses.
The first is the use of the ``ndarray.__new__`` method for the main work
of object initialization, rather than the more usual ``__init__``
method. The second is the use of the ``__array_finalize__`` method to
allow subclasses to clean up after the creation of views and new
instances from templates.
A brief Python primer on ``__new__`` and ``__init__``
=====================================================
``__new__`` is a standard Python method, and, if present, is called
before ``__init__`` when we create a class instance. See the `python
__new__ documentation
<https://docs.python.org/reference/datamodel.html#object.__new__>`_ for more detail.
For example, consider the following Python code:
.. testcode::

  class C(object):
      def __new__(cls, *args):
          print('Cls in __new__:', cls)
          print('Args in __new__:', args)
          # The `object` type __new__ method takes a single argument.
          return object.__new__(cls)

      def __init__(self, *args):
          print('type(self) in __init__:', type(self))
          print('Args in __init__:', args)

meaning that we get:
>>> c = C('hello')
Cls in __new__: <class 'C'>
Args in __new__: ('hello',)
type(self) in __init__: <class 'C'>
Args in __init__: ('hello',)
When we call ``C('hello')``, the ``__new__`` method gets its own class
as first argument, and the passed argument, which is the string
``'hello'``. After Python calls ``__new__``, it usually (see below)
calls our ``__init__`` method, with the output of ``__new__`` as the
first argument (now a class instance), and the passed arguments
following.
As you can see, the object can be initialized in the ``__new__``
method or the ``__init__`` method, or both, and in fact ndarray does
not have an ``__init__`` method, because all the initialization is
done in the ``__new__`` method.
Why use ``__new__`` rather than just the usual ``__init__``? Because
in some cases, as for ndarray, we want to be able to return an object
of some other class. Consider the following:
.. testcode::

  class D(C):
      def __new__(cls, *args):
          print('D cls is:', cls)
          print('D args in __new__:', args)
          return C.__new__(C, *args)

      def __init__(self, *args):
          # we never get here
          print('In D __init__')

meaning that:
>>> obj = D('hello')
D cls is: <class 'D'>
D args in __new__: ('hello',)
Cls in __new__: <class 'C'>
Args in __new__: ('hello',)
>>> type(obj)
<class 'C'>
The definition of ``C`` is the same as before, but for ``D``, the
``__new__`` method returns an instance of class ``C`` rather than
``D``. Note that the ``__init__`` method of ``D`` does not get
called. In general, when the ``__new__`` method returns an object of
class other than the class in which it is defined, the ``__init__``
method of that class is not called.
This is how subclasses of the ndarray class are able to return views
that preserve the class type. When taking a view, the standard
ndarray machinery creates the new ndarray object with something
like::

  obj = ndarray.__new__(subtype, shape, ...

where ``subtype`` is the subclass. Thus the returned view is of the
same class as the subclass, rather than being of class ``ndarray``.

That solves the problem of returning views of the same type, but now
we have a new problem. The machinery of ndarray can set the class
this way, in its standard methods for taking views, but the ndarray
``__new__`` method knows nothing of what we have done in our own
``__new__`` method in order to set attributes, and so on. (Aside -
why not call ``obj = subtype.__new__(...`` then? Because we may not
have a ``__new__`` method with the same call signature).
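The point of the aside can be seen in a small sketch. ``Noisy`` is a hypothetical subclass, invented here for illustration, whose constructor takes an extra required argument; the direct call to ``np.ndarray.__new__`` only approximates what the view machinery does at the C level.

```python
import numpy as np

constructor_calls = []

class Noisy(np.ndarray):
    # Hypothetical subclass whose constructor needs an extra argument.
    def __new__(cls, shape, label):
        constructor_calls.append(label)
        return super(Noisy, cls).__new__(cls, shape)

explicit = Noisy((3,), 'made explicitly')

# Roughly what the view machinery does: call ndarray.__new__ with the
# subclass as first argument, bypassing Noisy.__new__ entirely.
direct = np.ndarray.__new__(Noisy, (3,))

# View casting takes the same kind of route, so Noisy.__new__ is not
# called here either, and no ``label`` is needed.
cast = np.zeros(3).view(Noisy)

print(constructor_calls)
print(type(direct).__name__, type(cast).__name__)
```

Only the explicit constructor call is recorded in ``constructor_calls``, yet all three objects are instances of ``Noisy``.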
The role of ``__array_finalize__``
==================================
``__array_finalize__`` is the mechanism that numpy provides to allow
subclasses to handle the various ways that new instances get created.
Remember that subclass instances can come about in these three ways:
#. explicit constructor call (``obj = MySubClass(params)``). This will
   call the usual sequence of ``MySubClass.__new__`` then (if it exists)
   ``MySubClass.__init__``.
#. :ref:`view-casting`
#. :ref:`new-from-template`
Our ``MySubClass.__new__`` method only gets called in the case of the
explicit constructor call, so we can't rely on ``MySubClass.__new__`` or
``MySubClass.__init__`` to deal with the view casting and
new-from-template. It turns out that ``MySubClass.__array_finalize__``
*does* get called for all three methods of object creation, so this is
where our object creation housekeeping usually goes.
* For the explicit constructor call, our subclass will need to create a
  new ndarray instance of its own class. In practice this means that
  we, the authors of the code, will need to make a call to
  ``ndarray.__new__(MySubClass,...)``, a class-hierarchy prepared call to
  ``super(MySubClass, cls).__new__(cls, ...)``, or do view casting of an
  existing array (see below).
* For view casting and new-from-template, the equivalent of
  ``ndarray.__new__(MySubClass,...`` is called, at the C level.
The arguments that ``__array_finalize__`` receives differ for the three
methods of instance creation above.
The following code allows us to look at the call sequences and arguments:
.. testcode::

  import numpy as np

  class C(np.ndarray):
      def __new__(cls, *args, **kwargs):
          print('In __new__ with class %s' % cls)
          return super(C, cls).__new__(cls, *args, **kwargs)

      def __init__(self, *args, **kwargs):
          # in practice you probably will not need or want an __init__
          # method for your subclass
          print('In __init__ with class %s' % self.__class__)

      def __array_finalize__(self, obj):
          print('In array_finalize:')
          print(' self type is %s' % type(self))
          print(' obj type is %s' % type(obj))

Now:
>>> # Explicit constructor
>>> c = C((10,))
In __new__ with class <class 'C'>
In array_finalize:
 self type is <class 'C'>
 obj type is <class 'NoneType'>
In __init__ with class <class 'C'>
>>> # View casting
>>> a = np.arange(10)
>>> cast_a = a.view(C)
In array_finalize:
 self type is <class 'C'>
 obj type is <class 'numpy.ndarray'>
>>> # Slicing (example of new-from-template)
>>> cv = c[:1]
In array_finalize:
 self type is <class 'C'>
 obj type is <class 'C'>
The signature of ``__array_finalize__`` is::

  def __array_finalize__(self, obj):

One sees that the ``super`` call, which goes to
``ndarray.__new__``, passes ``__array_finalize__`` the new object, of our
own class (``self``) as well as the object from which the view has been
taken (``obj``). As you can see from the output above, the ``self`` is
always a newly created instance of our subclass, and the type of ``obj``
differs for the three instance creation methods:
* When called from the explicit constructor, ``obj`` is ``None``
* When called from view casting, ``obj`` can be an instance of any
  subclass of ndarray, including our own.
* When called in new-from-template, ``obj`` is another instance of our
  own subclass, that we might use to update the new ``self`` instance.
Because ``__array_finalize__`` is the only method that always sees new
instances being created, it is the sensible place to fill in instance
defaults for new object attributes, among other tasks.
This may be clearer with an example.
Simple example - adding an extra attribute to ndarray
-----------------------------------------------------
.. testcode::

  import numpy as np

  class InfoArray(np.ndarray):

      def __new__(subtype, shape, dtype=float, buffer=None, offset=0,
                  strides=None, order=None, info=None):
          # Create the ndarray instance of our type, given the usual
          # ndarray input arguments. This will call the standard
          # ndarray constructor, but return an object of our type.
          # It also triggers a call to InfoArray.__array_finalize__
          obj = super(InfoArray, subtype).__new__(subtype, shape, dtype,
                                                  buffer, offset, strides,
                                                  order)
          # set the new 'info' attribute to the value passed
          obj.info = info
          # Finally, we must return the newly created object:
          return obj

      def __array_finalize__(self, obj):
          # ``self`` is a new object resulting from
          # ndarray.__new__(InfoArray, ...), therefore it only has
          # attributes that the ndarray.__new__ constructor gave it -
          # i.e. those of a standard ndarray.
          #
          # We could have got to the ndarray.__new__ call in 3 ways:
          # From an explicit constructor - e.g. InfoArray():
          #    obj is None
          #    (we're in the middle of the InfoArray.__new__
          #    constructor, and self.info will be set when we return to
          #    InfoArray.__new__)
          if obj is None: return
          # From view casting - e.g arr.view(InfoArray):
          #    obj is arr
          #    (type(obj) can be InfoArray)
          # From new-from-template - e.g infoarr[:3]
          #    type(obj) is InfoArray
          #
          # Note that it is here, rather than in the __new__ method,
          # that we set the default value for 'info', because this
          # method sees all creation of default objects - with the
          # InfoArray.__new__ constructor, but also with
          # arr.view(InfoArray).
          self.info = getattr(obj, 'info', None)
          # We do not need to return anything

Using the object looks like this:
>>> obj = InfoArray(shape=(3,)) # explicit constructor
>>> type(obj)
<class 'InfoArray'>
>>> obj.info is None
True
>>> obj = InfoArray(shape=(3,), info='information')
>>> obj.info
'information'
>>> v = obj[1:] # new-from-template - here - slicing
>>> type(v)
<class 'InfoArray'>
>>> v.info
'information'
>>> arr = np.arange(10)
>>> cast_arr = arr.view(InfoArray) # view casting
>>> type(cast_arr)
<class 'InfoArray'>
>>> cast_arr.info is None
True
This class isn't very useful, because it has the same constructor as the
bare ndarray object, including passing in buffers and shapes and so on.
We would probably prefer the constructor to be able to take an already
formed ndarray from the usual numpy calls to ``np.array`` and return an
object.
Slightly more realistic example - attribute added to existing array
-------------------------------------------------------------------
Here is a class that takes a standard ndarray that already exists, casts
as our type, and adds an extra attribute.
.. testcode::

  import numpy as np

  class RealisticInfoArray(np.ndarray):

      def __new__(cls, input_array, info=None):
          # Input array is an already formed ndarray instance
          # We first cast to be our class type
          obj = np.asarray(input_array).view(cls)
          # add the new attribute to the created instance
          obj.info = info
          # Finally, we must return the newly created object:
          return obj

      def __array_finalize__(self, obj):
          # see InfoArray.__array_finalize__ for comments
          if obj is None: return
          self.info = getattr(obj, 'info', None)

So:
>>> arr = np.arange(5)
>>> obj = RealisticInfoArray(arr, info='information')
>>> type(obj)
<class 'RealisticInfoArray'>
>>> obj.info
'information'
>>> v = obj[1:]
>>> type(v)
<class 'RealisticInfoArray'>
>>> v.info
'information'
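One consequence of building the instance with ``np.asarray(input_array).view(cls)`` is worth spelling out: when ``input_array`` is already an ndarray, no data is copied, so the new object shares memory with its input. The sketch below repeats the class definition so that it runs on its own; passing ``arr.copy()`` is just one possible way to keep the two independent, not something the class requires.

```python
import numpy as np

class RealisticInfoArray(np.ndarray):

    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        if obj is None: return
        self.info = getattr(obj, 'info', None)

arr = np.arange(5)
obj = RealisticInfoArray(arr, info='information')

# The instance is a view on ``arr``, so a write through ``obj`` is
# visible in ``arr`` as well.
obj[0] = 99
shares_memory = np.shares_memory(arr, obj)

# Passing a copy keeps the original array independent.
independent = RealisticInfoArray(arr.copy(), info='information')
independent[1] = -1

print(arr[0], arr[1], shares_memory)
```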
.. _array-ufunc:
``__array_ufunc__`` for ufuncs
------------------------------
.. versionadded:: 1.13
A subclass can override what happens when executing numpy ufuncs on it by
overriding the default ``ndarray.__array_ufunc__`` method. This method is
executed *instead* of the ufunc and should return either the result of the
operation, or :obj:`NotImplemented` if the operation requested is not
implemented.
The signature of ``__array_ufunc__`` is::

  def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):

- *ufunc* is the ufunc object that was called.
- *method* is a string indicating how the ufunc was called, either
  ``"__call__"`` to indicate it was called directly, or one of its
  :ref:`methods<ufuncs.methods>`: ``"reduce"``, ``"accumulate"``,
  ``"reduceat"``, ``"outer"``, or ``"at"``.
- *inputs* is a tuple of the input arguments to the ``ufunc``.
- *kwargs* contains any optional or keyword arguments passed to the
  function. This includes any ``out`` arguments, which are always
  contained in a tuple.
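To see the *ufunc* and *method* arguments in practice, here is a minimal sketch. ``MethodLogger`` and ``seen_methods`` are hypothetical names used only for this illustration, and the class deliberately ignores ``out`` arguments, which a real implementation would also need to convert.

```python
import numpy as np

seen_methods = []

class MethodLogger(np.ndarray):
    # Hypothetical subclass that only records how each ufunc was invoked.
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        seen_methods.append((ufunc.__name__, method))
        # Strip our own type from the inputs before delegating, so the
        # call below does not re-enter this method.
        args = [x.view(np.ndarray) if isinstance(x, MethodLogger) else x
                for x in inputs]
        # Simplification: any ``out`` entries in kwargs are passed on
        # unconverted, so this sketch should not be used with ``out``.
        return getattr(ufunc, method)(*args, **kwargs)

a = np.arange(4.).view(MethodLogger)
np.add(a, 1)             # direct call
np.add.reduce(a)         # ufunc method
np.multiply.outer(a, a)  # another ufunc method

print(seen_methods)
```

Each call reaches ``__array_ufunc__`` with the ufunc object itself and a string naming the way it was invoked.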
A typical implementation would convert any inputs or outputs that are
instances of one's own class, pass everything on to a superclass using
``super()``, and finally return the results after possible
back-conversion. An example, taken from the test case
``test_ufunc_override_with_super`` in ``core/tests/test_umath.py``, is the
following.
.. testcode::

  import numpy as np

  class A(np.ndarray):
      def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
          args = []
          in_no = []
          for i, input_ in enumerate(inputs):
              if isinstance(input_, A):
                  in_no.append(i)
                  args.append(input_.view(np.ndarray))
              else:
                  args.append(input_)

          outputs = kwargs.pop('out', None)
          out_no = []
          if outputs:
              out_args = []
              for j, output in enumerate(outputs):
                  if isinstance(output, A):
                      out_no.append(j)
                      out_args.append(output.view(np.ndarray))
                  else:
                      out_args.append(output)
              kwargs['out'] = tuple(out_args)
          else:
              outputs = (None,) * ufunc.nout

          info = {}
          if in_no:
              info['inputs'] = in_no
          if out_no:
              info['outputs'] = out_no

          results = super(A, self).__array_ufunc__(ufunc, method,
                                                   *args, **kwargs)
          if results is NotImplemented:
              return NotImplemented

          if method == 'at':
              if isinstance(inputs[0], A):
                  inputs[0].info = info
              return

          if ufunc.nout == 1:
              results = (results,)

          results = tuple((np.asarray(result).view(A)
                           if output is None else output)
                          for result, output in zip(results, outputs))
          if results and isinstance(results[0], A):
              results[0].info = info

          return results[0] if len(results) == 1 else results

So, this class does not actually do anything interesting: it just
converts any instances of its own to regular ndarray (otherwise, we'd
get infinite recursion!), and adds an ``info`` dictionary that tells
which inputs and outputs it converted. Hence, e.g.,
>>> a = np.arange(5.).view(A)
>>> b = np.sin(a)
>>> b.info
{'inputs': [0]}
>>> b = np.sin(np.arange(5.), out=(a,))
>>> b.info
{'outputs': [0]}
>>> a = np.arange(5.).view(A)
>>> b = np.ones(1).view(A)
>>> c = a + b
>>> c.info
{'inputs': [0, 1]}
>>> a += b
>>> a.info
{'inputs': [0, 1], 'outputs': [0]}
Note that another approach would be to use
``getattr(ufunc, method)(*args, **kwargs)`` instead of the ``super`` call.
For this example, the result would be identical, but there is a difference
if another operand also defines ``__array_ufunc__``. E.g., let's assume that
we evaluate ``np.add(a, b)``, where ``b`` is an instance of another class
``B`` that has an override. If you use ``super`` as in the example,
``ndarray.__array_ufunc__`` will notice that ``b`` has an override, which
means it cannot evaluate the result itself. Thus, it will return
`NotImplemented` and so will our class ``A``. Then, control will be passed
over to ``b``, which either knows how to deal with us and produces a result,
or does not and returns `NotImplemented`, raising a ``TypeError``.
If instead, we replace our ``super`` call with ``getattr(ufunc, method)``, we
effectively do ``np.add(a.view(np.ndarray), b)``. Again, ``B.__array_ufunc__``
will be called, but now it sees an ``ndarray`` as the other argument. Likely,
it will know how to handle this, and return a new instance of the ``B`` class
to us. Our example class is not set up to handle this, but it might well be
the best approach if, e.g., one were to re-implement ``MaskedArray`` using
``__array_ufunc__``.
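The difference between the two routes can be made visible with a small sketch. All three classes below are hypothetical and exist only for this illustration: ``B`` is not an ndarray at all and simply reports the types of the operands it was handed, while ``SuperRoute`` and ``GetattrRoute`` differ only in how they delegate.

```python
import numpy as np

class B(object):
    # Hypothetical operand with its own override; it returns the type
    # names of the inputs it was shown instead of computing anything.
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        return [type(x).__name__ for x in inputs]

class SuperRoute(np.ndarray):
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        args = [x.view(np.ndarray) if isinstance(x, SuperRoute) else x
                for x in inputs]
        # ndarray.__array_ufunc__ sees that ``b`` has an override and
        # returns NotImplemented, which we pass straight back.
        return super(SuperRoute, self).__array_ufunc__(ufunc, method,
                                                       *args, **kwargs)

class GetattrRoute(np.ndarray):
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        args = [x.view(np.ndarray) if isinstance(x, GetattrRoute) else x
                for x in inputs]
        # Re-dispatch the ufunc on the converted inputs.
        return getattr(ufunc, method)(*args, **kwargs)

b = B()
seen_super = np.add(np.arange(3.).view(SuperRoute), b)
seen_getattr = np.add(np.arange(3.).view(GetattrRoute), b)

print(seen_super)
print(seen_getattr)
```

With the ``super`` route, ``B`` is handed the original subclass instance; with the ``getattr`` route it only ever sees a plain ``ndarray``.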
As a final note: if the ``super`` route is suited to a given class, an
advantage of using it is that it helps in constructing class hierarchies.
E.g., suppose that our other class ``B`` also used the ``super`` in its
``__array_ufunc__`` implementation, and we created a class ``C`` that depended
on both, i.e., ``class C(A, B)`` (with, for simplicity, not another
``__array_ufunc__`` override). Then any ufunc on an instance of ``C`` would
pass on to ``A.__array_ufunc__``, the ``super`` call in ``A`` would go to
``B.__array_ufunc__``, and the ``super`` call in ``B`` would go to
``ndarray.__array_ufunc__``, thus allowing ``A`` and ``B`` to collaborate.
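A stripped-down sketch of this cooperation follows. The classes reuse the names ``A``, ``B`` and ``C`` from the paragraph above but are simplified stand-ins, not the ``A`` class defined earlier: each one only records that it was visited, converts its own instances to plain ndarrays, and defers to ``super``.

```python
import numpy as np

calls = []

class A(np.ndarray):
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        calls.append('A')
        args = [x.view(np.ndarray) if isinstance(x, A) else x
                for x in inputs]
        return super(A, self).__array_ufunc__(ufunc, method,
                                              *args, **kwargs)

class B(np.ndarray):
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        calls.append('B')
        args = [x.view(np.ndarray) if isinstance(x, B) else x
                for x in inputs]
        return super(B, self).__array_ufunc__(ufunc, method,
                                              *args, **kwargs)

class C(A, B):
    # No __array_ufunc__ of its own; the method resolution order is
    # C, A, B, ndarray.
    pass

c = np.arange(3.).view(C)
result = np.add(c, 1)

print(calls)
```

Because neither stand-in converts the result back, ``result`` here is a plain ``ndarray``; a real implementation would typically re-wrap it.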
.. _array-wrap:
``__array_wrap__`` for ufuncs and other functions
-------------------------------------------------
Prior to numpy 1.13, the behaviour of ufuncs could only be tuned using
``__array_wrap__`` and ``__array_prepare__``. These two allowed one to
change the output type of a ufunc, but, in contrast to
``__array_ufunc__``, did not allow one to make any changes to the inputs.
It is hoped to eventually deprecate these, but ``__array_wrap__`` is also
used by other numpy functions and methods, such as ``squeeze``, so at the
present time is still needed for full functionality.
Conceptually, ``__array_wrap__`` "wraps up the action" in the sense of
allowing a subclass to set the type of the return value and update
attributes and metadata. Let's show how this works with an example. First
we return to the simpler example subclass, but with a different name and
some print statements:
.. testcode::

  import numpy as np

  class MySubClass(np.ndarray):

      def __new__(cls, input_array, info=None):
          obj = np.asarray(input_array).view(cls)
          obj.info = info
          return obj

      def __array_finalize__(self, obj):
          print('In __array_finalize__:')
          print(' self is %s' % repr(self))
          print(' obj is %s' % repr(obj))
          if obj is None: return
          self.info = getattr(obj, 'info', None)

      def __array_wrap__(self, out_arr, context=None):
          print('In __array_wrap__:')
          print(' self is %s' % repr(self))
          print(' arr is %s' % repr(out_arr))
          # then just call the parent; ``self`` is already bound by
          # ``super``, so it must not be passed a second time
          return super(MySubClass, self).__array_wrap__(out_arr, context)

We run a ufunc on an instance of our new array:
>>> obj = MySubClass(np.arange(5), info='spam')
In __array_finalize__:
 self is MySubClass([0, 1, 2, 3, 4])
 obj is array([0, 1, 2, 3, 4])
>>> arr2 = np.arange(5)+1
>>> ret = np.add(arr2, obj)
In __array_wrap__:
 self is MySubClass([0, 1, 2, 3, 4])
 arr is array([1, 3, 5, 7, 9])
In __array_finalize__:
 self is MySubClass([1, 3, 5, 7, 9])
 obj is MySubClass([0, 1, 2, 3, 4])
>>> ret
MySubClass([1, 3, 5, 7, 9])
>>> ret.info
'spam'
Note that the ufunc (``np.add``) has called the ``__array_wrap__`` method
with arguments ``self`` as ``obj``, and ``out_arr`` as the (ndarray) result
of the addition. In turn, the default ``__array_wrap__``
(``ndarray.__array_wrap__``) has cast the result to class ``MySubClass``,
and called ``__array_finalize__`` - hence the copying of the ``info``
attribute. This has all happened at the C level.
But, we could do anything we wanted:
.. testcode::

  class SillySubClass(np.ndarray):

      def __array_wrap__(self, arr, context=None):
          return 'I lost your data'

>>> arr1 = np.arange(5)
>>> obj = arr1.view(SillySubClass)
>>> arr2 = np.arange(5)
>>> ret = np.multiply(obj, arr2)
>>> ret
'I lost your data'
So, by defining a specific ``__array_wrap__`` method for our subclass,
we can tweak the output from ufuncs. The ``__array_wrap__`` method
requires ``self``, then an argument - which is the result of the ufunc -
and an optional parameter *context*. This parameter is passed by
ufuncs as a 3-element tuple: (name of the ufunc, arguments of the ufunc,
domain of the ufunc), but is not set by other numpy functions. Though,
as seen above, it is possible to do otherwise, ``__array_wrap__`` should
return an instance of its containing class. See the masked array
subclass for an implementation.
In addition to ``__array_wrap__``, which is called on the way out of the
ufunc, there is also an ``__array_prepare__`` method which is called on
the way into the ufunc, after the output arrays are created but before any
computation has been performed. The default implementation does nothing
but pass through the array. ``__array_prepare__`` should not attempt to
access the array data or resize the array; it is intended for setting the
output array type, updating attributes and metadata, and performing any
checks based on the input that may be desired before computation begins.
Like ``__array_wrap__``, ``__array_prepare__`` must return an ndarray or
subclass thereof or raise an error.
Extra gotchas - custom ``__del__`` methods and ndarray.base
-----------------------------------------------------------
One of the problems that ndarray solves is keeping track of memory
ownership of ndarrays and their views. Consider the case where we have
created an ndarray, ``arr`` and have taken a slice with ``v = arr[1:]``.
The two objects are looking at the same memory. NumPy keeps track of
where the data came from for a particular array or view, with the
``base`` attribute:
>>> # A normal ndarray, that owns its own data
>>> arr = np.zeros((4,))
>>> # In this case, base is None
>>> arr.base is None
True
>>> # We take a view
>>> v1 = arr[1:]
>>> # base now points to the array that it derived from
>>> v1.base is arr
True
>>> # Take a view of a view
>>> v2 = v1[1:]
>>> # base points to the view it derived from
>>> v2.base is v1
True
In general, if the array owns its own memory, as for ``arr`` in this
case, then ``arr.base`` will be None - there are some exceptions to this
- see the numpy book for more details.
The ``base`` attribute is useful in being able to tell whether we have
a view or the original array. This in turn can be useful if we need
to know whether or not to do some specific cleanup when the subclassed
array is deleted. For example, we may only want to do the cleanup if
the original array is deleted, but not the views. For an example of
how this can work, have a look at the ``memmap`` class in
``numpy.core``.
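As a minimal sketch of the idea, a subclass can consult ``base`` to decide whether it is looking at the original array or at a view. ``Resource`` and ``is_original`` are hypothetical names used only here, and the check is deliberately naive: as noted above, there are exceptions to the rule that an array owning its memory has ``base`` equal to ``None``.

```python
import numpy as np

class Resource(np.ndarray):
    # Hypothetical helper: treat an array with no base as the original.
    def is_original(self):
        return self.base is None

# The explicit constructor allocates fresh memory, so there is no base.
original = Resource((4,))
# A slice is a view; its base is the array it was taken from.
view = original[1:]

print(original.is_original(), view.is_original())
```

A cleanup routine could then act only when ``is_original()`` is true and skip the views. Note that an instance produced by view casting (``arr.view(Resource)``) also has a non-``None`` base, so this simple test would treat it as a view.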
Subclassing and Downstream Compatibility
----------------------------------------
When sub-classing ``ndarray`` or creating duck-types that mimic the ``ndarray``
interface, it is your responsibility to decide how aligned your APIs will be
with those of numpy. For convenience, many numpy functions that have a corresponding
``ndarray`` method (e.g., ``sum``, ``mean``, ``take``, ``reshape``) work by checking
if the first argument to a function has a method of the same name. If it exists, the
method is called instead of coercing the arguments to a numpy array.
For example, if you want your sub-class or duck-type to be compatible with
numpy's ``sum`` function, the method signature for this object's ``sum`` method
should be the following:
.. testcode::

  def sum(self, axis=None, dtype=None, out=None, keepdims=False):
      ...

This is exactly the method signature of ``np.sum``, so if a user calls
``np.sum`` on this object, numpy will call the object's own ``sum`` method and
pass in the arguments enumerated above in the signature, and no errors will
be raised because the signatures are completely compatible with each other.
If, however, you decide to deviate from this signature and do something like this:
.. testcode::

  def sum(self, axis=None, dtype=None):
      ...

This object is no longer compatible with ``np.sum`` because if you call ``np.sum``,
it will pass in unexpected arguments ``out`` and ``keepdims``, causing a TypeError
to be raised.
If you wish to maintain compatibility with numpy and its subsequent versions (which
might add new keyword arguments) but do not want to surface all of numpy's arguments,
your function's signature should accept ``**kwargs``. For example:
.. testcode::

  def sum(self, axis=None, dtype=None, **unused_kwargs):
      ...

This object is now compatible with ``np.sum`` again because any extraneous arguments
(i.e. keywords that are not ``axis`` or ``dtype``) will be hidden away in the
``**unused_kwargs`` parameter.
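The tolerant signature can be tried out with a small duck type. ``Readings`` is a hypothetical class invented for this sketch; it is not an ndarray subclass and stores its data in a plain list.

```python
import numpy as np

class Readings(object):
    # Hypothetical duck type; it only mimics the ``sum`` method.
    def __init__(self, values):
        self.values = list(values)

    def sum(self, axis=None, dtype=None, **unused_kwargs):
        # Extra keywords forwarded by numpy (such as ``out``) land in
        # ``unused_kwargs`` and are ignored here. The ``sum`` called
        # below is the Python builtin, not this method.
        return sum(self.values)

# np.sum finds the ``sum`` method on the object and calls it.
total = np.sum(Readings([1, 2, 3]))
print(total)
```

Dropping ``**unused_kwargs`` from the signature would make the same ``np.sum`` call fail with a ``TypeError``, as described above.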
"""
from __future__ import division, absolute_import, print_function