In [161]: a = np.zeros(3, dtype={'names':['A','B','C'], 'formats':['int','int','float']})
...: for i in range(len(a)):
...:     a[i] = i
...:
In [162]: a
Out[162]:
array([(0, 0, 0.), (1, 1, 1.), (2, 2, 2.)],
dtype=[('A', '<i8'), ('B', '<i8'), ('C', '<f8')])
define the new dtype:
In [164]: a.dtype.descr
Out[164]: [('A', '<i8'), ('B', '<i8'), ('C', '<f8')]
In [165]: a.dtype.descr+[('test','O')]
Out[165]: [('A', '<i8'), ('B', '<i8'), ('C', '<f8'), ('test', 'O')]
In [166]: dt= a.dtype.descr+[('test','O')]
new array of right size and dtype:
In [167]: b = np.empty(a.shape, dt)
copy values from a to b by field name:
In [168]: for name in a.dtype.names:
...:     b[name] = a[name]
...:
In [169]: b
Out[169]:
array([(0, 0, 0., None), (1, 1, 1., None), (2, 2, 2., None)],
dtype=[('A', '<i8'), ('B', '<i8'), ('C', '<f8'), ('test', 'O')])
Many of the rf functions do this field by field copy:
rf.recursive_fill_fields(a,b)
rf.append_fields uses this after it initializes its output array.
In earlier versions a multifield index produced a copy, so expressions like b[list(a.dtype.names)] = a would not work.
I don't know if it's worth trying to figure out what rf.append_fields is doing. Those functions are somewhat old and not heavily used (note the special import), so it's entirely likely that they have bugs, or edge cases that don't work. The functions that I've examined work much as I demonstrated: make a new dtype and result array, then copy data by field name.
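As a compact version of the steps above, here is a minimal sketch that lets rf.recursive_fill_fields do the field-by-field copy. The array names mirror the session; it assumes a simple 1-d structured array with no padding, and it leaves the new object field at whatever np.empty gives it.

```python
import numpy as np
from numpy.lib import recfunctions as rf

# same starting array as in the session above
a = np.zeros(3, dtype=[('A', int), ('B', int), ('C', float)])
for name in a.dtype.names:
    a[name] = np.arange(3)

# new dtype = old fields plus one object field
dt = a.dtype.descr + [('test', 'O')]
b = np.empty(a.shape, dtype=dt)

# copies each field of `a` into the field of the same name in `b`;
# 'test' has no counterpart in `a`, so it is not touched
rf.recursive_fill_fields(a, b)
print(b)
```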
In recent releases there have been changes in how multiple fields are accessed. There are some new functions in recfunctions to facilitate working with structured arrays, such as repack_fields.
https://docs.scipy.org/doc/numpy/user/basics.rec.html#accessing-multiple-fields
I don't know if any of that applies to the append_fields problem. I see there's also a section about structured arrays with objects, but I haven't studied that:
https://docs.scipy.org/doc/numpy/user/basics.rec.html#viewing-structured-arrays-containing-objects
In order to prevent clobbering object pointers in fields of numpy.object type, numpy currently does not allow views of structured arrays containing objects.
This line apparently refers to use of the view method. Views created by field indexing, whether by a single name or a multifield list, are not affected.
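A small sketch of that distinction, using a made-up array with one object field. It assumes a recent release in which multifield indexing returns a view; in older versions that index gave a copy.

```python
import numpy as np

b = np.zeros(3, dtype=[('A', int), ('B', int), ('test', object)])

# single-field index: a view, so writes through it show up in b
col = b['A']
col[:] = [10, 20, 30]

# multi-field index: also a view in recent releases (a copy in older ones)
sub = b[['A', 'test']]
print(np.may_share_memory(sub, b))
print(b)
```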
The error in append_fields comes from this operation:
In [183]: data = np.array([None,None,None])
In [184]: data
Out[184]: array([None, None, None], dtype=object)
In [185]: data.view([('test',object)])
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-185-c46c4464b53c> in <module>
----> 1 data.view([('test',object)])
/usr/local/lib/python3.6/dist-packages/numpy/core/_internal.py in _view_is_safe(oldtype, newtype)
492
493 if newtype.hasobject or oldtype.hasobject:
--> 494 raise TypeError("Cannot change data-type for object array.")
495 return
496
TypeError: Cannot change data-type for object array.
There's no problem creating a compound dtype with object dtypes:
In [186]: np.array([None,None,None], dtype=[('test',object)])
Out[186]: array([(None,), (None,), (None,)], dtype=[('test', 'O')])
But I don't see any recfunctions that are capable of joining a and data.
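Doing the join by hand is short enough. The helper below is hypothetical (join_field is my name, not a recfunctions function) and is only a sketch: it assumes arr and data have the same shape, and it ignores titles and alignment.

```python
import numpy as np

def join_field(arr, name, data):
    """Return a new structured array: the fields of `arr` plus `data` as field `name`."""
    data = np.asarray(data)
    # build the dtype from the field names rather than dtype.descr,
    # which can contain padding entries for dtypes with offsets
    dt = [(n, arr.dtype[n]) for n in arr.dtype.names] + [(name, data.dtype)]
    out = np.empty(arr.shape, dtype=dt)
    for n in arr.dtype.names:
        out[n] = arr[n]      # copy existing fields by name
    out[name] = data         # fill the new field
    return out

a = np.zeros(3, dtype=[('A', int), ('B', int), ('C', float)])
data = np.array([None, None, None])
c = join_field(a, 'test', data)
print(c.dtype)
```

No view is involved, so the object dtype restriction never comes into play.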
view can be used to change the field names of a:
In [219]: a.view([('AA',int),('BB',int),('cc',float)])
Out[219]:
array([(0, 0, 0.), (1, 1, 1.), (2, 2, 2.)],
dtype=[('AA', '<i8'), ('BB', '<i8'), ('cc', '<f8')])
but trying to do so for b fails for the same reason:
In [220]: b.view([('AA',int),('BB',int),('cc',float),('d',object)])
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-220-ab0a6e4dd57f> in <module>
----> 1 b.view([('AA',int),('BB',int),('cc',float),('d',object)])
/usr/local/lib/python3.6/dist-packages/numpy/core/_internal.py in _view_is_safe(oldtype, newtype)
492
493 if newtype.hasobject or oldtype.hasobject:
--> 494 raise TypeError("Cannot change data-type for object array.")
495 return
496
TypeError: Cannot change data-type for object array.
If I start with an object dtype array and try to view it as i8 (a dtype of the same size), I get the same error. So the restriction on views of object dtype isn't limited to structured arrays. The need for such a restriction when reinterpreting an object pointer as i8 makes sense. The need for it when embedding the object pointer in a compound dtype might not be so compelling. It might even be overkill, or just a case of playing it safe and simple.
In [267]: x.dtype
Out[267]: dtype('O')
In [268]: x.shape
Out[268]: (3,)
In [269]: x.dtype.itemsize
Out[269]: 8
In [270]: x.view('i8')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-270-30c78b13cd10> in <module>
----> 1 x.view('i8')
/usr/local/lib/python3.6/dist-packages/numpy/core/_internal.py in _view_is_safe(oldtype, newtype)
492
493 if newtype.hasobject or oldtype.hasobject:
--> 494 raise TypeError("Cannot change data-type for object array.")
495 return
496
TypeError: Cannot change data-type for object array.
Note that the test in line 493 checks the hasobject property of both the new and old dtypes. A more nuanced test might check whether both have objects, but I suspect the logic could get quite complex. Sometimes a simple prohibition is safer (and easier) than a complex set of tests.
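To see which of the cases above that test catches, here is a hypothetical pre-check that mirrors only the condition visible in the traceback. view_would_be_rejected is my own name, and the real _view_is_safe may do more than this one line.

```python
import numpy as np

def view_would_be_rejected(old, new):
    """Mirror of the hasobject condition shown at line 493 of the traceback."""
    old, new = np.dtype(old), np.dtype(new)
    return new.hasobject or old.hasobject

plain = [('A', int), ('B', int), ('C', float)]
renamed = [('AA', int), ('BB', int), ('cc', float)]
with_obj = plain + [('test', object)]
renamed_obj = renamed + [('d', object)]

print(view_would_be_rejected(plain, renamed))         # False: renaming fields of a
print(view_would_be_rejected(object, 'i8'))           # True: object array viewed as i8
print(view_would_be_rejected(with_obj, renamed_obj))  # True: renaming fields of b
```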
In further testing, structured_to_unstructured works on a:
In [283]: rf.structured_to_unstructured(a)
Out[283]:
array([[ 3., 3., 0.],
[12., 10., 1.],
[ 2., 2., 2.]])
but trying to do the same on b, or even on a subset of its fields, produces the familiar error:
rf.structured_to_unstructured(b)
rf.structured_to_unstructured(b[['A','B','C']])
I have to first use repack_fields to make an object-less copy:
rf.structured_to_unstructured(rf.repack_fields(b[['A','B','C']]))
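Spelled out as a self-contained sketch, assuming a release recent enough to have repack_fields; the array here is rebuilt with the original 0, 1, 2 values rather than taken from the session:

```python
import numpy as np
from numpy.lib import recfunctions as rf

b = np.empty(3, dtype=[('A', int), ('B', int), ('C', float), ('test', object)])
for name in ('A', 'B', 'C'):
    b[name] = np.arange(3)

# repack_fields copies the selected fields into a packed dtype,
# so the result no longer carries the object field
packed = rf.repack_fields(b[['A', 'B', 'C']])
arr = rf.structured_to_unstructured(packed)
print(packed.dtype)
print(arr)
```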