In the labels above, [0,0,0,1,1,2,2,3,3] holds the outer level's codes: 0 represents 'a', 1 represents 'b', 2 represents 'c', and 3 represents 'd'.
[0,1,2,0,2,0,1,1,2] holds the inner level's codes: 0 represents 1 in the data's inner index, 1 represents 2, and 2 represents 3.
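The code arrays above can be made concrete by rebuilding the Series; this is a minimal sketch assuming a recent pandas (the `codes=` keyword replaced the older `labels=` in pandas 0.24):

```python
import numpy as np
import pandas as pd

# Rebuild the MultiIndex from the two code arrays described above:
# outer codes [0,0,0,1,1,2,2,3,3] map into the level ['a','b','c','d'],
# inner codes [0,1,2,0,2,0,1,1,2] map into the level [1, 2, 3].
index = pd.MultiIndex(
    levels=[['a', 'b', 'c', 'd'], [1, 2, 3]],
    codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 2, 0, 2, 0, 1, 1, 2]],
)
data = pd.Series(np.random.randn(9), index=index)
# The decoded labels: a,a,a,b,b,c,c,d,d on the outside, 1,2,3,1,3,1,2,2,3 inside.
print(data.index.get_level_values(0).tolist())
print(data.index.get_level_values(1).tolist())
```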
data['b']
1   -0.502245
3    0.640700
dtype: float64
data['b':'c']
b  1   -0.502245
   3    0.640700
c  1    0.063639
   2    1.290096
dtype: float64
data.loc[['b','d']] # data['b','d'] is wrong, because in that case 'b' would be taken as the index and 'd' as the columns.
b  1   -0.502245
   3    0.640700
d  2   -0.003899
   3    0.541342
dtype: float64
data.loc['b','d']
---------------------------------------------------------------------------
IndexingError                             Traceback (most recent call last)
<ipython-input-8-f6a5fae3fedc> in <module>()
----> 1 data.loc['b','d']

D:\Anaconda\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
   1470             except (KeyError, IndexError):
   1471                 pass
-> 1472             return self._getitem_tuple(key)
   1473         else:
   1474             # we by definition only have the 0th axis

D:\Anaconda\lib\site-packages\pandas\core\indexing.py in _getitem_tuple(self, tup)
    873
    874         # no multi-index, so validate all of the indexers
--> 875         self._has_valid_tuple(tup)
    876
    877         # ugly hack for GH #836

D:\Anaconda\lib\site-packages\pandas\core\indexing.py in _has_valid_tuple(self, key)
    218         for i, k in enumerate(key):
    219             if i >= self.obj.ndim:
--> 220                 raise IndexingError('Too many indexers')
    221             try:
    222                 self._validate_key(k, i)

IndexingError: Too many indexers
data.loc[:,2]
a -0.348014
c 1.290096
d -0.003899
dtype: float64
data.unstack()
          1         2         3
a -0.396969 -0.348014 -1.340860
b -0.502245       NaN  0.640700
c  0.063639  1.290096       NaN
d       NaN -0.003899  0.541342
data.unstack().stack()
a  1   -0.396969
   2   -0.348014
   3   -1.340860
b  1   -0.502245
   3    0.640700
c  1    0.063639
   2    1.290096
d  2   -0.003899
   3    0.541342
dtype: float64
With a DataFrame, either axis can have a hierarchical index.
Notice that you chain one [] after another, one level at a time, until you reach the specified column.
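As a sketch, the frame used later in this note can be reconstructed like this (values assumed to be `np.arange(12)`, consistent with the swaplevel output further down); each [] then peels off one column level:

```python
import numpy as np
import pandas as pd

# Reconstruction of the hierarchically indexed frame used below.
frame = pd.DataFrame(
    np.arange(12).reshape((4, 3)),
    index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
    columns=[['Ohio', 'Ohio', 'Colorado'], ['Green', 'Red', 'Green']],
)
frame.index.names = ['Key1', 'Key2']
frame.columns.names = ['state', 'color']
# One [] per level, outer to inner, until the specified column is reached:
print(frame['Ohio']['Green'].tolist())  # [0, 3, 6, 9]
```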
help(frame.loc)
Help on _LocIndexer in module pandas.core.indexing object:

class _LocIndexer(_LocationIndexer)
 |  Access a group of rows and columns by label(s) or a boolean array.
 |
 |  ``.loc[]`` is primarily label based, but may also be used with a
 |  boolean array.
 |
 |  Allowed inputs are:
 |
 |  - A single label, e.g. ``5`` or ``'a'``, (note that ``5`` is
 |    interpreted as a *label* of the index, and **never** as an
 |    integer position along the index).
 |  - A list or array of labels, e.g. ``['a', 'b', 'c']``.
 |  - A slice object with labels, e.g. ``'a':'f'``.
 |
 |    .. warning:: Note that contrary to usual python slices, **both** the
 |        start and the stop are included
 |
 |  - A boolean array of the same length as the axis being sliced,
 |    e.g. ``[True, False, True]``.
 |  - A ``callable`` function with one argument (the calling Series, DataFrame
 |    or Panel) and that returns valid output for indexing (one of the above)
 |
 |  See more at :ref:`Selection by Label <indexing.label>`
 |
 |  See Also
 |  --------
 |  DataFrame.at : Access a single value for a row/column label pair
 |  DataFrame.iloc : Access group of rows and columns by integer position(s)
 |  DataFrame.xs : Returns a cross-section (row(s) or column(s)) from the
 |      Series/DataFrame.
 |  Series.loc : Access group of values using labels
 |
 |  Examples
 |  --------
 |  **Getting values**
 |
 |  >>> df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
 |  ...      index=['cobra', 'viper', 'sidewinder'],
 |  ...      columns=['max_speed', 'shield'])
 |  >>> df
 |              max_speed  shield
 |  cobra               1       2
 |  viper               4       5
 |  sidewinder          7       8
 |
 |  Single label. Note this returns the row as a Series.
 |
 |  >>> df.loc['viper']
 |  max_speed    4
 |  shield       5
 |  Name: viper, dtype: int64
 |
 |  List of labels. Note using ``[[]]`` returns a DataFrame.
 |
 |  >>> df.loc[['viper', 'sidewinder']]
 |              max_speed  shield
 |  viper               4       5
 |  sidewinder          7       8
 |
 |  Single label for row and column
 |
 |  >>> df.loc['cobra', 'shield']
 |  2
 |
 |  Slice with labels for row and single label for column. As mentioned
 |  above, note that both the start and stop of the slice are included.
 |
 |  >>> df.loc['cobra':'viper', 'max_speed']
 |  cobra    1
 |  viper    4
 |  Name: max_speed, dtype: int64
 |
 |  Boolean list with the same length as the row axis
 |
 |  >>> df.loc[[False, False, True]]
 |              max_speed  shield
 |  sidewinder          7       8
 |
 |  Conditional that returns a boolean Series
 |
 |  >>> df.loc[df['shield'] > 6]
 |              max_speed  shield
 |  sidewinder          7       8
 |
 |  Conditional that returns a boolean Series with column labels specified
 |
 |  >>> df.loc[df['shield'] > 6, ['max_speed']]
 |              max_speed
 |  sidewinder          7
 |
 |  Callable that returns a boolean Series
 |
 |  >>> df.loc[lambda df: df['shield'] == 8]
 |              max_speed  shield
 |  sidewinder          7       8
 |
 |  **Setting values**
 |
 |  Set value for all items matching the list of labels
 |
 |  >>> df.loc[['viper', 'sidewinder'], ['shield']] = 50
 |  >>> df
 |              max_speed  shield
 |  cobra               1       2
 |  viper               4      50
 |  sidewinder          7      50
 |
 |  Set value for an entire row
 |
 |  >>> df.loc['cobra'] = 10
 |  >>> df
 |              max_speed  shield
 |  cobra              10      10
 |  viper               4      50
 |  sidewinder          7      50
 |
 |  Set value for an entire column
 |
 |  >>> df.loc[:, 'max_speed'] = 30
 |  >>> df
 |              max_speed  shield
 |  cobra              30      10
 |  viper              30      50
 |  sidewinder         30      50
 |
 |  Set value for rows matching callable condition
 |
 |  >>> df.loc[df['shield'] > 35] = 0
 |  >>> df
 |              max_speed  shield
 |  cobra              30      10
 |  viper               0       0
 |  sidewinder          0       0
 |
 |  **Getting values on a DataFrame with an index that has integer labels**
 |
 |  Another example using integers for the index
 |
 |  >>> df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
 |  ...      index=[7, 8, 9], columns=['max_speed', 'shield'])
 |  >>> df
 |     max_speed  shield
 |  7          1       2
 |  8          4       5
 |  9          7       8
 |
 |  Slice with integer labels for rows. As mentioned above, note that both
 |  the start and stop of the slice are included.
 |
 |  >>> df.loc[7:9]
 |     max_speed  shield
 |  7          1       2
 |  8          4       5
 |  9          7       8
 |
 |  **Getting values with a MultiIndex**
 |
 |  A number of examples using a DataFrame with a MultiIndex
 |
 |  >>> tuples = [
 |  ...    ('cobra', 'mark i'), ('cobra', 'mark ii'),
 |  ...    ('sidewinder', 'mark i'), ('sidewinder', 'mark ii'),
 |  ...    ('viper', 'mark ii'), ('viper', 'mark iii')
 |  ... ]
 |  >>> index = pd.MultiIndex.from_tuples(tuples)
 |  >>> values = [[12, 2], [0, 4], [10, 20],
 |  ...         [1, 4], [7, 1], [16, 36]]
 |  >>> df = pd.DataFrame(values, columns=['max_speed', 'shield'], index=index)
 |  >>> df
 |                       max_speed  shield
 |  cobra      mark i           12       2
 |             mark ii           0       4
 |  sidewinder mark i           10      20
 |             mark ii           1       4
 |  viper      mark ii           7       1
 |             mark iii         16      36
 |
 |  Single label. Note this returns a DataFrame with a single index.
 |
 |  >>> df.loc['cobra']
 |           max_speed  shield
 |  mark i          12       2
 |  mark ii          0       4
 |
 |  Single index tuple. Note this returns a Series.
 |
 |  >>> df.loc[('cobra', 'mark ii')]
 |  max_speed    0
 |  shield       4
 |  Name: (cobra, mark ii), dtype: int64
 |
 |  Single label for row and column. Similar to passing in a tuple, this
 |  returns a Series.
 |
 |  >>> df.loc['cobra', 'mark i']
 |  max_speed    12
 |  shield        2
 |  Name: (cobra, mark i), dtype: int64
 |
 |  Single tuple. Note using ``[[]]`` returns a DataFrame.
 |
 |  >>> df.loc[[('cobra', 'mark ii')]]
 |                 max_speed  shield
 |  cobra mark ii          0       4
 |
 |  Single tuple for the index with a single label for the column
 |
 |  >>> df.loc[('cobra', 'mark i'), 'shield']
 |  2
 |
 |  Slice from index tuple to single label
 |
 |  >>> df.loc[('cobra', 'mark i'):'viper']
 |                       max_speed  shield
 |  cobra      mark i           12       2
 |             mark ii           0       4
 |  sidewinder mark i           10      20
 |             mark ii           1       4
 |  viper      mark ii           7       1
 |             mark iii         16      36
 |
 |  Slice from index tuple to index tuple
 |
 |  >>> df.loc[('cobra', 'mark i'):('viper', 'mark ii')]
 |                      max_speed  shield
 |  cobra      mark i          12       2
 |             mark ii          0       4
 |  sidewinder mark i          10      20
 |             mark ii          1       4
 |  viper      mark ii          7       1
 |
 |  Raises
 |  ------
 |  KeyError:
 |      when any items are not found
 |
 |  Method resolution order:
 |      _LocIndexer
 |      _LocationIndexer
 |      _NDFrameIndexer
 |      pandas._libs.indexing._NDFrameIndexerBase
 |      builtins.object
 |
 |  Methods inherited from _LocationIndexer:
 |
 |  __getitem__(self, key)
 |
 |  ----------------------------------------------------------------------
 |  Methods inherited from _NDFrameIndexer:
 |
 |  __call__(self, axis=None)
 |      Call self as a function.
 |
 |  __iter__(self)
 |
 |  __setitem__(self, key, value)
 |
 |  ----------------------------------------------------------------------
 |  Data descriptors inherited from _NDFrameIndexer:
 |
 |  __dict__
 |      dictionary for instance variables (if defined)
 |
 |  __weakref__
 |      list of weak references to the object (if defined)
 |
 |  ----------------------------------------------------------------------
 |  Data and other attributes inherited from _NDFrameIndexer:
 |
 |  axis = None
 |
 |  ----------------------------------------------------------------------
 |  Methods inherited from pandas._libs.indexing._NDFrameIndexerBase:
 |
 |  __init__(self, /, *args, **kwargs)
 |      Initialize self. See help(type(self)) for accurate signature.
 |
 |  __reduce__ = __reduce_cython__(...)
 |
 |  __setstate__ = __setstate_cython__(...)
 |
 |  ----------------------------------------------------------------------
 |  Static methods inherited from pandas._libs.indexing._NDFrameIndexerBase:
 |
 |  __new__(*args, **kwargs) from builtins.type
 |      Create and return a new object. See help(type) for accurate signature.
 |
 |  ----------------------------------------------------------------------
 |  Data descriptors inherited from pandas._libs.indexing._NDFrameIndexerBase:
 |
 |  name
 |
 |  ndim
 |
 |  obj
Help on method swaplevel in module pandas.core.frame:
swaplevel(i=-2, j=-1, axis=0) method of pandas.core.frame.DataFrame instance
Swap levels i and j in a MultiIndex on a particular axis
Parameters
----------
i, j : int, string (can be mixed)
Level of index to be swapped. Can pass level name as string.
Returns
-------
swapped : type of caller (new object)
.. versionchanged:: 0.18.1
The indexes ``i`` and ``j`` are now optional, and default to
the two innermost levels of the index.
frame.swaplevel('Key1','Key2')
state      Ohio     Colorado
color     Green Red    Green
Key2 Key1
1    a        0   1        2
2    a        3   4        5
1    b        6   7        8
2    b        9  10       11
help(frame.sort_index)
Help on method sort_index in module pandas.core.frame:
sort_index(axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, by=None) method of pandas.core.frame.DataFrame instance
Sort object by labels (along an axis)
Parameters
----------
axis : index, columns to direct sorting
level : int or level name or list of ints or list of level names
    if not None, sort on values in specified index level(s)
ascending : boolean, default True
    Sort ascending vs. descending
inplace : bool, default False
    if True, perform operation in-place
kind : {'quicksort', 'mergesort', 'heapsort'}, default 'quicksort'
    Choice of sorting algorithm. See also ndarray.np.sort for more
    information. `mergesort` is the only stable algorithm. For
    DataFrames, this option is only applied when sorting on a single
    column or label.
na_position : {'first', 'last'}, default 'last'
    `first` puts NaNs at the beginning, `last` puts NaNs at the end.
    Not implemented for MultiIndex.
sort_remaining : bool, default True
    if true and sorting by level and index is multilevel, sort by other
    levels too (in order) after sorting by specified level
Returns
-------
sorted_obj : DataFrame
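Level-based sorting can be sketched on the same kind of frame as above (a reconstruction; the `Key1`/`Key2` names mirror the earlier swaplevel example):

```python
import numpy as np
import pandas as pd

# Reconstruction of a MultiIndexed frame like the one used earlier.
frame = pd.DataFrame(
    np.arange(12).reshape((4, 3)),
    index=pd.MultiIndex.from_arrays(
        [['a', 'a', 'b', 'b'], [1, 2, 1, 2]], names=['Key1', 'Key2']
    ),
)
# level='Key2' sorts on the inner level first; because sort_remaining
# defaults to True, the remaining levels are then sorted as well.
result = frame.sort_index(level='Key2')
print(result.index.tolist())  # [('a', 1), ('b', 1), ('a', 2), ('b', 2)]
```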
Help on function merge in module pandas.core.reshape.merge:
merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)
Merge DataFrame objects by performing a database-style join operation by
columns or indexes.
If joining columns on columns, the DataFrame indexes *will be
ignored*. Otherwise if joining indexes on indexes or indexes on a column or
columns, the index will be passed on.
Parameters
----------
left : DataFrame
right : DataFrame
how : {'left', 'right', 'outer', 'inner'}, default 'inner'
* left: use only keys from left frame, similar to a SQL left outer join;
preserve key order
* right: use only keys from right frame, similar to a SQL right outer join;
preserve key order
* outer: use union of keys from both frames, similar to a SQL full outer
join; sort keys lexicographically
* inner: use intersection of keys from both frames, similar to a SQL inner
join; preserve the order of the left keys
on : label or list
Column or index level names to join on. These must be found in both
DataFrames. If `on` is None and not merging on indexes then this defaults
to the intersection of the columns in both DataFrames.
left_on : label or list, or array-like
Column or index level names to join on in the left DataFrame. Can also
be an array or list of arrays of the length of the left DataFrame.
These arrays are treated as if they are columns.
right_on : label or list, or array-like
Column or index level names to join on in the right DataFrame. Can also
be an array or list of arrays of the length of the right DataFrame.
These arrays are treated as if they are columns.
left_index : boolean, default False
Use the index from the left DataFrame as the join key(s). If it is a
MultiIndex, the number of keys in the other DataFrame (either the index
or a number of columns) must match the number of levels
right_index : boolean, default False
Use the index from the right DataFrame as the join key. Same caveats as
left_index
sort : boolean, default False
Sort the join keys lexicographically in the result DataFrame. If False,
the order of the join keys depends on the join type (how keyword)
suffixes : 2-length sequence (tuple, list, ...)
Suffix to apply to overlapping column names in the left and right
side, respectively
copy : boolean, default True
If False, do not copy data unnecessarily
indicator : boolean or string, default False
If True, adds a column to output DataFrame called "_merge" with
information on the source of each row.
If string, column with information on source of each row will be added to
output DataFrame, and column will be named value of string.
Information column is Categorical-type and takes on a value of "left_only"
for observations whose merge key only appears in 'left' DataFrame,
"right_only" for observations whose merge key only appears in 'right'
DataFrame, and "both" if the observation's merge key is found in both.
validate : string, default None
If specified, checks if merge is of specified type.
* "one_to_one" or "1:1": check if merge keys are unique in both
left and right datasets.
* "one_to_many" or "1:m": check if merge keys are unique in left
dataset.
* "many_to_one" or "m:1": check if merge keys are unique in right
dataset.
* "many_to_many" or "m:m": allowed, but does not result in checks.
.. versionadded:: 0.21.0
Notes
-----
Support for specifying index levels as the `on`, `left_on`, and
`right_on` parameters was added in version 0.23.0
Examples
--------
>>> A >>> B
lkey value rkey value
0 foo 1 0 foo 5
1 bar 2 1 bar 6
2 baz 3 2 qux 7
3 foo 4 3 bar 8
>>> A.merge(B, left_on='lkey', right_on='rkey', how='outer')
lkey value_x rkey value_y
0 foo 1 foo 5
1 foo 4 foo 5
2 bar 2 bar 6
3 bar 2 bar 8
4 baz 3 NaN NaN
5 NaN NaN qux 7
Returns
-------
merged : DataFrame
The output type will be the same as 'left', if it is a subclass
of DataFrame.
See also
--------
merge_ordered
merge_asof
DataFrame.join
Above, the many-to-one case was demonstrated: in pd.merge(df3, df4), the values in column 'key' of df4 are all unique. In the many-to-many case, where the values in column 'key' of df4 are not unique, merge forms the Cartesian product of the matching rows.
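The Cartesian-product behavior can be sketched with two small illustrative frames (not the document's df3/df4): 'b' appears twice on each side, so the result contains 2 × 2 = 4 'b' rows.

```python
import pandas as pd

# Illustrative many-to-many merge: duplicate keys on both sides.
left = pd.DataFrame({'key': ['b', 'b', 'a', 'c'], 'lval': [0, 1, 2, 3]})
right = pd.DataFrame({'key': ['a', 'b', 'b'], 'rval': [4, 5, 6]})
merged = pd.merge(left, right, on='key')  # how='inner' by default
# 'b': 2 left rows x 2 right rows = 4 result rows; 'c' has no match and drops.
print((merged['key'] == 'b').sum())  # 4
```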
A last issue to consider in merge operations is the treatment of overlapping column names; merge has a suffixes option for specifying strings to append to overlapping names in the left and right DataFrame objects.
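A minimal sketch of the suffixes option (frames and values are illustrative): both inputs carry a 'data' column besides the key, and the overlap is disambiguated in the result.

```python
import pandas as pd

# Both frames have a non-key column named 'data'.
left = pd.DataFrame({'key': ['a', 'b'], 'data': [1, 2]})
right = pd.DataFrame({'key': ['a', 'b'], 'data': [3, 4]})
merged = pd.merge(left, right, on='key', suffixes=('_left', '_right'))
# The overlapping name is split into 'data_left' and 'data_right'.
print(sorted(merged.columns.tolist()))
```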
In some cases, the merge key(s) in a DataFrame will be found in its index. In this case, you can pass left_index=True or right_index=True (or both) to indicate that the index should be used as the merge key.
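Merging on an index can be sketched as follows (illustrative frames): the right frame's key lives in its index, so we name the left key with left_on and pass right_index=True.

```python
import pandas as pd

left = pd.DataFrame({'key': ['a', 'b', 'a'], 'value': [0, 1, 2]})
# The merge key of this frame is its index, not a column.
right = pd.DataFrame({'group_val': [3.5, 7.0]}, index=['a', 'b'])
merged = pd.merge(left, right, left_on='key', right_index=True)
# Each left row picks up the group_val for its key.
print(sorted(merged['group_val'].tolist()))  # [3.5, 3.5, 7.0]
```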
DataFrame has a convenient join instance method for merging by index. It can also be used to combine many DataFrame objects that have the same or similar indexes but non-overlapping columns.
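A minimal sketch of join (illustrative frames): it aligns on the index and performs a left join by default, so unmatched rows get NaN.

```python
import pandas as pd

left = pd.DataFrame({'x': [1, 2, 3]}, index=['a', 'b', 'c'])
right = pd.DataFrame({'y': [10, 20]}, index=['b', 'c'])
joined = left.join(right)  # left join on the index by default
# 'a' has no row in `right`, so its 'y' is NaN.
print(joined)
```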
D:\Anaconda\lib\site-packages\pandas\core\frame.py:6369: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.
To accept the future behavior, pass 'sort=False'. To retain the current behavior and silence the warning, pass 'sort=True'.
  verify_integrity=True)
   Ohio  Nevada  Missori  Alabama  New York  Oregon
a   1.0     2.0      NaN      NaN       7.0     8.0
b   3.0     4.0      7.0      8.0       NaN     NaN
c   5.0     6.0      9.0     10.0       9.0    10.0
d   NaN     NaN     11.0     12.0       NaN     NaN
e   NaN     NaN     13.0     14.0      11.0    12.0
f   NaN     NaN      NaN      NaN      16.0    17.0
Concatenating along an axis
Another kind of data combination operation is referred to interchangeably as concatenation, binding, or stacking. NumPy's concatenate function can do this with NumPy arrays.
By default, concat works along axis=0, producing another Series. If you pass axis=1, the result will instead be a DataFrame (axis=1 refers to the columns).
pd.concat([s1,s2,s3],axis=1)
D:\Anaconda\lib\site-packages\ipykernel_launcher.py:1: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.
To accept the future behavior, pass 'sort=False'.
To retain the current behavior and silence the warning, pass 'sort=True'.
"""Entry point for launching an IPython kernel.
     0    1    2
a  0.0  NaN  NaN
b  1.0  NaN  NaN
c  NaN  2.0  NaN
d  NaN  3.0  NaN
e  NaN  4.0  NaN
f  NaN  NaN  5.0
g  NaN  NaN  6.0
s4=pd.concat([s1,s2]);s4
a    0
b    1
c    2
d    3
e    4
dtype: int64
s1
a    0
b    1
dtype: int64
pd.concat([s1,s4])
a    0
b    1
a    0
b    1
c    2
d    3
e    4
dtype: int64
pd.concat([s1,s4],axis=1)
D:\Anaconda\lib\site-packages\ipykernel_launcher.py:1: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.
To accept the future behavior, pass 'sort=False'.
To retain the current behavior and silence the warning, pass 'sort=True'.
"""Entry point for launching an IPython kernel.
     0  1
a  0.0  0
b  1.0  1
c  NaN  2
d  NaN  3
e  NaN  4
join: either 'inner' or 'outer' (default); whether to intersect (inner) or union (outer) the indexes along the other axis.
pd.concat([s1,s4],axis=1,join='inner')
   0  1
a  0  0
b  1  1
join_axes: specific indexes to use for the other n-1 axes instead of performing union/intersection logic
keys: values to associate with the objects being concatenated, forming a hierarchical index along the concatenation axis; can be a list or array of arbitrary values, an array of tuples, or a list of arrays (if multiple-level arrays are passed in levels)
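The keys option can be sketched with the s1/s2 Series used above (reconstructed here so the snippet is self-contained):

```python
import pandas as pd

s1 = pd.Series([0, 1], index=['a', 'b'])
s2 = pd.Series([2, 3, 4], index=['c', 'd', 'e'])
# keys= labels each input, producing a hierarchical index along axis=0.
result = pd.concat([s1, s2], keys=['one', 'two'])
print(result.index.tolist())
```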
D:\Anaconda\lib\site-packages\ipykernel_launcher.py:1: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.
To accept the future behavior, pass 'sort=False'.
To retain the current behavior and silence the warning, pass 'sort=True'.
"""Entry point for launching an IPython kernel.
Help on built-in function where in module numpy.core.multiarray:
where(...)
    where(condition, [x, y])
    Return elements, either from `x` or `y`, depending on `condition`.
    If only `condition` is given, return ``condition.nonzero()``.
    Parameters
    ----------
    condition : array_like, bool
        When True, yield `x`, otherwise yield `y`.
    x, y : array_like, optional
        Values from which to choose. `x`, `y` and `condition` need to be
        broadcastable to some shape.
    Returns
    -------
    out : ndarray or tuple of ndarrays
        If both `x` and `y` are specified, the output array contains
        elements of `x` where `condition` is True, and elements from
        `y` elsewhere.
        If only `condition` is given, return the tuple
        ``condition.nonzero()``, the indices where `condition` is True.
    See Also
    --------
    nonzero, choose
    Notes
    -----
    If `x` and `y` are given and input arrays are 1-D, `where` is
    equivalent to::
        [xv if c else yv for (c,xv,yv) in zip(condition,x,y)]
    Examples
    --------
    >>> np.where([[True, False], [True, True]],
    ...          [[1, 2], [3, 4]],
    ...          [[9, 8], [7, 6]])
    array([[1, 8],
           [3, 4]])
    >>> np.where([[0, 1], [1, 0]])
    (array([0, 1]), array([1, 0]))
    >>> x = np.arange(9.).reshape(3, 3)
    >>> np.where( x > 5 )
    (array([2, 2, 2]), array([0, 1, 2]))
    >>> x[np.where( x > 3.0 )]  # Note: result is 1D.
    array([ 4.,  5.,  6.,  7.,  8.])
    >>> np.where(x < 5, x, -1)  # Note: broadcasting.
    array([[ 0.,  1.,  2.],
           [ 3.,  4., -1.],
           [-1., -1., -1.]])
    Find the indices of elements of `x` that are in `goodvalues`.
    >>> goodvalues = [3, 4, 7]
    >>> ix = np.isin(x, goodvalues)
    >>> ix
    array([[False, False, False],
           [ True,  True, False],
           [False,  True, False]])
    >>> np.where(ix)
    (array([1, 1, 2]), array([0, 1, 1]))
There is another data combination situation that cannot be expressed as either a merge or a concatenation operation. You may have two datasets whose indexes overlap in full or in part. As a motivating example, consider NumPy's where function, which performs the array-oriented equivalent of an if-else expression.
f True
e False
d True
c False
b False
a True
dtype: bool
np.where(pd.isnull(a),b,a)
array([0. , 2.5, 2. , 3.5, 4.5, nan])
Series has a combine_first method, which performs the equivalent of this operation along with pandas's usual data alignment logic:
help(pd.Series.combine_first)
Help on function combine_first in module pandas.core.series:
combine_first(self, other)
Combine Series values, choosing the calling Series's values
first. Result index will be the union of the two indexes
Parameters
----------
other : Series
Returns
-------
combined : Series
Examples
--------
>>> s1 = pd.Series([1, np.nan])
>>> s2 = pd.Series([3, 4])
>>> s1.combine_first(s2)
0 1.0
1 4.0
dtype: float64
See Also
--------
Series.combine : Perform elementwise operation on two Series
using a given function
b
f    0.0
e    1.0
d    2.0
c    3.0
b    4.0
a    NaN
dtype: float64
b[:-2]
f 0.0
e 1.0
d 2.0
c 3.0
dtype: float64
a[2:]
d    NaN
c    3.5
b    4.5
a    NaN
dtype: float64
a
f    NaN
e    2.5
d    NaN
c    3.5
b    4.5
a    NaN
dtype: float64
b[:-2].combine_first(a[2:]) # Combine Series values, choosing the calling Series's values first. Result index will be the union of the two indexes
With DataFrames, combine_first does the same thing column by column, so you can think of it as 'patching' missing data in the calling object with data from the object you pass.
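The column-by-column patching can be sketched with two small illustrative frames: NaN holes in the caller are filled from the argument, and values already present in the caller win.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'a': [1.0, np.nan, 5.0], 'b': [np.nan, 2.0, np.nan]})
df2 = pd.DataFrame({'a': [5.0, 4.0, np.nan], 'b': [np.nan, 3.0, 4.0]})
# Holes in df1 are patched from df2, column by column.
patched = df1.combine_first(df2)
print(patched['a'].tolist())  # [1.0, 4.0, 5.0]
```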
Help on method reindex in module pandas.core.frame:
reindex(labels=None, index=None, columns=None, axis=None, method=None, copy=True, level=None, fill_value=nan, limit=None, tolerance=None) method of pandas.core.frame.DataFrame instance
Conform DataFrame to new index with optional filling logic, placing
NA/NaN in locations having no value in the previous index. A new object
is produced unless the new index is equivalent to the current one and
copy=False
Parameters
----------
labels : array-like, optional
New labels / index to conform the axis specified by 'axis' to.
index, columns : array-like, optional (should be specified using keywords)
New labels / index to conform to. Preferably an Index object to
avoid duplicating data
axis : int or str, optional
Axis to target. Can be either the axis name ('index', 'columns')
or number (0, 1).
method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}, optional
method to use for filling holes in reindexed DataFrame.
Please note: this is only applicable to DataFrames/Series with a
monotonically increasing/decreasing index.
* default: don't fill gaps
* pad / ffill: propagate last valid observation forward to next
valid
* backfill / bfill: use next valid observation to fill gap
* nearest: use nearest valid observations to fill gap
copy : boolean, default True
Return a new object, even if the passed indexes are the same
level : int or name
Broadcast across a level, matching Index values on the
passed MultiIndex level
fill_value : scalar, default np.NaN
Value to use for missing values. Defaults to NaN, but can be any
"compatible" value
limit : int, default None
Maximum number of consecutive elements to forward or backward fill
tolerance : optional
Maximum distance between original and new labels for inexact
matches. The values of the index at the matching locations most
satisfy the equation ``abs(index[indexer] - target) <= tolerance``.
Tolerance may be a scalar value, which applies the same tolerance
to all values, or list-like, which applies variable tolerance per
element. List-like includes list, tuple, array, Series, and must be
the same size as the index and its dtype must exactly match the
index's type.
.. versionadded:: 0.21.0 (list-like tolerance)
Examples
--------
``DataFrame.reindex`` supports two calling conventions
* ``(index=index_labels, columns=column_labels, ...)``
* ``(labels, axis={'index', 'columns'}, ...)``
We *highly* recommend using keyword arguments to clarify your
intent.
Create a dataframe with some fictional data.
>>> index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']
>>> df = pd.DataFrame({
... 'http_status': [200,200,404,404,301],
... 'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
... index=index)
>>> df
           http_status  response_time
Firefox            200           0.04
Chrome             200           0.02
Safari             404           0.07
IE10               404           0.08
Konqueror          301           1.00
Create a new index and reindex the dataframe. By default
values in the new index that do not have corresponding
records in the dataframe are assigned ``NaN``.
>>> new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10',
...              'Chrome']
>>> df.reindex(new_index)
               http_status  response_time
Safari               404.0           0.07
Iceweasel              NaN            NaN
Comodo Dragon          NaN            NaN
IE10                 404.0           0.08
Chrome               200.0           0.02
We can fill in the missing values by passing a value to
the keyword ``fill_value``. Because the index is not monotonically
increasing or decreasing, we cannot use arguments to the keyword
``method`` to fill the ``NaN`` values.
>>> df.reindex(new_index, fill_value=0)
               http_status  response_time
Safari                 404           0.07
Iceweasel                0           0.00
Comodo Dragon            0           0.00
IE10                   404           0.08
Chrome                 200           0.02
>>> df.reindex(new_index, fill_value='missing')
               http_status  response_time
Safari                 404           0.07
Iceweasel          missing        missing
Comodo Dragon      missing        missing
IE10                   404           0.08
Chrome                 200           0.02
We can also reindex the columns.
>>> df.reindex(columns=['http_status', 'user_agent'])
           http_status  user_agent
Firefox            200         NaN
Chrome             200         NaN
Safari             404         NaN
IE10               404         NaN
Konqueror          301         NaN
Or we can use "axis-style" keyword arguments
>>> df.reindex(['http_status', 'user_agent'], axis="columns")
           http_status  user_agent
Firefox            200         NaN
Chrome             200         NaN
Safari             404         NaN
IE10               404         NaN
Konqueror          301         NaN
To further illustrate the filling functionality in
``reindex``, we will create a dataframe with a
monotonically increasing index (for example, a sequence
of dates).
>>> date_index = pd.date_range('1/1/2010', periods=6, freq='D')
>>> df2 = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]},
... index=date_index)
>>> df2
            prices
2010-01-01     100
2010-01-02     101
2010-01-03     NaN
2010-01-04     100
2010-01-05      89
2010-01-06      88
Suppose we decide to expand the dataframe to cover a wider
date range.
>>> date_index2 = pd.date_range('12/29/2009', periods=10, freq='D')
>>> df2.reindex(date_index2)
            prices
2009-12-29     NaN
2009-12-30     NaN
2009-12-31     NaN
2010-01-01     100
2010-01-02     101
2010-01-03     NaN
2010-01-04     100
2010-01-05      89
2010-01-06      88
2010-01-07     NaN
The index entries that did not have a value in the original data frame
(for example, '2009-12-29') are by default filled with ``NaN``.
If desired, we can fill in the missing values using one of several
options.
For example, to backpropagate the last valid value to fill the ``NaN`` values, pass ``bfill`` as an argument to the ``method`` keyword.
>>> df2.reindex(date_index2, method='bfill')
            prices
2009-12-29     100
2009-12-30     100
2009-12-31     100
2010-01-01     100
2010-01-02     101
2010-01-03     NaN
2010-01-04     100
2010-01-05      89
2010-01-06      88
2010-01-07     NaN
Please note that the ``NaN`` value present in the original dataframe
(at index value 2010-01-03) will not be filled by any of the
value propagation schemes. This is because filling while reindexing
does not look at dataframe values, but only compares the original and
desired indexes. If you do want to fill in the ``NaN`` values present
in the original dataframe, use the ``fillna()`` method.
See the :ref:`user guide <basics.reindexing>` for more.
Returns
-------
reindexed : DataFrame
An inverse operation to pivot for DataFrames is pandas.melt. Rather than transforming one column into many in a new DataFrame, it merges multiple columns into one, producing a DataFrame that is longer than the input.
help(pd.DataFrame.melt)
Help on function melt in module pandas.core.frame:
melt(self, id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None)
"Unpivots" a DataFrame from wide format to long format, optionally
leaving identifier variables set.
This function is useful to massage a DataFrame into a format where one
or more columns are identifier variables (`id_vars`), while all other
columns, considered measured variables (`value_vars`), are "unpivoted" to
the row axis, leaving just two non-identifier columns, 'variable' and 'value'.
.. versionadded:: 0.20.0
Parameters
----------
frame : DataFrame
id_vars : tuple, list, or ndarray, optional
Column(s) to use as identifier variables.
value_vars : tuple, list, or ndarray, optional
Column(s) to unpivot. If not specified, uses all columns that
are not set as `id_vars`.
var_name : scalar
Name to use for the 'variable' column. If None it uses
``frame.columns.name`` or 'variable'.
value_name : scalar, default 'value'
Name to use for the 'value' column.
col_level : int or string, optional
If columns are a MultiIndex then use this level to melt.
See also
--------
melt
pivot_table
DataFrame.pivot
Examples
--------
>>> import pandas as pd
>>> df = pd.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'},
... 'B': {0: 1, 1: 3, 2: 5},
... 'C': {0: 2, 1: 4, 2: 6}})
>>> df
   A  B  C
0  a  1  2
1  b  3  4
2  c  5  6
>>> df.melt(id_vars=['A'], value_vars=['B'])
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5
>>> df.melt(id_vars=['A'], value_vars=['B', 'C'])
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5
3  a        C      2
4  b        C      4
5  c        C      6
The names of 'variable' and 'value' columns can be customized:
>>> df.melt(id_vars=['A'], value_vars=['B'],
...         var_name='myVarname', value_name='myValname')
   A myVarname  myValname
0  a         B          1
1  b         B          3
2  c         B          5
If you have multi-index columns:
>>> df.columns = [list('ABC'), list('DEF')]
>>> df
   A  B  C
   D  E  F
0  a  1  2
1  b  3  4
2  c  5  6
>>> df.melt(col_level=0, id_vars=['A'], value_vars=['B'])
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5
>>> df.melt(id_vars=[('A', 'D')], value_vars=[('B', 'E')])
  (A, D) variable_0 variable_1  value
0      a          B          E      1
1      b          B          E      3
2      c          B          E      5
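The "melt is an inverse of pivot" idea can be sketched as a round trip (illustrative column names; keyword arguments are used so the snippet works across pandas versions):

```python
import pandas as pd

df = pd.DataFrame({'key': ['foo', 'bar', 'baz'],
                   'A': [1, 2, 3],
                   'B': [4, 5, 6]})
# Wide -> long: the A and B columns collapse into variable/value pairs.
melted = pd.melt(df, id_vars=['key'])
# Long -> wide again: pivot restores one column per variable.
reshaped = melted.pivot(index='key', columns='variable', values='value')
print(melted.shape, reshaped.shape)
```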