The three most commonly used basic data structures in pandas:
1. DataFrame:
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html
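A DataFrame is similar to a table (e.g. an Excel sheet): it has row labels (the index) and column labels (the columns).
1.1 Initialization:
import numpy as np
import pandas as pd
a = pd.DataFrame(np.random.rand(4, 5), index=list("ABCD"), columns=list("abcde"))  # 4x5 random values, rows A-D, columns a-e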
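1.2 Adding or modifying columns and rows:
a['f'] = [1, 2, 3, 4]  # add a new column f
a['e'] = 10            # overwrite column e; the scalar is broadcast to every row
print(a)
print("=======================")
a.loc['D'] = 10        # add or modify row D (the older .ix indexer was removed in pandas 1.0)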
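Here is a small deterministic sketch of the column and row assignments above. It assumes pandas 1.x or later, so `.loc` is used because `.ix` no longer exists. `np.arange` stands in for the random data only so that the results are reproducible. The new row label 'E' is illustrative.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random frame in 1.1 (assumption, for reproducibility)
a = pd.DataFrame(np.arange(20).reshape(4, 5), index=list("ABCD"), columns=list("abcde"))

a["f"] = [1, 2, 3, 4]   # new column from a list (its length must equal the number of rows)
a["e"] = 10             # existing column overwritten; the scalar is broadcast to every row
a.loc["D"] = 10         # existing row D overwritten; the scalar is broadcast to every column
a.loc["E"] = 0          # assigning to a label that does not exist yet appends a new row

print(a)
```

Assigning through `.loc` with a missing label enlarges the frame. Assigning with an existing label overwrites that row in place.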
(1)
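a.values       # the data as a NumPy array (newer pandas recommends a.to_numpy())
a.index        # row labels
a.columns      # column labels
a[['b', 'e']]  # select columns b and e (returns a DataFrame)
a['b']         # select column b (returns a Series)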
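The difference between `a['b']` and `a[['b', 'e']]` is easy to miss. A single label returns a Series, while a list of labels returns a DataFrame. The small check below uses deterministic data in place of `np.random.rand`, which is an assumption made only for reproducibility.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(np.arange(20).reshape(4, 5), index=list("ABCD"), columns=list("abcde"))

col = a["b"]          # single label -> Series (one column)
sub = a[["b", "e"]]   # list of labels -> DataFrame (keeps the 2-D shape)
one = a[["b"]]        # a one-element list still returns a DataFrame

print(type(col).__name__, col.shape)   # Series (4,)
print(type(sub).__name__, sub.shape)   # DataFrame (4, 2)
print(type(one).__name__, one.shape)   # DataFrame (4, 1)
print(a.values.shape, list(a.index), list(a.columns))
```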
(2)
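a['a']             # select column a
a.iloc[0]          # row A, selected by integer position
a.loc['A']         # row A, selected by label
a.at['A', 'a']     # the single value where row A and column a intersect
a.loc[['A', 'B']]  # rows A and B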
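The selectors above come in label-based and position-based pairs: `loc`/`iloc` for rows and `at`/`iat` for single values. The sketch below compares them on the same deterministic frame, again assumed only for reproducibility. It also shows one detail that often surprises people: label slices include the end label, while position slices exclude the end position.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(np.arange(20).reshape(4, 5), index=list("ABCD"), columns=list("abcde"))

# Label-based vs position-based access to the same data
row_by_label = a.loc["A"]       # row A by label
row_by_pos = a.iloc[0]          # row A by integer position
assert row_by_label.equals(row_by_pos)

cell_by_label = a.at["A", "a"]  # scalar by (row label, column label)
cell_by_pos = a.iat[0, 0]       # the same scalar by (row position, column position)
print(cell_by_label, cell_by_pos)  # 0 0

rows = a.loc[["A", "B"]]        # rows A and B by label
print(rows.shape)               # (2, 5)

# Label slices include the end label; position slices exclude the end position
print(list(a.loc["A":"C"].index))  # ['A', 'B', 'C']
print(list(a.iloc[0:2].index))     # ['A', 'B']
```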
Attributes
T: Transpose index and columns, e.g. a.T.
at: Access a single value for a row/column label pair, e.g. a.at[index, column]; it can also be assigned to: a.at[index, column] = 10.
axes: Return a list representing the axes of the DataFrame.
blocks: (DEPRECATED) Internal property, property synonym for as_blocks().
columns: The column labels of the DataFrame.
dtypes: Return the dtypes in the DataFrame.
empty: Indicator whether DataFrame is empty.
ftypes: Return the ftypes (indication of sparse/dense and dtype) in DataFrame.
iat: Access a single value for a row/column pair by integer position.
iloc: Purely integer-location based indexing for selection by position.
index: The index (row labels) of the DataFrame.
ix: A primarily label-location based indexer, with integer position fallback (deprecated since 0.20 and removed in pandas 1.0; use loc or iloc instead).
loc: Access a group of rows and columns by label(s) or a boolean array.
ndim: Return an int representing the number of axes / array dimensions.
shape: Return a tuple representing the dimensionality of the DataFrame.
size: Return an int representing the number of elements in this object.
style: Property returning a Styler object containing methods for building a styled HTML representation for the DataFrame.
values: Return a Numpy representation of the DataFrame.
is_copy
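The sketch below checks several of the attributes listed above. It uses a small hypothetical frame (`df`, with made-up columns `x` and `y`).

```python
import pandas as pd

# Hypothetical example frame
df = pd.DataFrame({"x": [1, 2, 3], "y": [0.5, 1.5, 2.5]}, index=["r1", "r2", "r3"])

print(df.shape)    # (3, 2): (number of rows, number of columns)
print(df.size)     # 6: rows * columns
print(df.ndim)     # 2
print(df.empty)    # False
print(df.dtypes)   # one dtype per column
print(df.T.shape)  # (2, 3): transpose swaps index and columns
print(df.axes)     # [row index, column index]
```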
Methods
abs(): Return a Series/DataFrame with absolute numeric value of each element.
add(other[, axis, level, fill_value]): Addition of dataframe and other, element-wise (binary operator add).
add_prefix(prefix): Prefix labels with string prefix.
add_suffix(suffix): Suffix labels with string suffix.
agg(func[, axis]): Aggregate using one or more operations over the specified axis.
aggregate(func[, axis]): Aggregate using one or more operations over the specified axis.
align(other[, join, axis, level, copy, …]): Align two objects on their axes with the specified join method for each axis Index.
all([axis, bool_only, skipna, level]): Return whether all elements are True, potentially over an axis.
any([axis, bool_only, skipna, level]): Return whether any element is True over requested axis.
append(other[, ignore_index, …]): Append rows of other to the end of this frame, returning a new object.
apply(func[, axis, broadcast, raw, reduce, …]): Apply a function along an axis of the DataFrame.
applymap(func): Apply a function to a DataFrame elementwise.
as_blocks([copy]): (DEPRECATED) Convert the frame to a dict of dtype -> Constructor Types that each has a homogeneous dtype.
as_matrix([columns]): (DEPRECATED) Convert the frame to its Numpy-array representation.
asfreq(freq[, method, how, normalize, …]): Convert TimeSeries to specified frequency.
asof(where[, subset]): The last row without any NaN is taken (or the last row without NaN considering only the subset of columns in the case of a DataFrame).
assign(**kwargs): Assign new columns to a DataFrame, returning a new object (a copy) with the new columns added to the original ones.
astype(dtype[, copy, errors]): Cast a pandas object to a specified dtype dtype.
at_time(time[, asof]): Select values at a particular time of day (e.g. 9:30 AM).
between_time(start_time, end_time[, …]): Select values between particular times of the day (e.g., 9:00-9:30 AM).
bfill([axis, inplace, limit, downcast]): Synonym for DataFrame.fillna(method='bfill').
bool(): Return the bool of a single element PandasObject.
boxplot([column, by, ax, fontsize, rot, …]): Make a box plot from DataFrame columns.
clip([lower, upper, axis, inplace]): Trim values at input threshold(s).
clip_lower(threshold[, axis, inplace]): Return copy of the input with values below a threshold truncated.
clip_upper(threshold[, axis, inplace]): Return copy of input with values above given value(s) truncated.
combine(other, func[, fill_value, overwrite]): Perform column-wise combine with another DataFrame, using func to combine the columns element-wise.
combine_first(other): Combine two DataFrame objects and default to non-null values in frame calling the method.
compound([axis, skipna, level]): Return the compound percentage of the values for the requested axis.
consolidate([inplace]): (DEPRECATED) Compute NDFrame with “consolidated” internals (data of each dtype grouped together in a single ndarray).
convert_objects([convert_dates, …]): (DEPRECATED) Attempt to infer better dtype for object columns.
copy([deep]): Make a copy of this object’s indices and data.
corr([method, min_periods]): Compute pairwise correlation of columns, excluding NA/null values.
corrwith(other[, axis, drop]): Compute pairwise correlation between rows or columns of two DataFrame objects.
count([axis, level, numeric_only]): Count non-NA cells for each column or row.
cov([min_periods]): Compute pairwise covariance of columns, excluding NA/null values.
cummax([axis, skipna]): Return cumulative maximum over a DataFrame or Series axis.
cummin([axis, skipna]): Return cumulative minimum over a DataFrame or Series axis.
cumprod([axis, skipna]): Return cumulative product over a DataFrame or Series axis.
cumsum([axis, skipna]): Return cumulative sum over a DataFrame or Series axis.
describe([percentiles, include, exclude]): Generate descriptive statistics that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values.
diff([periods, axis]): First discrete difference of element.
div(other[, axis, level, fill_value]): Floating division of dataframe and other, element-wise (binary operator truediv).
divide(other[, axis, level, fill_value]): Floating division of dataframe and other, element-wise (binary operator truediv).
dot(other): Matrix multiplication with DataFrame or Series objects.
drop([labels, axis, index, columns, level, …]): Drop specified labels from rows or columns.
drop_duplicates([subset, keep, inplace]): Return DataFrame with duplicate rows removed, optionally only considering certain columns.
dropna([axis, how, thresh, subset, inplace]): Remove missing values.
duplicated([subset, keep]): Return boolean Series denoting duplicate rows, optionally only considering certain columns.
eq(other[, axis, level]): Wrapper for flexible comparison methods eq.
equals(other): Determine if two NDFrame objects contain the same elements.
eval(expr[, inplace]): Evaluate a string describing operations on DataFrame columns.
ewm([com, span, halflife, alpha, …]): Provide exponential weighted functions.
expanding([min_periods, center, axis]): Provide expanding transformations.
ffill([axis, inplace, limit, downcast]): Synonym for DataFrame.fillna(method='ffill').
fillna([value, method, axis, inplace, …]): Fill NA/NaN values using the specified method.
filter([items, like, regex, axis]): Subset rows or columns of dataframe according to labels in the specified index.
first(offset): Convenience method for subsetting initial periods of time series data based on a date offset.
first_valid_index(): Return index for first non-NA/null value.
floordiv(other[, axis, level, fill_value]): Integer division of dataframe and other, element-wise (binary operator floordiv).
from_csv(path[, header, sep, index_col, …]): (DEPRECATED) Read CSV file.
from_dict(data[, orient, dtype, columns]): Construct DataFrame from dict of array-like or dicts.
from_items(items[, columns, orient]): (DEPRECATED) Construct a dataframe from a list of tuples.
from_records(data[, index, exclude, …]): Convert structured or record ndarray to DataFrame.
ge(other[, axis, level]): Wrapper for flexible comparison methods ge.
get(key[, default]): Get item from object for given key (DataFrame column, Panel slice, etc.).
get_dtype_counts(): Return counts of unique dtypes in this object.
get_ftype_counts(): (DEPRECATED) Return counts of unique ftypes in this object.
get_value(index, col[, takeable]): (DEPRECATED) Quickly retrieve single value at passed column and index.
get_values(): Return an ndarray after converting sparse values to dense.
groupby([by, axis, level, as_index, sort, …]): Group series using mapper (dict or key function, apply given function to group, return result as series) or by a series of columns.
gt(other[, axis, level]): Wrapper for flexible comparison methods gt.
head([n]): Return the first n rows.
hist([column, by, grid, xlabelsize, xrot, …]): Make a histogram of the DataFrame’s columns (one histogram per column).
idxmax([axis, skipna]): Return index of first occurrence of maximum over requested axis.
idxmin([axis, skipna]): Return index of first occurrence of minimum over requested axis.
infer_objects(): Attempt to infer better dtypes for object columns.
info([verbose, buf, max_cols, memory_usage, …]): Print a concise summary of a DataFrame.
insert(loc, column, value[, allow_duplicates]): Insert column into DataFrame at specified location.
interpolate([method, axis, limit, inplace, …]): Interpolate values according to different methods.
isin(values): Return boolean DataFrame showing whether each element in the DataFrame is contained in values.
isna(): Detect missing values.
isnull(): Detect missing values.
items(): Iterator over (column name, Series) pairs.
iteritems(): Iterator over (column name, Series) pairs.
iterrows(): Iterate over DataFrame rows as (index, Series) pairs.
itertuples([index, name]): Iterate over DataFrame rows as namedtuples, with index value as first element of the tuple.
join(other[, on, how, lsuffix, rsuffix, sort]): Join columns with other DataFrame either on index or on a key column.
keys(): Get the ‘info axis’ (see Indexing for more).
kurt([axis, skipna, level, numeric_only]): Return unbiased kurtosis over requested axis using Fisher’s definition of kurtosis (kurtosis of normal == 0.0).
kurtosis([axis, skipna, level, numeric_only]): Return unbiased kurtosis over requested axis using Fisher’s definition of kurtosis (kurtosis of normal == 0.0).
last(offset): Convenience method for subsetting final periods of time series data based on a date offset.
last_valid_index(): Return index for last non-NA/null value.
le(other[, axis, level]): Wrapper for flexible comparison methods le.
lookup(row_labels, col_labels): Label-based “fancy indexing” function for DataFrame.
lt(other[, axis, level]): Wrapper for flexible comparison methods lt.
mad([axis, skipna, level]): Return the mean absolute deviation of the values for the requested axis.
mask(cond[, other, inplace, axis, level, …]): Return an object of same shape as self and whose corresponding entries are from self where cond is False and otherwise are from other.
max([axis, skipna, level, numeric_only]): Return the maximum of the values in the object.
mean([axis, skipna, level, numeric_only]): Return the mean of the values for the requested axis.
median([axis, skipna, level, numeric_only]): Return the median of the values for the requested axis.
melt([id_vars, value_vars, var_name, …]): “Unpivot” a DataFrame from wide format to long format, optionally leaving identifier variables set.
memory_usage([index, deep]): Return the memory usage of each column in bytes.
merge(right[, how, on, left_on, right_on, …]): Merge DataFrame objects by performing a database-style join operation by columns or indexes.
min([axis, skipna, level, numeric_only]): Return the minimum of the values in the object.
mod(other[, axis, level, fill_value]): Modulo of dataframe and other, element-wise (binary operator mod).
mode([axis, numeric_only]): Get the mode(s) of each element along the axis selected.
mul(other[, axis, level, fill_value]): Multiplication of dataframe and other, element-wise (binary operator mul).
multiply(other[, axis, level, fill_value]): Multiplication of dataframe and other, element-wise (binary operator mul).
ne(other[, axis, level]): Wrapper for flexible comparison methods ne.
nlargest(n, columns[, keep]): Return the first n rows ordered by columns in descending order.
notna(): Detect existing (non-missing) values.
notnull(): Detect existing (non-missing) values.
nsmallest(n, columns[, keep]): Get the rows of a DataFrame sorted by the n smallest values of columns.
nunique([axis, dropna]): Return Series with number of distinct observations over requested axis.
pct_change([periods, fill_method, limit, freq]): Percentage change between the current and a prior element.
pipe(func, *args, **kwargs): Apply func(self, *args, **kwargs).
pivot([index, columns, values]): Return reshaped DataFrame organized by given index / column values.
pivot_table([values, index, columns, …]): Create a spreadsheet-style pivot table as a DataFrame.
plot: Alias of pandas.plotting._core.FramePlotMethods.
pop(item): Return item and drop from frame.
pow(other[, axis, level, fill_value]): Exponential power of dataframe and other, element-wise (binary operator pow).
prod([axis, skipna, level, numeric_only, …]): Return the product of the values for the requested axis.
product([axis, skipna, level, numeric_only, …]): Return the product of the values for the requested axis.
quantile([q, axis, numeric_only, interpolation]): Return values at the given quantile over requested axis, a la numpy.percentile.
query(expr[, inplace]): Query the columns of a frame with a boolean expression.
radd(other[, axis, level, fill_value]): Addition of dataframe and other, element-wise (binary operator radd).
rank([axis, method, numeric_only, …]): Compute numerical data ranks (1 through n) along axis.
rdiv(other[, axis, level, fill_value]): Floating division of dataframe and other, element-wise (binary operator rtruediv).
reindex([labels, index, columns, axis, …]): Conform DataFrame to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index.
reindex_axis(labels[, axis, method, level, …]): Conform input object to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index.
reindex_like(other[, method, copy, limit, …]): Return an object with matching indices as the other object.
rename([mapper, index, columns, axis, copy, …]): Alter axes labels.
rename_axis(mapper[, axis, copy, inplace]): Alter the name of the index or columns.
reorder_levels(order[, axis]): Rearrange index levels using input order.
replace([to_replace, value, inplace, limit, …]): Replace values given in to_replace with value.
resample(rule[, how, axis, fill_method, …]): Convenience method for frequency conversion and resampling of time series.
reset_index([level, drop, inplace, …]): For DataFrame with multi-level index, return new DataFrame with labeling information in the columns under the index names, defaulting to ‘level_0’, ‘level_1’, etc.
rfloordiv(other[, axis, level, fill_value]): Integer division of dataframe and other, element-wise (binary operator rfloordiv).
rmod(other[, axis, level, fill_value]): Modulo of dataframe and other, element-wise (binary operator rmod).
rmul(other[, axis, level, fill_value]): Multiplication of dataframe and other, element-wise (binary operator rmul).
rolling(window[, min_periods, center, …]): Provide rolling window calculations.
round([decimals]): Round a DataFrame to a variable number of decimal places.
rpow(other[, axis, level, fill_value]): Exponential power of dataframe and other, element-wise (binary operator rpow).
rsub(other[, axis, level, fill_value]): Subtraction of dataframe and other, element-wise (binary operator rsub).
rtruediv(other[, axis, level, fill_value]): Floating division of dataframe and other, element-wise (binary operator rtruediv).
sample([n, frac, replace, weights, …]): Return a random sample of items from an axis of object.
select(crit[, axis]): (DEPRECATED) Return data corresponding to axis labels matching criteria.
select_dtypes([include, exclude]): Return a subset of the DataFrame’s columns based on the column dtypes.
sem([axis, skipna, level, ddof, numeric_only]): Return unbiased standard error of the mean over requested axis.
set_axis(labels[, axis, inplace]): Assign desired index to given axis.
set_index(keys[, drop, append, inplace, …]): Set the DataFrame index (row labels) using one or more existing columns.
set_value(index, col, value[, takeable]): (DEPRECATED) Put single value at passed column and index.
shift([periods, freq, axis]): Shift index by desired number of periods with an optional time freq.
skew([axis, skipna, level, numeric_only]): Return unbiased skew over requested axis, normalized by N-1.
slice_shift([periods, axis]): Equivalent to shift without copying data.
sort_index([axis, level, ascending, …]): Sort object by labels (along an axis).
sort_values(by[, axis, ascending, inplace, …]): Sort by the values along either axis.
sortlevel([level, axis, ascending, inplace, …]): (DEPRECATED) Sort multilevel index by chosen axis and primary level.
squeeze([axis]): Squeeze length 1 dimensions.
stack([level, dropna]): Stack the prescribed level(s) from columns to index.
std([axis, skipna, level, ddof, numeric_only]): Return sample standard deviation over requested axis.
sub(other[, axis, level, fill_value]): Subtraction of dataframe and other, element-wise (binary operator sub).
subtract(other[, axis, level, fill_value]): Subtraction of dataframe and other, element-wise (binary operator sub).
sum([axis, skipna, level, numeric_only, …]): Return the sum of the values for the requested axis.
swapaxes(axis1, axis2[, copy]): Interchange axes and swap values axes appropriately.
swaplevel([i, j, axis]): Swap levels i and j in a MultiIndex on a particular axis.
tail([n]): Return the last n rows.
take(indices[, axis, convert, is_copy]): Return the elements in the given positional indices along an axis.
to_clipboard([excel, sep]): Copy object to the system clipboard.
to_csv([path_or_buf, sep, na_rep, …]): Write DataFrame to a comma-separated values (csv) file.
to_dense(): Return dense representation of NDFrame (as opposed to sparse).
to_dict([orient, into]): Convert the DataFrame to a dictionary.
to_excel(excel_writer[, sheet_name, na_rep, …]): Write DataFrame to an Excel sheet.
to_feather(fname): Write out the binary feather-format for DataFrames.
to_gbq(destination_table, project_id[, …]): Write a DataFrame to a Google BigQuery table.
to_hdf(path_or_buf, key, **kwargs): Write the contained data to an HDF5 file using HDFStore.
to_html([buf, columns, col_space, header, …]): Render a DataFrame as an HTML table.
to_json([path_or_buf, orient, date_format, …]): Convert the object to a JSON string.
to_latex([buf, columns, col_space, header, …]): Render an object to a LaTeX tabular environment table.
to_msgpack([path_or_buf, encoding]): msgpack (serialize) object to input file path.
to_panel(): (DEPRECATED) Transform long (stacked) format (DataFrame) into wide (3D, Panel) format.
to_parquet(fname[, engine, compression]): Write a DataFrame to the binary parquet format.
to_period([freq, axis, copy]): Convert DataFrame from DatetimeIndex to PeriodIndex with desired frequency (inferred from index if not passed).
to_pickle(path[, compression, protocol]): Pickle (serialize) object to file.
to_records([index, convert_datetime64]): Convert DataFrame to a NumPy record array.
to_sparse([fill_value, kind]): Convert to SparseDataFrame.
to_sql(name, con[, schema, if_exists, …]): Write records stored in a DataFrame to a SQL database.
to_stata(fname[, convert_dates, …]): Export Stata binary dta files.
to_string([buf, columns, col_space, header, …]): Render a DataFrame to a console-friendly tabular output.
to_timestamp([freq, how, axis, copy]): Cast to DatetimeIndex of timestamps, at beginning of period.
to_xarray(): Return an xarray object from the pandas object.
transform(func, *args, **kwargs): Call function producing a like-indexed NDFrame and return a NDFrame with the transformed values.
transpose(*args, **kwargs): Transpose index and columns.
truediv(other[, axis, level, fill_value]): Floating division of dataframe and other, element-wise (binary operator truediv).
truncate([before, after, axis, copy]): Truncate a Series or DataFrame before and after some index value.
tshift([periods, freq, axis]): Shift the time index, using the index’s frequency if available.
tz_convert(tz[, axis, level, copy]): Convert tz-aware axis to target time zone.
tz_localize(tz[, axis, level, copy, ambiguous]): Localize tz-naive TimeSeries to target time zone.
unstack([level, fill_value]): Pivot a level of the (necessarily hierarchical) index labels, returning a DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels.
update(other[, join, overwrite, …]): Modify in place using non-NA values from another DataFrame.
var([axis, skipna, level, ddof, numeric_only]): Return unbiased variance over requested axis.
where(cond[, other, inplace, axis, level, …]): Return an object of same shape as self and whose corresponding entries are from self where cond is True and otherwise are from other.
xs(key[, axis, level, drop_level]): Return a cross-section (row(s) or column(s)) from the Series/DataFrame.
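The sketch below exercises a few of the most frequently used methods from this list: `fillna`, `sort_values`, `groupby` + `agg`, and `merge`. The `sales` and `stores` frames are hypothetical data, made up only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data, used only to exercise a few of the methods listed above
sales = pd.DataFrame({
    "store": ["s1", "s2", "s1", "s2", "s3"],
    "amount": [10.0, np.nan, 30.0, 40.0, 5.0],
})
stores = pd.DataFrame({"store": ["s1", "s2", "s3"], "city": ["X", "Y", "Z"]})

filled = sales.fillna({"amount": 0.0})                           # fillna: replace NaN per column
ordered = filled.sort_values("amount", ascending=False)          # sort_values: order rows by a column
totals = filled.groupby("store")["amount"].agg(["sum", "mean"])  # groupby + agg: per-group statistics
joined = filled.merge(stores, on="store", how="left")            # merge: database-style left join

print(ordered)
print(totals)
print(joined)
```

With `how="left"`, every row of `filled` is kept in its original order, and `city` is looked up from `stores`.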
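2. Series:
A Series can be thought of as a fixed-length, ordered dict, since it maps index values to data values. It can be used in many places where a function would otherwise expect a dict.
e.g.:
1. Checking whether a label is in the index:
obj2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])  # example data
'b' in obj2  # True: `in` checks the index labels, not the values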
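The "ordered dict" analogy above can be checked directly. In the sketch below, `obj2` holds example values chosen for illustration. The sketch shows that `in` looks at the labels, indexing by label behaves like dict lookup, and `get` accepts a default like `dict.get`.

```python
import pandas as pd

# Example Series (values chosen for illustration)
obj2 = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])

print("b" in obj2)        # True  -> `in` checks the index labels
print(7 in obj2)          # False -> it does not check the values
print(7 in obj2.values)   # True  -> check the values explicitly
print(obj2["b"])          # 7, lookup by label like a dict
print(obj2.get("z", 0))   # 0, default for a missing label, like dict.get
```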
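Related functions:
Initialization:
1.
obj = pd.Series([4, 7, -5, 3])
obj_values = obj.values  # the data as a NumPy array
obj_index = obj.index    # the index (here the default RangeIndex 0..3)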
2.
dic = {'a': 'b', 'c': 'd'}
dic_series = pd.Series(dic)  # the dict keys become the index
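The sketch below covers both constructors above, plus one detail worth knowing: index alignment. If an explicit index is passed together with a dict, pandas picks the keys in that order and fills any missing label with NaN. The Series `s` and its labels are illustrative only.

```python
import pandas as pd

obj = pd.Series([4, 7, -5, 3])          # default integer index 0..3
print(list(obj.index))                  # [0, 1, 2, 3]

dic = {"a": "b", "c": "d"}
dic_series = pd.Series(dic)             # dict keys become the index
print(list(dic_series.index))           # ['a', 'c']

# Illustrative: an explicit index selects keys in that order; a missing key gives NaN
s = pd.Series({"a": 1, "c": 3}, index=["c", "a", "x"])
print(s)
```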
3. category (categorical data)
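The note above only names `category`. Below is a minimal sketch of pandas' categorical dtype on hypothetical data. The dtype stores each value as a small integer code that points into a list of categories. The second part shows that ordered categories support comparisons.

```python
import pandas as pd

# Hypothetical data: a column with few distinct string values
colors = pd.Series(["red", "blue", "red", "green", "red"], dtype="category")

print(colors.cat.categories.tolist())  # ['blue', 'green', 'red'] (sorted unique values)
print(colors.cat.codes.tolist())       # [2, 0, 2, 1, 2] (integer code per element)

# Ordered categories allow comparisons such as < and >
sizes = pd.Series(["small", "large", "medium"]).astype(
    pd.CategoricalDtype(["small", "medium", "large"], ordered=True)
)
print((sizes > "small").tolist())      # [False, True, True]
```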
4. Commonly used conversion functions:
pd.to_numeric(): conversion to numeric dtypes
pd.to_datetime(): conversion to datetime objects
pd.to_timedelta(): conversion to timedelta objects
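Here is a short sketch of the three conversion functions above, applied to string input. The sample strings are made up for illustration. `errors="coerce"` makes `pd.to_numeric` return NaN for values it cannot parse, instead of raising an error.

```python
import pandas as pd

raw_num = pd.Series(["1", "2.5", "oops"])
nums = pd.to_numeric(raw_num, errors="coerce")   # unparseable strings become NaN
print(nums.tolist())                              # [1.0, 2.5, nan]

dates = pd.to_datetime(pd.Series(["2018-01-01", "2018-03-15"]))
print(dates.dt.month.tolist())                    # [1, 3]

deltas = pd.to_timedelta(pd.Series(["1 days", "2 hours"]))
print(deltas.dt.total_seconds().tolist())         # [86400.0, 7200.0]
```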
A classic example (sort the rows by how close column c is to 43):
>>> df = pd.DataFrame({
...     'a': [4, 5, 6, 7],
...     'b': [10, 20, 30, 40],
...     'c': [100, 50, -30, -50]
... })
>>> df
   a   b    c
0  4  10  100
1  5  20   50
2  6  30  -30
3  7  40  -50
>>> df.loc[(df.c - 43).abs().argsort()]
   a   b    c
1  5  20   50
0  4  10  100
2  6  30  -30
3  7  40  -50
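In the example above, `argsort()` returns integer *positions*. `df.loc[...]` happens to work only because the default index labels 0..3 coincide with those positions. The sketch below wraps the same idea in a hypothetical helper, `sort_by_closeness` (a name introduced here for illustration, not a pandas function). The helper uses `iloc`, so it also works with a non-integer index.

```python
import pandas as pd

def sort_by_closeness(df: pd.DataFrame, column: str, target: float) -> pd.DataFrame:
    """Hypothetical helper: return df's rows ordered by |df[column] - target|, closest first."""
    # argsort yields integer positions, so iloc (position-based) is the safe indexer
    order = (df[column] - target).abs().argsort()
    return df.iloc[order.to_numpy()]

# Same data as above, but with string labels so that loc and iloc would differ
df = pd.DataFrame({"a": [4, 5, 6, 7], "b": [10, 20, 30, 40], "c": [100, 50, -30, -50]},
                  index=["w", "x", "y", "z"])
result = sort_by_closeness(df, "c", 43)
print(result)
```

In pandas 1.1 and later, `df.sort_values('c', key=lambda s: (s - 43).abs())` expresses the same ordering without an explicit `argsort`.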