kra.process

kra.process.agg(df: DataFrame, *aggs: IntoExpr | Iterable[IntoExpr], **named_aggs: IntoExpr) → DataFrame

Aggregate the whole DataFrame as a single group. This is a convenient way to apply aggregation expressions to the entire DataFrame without having to specify a group key.

Parameters:
  • df (pl.DataFrame) – The DataFrame to aggregate.

  • *aggs (IntoExpr or Iterable[IntoExpr]) – Positional aggregation expressions to apply to the DataFrame.

  • **named_aggs (IntoExpr) – Named aggregation expressions, where the key is the name of the resulting column and the value is the aggregation expression.

Returns:

Aggregated DataFrame.

Return type:

pl.DataFrame

Examples

>>> import polars as pl
>>> import kra
>>> df = pl.DataFrame({"group": ["A", "A", "B"], "value": [1, 2, 3]})
>>> kra.agg(df, pl.col("value").sum().alias("total_value"))
shape: (1, 1)
┌─────────────┐
│ total_value │
├─────────────┤
│ 6           │
└─────────────┘
>>> kra.agg(df, total_value=pl.col("value").sum())
shape: (1, 1)
┌─────────────┐
│ total_value │
├─────────────┤
│ 6           │
└─────────────┘
kra.process.drop_null_cols(df: DataFrame) → DataFrame

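Exclude columns of dtype Null from the DataFrame. The check is on the column dtype: a column of another dtype (for example Int64) that happens to contain only null values is kept.

Parameters:

df (pl.DataFrame) – The DataFrame to filter.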

Returns:

DataFrame with all columns of type Null removed.

Return type:

pl.DataFrame

Examples

>>> import polars as pl
>>> import kra
>>> df = pl.DataFrame({"a": [1, 2], "b": [None, None]})
>>> df.drop_null_cols()
shape: (2, 1)
┌─────┐
│ a   │
├─────┤
│ 1   │
│ 2   │
└─────┘
kra.process.fork(df: DataFrame, new_dfs: list) → list[DataFrame]

Fork a DataFrame into multiple new DataFrames, one per entry in new_dfs. Each fork contains all columns of df plus the columns specified by the corresponding dict.

Parameters:
  • df (pl.DataFrame) – The DataFrame to fork.

  • new_dfs (list of dict) – One dict per forked DataFrame, mapping each new column name to its values.

Returns:

List of new DataFrames, each with the specified additional columns.

Return type:

list of pl.DataFrame

Examples

>>> import polars as pl
>>> import kra
>>> df = pl.DataFrame({"a": [1, 2]})
>>> forks = df.fork([{"b": [10, 20]}, {"c": [100, 200]}])
>>> for f in forks:
...     print(f)
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
├─────┼─────┤
│ 1   ┆ 10  │
│ 2   ┆ 20  │
└─────┴─────┘
shape: (2, 2)
┌─────┬───────┐
│ a   ┆ c     │
├─────┼───────┤
│ 1   ┆ 100   │
│ 2   ┆ 200   │
└─────┴───────┘
kra.process.round(df: DataFrame, decimals: int = 2) → DataFrame

Round all numeric columns in the DataFrame to a specified number of decimal places.

Parameters:
  • df (pl.DataFrame) – The DataFrame to round.

  • decimals (int) – The number of decimal places to round to (default is 2).

Returns:

DataFrame with all numeric columns rounded to the specified number of decimal places.

Return type:

pl.DataFrame

Examples

>>> import polars as pl
>>> import kra
>>> df = pl.DataFrame({"a": [1.234, 2.345], "b": [3.456, 4.567], "c": ["x", "y"]})
>>> kra.round(df, decimals=1)
shape: (2, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
├─────┼─────┼─────┤
│ 1.2 ┆ 3.5 ┆ x   │
│ 2.3 ┆ 4.6 ┆ y   │
└─────┴─────┴─────┘