Querying Commands

Introduction

To query data after importing, there is the following pattern:

mbpy <MODE> <extractor> <LOADER> [<chain1> <chain2> <chainN> …]

Extractors

The extractor's job is to load the initial set of data. In most cases, the extractor used is from-db which queries the SQLite database that contains imported datasets, however there are many others, as can be seen by running mbpy extract --help:

╭─ Extractors ───────────────────────────────────────────────────────────────────╮
│ from-csv           Define the path to csv file                                 │
│ from-excel         Define the path to csv file                                 │
│ from-excel-dl      Extract dataframes from excel downloads available in        │
│                    ManageBac.  Algorithm auto-detects columns and adds title   │
│                    column                                                      │
│ from-db-connect    use raw sql                                                 │
│ from-db            Select entities (and associated entities), fields           │
│ from-pipeline      Use pipelines as an extraction tool for more sophisticated  │
│                    data management.                                            │
│ from-api           Pass a name like students, classes, academic_info_grades    │
│ from-external-api  Call out to a list endpoint.  Default values work for OA V3 │
│                    students list endpoint. For OA, use --bearer and pass in    │
│                    the bearer via environment variable token after             │
│                    /oauth/token call.  FIXME: Add this here                    │
╰────────────────────────────────────────────────────────────────────────────────╯

Each extractor has different paramters, which can be discovered through --help:

mbpy extract from-db --help

╭─ Additional Options ───────────────────────────────────────────────────────────╮
│ *  --table   -t  (academic_terms|academic_years|classes|class_attendance_byter │
│                  m|class_attendance_bydate|class_timetable|hr_attendance_byter │
│                  m|hr_attendance_bydate|memberships|modified_since_dates|paren │
│                  ts|parentchild_relationships|school|school_subjects|students| │
│                  subject_group|tasks|task_results|teachers|term_grades|year_gr │
│                  oups)                                                         │
│                  [required]                                                    │
│    --fields  -f  Fields of the entity to include                               │
│                  (TEXT)                                                        │
│    --limit   -l  Define small number for faster dev                            │
│                  (INTEGER)                                                     │
│    --help        Show this message and exit.                                   │
╰────────────────────────────────────────────────────────────────────────────────╯

Loaders

The <LOADER> part of the sequence is the exporting functionality. It can output to screen, save a CSV, or even send an email.

mbpy extract - --help

Use the above help command to see a list of targets available.

Chains

The tail of the sequence can be "chained" together. Each of these commands accepts as input the data set from the previous command, performs some manipulation, and then outputs the resulting changed dataset.

Help for all available chain commands can be viewed using the following with the --help option.

mbpy extract - <target> --help
mbpy extract - print --help
mbpy extract - overview --help

In order to learn more about each chain command, you can use, for example with the explode chain command:

mbpy extract - print explode --help

The dash - in the commands above are there as a placeholders that stands in place of a real commands, useful when using the --help feature.

In this way, it is possible to build up a small command-line data flow.

Chain Commands List

Grouping, aggregation, and transformations

╭─ Grouping, aggregation, and transformations ───────────────────────────────────╮
│ cast                         Make the specified column the same values, but as │
│                              strings                                           │
│ collapse-to-json-column      Collapses each row to one column with key/values. │
│ columns                      Specify the order and rename columns in one       │
│                              command                                           │
│ convert-values               Map an entire column values, those not specified  │
│                              will become null.                                 │
│ dottie                       Expand the dataframe through a dot-specified path │
│                              of columns.                                       │
│ explode                      Create additional identical rows for every item   │
│                              indicated in the column provided.                 │
│ explode-all                  For any column that have at least one list item,  │
│                              expand                                            │
│ implode                      Opposite of explode.  Takes the values in column  │
│                              that will roll up to lists.  The other columns by │
│                              default take the first value.                     │
│ keep                         Delineate the columns to keep.  Easy way to       │
│                              remove unnecessary columns.                       │
│ normalize                    Convert a string column using a string mapper,    │
│                              such as str.lower()                               │
│ remove                       Deletes columns.                                  │
│ rename                       Rename columns                                    │
│ rename-prefix                Add a prefix to all of the specified columns      │
│ replace                      Replace all values to a specific value            │
│ replace-nulls-with-booleans  Replace any nulls with a boolean, for example     │
│                              False                                             │
│ replace-with-nulls           Pass in values that should be converted to None.  │
│                              Effects the entire input dataframe.               │
│ spread                       Adds additional columns for every top-level       │
│                              property found as json values.  This works by     │
│                              converting the dataframe to records, then         │
│                              flatdict which concats the properties in          │
│                              dot-seperated naming convention.                  │
│ spread-attendance            Same as spread, but with class attendance         │
│                              responses, re-shapes to avoid sparse dictionaries │
│ to-csv                       Exports the current dataframe to disk.            │
│ to-excel                     Output dataframe as an excel file.  The path has  │
│                              to end with .xlsx (due to implementation).        │
│ transpose                    Transpose the df.                                 │
│ trim-whitespace              Remove whitespace on either left or right side    │
╰────────────────────────────────────────────────────────────────────────────────╯

Merge, join, and concatenation

╭─ Merge, join, and concatenation ───────────────────────────────────────────────╮
│ add-country  Add a new column with the full country name, according to the     │
│              column specified.                                                 │
│ add-row      Manually specify a row to add                                     │
│ api-call     Use dataframe as input to build API calls.                        │
│ api-merge    Merge data from entity in an api call into the dataframe by       │
│              indicating left column(s) and right column(s) as indexes.         │
│ count        Count of how many non-null values in each column                  │
│ csv-merge    Read in a csv from a path, merge it with the dataframe.           │
│ db-merge     Attach data to current dataframe by adding in a table from the    │
│              db.                                                               │
│ split-on     Change the column to a list, splitting on delimiter provided      │
│              (comma by default)                                                │
╰────────────────────────────────────────────────────────────────────────────────╯

Selection and Filtering

╭─ Selection and Filtering ──────────────────────────────────────────────────────╮
│ drop-columns-nulls  Drop all columns that are full of nulls                    │
│ drop-duplicates     Remove duplicates as determined by columns                 │
│ drop-nulls          Filter out any rows from the input dataframe that contain  │
│                     null in the specified column.                              │
│ duplicates          Remove duplicated items from the df, either throughout or  │
│                     only those found in certain columns.                       │
│ extract-duplicates  Remove all rows except duplicated items, either throughout │
│                     or only those found in certain columns.                    │
│ filter              Select the rows which columns have the value indicated,    │
│                     using operation (by default, equals).                      │
│ filters_by          Specify multiple filters.  See query for a more pythonic   │
│                     interface.                                                 │
│ head                Transform to only the first n number of rows               │
│ is-false            Select rows where passed column are False                  │
│ is-null             Select rows where passed columns are None                  │
│ is-true             Select rows where passed columns are True                  │
│ not-null            Select rows where passed columns are not None              │
│ query               Query the df with pythonic syntax.  Column names are       │
│                     variables, use backticks for columns with spaces.  The @   │
│                     symbol has special meaning (evaluate a local variable) so  │
│                     if you are filtering by email address, use FILTER --column │
│                     email --value email@example.com instead.                   │
│ reindex             Change the indexes of the df                               │
│ reset-index         Removes all indexes.                                       │
│ sort                Change the order of the df according to the values in      │
│                     specified columns.                                         │
│ tail                Transform to only the first n number of rows               │
╰────────────────────────────────────────────────────────────────────────────────╯

Reshaping and pivoting

╭─ Reshaping and pivoting ───────────────────────────────────────────────────────╮
│ assign           Make a new column with values built from values from the row  │
│                  itself.  Useful for defining a "full name" column, for        │
│                  example.                                                      │
│ extract-on       Use regular expression named group.                           │
│ frequencies      Count of each values provided by columns.                     │
│ group-by-pivot   Group by rows and pivot on columns with values.               │
│ inc              Create a new column that is calculated from simple math on    │
│                  column of values.                                             │
│ list-to-columns  Given the column whose values are a list, create new columns  │
│                  for each item inside.  Specify the name of these new columns. │
│ pivot            Reshape to a pivot table using aggregator, including summary  │
│                  row                                                           │
│ product          Create a dataframe that includes sequence of dates            │
│ set              Create a new column with values.                              │
│ set-index        Sets the dataframes indexes to those passed                   │
╰────────────────────────────────────────────────────────────────────────────────╯

Output

╭─ Output ───────────────────────────────────────────────────────────────────────╮
│ print       Prints the current dataframe.                                      │
│ send-email  Send email with the dataframe as an attachment.  Connects to smtp  │
│             server, and thus requires credentials.                             │
│ summary     Print the datatable in a small-screen format.  Columns across      │
│             become rows (transpose), and only the first three records are      │
│             shown. Can be used to display steps.  Returns the original df; no  │
│             transform is passed to next processor.                             │
│ to-csv      Exports the current dataframe to disk.                             │
│ to-excel    Output dataframe as an excel file.  The path has to end with .xlsx │
│             (due to implementation).                                           │
╰────────────────────────────────────────────────────────────────────────────────╯

Debugging and Inspection

╭─ Debugging and Inspection ─────────────────────────────────────────────────────╮
│ confirm   Pauses and waits for the user to confim                              │
│ info      Display columns and number of null values, and by default the values │
│           of the first row.  Particularly useful to get sense of the           │
│           information.                                                         │
│ interact  Fire up python and interface with df directly.  Useful for           │
│           troubleshooting.                                                     │
╰────────────────────────────────────────────────────────────────────────────────╯

Miscellaneous

╭─ Miscellaneous ────────────────────────────────────────────────────────────────╮
│ add-rows-empty-slots  Creates a new dataframe that has a has many rows for     │
│                       every value in slots for each value in the column        │
│                       specified by key.  Rows that exist in original are then  │
│                       placed into it.                                          │
│                       New rows will be populated with values whenever there    │
│                       are unique.                                              │
│ pure-python           Define a function in code that performs the desired      │
│                       transformation.  Can only be used in pipeline context.   │
│ subprocess            Using values from the input dataframe, execute a         │
│                       subprocess for each row.                                 │
│ new-df                Create fresh new dataframe.  Use it to build report at   │
│                       end.                                                     │
│ to-dict                                                                        │
│ store                 Save the current dataframe for later.                    │
│ restore               Select the previously stored dataframe as the current    │
│                       one.                                                     │
│ load-csv              Load a csv, dropping the input dataframe                 │
╰────────────────────────────────────────────────────────────────────────────────╯

Last updated