Querying Commands
Introduction
To query data after importing, there is the following pattern:
mbpy <MODE> <extractor> <LOADER> [<chain1> <chain2> <chainN> …]
Extractors
The extractor's job is to load the initial set of data. In most cases, the extractor used is from-db
which queries the SQLite database that contains imported datasets, however there are many others, as can be seen by running mbpy extract --help
:
â•â”€ Extractors ───────────────────────────────────────────────────────────────────╮
│ from-csv Define the path to csv file │
│ from-excel Define the path to csv file │
│ from-excel-dl Extract dataframes from excel downloads available in │
│ ManageBac. Algorithm auto-detects columns and adds title │
│ column │
│ from-db-connect use raw sql │
│ from-db Select entities (and associated entities), fields │
│ from-pipeline Use pipelines as an extraction tool for more sophisticated │
│ data management. │
│ from-api Pass a name like students, classes, academic_info_grades │
│ from-external-api Call out to a list endpoint. Default values work for OA V3 │
│ students list endpoint. For OA, use --bearer and pass in │
│ the bearer via environment variable token after │
│ /oauth/token call. FIXME: Add this here │
╰────────────────────────────────────────────────────────────────────────────────╯
Each extractor has different paramters, which can be discovered through --help
:
mbpy extract from-db --help
â•â”€ Additional Options ───────────────────────────────────────────────────────────╮
│ * --table -t (academic_terms|academic_years|classes|class_attendance_byter │
│ m|class_attendance_bydate|class_timetable|hr_attendance_byter │
│ m|hr_attendance_bydate|memberships|modified_since_dates|paren │
│ ts|parentchild_relationships|school|school_subjects|students| │
│ subject_group|tasks|task_results|teachers|term_grades|year_gr │
│ oups) │
│ [required] │
│ --fields -f Fields of the entity to include │
│ (TEXT) │
│ --limit -l Define small number for faster dev │
│ (INTEGER) │
│ --help Show this message and exit. │
╰────────────────────────────────────────────────────────────────────────────────╯
Loaders
The <LOADER>
part of the sequence is the exporting functionality. It can output to screen, save a CSV, or even send an email.
mbpy extract - --help
Use the above help command to see a list of targets available.
Chains
The tail of the sequence can be "chained" together. Each of these commands accepts as input the data set from the previous command, performs some manipulation, and then outputs the resulting changed dataset.
Help for all available chain commands can be viewed using the following with the --help
option.
mbpy extract - <target> --help
mbpy extract - print --help
mbpy extract - overview --help
In order to learn more about each chain command, you can use, for example with the explode
chain command:
mbpy extract - print explode --help
The dash -
in the commands above are there as a placeholders that stands in place of a real commands, useful when using the --help
feature.
In this way, it is possible to build up a small command-line data flow.
Chain Commands List
Grouping, aggregation, and transformations
â•â”€ Grouping, aggregation, and transformations ───────────────────────────────────╮
│ cast Make the specified column the same values, but as │
│ strings │
│ collapse-to-json-column Collapses each row to one column with key/values. │
│ columns Specify the order and rename columns in one │
│ command │
│ convert-values Map an entire column values, those not specified │
│ will become null. │
│ dottie Expand the dataframe through a dot-specified path │
│ of columns. │
│ explode Create additional identical rows for every item │
│ indicated in the column provided. │
│ explode-all For any column that have at least one list item, │
│ expand │
│ implode Opposite of explode. Takes the values in column │
│ that will roll up to lists. The other columns by │
│ default take the first value. │
│ keep Delineate the columns to keep. Easy way to │
│ remove unnecessary columns. │
│ normalize Convert a string column using a string mapper, │
│ such as str.lower() │
│ remove Deletes columns. │
│ rename Rename columns │
│ rename-prefix Add a prefix to all of the specified columns │
│ replace Replace all values to a specific value │
│ replace-nulls-with-booleans Replace any nulls with a boolean, for example │
│ False │
│ replace-with-nulls Pass in values that should be converted to None. │
│ Effects the entire input dataframe. │
│ spread Adds additional columns for every top-level │
│ property found as json values. This works by │
│ converting the dataframe to records, then │
│ flatdict which concats the properties in │
│ dot-seperated naming convention. │
│ spread-attendance Same as spread, but with class attendance │
│ responses, re-shapes to avoid sparse dictionaries │
│ to-csv Exports the current dataframe to disk. │
│ to-excel Output dataframe as an excel file. The path has │
│ to end with .xlsx (due to implementation). │
│ transpose Transpose the df. │
│ trim-whitespace Remove whitespace on either left or right side │
╰────────────────────────────────────────────────────────────────────────────────╯
Merge, join, and concatenation
â•â”€ Merge, join, and concatenation ───────────────────────────────────────────────╮
│ add-country Add a new column with the full country name, according to the │
│ column specified. │
│ add-row Manually specify a row to add │
│ api-call Use dataframe as input to build API calls. │
│ api-merge Merge data from entity in an api call into the dataframe by │
│ indicating left column(s) and right column(s) as indexes. │
│ count Count of how many non-null values in each column │
│ csv-merge Read in a csv from a path, merge it with the dataframe. │
│ db-merge Attach data to current dataframe by adding in a table from the │
│ db. │
│ split-on Change the column to a list, splitting on delimiter provided │
│ (comma by default) │
╰────────────────────────────────────────────────────────────────────────────────╯
Selection and Filtering
â•â”€ Selection and Filtering ──────────────────────────────────────────────────────╮
│ drop-columns-nulls Drop all columns that are full of nulls │
│ drop-duplicates Remove duplicates as determined by columns │
│ drop-nulls Filter out any rows from the input dataframe that contain │
│ null in the specified column. │
│ duplicates Remove duplicated items from the df, either throughout or │
│ only those found in certain columns. │
│ extract-duplicates Remove all rows except duplicated items, either throughout │
│ or only those found in certain columns. │
│ filter Select the rows which columns have the value indicated, │
│ using operation (by default, equals). │
│ filters_by Specify multiple filters. See query for a more pythonic │
│ interface. │
│ head Transform to only the first n number of rows │
│ is-false Select rows where passed column are False │
│ is-null Select rows where passed columns are None │
│ is-true Select rows where passed columns are True │
│ not-null Select rows where passed columns are not None │
│ query Query the df with pythonic syntax. Column names are │
│ variables, use backticks for columns with spaces. The @ │
│ symbol has special meaning (evaluate a local variable) so │
│ if you are filtering by email address, use FILTER --column │
│ email --value email@example.com instead. │
│ reindex Change the indexes of the df │
│ reset-index Removes all indexes. │
│ sort Change the order of the df according to the values in │
│ specified columns. │
│ tail Transform to only the first n number of rows │
╰────────────────────────────────────────────────────────────────────────────────╯
Reshaping and pivoting
â•â”€ Reshaping and pivoting ───────────────────────────────────────────────────────╮
│ assign Make a new column with values built from values from the row │
│ itself. Useful for defining a "full name" column, for │
│ example. │
│ extract-on Use regular expression named group. │
│ frequencies Count of each values provided by columns. │
│ group-by-pivot Group by rows and pivot on columns with values. │
│ inc Create a new column that is calculated from simple math on │
│ column of values. │
│ list-to-columns Given the column whose values are a list, create new columns │
│ for each item inside. Specify the name of these new columns. │
│ pivot Reshape to a pivot table using aggregator, including summary │
│ row │
│ product Create a dataframe that includes sequence of dates │
│ set Create a new column with values. │
│ set-index Sets the dataframes indexes to those passed │
╰────────────────────────────────────────────────────────────────────────────────╯
Output
â•â”€ Output ───────────────────────────────────────────────────────────────────────╮
│ print Prints the current dataframe. │
│ send-email Send email with the dataframe as an attachment. Connects to smtp │
│ server, and thus requires credentials. │
│ summary Print the datatable in a small-screen format. Columns across │
│ become rows (transpose), and only the first three records are │
│ shown. Can be used to display steps. Returns the original df; no │
│ transform is passed to next processor. │
│ to-csv Exports the current dataframe to disk. │
│ to-excel Output dataframe as an excel file. The path has to end with .xlsx │
│ (due to implementation). │
╰────────────────────────────────────────────────────────────────────────────────╯
Debugging and Inspection
â•â”€ Debugging and Inspection ─────────────────────────────────────────────────────╮
│ confirm Pauses and waits for the user to confim │
│ info Display columns and number of null values, and by default the values │
│ of the first row. Particularly useful to get sense of the │
│ information. │
│ interact Fire up python and interface with df directly. Useful for │
│ troubleshooting. │
╰────────────────────────────────────────────────────────────────────────────────╯
Miscellaneous
â•â”€ Miscellaneous ────────────────────────────────────────────────────────────────╮
│ add-rows-empty-slots Creates a new dataframe that has a has many rows for │
│ every value in slots for each value in the column │
│ specified by key. Rows that exist in original are then │
│ placed into it. │
│ New rows will be populated with values whenever there │
│ are unique. │
│ pure-python Define a function in code that performs the desired │
│ transformation. Can only be used in pipeline context. │
│ subprocess Using values from the input dataframe, execute a │
│ subprocess for each row. │
│ new-df Create fresh new dataframe. Use it to build report at │
│ end. │
│ to-dict │
│ store Save the current dataframe for later. │
│ restore Select the previously stored dataframe as the current │
│ one. │
│ load-csv Load a csv, dropping the input dataframe │
╰────────────────────────────────────────────────────────────────────────────────╯
Last updated