Extracting
About Extracting
After importing, the data can be queried through the command line. For example:
mbpy EXTRACT from-users PRINT
This prints the first 5 and last 5 user records, along with every column in the users table for those records.
id role oa_id sb_id account_uid
0 10752543 Admin None None None \
1 10752544 Admin None None None
2 10752545 Advisor None None None
3 10752546 Admin None None None
4 10752547 Advisor None None None
Scroll down to see more columns for these records. (The trailing \ indicates that the output wraps around.)
These commands can be used simply to display data, but they become more interesting when paired with commands that manipulate the data. For example, to find the distribution of users' email domains, we could do the following:
mbpy \
EXTRACT \
from-users \
--fields email \
INFO \
--value-count domain \
extract-on \
--column email \
--pattern '^(?P<handle>.+)@(?P<domain>.+)$'
domain count percent
0 eduvo.com 332 97.076023
1 endvo.com 3 0.877193
2 gmail.com 2 0.584795
3 icio.us 1 0.292398
4 mediafire.com 1 0.292398
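To see what the pattern in the command above captures, here is a minimal sketch in plain Python. It only illustrates the regular expression and the counting step; it is not how mbpy is implemented, and the sample addresses and the helper name domain_counts are invented for illustration.

```python
import re
from collections import Counter

# Same pattern as in the command above: two named groups, handle and domain.
PATTERN = re.compile(r"^(?P<handle>.+)@(?P<domain>.+)$")

# Hypothetical addresses, made up for this example only.
emails = [
    "alice@eduvo.com",
    "bob@eduvo.com",
    "carol@gmail.com",
    "not-an-email",
]


def domain_counts(addresses):
    """Count how often each domain occurs, skipping values that do not match."""
    counts = Counter()
    for address in addresses:
        match = PATTERN.match(address)
        if match:
            counts[match.group("domain")] += 1
    return counts


counts = domain_counts(emails)
total = sum(counts.values())

# Print domain, count, and percentage, most frequent first.
for domain, count in counts.most_common():
    print(f"{domain:<12} {count:>3} {100 * count / total:.6f}")
```

Values that do not contain an @ simply fail to match and are left out of the totals in this sketch.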
This school is supposed to have only eduvo.com accounts. Looks like we have a few users with wrong email addresses! Let's find them:
mbpy \
EXTRACT \
from-users \
--fields id,email,role,archived \
PRINT \
query \
--where 'not email.str.endswith("@eduvo.com")'
Let's see what we got:
id email role archived
0 10752543 [email protected] Admin True
34 10752652 [email protected] Student False
50 10752674 [email protected] Admin True
51 10752675 [email protected] Student False
145 10947491 [email protected] Parent True
166 10947512 [email protected] Parent False
167 10947513 [email protected] Parent False
174 10947520 [email protected] Parent True
175 10947521 [email protected] Parent True
530 10947916 [email protected] Student True
531 10947917 [email protected] Student True
...
Let's limit to only those that are active (not archived), and let's give ourselves a link we can click:
mbpy \
EXTRACT \
from-users \
--fields email,profile \
--where archived b= false \
PRINT \
query \
--where 'not email.str.endswith("@eduvo.com")'
Command-click a link to open the URL in the browser!
email profile
17 [email protected] https://demo.managebac.com/teacher/users/10752652
26 [email protected] https://demo.managebac.com/teacher/users/10752675
58 [email protected] https://demo.managebac.com/teacher/users/10947512
59 [email protected] https://demo.managebac.com/teacher/users/10947513
149 [email protected] https://demo.managebac.com/teacher/users/10947922
150 [email protected] https://demo.managebac.com/teacher/users/10947923
151 [email protected] https://demo.managebac.com/teacher/users/10947924
152 [email protected] https://demo.managebac.com/teacher/users/10947925
153 [email protected] https://demo.managebac.com/teacher/users/10947926
199 [email protected] https://demo.managebac.com/teacher/users/10947975
523 [email protected] https://demo.managebac.com/teacher/users/10948377
524 [email protected] https://demo.managebac.com/teacher/users/10948386
525 [email protected] https://demo.managebac.com/teacher/users/10948389
530 [email protected] https://demo.managebac.com/teacher/users/10948406
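The two filters used above (addresses outside the expected domain, then only users who are not archived) amount to the following logic, sketched here in plain Python. The records, the constant EXPECTED_SUFFIX, and the function off_domain are hypothetical and only illustrate the filtering; they are not part of mbpy.

```python
# Hypothetical user records, invented for this example.
users = [
    {"id": 1, "email": "alice@eduvo.com", "archived": False},
    {"id": 2, "email": "bob@gmail.com", "archived": False},
    {"id": 3, "email": "carol@endvo.com", "archived": True},
]

EXPECTED_SUFFIX = "@eduvo.com"


def off_domain(records, suffix=EXPECTED_SUFFIX, include_archived=True):
    """Return records whose email does not end with the expected suffix.

    When include_archived is False, archived records are dropped as well.
    """
    return [
        record
        for record in records
        if not record["email"].endswith(suffix)
        and (include_archived or not record["archived"])
    ]


all_wrong = off_domain(users)
active_wrong = off_domain(users, include_archived=False)

print([record["id"] for record in all_wrong])
print([record["id"] for record in active_wrong])
```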
The structure of these commands is as follows:
mbpy <MODE> <extractor/streamer> <LOADER> [<chain> <chain> ...]
<MODE>
Either EXTRACT or STREAM. In extraction mode, all the information is loaded into memory. In stream mode, it is loaded in chunks.
<extractor>
from-students, from-classes, etc. Run mbpy EXTRACT --help to find the full list.
<streamer>
See the "Streaming" section. Run mbpy STREAM --help to find the full list.
<LOADER>
print, pprint, or csv. Run mbpy EXTRACT from-students --help to see the full list of loaders.
<chain>
Optional. Performs manipulations after extraction.
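The difference between the two modes can be pictured with a small Python sketch: one function materialises every record at once, the other hands records over in fixed-size chunks. This is a generic illustration of the idea under stated assumptions, not mbpy's implementation; fetch_records, extract_all, and stream_chunks are hypothetical names.

```python
from itertools import islice


def fetch_records(n):
    """Hypothetical stand-in for a data source; yields n fake records lazily."""
    for i in range(n):
        yield {"id": i}


def extract_all(source):
    """Extraction style: hold every record in memory at the same time."""
    return list(source)


def stream_chunks(source, chunk_size):
    """Streaming style: yield lists of at most chunk_size records."""
    iterator = iter(source)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


everything = extract_all(fetch_records(10))
chunk_sizes = [len(chunk) for chunk in stream_chunks(fetch_records(10), 4)]

print(len(everything), chunk_sizes)
```

With chunking, only one chunk needs to be in memory at a time, which matters once the data no longer fits comfortably in memory.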
To learn more about how to query with extractors, continue to Querying Commands.