Using command-line tools
My team already makes fun of me for my little shell-script crusade.
7 command-line tools for data science (7cltfds)
When I came across
7 command-line tools for data science,
I liked the article. Installing all the different tools was a bit of a
pain, as it required Python and Node.js to be set up. I hesitated to
install the tools and start playing, as I didn’t have a cool idea or
problem to solve.
Leaving the comfort zone
One day, while working on lead qualification for our sales team in Brazil, I
looked at a website that offered a lot of content about online shops.
My first approach to scraping the site for leads was to build a Ruby
solution that used a webdriver to scrape and click. Building this with
mechanize was pretty
Then I got lost in refactoring throw-away code because it was so
ugly. I wasn’t happy about maintaining another project for future
lead qualification of other sources; it smelled wrong to me.
So I thought about how to shell-script it. Now I had a reason
to come back to installing the tools mentioned in 7cltfds.
Instead of using the pagination of the site, I utilized the search as
an API to search for all shops ^^.
Although the shell script looks like a mess, it was straightforward to
create: I built it line by line, with temporary results in text files
(a very nice feedback loop).
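As a rough illustration of that feedback loop, here is a minimal sketch in which every step writes its result to its own temporary file, so each file can be inspected before the next line is written. The shop data, file names and the country column are made up for this example; they are not the original script.

```shell
# Hypothetical working directory for the intermediate results.
workdir=$(mktemp -d)

# Step 1: stand-in for the raw download (made-up shop listing).
printf '%s\n' \
  'shop-a.example,fashion,BR' \
  'shop-b.example,books,BR' \
  'shop-c.example,fashion,AR' > "$workdir/01_raw.csv"

# Step 2: keep only the Brazilian shops; look at the file before moving on.
grep ',BR$' "$workdir/01_raw.csv" > "$workdir/02_brazil.csv"

# Step 3: extract the domain column.
cut -d, -f1 "$workdir/02_brazil.csv" > "$workdir/03_domains.txt"

cat "$workdir/03_domains.txt"
```

When a step goes wrong, only that one line has to be rerun against the previous file, which is what makes the loop so quick.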
Knowledge transfer of shell scripts
The best side effect is that I’ve been able to show the shell script to
our business intelligence (BI) team. Besides scraping future
lead-qualification sites, BI learned a lot about solving
recurring tasks when generating reports from other APIs.
Learn more shell commands
So what are the next steps to shell-script mastery?
Get familiar with the main operators
| # the pipe
<,> # redirecting in and out
() # firing off subshells
cat # stream content
find # find files
cut # split lines into fields
grep # search
xargs # smash multiple lines into one and map over the elements
wc # word count
sed # stream editor, think gsub
awk # pattern scanning and field processing, crazy shit
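To make the list a bit more tangible, here is a small sketch that runs most of these operators against a made-up file of shops. The file contents and variable names are assumptions for illustration only.

```shell
# Made-up sample data: "domain category" per line.
shops=$(mktemp)
printf '%s\n' \
  'shop-a.example fashion' \
  'shop-b.example books' \
  'shop-c.example fashion' > "$shops"   # > redirects output into a file

# < feeds the file to grep, | pipes the matches into wc to count them.
fashion_count=$(grep 'fashion' < "$shops" | wc -l | tr -d ' ')

# cut: take the first space-separated field of every line.
domains=$(cut -d' ' -f1 "$shops")

# xargs: smash the lines into the arguments of a single echo.
one_line=$(cut -d' ' -f1 "$shops" | xargs echo)

# sed: substitute text on every line, like gsub.
renamed=$(sed 's/fashion/clothing/g' "$shops")

# awk: print the second field of the line whose first field matches.
category=$(awk '$1 == "shop-b.example" { print $2 }' "$shops")

# (): a subshell; the cd inside does not change the calling shell's directory.
before=$PWD
(cd / && pwd)
after=$PWD

echo "$fashion_count fashion shops; $one_line; shop-b sells $category"
```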
Additional key tools
xml2json # nice to access attributes out of html documents
jq # key utility to manipulate/view json
scrape # command line scraper
json2csv # to feed graphing libraries
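A minimal sketch of the jq part, assuming jq is installed and using a made-up JSON document. It deliberately avoids xml2json, scrape and json2csv, since their flags differ between versions; the `@csv` filter is only a jq-based stand-in for the JSON-to-CSV step.

```shell
# Made-up JSON, standing in for what a scraper might emit.
json='[{"name":"Shop A","url":"shop-a.example"},{"name":"Shop B","url":"shop-b.example"}]'

# View: pull one attribute per line (-r prints raw strings without quotes).
urls=$(printf '%s' "$json" | jq -r '.[].url')

# Manipulate: turn each object into a CSV row of name and url.
csv=$(printf '%s' "$json" | jq -r '.[] | [.name, .url] | @csv')

printf '%s\n' "$urls"
printf '%s\n' "$csv"
```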
What to learn next
ls /usr/bin | less # yields a few commands :)