Software Carpentry Workshop
Minnesota State University Moorhead – Software Carpentry Workshop
Reservation code: 680510823 Reservation for: Plamen Miltenoff
Hagen Hall – 600 11th St S – Room 207 – Moorhead
pad.software-carpentry.org/2017-10-27-Moorhead
http://www.datacarpentry.org/lessons/
https://software-carpentry.org/lessons/
++++++++++++++++
Friday
Jeff – certified Bash Python, John
https://ntmoore.github.io/2017-10-27-Moorhead/
what is the shell and what does it do? a language close to the computer, fast.
what is “bash” . cd, ls
the shell’s job is to translate between the user and the binary code; it is the middleman. several types of shells, with slight differences. one is natively installed on Mac and Unix: bash, the Bourne-Again SHell
bash commands: cd – change directory, ls – list; ls -F; if it does not work: man ls (manual for ls); the colon in the lower left corner tells you that you can scroll; q to quit; ls -ltr
“arguments” colloquially go by different names: options, flags, parameters
cd .. – move up one directory . pwd – print the current working directory . cd data_shell/ – go down one directory
cd ~ – brings me to my home directory . $HOME (universally defined variable)
the default behavior of cd (with no argument) is to go to the home directory.
the core shell commands accept the same options (letters)
$ du -h . gives me the size of the files. ctrl C to stop
$ clear . – clear the entire screen, scroll up to go back to previous command
$ man history . $ history . $ !pwd (re-runs the most recent command that starts with pwd) . $ history | grep history (piping)
$ cat (and the file name) – prints the file to standard output
$ cat ../
+++++++++++++++
how to edit and delete files
to create new folder: $ mkdir . – make directory
text editors – nano, vim (UNIX text editors) . $ nano draft.txt . ctrl O (save) ctrl X (exit) .
$ vim . press esc, then in the command line type :wq (write and quit) or just :q
$ mv draft.txt ../data . (move files)
to remove: $ rm (a directory such as thesis/ needs rm -r) . $ man rm
copy files: $ cp . $ touch (touches the file, creates it if new)
remove: $ rm . anything run with sudo is dangerous . Bash profile: cp -i (ask before overwriting)
* – wild card, truncate . $ ls analyzed (lists the analyzed directory)
stackoverflow web site .
+++++++++++++++++
head command . $ head basilisk.dat (check only the first several lines of a large file)
$ for filename in basilisk.dat unicorn.dat (making a loop = multiline)
> do (the shell now expects an action)
> head -n 3 $filename (-n sets the number of lines; 3 displays the first three lines of the file)
> done
for doing repetitive tasks
also
$ for filename in *.dat ; do head -n 3 $filename ; done
$ for filename in *.dat ; do echo $filename ; head -n 3 $filename ; done
$ echo $filename (print statement)
how to loop
$ for filename in *.dat ; do echo $filename ; echo head -n 3 $filename ; done
ctrl C, or on a Mac cmd + period, to get out of the loop
http://swcarpentry.github.io/shell-novice/02-filedir/
also
$ for filename in *.dat
> do
> echo $filename
> head -n 10 $filename | tail -n 20 (head -n 10: first ten lines; tail -n 20: last twenty lines)
> done
$ for filename in *.dat
> do
> echo $filename
> done
$ for filename in *.dat
> do
> cp $filename orig_$filename
> done
$ history > something.else
$ head something.else
+++++++++++++
another function: word count
$ wc *.pdb (protein databank)
$ head cubane.pdb
if I don’t know how to read the output: $ man wc
the difference between “*” and “?”: * matches zero or more characters, ? matches exactly one
$ wc -l *.pdb
$ wc -l *.pdb > lengths.txt
$ cat lengths.txt
$ for fil in *.txt
> do
> wc -l $fil
> done
putting a $ sign in front of a name uses the variable’s value, not the literal text.
++++++++++++
$ nano middle.sh . The entire point of shell is to automate
$ bash (executable) to run the program middle.sh
rwx – rwx – rwx . (owner – group -anybody)
bash middle.sh
$ file middle.sh
$PATH .
$ echo $PATH | tr ":" "\n"
/usr/local/bin
/usr/bin
/bin
/usr/sbin
/sbin
/Applications/VMware Fusion.app/Contents/Public
/usr/local/munki
$ export PATH=$PWD:$PATH
(this is to make sure that the last version of Python is running)
$ ls ~ . (does not show hidden files)
$ ls -a ~ . (shows hidden files as well)
$ touch .bash_profile .bashrc
$history | grep PATH
19 echo $PATH
44 echo #PATH | tr “:” “\n”
45 echo $PATH | tr “:” “\n”
46 export PATH=$PWD:$PATH
47 echo #PATH | tr “:” “\n”
48 echo #PATH | tr “:” “\n”
55 history | grep PATH
wc -l "$@" | sort -n ($@ – encompasses everything: it will process every single file in the list of files)
$ chmod (make it executable)
$ find . -type d . (find only directories, recursively)
$ find . -type f (files, instead of directories)
$ find . -name '*.txt' . (find files by name, don’t forget the single quotes)
$ wc -l $(find . -name '*.txt') – when searching among directories on different levels
$ find . -name '*.txt' | xargs wc -l – same as above ; two ways to do one and the same
+++++++++++++++++++
Saturday
Python
C and C++. scripting purposes in microbiology (instructor). libraries, packages alongside Python, which can extend its functionality. numpy and scipy (numeric and science python). Python for academic libraries?
leaving Python: >>> quit() . Python expects the opening and closing parentheses
new terminal needed after installation. anaconda 5.0.1
python 3 is a complete redesign, not only an update.
http://swcarpentry.github.io/python-novice-gapminder/setup/
jupyter crashes in safari. open in chrome. spg engine maybe
https://swcarpentry.github.io/python-novice-gapminder/01-run-quit/
to start python in the terminal $ python
>>> variable = 3
>>> variable + 10
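A minimal sketch of assigning variables and checking their types with the built-in type(). The variable names (count, ratio, atom, is_ready) are illustrative and not from the workshop.

```python
# A few of Python's built-in data types; type() reports which one a value has.
count = 3          # int
ratio = 2.5        # float
atom = "helium"    # str
is_ready = True    # bool

for value in (count, ratio, atom, is_ready):
    print(value, type(value).__name__)

# Arithmetic on a variable, as typed at the interpreter prompt.
total = count + 10
print(total)
```

At the interactive prompt the result of an expression is echoed automatically; in a script it has to go through print().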
several data types.
notebooks are stored in JSON format.
command mode vs edit mode. a code cell is the gray box. a text cell is plain text
markdown syntax. format working with git and github . search explanation in https://swcarpentry.github.io/python-novice-gapminder/01-run-quit/
hackMD https://hackmd.io/ (use your GitHub account)
PANDOC – converts between different document formats. https://pandoc.org/
print is a function
in what cases will I run my data through Python instead of SPSS?
python is a 0-based language: it starts counting with 0 – like Java, C, P
atom_name = 'helium'
print(atom_name[0]) string slicing and indexing is tricky
print(atom_name[0:6])
print(atom_name[7]) IndexError: a 6-character string has no index 7
print(atom_name[::-1])
muillyreb muihtil muileh (the reversed output when atom_name was 'helium lithium beryllium')
len(atom_name) 6 . case sensitive
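The indexing and slicing notes above can be sketched as follows. This is a small illustration that reuses the atom_name string from the notes; the other variable names are made up for the example.

```python
atom_name = "helium"

first = atom_name[0]          # indexing starts at 0
whole = atom_name[0:6]        # slice: start included, stop excluded
backwards = atom_name[::-1]   # a step of -1 walks the string in reverse
length = len(atom_name)       # number of characters

print(first, whole, backwards, length)

# Indexing past the end raises an IndexError ...
try:
    atom_name[7]
except IndexError as error:
    print("IndexError:", error)

# ... but a slice that runs past the end is simply cut short.
tail = atom_name[4:100]
print(tail)
```

The asymmetry is the tricky part: a single out-of-range index is an error, while an out-of-range slice quietly returns whatever is available.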
a method is applied to data that already exists, as an attribute of it (data.method()) – that is the difference from a function
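A short sketch of the function versus method distinction, using only built-ins; the variable names are illustrative.

```python
atom_name = "helium"

# A function stands on its own and takes the data as an argument.
n_chars = len(atom_name)

# A method belongs to the object and is called with dot notation.
shouted = atom_name.upper()
n_e = atom_name.count("e")

print(n_chars, shouted, n_e)
```

Calling upper() returns a new string; atom_name itself is unchanged, because strings cannot be modified in place.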
/Users/plamen_local/anaconda3/lib/python3.6/site-packages/pandas/__init__.py
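import pandas
import matplotlib.pyplot as plt  # provides plt, used for xticks below
# read the CSV and use the 'country' column as the row index
data = pandas.read_csv('/Users/plamen_local/Desktop/data/gapminder_gdp_oceania.csv', index_col='country')
data.loc['Australia'].plot()  # plot the row for Australia
plt.xticks(rotation=10)  # rotate the x-axis labels by 10 degrees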
ggplot2 is the most well-known plotting library.
xelatex is a PDF engine. reST (reStructuredText) is a markup language like Markdown. google what is the best PDF engine to use with Jupyter
for loops . any computer language will have the concept of a “for” loop. In Python: 1. whenever we create a “for” loop, that line must end with a single colon
2. indentation. the body of the “for” loop gets indented, and any “if” statement inside the loop is indented with it
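The two rules above (colon at the end of the “for” line, indentation for everything inside it) can be sketched like this. The list of numbers and the variable names are made up for the example.

```python
numbers = [3, 8, 1, 10]
large = []

for number in numbers:          # the "for" line ends with a colon
    if number > 5:              # an "if" inside the loop is indented once
        large.append(number)    # the body of the "if" is indented once more

print(large)                    # back at the left margin: runs after the loop
```

Python has no closing keyword such as the shell’s done; the loop ends where the indentation ends.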