Searching for "python"

python to clean data

7 Simple Python Functions to Clean Your Data

Fábio Neves  Jan 9

python

  • Merging all files from a specific folder
  • Edit every file in the same folder and re-save them again
  • Cleaning the header of your datasets
  • Split dataframe columns into two or more columns
  • Filter specific dataframe columns based on their column names
  • Calculate the number of days between two dates
  • Calculate number of weeks/months/years between two dates
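
The last two items in the list can be sketched with Python's standard library alone. This is a minimal illustration and not the article's own code; the function names `days_between` and `weeks_between` are made up for the example.

```python
from datetime import date


def days_between(start, end):
    """Return the number of whole days from start to end (negative if end is earlier)."""
    return (end - start).days


def weeks_between(start, end):
    """Return the number of weeks from start to end as a real number."""
    return days_between(start, end) / 7


print(days_between(date(2018, 6, 2), date(2018, 6, 3)))    # 1
print(weeks_between(date(2018, 6, 2), date(2018, 6, 16)))  # 2.0
```

Subtracting one `date` from another gives a `timedelta`, whose `days` attribute holds the difference.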

++++++++++++++++
more on python in this IMS blog
http://blog.stcloudstate.edu/ims?s=python

Code in Place free Python course

Stanford University’s Computer Science department is holding a unique MOOC called ‘Code in Place.’ This is a free course to learn Python. It is a live class environment and not a typical video-based curriculum (via r/programming).

https://compedu.stanford.edu/codeinplace/announcement/


Software Carpentry Workshop at SCSU Python

Registration is now open for the workshop: 

>>>>>>>>>  https://ntmoore.github.io/2018-06-02-stcloud/ <<<<<<<<<<<<<

Syllabus:

The Unix Shell

  • Files and directories
  • History and tab completion
  • Pipes and redirection
  • Looping over files
  • Creating and running shell scripts
  • Finding things
  • Reference…

Programming in Python

  • Using libraries
  • Working with arrays
  • Reading and plotting data
  • Creating and using functions
  • Loops and conditionals
  • Defensive programming
  • Using Python from the command line
  • Reference…

@software carpentry @scsu #python getting ready w @Gaurav Vaidya and @John Liu

Posted by InforMedia Services on Saturday, June 2, 2018

 

#Python Programming from @Software Carpentry at St. Cloud State University

Posted by InforMedia Services on Saturday, June 2, 2018

https://en.wikipedia.org/wiki/GNU_nano

https://swcarpentry.github.io/shell-novice/03-create/

http://pad.software-carpentry.org/2018-06-02-stcloud

Jupyter is an IDE (integrated development environment): https://en.wikipedia.org/wiki/Integrated_development_environment

https://searchcloudcomputing.techtarget.com/definition/Infrastructure-as-a-Service-IaaS

JSON is the file format in which Jupyter notebook data is stored. Cells can contain HTML and Markdown (a simplified alternative to HTML).

pandas: https://pandas.pydata.org/

React OS (JS) https://en.wikipedia.org/wiki/ReactOS

 

#git and #github

Posted by InforMedia Services on Sunday, June 3, 2018

+++++++++++++++++++
more on Software Carpentry workshops in this IMS blog
http://blog.stcloudstate.edu/ims/2017/10/26/software-carpentry-workshop/

Python or R at SCSU

Dear Colleagues,

Software Carpentry (https://software-carpentry.org/about/) is coming to SCSU campus.

Want to learn basic computer programming skills specifically tailored for academia?
Please consider a FREE two-day workshop on either Python or R.

Python is a programming language that is simple, easy to learn for beginners and experienced programmers, and emphasizes readability. At the same time, it comes with lots of modules and packages to add to your programs when you need more sophistication. Whether you need to perform data analysis, graphing, or develop a network application, or just want to have a nice calculator that remembers all your formulas and constants, Python can do it with elegance. https://www.python.org/about/

R (RStudio) is a language and environment for statistical computing and graphics. R provides a wide variety of statistical and graphical techniques. R can produce well-designed publication-quality plots, including mathematical symbols and formulae. https://www.r-project.org/about.html

Both software packages are free and run on MS Windows, macOS and GNU/Linux.
Besides seamless installation on your personal computer, you can access both packages in SCSU computer labs or via SCSU AppsAnywhere.

https://appsanywhere.stcloudstate.edu/vpn/index.html

In an effort to accommodate as many faculty as possible, please indicate whether you want Python or R and check your availability using these Doodle polls:

Python

https://doodle.com/poll/fgf7mn5mze9knaps

R

https://doodle.com/poll/mzirw2nc4kfv9whs

Questions? Suggestions? Please do not hesitate to ask:

zliu@stcloudstate.edu
pmiltenoff@stcloudstate.edu

For more information:
https://blog.stcloudstate.edu/ims
http://blog.stcloudstate.edu/ims/2018/02/16/python-or-r-at-scsu/
https://www.facebook.com/InforMediaServices/
https://twitter.com/SCSUtechinstruc?lang=en  #SoftwareCarpentry

 

IPython notebook

Library Juice Academy

course_intro

I also encourage students to download and install Python on their own systems. Python is a
mature and robust language with a great many third-party distributions and versions, such as IPython.
One I recommend is Active State Python. Active State produces refined and well supported
distributions with easy to use installers. Their basic, individual distribution is free. You can find it at
http://www.activestate.com/activepython/downloads
https://host.lja-computing.net:8888/notebooks/profile_intro_programming_p1/Intro_Programming_Lesson1_pmiltenoff.ipynb
  • Integers: Whole numbers, signed or unsigned. In many languages a 16-bit integer runs from -32,768 to 32,767, or from 0 to 65,535 if not signed. Integers are used any time something needs to be counted.
  • Long Integer: Any whole number outside the above range. Many languages distinguish between the two; a 32-bit integer runs from −2,147,483,648 to 2,147,483,647, or from 0 to 4,294,967,295 if not signed. Python doesn’t make you choose: its integers grow as large as they need to. Most of us will be very happy with this many whole numbers to choose from.
  • Real and Floating Point Numbers: Real numbers are signed or unsigned numbers including decimals. The numbers 2, 3, 4 are Integers and Real Numbers. The numbers 2.1, 2.9, 3.9 are Real Numbers, but not Integers. Real Numbers can include representations of irrational numbers such as pi. Inside the computer, however, a real number is stored as a decimal that terminates after a finite number of digits, so an irrational number can only be approximated. You will sometimes encounter the term Floating Point Numbers. This is a technical term referring to the way that Real Numbers are represented in a computer. Python hides this detail from you, so Real and Floating Point are used interchangeably in this language.
  • Binary Numbers: And Octal and Hexadecimal. These are numbers used internally by computers. You will run into these values fairly often. For instance, when you see color values in HTML such as “FFFFFF” or “0000FF”, you are looking at hexadecimal numbers.
Hexadecimal and Octal are used because humans can read them without too much trouble and they are a compromise between what computers process and what we can read. Any time you see something in Octal or Hexadecimal, you are looking at something that interfaces with the lower levels of a computer. You will most commonly use Hexadecimal numbers when dealing with Unicode character encodings. Python 2 interprets any whole number which begins with a leading zero as octal; Python 3 rejects a bare leading zero and requires the prefixes 0b (binary), 0o (octal) or 0x (hexadecimal).
Numbers such as 3 + 7i are referred to as complex. They have a real part, the 3, and an imaginary part, the 7i. Python writes the imaginary unit as j, so this number is typed as 3+7j. Chances are you won’t use complex numbers unless you’re working with scientific data.
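
The numeric types described above can be tried out directly. The following is a small sketch written for Python 3 (so `print` takes parentheses); the variable names are only illustrative.

```python
# Integers grow as large as they need to; there is no separate "long" type in Python 3.
big = 2 ** 100
print(big)                  # 1267650600228229401496703205376

# Whole numbers and real (floating point) numbers are different types.
print(type(7), type(7.0))   # <class 'int'> <class 'float'>

# Floating point values are finite approximations, so some sums are slightly off.
print(0.1 + 0.2 == 0.3)     # False

# Binary, octal and hexadecimal literals use the prefixes 0b, 0o and 0x.
white = 0xFFFFFF            # the HTML color "FFFFFF" read as a number
print(white)                # 16777215
print(0b101, 0o10, 0xFF)    # 5 8 255

# Complex numbers use j for the imaginary unit.
z = 3 + 7j
print(z.real, z.imag)       # 3.0 7.0
```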
A String consists of a sequence of characters. The term String refers to how this data type is represented internally. You store text in Strings. Text can be anything: letters, words, sentences, paragraphs, numbers, just about anything.
Lists are close cousins to Strings, though you may never need to think of them that way. A list is just that, a list of things. Lists may contain any number of numbers or any number of strings. Lists may even contain any number of other lists. Lists are compared to arrays, but they are not the same thing. In most uses, they function the same, so the difference, for our purposes, is moot. Strings are like lists in that, internally, the computer works with strings in much the same manner as lists. This is why the operations on Strings are so different from those on numbers.
The last main data type in the Python programming language is the dictionary. Dictionaries are map types, known in other languages as hashes, and in computer science as Associative Arrays. The best way to think of what the dictionary does is to consider a Library of Congress Call Number (something this audience is familiar with). The call number is what’s called a Key. It connects to a record which contains information about a book. The combination of keys and records, called values, comprises a dictionary. A single key will connect to a discrete group of values such as the items in this record. Dictionaries will be touched on in the next lesson and covered in some detail in the next course. These are fairly advanced data structures and require a solid understanding of programming fundamentals in order to be used properly.
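
A brief sketch of strings, lists and dictionaries follows, written for Python 3. The call number used as a dictionary key is a made-up placeholder, not a real catalog entry, and the variable names are only illustrative.

```python
# A string is a sequence of characters; you can index it and measure it like a list.
title = "Think Python"
print(title[0], len(title))     # T 12

# A list holds any number of items, including other lists.
big_cats = ["Tiger", "Lion", "Cougar"]
big_cats.append("Desmond")
nested = [1, "two", [3, 4]]
print(big_cats[-1], nested[2])  # Desmond [3, 4]

# A dictionary maps keys to values, like a call number pointing to a record.
call_number = "QA00.EXAMPLE"    # hypothetical call number, for illustration only
catalog = {call_number: {"title": "Think Python", "author": "Allen B. Downey"}}
print(catalog[call_number]["title"])  # Think Python
```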

Statements, an Overview

Programs consist of statements. A statement is a unit of executable code. Think of a statement like a sentence. In a nutshell, statements are how you do things in a program. Writing a program consists of breaking down a problem you want to solve into smaller pieces that you can represent as mathematical propositions and then solve. The statement is where this process gets played out. Statements themselves consist of some number of expressions involving data. Let’s see how this works.

An expression would be something like 2+2=4. This expression, however, is not a complete statement. Ask Python to evaluate it and you will get the error “SyntaxError: can’t assign to operator”. What’s going on here? Basically we didn’t provide a complete statement. If we want to see the sum of 2+2 we have to write a complete statement that tells the interpreter what to do and what to do it with. The verb here is ‘print’ and the object is ‘2+2’. Ask Python to evaluate ‘print 2+2’ (in Python 3, ‘print(2+2)’) and it will show ‘4’. We could also throw in a subject and do something a bit more detailed: ‘Sum=2+2’. In this case we are assigning the value of 2+2 to the variable, Sum. We can then do all sorts of things with Sum. We can print it. We can add other numbers to it, hand it off to a function and so on. For instance, we might want to know the square root of Sum. In that case, after importing sqrt from the math module, we might write something like ‘print sqrt(Sum)’, which will display ‘2.0’.
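
The example above can be run as follows. This sketch uses Python 3 syntax, where `print` needs parentheses, and it names the variable `total` rather than `Sum` only to follow the lower-case naming style; both choices are assumptions of the sketch, not part of the lesson.

```python
from math import sqrt  # sqrt lives in the math module and must be imported

# 2 + 2 = 4    <- uncommenting this line gives a SyntaxError: it is not a valid statement

total = 2 + 2          # subject (total), verb (=), object (2 + 2)
print(total)           # 4

root = sqrt(total)     # hand the variable off to a function
print(root)            # 2.0
```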

A shell is essentially a user interface that provides you access to a system’s features. Normally, this means access to an Operating System. In cases like this, the shell provides you access to the Python programming environment.

Anything preceded by a “#” is not interpreted or executed by the programming shell. Comments are used widely to document programs. One school of programming holds that code should be so clear that comments are unnecessary.

Operations on Numbers

Expressions are discrete statements in programming that do something. They typically occupy one line of code, though programmers will sometimes squeeze more in. This is generally bad form and can really make your program a mess. Expressions consist of operations and data or rather data and operations on them. So, what can you do with numbers? Here is a concise list of the basic operations for integers and real numbers of all types:

Arithmetic:

  • Addition: z = x + y
  • Subtraction: z = x - y
  • Multiplication: z = x * y. Here the asterisk serves as the ‘X’ multiplication symbol from grade school.
  • Division: z = x / y
  • Exponents: z = x ** y, that is x^y, x to the y power.

Operations have an order of precedence which follows the algebraic order of precedence. The order can be remembered by the old Algebra mnemonic, Please Excuse My Dear Aunt Sally, which reminds you that the order of operations is:

  1. Parentheses
  2. Exponents
  3. Multiplication
  4. Division
  5. Addition
  6. Subtraction

Multiplication and division share the same precedence and are evaluated left to right, as are addition and subtraction.
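
The operators and the order of precedence can be checked with a few lines. This is a small sketch for Python 3, where `/` always returns a real number; the values of `x` and `y` are arbitrary.

```python
x, y = 7, 2

print(x + y)    # 9
print(x - y)    # 5
print(x * y)    # 14
print(x / y)    # 3.5
print(x ** y)   # 49

# Multiplication happens before addition unless parentheses say otherwise.
print(2 + 3 * 4)     # 14
print((2 + 3) * 4)   # 20

# Division and multiplication have equal precedence and run left to right.
print(8 / 4 * 2)     # 4.0
```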

Operations on Strings

Strings are strange creatures as I’ve noted before. They have their own operations and the arithmetic operations you saw earlier don’t behave the same way with strings.
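
To illustrate how the arithmetic operators behave with strings, here is a short sketch in Python 3. The variable names are made up for the example.

```python
first = "Big"
second = "Cats"

# With strings, + joins (concatenates) instead of adding.
joined = first + " " + second
print(joined)            # Big Cats

# With a string and an integer, * repeats the string.
repeated = "ab" * 3
print(repeated)          # ababab

# Mixing a string and a number with + is an error, not arithmetic.
try:
    "5" + 5
    mixing_failed = False
except TypeError:
    mixing_failed = True
print(mixing_failed)     # True

# Convert the string to a number first if you want addition.
print(int("5") + 5)      # 10
```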

Putting Expressions Together to Make Statements

As I noted earlier, all computer languages, and natural languages, possess pragmatics, larger scale structures which reduce ambiguity by providing context. This is a fancy way of saying that just as sentences possess rules of syntax that make them comprehensible, larger documents have similar rules. Computer Programs are no different. Here’s a breakdown of the structure of programs in Python, in a general sense.

  1. Programs consist of one or more modules.
  2. Modules consist of one or more statements.
  3. Statements consist of one or more expressions.
  4. Expressions create and/or manipulate objects(and variables of all kinds).

Modules and Programs are for the next class in the series, though we will survey these larger structures next lesson. For now, we’ll focus on statements and expressions. Actually, we’ve already started with expressions above. In Python, statements can do three things.

  • Assign a variable
  • Change a variable
  • Take an action
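
The three kinds of statement can each be shown in one line. A minimal sketch in Python 3, with `count` as an arbitrary variable name:

```python
count = 1      # assign a variable
count += 1     # change a variable (same as count = count + 1)
print(count)   # take an action; prints 2
```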

Variable Names and Reserved Words

Now that we’ve seen some variable assignments, let’s talk about best practices. First off, aside from reserved words, variable names can be any combination of letters, numbers and underscores. Python will reject the following punctuation marks in variable names, so never, ever use them:

      • +
      • !
      • @
      • ^
      • %
      • (
      • )
      • .
      • ?
      • /
      • :
      • ;
      • *

These punctuation marks tend to be operators and characters that have special meanings in most computer languages. The other issue is reserved words. What are “reserved words”? They are words that Python interprets as commands. Python reserves the following words:

  • True: A special value set aside for boolean values
  • False: The other special value set aside for boolean values
  • None: A special value that stands for “no value”. It is not the number 0, though it counts as false in logical tests
  • and: a way of combining logical conditions
  • as: describes how modules are imported
  • assert: a way of checking that a condition you expect is actually true; the program stops with an error if it is not. Used in debugging of large programs
  • break: breaks out of a loop and goes on with the rest of the program
  • class: declares a class for object oriented design. For now, just remember not to use this variable name
  • continue: returns to the top of the loop and keeps on going again
  • def: declares functions which allow you to modularize your code.
  • elif: else if, a control structure we’ll see next lesson
  • else: as above
  • except: another control structure
  • finally: part of an error testing statement; its code runs whether or not an error occurred
  • for: a loop control structure
  • from: used to import modules
  • global: a scoping statement
  • if: a control structure
  • in: used in for each loops
  • is: a logical operator
  • lambda: like def, but weird. It defines a function in a single line. I will not teach this because it is icky. If you ever learn Perl you will see this sort of thing a lot and you will hate it, but that’s just my personal opinion.
  • nonlocal: a scoping command
  • not: a logical operator
  • or: another logical operator
  • pass: does nothing. Used as placeholder
  • raise: raises an error. This is used to write custom error messages. Your programs may have conditions which would be considered invalid based on our business situation. The interpreter may not consider them errors, but you might not want your user to do something so you ‘raise’ an exception and stop the program.
  • return: tells a function to return a value
  • try: this is part of an error testing statement
  • while: starts a while loop
  • with: a context manager. This will be covered in the course after the next one in this series
  • yield: works like return, but the function can pick up where it left off and hand back more values later
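
You do not have to memorize the list: Python's standard `keyword` module can tell you whether a word is reserved. A small sketch in Python 3 follows; the exact contents of `keyword.kwlist` depend on your Python version, and the candidate name `driver_license_number` is only an example.

```python
import keyword

# Ask whether a particular word is reserved.
print(keyword.iskeyword("lambda"))                 # True
print(keyword.iskeyword("driver_license_number"))  # False

# The full list of reserved words for the Python version you are running.
print(keyword.kwlist)

# Using a reserved word as a variable name is rejected before the program runs.
try:
    compile("class = 5", "<example>", "exec")
    rejected = False
except SyntaxError:
    rejected = True
print(rejected)                                    # True
```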
Variable names should be meaningful. Let’s say I have to track a person’s driver license number. Use explanatory names like ‘driverLicenseNumber’.

  • Use case to make your variable names readable. Python is case sensitive, meaning a variable named ‘cat’ is different from one named ‘Cat’. If you use more than one word to name a variable, start in lower case, then change case on the second word. For instance “bigCats = ['Tiger', 'Lion', 'Cougar', 'Desmond']”. The common practice used by programmers in many settings is that variables start with lower case and functions (methods and so on) start with upper case. This is called “Camel Case” for its lumpy, humpy appearance. Now, as it happens, there is something of a religious debate over this. Many Python programmers prefer to keep everything lower case and join words in a name by underscores such as “big_cats”. Use whichever is easiest or looks the nicest to you.
  • Variable names should be unique. Do not reuse names. This will cause confusion later on.
  • Python conventions. Python, as with any other programming language, has a culture built up around it. That means there are some conventions surrounding variable naming. Two leading underscores, __X, denote system variables which have special meaning to the interpreter. So avoid using this for your own variables. There may be a time and place, but that’s for an advanced programming course. A single underscore, _X, indicates to other programmers that this is a fundamental variable and that they mess with it at their own peril.
  • Do not start variable names with a number. Python rejects such names with a syntax error.
  • “A foolish consistency is the hobgoblin of little minds”. But not to programming minds. Consistency helps the readability of code a great deal. Once you start a system, stick with it.
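
Some of these naming rules can be checked in code. The sketch below, in Python 3, uses the built-in string method `isidentifier()` to test whether a name is legal; the sample names are invented for illustration.

```python
# Python is case sensitive: these are two different variables.
cat = "lower case"
Cat = "upper case"
print(cat == Cat)   # False

# isidentifier() reports whether a string could be used as a variable name.
candidates = ["big_cats", "bigCats", "2cats", "big-cats", "big.cats"]
legal = {name: name.isidentifier() for name in candidates}
for name, ok in legal.items():
    print(name, ok)
```

Both “big_cats” and “bigCats” pass; the names that start with a digit or contain punctuation do not.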

Statement Syntax

Putting together valid statements can be a little hard at first. There’s a grammar to them. Thus far, we’ve mainly been working with expressions such as “x = x+1”. You can think of expressions as nouns. We’ve clearly defined x, but how do we look inside? For that we need to give it a verb, the print command. We would then write “print x”. However, we can skip the middle statement and print an expression such as “print x + 1”. The interpreter evaluates this per the order of operations I laid out earlier. However, once that expression is evaluated, it then applies the verb, “print”, to that expression.

Print is a function that comes with the Python distribution. There are many more and you can create your own. We’ll cover that a bit in the next lesson. Let’s look a little more at the grammar of a statement. Consider:

x = sin(b)

Assume that b has been defined elsewhere. x is the subject, b is the object and sin is the verb. Python will go to the right side of the equal sign first. It will then go to the inside of the function and evaluate what’s there first. It then evaluates the value of the function and finishes by setting x to that value. What about something like this?

x=sin(x+3/y)

Python evaluates from the inside out according to the rules of operation. Very complex statements can be built up this way.

x = sin(log((x + 3)/(e**2)))

Regardless of what this expression evaluates to (I don’t actually know), Python starts with the innermost parentheses: it adds 3 to x, works out the value of e squared, and divides the first result by the second. With that worked out, it takes the logarithm of the result and takes the sine of that before setting x to the final result.

What you should not do is put more than one statement on a line. No more than one verb on a line. In this context, a verb is an assignment, or a command acting on an expression.
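
The inside-out evaluation can be spelled out one step at a time and compared with the one-line version. This is a sketch in Python 3 that assumes a starting value of x = 1.0 and takes `sin`, `log` and `e` from the standard `math` module; the intermediate names are made up.

```python
from math import sin, log, e

x = 1.0                      # assumed starting value, chosen only for illustration

# Step by step, from the inside out.
numerator = x + 3            # innermost parentheses on the left
denominator = e ** 2         # innermost parentheses on the right
ratio = numerator / denominator
logged = log(ratio)          # natural logarithm of the ratio
stepwise = sin(logged)       # sine of the logarithm

# The same computation written as a single statement.
one_line = sin(log((x + 3) / (e ** 2)))

print(stepwise, one_line)    # the two values agree
```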
markdown cell
code cell

Call up your copy of Think Python or go to the website at http://www.greenteapress.com/thinkpython/html/. Read Chapter 2. This will reiterate much of what I’ve presented here, but this will help cement the content into your minds. Skip section 2.6 because IPython treats everything as script mode. IPython provides you with the illusion of interactivity, but everything happens asynchronously. This means that any action you type in will not instantaneously resolve as it would if you were running Python interactively on your computer. You will have to use print statements to see the results of your work.

Your assignment consists of the following:

  • Exercise 1 from Chapter 2 of Think Python. If you type an integer with a leading zero, you might get a confusing error:
    >>> zipcode = 02492

    SyntaxError: invalid token
    Other numbers seem to work, but the results are bizarre:
    >>> zipcode = 02132
    >>> zipcode
    1114
    Can you figure out what is going on? Hint: display the values 01, 010, 0100 and 01000.

  • Exercise 3 from Chapter 2 of Think Python. Assume that we execute the following assignment statements:
    width = 17
    height = 12.0
    delimiter = '.'
    For each of the following expressions, write the value of the expression and the type (of the value of the expression).

    width/2
    width/2.0
    height/3
    1 + 2 * 5
    delimiter * 5

  • Exercise 4 from Chapter 2 of Think Python. Practice using the Python interpreter as a calculator:
    1. The volume of a sphere with radius r is 4/3 π r^3. What is the volume of a sphere with radius 5? Hint: 392.7 is wrong!
    2. Suppose the cover price of a book is $24.95, but bookstores get a 40% discount. Shipping costs $3 for the first copy and 75 cents for each additional copy. What is the total wholesale cost for 60 copies?
    3. If I leave my house at 6:52 am and run 1 mile at an easy pace (8:15 per mile), then 3 miles at tempo (7:12 per mile) and 1 mile at easy pace again, what time do I get home for breakfast?

In your IPython notebook, create a markdown cell and write up your exercise in there. Just copy it from the textbook or from the above write-up. Next, create a code cell and do your work in there. Please comment your work thoroughly. You cannot provide too many comments. Use print statements to see the outcome of your work.
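
As a rough idea of what a well-commented code cell might look like, here is a sketch in Python 3. The problem is made up for illustration (the area of a circle with radius 3) and is not one of the assigned exercises.

```python
# Problem (invented for this example): find the area of a circle with radius 3.
# Formula: area = pi * r^2

from math import pi          # pi comes from the standard math module

radius = 3                   # radius of the circle
area = pi * radius ** 2      # exponent first, then multiplication

# Use print to see the result of the work.
print(area)
```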

Python as the programming language at SCSU

 

From: scsu-announce-bounces@lists.stcloudstate.edu [mailto:scsu-announce-bounces@lists.stcloudstate.edu] On Behalf Of Rysavy, Sr. Del Marie
Sent: Tuesday, November 12, 2013 12:50 PM
To: scsu-announce@stcloudstate.edu
Subject: [SCSU-announce] course in programming for beginners

Our beginning programming course, CNA 267, is now using Python as the programming language.  Students learn to work with decision and loop control structures, variables, lists (arrays) and procedures, etc.  Python is becoming one of the most widely-accepted languages for business professionals and scientists.
Please inform your students (who need to learn programming) of this course.  It is being offered during spring semester, as well as next fall.

Sr. Del Marie Rysavy

ECC 254

CSIT Department

telephone: 308-4929

Library Technology Conference 2019

#LTC2019

Intro to XR in Libraries from Plamen Miltenoff

keynote: equitable access to information

keynote speaker

https://sched.co/JAqk
the type of data: wikipedia. the dangers of learning from wikipedia. how individuals can organize to mitigate some of these dangers. wikidata, algorithms.
IBM Watson, an AI system, uses algorithms to make sense of wikipedia
youtube videos debunked of conspiracy theories by using wikipedia.

semantic relatedness, Word2Vec
how do the algorithms work: they take a large body of unstructured text and pick out specific words
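
Semantic relatedness between words is commonly measured as the cosine similarity of their word vectors. The sketch below, in Python 3, shows only that last step; the three-number vectors are invented for illustration and are not real Word2Vec output, which is learned from a large body of text and has many more dimensions.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = sqrt(sum(p * p for p in a))
    norm_b = sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b)


# Made-up toy vectors; a real model would supply these.
vectors = {
    "library": [0.9, 0.8, 0.1],
    "archive": [0.8, 0.9, 0.2],
    "volcano": [0.1, 0.2, 0.9],
}

related = cosine_similarity(vectors["library"], vectors["archive"])
unrelated = cosine_similarity(vectors["library"], vectors["volcano"])
print(related > unrelated)   # True
```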

lots of AI learns about the world from wikipedia. the neutral point of view policy. Wikipedia asks editors to present views as proportionally as possible. Wikipedia biases: 1. gender bias (only 20-30 % are women).

conceptnet. debias along different demographic dimensions.

citations analysis gives also an idea about biases. localness of sources cited in spatial articles. structural biases.

geolocation on Twitter by County. predicting the people living in urban areas. FB wants to push more local news.

danger (biases) #3. wikipedia search results vs wikipedia knowledge panel.

collective action against tech: Reddit, boycott for FB and Instagram.

Mechanical Turk https://www.mturk.com/  algorithmic / human intersection

data labor: what are the primary resources these companies have. posts, images, reviews etc.

boycott, data strike (data not being available for algorithms in the future). GDPR in EU – all historical data is like the CA Consumer Privacy Act. One can do data strike without data boycott. general vs homogeneous (group with shared identity) boycott.

the wikipedia SPAM policy is obstructing new editors and that hit communities such as women.

++++++++++++++++++

Twitter and Other Social Media: Supporting New Types of Research Materials

https://sched.co/JAWp

Nancy Herther Cody Hennesy

http://z.umn.edu/

twitter, libraries: how to access at different levels. methods and methodological concerns. ethical concerns, legal concerns.

tweetdeck for advanced Twitter searches. quoting, likes is relevant, but not enough, sometimes screenshot

engagement option

social listening platforms: crimson hexagon, parsely, sysomos – not yet academic platforms, tools to setup queries and visualization, but difficult to algorithm, the data samples etc. open sources tools (Urbana, Social Media microscope: SMILE (social media intelligence and learning environment) to collect data from twitter, reddit and within the platform they can query Twitter. create trend analysis, sentiment analysis, Voxgov (subscription service: analyzing political social media)

graduate level and faculty research: accessing SM large scale data web scraping & APIs Twitter APIs. Jason script, Python etc. Gnip Firehose API ($) ; Web Scraper Chrome plugin (easy tool, Python and R created); Twint (Twitter scraper)

Facepager (open source) if not Python or R coder. structure and download the data sets.

TAGS archiving google sheets, uses twitter API. anything older than 7 days is not available, so harvest every week.

social feed manager (GWUniversity) – Justin Litman with Stanford. Install on server but allows much more.

legal concerns: copyright (public info, but not beyond copyrighted). fair use argument is strong, but cannot publish the data. can analyze under fair use. contracts supersede copyright (terms of service/use) licensed data through library.

methods: sampling concerns tufekci, 2014 questions for sm. SM data is a good set for SM, but other fields? not according to her. hashtag studies: self selection bias. twitter as a model organism: over-represented data in academic studies.

methodological concerns: scope of access – lack of historical data. mechanics of platform and context: retweets are not necessarily endorsements.

ethical concerns. public info – IRB no informed consent. the right to be forgotten. anonymized data is often still traceable.

table discussion: digital humanities, journalism interested, but too narrow. tools are still difficult to find and operate. context of the visuals. how to spread around variety of majors and classes. controversial events more likely to be deleted.

takedowns, lies and corrosion: what is a librarian to do: trolls, takedown,

++++++++++++++

VR in library

Crague Cook, Jay Ray

the pilot process. 2017. 3D printing, approaching and assessing success or failure.  https://collegepilot.wiscweb.wisc.edu/

development kit circulation. familiarity with the Oculus Rift resulted in lesser reservation. Downturn also.

An experience station. clean up free apps.

question: spherical video, video 360.

safety issues: policies? instructional perspective: curating, WI people: user testing. touch controllers more intuitive than Xbox controller. Retail Oculus Rift

app Sketchfab. 3modelviewer. obj or stl file. Medium, Tiltbrush.

College of Liberal Arts at the U has their VR, 3D print set up.
Penn State (Paul, librarian, kinesiology, anatomy programs), Information Science and Technology. immersive experiences lab for video 360.

CALIPHA part of it is xrlibraries. libraries equal education. content provider LifeLiqe STEM library of AR and VR objects. https://www.lifeliqe.com/

+++++++++++++++++

Access for All:

https://sched.co/JAXn

accessibility

Leah Root

bloat code (e.g. cleaning up MS Word code)

ILLiad Doctype and Language declaration helps people with disabilities.

https://24ways.org/

 

+++++++++++++++++++

A Seat at the Table: Embedding the Library in Curriculum Development

https://sched.co/JAY5

embedded librarian

embed library resources.

librarians, IT staff, IDs. help faculty with course design, primarily online, master courses. Concordia is GROWING, mostly because of online students.

solve issues (putting down fires, such as “gradebook” on BB). Librarians : research and resources experts. Librarians helping with LMS. Broadening definition of Library as support hub.
