Posts Tagged ‘Python’

python to clean data

7 Simple Python Functions to Clean Your Data

Fábio Neves  Jan 9

python

  • Merging all files from a specific folder
  • Editing every file in the same folder and re-saving it
  • Cleaning the header of your datasets
  • Splitting dataframe columns into two or more columns
  • Filtering specific dataframe columns based on their column names
  • Calculating the number of days between two dates
  • Calculating the number of weeks/months/years between two dates
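The article works with pandas DataFrames. As a dependency-free sketch of several of the tasks listed above (merging CSV files from a folder, cleaning headers, splitting a value into columns, filtering columns by name, and counting days/weeks/months/years between dates), the following uses only Python's standard library. The function names are hypothetical, not taken from the article, and the header rule (trim, lowercase, non-alphanumeric runs become `_`) is an assumption.

```python
import csv
import re
from datetime import date
from pathlib import Path


def clean_header(name: str) -> str:
    """Assumed rule: trim, lowercase, and turn runs of other characters into '_'."""
    name = re.sub(r"[^0-9a-z]+", "_", name.strip().lower())
    return name.strip("_")


def merge_csv_folder(folder: str, pattern: str = "*.csv") -> list:
    """Read every CSV in `folder` matching `pattern` (sorted by name) into one list of row dicts."""
    rows = []
    for path in sorted(Path(folder).glob(pattern)):
        with path.open(newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                # Clean each file's header so columns line up across files;
                # extra unnamed fields (DictReader key None) are dropped.
                rows.append({clean_header(k): v for k, v in row.items() if k is not None})
    return rows


def split_value(value: str, sep: str = " ", parts: int = 2) -> list:
    """Split one cell into exactly `parts` pieces, padding with '' when it is short."""
    pieces = value.split(sep, parts - 1)
    return pieces + [""] * (parts - len(pieces))


def filter_columns(row: dict, keyword: str) -> dict:
    """Keep only the columns whose name contains `keyword`."""
    return {k: v for k, v in row.items() if keyword in k}


def days_between(start: date, end: date) -> int:
    return (end - start).days


def whole_weeks_between(start: date, end: date) -> int:
    return days_between(start, end) // 7


def whole_months_between(start: date, end: date) -> int:
    """Complete calendar months from start to end (assumes end >= start)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the final month is not complete yet
    return months


def whole_years_between(start: date, end: date) -> int:
    return whole_months_between(start, end) // 12
```

For example, `clean_header("  First Name ")` gives `"first_name"`, and from 31 January 2020 to 1 March 2020 `days_between` returns 30 while `whole_months_between` returns 1, because the second month is not yet complete.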

++++++++++++++++
more on Python in this IMS blog
https://blog.stcloudstate.edu/ims?s=python

free classes


Skill acquisition is to a person what a system update is to a device: you always come out of it more performant. Google 'Online Learning at Harvard' to see their full catalog of all 62 free online classes they offer. Each online class ranges from 6 to 8 weeks in length, so you'll really be going in-depth on the topic you choose to learn. Let's seize this opportunity and come out of this quarantine as upgraded people. We got this! #skills #skill #programming #python #java #harvard #programmer #machinelearning #ai #artificialintelligence

A post shared by WORLDWIDE ENGINEERING 🌍 (@worldwide_engineering) on Instagram

https://online-learning.harvard.edu/catalog/free

Harvard Free Classes

Code in Place free Python course

Stanford University’s Computer Science department is holding a unique MOOC called ‘Code in Place.’ This is a free course to learn Python. It is a live class environment, not a typical video-based curriculum. (via r/programming)

https://compedu.stanford.edu/codeinplace/announcement/

+++++++++++++
more on Python in this IMS blog
https://blog.stcloudstate.edu/ims?s=python

Software Carpentry Workshop at SCSU Python

Registration is now open for the workshop: 

>>>>>>>>>  https://ntmoore.github.io/2018-06-02-stcloud/ <<<<<<<<<<<<<

Syllabus:

The Unix Shell

  • Files and directories
  • History and tab completion
  • Pipes and redirection
  • Looping over files
  • Creating and running shell scripts
  • Finding things
  • Reference…

Programming in Python

  • Using libraries
  • Working with arrays
  • Reading and plotting data
  • Creating and using functions
  • Loops and conditionals
  • Defensive programming
  • Using Python from the command line
  • Reference…
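Two of the Python topics above, defensive programming and using Python from the command line, can be illustrated with a small script. This is a minimal sketch, not the workshop's lesson code: the script name `normalize.py` and the normalize-to-[0, 1] task are hypothetical. Assertions state the assumptions about the input (pre-conditions) and the result (post-condition), and `sys.argv` supplies the numbers from the shell.

```python
import sys
from statistics import mean


def normalize(values: list) -> list:
    """Scale non-negative values to the range [0, 1]; assertions guard the assumptions."""
    # Pre-conditions: there is data, and readings are assumed to be non-negative.
    assert len(values) > 0, "no values to normalize"
    assert all(v >= 0 for v in values), "negative reading found"
    largest = max(values)
    assert largest > 0, "cannot normalize: all values are zero"
    result = [v / largest for v in values]
    # Post-condition: every output lies in [0, 1].
    assert all(0.0 <= r <= 1.0 for r in result)
    return result


def main(argv: list) -> int:
    """Usage (hypothetical script name): python normalize.py 3 1 4 1 5"""
    if len(argv) < 2:
        print("usage: normalize.py NUMBER [NUMBER ...]", file=sys.stderr)
        return 1
    values = [float(arg) for arg in argv[1:]]
    scaled = normalize(values)
    print("scaled:", scaled)
    print("mean of scaled values:", round(mean(scaled), 3))
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

One design caveat: Python skips `assert` statements when run with `python -O`, so assertions suit checks during development and analysis rather than validation that must always run.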

 

https://en.wikipedia.org/wiki/GNU_nano

https://swcarpentry.github.io/shell-novice/03-create/

http://pad.software-carpentry.org/2018-06-02-stcloud

Jupyter is an IDE (integrated development environment): https://en.wikipedia.org/wiki/Integrated_development_environment

https://searchcloudcomputing.techtarget.com/definition/Infrastructure-as-a-Service-IaaS

JSON is the file format in which Jupyter notebook data is stored. Text cells use HTML and Markdown (a simplified markup that renders to HTML).
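Because a notebook (`.ipynb`) file is plain JSON, it can be inspected with Python's built-in `json` module. A minimal sketch, assuming the nbformat version 4 layout: a top-level `"cells"` list whose entries carry a `"cell_type"` (`"code"`, `"markdown"` or `"raw"`) and a `"source"` that may be either one string or a list of lines. The tiny notebook below is hand-written for illustration only.

```python
import json
from collections import Counter


def summarize_notebook(text: str) -> Counter:
    """Count the cells of a notebook's JSON text by cell type."""
    nb = json.loads(text)
    return Counter(cell["cell_type"] for cell in nb.get("cells", []))


def cell_source(cell: dict) -> str:
    """In nbformat 4, 'source' may be a single string or a list of lines."""
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


# Hand-written two-cell notebook, for illustration only.
example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Title\n", "Some *Markdown* text."]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": "print('hello')"},
    ],
})

if __name__ == "__main__":
    print(summarize_notebook(example))
    print(cell_source(json.loads(example)["cells"][0]))
```

To inspect a real notebook, read it as text first, e.g. `summarize_notebook(open("analysis.ipynb", encoding="utf-8").read())`, where the file name is hypothetical.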

pandas: https://pandas.pydata.org/

React OS (JS) https://en.wikipedia.org/wiki/ReactOS

 

+++++++++++++++++++
more on Software Carpentry workshops in this IMS blog
https://blog.stcloudstate.edu/ims/2017/10/26/software-carpentry-workshop/

Reproducibility Librarian

Reproducibility Librarian? Yes, That Should Be Your Next Job

https://www.jove.com/blog/2017/10/27/reproducibility-librarian-yes-that-should-be-your-next-job/
Vicky Steeves (@VickySteeves) is the first Research Data Management and Reproducibility Librarian.
Reproducibility is made so much more challenging by computers and by the dominance of the closed-source operating systems and analysis software that researchers use. Ben Marwick wrote a great piece called ‘How computers broke science – and what we can do to fix it’ which details a bit of the problem. Basically, computational environments affect the outcome of analyses (Gronenschild et al. (2012) showed that the same data and analyses gave different results on two versions of macOS), and they are exceptionally hard to reproduce, especially when the license terms don’t allow it. Additionally, programs encode data incorrectly and studies draw erroneous conclusions; e.g., Microsoft Excel converts gene names into dates, which affects 1/5 of published data in leading genome journals.
Technology to capture computational environments, workflow, provenance, data, and code is hugely impactful for reproducibility. It’s been the focus of my work, in supporting an open source tool called ReproZip, which packages all computational dependencies, data, and applications in a single distributable package that others can reproduce across different systems. There are other tools that fix parts of this problem: Kepler and VisTrails for workflow/provenance, Packrat for saving specific R packages at the time a script is run so that updates to dependencies won’t break it, Pex for generating executable Python environments, and o2r for executable papers (including data, text, and code in one).
[…] plugin for Jupyter notebooks), and added a user interface to make it friendlier to folks not comfortable on the command line.
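As a much smaller illustration of capturing a computational environment (this is not ReproZip, Packrat or Pex, which do far more), the following sketch records the Python version, the operating system and the installed package versions as JSON, using only the standard library (`importlib.metadata`, Python 3.8+). Which fields to record is an assumption.

```python
import json
import platform
import sys
from importlib import metadata


def capture_environment() -> dict:
    """Collect basic facts about the running Python environment."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with missing or broken metadata
            packages[name] = dist.version
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "executable": sys.executable,
        # Sort package names case-insensitively so manifests diff cleanly.
        "packages": dict(sorted(packages.items(), key=lambda kv: kv[0].lower())),
    }


if __name__ == "__main__":
    print(json.dumps(capture_environment(), indent=2))
```

Saving the output next to the analysis, e.g. `python capture_env.py > environment.json` (both file names hypothetical), lets a later reader see which versions produced the results, although it does not by itself recreate that environment.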

I would also recommend going to conferences:

++++++++++++++++++++++++
more on big data in an academic library in this IMS blog
academic library collection data visualization

https://blog.stcloudstate.edu/ims/2017/10/26/software-carpentry-workshop/

https://blog.stcloudstate.edu/ims?s=data+library

more on library positions in this IMS blog:
https://blog.stcloudstate.edu/ims?s=big+data+library
https://blog.stcloudstate.edu/ims/2016/06/14/technology-requirements-samples/

on university library future:
https://blog.stcloudstate.edu/ims/2014/12/10/unviersity-library-future/

librarian versus information specialist

 

academic library collection data visualization

Finch, J. F., & Flenner, A. (2016). Using Data Visualization to Examine an Academic Library Collection. College & Research Libraries, 77(6), 765-778.

http://login.libproxy.stcloudstate.edu/login?qurl=http%3a%2f%2fsearch.ebscohost.com%2flogin.aspx%3fdirect%3dtrue%26db%3dllf%26AN%3d119891576%26site%3dehost-live%26scope%3dsite

p. 766
Visualizations of library data have been used to:
  • reveal relationships among subject areas for users
  • illuminate circulation patterns
  • suggest titles for weeding
  • analyze citations and map scholarly communications

Each unit of data analyzed can be described as topical, asking “what.”[6]
  • What is the number of courses offered in each major and minor?
  • What is expended in each subject area?
  • What is the size of the physical collection in each subject area?
  • What is student enrollment in each area?
  • What is the circulation in specific areas for one year?
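Several of these “what” questions reduce to grouping item-level records by subject and summing. A minimal sketch, assuming each record is a dict with hypothetical `subject`, `cost` and `checkouts` fields (for example, exported from the library system); the article does not prescribe this data layout.

```python
from collections import defaultdict


def summarize_by_subject(items: list) -> dict:
    """Per subject: number of titles, total expenditure, and total circulation.

    Each item is assumed to be a dict with 'subject', 'cost' and 'checkouts' keys.
    """
    summary = defaultdict(lambda: {"titles": 0, "expenditure": 0.0, "circulation": 0})
    for item in items:
        row = summary[item["subject"]]
        row["titles"] += 1
        row["expenditure"] += item["cost"]
        row["circulation"] += item["checkouts"]
    return dict(summary)
```

The same grouping pattern extends to course counts per major or enrollment per area, provided those records share a common subject key with the collection data, which is usually the hardest part in practice.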

libraries, if they are to survive, must rethink their collecting and service strategies in radical and possibly scary ways and to do so sooner rather than later. Anderson predicts that, in the next ten years, the “idea of collection” will be overhauled in favor of “dynamic access to a virtually unlimited flow of information products.”  My note: in essence, the fight between Mark Vargas and the Acquisition/Cataloguing people

The library collection of today is changing, affected by many factors, such as demand-driven acquisitions, access, streaming media, interdisciplinary coursework, ordering enthusiasm, new areas of study, political pressures, vendor changes, and the individual faculty member following a focused line of research.

subject librarians may see opportunities in looking more closely at the relatively unexplored “intersection of circulation, interlibrary loan, and holdings.”

Using Visualizations to Address Library Problems

the difference between graphical representations of environments and knowledge visualization, which generates graphical representations of meaningful relationships among retrieved files or objects.

Exhaustive lists of data visualization tools include:
  • the DIRT Directory (http://dirtdirectory.org/categories/visualization)
  • Kathy Schrock’s educating through infographics (www.schrockguide.net/infographics-as-an-assessment.html)
  • Dataviz list of online tools (www.improving-visualisation.org/case-studies/id=5)

Visualization tools explored for this study include Plotly, Microsoft Excel, the Python programming language, and D3.js, a JavaScript library for creating documents based on data. Tableau Public©

A video by Eugene O’Loughlin, National College of Ireland, is very helpful for composing the charts and can be found here: https://youtu.be/4FyImh2G7N0.

p. 771
By looking at the data (my note – by visualizing the data), more questions are revealed. The visualizations provide greater comprehension than the two-dimensional “flatland” of the spreadsheets, in which valuable questions and insights are lost in the columns and rows of data.

By looking at data visualized in different combinations, library collection development teams can clearly compare important considerations in collection management: expenditures and purchases, circulation, student enrollment, and course hours. Library staff and administrators can make funding decisions or begin dialog based on data free from political pressure or from the influence of the squeakiest wheel in a department.
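Comparing subjects of very different sizes is easier once raw totals are combined into ratios. A minimal sketch, assuming per-subject totals are already available as dicts keyed by subject name (an assumption about the data layout); the plain-text bar chart is only a quick stand-in for the tools the article actually used (Plotly, Excel, Python, D3.js).

```python
def compare_subjects(expenditure: dict, circulation: dict, enrollment: dict) -> dict:
    """Combine per-subject totals into ratios comparable across subjects.

    Only subjects present in all three inputs are compared; zero denominators give 0.0.
    """
    result = {}
    for subject in sorted(expenditure.keys() & circulation.keys() & enrollment.keys()):
        students = enrollment[subject]
        spent = expenditure[subject]
        result[subject] = {
            "spend_per_student": spent / students if students else 0.0,
            "checkouts_per_student": circulation[subject] / students if students else 0.0,
            "checkouts_per_dollar": circulation[subject] / spent if spent else 0.0,
        }
    return result


def text_bar_chart(values: dict, width: int = 40) -> str:
    """Render non-negative values as a horizontal plain-text bar chart."""
    if not values:
        return ""
    largest = max(values.values()) or 1.0  # avoid dividing by zero
    label_width = max(len(label) for label in values)
    lines = []
    for label, value in values.items():
        bar = "#" * round(width * value / largest)
        lines.append(f"{label:<{label_width}} | {bar} {value:.2f}")
    return "\n".join(lines)
```

Plotting one ratio at a time, e.g. `text_bar_chart({s: r["checkouts_per_dollar"] for s, r in ratios.items()})`, mirrors the article's point that looking at the same data in different combinations raises different questions.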

+++++++++++++++
more on data visualization for the academic library in this IMS blog
https://blog.stcloudstate.edu/ims?s=data+visualization
