Getting Familiar with OpenRefine
Contributing to an Open Source platform as an Outreachy applicant.
Oi, I made it to Outreachy's contribution phase, which requires that I pick an open source project and contribute to it for a month, based on my skills.
If you have no idea what Outreachy is, you can find everything you need to know about it on the Outreachy website.
This post is meant to share my first week's experience contributing to OpenRefine as a UI/UX designer.
Come with me:
Before I made it as a contributor, I had gone through the project list a couple of times. Because information about the projects was limited at the time, I could not settle on a particular project despite the tons of design projects available.
After the announcement, when I realized I had made it, my anxiety hit the high heavens. I struggled internally because I was fixated on picking the right project for me. I had questions like, "How do I pick the right project? What if I fail to grasp the basics of the project? What if there's too much competition and I fall off?"
I eventually marked out a couple of projects and emailed the mentors. Then I read the documentation of each project. A lot of them seemed vague at the time, but I found myself more focused on OpenRefine.
It wasn't long before the mentor replied to my email (the only mentor who wrote back).
So it made sense to contribute to the project.
However, as someone with no knowledge of data analysis who constantly runs away from anything Excel-related, the OpenRefine app felt very alien to me. I struggled to understand what facets meant, how to clean up data, what the numerous options and actions meant, and how to execute them. At some point, it felt like I had an information overload and my head was literally on freeze mode.
Luckily, the mentors in the OpenRefine forum provided us with guidelines, encouraged us to read the documentation and manuals, and recommended YouTube videos to help us understand the basics of OpenRefine before contributing to issues on GitHub.
I think I've read more articles in this one week than I've read in the last two months 😅
After taking the provided guidelines into account, reading the OpenRefine manual and tutorials, and watching a series of YouTube videos (which incredibly reduced my anxiety), here are the basics of OpenRefine that I have gotten familiar with in one week and that I believe will help new contributors and users of the app.
1. Understanding OpenRefine.
What is OpenRefine?
OpenRefine is an open-source desktop application for data cleanup and transformation to other formats, commonly known as data wrangling. With its spreadsheet-like interface and powerful data parsing capabilities, it's the ultimate tool for anyone who works with data.
Why do we need OpenRefine?
Data cleaning: OpenRefine allows you to easily clean and transform your data, removing errors, duplicates, and inconsistencies. This is essential for ensuring that your data is accurate and reliable.
Data exploration: OpenRefine's faceting feature allows you to quickly explore your data and gain insights into patterns and trends that might not be immediately apparent.
Data transformation: OpenRefine can transform your data into different formats, making it easy to export your data to other tools or platforms.
Open source: OpenRefine is an open-source tool, which means that it's free to use and can be modified and extended by anyone. This makes it an accessible and flexible tool for a wide range of users.
2. Getting Familiar with OpenRefine's Interface
Creating a project
Creating a project in OpenRefine is easy. You can import data from a variety of sources, including files on your computer, data from a web address, or even data copied from your clipboard.
The options include:
Uploading a file from the computer
Downloading data from a web address (URL)
Pasting raw data from your clipboard into a plain text box
Importing a public Google Spreadsheet through its URL
Previewing data
Once your data is uploaded, you can preview it and customise how OpenRefine reads it.
The boxes at the bottom right allow you to further customise how OpenRefine reads in your data. This is where you can tell the program that your columns of data have no headers (titles), or that there are lines you would like it to skip.
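For readers who think in code, here is a rough Python sketch of what those two options (skipping lines and reading data without headers) amount to. It is only an illustration and not how OpenRefine itself is implemented; the function name read_table, the sample data, and the generated "Column 1" style names are all assumptions made for this example.

```python
import csv


def read_table(text, skip_lines=0, has_header=True):
    """Parse CSV text, optionally skipping leading lines and handling headerless data."""
    # Drop the first `skip_lines` lines (for example, an export note above the table).
    lines = text.splitlines()[skip_lines:]
    rows = list(csv.reader(lines))
    if not rows:
        return [], []
    if has_header:
        # The first remaining row supplies the column titles.
        header, data = rows[0], rows[1:]
    else:
        # No titles in the data: make up generic ones and keep every row as data.
        header = [f"Column {i + 1}" for i in range(len(rows[0]))]
        data = rows
    return header, data


# Made-up sample: one junk line, then a header row, then two data rows.
raw = "Exported on Monday\nname,city\nAda,Lagos\nTobi,Abuja\n"
header, data = read_table(raw, skip_lines=1)
print(header)  # ['name', 'city']
print(data)    # [['Ada', 'Lagos'], ['Tobi', 'Abuja']]
```

Calling read_table with has_header=False treats every remaining line as data, which is the situation the "no headers" option is meant for.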
Viewing your projects
Once your project is created, you can view it at any time by clicking on "Open Project" on the left sidebar. It will open up a list of the projects you already have in OpenRefine.
3. Using OpenRefine
Rows and Records
Rows will display your data in individual lines.
Records will show your data based on relationships, meaning that related data elements will display across multiple lines.
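The difference is easier to see with a small sketch. As I understand it from the manual, a record starts at a row that has a value in the first column, and any rows below it with a blank first cell belong to the same record. The Python below imitates that grouping on made-up data; group_into_records is a hypothetical helper written for this post, not an OpenRefine function.

```python
def group_into_records(rows):
    """Group rows into records: a row with a blank first cell joins the record above it."""
    records = []
    for row in rows:
        if row[0].strip() or not records:
            # A value in the first column starts a new record.
            records.append([row])
        else:
            # A blank first cell means this row continues the previous record.
            records[-1].append(row)
    return records


# Made-up data: Ada has two email addresses, so she spans two rows.
rows = [
    ["Ada", "ada@example.com"],
    ["", "ada@work.example"],
    ["Tobi", "tobi@example.com"],
]
records = group_into_records(rows)
print(len(rows))     # 3
print(len(records))  # 2
```

So the same data counts as three rows but only two records, which is why the numbers at the top of the grid change when you switch modes.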
Text faceting
One of the most useful features of OpenRefine is text faceting.
This feature allows you to organise unique items in a specific column of your data by name and count how many rows or records possess that item name.
To use text faceting, you simply select the column you want to facet, hover over "Facet" in the dropdown menu, and then choose "Text facet." From there, you can edit or include the entries that meet your criteria.
When you hover your mouse over one of the entries, you will be shown the options to edit or include.
Selecting "include" will cause the main viewer to display only the entries you have included. The included entries will now be coloured red and bolded.
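If it helps to picture what a text facet does behind the scenes, it is essentially a count of each unique value in a column, plus a filter when you include one of them. Here is a minimal Python sketch under that assumption; the helper names text_facet and include and the sample rows are invented for illustration and are not part of OpenRefine.

```python
from collections import Counter


def text_facet(rows, column):
    """Count how many rows hold each unique value in the given column."""
    return Counter(row[column] for row in rows)


def include(rows, column, value):
    """Keep only the rows whose value in the column matches the included entry."""
    return [row for row in rows if row[column] == value]


# Made-up sample data.
rows = [
    {"name": "Ada", "city": "Lagos"},
    {"name": "Tobi", "city": "Abuja"},
    {"name": "Kemi", "city": "Lagos"},
]

facet = text_facet(rows, "city")
print(facet.most_common())  # [('Lagos', 2), ('Abuja', 1)]

lagos_only = include(rows, "city", "Lagos")
print([row["name"] for row in lagos_only])  # ['Ada', 'Kemi']
```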
Undo/Redo
If you make a mistake or want to undo an action, OpenRefine makes it easy. Just click on the "Undo/Redo" tab in the upper left-hand corner of the screen, and you can see the changes you have made since creating the project.
Editing Columns
You can manipulate your data by removing columns, or even collapsing columns to declutter your data.
For instance, let's say we want to remove the ObjectId column from our dataset. To remove that column, you would navigate to the dropdown arrow beside the "ObjectId" header. In that dropdown, you would select "Edit column," and then "Remove this column."
Collapsing a column in OpenRefine is similar to hiding a column in an Excel or CSV file. For instance, let's say we want to collapse the Facility column in our dataset.
To collapse that column, you would navigate to the dropdown arrow beside the "Facility" header. In that dropdown, you would select "View," and then "Collapse this column."
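The practical difference between the two is that removing a column takes its data out of the project, while collapsing only hides it from view. A small Python sketch of that idea, assuming a table stored as a header list plus rows; remove_column, visible_columns, and the sample values are hypothetical and only meant to illustrate the distinction.

```python
def remove_column(header, rows, name):
    """Return a new header and rows with the named column taken out entirely."""
    i = header.index(name)
    new_header = header[:i] + header[i + 1:]
    new_rows = [row[:i] + row[i + 1:] for row in rows]
    return new_header, new_rows


def visible_columns(header, collapsed):
    """Return the columns still shown; collapsed ones are hidden, not deleted."""
    return [name for name in header if name not in collapsed]


# Made-up sample table.
header = ["ObjectId", "Facility", "City"]
rows = [["1", "Clinic A", "Lagos"], ["2", "Clinic B", "Abuja"]]

# Removing ObjectId changes the data itself.
new_header, new_rows = remove_column(header, rows, "ObjectId")
print(new_header)  # ['Facility', 'City']
print(new_rows)    # [['Clinic A', 'Lagos'], ['Clinic B', 'Abuja']]

# Collapsing Facility only changes what is displayed; `header` and `rows` are untouched.
print(visible_columns(header, {"Facility"}))  # ['ObjectId', 'City']
```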
In conclusion, OpenRefine is an indispensable tool for anyone who works with data and I honestly believe the next few weeks contributing to this project and learning to navigate its interface will be absolutely amazing.