<aside> <img src="/icons/command-line_purple.svg" alt="/icons/command-line_purple.svg" width="40px" /> The code in this article is the first step in a data pipeline that ingests and transforms Energy Performance Certificate (EPC) data from the EPC open data API. The output of this step is a dictionary of JSON data collected from the API.
The next data processing step is data transformation.
The complete Jupyter notebook for this pipeline can be found on the Data Guidance GitHub Repository. This article covers PART 1: Data Ingestion.
</aside>
Here we cover the basics of APIs, including what they are, their types, how to find and use them, and how to work with JSON data.
To skip the theory and go straight to where we explain how to interface with the EPC open data API, go to understanding API parameters and usage and follow from there.
<aside> <img src="/icons/video-camera_purple.svg" alt="/icons/video-camera_purple.svg" width="40px" /> Video Walkthrough
Here is a video walkthrough of the Data Ingestion section of the Jupyter Notebook containing the example code.
</aside>
<aside> 👉 APIs are sets of rules that define how different pieces of software must communicate with each other in order to exchange data.
</aside>
API stands for Application Programming Interface and is a set of rules and protocols for creating and interacting with software applications or components of a data architecture. APIs determine how these various software components should interact and communicate with each other. In the context of data ingestion, APIs simplify the process of pulling data from various sources, enabling real-time or batch ingestion of data directly into a data storage system or database.
Using APIs for data ingestion brings several advantages. It can automate the data collection process, which reduces manual effort and builds trust in the data by reducing opportunities for human error. It also allows for real-time (aka ‘streaming’) use of data, which is crucial for use cases like real-time analytics, watching on-demand television, and immediate decision-making.
<aside> <img src="/icons/command-line_purple.svg" alt="/icons/command-line_purple.svg" width="40px" /> Why is it better?
There are other ways to retrieve data, such as web scraping, direct database access, file transfer, and cloud storage services.
API calls are commonly preferred for retrieving data because they are standardised, secure, and efficient. They allow for real-time data access, support automation of data retrieval processes, and enable easy integration with other services and applications.
APIs also provide a structured way to request and receive data, making it easier to manage and use the data across different platforms and technologies.
</aside>
There’s more than one type of API. APIs come in various forms, such as Web APIs for internet services, Open (or Public) APIs for external developers, and Internal APIs for company processes. The right choice depends on business needs and the intended use. Whatever the type, every API is built on an architecture that governs how it works.
The architecture of an API determines the method of data transmission and communication between systems. REST and SOAP are the two most common types of API architectures; they address different needs and situations:
REST (Representational State Transfer)
REST APIs (also known as ‘RESTful’ APIs) are stateless, meaning each request carries everything the server needs to process it, so calls can be made independently without the server having to remember previous requests.
They use standard HTTP methods (e.g. GET, POST, PUT, DELETE) and return data in JSON, XML, or other formats.
REST APIs are widely used because of their simplicity, scalability, and flexibility.
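To make the idea of a stateless REST call more concrete, here is a minimal sketch using only the Python standard library. The endpoint URL, the query parameters, the token, and the sample response body are all hypothetical and chosen purely for illustration; they are not the EPC API's actual values. The request is built but not sent, so the example runs without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint, used only for illustration (not a real API).
BASE_URL = "https://api.example.com/v1/certificates"


def build_get_request(params: dict, token: str) -> Request:
    """Build a GET request that carries everything the server needs.

    Because REST is stateless, the query parameters and the credentials
    travel with every call; nothing relies on an earlier request.
    """
    url = f"{BASE_URL}?{urlencode(params)}"
    headers = {
        "Accept": "application/json",        # ask for a JSON response
        "Authorization": f"Bearer {token}",  # hypothetical auth scheme
    }
    return Request(url, headers=headers, method="GET")


# Build (but do not send) a request with made-up parameters.
request = build_get_request({"postcode": "AB1 2CD", "size": 10}, token="my-token")
print(request.get_method(), request.full_url)

# A made-up JSON response body, parsed into a Python dictionary.
sample_body = '{"rows": [{"id": "abc-123", "rating": "C"}]}'
data = json.loads(sample_body)
print(data["rows"][0]["rating"])
```

In a real pipeline the request would be sent with `urllib.request.urlopen(request)` or a library such as `requests`, and the response body would replace `sample_body`.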
SOAP (Simple Object Access Protocol)
SOAP APIs are highly structured and use XML for their message format as well as other standards like Web Services Description Language (WSDL) for describing the API itself.
They are known for their security features and transactional reliability, making them suitable for enterprise-level applications.
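For comparison with REST, the sketch below shows roughly what a SOAP message looks like: an XML envelope wrapping a body that names the operation being called. The envelope namespace is the standard SOAP 1.1 one, but the service namespace, the `GetCertificate` operation, and the `CertificateId` field are hypothetical names invented for this example. In a real service they would come from its WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.com/certificates"         # hypothetical service namespace


def build_envelope(certificate_id: str) -> str:
    """Return a SOAP 1.1 envelope for a hypothetical GetCertificate operation."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    # The operation and its argument are made-up names for illustration.
    operation = ET.SubElement(body, f"{{{SERVICE_NS}}}GetCertificate")
    ET.SubElement(operation, f"{{{SERVICE_NS}}}CertificateId").text = certificate_id
    return ET.tostring(envelope, encoding="unicode")


message = build_envelope("abc-123")
print(message)

# Parse the message back to show how a receiver would read the argument.
namespaces = {"soap": SOAP_NS, "svc": SERVICE_NS}
root = ET.fromstring(message)
cert_id = root.find("soap:Body/svc:GetCertificate/svc:CertificateId", namespaces)
print(cert_id.text)
```

Even this small message is noticeably more verbose than the equivalent REST call, which is part of the trade-off between SOAP's strict structure and REST's simplicity.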