In this episode, we're taking a big step forward in handling large CSV uploads efficiently! The main focus is on chunking through CSV records: breaking a potentially massive set of CSV data into smaller, more manageable batches (or "chunks") that can be processed one at a time, so the whole file never has to be loaded into memory at once. This is super useful if your users might upload files with hundreds of thousands or even millions of rows, where loading everything in one go would be a nightmare for your server.
We start by looking at the CSV import process and seeing firsthand what kind of data we get back after importing. We play around a bit with iterator methods, and I show you how each record can be looped over one at a time. That's cool, but not quite what we want, especially when we need to process lots of records at once without blowing up memory usage.
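To give you a rough picture of that record-by-record loop, here's a small sketch. It assumes the import hands back a lazy iterator from the league/csv package and a hypothetical `customers.csv` file; your importer may look a bit different, but the idea is the same:

```php
<?php

use League\Csv\Reader;

// Read the uploaded CSV; getRecords() returns an Iterator,
// so rows are pulled from the file lazily, one at a time.
$reader = Reader::createFromPath('customers.csv', 'r');
$reader->setHeaderOffset(0); // treat the first row as column headers

foreach ($reader->getRecords() as $record) {
    // $record is an associative array keyed by the header row,
    // e.g. ['name' => 'Jane', 'email' => 'jane@example.com'].
    var_dump($record);
}
```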
The hero of this episode is our new ChunkIterator class. We go through building it out, and then use it to create a generator that will yield chunks of records (like groups of 10, 100, or 1000 at a time, whatever you choose). This way, you're never hanging on to more records in memory than you need. We run some tests to see this in action with a sample CSV file, and verify that chunks are being created exactly as expected.
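To make the idea concrete, here's a minimal sketch of what a generator-based ChunkIterator might look like. The constructor signature and the `get()` method name are my own illustration rather than the exact code from the episode, but the shape is the same: collect records until the chunk is full, yield it, and start over.

```php
<?php

class ChunkIterator
{
    public function __construct(
        protected Iterator $iterator,
        protected int $chunkSize,
    ) {}

    /**
     * Yield arrays of at most $chunkSize records at a time.
     */
    public function get(): Generator
    {
        $chunk = [];

        foreach ($this->iterator as $record) {
            $chunk[] = $record;

            // Once the chunk is full, hand it off and start a fresh one.
            if (count($chunk) === $this->chunkSize) {
                yield $chunk;

                $chunk = [];
            }
        }

        // Don't forget the final, partially filled chunk.
        if (count($chunk)) {
            yield $chunk;
        }
    }
}
```

Because `get()` is a generator, only the chunk you're currently handling lives in memory; once you move on to the next one, the previous chunk can be garbage collected.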
Finally, we talk about how you can tweak the chunk size for your own needs, and why this approach is ideal for processing super large CSVs with background jobs. By the end, you've got the code necessary to split any CSV import into safe, memory-friendly batches—setting the stage for queuing up each chunk as a job in the next episode. If you’re still wondering about how generators work, now's a good time to give them a quick read, as they’re at the core of this chunking approach!
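As a rough usage sketch (again assuming league/csv and the illustrative ChunkIterator above), tweaking the chunk size is just a matter of changing the second constructor argument:

```php
<?php

use League\Csv\Reader;

// Build the lazy record iterator, then wrap it in the chunker.
$reader = Reader::createFromPath('customers.csv', 'r');
$reader->setHeaderOffset(0);

// 100 records per chunk here; bump this up or down to suit your data.
$chunks = (new ChunkIterator($reader->getRecords(), 100))->get();

foreach ($chunks as $i => $chunk) {
    // Each $chunk is a plain array of up to 100 records. In the next
    // episode, this is the spot where each chunk gets queued as a job.
    echo sprintf('Chunk %d holds %d records', $i + 1, count($chunk)), PHP_EOL;
}
```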