Before we start to batch and process any CSV records,
00:03
we are going to have a quick lesson on PHP generators. This is crucial because when we're reading a CSV file with a huge number of rows, if we don't use generators,
00:17
we're going to run into some memory issues. And this will just prime you for the next episode where we're going to be creating another class to again use PHP generators
00:27
but to batch our CSV records into 100,000 rows that we can process with a job. So let's just take a look at what a PHP generator is if you've never heard of one and how these can help.
00:41
So I've created a really simple route here. It's completely empty at the moment and I've got this open in the browser. We're just going to go ahead and iterate over
00:49
maybe a million or even as big as a billion records and just dump them out on the page. So let's imagine that we had, let's just say a million rows in our database.
01:02
Or rather, in our CSV file, not the database. We can simulate this with the range function in PHP, so let's go ahead and use range to generate an array.
01:16
So this will give us an array of a million records starting at one and ending at one million. So what we can do with this is iterate over it. That's kind of what we're going to be doing
01:26
as we start to process each of the rows inside of the CSV. So I'm just going to alias each item as $number. Now let's go ahead and just do a var_dump on the number that we get out here
01:38
and let's head straight over to the browser and just give this a refresh. So you get a huge amount of records here. It looks like this has worked really nicely
01:47
but let's start to bump this number up (remember, these are just very simple numbers, so there's not really much data in here) and give this a refresh.
01:55
What we'll eventually find is that we have exhausted the memory limit. So of course there are a couple of solutions. We could bump up the memory limit,
02:05
probably not ideal because we really shouldn't be dealing with huge amounts of memory like this or we can use PHP generators.
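As a rough sketch of the range approach described so far: the count below is deliberately tiny and is an assumption made so the snippet runs comfortably. The video pushes it far higher until PHP hits its memory limit.

```php
<?php

// Assumed small count for illustration; raising it far enough
// will eventually exhaust PHP's memory limit.
$count = 10;

// range() builds the complete array in memory before the loop starts.
$numbers = range(1, $count);

foreach ($numbers as $number) {
    var_dump($number);
}
```

The key point is that the whole array exists before the first var_dump runs, so memory use grows with the count.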
02:13
So what we're going to do is we're going to build out a function that allows us to iterate over this amount of numbers and keep in the back of your mind
02:20
that these could be CSV records, which could each contain a lot of data on their own, and we're going to do this using PHP generators instead.
02:29
So we're going to build out a function for this. So let's just create out a function and we'll just say range of numbers. Of course, we're just dealing with numbers here.
02:36
This isn't a real life scenario that you would use this for and we're going to take the number in here. We don't need to necessarily do that
02:45
but this is kind of what we're going to be doing when we create out the class to batch these CSV records together. So we're going to do exactly the same thing here.
02:52
We're still going to have a foreach loop. So let's go ahead and foreach over this. We're not, however, going to use the range function because that's going to completely
03:00
defeat the point of this. The range function is going to build up this array in memory and that's what's causing the error that we see here. So what we're going to do instead
03:08
is we're going to use a for loop, and we're going to initialize the counter to one, for example, and let's just call this $i.
03:18
That tends to be the more usual name. We're going to loop while $i is less than or equal to the number we pass in, and we'll take the same number so we can prove that we can iterate
03:26
over the same amount of records but using a PHP generator and then I'm going to explain to you what a PHP generator is actually doing in the background.
03:34
Now in this function, what we would normally do is maybe go ahead and return a number. We'd maybe build up an array in here but we're not actually going to do this.
03:42
This function is going to become a generator by yielding $i, and that's pretty much all we need to do. So this is a really super simple example of a PHP generator
03:53
but it's going to demonstrate that we won't run out of memory here and I'll explain exactly what's happening here in a minute. So now what we can actually do
04:00
is iterate over the result of this function. So let's go ahead and alias each value as $num again and var_dump it. Now PHP won't run out of memory
04:10
but it's likely that my browser might because we're of course going to be generating a huge amount of records and let's actually just take this number here
04:17
and pass this through as the argument. It doesn't matter either way. Okay, so we've built up a new function which uses yield instead.
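The function built up over the last few minutes might look roughly like this. The name rangeOfNumbers and the small argument are assumptions for illustration; the video uses a much larger number.

```php
<?php

// Hypothetical name, following "range of numbers" from the narration.
// Using yield anywhere in the body makes this a generator function.
function rangeOfNumbers(int $number): Generator
{
    for ($i = 1; $i <= $number; $i++) {
        // Hand back one value, then pause until the loop asks for the next.
        yield $i;
    }
}

// Small argument assumed here so the output stays readable.
foreach (rangeOfNumbers(10) as $num) {
    var_dump($num);
}
```

Calling rangeOfNumbers does not run the for loop up front. It gives back a Generator object, and the loop body only advances as the foreach pulls values from it.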
04:24
We then go ahead and call this and iterate over it. Now let's go over and take a look. I'm going to give this a refresh and you can see that the actual PHP file itself
04:34
or the script itself did not crash. However, my browser is probably about to so I'm just going to stop this processing because there's a huge amount of output here.
04:42
So that has got rid of the memory issue and that's exactly what we're going to be doing when we're processing through potentially millions of CSV rows.
04:50
If we load all of those CSV rows into memory at once, what's going to happen? Well, we're probably going to exhaust the memory limit of PHP.
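To make the difference concrete, here is a rough way to compare the two approaches with memory_get_usage. The count of 100,000 and the rangeOfNumbers name are assumptions for this sketch, and the exact byte counts will vary by PHP version and platform.

```php
<?php

// Same hypothetical generator as in the walkthrough.
function rangeOfNumbers(int $number): Generator
{
    for ($i = 1; $i <= $number; $i++) {
        yield $i;
    }
}

$count = 100000; // assumed size, small enough to stay under a default memory limit

// Array approach: every value is held in memory at once.
$before = memory_get_usage();
$numbers = range(1, $count);
$arrayBytes = memory_get_usage() - $before;
unset($numbers);

// Generator approach: only the current value exists at any moment.
$before = memory_get_usage();
$sum = 0;
foreach (rangeOfNumbers($count) as $num) {
    $sum += $num;
}
$generatorBytes = memory_get_usage() - $before;

printf("range():   %d bytes\n", $arrayBytes);
printf("generator: %d bytes\n", $generatorBytes);
```

The array figure grows with the count, while the generator figure should stay roughly flat however many values are produced.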
04:58
So let's just talk about what's going on here. When we iterate over this, what it's not doing is building up this entire array in memory.
05:08
What it's doing is, as we iterate over it, producing the next value on demand through an iterator. So this is how the package,
05:17
the League CSV package that we've pulled in actually works. It gives us this generator that we can then use to iterate over records but we can't iterate over them in chunks.
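To picture what reading records lazily means, here is a plain PHP stand-in. This is not the League CSV API, just a sketch of the same idea using fgetcsv; the readCsvRows helper and the sample data are hypothetical.

```php
<?php

/**
 * Yields one parsed row at a time from an open CSV stream.
 * Hypothetical helper for illustration; not part of League CSV.
 *
 * @param resource $handle
 */
function readCsvRows($handle): Generator
{
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        // fgetcsv() returns [null] for a blank line; skip those.
        if ($row === [null]) {
            continue;
        }

        yield $row;
    }
}

// An in-memory stream stands in for a real CSV file here.
$handle = fopen('php://memory', 'w+');
fwrite($handle, "id,name\n1,Alice\n2,Bob\n");
rewind($handle);

foreach (readCsvRows($handle) as $row) {
    var_dump($row);
}

fclose($handle);
```

Only one row is parsed and held at a time, so the size of the file does not dictate how much memory the loop needs.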
05:28
What we have to do is do them per file. Now what we're eventually going to do is create out a job maybe called process CSV records, something like that.
05:39
And what we don't want to do is pass just one record into this as an argument. So we don't want to take a single record.
05:46
We want to take multiple records. Now with the package that we've pulled in, what we're going to have to do is create our own generator
05:54
that we can use to iterate over a batch of these. So we're going to load in all of the records or load in the iterator or the generator that we get from the league CSV package.
06:05
And then we're going to go ahead and create a batch of these. So the first batch will contain 100. The second batch will contain 100 and so on.
06:13
But again, we're not going to load all of these batches into memory in a single array. We're going to do pretty much what we've just done, but yield a batch each time we iterate.
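The batching idea just described could be sketched like this. It is not the class built in the next episode; the batchRecords and rangeOfNumbers names are hypothetical, and numbers stand in for CSV records.

```php
<?php

// Stand-in for the record iterator; numbers play the role of CSV rows.
function rangeOfNumbers(int $number): Generator
{
    for ($i = 1; $i <= $number; $i++) {
        yield $i;
    }
}

// Hypothetical helper: groups any iterable into arrays of at most $size items,
// yielding each batch as soon as it is full.
function batchRecords(iterable $records, int $size): Generator
{
    $batch = [];

    foreach ($records as $record) {
        $batch[] = $record;

        if (count($batch) === $size) {
            yield $batch;
            $batch = []; // start a fresh batch so only one is held at a time
        }
    }

    // Yield whatever is left over when the total is not a multiple of $size.
    if ($batch !== []) {
        yield $batch;
    }
}

foreach (batchRecords(rangeOfNumbers(250), 100) as $batch) {
    var_dump(count($batch));
}
```

Each iteration holds a single batch in memory, which is the piece that would be handed to a job.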
06:23
And that means that we can build this up. So that's pretty much how a generator in PHP works, a memory efficient way of iterating over something. The example that we've just looked at
06:33
is not a good example because it has no real use in the real world. But in the next episode, we're going to look at how we can actually use this
06:40
to efficiently batch over or iterate over all of these records and start to process them. So let's head over to the next episode and get that done now.
Overview
Let's build a powerful CSV importer with Livewire, completely from scratch. This can handle millions of rows, be reused for multiple models, and by using job batches, doesn't require the browser to be open.
This course is for you if:
You need a robust importer component for your models that you have full control over
You want to brush up on some advanced Livewire concepts
You want to learn about job batching and queues in Laravel