A while ago I benchmarked pytorch against numpy for fairly basic matrix operations (broadcasting, multiplication, inversion). I didn't run the benchmark across a variety of sizes, though. Pytorch seemed markedly faster than numpy, possibly because it was using more than one core (the hardware had a dozen cores). Is that a general rule, even when constraining pytorch to a single core?
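One way to check this yourself is to pin both libraries to one thread and time the same operations over several sizes. The sketch below is only a starting point: the helper names (`time_op`, `benchmark`) and the chosen sizes are made up for illustration, and it assumes the thread-count environment variables are set before numpy is first imported so the BLAS backend picks them up. If pytorch is not installed, only the numpy timings are collected.

```python
import os

# Assumption: these variables must be set before NumPy is first imported
# for the BLAS backend (OpenBLAS, MKL, or generic OpenMP) to honour them.
for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
    os.environ.setdefault(var, "1")

import time

import numpy as np

try:
    import torch
    torch.set_num_threads(1)  # restrict intra-op parallelism to one thread
except ImportError:
    torch = None  # the NumPy half of the comparison still runs


def time_op(fn, repeats=5):
    """Return the best wall-clock time in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def benchmark(sizes=(64, 256, 512)):
    """Time matrix multiplication and inversion for several matrix sizes."""
    rng = np.random.default_rng(0)
    results = []
    for n in sizes:
        a = rng.standard_normal((n, n))
        row = {
            "n": n,
            "numpy_matmul": time_op(lambda: a @ a),
            "numpy_inv": time_op(lambda: np.linalg.inv(a)),
        }
        if torch is not None:
            t = torch.from_numpy(a)  # same float64 data, no copy
            row["torch_matmul"] = time_op(lambda: t @ t)
            row["torch_inv"] = time_op(lambda: torch.linalg.inv(t))
        results.append(row)
    return results


for row in benchmark():
    print(row)
```

Taking the best of several repeats reduces noise from warm-up and other processes; for anything more serious a dedicated timing tool would be a better choice.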
As I'm trying to learn pytorch, I notice a heavy focus on tensors as a datatype, but I'm not really clear what their advantages are over numpy arrays. After all, numpy arrays can be 0, 1, 2, and even 3 dimensional, so I'm just unclear on the advantage of tensors, and I wanted to ask you guys, "When and why do we use tensors instead of numpy arrays?"
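For what it's worth, both types are N-dimensional, so the difference is not about the number of dimensions. The two features usually cited for tensors are automatic differentiation and GPU placement. A minimal sketch, assuming pytorch is installed (the variable names are just for illustration):

```python
import numpy as np
import torch

# A NumPy array and a tensor that share the same memory (no copy is made).
arr = np.array([1.0, 2.0, 3.0])
shared = torch.from_numpy(arr)
shared[0] = 10.0          # the change is visible through `arr` as well

# Autograd: tensors can record operations and compute gradients.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()        # y = x1^2 + x2^2 + x3^2
y.backward()              # fills x.grad with dy/dx = 2 * x

# Device placement: the same code runs on a GPU when one is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x_dev = x.detach().to(device)

print(arr)           # first element was changed through the tensor
print(x.grad)        # gradient of sum(x^2)
print(x_dev.device)
```

A plain numpy array offers neither the gradient bookkeeping nor the device move, which is roughly why neural network code tends to be written against tensors.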
Hello, I’m about to start learning pytorch soon but I’ve read it’s a lot like numpy. Do you need a deep understanding of numpy before using it? I already know how to do general purpose python and have used it for data science so I’m comfortable with the language, but just haven’t used numpy a lot. Is knowing how to build a neural network in numpy a prerequisite for learning pytorch? Or in general a deep understanding of numpy?
I've recently started appreciating ML in Python more since I began looking at the concepts from the ground up.
For example, I took a closer look at the basics of classification neural networks, and now I have a better understanding of how more complex networks work. The foundation here is logistic regression, and understanding that has really helped me grasp the overall concepts better. It also helped me implement the code in NumPy and in PyTorch.
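To make that concrete, here is a minimal NumPy sketch of logistic regression trained with batch gradient descent. It is not the code from the video; the function names, learning rate, step count, and toy data are all assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping real numbers into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic_regression(X, y, lr=0.1, n_steps=2000):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_steps):
        p = sigmoid(X @ w + b)               # predicted probabilities
        grad_w = X.T @ (p - y) / n_samples   # gradient w.r.t. the weights
        grad_b = np.mean(p - y)              # gradient w.r.t. the bias
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(X, w, b):
    """Classify as 1 when the predicted probability is at least 0.5."""
    return (sigmoid(X @ w + b) >= 0.5).astype(int)


# Toy data (made up for illustration): two Gaussian blobs in 2D.
rng = np.random.default_rng(0)
X0 = rng.normal(loc=-2.0, scale=1.0, size=(100, 2))
X1 = rng.normal(loc=2.0, scale=1.0, size=(100, 2))
X = np.vstack([X0, X1])
y = np.concatenate([np.zeros(100), np.ones(100)])

w, b = fit_logistic_regression(X, y)
accuracy = np.mean(predict(X, w, b) == y)
print(f"training accuracy: {accuracy:.2f}")
```

A single neuron with a sigmoid activation is exactly this model, which is why it works well as a stepping stone to larger classification networks.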
If you're also interested in Machine Learning with Python and sometimes feel overwhelmed by all the complicated topics, I really recommend going back to the basics. I've made a video where I explain logistic regression step by step using a simple example.
The video will be attached here: https://youtu.be/EB4pqThgats?si=Z-lXOjuNKEP5Yehn
I'd be happy if you could take a look and give me some feedback! I'm curious to hear what you think of my approach and if you have any tips on how to make it even clearer.
I want to implement the same functionality for both numpy and torch, but I don't know how to support both with a single implementation.
For example, I want to implement these:
    import numpy as np
    import torch

    def fun_torch(a):
        return torch.sin(a) + torch.cos(a)

    def fun_np(a):
        return np.sin(a) + np.cos(a)
I want to implement them in only one function, but I don't want this:
    def func(a):
        # Explicit type checks: the pattern I would like to avoid.
        if isinstance(a, torch.Tensor):
            return torch.sin(a) + torch.cos(a)
        elif isinstance(a, np.ndarray):
            return np.sin(a) + np.cos(a)
Using NumPy’s random number generator with multi-process data loading in PyTorch causes identical augmentations unless you specifically set seeds using the worker_init_fn option in the DataLoader. I didn’t and this bug silently regressed my model’s accuracy.
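The mechanism can be reproduced without PyTorch: every worker process starts from a copy of the parent's NumPy RNG state, so without reseeding they all draw the same numbers. The sketch below simulates that with `np.random.get_state` / `np.random.set_state` and shows one possible `worker_init_fn`. The helper `simulated_worker_draws` and the placeholder `my_dataset` are hypothetical names, and the seeding scheme (inherited seed plus worker id) is only one of several options.

```python
import numpy as np


def worker_init_fn(worker_id):
    """Give each DataLoader worker its own NumPy seed.

    Assumption: the seed is derived from the inherited NumPy state plus
    the worker id, so workers that start from the same state diverge.
    """
    base_seed = int(np.random.get_state()[1][0])
    np.random.seed((base_seed + worker_id) % 2**32)


def simulated_worker_draws(parent_state, worker_id, init_fn=None):
    """Mimic a worker process: inherit the parent's RNG state, then draw."""
    np.random.set_state(parent_state)
    if init_fn is not None:
        init_fn(worker_id)
    return np.random.randint(0, 1000, size=5)


np.random.seed(0)
parent_state = np.random.get_state()

# Without an init function, both "workers" produce the same numbers.
same_a = simulated_worker_draws(parent_state, 0)
same_b = simulated_worker_draws(parent_state, 1)

# With it, the two workers draw different numbers.
diff_a = simulated_worker_draws(parent_state, 0, worker_init_fn)
diff_b = simulated_worker_draws(parent_state, 1, worker_init_fn)

print(np.array_equal(same_a, same_b), np.array_equal(diff_a, diff_b))

# Hypothetical usage with PyTorch (my_dataset is a placeholder name):
# loader = torch.utils.data.DataLoader(my_dataset, num_workers=4,
#                                      worker_init_fn=worker_init_fn)
```

Note that this simple variant may repeat the same per-worker sequence every epoch; seeding from `torch.initial_seed()` inside the worker is another commonly used option.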
How many others has this bug done damage to? Curious, I downloaded over a hundred thousand repositories from GitHub that import PyTorch, and analysed their source code. I kept projects that define a custom dataset, use NumPy’s random number generator with multi-process data loading, and are more-or-less straightforward to analyse using abstract syntax trees. Out of these, over 95% of the repositories are plagued by this problem. It’s inside PyTorch's official tutorial, OpenAI’s code, and NVIDIA’s projects. Even Karpathy admitted falling prey to it.
For example, the following image shows the duplicated random crop augmentations you get when you blindly follow the official PyTorch tutorial on custom datasets:
You can read more details here.
Hi r/MachineLearning! Let's discuss PyTorch best practices.
I recently finished a PyTorch re-implementation (with help from various sources) for the paper Zero-shot User Intent Detection via Capsule Neural Networks, which originally had Python 2 code for TensorFlow.
I'd like to request perhaps a critique on the code I've written so far (it's not perfect, yet!) and any suggestions if there are best practices specifically in PyTorch, for implementing directly from research papers as well as converting them from other frameworks.
Some thoughts I had while programming (feel free to raise more!):
- I've been implementing a Dataset class and custom batch functions for every dataset I've been working with. Is this the PyTorch best practice?
- Where is the optimal place to shift Tensors to .cuda()? I've been doing this in the training loop, just before feeding them into the model.
- How to manage the use of both numpy and torch, seeing as PyTorch aims to reinvent many of the basic operations in numpy?
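For the first two points, here is a minimal sketch of the pattern described: a custom Dataset, a custom batch (collate) function, and a device move inside the loop just before the model would be called. `ToySequenceDataset`, `pad_collate`, and the toy data are hypothetical names and values, not taken from the re-implementation itself, and this is not a claim about what the best practice is.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToySequenceDataset(Dataset):
    """Hypothetical dataset of variable-length integer sequences with labels."""

    def __init__(self, sequences, labels):
        self.sequences = sequences
        self.labels = labels

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, idx):
        return torch.tensor(self.sequences[idx]), self.labels[idx]


def pad_collate(batch):
    """Custom batch function: pad the sequences in a batch to equal length."""
    seqs, labels = zip(*batch)
    padded = torch.nn.utils.rnn.pad_sequence(list(seqs), batch_first=True)
    return padded, torch.tensor(labels)


# Falls back to the CPU when no GPU is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
dataset = ToySequenceDataset([[1, 2, 3], [4, 5], [6]], [0, 1, 0])
loader = DataLoader(dataset, batch_size=3, collate_fn=pad_collate)

for inputs, targets in loader:
    # Batches are built on the CPU and moved just before they are used.
    inputs, targets = inputs.to(device), targets.to(device)
    print(inputs.shape, targets.shape)
```

Using `.to(device)` instead of `.cuda()` keeps the same loop usable on machines without a GPU.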
If you're a fellow PyTorch user/contributor please share a little!
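On mixing numpy and torch: many elementwise functions share the same name in both modules, so one option is to pick the module from the input's type and write the computation once. A minimal sketch, where `get_namespace` and `sin_plus_cos` are hypothetical helpers and torch is imported lazily so numpy-only callers do not need it installed:

```python
import numpy as np


def get_namespace(a):
    """Return the module (numpy or torch) that matches the input's type.

    Hypothetical helper: torch is only imported when a torch object is
    actually passed in.
    """
    if type(a).__module__.split(".")[0] == "torch":
        import torch
        return torch
    return np


def sin_plus_cos(a):
    """Compute sin(a) + cos(a) for a NumPy array or a torch tensor."""
    xp = get_namespace(a)
    return xp.sin(a) + xp.cos(a)


x = np.array([0.0, np.pi / 2])
print(sin_plus_cos(x))
```

This only works for functions whose names and arguments match in both libraries; where they differ, a small per-library wrapper is still needed.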
So you might or might not know, I was working on HyperLearn --> a faster optimized ML package designed to make everything at least 50% (I hope) faster.
Thanks so much for all the support Redditors for HyperLearn! https://github.com/danielhanchen/hyperlearn [Made it to the Trending Github list for Jup Notebooks!! yayy!]
Anyways, I didn't update the code a lot, but that's because I was busily testing and finding out which algos were the most stable and best.
Key findings for N = 5,000, P = 6,000 [more features than samples, near-square matrix]
- For pseudoinverse (used in Linear Reg, Ridge Reg, lots of other algos), JIT, Scipy MKL, PinvH, Pinv2 and HyperLearn's Pinv are very similar. PyTorch's is clearly problematic, being around 4x or more slower than Scipy MKL.
- For Eigh (used in PCA, LDA, QDA, other algos), Sklearn's PCA utilises SVD. Clearly not a good idea, since it is much better to compute the eigenvectors / eigenvalues of X^T X. JIT Eigh is the clear winner at 14.5 seconds on X^T X, whilst Numpy is 2x slower. Torch is likewise slower once again...
- So, for PCA, a speedup of 3 times is seen when using JIT-compiled Eigh compared to Sklearn's PCA.
- To solve X*theta = y, Torch GELS is super unstable. Like really. If you use Torch GELS, don't forget to call theta_hat[np.isnan(theta_hat) | np.isinf(theta_hat)] = 0, or else results are problematic. All other algos have very similar MSEs, and HyperLearn's Regularized Cholesky Solve takes a mere 0.635 seconds, beating Sklearn's next fastest Ridge Solve (via cholesky) by over 100% (after considering matrix multiplication time) --> HyperLearn 2.89s vs 4.53s Sklearn.
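For readers who have not seen a regularized Cholesky solve, here is a minimal NumPy sketch of the idea. It is not HyperLearn's implementation: the function names, the `alpha` default, and the toy data are assumptions, and the two triangular systems are solved with the general `np.linalg.solve` for simplicity (a dedicated triangular solver would exploit the structure). The second helper does the same NaN/inf cleanup as the one-liner quoted above.

```python
import numpy as np


def ridge_cholesky_solve(X, y, alpha=1e-3):
    """Solve (X^T X + alpha * I) theta = X^T y via a Cholesky factorization."""
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)   # symmetric positive definite
    b = X.T @ y
    L = np.linalg.cholesky(A)                  # A = L @ L.T, L lower triangular
    z = np.linalg.solve(L, b)                  # forward step:  L z = b
    theta = np.linalg.solve(L.T, z)            # backward step: L^T theta = z
    return theta


def zero_non_finite(theta):
    """Return a copy with NaN and +/-inf entries replaced by 0."""
    cleaned = theta.copy()
    cleaned[~np.isfinite(cleaned)] = 0.0
    return cleaned


rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
true_theta = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_theta                              # noiseless toy target

theta_hat = zero_non_finite(ridge_cholesky_solve(X, y))
print(np.round(theta_hat, 3))
```

The small `alpha` term keeps the matrix positive definite, which is what allows the Cholesky factorization to succeed even when X^T X alone is singular (as it is when there are more features than samples).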
So to conclude:
- HyperLearn's Pseudoinverse has no speed improvement
- HyperLearn's PCA will have over 2 times speed boost. (200% improvement)
- HyperLearn's Linear Solvers will be over 1 times faster. (100% improvement)
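As an illustration of the PCA point, the sketch below computes principal components two ways: with an eigendecomposition of X^T X and with an SVD of the centered data, and checks that they agree. It uses plain NumPy rather than the JIT-compiled routines benchmarked above, the function names and toy data are made up, and no timing claims are implied.

```python
import numpy as np


def pca_eigh(X, n_components):
    """PCA via an eigendecomposition of the p x p matrix X^T X."""
    Xc = X - X.mean(axis=0)                        # center each feature
    eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc)   # eigenvalues ascending
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvals[order], eigvecs[:, order]


def pca_svd(X, n_components):
    """Reference PCA via an SVD of the centered data matrix."""
    Xc = X - X.mean(axis=0)
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    return s[:n_components] ** 2, vt[:n_components].T


rng = np.random.default_rng(0)
scales = np.array([5.0, 4.0, 3.0, 2.0, 1.0, 0.5])
X = rng.standard_normal((300, 6)) * scales         # toy data, distinct variances

vals_eigh, vecs_eigh = pca_eigh(X, 3)
vals_svd, vecs_svd = pca_svd(X, 3)

# Eigenvalues of X^T X equal the squared singular values of X.
print(np.allclose(vals_eigh, vals_svd))
# Component directions agree up to sign.
print(np.allclose(np.abs(vecs_eigh.T @ vecs_svd), np.eye(3), atol=1e-6))
```

One trade-off to keep in mind: forming X^T X squares the condition number of the problem, so the eigendecomposition route can lose accuracy on badly conditioned data even when it is faster.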
Help make HyperLearn better! All contributors are welcome, as this is truly an overwhelming project... https://github.com/danielhanchen/hyperlearn
Lower Time == better
I read the Hands-On Machine Learning book (the TensorFlow one) and I am a first-year student. I came to know a little later that the PyTorch one is a better option. If I complete this book and then pick up PyTorch, will the skills be transferable?
Sorry if this sounds stupid or obvious, but I don't really know.
Anybody else agree? At this point, I don’t even care if it doesn’t support expression templates for performance. A library like that allows you to be SO MUCH more productive when doing neural network stuff, computer vision, pre-processing and post-processing data. It takes years to standardise something like mdspan and that’s miles off numpy. We are literally going to have to wait 100 years.