Publication

Distributed Learning of CNNs on Heterogeneous CPU/GPU Architectures

dc.contributor.authorMarques, José
dc.contributor.authorFalcao, Gabriel
dc.contributor.authorAlexandre, Luís
dc.date.accessioned2020-01-09T09:45:41Z
dc.date.available2020-01-09T09:45:41Z
dc.date.issued2018
dc.description.abstractConvolutional Neural Networks (CNNs) have shown to be powerful classification tools in tasks that range from check reading to medical diagnosis, reaching close to human perception, and in some cases surpassing it. However, the problems to solve are becoming larger and more complex, which translates to larger CNNs, leading to longer training times (the computationally complex part) that not even the adoption of Graphics Processing Units (GPUs) could keep up with. This problem is partially solved by using more processing units and distributed training methods that are offered by several frameworks dedicated to neural network training, such as Caffe, Torch or TensorFlow. However, these techniques do not take full advantage of the possible parallelization offered by CNNs and the cooperative use of heterogeneous devices with different processing capabilities, clock speeds, memory size, among others. This paper presents a new method for the parallel training of CNNs that can be considered as a particular instantiation of model parallelism, where only the convolutional layer is distributed. In fact, the convolutions processed during training (forward and backward propagation included) represent 60-90% of global processing time. The paper analyzes the influence of network size, bandwidth, batch size, number of devices, including their processing capabilities, and other parameters. Results show that this technique is capable of diminishing the training time without affecting the classification performance for both CPUs and GPUs. For the CIFAR-10 dataset, using a CNN with two convolutional layers, with 500 and 1500 kernels, respectively, the best speedups reach 3.28 using four CPUs and 2.45 with three GPUs. Modern imaging datasets, larger and more complex than CIFAR-10, will certainly require more than 60-90% of processing time calculating convolutions, and speedups will tend to increase accordingly. (An illustrative sketch of the kernel-splitting idea appears after the metadata listing below.)pt_PT
dc.description.versioninfo:eu-repo/semantics/publishedVersionpt_PT
dc.identifier.doi10.1080/08839514.2018.1508814pt_PT
dc.identifier.urihttp://hdl.handle.net/10400.6/8141
dc.language.isoengpt_PT
dc.peerreviewednopt_PT
dc.titleDistributed Learning of CNNs on Heterogeneous CPU/GPU Architecturespt_PT
dc.typejournal article
dspace.entity.typePublication
oaire.citation.endPage844pt_PT
oaire.citation.issue9-10pt_PT
oaire.citation.startPage822pt_PT
oaire.citation.titleApplied Artificial Intelligencept_PT
oaire.citation.volume32pt_PT
person.familyNameFalcao
person.familyNameAlexandre
person.givenNameGabriel
person.givenNameLuís
person.identifier1483922
person.identifier.ciencia-id251F-BD6A-8DF9
person.identifier.ciencia-id2014-0F06-A3E3
person.identifier.orcid0000-0001-9805-6747
person.identifier.orcid0000-0002-5133-5025
person.identifier.ridP-9142-2014
person.identifier.ridE-8770-2013
person.identifier.scopus-author-id17433774200
person.identifier.scopus-author-id8847713100
rcaap.rightsopenAccesspt_PT
rcaap.typearticlept_PT
relation.isAuthorOfPublicationf9be499e-6059-41dc-983e-5fe9022ea0db
relation.isAuthorOfPublication131ec6eb-b61a-4f27-953f-12e948a43a96
relation.isAuthorOfPublication.latestForDiscovery131ec6eb-b61a-4f27-953f-12e948a43a96
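
The abstract describes a model-parallel scheme in which only the convolutional layer is distributed across heterogeneous devices. As a rough, hypothetical illustration of that idea (not the authors' implementation), the Python/PyTorch sketch below partitions a conv2d's output kernels across a list of devices; the helper name split_conv_forward, the device list, and the per-device shares are all assumptions, with the shares loosely standing in for the heterogeneous load balancing the paper studies.

import torch
import torch.nn.functional as F

def split_conv_forward(x, weight, bias, devices, shares):
    # Hypothetical helper: split the output kernels of one conv2d across
    # devices. Each device convolves the full input with its slice of the
    # kernels, and the partial feature maps are concatenated on the host.
    out_channels = weight.shape[0]
    counts = [int(round(s * out_channels)) for s in shares]
    counts[-1] = out_channels - sum(counts[:-1])  # absorb rounding error
    outputs, start = [], 0
    for dev, n in zip(devices, counts):
        w = weight[start:start + n].to(dev)
        b = bias[start:start + n].to(dev)
        y = F.conv2d(x.to(dev), w, b, padding=1)
        outputs.append(y.to("cpu"))
        start += n
    return torch.cat(outputs, dim=1)  # recombine along the channel axis

# Toy usage: two devices (both CPU here so the sketch runs anywhere),
# with a 60/40 kernel split mimicking devices of unequal speed.
x = torch.randn(8, 3, 32, 32)        # a CIFAR-10-sized batch
weight = torch.randn(500, 3, 3, 3)   # 500 kernels, as in the abstract
bias = torch.zeros(500)
y = split_conv_forward(x, weight, bias, ["cpu", "cpu"], [0.6, 0.4])
print(y.shape)                        # torch.Size([8, 500, 32, 32])

With CUDA hardware available, the device list could instead be, e.g., ["cuda:0", "cuda:1"], which matches the heterogeneous CPU/GPU setting the paper targets.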

Files

Original bundle
Name: 1712.02546(1).pdf
Size: 826.73 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission
Description: