Wednesday, March 27, 2019

Do generative adversarial networks always converge?

In terms of theory, this is an open question.
In terms of practice, no they don’t always converge. On small problems, they sometimes converge and sometimes don’t. On large problems, like modeling ImageNet at 128x128 resolution, I’ve never seen them converge yet.
This is probably the most important question about GANs, both in terms of theory and practice. In terms of theory, it would be great to derive a set of conditions under which they converge or don’t converge. In terms of practice, it would be great to modify them in a way that makes them converge consistently.
This paper gives some conditions under which simultaneous gradient descent on two players' costs will converge: http://robotics.eecs.berkeley.ed...
GANs never satisfy those conditions because the Hessian of the generator's cost is all zeros at equilibrium. However, the conditions in this paper are sufficient conditions, not necessary conditions, so it's possible that GANs can converge anyway.
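To make "simultaneous gradient descent on two players' costs" concrete, here is a minimal sketch (not taken from that paper) on the toy bilinear game where player 1 minimizes x*y and player 2 minimizes -x*y. The equilibrium is (0, 0), yet simultaneous updates spiral outward, a small illustration of how this kind of dynamics can fail to converge.

```python
import numpy as np

# Toy two-player game: player 1 minimizes f(x, y) = x * y with respect to x,
# player 2 minimizes -f(x, y) with respect to y. The unique equilibrium is (0, 0).

def simultaneous_gradient_descent(x0=1.0, y0=1.0, lr=0.1, steps=100):
    x, y = x0, y0
    trajectory = [(x, y)]
    for _ in range(steps):
        grad_x = y    # gradient of player 1's cost  x*y  w.r.t. x
        grad_y = -x   # gradient of player 2's cost -x*y  w.r.t. y
        # Both players step at the same time, using the opponent's current value.
        x, y = x - lr * grad_x, y - lr * grad_y
        trajectory.append((x, y))
    return trajectory

traj = simultaneous_gradient_descent()
# The distance from the equilibrium grows slowly: the iterates spiral outward.
print([round(np.hypot(x, y), 3) for x, y in traj[::25]])
```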

What's the difference between machine learning training and inference?

In machine learning, “training” usually refers to the process of preparing a machine learning model to be useful by feeding it data from which it can learn. “Training” may refer to the specific task of feeding that model with the expectation that the resulting model will be evaluated independently (e.g., on a separate “test” set), or it might refer to the general process of feeding it data with the intention of using it for something.
I’ve seen “inference” used in the context of machine learning in two main senses. In one sense of the word, “inference” refers to the process of taking a model that’s already been trained (as above) and using that trained model to make useful predictions, as in NVidia’s page on the topic. Here, training and inference are completely distinct activities, and “inference” refers to the process of inferring things about the world by applying your model to new data.
In another sense of the word, especially in the subfields of Bayesian modeling and statistical learning, researchers may use the phrase "inference" to refer to the process of learning model parameters or random variables from data, as in Hastie and Tibshirani. You can think of this sense of inference as more closely aligning with the phrase "making inferences about the data" that are then reflected by the model parameters you've learned from the data. Here "inference" is more closely related to "training" above. "Training" in these fields is often used to refer to the process of estimating model parameters or random variables, though typically "training" is used in this context when it's done to make predictions with the resulting model or to evaluate it rather than to draw conclusions about data using the model (other phrases are "fitting a model," "estimating parameters," "learning parameters," and so on).
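As a purely illustrative sketch of the first sense of the distinction (using scikit-learn, which the question itself does not mention), "training" is the fit step on labeled data, and "inference" is applying the trained model to new, held-out data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Training: estimate the model's parameters from the labeled training split.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Inference (in the deployment sense): apply the trained model to new data.
predictions = model.predict(X_test)
print("held-out accuracy:", model.score(X_test, y_test))
```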

Friday, March 15, 2019

[Bayesian Learning] Model uncertainty

The models above can be used for applications as diverse as skin cancer diagnosis from lesion images, steering in autonomous vehicles, and dog breed classification in a website where users upload pictures of their pets. For example, given several pictures of dog breeds as training data—when a user uploads a photo of his dog—the hypothetical website should return a prediction with rather high confidence. But what should happen if a user uploads a photo of a cat and asks the website to decide on a dog breed?
The above is an example of out of distribution test data. The model has been trained on photos of dogs of different breeds, and has (hopefully) learnt to distinguish between them well. But the model has never seen a cat before, and a photo of a cat would lie outside of the data distribution the model was trained on. This illustrative example can be extended to more serious settings, such as MRI scans with structures a diagnostics system has never observed before, or scenes an autonomous car steering system has never been trained on.
A possible desired behaviour of a model in such cases would be to return a prediction (attempting to extrapolate far away from our observed data), but return an answer with the added information that the point lies outside of the data distribution. We want our model to possess some quantity conveying a high level of uncertainty with such inputs (alternatively, conveying low confidence).
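One common way to obtain such a quantity is to draw several stochastic predictions (for example with Monte Carlo dropout) and measure how spread out the averaged prediction is. The sketch below is purely illustrative: the "model" is a made-up stand-in, and only the predictive-entropy computation is the point; the entropy should come out higher for the out-of-distribution cat photo than for a dog photo.

```python
import numpy as np

def predictive_entropy(probs):
    """Entropy of the mean predictive distribution; higher means less confident."""
    mean_p = probs.mean(axis=0)                 # average over stochastic passes
    return -np.sum(mean_p * np.log(mean_p + 1e-12))

def mc_predict(stochastic_model, x, n_samples=50):
    """Run the model several times with its noise source (e.g., dropout) left on."""
    return np.stack([stochastic_model(x) for _ in range(n_samples)])

# Hypothetical stochastic "dog-breed classifier", for illustration only.
rng = np.random.default_rng(0)
def fake_dog_model(x):
    logits = np.array([3.0, 0.5, 0.2]) if x == "dog_photo" else rng.normal(size=3)
    e = np.exp(logits - logits.max())
    return e / e.sum()

for photo in ["dog_photo", "cat_photo"]:
    samples = mc_predict(fake_dog_model, photo)
    print(photo, "predictive entropy:", round(predictive_entropy(samples), 3))
```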

Thursday, March 14, 2019

[OS] Dynamix Review



PAPER REVIEW: DYNAMIX

Summary

As multi-device technology develops rapidly, sharing the resources available on host and client devices becomes paramount. However, existing ways of sharing data between devices have disadvantages such as limited resource coverage, complex programming effort, or heavy inter-device network traffic.
Dynamix is a framework supporting efficient cross-device resource sharing. First, Dynamix maximizes resource coverage by integrating CPU, memory, and I/O resources. Second, it reduces programming effort by moving data-sharing details from the application layer to a lower layer. Third, the framework minimizes inter-device network traffic through dynamic task redistribution.
Existing sharing mechanisms based on I/O request forwarding support only I/O resources, require carefully designed abstraction layers to support single-device applications, and can suffer from severe network overhead.
Code offloading and distributed computation utilize remote computation resources (e.g., CPU, memory) by offloading performance-critical code regions to more powerful devices. However, they support only computation resources for cross-device sharing, which leaves I/O resources wasted. Migrated tasks must eventually return to the requesting device, which restricts the scope of performance-critical task redistribution.
Distributed programming platforms are likewise limited to computation resources for cross-device sharing and leave the burden of difficult multi-device programming to application developers.

Dynamix has three goals: High Resource Coverage, Single-device Application Support, and Resource-aware Task Redistribution.
To achieve them, the authors propose three key implementation components: a Resource Integrator, a Thread Migrator, and a Master Daemon.
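The paper's actual redistribution policy is not reproduced here, but as a rough illustration of what "resource-aware task redistribution" by a master daemon could look like, the hypothetical sketch below picks, for each task, the device that minimizes an estimated cost combining load and the network traffic the placement would induce. All names and numbers are invented.

```python
from dataclasses import dataclass

# Hypothetical model of resource-aware task placement; this is NOT the
# algorithm from the Dynamix paper, only an illustration of the idea.

@dataclass
class Device:
    name: str
    cpu_free: float          # fraction of CPU currently idle
    mem_free_mb: int
    net_cost_per_mb: float   # relative cost of shipping data to this device

@dataclass
class Task:
    name: str
    cpu_demand: float
    mem_mb: int
    data_mb: int             # data that must move if the task runs remotely

def place(task, devices, local_name):
    def cost(dev):
        if dev.cpu_free < task.cpu_demand or dev.mem_free_mb < task.mem_mb:
            return float("inf")                  # device cannot host the task
        traffic = 0 if dev.name == local_name else task.data_mb
        return traffic * dev.net_cost_per_mb + task.cpu_demand / dev.cpu_free
    return min(devices, key=cost)

devices = [
    Device("phone", cpu_free=0.2, mem_free_mb=512, net_cost_per_mb=0.0),
    Device("smart_tv", cpu_free=0.9, mem_free_mb=4096, net_cost_per_mb=0.05),
]
task = Task("photo_classify", cpu_demand=0.6, mem_mb=1024, data_mb=3)
print("run on:", place(task, devices, local_name="phone").name)
```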
The framework is evaluated on Google Nexus smartphones, an in-house Samsung Smart TV, and a photo classifier built with the TensorFlow library, and shows impressive results: e.g., Dynamix achieves 8.3x higher throughput than RF (Request Forwarding), while paying only an 11% performance drop from the maximum throughput for 1080p, or from 8.2 FPS to 24 FPS on the home theater.
Criticism
Comment
Dynamix should be evaluated on a more comprehensive platform or testbed, not just on a few popular devices.
Dynamix should consider the scenario in which resources are shared with untrusted devices.
Question
Dynamix covers only a subset of the problem; what if we want to share every resource of a device, not just CPU, memory, and I/O? Perhaps a more comprehensive platform is needed.
Does the master daemon keep track of resource availability on all devices? What happens if it fails to perform operations on a device? Are there any backup solutions?