There are 94 Unicode characters that can be represented as one byte according to the JSON spec (if your JSON is transmitted as UTF-8). With that in mind, I think the best you can do space-wise is base85, which represents four bytes as five characters. However, this is only a 7% improvement over base64, it's more expensive to compute, and implementations are less common than base64's, so it's probably not a win.
You could also simply map every input byte to the corresponding character in U+0000-U+00FF, then do the minimum escaping required by the JSON standard to pass those characters; the advantage here is that the required decoding is nil beyond built-in functions, but the space efficiency is bad: roughly 105% expansion (if all input bytes are equally likely) vs. 25% for base85 or 33% for base64.
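To make those expansion figures concrete, here is a small Node.js sketch that measures both schemes on a buffer containing every byte value once (the sizes include the JSON string quotes):

```javascript
// Compare the UTF-8 size of a JSON string holding 256 bytes of binary
// data, encoded two ways: base64 vs. the raw byte -> U+0000..U+00FF mapping.
const bytes = Buffer.from(Array.from({ length: 256 }, (_, i) => i));

// base64: every 3 bytes become 4 characters, all single-byte in UTF-8.
const b64 = JSON.stringify(bytes.toString('base64'));
// latin1 mapping: control chars get escaped, and bytes >= 0x80 become
// two UTF-8 bytes each, which is where the ~105% expansion comes from.
const latin1 = JSON.stringify(bytes.toString('latin1'));

console.log(Buffer.byteLength(b64, 'utf8'));    // → 346 (~35% overhead)
console.log(Buffer.byteLength(latin1, 'utf8')); // → 528 (~106% overhead)
```

The exact numbers depend on the byte distribution; all-ASCII input would expand far less under the latin1 mapping, while base64's overhead is constant.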
Final verdict: base64 wins, in my opinion, on the grounds that it's common, easy, and not bad enough to warrant replacement.
See also: Base91 and Base122
Answer from hobbs on Stack Overflow

I want to produce a JSON file from a binary file to make it more readable and usable downstream. However, Azure Data Factory (ADF) has a limitation that binary files must have both source and sink as binary when using the Copy activity.
Is there any other activity or Azure service that I can leverage to achieve this? Has anyone worked on this problem before?
I ran into the same problem, and thought I'd share a solution: multipart/form-data.
By sending a multipart form, you first send your JSON metadata as a string, and then separately send the raw binary (images, WAVs, etc.) indexed by the Content-Disposition name.
Here's a nice tutorial on how to do this in Obj-C, and here is a blog article that explains how to partition the string data with the form boundary and separate it from the binary data.
The only change you really need is on the server side: capture your metadata, which should reference the POSTed binary data appropriately (by using the Content-Disposition boundary).
Granted, it requires additional work on the server side, but if you are sending many images or large images, it's worth it. Combine this with gzip compression if you want.
IMHO, sending base64-encoded data is a hack; the multipart/form-data RFC was created for exactly this kind of problem: sending binary data in combination with text or metadata.
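As a minimal sketch of the client side of this approach: one part carries the JSON metadata, another carries the raw bytes. The field names ("meta", "file"), the filename, and the /upload endpoint are illustrative assumptions; FormData and Blob are available in browsers and in Node 18+.

```javascript
// JSON metadata and binary payload as separate multipart parts.
const meta = { id: 42, caption: 'my photo' };
const bytes = new Uint8Array([0x89, 0x50, 0x4e, 0x47]); // stand-in binary data

const form = new FormData();
form.append('meta', JSON.stringify(meta));
form.append('file', new Blob([bytes], { type: 'application/octet-stream' }), 'photo.png');

// fetch sets the Content-Type header (with the boundary) automatically:
// fetch('/upload', { method: 'POST', body: form });
```

The server then reads the "meta" part as a string, parses it as JSON, and streams the "file" part as raw binary, with no base64 step anywhere.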
npm install js-binary
First of all, browsers treat binary data differently than Node.js. In the browser, binary data appears as a Blob or an ArrayBuffer, while Node.js uses Buffer, which is not an ArrayBuffer. I won't go too deep into this, but you need to handle the data differently in the browser and in Node.js.
When using WebSocket on the browser side, data is transmitted as either string or binary; if binary is used, you have to specify binaryType, and in this particular case I will use ArrayBuffer.
As for converting strings to buffers, I suggest using standard UTF-8, since UTF-16 comes in two byte orders: '\u0024' is stored as 00 24 in UTF-16BE but as 24 00 in UTF-16LE. Note also that while TextDecoder can decode UTF-16 in either byte order, TextEncoder only produces UTF-8. For strings whose code points all fit in one byte, you can simply do this:
const strToAB = str =>
    new Uint8Array(str.split('')
        .map(c => c.charCodeAt(0))).buffer;
const ABToStr = ab =>
    new Uint8Array(ab).reduce((p, c) =>
        p + String.fromCharCode(c), '');
console.log(ABToStr(strToAB('hello world!')));
Since TextEncoder always emits UTF-8 (the encoding argument its constructor once accepted is ignored by modern browsers), UTF-8 is used on both sides here. The browser code should be something like:
var ws = new WebSocket('ws://localhost');
ws.binaryType = 'arraybuffer';
ws.onmessage = event => {
    //TextDecoder defaults to UTF-8.
    let str = new TextDecoder().decode(event.data),
        json = JSON.parse(str);
    console.log('received', json);
};
ws.onopen = () => {
    let json = { client: 'hi server' },
        str = JSON.stringify(json);
    console.log('sent', json);
    //JSON.toString() returns "[object Object]", which isn't what you want,
    //so ws.send(json) would send the wrong data.
    ws.send(new TextEncoder().encode(str));
};
On the server side, data arrives as a Buffer, which more or less does everything natively; Buffer methods default to UTF-8, so no explicit encoding needs to be specified.
//You may use a different websocket implementation, but the core
//logic remains the same, as they all build on top of Buffer.
var WebSocketServer = require('websocket').server,
    http = require('http'),
    //This is only here so the WebSocketServer can be initialized.
    wss = new WebSocketServer({
        httpServer: http.createServer()
            .listen({ port: 80 })
    });
wss.on('request', request => {
    var connection = request.accept(null, request.origin);
    connection.on('message', msg => {
        if (msg.type === 'binary') {
            //Buffer#toString(encoding) defaults to 'utf8'.
            let str = msg.binaryData.toString();
            console.log(`message : ${str}`);
            //Send data back to the browser.
            let json = JSON.parse(str);
            json.server = 'Go away!';
            str = JSON.stringify(json);
            //Buffer.from(string, encoding) also defaults to 'utf8';
            //new Buffer() is deprecated.
            let buf = Buffer.from(str);
            connection.sendBytes(buf);
        }
    });
});
Try it:
Sending data example:
var data = [{
    id: 1,
    name: "test",
    position: [1234, 850], //random position on the map
    points: 100 //example points
}];
//A Uint16Array can only hold numbers, so serialize the objects to a
//JSON string first and store the string's char codes.
var str = JSON.stringify(data);
var data2 = new Uint16Array(str.length);
for (var i = 0; i < str.length; i++) {
    data2[i] = str.charCodeAt(i);
}
socket.send(data2);
In your websocket onMessage handler, try this:
function onMessage(event) {
    if (event.data instanceof ArrayBuffer) {
        var data3 = JSON.parse(String.fromCharCode.apply(null, new Uint16Array(event.data)));
    }
}
npm install typed-binary-json
If gzip doesn't compress well enough, chances are your binary format won't do much better, especially if you want to be able to decode it in JavaScript within a reasonable amount of time.
Remember that gzip decompression is done natively by the browser and is orders of magnitude faster than anything you can do in JavaScript.
If you feel that JSON deserialization is too slow because you are supporting older browsers like IE7, which has no native JSON decoding and depends on eval for the job, consider moving from JSON to a custom encoding based on string splitting, which is much, much faster to deserialize.
For inspiration try to read this article:
http://code.flickr.com/blog/2009/03/18/building-fast-client-side-searches/
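The string-splitting idea can be sketched like this (the delimiters and record layout are illustrative assumptions, not the actual format from the article):

```javascript
// Pack flat records with a record separator and a field separator
// instead of JSON; decoding is just split(), no parser or eval needed.
const RS = '\x1e', FS = '\x1f'; // ASCII record/unit separator characters

const encode = rows => rows.map(r => r.join(FS)).join(RS);
const decode = str => str.split(RS).map(r => r.split(FS));

const packed = encode([['1', 'test', '100'], ['2', 'demo', '50']]);
console.log(decode(packed)); // round-trips back to the original rows
```

This only works for flat, string-valued records whose fields never contain the delimiters, which is exactly the trade-off that makes it faster than a general JSON parse.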
Check out BSON
BSON, short for Binary JSON, is a binary-encoded serialization of JSON-like documents. Like JSON, BSON supports the embedding of documents and arrays within other documents and arrays. BSON also contains extensions that allow representation of data types that are not part of the JSON spec. For example, BSON has a Date type and a BinData type.
Find a good explanation here http://kaijaeger.com/articles/introducing-bison-binary-interchange-standard.html
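To make the "binary-encoded" part concrete, here is the complete BSON document for {a: 1}, built by hand with Node's Buffer; in practice you would use a BSON library, this is only to show the wire format:

```javascript
// BSON layout: int32 total length (little-endian), then one typed
// element per field, then a 0x00 terminator byte.
const doc = Buffer.alloc(12);
doc.writeInt32LE(12, 0);      // total document length in bytes
doc.writeUInt8(0x10, 4);      // element type 0x10 = int32
doc.write('a\0', 5, 'ascii'); // element name as a NUL-terminated C string
doc.writeInt32LE(1, 7);       // element value
doc.writeUInt8(0x00, 11);     // document terminator
console.log(doc); // 12 bytes, vs. 7 characters for the JSON {"a":1}
```

Note that for small documents BSON can be larger than the equivalent JSON; its advantages are fast traversal and the extra types, not compactness.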